Reviewing Python Part 1

Python was the first programming language that I attempted to learn by myself, but it has been more than a year since I used it consistently. I am going to go quickly through the basics of Python, NumPy, and pandas to make machine learning code easier to understand.

Reviewing Python with Learning Python, 5th ed., by Mark Lutz

Python Core Data Types

Core data types are built into the language
Numbers
- integers, floating point numbers, complex numbers, decimals, rationals, and sets
- The Python math module contains more advanced numeric tools as functions
Strings
- text information and arbitrary collections of bytes
- They are a sequence - a positionally ordered collection of other objects
- Strings are sequences of one character strings
- Strings are immutable - they cannot be changed in place after they are created (every string operation below produces a new string)
- Encode a str to bytes with the encode() method, and decode bytes back to a str with the decode() method
Lists
- Lists are the most general sequence provided by the language
- Lists are positionally ordered collections of arbitrarily typed objects, and they have no fixed size
- They are mutable
- Python includes an operation known as a list comprehension expression, which is good for processing structures like a matrix
- Comprehension syntax wrapped in parentheses creates a generator, which can be passed to the next() function to produce the next result on demand
- Referencing a nonexistent index in a list raises an IndexError
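A minimal sketch of the list notes above. The matrix M and its values are made up for illustration:

```python
# A small made-up 3x3 "matrix" stored as a list of lists
M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

# List comprehension: collect the second column
col2 = [row[1] for row in M]          # [2, 5, 8]

# The same syntax in parentheses builds a generator instead of a list
row_sums = (sum(row) for row in M)
first = next(row_sums)                # 6
second = next(row_sums)               # 15

# Indexing past the end of a list raises IndexError
try:
    M[3]
    out_of_range = False
except IndexError:
    out_of_range = True
```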
Dictionaries
- Python dictionaries are mappings. Mappings are collections of other objects that store objects by key instead of relative position
- Dictionaries are mutable
- Fetching a nonexistent key in a dictionary raises a KeyError
Tuples
- The tuple object is roughly like a list that cannot be changed. Tuples are sequences like lists, but they are immutable, like strings.
- Functionally, they are used to represent fixed collections of items: the components of a specific calendar date, for instance
- Why use tuples? Primarily for their immutability, which provides consistency
Files
- Python code's main interface to external files on your computer
- The second argument to the built in open function is called the processing mode, which can be
  - w: write (create file if doesn't exist)
  - r: read
  - a: append
  - wb: write binary
  - rb: read binary
- A file's contents are always a string in the script, regardless of the type of data the file contains
- The best way to read a file today is to not read it at all - files provide an iterator that automatically reads line by line in for loops and other contexts
- You can specify the keyword argument encoding with the built in open function to read / write with specific encoding
Sets
- Unordered collections of unique and immutable objects
- Can create a set using the built in set() function or a set literal such as {1, 2, 3} (an empty {} creates a dictionary, not a set)
Other Core Types: Booleans, types, None
- Fixed Precision Floating Point Numbers
- Fractions
- Booleans bool
- None
- type() object
Program Unit Types
Implementation-related types (Compiled Code, stack tracebacks)

General

Every object in Python is classified as either immutable or not. In terms of the core types, numbers, strings, and tuples are immutable; lists, dictionaries, and sets are not
Python has a feature called garbage collection that cleans up unused memory as your program runs and frees you from having to manage such details in your code
An object is iterable if it is either a physically stored sequence in memory, or an object that generates one item at a time in the context of an iteration operation - a sort of "virtual" sequence
- Both of these types of objects are considered iterable because they support the iteration protocol - they respond to the iter call with an object that advances in response to next calls and raises a StopIteration exception when finished producing values
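The iteration protocol described above can be stepped through by hand. This is only a sketch with a made-up list:

```python
L = [1, 2, 3]           # a physically stored sequence
it = iter(L)            # ask the object for an iterator
a = next(it)            # 1
b = next(it)            # 2
c = next(it)            # 3

try:
    next(it)            # nothing left to produce
    finished = False
except StopIteration:
    finished = True

# A "virtual" sequence: range produces its values on demand
squares = [x ** 2 for x in range(4)]   # [0, 1, 4, 9]
```

A for loop performs the same iter and next calls automatically and stops when StopIteration is raised.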
In Python, we code to object interfaces, not to types. That is, we care about what an object does, not what it is

Definitions

Polymorphism - the meaning of an operation depends on the objects being operated on
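As a small illustration of polymorphism, the same * operation below means different things for different objects. The helper name double is hypothetical:

```python
def double(x):
    """Works for any object that supports the * operator."""
    return x * 2

n = double(21)        # 42: numeric multiplication
s = double('ab')      # 'abab': string repetition
l = double([1, 2])    # [1, 2, 1, 2]: list repetition
```

The function never checks the type of its argument; it only relies on what the object can do.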

Numeric Types

Objects

Integers and Floating point objects
- Integers are written as strings of decimal digits. Floating point numbers have a decimal point and/or an optional exponent introduced by an e or E and followed by an optional sign
- Integer types: in Python 2.X there are two, normal (usually 32 bits) and long (unlimited precision); an integer may end in an l or L to force it to become a long integer. In Python 3.X there is a single integer type with unlimited precision
- Hexadecimal, octal, and binary literals
  - Integers may be coded in decimal (base 10), hexadecimal (base 16), octal (base 8), or binary (base 2)
  - Hexadecimals start with 0x followed by a string of hexadecimal digits (0-9, A-F)
  - Octals start with 0o followed by a string of octal digits (0-7)
  - Binary literals start with 0b followed by a string of binary digits (0-1)
Complex Number objects
- Python complex literals are written real + imaginaryj
- Complex numbers can also be created with the built in complex(real, imag) call
Decimal: fixed precision objects
Fraction: rational number objects
Sets: collections with numeric operations
Booleans: True and False
Built in functions and modules: round, math, random
Expressions; unlimited integer precision; bitwise operations; hex, octal, and binary functions
Third party extensions: vectors, libraries, visualization, plotting, etc.
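A short sketch of the numeric literals and conversions listed above, with values chosen only for illustration:

```python
# Integer literals in different bases all create ordinary int objects
h = 0xFF          # hexadecimal -> 255
o = 0o17          # octal -> 15
b = 0b1010        # binary -> 10

# Built-in functions convert integers back to base-specific strings
hex_str, oct_str, bin_str = hex(255), oct(15), bin(10)

# int() accepts a base when parsing strings
parsed = int('ff', 16)            # 255

# Python 3.X integers have unlimited precision
big = 2 ** 100

# Complex numbers: literal form and the complex() built-in
z = 3 + 4j
same = complex(3, 4) == z         # True
magnitude = abs(z)                # 5.0
```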

Expressions:

Expression operators:
- +,-,*,/,>>,**,&,etc.
- yield x: Generator function send protocol
- lambda args: expression: Anonymous function generation
- x if y else z: Ternary selection (x is evaluated only if y is true)
- x or y: Logical OR
- x and y: Logical AND
- not x: Logical negation
- x in y, x not in y: Membership (iterables, sets)
- x is y, x is not y: Object identity tests
- x < y, x <= y, x >= y, x > y: Magnitude comparisons
- x == y, x != y: Value equality comparisons
- x | y: bitwise OR, set union
- x^y: Bitwise XOR, set symmetric difference
- x & y: bitwise AND, set intersection
- x << y, x >> y: shift x left or right by y bits
- x + y: addition, concatenation
- x - y: subtraction, set difference
- x * y : multiplication, repetition
- x % y: remainder, format
- x / y, x//y: division, true and floor
- -x, +x: negation, identity
- x**y: power
- x[i]: indexing (sequence, mapping, other)
- x[i:j:k]: slicing
- x(...): call (function, method, class, other callable)
- x.attr: Attribute reference
- (...): Tuple, expression, generator expression
- [...]: List, list comprehension
- {...}: Dictionary, set, set and dictionary comprehension

How does Python know which operation to perform first? The answer is operator precedence. Python groups expressions with one or more operators according to precedence rules, and this grouping determines the order in which the expression's parts are computed

Operators low in the list above have a higher precedence
Ex: X + Y * X => Python performs multiplication before addition because multiplication has higher precedence
You can forget about precedence completely if you are careful about parentheses
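A few made-up expressions showing how precedence and parentheses change the grouping:

```python
X, Y, Z = 2, 3, 4

a = X + Y * Z        # multiplication first: 2 + 12 = 14
b = (X + Y) * Z      # parentheses override precedence: 5 * 4 = 20
c = 2 ** 3 * 2       # ** binds tighter than *: 8 * 2 = 16
d = not X == Y       # == binds tighter than not: not (2 == 3) -> True
```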

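In mixed-type expressions, Python first converts the operands up to the type of the most complicated operand (for example, an integer is converted to a float), which is reasonable. This applies generally only to numeric types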

All python operators can be overloaded (implemented) by Python classes and C extension types to work on objects you create
Floating point number comparisons sometimes require massaging to become meaningful
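A sketch of mixed-type conversion and of one way to make floating point comparisons meaningful. Using math.isclose is an assumption on my part; rounding or truncating before comparing are other options:

```python
import math

# Mixed types: the integer is converted up to a float before the addition
mixed = 40 + 3.14
is_float = isinstance(mixed, float)            # True

# 0.1 + 0.2 is not exactly 0.3 in binary floating point
exact = (0.1 + 0.2 == 0.3)                     # False
close = math.isclose(0.1 + 0.2, 0.3)           # True: compares within a tolerance
```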

Division

Python 3 always performs true division with the / operator (the result keeps its fractional part as a floating point number)
// is floor division -> it always rounds down (toward negative infinity) rather than toward 0 (think negative numbers)
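The difference between true division, floor division, and truncation shows up with negative numbers. The values below are only examples:

```python
import math

a = 10 / 4                 # 2.5: true division keeps the fractional part
b = 10 // 4                # 2: floor division
c = -10 // 4               # -3: rounds down, not toward 0
d = 10 // 4.0              # 2.0: a float operand gives a float result
e = math.trunc(-10 / 4)    # -2: truncation rounds toward 0 instead
```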

Built In Functions

pow and abs
math module
- math.pi and math.e
- math.sqrt()
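- min(), max(), sum() (built-in functions, not part of the math module)
- math.floor(), math.trunc(), round()
random module
- random.random() (comparable to JavaScript's Math.random())
- random.randint(1,10)
- random.choice(['a','b','c'])
- random.shuffle(L) (shuffles a mutable sequence such as a list in place)
Built in functions:
- pow, abs, round, int, hex, bin, etc.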
Utility modules
- random, math
Decimals
- Can result in greater precision than floating point
- Can set the precision globally: decimal.getcontext().prec = 2
- You can use the decimal.localcontext() context manager to use different precisions throughout the code
Fractions
- Declare the numerator and denominator in the constructor
Sets
- To make a set object, pass a sequence or other iterable object to the built-in set function -> returns a set object which contains all the items of the object passed in
- Sets can only contain immutable (hashable) object types -> lists and dictionaries cannot be embedded in sets, but tuples can be if you need to store compound values
- Uses:
  - Sets can be used to filter duplicates out of other collections
  - Sets can be used to isolate differences in lists, strings, and other iterable objects (with the difference - operator)

a = [1, 2, 3, 4]
b = [3, 4, 5]
c = set(a) - set(b)
# c only contains values that are in a but not in b: {1, 2}

- Sets can be used to perform order-neutral equality tests -> two sets are equal if every element in each set is in the other.
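The decimal, fraction, and set notes above can be sketched as follows. All values are made up for illustration:

```python
import decimal
from decimal import Decimal
from fractions import Fraction

# Binary floating point cannot represent 0.1 exactly; Decimal can
float_sum = 0.1 + 0.1 + 0.1 - 0.3                  # a tiny nonzero value
dec_sum = Decimal('0.1') * 3 - Decimal('0.3')      # exactly zero

# Temporarily use a different precision with a context manager
with decimal.localcontext() as ctx:
    ctx.prec = 2
    short = Decimal('1.00') / Decimal('3.00')      # Decimal('0.33')

# Fractions: numerator and denominator go to the constructor
total = Fraction(1, 3) + Fraction(1, 6)            # Fraction(1, 2)

# Sets: filter duplicates and compare regardless of order
unique = set([1, 2, 2, 3, 3, 3])                   # {1, 2, 3}
same_items = set('spam') == set('maps')            # True
```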

Booleans

Some would argue that booleans are numeric in nature, given that they have two values - True and False - which behave like the integers 1 and 0
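A quick check of the numeric nature of booleans:

```python
is_int_subclass = issubclass(bool, int)    # True: bool is a subclass of int
total = True + True                        # 2, since True behaves like 1
equal_to_one = (True == 1)                 # True
count = sum([True, False, True])           # 2: counting the True values
```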

Definitions:

Expression: a combination of numbers and operators that computes a value when executed by Python
Variables: names that are used to keep track of your information in a program

Dynamic Typing

Python's types are determined automatically at runtime, not in response to declarations in the code -> dynamic typing

A Variable is created when your code first assigns it a value. Future assignments change the value of the already created name.
A variable never has any type information or constraints associated with it. The notion of type lives within objects, not names.
When a variable appears in an expression, it is immediately replaced with the object that it currently refers to, whatever that may be
Variables are entries in a system table, with spaces for links to objects
Objects are pieces of allocated memory, with enough space to represent the values for which they stand
References are automatically followed pointers from variables to objects
Variables don't carry types, objects do
Objects are garbage collected
- Whenever a name is assigned to a new object, the space held by the prior object is reclaimed if it is not referenced by any other name or object
- The automatic reclamation of objects' space is known as garbage collection
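A minimal sketch of dynamic typing: the name stays the same while the type belongs to whichever object it currently references.

```python
a = 3                             # the name a is created and refers to an int object
first_type = type(a).__name__     # 'int'

a = 'spam'                        # the same name now refers to a str object
second_type = type(a).__name__    # 'str'

a = [1, 2]                        # and now to a list; the earlier objects can be reclaimed
third_type = type(a).__name__     # 'list'
```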
Shared References are when multiple names reference the same object:

a = 3 
b = a # b and a both point to the same 3 object
a = 'spam' # b still equals 3

Changes to mutable objects and in-place changes alter the situation somewhat
- In-Place Changes: changing the item at a specific index in a list or changing the value of a key in a dictionary
If two variables reference the same mutable object and an in-place change occurs, then both variables will still reference the same object with the change

a = [1,2,3]
b = a
a[1] = 5
print(b) # [1,5,3]

If you don't want this behavior, you can use the built in copy module or slice from start to finish

import copy
Y = [1, [2, 3]] # Example object to copy
X = copy.copy(Y) # Make a top-level "shallow" copy of object Y
X = copy.deepcopy(Y) # Make a deep copy of any object Y: copy all nested parts
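To see how a shared reference, a shallow copy, and a deep copy differ, here is a sketch with a made-up nested list:

```python
import copy

Y = [1, [2, 3]]

alias = Y                    # shared reference: same object
shallow = copy.copy(Y)       # new top-level list, nested list still shared
sliced = Y[:]                # slicing from start to finish is also a shallow copy
deep = copy.deepcopy(Y)      # nested parts are copied too

Y[0] = 'changed'             # top-level in-place change
Y[1].append(4)               # in-place change to the nested list

# alias   -> ['changed', [2, 3, 4]]
# shallow -> [1, [2, 3, 4]]
# sliced  -> [1, [2, 3, 4]]
# deep    -> [1, [2, 3]]
```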

Two ways to check for equality in Python code:

L == M # same values, common
L is M # same object, rare
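A small sketch of the two tests: == compares values, while is checks whether two names refer to the very same object.

```python
L = [1, 2, 3]
M = [1, 2, 3]      # a different list object with the same contents
N = L              # a second name for the same object

same_value = (L == M)       # True
same_object = (L is M)      # False
shared = (L is N)           # True
```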

String Fundamentals

In Python 3, there are three string types: str is used for Unicode text, bytes is used for binary data, and bytearray is a mutable variant of bytes. Files work in two modes: text, which represents content as str and implements Unicode encodings, and binary, which deals in raw bytes and does no data translation
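The three string types can be compared side by side. This is only a sketch, and the sample text is made up:

```python
text = 'sp\xc4m'                      # str: Unicode text, 4 characters
data = text.encode('latin-1')         # bytes: 4 bytes in Latin-1
utf8 = text.encode('utf-8')           # the same text takes 5 bytes in UTF-8
back = data.decode('latin-1')         # decoding returns the original str

buf = bytearray(b'spam')              # bytearray: mutable variant of bytes
buf[0] = ord('S')                     # in-place change is allowed
```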

Common String operations

s = '': empty string
s = "spam's": double quotes
s = 's\np\ta\x00m': escape sequences
s = """... multiline ...""": triple-quoted block strings
S = r'\temp\spam': raw strings
S = b'sp\xc4m': byte strings
U = u'sp\u00c4m': Unicode strings
S1 + S2: concatenation
S1*4: repeat
S[i] Index
S[i:j] Slicing
len(S) Length
"a %s parrot" % kind String formatting expression
"a {0} parrot".format(kind) String formatting method
S.find("pa") String methods
S.rstrip(): remove whitespace
S.split(',') split on delimiter
S.replace('pa','xx') Replacement
S.isdigit(): content test
s.lower(): case conversion
s.endswith('spam'): end test
'spam'.join(strlist): delimiter join
S.encode('latin-1'): unicode encoding
B.decode('utf8') Unicode decoding
for x in S: print(x) Iteration
'spam' in S membership
[c*2 for c in S]
map(ord,S)
re.match('spa(.*)am',line) # pattern matching module

String Notes

Raw strings suppress escapes
Slicing is a generalized form of indexing that returns an entire section, not just a single item
Extended slicing S[i:j:k] accepts a step, k, that allows for skipping items and reversing order
The repr() function converts an object to its as-code string form
Character code conversions with ord() and chr()
To change a string, make a new string (immutable)
Attribute Fetches: an expression of the form object.attribute means fetch the value of attribute in object
Call Expressions: an expression of the form function(arguments) means "invoke the code of the function, passing zero or more comma separated arguments to it, and return the function's result value"
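Several of the string notes above in one sketch. The sample strings are made up:

```python
S = 'hello world'

part = S[0:5]            # 'hello': slicing returns a whole section
every_other = S[::2]     # 'hlowrd': a step of 2 skips items
backwards = S[::-1]      # 'dlrow olleh': a negative step reverses

raw = r'C:\new\text'     # raw string: \n and \t are not treated as escapes
as_code = repr('a\nb')   # the as-code form shows the escape instead of a line break
code = ord('a')          # 97
char = chr(97)           # 'a'

# Strings are immutable: build a new string instead of changing in place
try:
    S[0] = 'H'
    changed_in_place = True
except TypeError:
    changed_in_place = False
new_S = 'H' + S[1:]      # 'Hello world'
```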

Lists and Dictionaries

Lists are ordered arrays of object references
list slice assignment can best be thought of as a combination of deletion and insertion
Beware: append and sort change the list object in place, but don't return the list object as a result
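Dictionaries are collections of arbitrary objects, where items are stored and fetched by key instead of positional offset (they were unordered in the Python versions the book covers; since Python 3.7 they preserve insertion order). Dictionaries are implemented as hash tables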
Assigning to new indices in Dictionaries means adding items
Keys need not always be strings: any immutable objects work as well: integers, tuples
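Since indexing a nonexistent key (D[key]) raises a KeyError, it is often safer to use the get method, which returns None or a default value instead
Dictionary keys views (returned by the keys() method) are set-like, as are items views when their values are hashable; values views (returned by values()) are not set-like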
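A sketch of the list and dictionary behaviors noted above, using made-up data:

```python
# Slice assignment: delete the slice, then insert the new items
L = [1, 2, 3, 4]
L[1:3] = ['a', 'b', 'c']        # [1, 'a', 'b', 'c', 4]

# sort (like append) changes the list in place and returns None
nums = [3, 1, 2]
result = nums.sort()            # result is None, nums is now [1, 2, 3]

# Dictionaries: assigning to a new key adds an item
D = {'name': 'Bob'}
D['age'] = 40
D[(1, 2)] = 'tuple key'         # any immutable object can be a key

# get avoids the KeyError raised by indexing a missing key
missing = D.get('job')          # None
fallback = D.get('job', 'n/a')  # 'n/a'

# Key views support set operations
common = D.keys() & {'name', 'job'}    # {'name'}
```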

Common List Literals and Operations

L = []
L = [123,'abc',1.23,{}]
L = [123,'abc',1.23,['dev','mgr']]
L = list('spam') # ['s','p','a','m']
L = list(range(-4,4)) # List of an iterable's items
Growing
- L.append(4)
- L.extend([5,6,7])
- L.insert(i,X) # Insert X at position i
Searching
- L.index(x)
- L.count(x)
Sorting, Reversing
- L.sort()
- L.reverse()
Copying, clearing
- L.copy()
- L.clear()
Shrinking
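- L.pop(i) # delete and return the item at index i (the last item if i is omitted)
- L.remove(x) # delete the first occurrence of the value x
- del L[i] # Delete one item
- del L[i:j] # delete an entire section
- L[i:j] = [] # same effect as del L[i:j]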
Index / Slice Assignment
- L[i] = 3
- L[i:j] = [4,5,6] # Delete items from slice i -> j, insert [4,5,6] at i
List Comprehension and Maps
- L = [x**2 for x in range(5)]
- L = list(map(ord,'spam'))

Common Dictionary Literals and Operations

D = {}
D = {'name':'Bob'}
D = {'name':'Bob','age':40}
E = {'cto':{'name':'Bob', 'age': 40}}
D = dict(name="Bob",age=40)
D = dict([('name','Bob'),('age',40)])
keyslist = ['name','age']
valueslist = ['Bob',40]
D = dict(zip(keyslist,valueslist))
D = dict.fromkeys(['name','age']) # {'name':None,'age': None}
D = dict.fromkeys(['name','age'],0) # {'name':0,'age': 0}
D['name'] # indexing by key
E['cto']['age'] # indexing by key
'age' in D # membership: key present test
D.keys() # all keys
D.values() # all values
D.items() # all key value tuples
D.clear() # clear (remove all items)
D.copy() # copy (top level)
D2 = {}
D.update(D2) # merge by keys
default = 0
key = 'name'
D.get(key,default)
D.pop(key,default)
D.setdefault(key,default)
D.popitem()
len(D)
D[key] = 42
del D[key]
list(D.keys())
D1 = {'first': 'Frank', 'age': 2}
D2 = {'last': 'Brown', 'age': 2}
D1.keys() & D2.keys()
D.viewkeys(), D.viewvalues() # Python 2.7 only; in 3.X, keys() and values() already return views
D = {x: x**2 for x in range(10)}

Tuples and Everything else

Tuples construct simple groups of objects that work exactly like lists, except that tuples can't be changed in place and are usually written as a series of items in parentheses, not square brackets
Tuples: ordered collections of arbitrary objects, addressed by offset, immutable sequences, fixed length, heterogeneous, and arbitrarily nestable, arrays of object references
y = 40,; type(y) => tuple (a trailing comma is enough to make a one-item tuple)
A tuple's immutability applies only to its top level: mutable objects nested inside a tuple can still be changed in place
namedtuple allows tuples to be indexed by both position and attribute name
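A sketch of the tuple notes above. The Emp record and its field names are only illustrative:

```python
from collections import namedtuple

T = (1, [2, 3], 'spam')

try:
    T[0] = 99                  # the tuple itself cannot be changed
    assigned = True
except TypeError:
    assigned = False

T[1].append(4)                 # but a mutable item inside it can
# T is now (1, [2, 3, 4], 'spam')

single = 40,                   # trailing comma makes a one-item tuple

# namedtuple: access by position or by attribute name
Emp = namedtuple('Emp', ['name', 'job'])
bob = Emp('Bob', job='dev')
by_position = bob[0]           # 'Bob'
by_name = bob.job              # 'dev'
```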

Common Tuple Literals and Operations

T = () # empty tuple
T = (0,) # A one item tuple
T = (0,'Mi',1.2,3) # A 4 item tuple
T = 0,'Mi',1.2,3 # A 4 item tuple
T = ('Bob',('dev','mgr')) # nested tuple
j = 1
i = 0
T[i]
T[1][j]
T[i:j]
len(T)
T1 = (1,2,3)
T2 = (4,5,6)
T1 + T2 # Concatenation
T1*3 # Repeat
for x in T1: print(x)
2 in T1 # Membership 
[x**2 for x in T2] # Comprehension
T.index('Bob')
T.count('Bob')
from collections import namedtuple
namedtuple('Emp',['name','job']) # Named Tuple Extension type

Files

The built in open function creates a Python file object, which serves as a link to a file residing on your machine. After calling open, you can transfer strings of data to and from the associated external file by calling the returned file object's methods
File Usage Notes
File iterators are best for reading lines
Content is strings, not objects
Files are buffered and seekable
- buffered means that text you write may not be transferred from memory to disk immediately - closing a file, or using the flush() method, forces the buffered data to disk
When file objects are reclaimed, Python automatically closes the files if they are still open
Text Files represent content as normal str strings, perform unicode encoding and decoding automatically, and perform end of line translation by default
Binary Files represent content as a special bytes string type and allow programs to access file content unaltered
File context manager support allows us to wrap file processing code in a logic layer that ensures the file will be closed automatically on exit, instead of relying on the auto-close during garbage collection
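A sketch of the file notes above using the with statement. The temporary directory and the file name data.txt are assumptions made so the example does not touch real files:

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'data.txt')

# The with statement closes the file on exit, which also flushes buffered writes
with open(path, 'w', encoding='utf-8') as out:
    out.write('spam\n')
    out.write('eggs\n')

# Iterate line by line instead of reading the whole file at once
lines = []
with open(path, encoding='utf-8') as inp:
    for line in inp:
        lines.append(line.rstrip('\n'))

closed_after_with = inp.closed       # True

# Binary mode returns bytes and does no translation
with open(path, 'rb') as raw:
    first_bytes = raw.read(4)        # b'spam'
```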

Common File Operations

output = open(r'c:\spam','w') # Create an output file
inp = open('data','r') # Create an input file
inp = open('data') # Same as previous line ('r' is the default)
aString = inp.read() # Read an entire file into a single string
aString = inp.readline() # Read next line
aList = inp.readlines() # Read entire file into a list of line strings (with \n)
output.write(aString)
output.writelines(aList)
output.flush() # Flush output buffer to disk without closing
output.close() # Manual close
inp.seek(N) # Change file position to offset N for next operation
for line in open('data'): print(line) # Use line
open('f.txt',encoding="latin-1") # Read text with a specific encoding
open('f.bin','rb') # Read binary file

The pickle module is an advanced tool that allows us to store almost any Python object in a file directly, with no to- or from-string conversion requirement on our part
- To and from string conversions would require eval (or complicated string parsing) which could introduce vulnerabilities
The translation to and from JSON is automated by the json standard library module
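A minimal sketch of storing the same object with pickle and with json. The dictionary, the temporary directory, and the file names are made up for illustration:

```python
import json
import os
import pickle
import tempfile

D = {'name': 'Bob', 'jobs': ['dev', 'mgr'], 'age': 40}
folder = tempfile.mkdtemp()              # hypothetical scratch location

# pickle stores the object in a binary file and rebuilds it on load
pkl_path = os.path.join(folder, 'data.pkl')
with open(pkl_path, 'wb') as f:
    pickle.dump(D, f)
with open(pkl_path, 'rb') as f:
    from_pickle = pickle.load(f)

# json writes a portable text representation
json_path = os.path.join(folder, 'data.json')
with open(json_path, 'w') as f:
    json.dump(D, f)
with open(json_path) as f:
    from_json = json.load(f)

as_text = json.dumps({'a': 1})           # '{"a": 1}'
```

Loading a pickle can run arbitrary code, so it is best to unpickle only data from a trusted source.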
