Reviewing Python Part 1
Python was the first programming language that I attempted to learn by myself, but it has been more than a year since I used it consistently, so I am going to quickly go through the basics of Python, NumPy, and pandas to make machine learning code easier to understand.
Reviewing Python with Learning Python, 5th ed., by Mark Lutz
Python Core Data Types
- Core data types are built into the language
- Numbers
- integers, floating point numbers, complex numbers, decimals, rationals, and sets
- The Python math module contains more advanced numeric tools as functions
- Strings
- text information and arbitrary collections of bytes
- They are a sequence - a positionally ordered collection of other objects
- Strings are sequences of one character strings
- Strings are immutable - they cannot be changed in place after they are created (every string operation produces a new string)
- Convert between text and bytes with the str.encode() and bytes.decode() methods
- Lists
- Lists are the most general sequence provided by the language
- Lists are positionally ordered collections of arbitrarily typed objects, and they have no fixed size
- They are mutable
- Python includes an operation known as a list comprehension expression, which is good for processing structures like a matrix
- Comprehensions wrapped in parentheses instead of square brackets are generator expressions: they create generators that produce the next result on demand when passed to the next() function
- Referencing a nonexistent index in a list raises an IndexError
- Dictionaries
- Python dictionaries are mappings. Mappings are collections of other objects that store objects by key instead of relative position
- Dictionaries are mutable
- Fetching a nonexistent key in a dictionary raises a KeyError
- Tuples
- The tuple object is roughly like a list that cannot be changed. Tuples are sequences like lists, but they are immutable, like strings.
- Functionally, they are used to represent fixed collections of items: the components of a specific calendar date, for instance
- Why use tuples? Primarily for their immutability, which provides consistency
- Files
Python code's main interface to external files on your computer
The second argument to the built in open function is the processing mode, which can be
- w: write (creates the file if it doesn't exist, and truncates it if it does)
- r: read
- a: append
- wb: write binary
- rb: read binary
A file's contents are always a string in the script (str in text mode, bytes in binary mode), regardless of the type of data the file contains
The best way to read a file today is to not read it at all - files provide an iterator that automatically reads line by line in for loops and other contexts
You can pass the encoding keyword argument to the built in open function to read or write text with a specific encoding
- Sets
- Unordered collections of unique and immutable objects
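- Create a set with the built in set() function or with a set literal such as {1, 2, 3}; note that empty braces {} create a dictionary, so an empty set must be written set()
- Other Core Types: Booleans, types, None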
- Fixed Precision Floating Point Numbers
- Fractions
- Booleans bool
- None
- type() object
- Program Unit Types
- Implementation-related types (Compiled Code, stack tracebacks)
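Here is a minimal sketch of a few of the core-type notes above: string immutability, list comprehensions, generator expressions, and the errors raised by missing indexes and keys. The names matrix, col2, and row_sums are my own illustrations, not something the book prescribes.

```python
# Strings are immutable: "changing" one really builds a new string object.
s = "spam"
t = "z" + s[1:]            # a new string; s itself is untouched

# A list comprehension pulls the second column out of a nested-list "matrix".
matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
col2 = [row[1] for row in matrix]

# The same kind of expression in parentheses is a generator expression:
# nothing is computed until next() asks for a value.
row_sums = (sum(row) for row in matrix)
first = next(row_sums)     # sum of [1, 2, 3]
second = next(row_sums)    # sum of [4, 5, 6]

# Missing list indexes and dictionary keys raise exceptions.
try:
    col2[99]
except IndexError:
    index_failed = True

try:
    {"name": "Bob"}["age"]
except KeyError:
    key_failed = True
```

The generator still has one row left at this point; calling next(row_sums) once more would return the last sum, and a call after that would raise StopIteration.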
General
- Every object in Python is classified as either immutable or not. In terms of the core types, numbers, strings, and tuples are immutable; lists, dictionaries, and sets are not
- Python has a feature called garbage collection that cleans up unused memory as your program runs and frees you from having to manage such details in your code
- An object is iterable if it is either a physically stored sequence in memory, or an object that generates one item at a time in the context of an iteration operation - a sort of "virtual" sequence
- Both of these types of objects are considered iterable because they support the iteration protocol - they respond to the iter call with an object that advances in response to next calls and raises a StopIteration exception when finished producing values
- In Python, we code to object interfaces, not to types. That is, we care about what an object does, not what it is
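To make the iteration protocol and the "interfaces, not types" idea concrete, here is a small sketch. The helper total_length is a hypothetical function of my own, used only to show that the same code works on any iterable.

```python
# The iteration protocol by hand: iter() returns an iterator,
# next() advances it, and StopIteration signals the end.
letters = ["a", "b"]
it = iter(letters)
first = next(it)
second = next(it)
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

# Coding to interfaces: this hypothetical helper only needs its argument
# to be iterable and its items to support len().
def total_length(items):
    return sum(len(item) for item in items)

from_list = total_length(["ab", "c"])
from_tuple = total_length(("ab", "c"))
from_dict = total_length({"ab": 1, "c": 2})   # iterating a dict yields its keys
```

A for loop performs the same iter/next steps automatically and stops quietly when StopIteration is raised.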
Definitions
- Polymorphism - the meaning of an operation depends on the objects being operated on
Numeric Types
Objects
- Integers and Floating point objects
- Integers are written as strings of decimal digits. Floating point numbers have a decimal point and/or an optional signed exponent introduced by an e or E
- Integer types: Python 2.X has two, normal (usually 32 bits) and long (unlimited precision; a literal may end in l or L to force it to become a long integer). Python 3.X has a single integer type with unlimited precision
- Hexadecimal, octal, and binary literals
- Integers may be coded in decimal (base 10), hexadecimal (base 16), octal (base 8), or binary (base 2)
- Hexadecimals start with 0x followed by a string of hexadecimal digits (0-9, A-F)
- Octals start with 0o followed by a string of octal digits (0-7)
- Binary literals start with 0b followed by a string of binary digits (0-1)
- Complex Number objects
- Python complex literals are written real + imaginaryj
- Complex numbers can also be created with the built in complex(real, imag) call
- Decimal: fixed precision objects
- Fraction: rational number objects
- Sets: collections with numeric operations
- Booleans: True and False
- Built in functions and modules: round, math, random
- Expressions; unlimited integer precision; bitwise operations; hex, octal, and binary functions
- Third party extensions: vectors, libraries, visualization, plotting, etc.
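A quick sketch of the numeric literal forms listed above, assuming Python 3.X. The variable names are only for illustration.

```python
# The same kind of integer written in different bases.
hex_value = 0xFF        # hexadecimal
octal_value = 0o17      # octal
binary_value = 0b1010   # binary

# Built in functions convert integers back to digit strings.
as_hex = hex(255)
as_oct = oct(15)
as_bin = bin(10)

# Python 3.X integers have unlimited precision.
big = 2 ** 100

# Floating point literals: a decimal point and/or an exponent.
scaled = 1.5e3

# Complex numbers: a literal, or the complex(real, imag) call.
z = 2 + 3j
z2 = complex(2, 3)
```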
Expressions:
- Expression operators:
- +,-,*,/,>>,**,&,etc.
- yield x: Generator function send protocol
- lambda args: expression: Anonymous function generation
- x if y else z: Ternary selection (x is evaluated only if y is true)
- x or y: Logical OR
- x and y: Logical AND
- not x: Logical negation
- x in y, x not in y: Membership (iterables, sets)
- x is y, x is not y: Object identity tests
- x < y, x<= y, x>= y, x >y: Magnitude comparisons
- x==y,x!=y: value equality comparisons
- x | y: bitwise OR, set union
- x^y: Bitwise XOR, set symmetric difference
- x & y: bitwise AND, set intersection
- x << y, x >> y: shift x left or right by y bits
- x + y: addition, concatenation
- x - y: subtraction, set difference
- x * y : multiplication, repetition
- x % y: remainder, format
- x / y, x//y: division, true and floor
- -x, +x: negation, identity
- x**y: power
- x[i]: indexing (sequence, mapping, other)
- x[i:j:k]: slicing
- x(...): call (function, method, class, other callable)
- x.attr: Attribute reference
- (...): Tuple, expression, generator expression
- [...]: List, list comprehension
- {...}: Dictionary, set, set and dictionary comprehension
How does Python know which operation to perform first? The answer is operator precedence. Python groups expressions with one or more operators according to precedence rules, and this grouping determines the order in which the expression's parts are computed
- Operators lower in the list above have a higher precedence
- Ex: X + Y * Z => Python performs multiplication before addition because multiplication has higher precedence
- You can forget about precedence completely if you are careful about parentheses
Mixed types are converted up to the type of the more complicated operand before the operation runs (for example, an integer is converted to floating point when combined with a float), which is reasonable. This applies generally only to numeric types
Most Python operators can be overloaded (implemented) by Python classes and C extension types to work on objects you create
Floating point number comparisons sometimes require massaging to become meaningful, because most decimal fractions cannot be represented exactly in binary floating point
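The notes above on precedence, mixed types, overloading, and float comparisons can be sketched as follows. Using math.isclose is one way I would "massage" a float comparison (the book may use other approaches), and the Vec class is a hypothetical example of my own.

```python
import math

# Precedence: multiplication groups before addition unless parentheses say otherwise.
a = 2 + 3 * 4        # same as 2 + (3 * 4)
b = (2 + 3) * 4

# Mixed types: the integer is converted up to a float first.
mixed = 40 + 3.14

# Float comparisons: exact equality can fail, a tolerance check is safer.
naive = (0.1 + 0.2 == 0.3)
close = math.isclose(0.1 + 0.2, 0.3)

# Operator overloading: a hypothetical class that implements + via __add__.
class Vec:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        return Vec(self.x + other.x, self.y + other.y)

v = Vec(1, 2) + Vec(3, 4)
```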
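Division
- Python 3 always performs true division with the / operator (the result is a float that keeps any remainder, even for two integers)
- The // operator performs floor division: it always rounds down, rather than truncating toward 0 (the difference shows up with negative numbers)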
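A small sketch of the two division operators in Python 3, including the negative-number case where flooring and truncating differ:

```python
import math

true_div = 7 / 2            # true division keeps the remainder
even_div = 4 / 2            # still a float, even when the result is whole
floor_pos = 7 // 2          # floor division on positive numbers
floor_neg = -7 // 2         # rounds down, away from zero here
trunc_neg = math.trunc(-7 / 2)   # truncation goes toward zero instead
floor_float = 7.0 // 2      # floor division of floats returns a float
```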
Built In Functions
- pow and abs
- math module
- math.pi and math.e
- math.sqrt()
- min()
- max()
- sum()
- math.floor(), math.trunc(), round()
- random module
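- random.random(): a random float in [0.0, 1.0), like JavaScript's Math.random()
- random.randint(1,10)
- random.choice(['a','b','c'])
- random.shuffle(seq): shuffles a mutable sequence, such as a list, in place
- Built in functions:
- pow, abs, round, int, hex, bin, etc.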
- Utility modules
- random, math
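A short tour of the built in functions and the math and random modules listed above. The fixed seed is only there to make this sketch repeatable; it is not something the notes require.

```python
import math
import random

power = pow(2, 4)
magnitude = abs(-42.0)
total = sum([1, 2, 3, 4])
smallest, largest = min(3, 1, 2), max(3, 1, 2)

root = math.sqrt(144)
floored = math.floor(-2.5)     # rounds down
truncated = math.trunc(-2.5)   # drops the fractional part (toward zero)
rounded = round(2.567, 2)      # rounds to two decimal places

random.seed(0)                 # assumption: seeded only for repeatability
unit = random.random()         # float in [0.0, 1.0)
roll = random.randint(1, 10)   # integer from 1 to 10, both ends included
pick = random.choice(['a', 'b', 'c'])
deck = [1, 2, 3, 4, 5]
random.shuffle(deck)           # shuffles in place and returns None
```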
Decimals - Can result in greater precision than floating point
- You can set precision globally: decimal.getcontext().prec = 2
- You can use the decimal.localcontext() context manager to use different precisions in different parts of the code
Fractions - Pass the numerator and denominator to the constructor, e.g. Fraction(1, 3)
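A minimal sketch of Decimal and Fraction, assuming the standard library decimal and fractions modules. It uses decimal.localcontext() for a temporary precision rather than changing the global context.

```python
from decimal import Decimal, localcontext
from fractions import Fraction

# Binary floating point leaves a tiny error; Decimal built from strings does not.
float_sum = 0.1 + 0.1 + 0.1 - 0.3
dec_sum = Decimal("0.1") * 3 - Decimal("0.3")

# A temporary precision that only applies inside the with block.
with localcontext() as ctx:
    ctx.prec = 2
    short = Decimal(1) / Decimal(3)

# Outside the block the default precision is back in effect.
full = Decimal(1) / Decimal(3)

# Fractions keep an exact numerator and denominator.
third = Fraction(1, 3)
half = third + Fraction(1, 6)
```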
Sets - To make a set object, pass a sequence or other iterable object to the built in set function, which returns a set object containing all the items of the object passed in
- Sets can only contain immutable (hashable) object types -> lists and dictionaries cannot be embedded in sets, but tuples can be if you need to store compound values
- Uses:
- Sets can be used to filter duplicates out of other collections
- Sets can be used to isolate differences in lists, strings, and other iterable objects (with the difference - operator)
c = set(a) - set(b) # c contains only the values that are in a but not in b
- Sets can be used to perform *order-neutral equality tests* -> two sets are equal if every element of each set is also in the other.
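The set uses above can be sketched like this; the sample values are my own.

```python
# Filtering duplicates out of a list.
unique = set([1, 2, 2, 3, 3, 3])

# Isolating differences between two iterables with the - operator.
diff = set("spam") - set("ham")

# Order-neutral equality: only membership matters.
same = set([1, 2, 3]) == set([3, 2, 1])

# Empty braces make a dictionary, so an empty set needs set().
empty_set = set()
empty_dict = {}

# Tuples are hashable and can be stored in a set; lists cannot.
pairs = {(1, 2), (3, 4)}
try:
    bad = {[1, 2]}
    list_allowed = True
except TypeError:
    list_allowed = False
```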
Booleans
- Some would argue that Booleans are numeric in nature, given that they have two values, True and False, which behave like the integers 1 and 0
Definitions:
- Expression: a combination of numbers and operators that computes a value when executed by Python
- Variables: names that are used to keep track of your information in a program
Dynamic Typing
Python's types are determined automatically at runtime, not in response to declarations in the code -> dynamic typing
A Variable is created when your code first assigns it a value. Future assignments change the value of the already created name.
A variable never has any type information or constraints associated with it. The notion of type lives within objects, not names.
When a variable appears in an expression, it is immediately replaced with the object that it currently refers to, whatever that may be
Variables are entries in a system table, with spaces for links to objects
Objects are pieces of allocated memory, with enough space to represent the values for which they stand
References are automatically followed pointers from variables to objects
Variables don't carry types, objects do
Objects are garbage collected
- Whenever a name is assigned to a new object, the space held by the prior object is reclaimed if it is not referenced by any other name or object
- The automatic reclamation of objects' space is known as garbage collection
Shared References are when multiple names reference the same object:
a = 3
b = a # a and b both point to the same 3 object
a = 'spam' # b still equals 3
- Changes to mutable objects and in-place changes alter the situation somewhat
- In-Place Changes: changing the item at a specific index in a list or changing the value of a key in a dictionary
- If two variables reference the same mutable object and an in-place change occurs, then both variables will still reference the same object with the change
a = [1,2,3]
b = a
a[1] = 5
print(b) # [1,5,3]
- If you don't want this behavior, you can copy the object with the built in copy module or with a full slice from start to finish (L[:] for a list)
import copy
X = copy.copy(Y) # Make top level "shallow" copy of object Y
X = copy.deepcopy(Y) # Make deep copy of any object Y: copy all nested parts
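To see the difference between a shared reference, a shallow copy, and a deep copy, here is a small sketch with a nested list. The variable names are mine.

```python
import copy

Y = [1, [2, 3]]
alias = Y                  # shared reference: same object
shallow = copy.copy(Y)     # new top-level list, nested list still shared
sliced = Y[:]              # a full slice is also a top-level copy
deep = copy.deepcopy(Y)    # nested parts are copied too

Y[0] = 99                  # in-place change at the top level
Y[1].append(4)             # in-place change to the nested list
```

After these changes, alias sees both edits, shallow and sliced see only the nested edit, and deep sees neither.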
Two ways to check for equality in Python code:
L == M # same values, common
L is M # same object, rare
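A tiny sketch of the two tests, using lists so that no object caching gets in the way:

```python
L = [1, 2, 3]
M = L            # M references the same object as L
N = [1, 2, 3]    # N is a different object with the same value

same_value_LM = (L == M)
same_object_LM = (L is M)
same_value_LN = (L == N)
same_object_LN = (L is N)
```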
String Fundamentals
In Python 3, there are three string types: str is used for Unicode text, bytes is used for binary data, and bytearray is a mutable variant of bytes. Files work in two modes: text, which represents content as str and implements unicode encodings and binary, which deals in raw bytes and does no data translation
Common String operations
- s= '': Empty string
- s = "spam's": double quotes
- s = 's\np\ta\x00m': escape sequences
- s = """... multiline ..."""
- S = r'\temp\spam': raw strings
- S = b'sp\xc4m' Byte strings
- U = u'sp\u00c4m' Unicode strings
- S1 + S2: concatenation
- S1*4: repeat
- S[i] Index
- S[i:j] Slicing
- len(S) Length
- "a %s parrot" % kind: String formatting expression
- "a {0} parrot".format(kind): String formatting method
- S.find("pa") String methods
- S.rstrip(): remove whitespace
- S.split(',') split on delimiter
- S.replace('pa','xx') Replacement
- S.isdigit(): content test
- s.lower(): case conversion
- s.endswith('spam'): end test
- 'spam'.join(strlist): delimiter join
- S.encode('latin-1'): Unicode encoding
- B.decode('utf8'): Unicode decoding
- for x in S: print(x) Iteration
- 'spam' in S membership
- [c*2 for c in S]: list comprehension
- map(ord,S): apply a function to each character
- re.match('spa(.*)am',line) # pattern matching module
String Notes
- Raw strings suppress escapes
- Slicing is a generalized form of indexing that returns an entire section, not just a single item
- Extended slicing S[i:j:k] accepts a step, k, that allows for skipping items and reversing order - as seen in the next section
- The repr() function converts an object to its as-code string
- Character code conversions with ord() and chr()
- To change a string, make a new string (immutable)
- Attribute Fetches: an expression of the form object.attribute means fetch the value of attribute in object
- Call Expressions: an expression of the form function(arguments) means "invoke the code of the function, passing zero or more comma separated arguments to it, and return the function's result value"
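A few of the string operations and notes above, sketched with sample values of my own (the word 'dead' for kind is just an illustration):

```python
S = "spam"

# Indexing and slicing, including extended slicing with a step.
first, last = S[0], S[-1]
middle = S[1:3]
every_other = S[::2]
backwards = S[::-1]

# Strings are immutable: methods return new strings.
replaced = S.replace("pa", "xx")
position = S.find("pa")

# Raw strings suppress escapes, so the backslash stays a backslash.
raw = r"\temp\spam"

# Character code conversions.
code = ord("s")
char = chr(115)

# Formatting expression and formatting method.
kind = "dead"
by_expression = "a %s parrot" % kind
by_method = "a {0} parrot".format(kind)

# Encoding text to bytes and decoding it back.
encoded = "sp\u00c4m".encode("latin-1")
decoded = encoded.decode("latin-1")
```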
Lists and Dictionaries
- lists are ordered arrays of objects
- list slice assignment can best be thought of as a combination of deletion and insertion
- Beware: append and sort change the list object in place, but don't return the list object as a result
- Dictionaries are collections of arbitrary objects, where items are stored and fetched by key instead of positional offset (the book describes them as unordered; since Python 3.7 they preserve insertion order). Dictionaries are implemented as hash tables
- Assigning to new keys in dictionaries adds items
- Keys need not always be strings: any immutable (hashable) object works as well, such as integers and tuples
- Since indexing a dictionary with a nonexistent key raises an error, it is often better to use the get method, which returns a default instead
- Dictionary key and item views (returned by the keys() and items() methods) are set-like; values() views are not, because values need not be unique or hashable
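The list and dictionary notes above can be sketched as follows; the sample data is mine.

```python
# Slice assignment: delete the slice, then insert the new items in its place.
L = [1, 2, 3, 4, 5]
L[1:3] = ['a', 'b', 'c']

# append and sort change the list in place and return None.
numbers = [3, 1, 2]
append_result = numbers.append(0)
sort_result = numbers.sort()

# Dictionaries: get() avoids the KeyError that indexing would raise.
D = {'name': 'Bob'}
missing = D.get('age')
with_default = D.get('age', 0)

# Assigning to a new key adds an item; keys can be any hashable object.
D['age'] = 40
D[(1, 2)] = 'tuple key'

# Key views are set-like, so set operations work on them directly.
common = D.keys() & {'name', 'job'}
```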
Common List Literals and Operations
- L = []
- L = [123,'abc',1.23,{}]
- L = [123,'abc',1.23,['dev','mgr']]
- L = list('spam') # ['s','p','a','m']
- L = list(range(-4,4)) # List of an iterable's items
- Growing
- L.append(4)
- L.extend([5,6,7])
- L.insert(i,X) # Insert X at position i
- Searching
- L.index(x)
- L.count(x)
- Sorting, Reversing
- L.sort()
- L.reverse()
- Copying, clearing
- L.copy()
- L.clear()
- Shrinking
- L.pop(i) # delete and return the item at index i (the last item if i is omitted)
- L.remove(x) # delete the first occurrence of the value x
- del L[i] # Delete one item
- del L[i:j] # delete an entire section
- L[i:j] = []
- Index / Slice Assignment
- L[i] = 3
- L[i:j] = [4,5,6] # Delete items from slice i -> j, insert [4,5,6] at i
- List Comprehension and Maps
- L = [x**2 for x in range(5)]
- L = list(map(ord,'spam'))
Common Dictionary Literals and Operations
D = {} # empty dictionary
D = {'name':'Bob'} # one item dictionary
D = {'name':'Bob','age':40} # two item dictionary
E = {'cto':{'name':'Bob', 'age': 40}}
D = dict(name="Bob",age=40)
D = dict([('name','Bob'),('age',40)])
keyslist = ['name','age']
valueslist = ['Bob',40]
D = dict(zip(keyslist,valueslist))
D = dict.fromkeys(['name','age']) # {'name':None,'age': None}
D = dict.fromkeys(['name','age'],0) # {'name':0,'age': 0}
D['name'] # indexing by key
E['cto']['age'] # indexing by key
'age' in D # membership: key present test
D.keys() # all keys
D.values() # all values
D.items() # all key value tuples
D.clear() # clear (remove all items)
D.copy() # copy (top level)
D2 = {}
D.update(D2) # merge by keys
default = 0
key = 'name'
D.get(key,default)
D.pop(key,default)
D.setdefault(key,default)
D.popitem()
len(D)
D[key] = 42
del D[key]
list(D.keys())
D1 = {'first': 'Frank', 'age': 2}
D2 = {'last': 'Brown', 'age': 2}
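D1.keys() & D2.keys() # key views support set operations: {'age'}
D.viewkeys(), D.viewvalues() # Python 2.7 only; in 3.X, keys() and values() already return views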
D = {x: x**2 for x in range(10)}
Tuples and Everything else
- Tuples construct simple groups of objects that work much like lists, except that tuples can't be changed in place and are usually written as a series of items in parentheses, not square brackets
- Tuples: ordered collections of arbitrary objects, addressed by offset, immutable sequences, fixed length, heterogeneous, and arbitrarily nestable, arrays of object references
- y = 40,; type(y) => tuple (the trailing comma, not the parentheses, makes the tuple)
- Mutable objects nested inside a tuple can still be changed in place
- namedtuple allows tuples to be indexed by both position and attribute name
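A short sketch of the tuple notes above. The Emp type and its fields follow the namedtuple example in these notes; the other values are my own.

```python
from collections import namedtuple

# The trailing comma is what makes a one-item tuple.
y = 40,

# A tuple cannot be changed in place...
T = (1, [2, 3])
try:
    T[0] = 99
    assignment_worked = True
except TypeError:
    assignment_worked = False

# ...but a mutable object nested inside it can.
T[1].append(4)

# namedtuple: access by position or by attribute name.
Emp = namedtuple('Emp', ['name', 'job'])
bob = Emp('Bob', 'dev')
by_position = bob[0]
by_name = bob.job
```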
Common Tuple Literals and Operations
T = () # empty tuple
T = (0,) # A one item tuple
T = (0,'Mi',1.2,3) # A 4 item tuple
T = 0,'Mi',1.2,3 # A 4 item tuple
T = ('Bob',('dev','mgr')) # nested tuple
j = 1
i = 0
T[i]
T[1][j]
T[i:j]
len(T)
T1 = (1,2,3)
T2 = (4,5,6)
T1 + T2 # Concatenation
T1*3 # Repeat
for x in T1: print(x)
2 in T1 # Membership
[x**2 for x in T2] # Comprehension
T.index('Bob')
T.count('Bob')
from collections import namedtuple
Emp = namedtuple('Emp',['name','job']) # Named tuple extension type
Files
- The built in open function creates a Python file object, which serves as a link to a file residing on your machine. After calling open, you can transfer strings of data to and from the associated external file by calling the returned file object's methods
File Usage Notes
- File iterators are best for reading lines
- Content is strings, not objects
- Files are buffered and seekable
- buffered means that text you write may not be transferred from memory to disk immediately - closing a file, or using the flush() method, forces the buffered data to disk
- When file objects are reclaimed, Python automatically closes the files if they are still open
- Text Files represent content as normal str strings, perform unicode encoding and decoding automatically, and perform end of line translation by default
- Binary Files represent content as a special bytes string type and allow programs to access file content unaltered
- File context manager support allows us to wrap file processing code in a logic layer that ensures the file will be closed automatically on exit, instead of relying on the auto-close during garbage collection
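Here is a minimal sketch of the context manager and file iterator notes above. The temporary directory and the file name data.txt are assumptions of mine so the example cleans up after itself.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "data.txt")   # hypothetical scratch file

    # The with statement closes the file on exit, even if an error occurs.
    with open(path, "w", encoding="utf-8") as output:
        output.write("spam\n")
        output.writelines(["eggs\n", "ham\n"])
    closed_after_block = output.closed

    # The file object is its own line iterator: no read() call needed.
    with open(path, encoding="utf-8") as inp:
        lines = [line.rstrip("\n") for line in inp]

    # Binary mode returns bytes instead of str.
    with open(path, "rb") as raw:
        data = raw.read()
```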
# Common File Operations
output = open(r'c:\spam','w') # create an output file
inp = open('data','r') # Create an input file
inp = open('data') # Same as previous line ('r' is the default)
aString = inp.read() # Read an entire file into a single string
aString = inp.readline() # Read next line
aList = inp.readlines() # Read entire file into list of line strings (with \n)
output.write(aString)
output.writelines(aList)
output.flush() # Flush output buffer to disk without closing
output.close() # Manual close
inp.seek(N) # Change file position to offset N for next operation
for line in open('data'): print(line) # use line
open('f.txt',encoding="latin-1")
open('f.bin','rb') # read binary file
- The pickle module is an advanced tool that allows us to store almost any Python object in a file directly, with no to- or from- string conversion requirement on our part
- To and from string conversions would require eval (or complicated string parsing) which could introduce vulnerabilities
- The translation to and from JSON is automated by the json standard library module
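A minimal sketch of both modules, working in memory with dumps and loads rather than with files to keep it short. The record dictionary is my own sample data. Note that pickle should only be used to load data you trust, since unpickling can run arbitrary code.

```python
import json
import pickle

record = {"name": "Bob", "age": 40, "jobs": ["dev", "mgr"]}

# pickle: almost any Python object to bytes and back, no manual conversion.
blob = pickle.dumps(record)
restored = pickle.loads(blob)

# json: a portable text format for simple objects like dicts and lists.
text = json.dumps(record)
from_json = json.loads(text)
```

The file-based equivalents are pickle.dump / pickle.load with a file opened in binary mode, and json.dump / json.load with a file opened in text mode.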