Python / Matplotlib Review
One of my goals for 2025 was to complete a machine learning / deep learning project every day. I missed the first three days because I was a little wary of getting back into Python after not using it much for data analytics for a few months. (One of the reasons I set this goal was to keep up my Python / data analytics skills.) I am creating this Jupyter Notebook to review Python / NumPy / Matplotlib before getting back into doing problems.
Python Review
Python's standard library is very extensive, offering a wide range of facilities. The library contains built-in modules that provide access to system functionality such as file I/O that would otherwise be inaccessible to Python programmers, as well as modules written in Python that provide standardized solutions for many problems that occur in everyday programming. In addition to the standard library, there is an active collection of hundreds of components (from individual programs and modules to packages and entire application development frameworks), available from the Python Package Index.
Introduction
The "Python library" contains several different kinds of components. It contains data types that would normally be considered part of the "core" of a language, such as numbers and lists. For these types, the Python language core defines the form of literals and places some constraints on their semantics, but does not fully define the semantics. The library also contains built-in functions and exceptions: objects that can be used by all Python code without the need for an import statement. The bulk of the library consists of a collection of modules. Some modules are written in C and built into the Python interpreter; others are written in Python and imported in source form.
Built-In Functions
I will list some built-in functions that may be of common use here.
- abs(x:int|float): Returns the absolute value of a number
- all(iterable): Return True if all elements of the iterable are true
- any(iterable): Return True if any element of the iterable is true
- bool(object=False): return a boolean value
- callable(object): returns True if the object argument appears callable
- chr(i): Returns a string whose Unicode code point is the integer i
- dict(**kwargs), dict(mapping, **kwargs), dict(iterable, **kwargs): Create a new dictionary
- enumerate(iterable): Return an enumerate object.
- filter(function,iterable): Construct an iterator from those elements of iterable for which function is true
- float(number|string): Returns a floating-point number constructed from a number or string
- input(prompt=None): If the prompt argument is present, it is written to standard output without a trailing newline. The function then reads a line from the input, converts it to a string, and returns that
- int(number|string): Returns an integer
- isinstance(object,classinfo): Returns True if the object is an instance of the classinfo argument, or a subclass thereof.
- iter(object): Return an iterator object.
- len(s): Returns the length of an object
- locals(): Returns a mapping object representing the current local symbol table
- map(function,iterable): Return an iterator that applies function to every item of iterable
- max(iterable): Return the largest item in an iterable or the largest of two or more arguments
- min(iterable): Return the smallest item in an iterable or the smallest of two or more arguments
- next(iterator): Retrieve the next item from the iterator by calling its __next__() method
- open(file,mode='r',...): Open file and return a corresponding file object
- ord(c): Given a string representing one Unicode character, return an integer representing the Unicode code point of that character
- pow(base,exp,mod=None): Return base to the power exp; if mod is present, return base to the power exp, modulo mod
- print(*objects,sep=' ',end='\n',file=None,flush=False): Print objects to the text stream file, separated by sep and followed by end
- reversed(seq): Return a reversed iterator
- round(number,ndigits=None): Return number rounded to ndigits precision after the decimal point
- slice(stop) or slice(start,stop,step=None): Return a slice object representing the set of indices specified by range(start,stop,step)
- sorted(iterable,key=None,reverse=False): Return a new sorted list from the items in iterable
- sum(iterable,start=0): Sums start and the items of an iterable from left to right and returns the total
- tuple(iterable): Return a tuple, an immutable sequence type, built from the items of iterable
- zip(\*iterables): Iterate over several iterables in parallel, producing tuples with an item from each one
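To make a few of these concrete for myself, here is a minimal sketch that combines enumerate, zip, map, filter, sorted, any, and all. The names and scores are made-up sample data, not something from the reference material.

```python
# Hypothetical sample data, chosen only for illustration.
names = ["ada", "grace", "linus"]
scores = [90, 72, 85]

# zip pairs items positionally; enumerate adds a running index (here starting at 1).
ranked = [(i, name, score) for i, (name, score) in enumerate(zip(names, scores), start=1)]

# map and filter return lazy iterators, so wrap them in list() to see the values.
upper_names = list(map(str.upper, names))
passing = list(filter(lambda s: s >= 80, scores))

# sorted returns a new list; key and reverse control the ordering.
by_score = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)

# any/all reduce an iterable of truth values to a single bool.
all_pass = all(s >= 80 for s in scores)
any_pass = any(s >= 80 for s in scores)

print(ranked)
print(upper_names, passing)
print(by_score)
print(all_pass, any_pass)
```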
Built-In Constants
- False
- True
- None
- NotImplemented: special value which should be returned by the binary special methods to indicate that the operation is not implemented with respect to the other type
- Ellipsis: The same as the ellipsis literal "...".
- quit(code=None), exit(code=None): Objects that when called, raise SystemExit with the specified exit code
Built-In Types
The principal built-in types are numerics, sequences, mappings, classes, instances, and exceptions. Some collection classes are mutable. Some operations are supported by several object types.
Truth Value Testing
Any object can be tested for truth value, for use in an if or while condition or as operand of the Boolean operations below. By default, an object is considered to be true unless its class defines either a __bool__() method that returns False or a __len__() method that returns zero, when called with the object. Objects considered false:
- objects defined to be false: None, False
- zero of any numeric type: 0, 0.0, 0j, Decimal(0), Fraction(0,1)
- empty sequences and collections: '', (), [], {}, set(), range(0)
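A quick check of the rules above. The Basket class is a hypothetical container I made up to show that an object with only a __len__() method is falsy when its length is zero.

```python
from decimal import Decimal
from fractions import Fraction

# Every value listed above as false, collected in one list.
falsy_values = [None, False, 0, 0.0, 0j, Decimal(0), Fraction(0, 1),
                "", (), [], {}, set(), range(0)]
all_falsy = not any(falsy_values)


class Basket:
    """Hypothetical container: it defines no __bool__, so truthiness uses __len__."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)


empty_truth = bool(Basket([]))          # length 0, so considered false
full_truth = bool(Basket(["apple"]))    # length 1, so considered true

print(all_falsy, empty_truth, full_truth)
```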
Boolean Operations
Operation | Result | Note |
---|---|---|
x or y | if x is true, then x, else y | Short-circuits: y is evaluated only if x is false |
x and y | if x is false, then x, else y | Short-circuits: y is evaluated only if x is true |
not x | if x is false, then True, else False | not has a lower priority than non-Boolean operators, so not a == b is interpreted as not (a == b) |
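The short-circuit behavior in the table can be observed directly. In this sketch, probe is a hypothetical helper that records which operands actually get evaluated.

```python
calls = []


def probe(label, value):
    """Hypothetical helper: remember that this operand was evaluated, then return it."""
    calls.append(label)
    return value


r1 = probe("a", 0) or probe("b", "fallback")   # a is falsy, so b is evaluated and returned
r2 = probe("c", []) and probe("d", "unused")   # c is falsy, so d is never evaluated
r3 = probe("e", "first") or probe("f", "x")    # e is truthy, so f is never evaluated
r4 = not 1 == 2                                # parsed as not (1 == 2)

print(r1, r2, r3, r4)
print(calls)
```

Note that or and and return one of their operands, not necessarily a bool.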
Comparisons
There are eight comparison operations in Python:
- <: less than
- <=: less than or equal to
- >: greater than
- >=: greater than or equal to
- ==: equal
- !=: not equal
- is: object identity
- is not: negated object identity
Objects of different types, except different numeric types, never compare equal.
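A small sketch of the difference between equality (==) and identity (is), and of the rule about different types. The variable names are just for illustration.

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a  # c is another name for the same list object

equal_value = a == b               # True: same contents
same_object = a is b               # False: two distinct list objects
alias = a is c                     # True: one object, two names

mixed_numeric = 1 == 1.0           # True: different numeric types can compare equal
int_vs_str = 1 == "1"              # False: int and str never compare equal
list_vs_tuple = [1, 2] == (1, 2)   # False: list and tuple never compare equal

print(equal_value, same_object, alias, mixed_numeric, int_vs_str, list_vs_tuple)
```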
Numeric Types
There are three distinct numeric types: integers, floating-point numbers, and complex numbers. Integers have unlimited precision. Floating-point numbers are usually implemented using double in C. Complex numbers have a real and an imaginary part, each of which is a floating-point number. Access the real and imaginary parts of a complex number z with z.real and z.imag.
- x // y: floored quotient of x and y
- x % y: remainder of x / y
- math.trunc(x): x truncated to Integral
- math.floor(x): the greatest Integral <= x
- math.ceil(x): the least Integral >= x
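The difference between these operations shows up mostly with negative numbers, so here is a short sketch using values I picked for that purpose.

```python
import math

x, y = -7, 2
quotient = x // y      # -4: floor division rounds toward negative infinity
remainder = x % y      # 1: the result takes the sign of the divisor
identity_holds = (x // y) * y + (x % y) == x

value = -3.5
truncated = math.trunc(value)   # -3: drops the fractional part (rounds toward zero)
floored = math.floor(value)     # -4: greatest integer <= value
ceiled = math.ceil(value)       # -3: least integer >= value

z = 3 + 4j
parts = (z.real, z.imag)        # (3.0, 4.0): both parts are floats

big = 2 ** 100                  # integers have unlimited precision, so no overflow

print(quotient, remainder, identity_holds)
print(truncated, floored, ceiled)
print(parts, big)
```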
Boolean Type - bool
The bool type has exactly two constant instances: True and False. bool is a subclass of int
Iterator Types
Python supports a concept of iteration over containers. This is implemented using two distinct methods; these are used to allow user-defined classes to support iteration. Sequences always support the iteration methods.
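As a sketch of how a user-defined class can support iteration, the hypothetical Countdown class below implements __iter__() and __next__() itself. The last few lines drive a list iterator by hand with iter() and next().

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__ so it can be used in a for loop.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals that the iteration is finished
        value = self.current
        self.current -= 1
        return value


values = list(Countdown(3))

# Lists are sequences, so iter() works on them out of the box.
it = iter([10, 20])
first = next(it)
second = next(it)
exhausted = next(it, "done")  # the default is returned instead of raising StopIteration

print(values, first, second, exhausted)
```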
Sequence Types - list, tuple, range
There are three basic sequence types: list, tuple, and range.
Operation | Result |
---|---|
x in s | True if an item of s is equal to x, else False |
x not in s | False if an item of s is equal to x, else True |
s + t | the concatenation of s and t |
s * n or n * s | equivalent to adding s to itself n times |
s[i] | ith item of s, origin 0 |
s[i:j] | slice of s from i to j |
s[i:j:k] | slice of s from i to j with step k |
len(s) | length of s |
min(s) | min of s |
max(s) | max of s |
s.index(x[, i[, j]]) | index of the first occurrence of x in s (at or after index i and before index j) |
s.count(x) | total number of occurrences of x in s |
Sequences of the same type also support comparisons. Tuples and lists are compared lexicographically, element by element, so to compare equal, every element must compare equal and the two sequences must be of the same type and have the same length. Forward and reversed iterators over mutable sequences access values using an index. That index will continue to march forward even if the underlying sequence is mutated.
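A quick run through the common sequence operations from the table, using a list, a tuple, and a range with values I chose for illustration.

```python
s = [3, 1, 4, 1, 5]
t = (9, 2, 6)

contains = 4 in s            # True
joined = s + list(t)         # concatenation needs matching types, hence list(t)
repeated = t * 2             # (9, 2, 6, 9, 2, 6)
middle = s[1:4]              # [1, 4, 1]
every_other = s[::2]         # [3, 4, 5]
first_one = s.index(1)       # 1: first occurrence of the value 1
next_one = s.index(1, 2)     # 3: first occurrence at or after index 2
ones = s.count(1)            # 2

# range supports the same read-only operations without storing every number.
evens = range(0, 10, 2)
evens_info = (len(evens), evens[-1], 4 in evens)   # (5, 8, True)

# Sequences of the same type compare element by element.
ordered = (1, 2, 3) < (1, 2, 4)   # True

print(contains, joined, repeated)
print(middle, every_other, first_one, next_one, ones)
print(evens_info, ordered)
```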
Mutable Sequence Type Operations
Operation | Result |
---|---|
s[i] = x | item i of s is replaced by x |
s[i:j] = t | Slice of s from i to j is replaced by the contents of the iterable t |
del s[i:j] | same as s[i:j] = [] |
s[i:j:k] = t | the elements of s[i:j:k] are replaced by those of t |
del s[i:j:k] | removes the elements of s[i:j:k] from the list |
s.append(x) | appends x to the end of the sequence |
s.clear() | removes all items from s |
s.copy() | creates a shallow copy of s |
s.extend(t) or s+=t | extends s with the contents of t |
s *= n | updates s with its contents repeated n times |
s.insert(i,x) | inserts x into s at the index given by i |
s.pop() or s.pop(i) | retrieves the item at i (the last item by default) and also removes it from s |
s.remove(x) | Removes the first item from s where s[i] is equal to x |
s.reverse() | reverses the items of s in place |
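The mutating operations are easiest to follow by tracking one list through several steps. This is a sketch with arbitrary values; the comments show the list after each step.

```python
s = [0, 1, 2, 3, 4, 5]

s[0] = 10                   # [10, 1, 2, 3, 4, 5]
s[1:3] = ["a", "b", "c"]    # [10, 'a', 'b', 'c', 3, 4, 5] (a slice can change the length)
del s[-2:]                  # [10, 'a', 'b', 'c', 3]
s.append(99)                # [10, 'a', 'b', 'c', 3, 99]
s.insert(1, "new")          # [10, 'new', 'a', 'b', 'c', 3, 99]
popped = s.pop()            # returns 99 and removes it
s.remove("a")               # [10, 'new', 'b', 'c', 3]
s.reverse()                 # [3, 'c', 'b', 'new', 10]

# copy() is shallow: the outer list is new, but the inner lists are shared.
nested = [[1, 2], [3, 4]]
shallow = nested.copy()
shallow[0].append(99)       # also visible through nested[0]

print(s, popped)
print(nested)
```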
Lists
Lists are mutable sequences, typically used to store collections of homogeneous items. Lists may be constructed in several ways: a pair of square brackets for the empty list ([]), square brackets with comma-separated items ([a, b, c]), a list comprehension ([x for x in iterable]), or the type constructor (list() or list(iterable)).
- list.sort(*,key=None,reverse=False): sorts the list in place, using only the < comparisons between items
Tuples
Tuples are immutable sequences, typically used to store collections of heterogeneous data. Tuples may be constructed in several ways: a pair of parentheses for the empty tuple (()), a trailing comma for a singleton tuple (a, or (a,)), comma-separated items (a, b, c or (a, b, c)), or the tuple() built-in (tuple() or tuple(iterable)).
Ranges
The range type represents an immutable sequence of numbers and is commonly used for looping a specific number of times in for loops. Its constructors are range(stop) and range(start, stop[, step]).
Text Sequence Type - str
Textual data in Python is handled with str objects, or strings. Strings are immutable sequences of Unicode code points. String literals can be written with single quotes, double quotes, or triple quotes. The r (raw) prefix disables most escape sequence processing. The constructors are str(object='') and str(object=b'', encoding='utf-8', errors='strict').
- str.count(sub[, start[,end]]): count instances of sub
- str.encode(encoding='utf-8'): encode with the provided encoding
- str.endswith(suffix[, start[, end]]): boolean self-explanatory
- str.find(sub[, start[, end]]): Return the lowest index in the string where the substring sub is found within the slice [start:end]; return -1 if sub is not found
- str.isalpha(), str.isdecimal(), str.isdigit(), str.isnumeric(), str.islower(), str.isspace(), str.isupper(): self-explanatory
- str.join(iterable): Return a string which is the concatenation of the strings in iterable, separated by the string on which join is called
- str.lower(): Return a copy of the string with all cased characters converted to lowercase
- str.replace(old,new,count=-1): Return a copy of the string with all occurrences of substring old replaced by new; if count is given, only the first count occurrences are replaced
- str.split(sep=None): split string into list of strings based on sep
- str.splitlines(): split string into list of strings based on newlines
- str.startswith(prefix): boolean self-explanatory
- str.strip(): strips whitespace from left and right
- str.upper(): Returns a copy of the string with all the cased characters converted to uppercase
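A short sketch that chains several of these string methods on a sample string of my own. Since strings are immutable, every method returns a new object.

```python
line = "  Hello, World!  "

stripped = line.strip()                        # "Hello, World!"
words = stripped.replace(",", "").split()      # ["Hello", "World!"]
shout = "-".join(w.upper() for w in words)     # "HELLO-WORLD!"

position = stripped.find("World")    # 7: index where the substring starts
missing = stripped.find("xyz")       # -1: find does not raise when nothing matches
count_l = stripped.count("l")        # 3
starts = stripped.startswith("Hello")

rows = "a\nb\nc".splitlines()        # ["a", "b", "c"]
encoded = "café".encode("utf-8")     # 5 bytes: the é takes two bytes in UTF-8
digits = ("42".isdigit(), "4.2".isdigit())   # (True, False)

print(stripped, words, shout)
print(position, missing, count_l, starts)
print(rows, encoded, digits)
```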
Binary Sequence Types - bytes, bytearray, memoryview
The core built-in types for manipulating binary data are bytes and bytearray. The bytes objects are immutable sequences of single bytes. The bytearray objects are a mutable counterpart to bytes objects.
Set Types - set, frozenset
A set object is an unordered collection of distinct hashable objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference. set is mutable: its contents can be changed using add(), remove(), and the like. The frozenset is immutable and hashable. Sets can be created using a comma-separated list of elements within braces: {'jack', 'sjoerd'}, using a set comprehension: {c for c in 'abracadabra' if c not in 'abc'}, or using the type constructor: set(), set('foobar'), set(['a','b','foo']).
- len(s): Return the number of elements in set s
- x in s: Test x for membership in s
- x not in s: Test x for non-membership in s
- isdisjoint(other): Returns True if the set has no elements in common with other. Sets are disjoint if and only if their intersection is the empty set.
- issubset(other) or set <= other: Test whether every element in the set is in other
- set < other: Test whether the set is a proper subset of other
- issuperset(other) or set >= other: Test whether every element in other is in the set
- set > other: Test whether the set is a proper superset of other
- union(*others) or set | other | ...: Return a new set with elements from the set and all others
- intersection(*others) or set & other & ...: Return a new set with elements common to the set and all others
- difference(*others) or set - other - ...: Return a new set with elements in the set that are not in others
- symmetric_difference(other) or set ^ other: Return a new set with elements in either set but not in both
- copy(): creates a shallow copy of s
- update(*others) or set |= other | ...: Update the set, adding elements from all others
- intersection_update(*others) or set &= other & ...: Update the set, keeping only elements found in it and all others
- difference_update(*others) or set -= other | ...: Update the set, removing elements found in others
- symmetric_difference_update(other) or set ^= other: Update the set, keeping only elements found in either set, but not in both
- add(elem): Add element to the set
- remove(elem): Remove element from the set; raises KeyError if elem is not in the set
- discard(elem): Remove element from the set if present
- pop(): Remove and return an arbitrary element from the set
- clear(): Remove all elements from the set
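The set operators map directly onto the usual math operations. A minimal sketch with two small sets I made up, including the difference between remove() and discard():

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b             # {1, 2, 3, 4, 5}
common = a & b            # {3, 4}
only_a = a - b            # {1, 2}
either = a ^ b            # {1, 2, 5}

proper_subset = {1, 2} < a           # True
disjoint = a.isdisjoint({7, 8})      # True

# Removing duplicates; sorted() gives a predictable order for printing.
unique = sorted(set([3, 1, 3, 2, 1]))   # [1, 2, 3]

a.discard(42)             # silently does nothing when the element is missing
try:
    a.remove(42)          # remove raises KeyError in the same situation
except KeyError:
    remove_failed = True
else:
    remove_failed = False

# frozenset is hashable, so it can serve as a dictionary key.
lookup = {frozenset(b): "hashable"}

print(union, common, only_a, either)
print(proper_subset, disjoint, unique, remove_failed)
```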
Mapping Types - dict
A mapping object maps hashable values to arbitrary objects. Mappings are mutable objects. There is currently only one standard mapping type, the dictionary. Dictionaries can be created in several ways: a comma-separated list of key: value pairs within braces ({'jack': 4098, 'sjoerd': 4127} or {4098: 'jack', 4127: 'sjoerd'}), a dict comprehension ({}, {x: x**2 for x in range(10)}), or the type constructor (dict(), dict(mapping), dict(**kwargs), dict([('foo',10),('bar',200)])).
- list(d): Return a list of all the keys used in dictionary d
- len(d): Return the number of items in d
- d[key]: Return the item of d with key key. Raises KeyError if key is not in the map
- d[key] = value: Set d[key] to value
- del d[key]: Remove d[key] from d
- key in d: Return True if d has the key key
- key not in d: Equivalent to not key in d
- clear(): Remove all items from the dictionary
- copy(): Return a shallow copy of the dictionary
Methods
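- get(key,default=None): Return the value for key if key is in the dictionary, else default; this method never raises a KeyError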
- items(): Return a new view of the dictionary's items (key,value) pairs
- keys(): Return a new view of the dictionary's keys
- pop(key[,default]): If key is in the dictionary, remove it and return its value, else return default. If default is not given and key is not in the dictionary, a KeyError is raised
- popitem(): Remove and return a (key,value) pair from the dictionary. Pairs are returned in LIFO order.
- reversed(d): Return a reverse iterator over the keys of the dictionary
- update([other]): Update the dictionary with key/value pairs from other
- values(): Return a new view of the dictionary's values
Operators
- d | other: Creates a dictionary with the merged keys and values of d and other
- d |= other: Update the dictionary d with keys and values from other
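A sketch that walks one dictionary through the operations above. The phone-book values are reused from the examples earlier in this section; the | and |= operators assume Python 3.9 or newer.

```python
d = {"jack": 4098, "sjoerd": 4127}
d["guido"] = 4127

keys = list(d)                    # keys come back in insertion order
fallback = d.get("nobody", 0)     # 0: get never raises KeyError
removed = d.pop("jack")           # 4098, and "jack" is gone from d
last = d.popitem()                # ("guido", 4127): most recently inserted pair

# d is now {"sjoerd": 4127}
merged = d | {"sjoerd": 1, "ada": 2}   # new dict; values from the right side win
d |= {"ada": 2}                        # in-place update

squares = {x: x**2 for x in range(4)}
pairs = list(squares.items())

print(keys, fallback, removed, last)
print(merged, d)
print(pairs, list(reversed(d)))
```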
Built-In Exceptions
In Python, all exceptions must be instances of a class that derives from BaseException. In a try statement with an except clause that mentions a particular class, that clause also handles any exception classes derived from that class (but not exception classes from which it is derived). Two exception classes that are not related via subclassing are never equivalent, even if they have the same name.
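To see the subclass rule in action, here is a sketch with two hypothetical exception classes (DataError and MissingColumnError are names I invented for this example). An except clause naming the base class catches the subclass, but not the other way around.

```python
class DataError(Exception):
    """Hypothetical base class for this example."""


class MissingColumnError(DataError):
    """Hypothetical subclass of DataError."""


def classify(exc):
    """Raise exc and report which except clause handled it."""
    try:
        raise exc
    except DataError:
        return "handled as DataError"
    except Exception:
        return "handled as Exception"


def subclass_clause_only(exc):
    """Show that a clause naming the subclass does not catch its base class."""
    try:
        try:
            raise exc
        except MissingColumnError:
            return "caught by the subclass clause"
    except DataError:
        return "not caught by the subclass clause"


results = [classify(MissingColumnError("x")), classify(ValueError("y"))]
base_result = subclass_clause_only(DataError("z"))

print(results)
print(base_result)
print(issubclass(KeyError, LookupError), issubclass(Exception, BaseException))
```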
NumPy User Guide
Getting Started
What is NumPy?
NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.
At the core of the NumPy package is the ndarray object. This encapsulates n-dimensional arrays of homogeneous data types, with many operations being performed in compiled code for performance. Several important aspects of ndarrays:
- NumPy arrays have a fixed size at creation
- All elements in NumPy array are required to be the same data type, and thus each element will have the same size in memory
- NumPy arrays facilitate advanced mathematical and other types of operations on large amounts of data
- A growing plethora of scientific and mathematical Python-based packages are using NumPy arrays
Why is NumPy fast?
Vectorization describes the absence of any explicit looping, indexing, etc, in the code - these things are taking place "behind the scenes" in pre-optimized C code. Vectorized code has many advantages:
- easier to read
- fewer lines of code
- the code closely resembles mathematical notation
- vectorization results in more "Pythonic" code
Broadcasting is the term used to describe the implicit element-by-element behavior of operations; generally speaking, in NumPy all operations, not just arithmetic operations, but logical, bit-wise, functional, etc., behave in this implicit element-by-element fashion, i.e. they broadcast.
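A minimal sketch of what vectorization replaces. Both versions multiply two small arrays element by element; the arrays are arbitrary values I picked.

```python
import numpy as np

a = np.arange(5)        # [0 1 2 3 4]
b = np.arange(5, 10)    # [5 6 7 8 9]

# Explicit Python loop: index into each array one element at a time.
looped = [int(a[i] * b[i]) for i in range(len(a))]

# Vectorized: the loop runs inside NumPy's compiled code.
vectorized = a * b

# The scalar 2 is broadcast against every element of a.
scaled = a * 2

# Logical operations are elementwise as well.
mask = (a > 1) & (a < 4)

print(looped)
print(vectorized, scaled, mask)
```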
NumPy Quickstart
NumPy's main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes. NumPy's array class is called ndarray. The most important attributes of ndarray are:
- ndarray.ndim: number of axes
- ndarray.shape: tuple of integers indicating the size of the array in each dimension
- ndarray.size: the total number of elements in the array
- ndarray.dtype: an object describing the type of elements in the array
- ndarray.itemsize: the size in bytes of each element in the array
- ndarray.data: buffer containing the actual elements of the array
There are several ways to create an array. np.array takes a single sequence as its argument (np.array([1, 2, 3]), not np.array(1, 2, 3)) and transforms sequences of sequences into two-dimensional arrays. The type of the array can be explicitly specified at creation time. Often on array creation, the size of the array is known, but the elements are unknown: zeros creates an array full of zeros, ones creates an array full of ones, and empty creates an array whose initial content is random and depends on the state of the memory. By default, the dtype of an array created by these functions is float64. NumPy's arange function is analogous to the Python built-in range, but returns an array. The linspace function receives as an argument the number of elements we want, instead of the step.
When you print an array, NumPy displays it in a similar way to nested lists, but with the following layout:
- the last axis printed left to right
- the second-to-last is printed from top-to-bottom
- the rest are also printed from top to bottom, with each slice separated from the next by an empty line
import numpy as np
a = np.arange(15).reshape(3, 5)
print(a.ndim)
print(a.shape)
print(a.size)
print(a.dtype)
print(a.itemsize)
print(a.data)
a = np.array([2,3,4]) # Create an array from a list or tuple; the dtype is deduced from the elements
print(a)
b = np.ones((2,3,4),dtype=np.int16)
print(b)
x = np.linspace(0,2*np.pi,100)
print(x)
Arithmetic operators on arrays are applied elementwise. A new array is created and filled with the result. Unlike in many matrix languages, the product operator * operates elementwise on NumPy arrays. The matrix product can be performed using the @ operator or the dot function or method. Some operations, such as += and *=, act in place to modify an existing array rather than create a new one. When operating with arrays of different types, the type of the resulting array corresponds to the more general or precise one (a behavior known as upcasting).
Many unary operations, such as computing the sum of all the elements in the array (ndarray.sum()), are implemented as methods of the ndarray class. By default, these operations apply to the array as though it were a list of numbers, regardless of its shape. However, by specifying the axis parameter you can apply an operation along the specified axis of an array. NumPy provides familiar mathematical functions such as sin, cos, and exp. In NumPy, these are called "universal functions" (ufunc). Within NumPy, these functions operate elementwise on an array, producing an array as output.
One-dimensional arrays can be indexed, sliced, and iterated over like lists and other Python sequences. Multidimensional arrays can have one index per axis. These indices are given in a tuple separated by commas. When fewer indices are provided than the number of axes, the missing indices are considered complete slices :. NumPy also allows you to write this using dots. The dots ... represent as many colons as needed to produce a complete indexing tuple. Iterating over multidimensional arrays is done with respect to the first axis. If one wants to perform an operation on each element in the array, one can use the flat attribute, which is an iterator over all the elements of the array.
a = np.arange(8)**3
print(a[2])
print(a[2:5])
print(a[::-1]) # Reversed
np.random.seed(42)
b = np.random.randint(0,10,size=(10,10))
print(b[0:5,5:9])
print(b[-1]) # Last Row
for i, el in enumerate(b.flat): # b.flat iterates over every element in row-major order
    if i < 10: # only print the first ten elements
        print(el)
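The cells above cover indexing and iteration. The arithmetic half of the same discussion (elementwise versus matrix product, the axis parameter, ufuncs, upcasting, and the ... shorthand) can be sketched as follows, with small arrays chosen so the results are easy to check by hand.

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])

elementwise = A * B    # multiplies matching entries: [[2, 0], [0, 4]]
matrix = A @ B         # matrix product: [[5, 4], [3, 4]]
same = np.array_equal(matrix, A.dot(B))   # dot gives the same result as @

m = np.arange(12).reshape(3, 4)
total = m.sum()            # sum over every element: 66
col_sums = m.sum(axis=0)   # collapse the rows, one sum per column
row_mins = m.min(axis=1)   # collapse the columns, one minimum per row

roots = np.sqrt(np.array([0, 1, 4, 9]))   # a ufunc applied elementwise

# int32 + float64 is upcast to the more precise type.
upcast = (np.ones(3, dtype=np.int32) + np.linspace(0, 1, 3)).dtype

second_row = m[1]          # missing indices are complete slices: same as m[1, :]
last_col = m[..., -1]      # ... expands to as many colons as needed

print(elementwise, matrix, same, sep="\n")
print(total, col_sums, row_mins)
print(roots, upcast, second_row, last_col)
```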
An array has a shape given by the number of elements along each axis. The shape of an array can be changed with various commands. These commands typically return a modified array, but do not change the original array. The order of the elements in the array resulting from ravel is normally “C-style”, that is, the rightmost index changes the fastest.
rg = np.random.default_rng(32)
a = np.floor(10 * rg.random((3, 4)))
a
a.ravel()
a.reshape(6, 2) # Returns the array with a modified shape
a.T # Transposes the array
Several arrays can be stacked together along different axes.
a = np.floor(10 * rg.random((2, 2)))
b = np.floor(10 * rg.random((2, 2)))
print(np.vstack((a, b)))
print(np.hstack((a, b)))
The copy() method makes a complete copy of the array and its data (what the NumPy guide calls a deep copy).
Broadcasting allows universal functions to deal in a meaningful way with inputs that do not have exactly the same shape. The first rule of broadcasting is that if all input arrays do not have the same number of dimensions, a "1" will be repeatedly prepended to the shapes of the smaller arrays until all the arrays have the same number of dimensions. The second rule of broadcasting ensures that arrays with a size of 1 along a particular dimension act as if they had the size of the array with the largest shape along that dimension. The value of the array element is assumed to be the same along that dimension for the "broadcast" array.
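Both ideas from the text above can be checked with a short sketch. The shapes and values are my own choices for illustration.

```python
import numpy as np

# Broadcasting: shape (3, 1) combined with shape (4,).
col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                 # shape (4,), treated as (1, 4) by the first rule
table = col * 10 + row             # both are stretched to (3, 4) by the second rule

# Shapes that cannot be aligned raise an error: (3, 4) versus (3,) -> (1, 3).
try:
    np.ones((3, 4)) + np.ones(3)
except ValueError:
    incompatible = True
else:
    incompatible = False

# copy() versus a plain slice: the slice is a view that shares data with the original.
original = np.arange(6)
view = original[:3]
copied = original[:3].copy()
view[0] = 100      # changes original[0] as well
copied[1] = -1     # leaves original untouched

print(table)
print(table.shape, incompatible)
print(original, view, copied)
```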
NumPy offers more indexing facilities than regular Python sequences. In addition to indexing by integers and slices, as we saw before, arrays can be indexed by arrays of integers and arrays of booleans.
a = np.arange(12)**2 # the first 12 square numbers
i = np.array([1, 1, 3, 8, 5]) # an array of indices
print(a[i]) # the elements of `a` at the positions `i`
j = np.array([[3, 4], [9, 7]]) # a bidimensional array of indices
a[j] # the same shape as `j`
When the indexed array a is multidimensional, a single array of indices refers to the first dimension of a. The following example shows this behavior by converting an image of labels into a color image using a palette.
palette = np.array([[0, 0, 0], # black
[255, 0, 0], # red
[0, 255, 0], # green
[0, 0, 255], # blue
[255, 255, 255]]) # white
image = np.array([[0, 1, 2, 0], # each value corresponds to a color in the palette
[0, 3, 4, 0]])
palette[image] # the (2, 4, 3) color image
When we index arrays with arrays of (integer) indices we are providing the list of indices to pick. With boolean indices the approach is different; we explicitly choose which items in the array we want and which ones we don’t.
The most natural way one can think of for boolean indexing is to use boolean arrays that have the same shape as the original array:
a = np.arange(12).reshape(3, 4)
b = a > 4
print(b)
print(a[b]) # 1d array with the selected elements
Matplotlib Quickstart Guide
This tutorial covers some basic usage patterns and best practices to help you get started with Matplotlib.
import matplotlib.pyplot as plt
Matplotlib graphs your data on Figures, each of which can contain one or more Axes, an area where points can be specified in terms of x-y coordinates (or $ \theta $-r in a polar plot, or x-y-z in a 3D plot, etc.). The simplest way of creating a Figure with an Axes is using pyplot.subplots. We can then use Axes.plot to draw some data on the Axes, and show to display the figure.
fig, ax = plt.subplots()
ax.plot(list(range(1,5)),[1,4,2,3])
ax.set_title("First Plot Back")
ax.set_xlabel("Range")
ax.set_ylabel("Array")
plt.show()
Parts of a Figure
Figure
The whole figure. The Figure keeps track of all the child Axes, a group of 'special' Artists (titles, figure legends, colorbars, etc.), and even nested subfigures. Typically, you'll create a Figure through one of the following functions:
fig = plt.figure() # an empty figure with no Axes
fig, ax = plt.subplots() # a figure with a single Axes
fig, axs = plt.subplots(2, 2) # a figure with a 2x2 grid of Axes
# a figure with one Axes on the left, and two on the right:
fig, axs = plt.subplot_mosaic([['left', 'right_top'],
['left', 'right_bottom']])
Axes
An Axes is an Artist attached to a Figure that contains a region for plotting data, and usually includes two (or three in the case of 3D) Axis objects that provide ticks and tick labels to provide scales for the data in the Axes. Each Axes also has a title, x-label, and y-label. The Axes methods are the primary interface for configuring most parts of your plot (adding data, controlling axis scales and limits, adding labels, etc.).
Axis
These objects set the scale and limits and generate ticks (the marks on the Axis) and ticklabels (strings labeling the ticks). The location of the ticks is determined by a Locator object and ticklabel strings are formatted by the Formatter. The combination of the correct Locator and Formatter gives very fine control over the tick location and labels.
Artist
Basically, everything visible on the Figure is an Artist (even Figure, Axes, and Axis). This includes Text objects, Line2D objects, collections objects, Patch objects, etc. When the Figure is rendered, all of the Artists are drawn to the canvas. Most Artists are tied to an Axes; such an Artist cannot be shared by multiple Axes, or moved from one to another.
np.random.seed(19680801) # seed the random number generator.
data = {'a': np.arange(50),
'c': np.random.randint(0, 50, 50),
'd': np.random.randn(50)}
data['b'] = data['a'] + 10 * np.random.randn(50)
data['d'] = np.abs(data['d']) * 100
fig, ax = plt.subplots(figsize=(5, 2.7), layout='constrained')
ax.scatter('a', 'b', c='c', s='d', data=data)
ax.set_xlabel('entry a')
ax.set_ylabel('entry b')
As noted above, there are essentially two ways to use Matplotlib:
- Explicitly create Figures and Axes, and call methods on them (the "object-oriented (OO) style").
- Rely on pyplot to implicitly create and manage the Figures and Axes, and use pyplot functions for plotting.
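To compare the two styles side by side, here is a sketch that draws the same two curves both ways. The data and labels are placeholders of my own; in a notebook the figures render at the end of the cell, and in a script you would call plt.show() afterwards.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2, 100)  # sample data for both plots

# OO style: create the Figure and Axes explicitly and call methods on them.
fig_oo, ax = plt.subplots()
ax.plot(x, x, label="linear")
ax.plot(x, x**2, label="quadratic")
ax.set_xlabel("x label")
ax.set_ylabel("y label")
ax.set_title("OO style")
ax.legend()

# pyplot style: pyplot keeps track of the current Figure and Axes for us.
plt.figure()
plt.plot(x, x, label="linear")
plt.plot(x, x**2, label="quadratic")
plt.xlabel("x label")
plt.ylabel("y label")
plt.title("pyplot style")
plt.legend()
fig_implicit = plt.gcf()  # the Figure that pyplot created implicitly
```

I lean toward the OO style for anything with more than one Axes, since it is always clear which Axes a call applies to.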