NumPy, Pandas, and Matplotlib Notes

I read a textbook about NumPy, pandas, and Matplotlib to get a better understanding of the libraries. Here are my notes on it.

Python for Data Analysis, 3E, by Wes McKinney

Introduction

NumPy Basics: Arrays and Vectorized Computation

ndarray (N-Dimensional Array)

mask = (names == "Bob") | (names == "Will")
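The mask line above combines two element-wise comparisons into one boolean array. Here is a minimal sketch of how such a mask could be used for boolean indexing. The contents of `names` and `data` are made-up sample values for illustration, not data from these notes.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])
data = np.arange(14).reshape(7, 2)  # one row of data per name

# Each comparison yields a boolean array; | is the element-wise OR.
# The parentheses are required because | binds more tightly than ==.
mask = (names == "Bob") | (names == "Will")

# Boolean indexing keeps only the rows where mask is True.
selected = data[mask]

print(mask)
print(selected)
```

Python's `or` keyword does not work here because it tries to reduce a whole array to a single truth value, so the element-wise operators `|` and `&` are used instead.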

Getting Started with Pandas

Introduction

Essential Functionality

Statistics

Data Loading, Storage, and File Formats

Reading data and making it accessible, often called data loading, is a necessary first step for using most of the tools in this book. The term parsing is also sometimes used to describe loading text data and interpreting it as tables and different data types.
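As a rough illustration of loading and parsing, the sketch below reads a small CSV with `pandas.read_csv`. The CSV text and its column names are invented for this example, and an in-memory `io.StringIO` buffer stands in for a file on disk so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical CSV content; a real workflow would pass a file path instead.
csv_text = "name,age,score\nAda,36,91.5\nLinus,28,84.0\n"

# read_csv parses the text into a table and infers a type for each column.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.dtypes)  # age is parsed as an integer column, score as a float column
```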

Data Cleaning and Preparation

During the course of doing data analysis and modeling, a significant amount of time is spent on data preparation: loading, cleaning, transforming, and rearranging. Such tasks are often reported to take up to 80% or more of an analyst's time.

Data Wrangling: Join, Combine, and Reshape

Plotting and Visualization

Data Aggregation and Group Operations

Hadley Wickham, an author of many popular packages for the R programming language, coined the term split-apply-combine for describing group operations. In the first stage of the process, data contained in a pandas object, whether a Series, DataFrame, or otherwise, is split into groups based on one or more keys that you provide. The splitting is performed on a particular axis of an object. For example, a DataFrame can be grouped on its rows (axis="index") or its columns (axis="columns"). Once this is done, a function is applied to each group, producing a new value. Finally, the results of all those function applications are combined into a result object. The form of the resulting object will usually depend on what’s being done to the data. See Figure 10-1 for a mockup of a simple group aggregation.
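The split-apply-combine steps described above can be sketched with `DataFrame.groupby`. The frame below, with its `key` and `data` columns, is made-up sample data. The first version spells out the three stages by hand, and the second lets pandas do all of them in one call.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
df = pd.DataFrame({
    "key": ["A", "B", "C", "A", "B", "C"],
    "data": [0, 5, 10, 5, 10, 15],
})

# Manual version: iterating over a groupby yields (key, sub-frame) pairs.
# Split: one sub-frame per key. Apply: sum each group's "data" column.
pieces = {key: group["data"].sum() for key, group in df.groupby("key")}

# Combine: assemble the per-group results into a single Series.
manual = pd.Series(pieces, name="data")

# Idiomatic version: the same three stages in one expression.
result = df.groupby("key")["data"].sum()

print(manual)
print(result)
```

Both versions give one summed value per key. The one-line form is usually preferable because pandas can run the aggregation without a Python-level loop.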