Jupyter Notebooks
I want to keep track of what I have learned about Jupyter notebooks here. I also want to take notes on Jupyter Notebooks and nbconvert in general, which may help me improve my Jupyter Notebook to HTML converter.
References
- Project Jupyter Home
- Kernels
- Education: Jupyter Notebooks offer exciting and creative possibilities in education. The following subprojects are focused on supporting the use of Jupyter Notebook in a variety of educational settings.
  - nbgrader: Tools for managing, grading, and reporting on notebook-based assignments.
  - jupyter4edu: GitHub organization hosting community resources for Jupyter in education.
- Execution: Notebooks can be run outside of the browser interface with the following utility subprojects.
  - nbclient: NBClient lets you execute notebooks from different contexts, including the command line.
- Deployment and Infrastructure: To serve a variety of users and use cases, these subprojects are being developed to support notebook deployment in various contexts, including multiuser capabilities and secure, scalable cloud deployments.
  - jupyterhub: Multi-user notebook for organizations with pluggable authentication and scalability.
  - nbviewer: Share notebooks as static HTML on the web.
  - Binder: Turn a Git repo into a collection of interactive notebooks.
  - dockerspawner: Deploy notebooks for JupyterHub inside Docker containers.
  - docker-stacks: Stacks of Jupyter applications and kernels as Docker containers.
- Formatting and Conversion
- Core Building Blocks
- Project Documentation
Definitions
- Jupyter Notebook is a simplified notebook authoring application and is part of Project Jupyter, a large umbrella project centered around the goal of providing tools (and standards) for interactive computing with computational notebooks.
- Kernels are programming-language-specific processes that run independently and interact with the Jupyter applications and their user interfaces. ipykernel is the reference Jupyter kernel, built on top of IPython, providing a powerful environment for interactive computing in Python.
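Because kernels are separate processes, Jupyter needs to know how to launch one. That information lives in a kernel spec, which is a small kernel.json file. The sketch below assumes the typical shape of ipykernel's spec: an argv list containing a {connection_file} placeholder. Your own file may list a full interpreter path instead of python3; run `jupyter kernelspec list` to find it. The sketch only shows how a frontend would fill in the placeholder to get the launch command. It does not start a real kernel.

```python
import json

# Assumed, typical contents of ipykernel's kernel.json (yours may differ).
KERNEL_JSON = """
{
  "argv": ["python3", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
  "display_name": "Python 3",
  "language": "python"
}
"""


def launch_command(spec_text: str, connection_file: str) -> list[str]:
    """Return the argv a frontend would use to start the kernel process."""
    spec = json.loads(spec_text)
    # The connection file is a JSON file with the ports and key the kernel
    # should use; the placeholder is replaced by its path.
    return [arg.replace("{connection_file}", connection_file) for arg in spec["argv"]]


if __name__ == "__main__":
    print(launch_command(KERNEL_JSON, "/tmp/kernel-123.json"))
```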
Jupyter Notebook Notes
A notebook is a shareable document that combines computer code, plain language descriptions, data, rich visualizations like 3D models, charts, graphs and figures, and interactive controls. A notebook, along with an editor (like JupyterLab), provides a fast interactive environment for prototyping and explaining code, exploring and visualizing data, and sharing ideas with others.
IPython
IPython provides a rich architecture for interactive computing with:
- A powerful interactive shell
- A kernel for Jupyter
- Support for interactive data visualization and use of GUI toolkits
- Flexible, embeddable interpreters to load into your own projects
- Easy to use, high performance tools for parallel computing
Architecture
The Jupyter Notebook Format
Jupyter Notebooks are structured data that represent your code, metadata, content, and outputs. When saved to disk, a notebook uses the .ipynb extension and is stored as JSON.
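To make the JSON structure concrete, here is a minimal notebook in the nbformat 4 layout, built with only the standard library. The field names follow the v4 schema as I understand it: nbformat, nbformat_minor, metadata, cells, cell_type, source, outputs, execution_count, and the per-cell id required from minor version 5. For real work, the nbformat package validates and writes these files for you.

```python
import json

# A minimal nbformat 4.5 document: version numbers, metadata, and a list of cells.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "id": "intro",
            "metadata": {},
            "source": "# Hello\nSome *markdown* text.",
        },
        {
            "cell_type": "code",
            "id": "first-code",
            "metadata": {},
            "execution_count": 1,
            "source": "1 + 1",
            # Outputs are saved next to the code that produced them.
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "metadata": {},
                    "data": {"text/plain": "2"},
                }
            ],
        },
    ],
}


def save_notebook(nb: dict, path: str) -> None:
    """Write the notebook as indented JSON, the way .ipynb files look on disk."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1, ensure_ascii=False)


def load_notebook(path: str) -> dict:
    """Read a .ipynb file back into a plain dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```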
The Jupyter Notebook Interface
Jupyter Notebook and its flexible interface extend the notebook beyond code to visualization, multimedia, collaboration, and more. In addition to running your code, it stores code and output, together with markdown notes, in an editable document called a notebook. When you save it, the notebook is sent from your browser to the Jupyter server, which saves it on disk as a JSON file with a .ipynb extension.
Exporting Jupyter Notebooks to Other Formats
The nbconvert tool in Jupyter converts notebook files to other formats, such as HTML, LaTeX, or reStructuredText. This conversion goes through a series of steps:
- Preprocessors modify the notebook in memory
- An exporter converts the notebook to another file format. Most of the exporters use templates for this.
- Postprocessors work on the file produced by exporting.
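The three stages can be imitated with plain functions to show how data flows through them. This is a toy model under my own assumptions, not nbconvert's API:

- "Preprocessors" take and return the notebook dict.
- The "exporter" turns the notebook into text.
- "Postprocessors" work on the finished string. nbconvert's real postprocessors act on the exported file instead.

```python
from typing import Callable, List

Notebook = dict  # a notebook as a plain dict in the nbformat 4 layout


def clear_outputs(nb: Notebook) -> Notebook:
    """Preprocessor: return a copy of the notebook with code-cell outputs removed."""
    cells = []
    for cell in nb["cells"]:
        cell = dict(cell)  # shallow copy so the caller's notebook is not modified
        if cell["cell_type"] == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
        cells.append(cell)
    return {**nb, "cells": cells}


def export_text(nb: Notebook) -> str:
    """Exporter: render the notebook as plain text with 'In [n]:' prompts for code."""
    parts = []
    for cell in nb["cells"]:
        source = "".join(cell["source"])  # source may be a string or a list of lines
        if cell["cell_type"] == "code":
            count = cell.get("execution_count") or " "
            parts.append(f"In [{count}]: {source}")
        else:
            parts.append(source)
    return "\n\n".join(parts) + "\n"


def strip_trailing_spaces(text: str) -> str:
    """Postprocessor: remove trailing whitespace from every line of the export."""
    return "\n".join(line.rstrip() for line in text.splitlines()) + "\n"


def convert(nb: Notebook,
            preprocessors: List[Callable[[Notebook], Notebook]],
            exporter: Callable[[Notebook], str],
            postprocessors: List[Callable[[str], str]]) -> str:
    for pre in preprocessors:  # 1. modify the notebook in memory
        nb = pre(nb)
    output = exporter(nb)  # 2. convert it to the target format
    for post in postprocessors:  # 3. work on the exported result
        output = post(output)
    return output
```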
The nbviewer website uses nbconvert with the HTML exporter. When you give it a URL, it fetches the notebook from that URL, converts it to HTML, and serves that HTML to you.
Nbconvert
I want to take notes on nbconvert because I may use it to improve how my Jupyter Notebook to HTML converter presents output cells. This Jupyter Notebook, an IPython Notebook that analyzes a global crisis, shows some ways in which nbconvert outperforms my current .ipynb to HTML converter. For example, it displays a pandas DataFrame as a table instead of as plain text.
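The table-versus-text difference comes from IPython's rich display protocol. When a result has a _repr_html_ method, as pandas DataFrames do, IPython saves several representations of it in the output's data dictionary, keyed by MIME type. Typically this means text/html plus a text/plain fallback. A converter should therefore pick the richest MIME type it can render rather than always using text/plain.

The sketch below shows that choice. The preference order is my own assumption, loosely modeled on notebook frontends.

```python
import html

# Richest representation first (an assumed order; adjust to what your converter supports).
MIME_PREFERENCE = ["text/html", "image/svg+xml", "image/png", "text/plain"]


def render_output(output: dict) -> str:
    """Turn one code-cell output (nbformat 4 dict) into an HTML fragment."""
    kind = output["output_type"]
    if kind == "stream":  # print() output on stdout/stderr
        return "<pre>" + html.escape("".join(output["text"])) + "</pre>"
    if kind == "error":  # an exception: show its name and message
        return "<pre>" + html.escape(f"{output['ename']}: {output['evalue']}") + "</pre>"
    # execute_result and display_data carry a MIME bundle in output["data"]
    data = output.get("data", {})
    for mime in MIME_PREFERENCE:
        if mime not in data:
            continue
        value = "".join(data[mime])  # values may be stored as a list of lines
        if mime in ("text/html", "image/svg+xml"):
            return value  # already markup; only embed it if you trust the notebook
        if mime == "image/png":
            return '<img src="data:image/png;base64,' + value.replace("\n", "") + '">'
        return "<pre>" + html.escape(value) + "</pre>"
    return ""  # nothing this sketch knows how to render
```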
Installation
$ pip install nbconvert # nbconvert is packaged for both pip and conda
$ sudo apt-get install pandoc # nbconvert uses Pandoc to convert markdown to formats OTHER THAN HTML
$ # For converting notebooks to PDF, nbconvert makes use of LaTeX and XeTeX as the rendering engine
$ sudo apt-get install texlive-xetex texlive-fonts-recommended texlive-plain-generic
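Pandoc and XeTeX are separate programs, so pip install nbconvert does not install them. The check below is my own helper, not part of nbconvert. It reports which of these executables are missing from the PATH. Using xelatex as the name of the LaTeX engine binary is an assumption based on the XeTeX note above.

```python
import shutil

# External executables used by the non-HTML exporters (assumed names).
EXTERNAL_TOOLS = {
    "pandoc": "Markdown conversion for formats other than HTML",
    "xelatex": "PDF export via LaTeX",
}


def missing_tools(tools=EXTERNAL_TOOLS) -> dict:
    """Return {tool: purpose} for every tool that shutil.which cannot find."""
    return {name: purpose for name, purpose in tools.items() if shutil.which(name) is None}


if __name__ == "__main__":
    missing = missing_tools()
    for name, purpose in missing.items():
        print(f"{name} not found (needed for {purpose})")
    if not missing:
        print("All external tools found.")
```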
Conversion
The command-line syntax to run the nbconvert script is shown below. It converts the Jupyter notebook file notebook.ipynb into the output format given by the FORMAT string.
$ jupyter nbconvert --to FORMAT notebook.ipynb
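To convert many notebooks at once, it can help to drive the same CLI from Python. The helpers below, nbconvert_command and convert_all, are hypothetical names. They only assemble the arguments shown above and run them with subprocess, and they assume jupyter is on the PATH.

```python
import subprocess
from pathlib import Path
from typing import List


def nbconvert_command(notebook: str, fmt: str = "html") -> List[str]:
    """Build the argument list for `jupyter nbconvert --to FORMAT notebook.ipynb`."""
    return ["jupyter", "nbconvert", "--to", fmt, notebook]


def convert_all(folder: str, fmt: str = "html") -> List[Path]:
    """Convert every .ipynb directly inside `folder`; raises CalledProcessError on failure."""
    notebooks = sorted(Path(folder).glob("*.ipynb"))
    for nb in notebooks:
        subprocess.run(nbconvert_command(str(nb), fmt), check=True)
    return notebooks
```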
Supported Output Formats
- HTML
- LaTeX
- WebPDF
- Reveal.js HTML Slideshow
- Markdown
- Ascii
- reStructuredText
- executable script
- notebook
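The list above gives the format names, but on the command line each one is selected with a --to value. The mapping below is my own summary of those values from the nbconvert documentation. Double-check it against `jupyter nbconvert --help` for your version.

```python
from typing import List

# Format name from the list above -> value passed to `--to`.
TO_VALUES = {
    "HTML": "html",
    "LaTeX": "latex",
    "WebPDF": "webpdf",
    "Reveal.js HTML Slideshow": "slides",
    "Markdown": "markdown",
    "Ascii": "asciidoc",
    "reStructuredText": "rst",
    "executable script": "script",
    "notebook": "notebook",
}


def to_flag(format_name: str) -> List[str]:
    """Return the ['--to', value] pair for a format name (KeyError if unknown)."""
    return ["--to", TO_VALUES[format_name]]
```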
HTML
- --to html: HTML export. The --template option selects the style:
  - --template lab (default): A full static HTML render of the notebook. This looks very similar to the JupyterLab interactive view. It supports the extra --theme option, which defaults to light, but can be dark or some other custom theme.
  - --template classic: Simplified HTML, using the classic Jupyter look and feel.
  - --template basic: Base HTML, rendering with minimal structure and styles.
- --embed-images: If this option is provided, embed images as base64 URLs in the resulting HTML file.
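Putting the HTML options together, a hypothetical helper that assembles the command could look like this. It encodes the rule stated above that --theme is an extra option of the lab template. Rejecting --theme for other templates is my own choice.

```python
from typing import List, Optional

TEMPLATES = ("lab", "classic", "basic")


def html_command(notebook: str, template: str = "lab", theme: Optional[str] = None,
                 embed_images: bool = False) -> List[str]:
    """Build a `jupyter nbconvert --to html` command with the options described above."""
    if template not in TEMPLATES:
        raise ValueError(f"unknown template: {template!r}")
    if theme is not None and template != "lab":
        raise ValueError("--theme only applies to the lab template")
    cmd = ["jupyter", "nbconvert", "--to", "html", "--template", template]
    if theme is not None:
        cmd += ["--theme", theme]  # e.g. "light" (the default) or "dark"
    if embed_images:
        cmd.append("--embed-images")  # inline images as base64 data URLs
    cmd.append(notebook)
    return cmd
```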
Using nbconvert as a library
High-level overview of converting a notebook to another format:
- Retrieve the notebook and its accompanying resources.
- Feed the notebook into the Exporter, which:
  - Sequentially feeds the notebook into an array of Preprocessors. Preprocessors only act on the structure of the notebook and have unrestricted access to it.
  - Feeds the notebook into the Jinja templating engine, which converts it to a particular format depending on which template is selected.
- The exporter returns the converted notebook and other relevant resources as a tuple.
- You write the data to disk using the built-in FilesWriter (which writes the notebook and any extracted files to disk), or elsewhere using a custom Writer.
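Here is a sketch of those steps using the library itself. I am fairly confident about these nbconvert names: HTMLExporter, Preprocessor, register_preprocessor, from_notebook_node, and FilesWriter. The DropEmptyCells preprocessor and the file names are hypothetical. The code assumes nbconvert 6 or later, which introduced template_name.

```python
import nbformat
from nbconvert import HTMLExporter
from nbconvert.preprocessors import Preprocessor
from nbconvert.writers import FilesWriter


class DropEmptyCells(Preprocessor):
    """Hypothetical preprocessor: remove cells whose source is only whitespace."""

    def preprocess(self, nb, resources):
        nb.cells = [cell for cell in nb.cells if cell.source.strip()]
        return nb, resources


def notebook_to_html(nb, template: str = "lab"):
    """Run the exporter (preprocessors, then the Jinja template); return (body, resources)."""
    exporter = HTMLExporter(template_name=template)
    exporter.register_preprocessor(DropEmptyCells, enabled=True)
    return exporter.from_notebook_node(nb)


if __name__ == "__main__":
    # Retrieve the notebook; as_version=4 upgrades older files to nbformat 4.
    nb = nbformat.read("notebook.ipynb", as_version=4)
    # Export it; resources carries metadata and any extracted files.
    body, resources = notebook_to_html(nb)
    # Write notebook.html (plus extracted files) into ./output.
    FilesWriter(build_directory="output").write(body, resources, notebook_name="notebook")
```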
Common Tasks
Merging Jupyter Notebooks
$ pip install nbmerge
$ pip install nbformat
$ nbmerge file1.ipynb file2.ipynb > merged.ipynb
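The same idea can be done without nbmerge. Since a .ipynb file is JSON, merging can be sketched as concatenating the cells lists. This is my own simplified version, not how nbmerge works internally. It keeps the first notebook's metadata, assumes every input is nbformat 4, and does not fix duplicate cell ids, which nbformat 4.5+ requires to be unique.

```python
import json
import sys
from typing import List


def merge_notebooks(paths: List[str]) -> dict:
    """Concatenate the cells of several .ipynb files into one notebook dict."""
    merged = None
    for path in paths:
        with open(path, encoding="utf-8") as f:
            nb = json.load(f)
        if nb.get("nbformat") != 4:
            raise ValueError(f"{path}: expected nbformat 4, got {nb.get('nbformat')}")
        if merged is None:
            merged = nb  # the first notebook supplies metadata and version numbers
        else:
            merged["cells"].extend(nb["cells"])
    if merged is None:
        raise ValueError("no notebooks given")
    return merged


if __name__ == "__main__":
    # Usage: python merge.py file1.ipynb file2.ipynb > merged.ipynb
    json.dump(merge_notebooks(sys.argv[1:]), sys.stdout, indent=1)
```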
Downloading Kaggle Datasets in Google Colab
- Run each of the code snippets below in a separate cell in a Jupyter Notebook / Colab.
- Note: If you get a 403 (Forbidden) error when downloading the data, it usually means you need to accept the terms associated with the dataset on Kaggle before you can download it.
# 1. Download your Kaggle API key (kaggle.json) from your Kaggle account settings, then upload it
from google.colab import files
files.upload()
# Execute the following in Google Colab to put the key where the Kaggle CLI looks for it
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
# Download the data of the dataset, in this case, dogs-vs-cats
!kaggle competitions download -c dogs-vs-cats
# Uncompress (unzip) the training data, which comes in a zip folder, silently (-qq)
!unzip -qq train.zip
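Outside Colab, on Linux or macOS, the key-installation steps above can also be done from Python. The sketch below uses a hypothetical helper name and assumes the Kaggle CLI reads ~/.kaggle/kaggle.json. It mirrors the mkdir -p, cp, and chmod 600 commands.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def install_kaggle_key(downloaded: str, home: Optional[str] = None) -> Path:
    """Copy kaggle.json into ~/.kaggle/ and make it readable only by the owner (mode 600)."""
    kaggle_dir = Path(home or Path.home()) / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)  # like `mkdir -p ~/.kaggle`
    target = kaggle_dir / "kaggle.json"
    shutil.copyfile(downloaded, target)  # like `cp kaggle.json ~/.kaggle/`
    os.chmod(target, 0o600)  # the Kaggle CLI warns if the key is readable by others
    return target
```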
About Google Colab
- Manage Colab Subscription
- Note that switching to a GPU reduced the time per training step from 6 s to 170 ms.
- Look over the Jupyter notebook that comes with a Google Colab monthly subscription.
- Google Colab Markdown
Faster GPUs
- Users who have purchased one of Colab's paid plans have access to faster GPUs and more memory. You can upgrade your notebook's GPU settings in Runtime > Change runtime type in the menu to select from several accelerator options, subject to availability.
- The free-of-charge version of Colab grants access to Nvidia's T4 GPUs, subject to quota restrictions and availability.
- You can see what GPU you've been assigned at any time by executing the following cell. If the result of running the code cell below is "Not connected to a GPU", you can change the runtime by going to Runtime > Change runtime type in the menu to enable a GPU accelerator, and then re-execute the code cell.
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
    print('Not connected to a GPU')
else:
    print(gpu_info)
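The `gpu_info = !nvidia-smi` shell-capture syntax only works inside IPython/Colab. In a plain Python script, the same check can be sketched with subprocess. Treating a missing or failing nvidia-smi as "no GPU" is my own assumption.

```python
import subprocess


def gpu_info() -> str:
    """Return nvidia-smi's output, or 'Not connected to a GPU' if it is unavailable or fails."""
    try:
        result = subprocess.run(["nvidia-smi"], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):  # binary missing, not executable, or hung
        return "Not connected to a GPU"
    # Like the cell above, look for 'failed' in the output; also check the exit code.
    if result.returncode != 0 or "failed" in result.stdout + result.stderr:
        return "Not connected to a GPU"
    return result.stdout


if __name__ == "__main__":
    print(gpu_info())
```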
- In order to use a GPU with your notebook, select the `Runtime > Change runtime type` menu, and then set the hardware accelerator to the desired option.
More Memory
- Users who have purchased one of Colab's paid plans have access to high-memory VMs when they are available. More powerful GPUs are always offered with high-memory VMs.
- You can see how much memory you have available at any time by running the following code cell. If the result is "Not using a high-RAM runtime", you can enable a high-RAM runtime via Runtime > Change runtime type in the menu by selecting High-RAM in the Runtime shape toggle button. Then re-execute the code cell.
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))
if ram_gb < 20:
    print('Not using a high-RAM runtime')
else:
    print('You are using a high-RAM runtime!')
Longer runtimes
- All Colab runtimes are reset after some period of time (sooner if the runtime isn't executing code). Colab Pro and Pro+ users have access to longer runtimes than those who use Colab free of charge.
Background Execution
- Colab Pro+ users have access to background execution, where notebooks will continue executing even after you've closed a browser tab. This is always enabled in Pro+ runtimes as long as you have compute units available.
Relaxing Resource Limits in Colab Pro
- Your resources are not unlimited in Colab. To make the most of Colab, avoid using resources when you don't need them. For example, only use a GPU when required and close Colab tabs when finished.
- If you encounter limitations, you can relax those limitations by purchasing more compute units via Pay As You Go. Anyone can purchase compute units via Pay As You Go; no subscription is required.
More Resources
Working with Notebooks in Colab
- Overview of Colab
- Guide to Markdown
- Importing libraries and installing dependencies
- Saving and loading notebooks in GitHub
- Interactive forms
- Interactive widgets
Working with Data
- Loading data: Drive, Sheets, and Google Cloud Storage
- Charts: visualizing data
- Getting started with BigQuery
Conclusion
- Look at nbviewer for examples of Jupyter notebooks rendered with nbconvert.
- The HTML output from nbconvert looks better than my current Jupyter Notebook to HTML implementation, especially in how it distinguishes code from the output of code cells.
- nbconvert is a Python library, so using it well means running Python on the server. A Flask backend may be a good way to implement this.
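Below is a minimal sketch of such a backend. It assumes Flask and nbconvert are installed (`pip install flask nbconvert`). The /convert route and the choice to accept the raw .ipynb JSON as the request body are hypothetical. nbformat.reads parses the text and HTMLExporter renders it, so the existing frontend only has to display the returned HTML.

```python
import nbformat
from flask import Flask, Response, request
from nbconvert import HTMLExporter

app = Flask(__name__)
# One shared exporter; "basic" = minimal structure and styles, easier to drop into a page.
exporter = HTMLExporter(template_name="basic")


@app.route("/convert", methods=["POST"])
def convert():
    """Accept a notebook's JSON text in the request body and return it rendered as HTML."""
    try:
        nb = nbformat.reads(request.get_data(as_text=True), as_version=4)
    except Exception as exc:  # malformed JSON or an unreadable notebook
        return Response(f"Invalid notebook: {exc}", status=400, mimetype="text/plain")
    body, _resources = exporter.from_notebook_node(nb)
    return Response(body, mimetype="text/html")


if __name__ == "__main__":
    app.run(debug=True)
```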