Jupyter Notebooks

I want to keep track of what I have learned about Jupyter notebooks here. I also want to take notes on Project Jupyter and nbconvert in general, since they may help me improve my Jupyter Notebook to HTML converter.


References


  • Project Jupyter Home
  • Kernels
    • IPython
      • Interactive computing in Python.
    • ipykernel
      • The wrapper around IPython which enables using IPython as a kernel
    • Xeus
      • Library facilitating the implementation of kernels for Jupyter. It implements the Jupyter Kernel protocol so developers can focus on implementing the interpreter part of the kernel.
  • Education
    • Jupyter Notebooks offer exciting and creative possibilities in education. The following subprojects are focused on supporting the use of Jupyter Notebook in a variety of educational settings.
    • nbgrader
      • Tools for managing, grading, and reporting of notebook based assignments
    • jupyter4edu
      • GitHub organization hosting community resources for Jupyter in education.
  • Execution
    • Notebooks can be run outside of the browser interface with the following utility subprojects
    • nbclient
      • NBClient lets you execute notebooks from different contexts, including the command-line.
  • Deployment and Infrastructure
    • To serve a variety of users and use cases, these subprojects are being developed to support notebook deployment in various contexts, including multiuser capabilities and secure, scalable cloud deployments.
    • jupyterhub
      • Multi-user notebook for organizations with pluggable authentication and scalability.
    • nbviewer
      • Share notebooks as static HTML on the web.
    • Binder
      • Turn a Git repo into a collection of interactive notebooks
    • dockerspawner
      • Deploy notebooks for 'jupyterhub' inside Docker containers
    • docker-stacks
      • Stacks of Jupyter applications and kernels as Docker containers
  • Formatting and Conversion
    • nbconvert
      • Convert dynamic notebooks to static formats such as HTML, Markdown, LaTeX/PDF, and reStructuredText
    • nbformat
      • Work with notebook documents programmatically
  • Core Building Blocks
  • Project Documentation


Definitions


  • Jupyter Notebook is a simplified notebook authoring application and is part of Project Jupyter, a large umbrella project centered around the goal of providing tools (and standards) for interactive computing with computational notebooks.
  • Kernels are programming-language-specific processes that run independently and interact with the Jupyter applications and their user interfaces. ipykernel is the reference Jupyter kernel built on top of IPython, providing a powerful environment for interactive computing in Python.


Jupyter Notebook Notes


A notebook is a shareable document that combines computer code, plain language descriptions, data, rich visualizations like 3D models, charts, graphs and figures, and interactive controls. A notebook, along with an editor (like JupyterLab), provides a fast interactive environment for prototyping and explaining code, exploring and visualizing data, and sharing ideas with others.


IPython


IPython provides a rich architecture for interactive computing with:

  • A powerful interactive shell
  • A kernel for Jupyter
  • Support for interactive data visualization and use of GUI toolkits
  • Flexible, embeddable interpreters to load into your own projects
  • Easy to use, high performance tools for parallel computing


Architecture


The Jupyter Notebook Format

Jupyter Notebooks are structured data that represent your code, metadata, content, and outputs. When saved to disk, the notebook uses the extension .ipynb, and uses a JSON structure.
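To make the JSON structure concrete, here is a minimal sketch of a hand-written notebook that is round-tripped through JSON using only the standard library. The cell contents and the `summarize` helper are made up for illustration. The sketch assumes nbformat 4.4, because version 4.5 and later also require a per-cell "id" field.

```python
import json

# A hand-written minimal notebook in nbformat 4.4 (4.5+ additionally requires
# a per-cell "id" field, which is left out to keep the sketch short).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Title\n", "Some notes."],
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print('hi')"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hi\n"]}
            ],
        },
    ],
}

# Round-trip through JSON text, which is what an .ipynb file contains on disk.
text = json.dumps(notebook, indent=1)
loaded = json.loads(text)


def summarize(nb):
    """Return (cell_type, joined source) for every cell (hypothetical helper)."""
    return [(cell["cell_type"], "".join(cell["source"])) for cell in nb["cells"]]


summary = summarize(loaded)
for cell_type, source in summary:
    print(cell_type, repr(source))
```

Note that "source" and stream "text" may be stored either as a single string or as a list of lines, so a real reader should handle both.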

The Jupyter Notebook Interface

Jupyter Notebook and its flexible interface extend the notebook beyond code to visualization, multimedia, collaboration, and more. In addition to running your code, it stores code and output, together with markdown notes, in an editable document called a notebook. When you save the notebook, it is sent from your browser to the Jupyter server, which saves it on disk as a JSON file with a .ipynb extension.

Exporting Jupyter Notebooks to Other Formats

The nbconvert tool in Jupyter converts notebook files to other formats, such as HTML, LaTeX, or reStructuredText. This conversion goes through a series of steps:


  1. Preprocessors modify the notebook in memory
  2. An exporter converts the notebook to another file format. Most of the exporters use templates for this.
  3. Postprocessors work on the file produced by exporting.

The nbviewer website uses nbconvert with the HTML exporter. When you give it a URL, it fetches the notebook from that URL, converts it to HTML, and serves that HTML to you.
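The three stages can be sketched with plain functions. This is a toy model of the idea, not nbconvert's actual classes: every function name below is hypothetical, the notebook is a bare dict, and the postprocessor works on a string rather than on a file written to disk.

```python
import html


def drop_empty_cells(nb):
    """Preprocessor: return a copy of the notebook without blank cells."""
    cells = [c for c in nb["cells"] if "".join(c["source"]).strip()]
    return {**nb, "cells": cells}


def export_html(nb):
    """Exporter: render each cell to an HTML fragment (no template engine)."""
    parts = []
    for cell in nb["cells"]:
        source = html.escape("".join(cell["source"]))
        if cell["cell_type"] == "code":
            parts.append(f"<pre><code>{source}</code></pre>")
        else:
            parts.append(f"<p>{source}</p>")
    return "\n".join(parts)


def wrap_document(body):
    """Postprocessor: act on the exported output, here by wrapping it."""
    return f"<!DOCTYPE html>\n<html><body>\n{body}\n</body></html>"


def convert(nb, preprocessors=(drop_empty_cells,), postprocessors=(wrap_document,)):
    """Run preprocessors, then the exporter, then postprocessors, in order."""
    for pre in preprocessors:
        nb = pre(nb)
    output = export_html(nb)
    for post in postprocessors:
        output = post(output)
    return output


nb = {
    "cells": [
        {"cell_type": "markdown", "source": ["Intro"]},
        {"cell_type": "code", "source": [""]},  # removed by the preprocessor
        {"cell_type": "code", "source": ["x = 1 < 2"]},
    ]
}
page = convert(nb)
print(page)
```

Keeping the stages separate means a new cleanup rule or output wrapper can be added without touching the rendering code.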


Nbconvert


I want to take notes on nbconvert because I may want to use it to improve how my Jupyter Notebook to HTML converter presents output cells. This Jupyter Notebook, an IPython Notebook analyzing a global crisis, shows some ways in which nbconvert might beat my current .ipynb to HTML converter (for example, it displays a pandas DataFrame as a table instead of as text).
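The table rendering is possible because rich outputs are stored as a bundle of alternative representations keyed by MIME type (a DataFrame result typically carries both "text/html" and "text/plain"). Below is a minimal sketch of how a custom converter could pick the richest one. The preference order and the `render_output` helper are my assumptions, not nbconvert's implementation.

```python
import html

# Preference order is an assumption for this sketch: richest type first.
MIME_PREFERENCE = ("text/html", "image/png", "text/plain")


def _text(value):
    """nbformat stores text either as one string or as a list of lines."""
    return value if isinstance(value, str) else "".join(value)


def render_output(output):
    """Render one code-cell output dict to an HTML fragment."""
    kind = output["output_type"]
    if kind == "stream":
        return f"<pre>{html.escape(_text(output['text']))}</pre>"
    if kind == "error":
        message = f"{output['ename']}: {output['evalue']}"
        return f"<pre>{html.escape(message)}</pre>"
    # execute_result and display_data carry a MIME bundle under "data".
    data = output.get("data", {})
    for mime in MIME_PREFERENCE:
        if mime not in data:
            continue
        if mime == "text/html":
            # Already HTML; sanitize it first if the notebook is untrusted.
            return _text(data[mime])
        if mime == "image/png":
            encoded = _text(data[mime]).strip()  # stored base64-encoded
            return f'<img src="data:image/png;base64,{encoded}">'
        return f"<pre>{html.escape(_text(data[mime]))}</pre>"
    return ""


df_output = {
    "output_type": "execute_result",
    "execution_count": 1,
    "metadata": {},
    "data": {
        "text/plain": ["   a\n", "0  1"],
        "text/html": ["<table>", "<tr><td>1</td></tr>", "</table>"],
    },
}
stream_output = {"output_type": "stream", "name": "stdout", "text": "1 < 2\n"}

table_html = render_output(df_output)
stream_html = render_output(stream_output)
print(table_html)
print(stream_html)
```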

Installation

$ pip install nbconvert # nbconvert is packaged for both pip and conda
$ sudo apt-get install pandoc # nbconvert uses Pandoc to convert markdown to formats OTHER THAN HTML
$ # For converting notebooks to PDF, nbconvert makes use of LaTeX and XeTeX as the rendering engine
$ sudo apt-get install texlive-xetex texlive-fonts-recommended texlive-plain-generic

Conversion

The command-line syntax to run the nbconvert script can be seen below. It converts the Jupyter notebook file notebook.ipynb into the output format given by the FORMAT string.

$ jupyter nbconvert --to FORMAT notebook.ipynb

Supported Output Formats

  • HTML
  • LaTeX
  • PDF
  • WebPDF
  • Reveal.js HTML Slideshow
  • Markdown
  • AsciiDoc
  • reStructuredText
  • executable script
  • notebook

HTML

  • --to html
    • HTML Export.
    • --template lab (default)
      • A full static HTML render of the notebook. This looks very similar to the JupyterLab interactive view. It supports the extra --theme option, which defaults to light, but can be dark or some other custom theme.
    • --template classic
      • Simplified HTML, using the classic Jupyter look and feel
    • --template basic
      • Base HTML, rendering with minimal structure and styles
    • --embed-images
      • If this option is provided, images are embedded as base64 data URLs in the resulting HTML file.

Using nbconvert as a library

High Level Overview of converting a notebook to another format:

  1. Retrieve the notebook and its accompanying resources
  2. Feed the notebook into the Exporter, which:
    1. Sequentially feeds the notebook into an array of Preprocessors. Preprocessors only act on the structure of the notebook, and have unrestricted access to it.
    2. Feeds the notebook into the Jinja templating engine, which converts it to a particular format depending on which template is selected.
  3. The exporter returns the converted notebook and other relevant resources as a tuple.
  4. You write the data to the disk using the built-in FilesWriter (which writes the notebook and any extracted files to disk), or elsewhere using a custom Writer.
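The four steps above might look roughly like this in code. This is a sketch assuming nbconvert 6 or later (where `HTMLExporter` accepts `template_name`) with nbformat installed. The function name `notebook_to_html` and its parameters are my own, not part of nbconvert.

```python
from pathlib import Path


def notebook_to_html(ipynb_path, out_dir, template_name="lab"):
    """Convert one notebook to HTML, write it into out_dir, return the body."""
    # Third-party imports live inside the function so this module can still
    # be imported on machines where nbconvert is not installed.
    import nbformat
    from nbconvert import HTMLExporter
    from nbconvert.writers import FilesWriter

    # Step 1: retrieve the notebook (converted to nbformat 4 if it is older).
    nb = nbformat.read(str(ipynb_path), as_version=4)

    # Steps 2-3: the exporter runs its preprocessors and the Jinja template,
    # then returns the converted notebook plus a resources dict as a tuple.
    exporter = HTMLExporter(template_name=template_name)
    body, resources = exporter.from_notebook_node(nb)

    # Step 4: FilesWriter writes the body and any extracted files to disk.
    writer = FilesWriter(build_directory=str(out_dir))
    writer.write(body, resources, notebook_name=Path(ipynb_path).stem)
    return body
```

Calling `notebook_to_html("notebook.ipynb", "build", template_name="classic")` should produce build/notebook.html with the classic look. For a web backend, the returned body could be used directly instead of the written file.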


Common Tasks


Merging Jupyter Notebooks


$ pip install nbmerge
$ pip install nbformat
$ nbmerge file1.ipynb file2.ipynb > merged.ipynb
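Since a notebook is just JSON, the core of a merge can also be sketched with the standard library. This is not nbmerge's implementation. It is a simplified illustration that concatenates cells and keeps the top-level metadata of the first notebook, and `merge_notebooks` is a hypothetical helper.

```python
import copy


def merge_notebooks(notebooks):
    """Concatenate the cells of several notebook dicts into a new notebook.

    Top-level fields (metadata, nbformat version) come from the first one.
    """
    if not notebooks:
        raise ValueError("need at least one notebook")
    merged = copy.deepcopy(notebooks[0])  # never mutate the inputs
    for nb in notebooks[1:]:
        merged["cells"].extend(copy.deepcopy(nb["cells"]))
    return merged


first = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {"title": "first"},
    "cells": [{"cell_type": "markdown", "metadata": {}, "source": ["A"]}],
}
second = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {"title": "second"},
    "cells": [{"cell_type": "markdown", "metadata": {}, "source": ["B"]}],
}

merged = merge_notebooks([first, second])
print([cell["source"] for cell in merged["cells"]])
```

With real files, each notebook would be loaded with `json.load` and the result written back with `json.dump`. Notebooks in nbformat 4.5 or later carry cell ids, which would need to stay unique after merging.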

Downloading Kaggle Datasets in Google Colab


  • The code snippets below should be executed in different cells in a Jupyter Notebook / Colab.
  • Note: If you get a 403 (Forbidden) error when downloading the data, it just means that you need to accept the terms associated with the dataset before you download it.
# 1. Download the Kaggle API key (kaggle.json) from the Kaggle website (Settings),
#    then upload it to the Colab runtime
from google.colab import files
files.upload()


# Execute the following in Google Colab
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
# Make the key readable only by the current user
!chmod 600 ~/.kaggle/kaggle.json
# Download the data of the dataset, in this case, dogs-vs-cats
!kaggle competitions download -c dogs-vs-cats


# Uncompress (unzip) the training data, which comes in a zip folder, silently (-qq)
!unzip -qq train.zip


About Google Colab


Faster GPUs


  • Users who have purchased one of Colab's paid plans have access to faster GPUs and more memory. You can upgrade your notebook's GPU settings in Runtime > Change runtime type in the menu to select from several accelerator options, subject to availability.
  • The free of charge version of Colab grants access to Nvidia's T4 GPUs subject to quota restrictions and availability.
  • You can see what GPU you've been assigned at any time by executing the following cell. If the execution result of running the code cell below is "Not connected to a GPU", you can change the runtime by going to Runtime > Change runtime type in the menu to enable a GPU accelerator, and then re-execute the code cell.
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)
  • In order to use a GPU with your notebook, select the `Runtime > Change runtime type` menu, and then set the hardware accelerator to the desired option.

More Memory


  • Users who have purchased one of Colab's paid plans have access to high-memory VMs when they are available. More powerful GPUs are always offered with high-memory VMs.
  • You can see how much memory you have available at any time by running the following code cell. If the execution result of running the code cell below is "Not using a high-RAM runtime", then you can enable a high-RAM runtime via Runtime > Change runtime type in the menu. Then select High-RAM in the Runtime shape toggle button. After, re-execute the code cell.
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))

if ram_gb < 20:
  print('Not using a high-RAM runtime')
else:
  print('You are using a high-RAM runtime!')

Longer runtimes


  • All Colab runtimes are reset after some period of time (which happens sooner if the runtime isn't executing code). Colab Pro and Pro+ users have access to longer runtimes than those who use Colab free of charge.

Background Execution


  • Colab Pro+ users have access to background execution, where notebooks will continue executing even after you've closed a browser tab. This is always enabled in Pro+ runtimes as long as you have compute units available.

Relaxing Resource Limits in Colab Pro


  • Your resources are not unlimited in Colab. To make the most of Colab, avoid using resources when you don't need them. For example, only use a GPU when required and close Colab tabs when finished.
  • If you encounter limitations, you can relax those limitations by purchasing more compute units via Pay As You Go. Anyone can purchase compute units via Pay As You Go; no subscription is required.

More Resources


Working with Notebooks in Colab


Working with Data


Conclusion


  • Look at nbviewer for examples of Jupyter notebooks being rendered using nbconvert
  • The output HTML from nbconvert looks better than my current implementation of Jupyter Notebooks to HTML, especially when it comes to denoting code and the output of code cells.
  • nbconvert is a Python tool, so using it well means calling it from Python; implementing the converter with a Flask backend may be the way to do that.
