Reading About Various Python Sandboxing Options

I am finally getting around to implementing sandboxed Python. I am going to read about the three options for sandboxing Python that were mentioned in the "Sandboxing Python and Linux Jailing" note.


Restricted Python

The idea behind Restricted Python

Python is a Turing-complete programming language. Offering a Python interface to users in a web context is a potential security risk. Web frameworks and Content Management Systems (CMS) want to offer their users as much extensibility as possible through the web (TTW). This also means granting permission to add functionality via a Python script. Additional preventative measures should be taken to ensure the integrity of the application and the server itself, according to information security best practice and independently of RestrictedPython.

RestrictedPython defines a safe subset of the Python programming language. Defining a secure subset of the language involves restricting the EBNF elements and explicitly allowing or disallowing language features. Much of the power of a programming language derives from its standard and contributed libraries, so any calling of these methods must also be checked and potentially restricted. RestrictedPython generally disallows calls to any library that is not explicitly whitelisted. Any Python code that should be executed has to be explicitly checked before the interpreter executes the generated byte code. Python itself offers three methods that provide such a workflow: compile(), exec / exec(), and eval / eval().

RestrictedPython offers a replacement for the Python builtin function compile(). This Python function is defined as:

compile(source, filename, mode [, flags [, dont_inherit]])

The definition of the compile() method has changed over time, but its relevant parameters source and mode still remain. There are three valid string values for mode: exec, eval, and single. For RestrictedPython this compile() method is replaced by:

RestrictedPython.compile_restricted(source, filename, mode [, flags [, dont_inherit]])

The primary parameter source has to be a string or an ast.AST instance. Both methods either return compiled byte code that the interpreter can execute or raise exceptions if the provided source code is invalid. As compile and compile_restricted just compile the provided source code to byte code, they are not sufficient as a sandboxed environment on their own, because calls to libraries are still available. The two methods / statements exec / exec() and eval / eval() take two parameters, globals and locals, which hold the references to the Python builtins. By modifying and restricting the available modules, methods, and constants in globals and locals, we can limit the possible calls. RestrictedPython offers a way to define a policy which allows developers to protect access to attributes. This works by defining restricted versions of print, getattr, setattr, and import. RestrictedPython also provides three predefined, limited versions of Python's __builtins__: safe_builtins, limited_builtins (provides restricted sequence types), and utility_builtins (provides access to the standard modules math, random, and string, and to sets).

Install Restricted Python

$ pip install RestrictedPython

Basic Usage

The general workflow to execute Python code that is loaded within a Python program is:

source_code = """
def do_something():
    pass
"""

byte_code = compile(source_code, filename='<inline code>', mode='exec')
exec(byte_code)
do_something()

With RestrictedPython, that workflow should be as straightforward as possible:

from RestrictedPython import compile_restricted

source_code = """
def do_something():
    pass
"""

byte_code = compile_restricted(
    source_code,
    filename='<inline code>',
    mode='exec'
)
exec(byte_code)
do_something()

In the context of RestrictedPython, you should provide explicitly defined dictionaries to exec(). compile_restricted uses a predefined policy that checks and modifies the source code and checks it against a restricted subset of the Python language. The compiled source code is still executed against the full available set of library modules and methods. The Python exec() takes three parameters: code, which is the compiled byte code; globals, which is the global dictionary; and locals, which is the local dictionary. By limiting the entries in the globals and locals dictionaries, you restrict access to the available library modules and methods. Typically there is a defined set of allowed modules, methods, and constants used in that context. RestrictedPython provides three predefined built-ins for that: safe_builtins, limited_builtins, and utility_builtins. So you end up using:

from RestrictedPython import compile_restricted

from RestrictedPython import safe_builtins
from RestrictedPython import limited_builtins
from RestrictedPython import utility_builtins

source_code = """
def do_something():
    pass
"""

try:
    byte_code = compile_restricted(
        source_code,
        filename='<inline code>',
        mode='exec'
    )
    exec(byte_code, {'__builtins__': safe_builtins}, None)
except SyntaxError as e:
    pass
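To see what limiting the globals dictionary does on its own, here is a minimal sketch that uses only the builtin compile() and exec(), not RestrictedPython. The allow-list allowed_builtins and the variable names are made up for illustration. Restricting builtins this way is only one ingredient and should not be treated as a sandbox by itself.

```python
# Plain-Python sketch: only the names placed in '__builtins__' are visible
# to the executed code. This is NOT RestrictedPython, only the exec() mechanism.
source_code = """
total = sum(values) + abs(offset)
"""

# Hypothetical allow-list of builtins for this example.
allowed_builtins = {"sum": sum, "abs": abs}

byte_code = compile(source_code, filename="<inline code>", mode="exec")

scope = {"__builtins__": allowed_builtins, "values": [1, 2, 3], "offset": -4}
exec(byte_code, scope)
total = scope["total"]

# A builtin that is not in the allow-list cannot be resolved by the code.
blocked = compile("handle = open('secret.txt')", "<inline code>", "exec")
try:
    exec(blocked, {"__builtins__": allowed_builtins})
    open_was_blocked = False
except NameError:
    open_was_blocked = True
```

The second exec() fails with a NameError because open was never placed in the dictionary that the executed code sees as its builtins.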

A common advanced usage would be to define your own restricted builtin dictionary. RestrictedPython requires some predefined names in globals in order to work properly.

To use classes in Python 3: __metaclass__ must be set. Set it to type to use no custom metaclass. __name__ must also be set, as classes need a namespace to be defined in. It is the name of the module the class is defined in. You might set it to an arbitrary string.

To use for statements and comprehensions: _getiter_ must point to an iter implementation. As an unguarded variant you might use RestrictedPython.Eval.default_guarded_getiter(). _iter_unpack_sequence_ must point to RestrictedPython.Guards.guarded_iter_unpack_sequence().

To use getattr: You have to provide an implementation for it. RestrictedPython.Guards.safer_getattr() can be a starting point.

Usage in frameworks and Zope

One major issue with using compile_restricted directly in a framework is that you have to use try-except statements to handle problems, and it might be a bit harder to provide useful information to the user. RestrictedPython provides four specialized compile_restricted methods: compile_restricted_exec, compile_restricted_eval, compile_restricted_single, and compile_restricted_function. These four methods return a named tuple (CompileResult) with four elements: code (a <code> object, or None if errors is not empty), errors (a tuple with error messages), warnings (a list with warnings), and used_names (a dictionary mapping collected used names to True). These details can be used to inform the user about the compiled source code. Modifying the builtins is straightforward, as it is just a dictionary containing the available library elements. Modification normally means removing elements from existing builtins or adding allowed elements by copying from globals. For frameworks, it could possibly also be used to change the handling of specific Python language elements.
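The "return errors instead of raising" pattern described above can be sketched with the standard library alone. CompileOutcome and compile_reporting_errors are hypothetical stand-ins that only mirror the four fields listed for CompileResult; they wrap the builtin compile() and apply none of RestrictedPython's checks.

```python
from collections import namedtuple

# Hypothetical stand-in with the same four fields as described above.
CompileOutcome = namedtuple("CompileOutcome", "code errors warnings used_names")


def compile_reporting_errors(source, filename="<inline code>"):
    """Compile with the builtin compile(), collecting errors instead of raising."""
    try:
        code = compile(source, filename, "exec")
    except SyntaxError as exc:
        message = f"Line {exc.lineno}: {exc.msg}"
        return CompileOutcome(None, (message,), [], {})
    # Map every name referenced by the module-level code to True.
    used_names = {name: True for name in code.co_names}
    return CompileOutcome(code, (), [], used_names)


good = compile_reporting_errors("total = len(items)")
bad = compile_reporting_errors("def broken(:\n    pass")

# A framework can now branch on the result without a try-except at the call site.
report = "ok" if not bad.errors else f"{len(bad.errors)} error(s)"
```

The caller inspects errors and used_names to build a message for the user, which is the kind of feedback the specialized compile_restricted methods are meant to make easier.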

Policies and Builtins

RestrictedPython provides a way to define policies by redefining restricted versions of print, getattr, setattr, import, etc. As shortcuts it offers three stripped-down versions of Python's __builtins__: safe_builtins, limited_builtins, and utility_builtins.

Predefined builtins

safe_globals is a shortcut for {'__builtins__': safe_builtins}, as this is the way globals have to be provided to the exec function to actually restrict access to the builtins provided by Python.

Guards

RestrictedPython predefines several guarded access and manipulation methods.

Those and additional methods rely on a helper construct, full_write_guard, which is intended to help implement immutable and semi-mutable objects and attributes.

Implementing a Policy

RestrictedPython only provides the raw material for restricted execution. To actually enforce any restrictions, you need to supply a policy implementation by providing restricted versions of print, getattr, setattr, import, etc. These restricted implementations are hooked up by providing a set of specifically named objects in the global dict that you use for execution of code. Specifically:

  1. _print_ is a callable object that returns a handler for print statements. This handler must have a write method that accepts a single string argument, and must return a string when called. RestrictedPython.PrintCollector.PrintCollector is a suitable implementation.
  2. _write_ is a guard function taking a single argument. If the object passed to it may be written to, it should be returned, otherwise the guard function should raise an exception. _write_ is typically called on an object before a setattr operation.
  3. _getattr_ and _getitem_ are guard functions, each of which takes two arguments. The first is the base object to be accessed, while the second is the attribute name or item index that will be read. The guard function should return the attribute or subitem, or raise an exception.
  4. __import__ is the normal Python import hook, and should be used to control access to Python packages and modules.
  5. __builtins__ is the normal Python builtins dictionary, which should be weeded down to a set that cannot be used to get around your restrictions. A usable "safe" set is RestrictedPython.Guards.safe_builtins.
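The hooks in the list above can be sketched as plain functions. Everything here is illustrative: the function names, the underscore rule, the "lists and dicts only" write rule, and the ALLOWED_MODULES set are my own assumptions, not RestrictedPython's implementations. Only the import hook is exercised, because plain exec() honors __import__ from the builtins dictionary; the other guards would only be called by code compiled with compile_restricted.

```python
# Hypothetical guard implementations; names and rules are illustrative only.
ALLOWED_MODULES = {"math", "string"}


def guarded_getattr(obj, name):
    """Read guard: refuse attribute names that start with an underscore."""
    if name.startswith("_"):
        raise AttributeError(f"access to {name!r} is not allowed")
    return getattr(obj, name)


def guarded_getitem(obj, index):
    """Item guard: plain pass-through; a real policy could check permissions."""
    return obj[index]


def write_guard(obj):
    """Write guard: return the object if it may be written to, else raise."""
    if isinstance(obj, (list, dict)):
        return obj
    raise TypeError(f"writing to {type(obj).__name__} objects is not allowed")


def restricted_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Import hook: only allow-listed top-level modules may be imported."""
    if name.split(".")[0] not in ALLOWED_MODULES:
        raise ImportError(f"import of {name!r} is not allowed")
    return __import__(name, globals, locals, fromlist, level)


# The specially named objects go into the globals used for execution.
policy_globals = {
    "_getattr_": guarded_getattr,
    "_getitem_": guarded_getitem,
    "_write_": write_guard,
    "__builtins__": {"__import__": restricted_import},
}

# The import hook already takes effect with plain exec().
exec("import math\nroot = math.sqrt(16)", policy_globals)
root = policy_globals["root"]
```

Trying exec("import os", policy_globals) with the same dictionary raises ImportError, because os is not in the allow-list.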

PyPy

PyPy is a fast, compliant alternative implementation of the Python language and a replacement for CPython. It is built using the RPython language that was co-developed with it. The main reason to use it instead of CPython is speed: it generally runs faster. PyPy implements Python 2 and Python 3. It supports all of the core language and most of the commonly used Python standard library modules.

To run the sandboxed process, you need to get the full sources and build pypy-sandbox from it (see Building from source). These instructions give you a pypy-c that you should rename to pypy-sandbox to avoid future confusion. Then run:

$ cd pypy/sandbox
$ pypy_interact.py path/to/pypy-sandbox
$ # don't confuse it with pypy/goal/pyinteractive.py!

You get a fully sandboxed interpreter, in its own filesystem hierarchy (try os.listdir('/')). For example, you would run the untrusted script as:

$ mkdir virtualtmp
$ cp untrusted.py virtualtmp/
$ pypy_interact.py --tmp=virtualtmp pypy-sandbox /tmp/untrusted.py

Goals and Architecture Overview

PyPy aims to provide a compliant, flexible and fast implementation of the Python language which uses the RPython toolchain to enable new advanced high-level features without having to encode the low-level details. This Python implementation is written in RPython as a relatively simple interpreter, in some respects easier to understand than CPython, the C reference implementation of Python. PyPy uses its high level and flexibility to quickly experiment with features or implementation techniques in ways that would, in a traditional approach, require pervasive changes to the source code. PyPy's Python Interpreter is written in RPython and implements the full Python language. This interpreter very closely emulates the behavior of CPython. It contains the following key components:

The bytecode compiler is the preprocessing phase that produces a compact bytecode format via a chain of flexible passes (tokenizer, lexer, parser, abstract syntax tree builder, bytecode generator). The bytecode evaluator interprets this bytecode. It does most of its work by delegating all actual manipulations of user objects to the object space. The latter can be thought of as the library of built-in types. It defines the implementation of the user objects, like integers and lists, as well as the operations between them, like addition or truth-value-testing.
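The same compile-then-evaluate split can be observed from Python code with the standard library. This sketch runs on whatever interpreter executes it and uses that interpreter's own ast and compile() machinery; it does not touch PyPy's internal passes, and the file name <example> is just a placeholder.

```python
import ast

source = "answer = 6 * 7"

# Parsing and abstract syntax tree construction.
tree = ast.parse(source, filename="<example>", mode="exec")
first_node = type(tree.body[0]).__name__

# Bytecode generation from the tree.
code = compile(tree, filename="<example>", mode="exec")

# Bytecode evaluation; the objects involved (ints here) and their
# operations are what the text above calls the object space.
namespace = {}
exec(code, namespace)
answer = namespace["answer"]
```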

Layers

RPython

RPython is the language in which we write interpreters. Not all of the PyPy project is written in RPython, only the parts that are compiled in the translation process. The interesting point is that RPython has no parser: it is compiled from the live Python objects, which makes it possible to do all kinds of metaprogramming during import time. The RPython standard library can be found in the rlib subdirectory.

Translation

The translation toolchain is the part that takes care of translating RPython to flow graphs and then to C. There is more written about it in the architecture document. It lives in the rpython directory: flowspace, annotator, and rtyper.

PyPy Interpreter

This is in the pypy directory. pypy/interpreter is a standard interpreter for Python written in RPython. The fact that it is RPython is not apparent at first. Built-in modules are written in pypy/modules/*. Some modules that CPython implements in C are simply written in pure Python; they are in the top-level lib_pypy directory. The standard library of Python (with a few changes to accommodate PyPy) is in lib-python.

JIT Compiler

Just-in-Time Compiler (JIT): we have a tracing JIT that traces the interpreter written in RPython, rather than the user program that it interprets. As a result it applies to any interpreter, i.e. any language. But getting it to work correctly is not trivial: it requires a small number of precise "hints" and possibly some small refactorings of the interpreter. The JIT itself also has several almost independent parts: the tracer itself in rpython/jit/metainterp, the optimizer in rpython/jit/metainterp/optimizer that optimizes a list of residual operations, and the backend in rpython/jit/backend/<machine-name> that turns it into machine code.

Garbage Collectors

Garbage Collectors (GC): as you may notice if you are used to CPython's C code, there are no Py_INCREF/Py_DECREF equivalents in the RPython code. Garbage collection in RPython is inserted during translation.

Downloading and Installing PyPy

Just like CPython, you need a base interpreter environment and can then install extra packages. There are several choices for installing the base interpreter.

Instructions for installing additional modules and for installing PyPy inside an env can be found in the PyPy documentation.

You cannot import any extension module in a sandboxed PyPy. Even the built-in modules available are very limited. Sandboxing in PyPy is a good proof of concept, but it currently requires some work from a motivated developer. Until then, it can only be used for "pure Python" examples: programs that import mostly nothing (or only pure Python modules, recursively).

Jupyter Notebook

The notebook extends the console-based approach to interactive computing in a qualitatively new direction, providing a web-based application suitable for capturing the whole computation process: developing, documenting, and executing code, as well as communicating the results. The Jupyter Notebook combines two components: a web application and notebook documents.

Main Features of the Web Application

Notebook Documents

Notebook documents contain the inputs and outputs of an interactive session as well as additional text that accompanies the code but is not meant for execution. In this way, notebook files can serve as a complete computational record of a session, interleaving executable code with explanatory text, mathematics, and rich representations of resulting objects. These documents are internally JSON files and are saved with the .ipynb extension. Since JSON is a plain text format, they can be version-controlled and shared with colleagues. Notebooks may be exported to a range of static formats, including HTML, via the nbconvert command. Any .ipynb notebook document available from a public URL can be shared via the Jupyter Notebook Viewer (nbviewer). This service loads the notebook document from the URL and renders it as a static web page. In effect, nbviewer is simply nbconvert as a web service.
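Because a notebook document is plain JSON, it can be inspected with the standard json module. The sketch below builds a small notebook in the version 4 layout as I understand it (a top-level cells list, each cell with a cell_type and source) and pulls out the code cells. The cell contents are invented for the example.

```python
import json

# A minimal notebook in the version 4 format; the cell contents are made up.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 1,
            "source": ["print(1 + 1)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
            ],
        },
    ],
}

# This text is what would be stored in a .ipynb file.
text = json.dumps(notebook, indent=1)
loaded = json.loads(text)

# Inputs and outputs live side by side in each code cell.
code_cells = [cell for cell in loaded["cells"] if cell["cell_type"] == "code"]
code_sources = ["".join(cell["source"]) for cell in code_cells]
output_count = sum(len(cell["outputs"]) for cell in code_cells)
```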

You can start running a notebook server from the command line using the following command:

$ jupyter notebook 

This will print some information about the notebook server in your console, and open a web browser to the URL of the web application (by default, http://127.0.0.1:8888). The landing page of the Jupyter notebook web application, the dashboard, shows the notebooks currently available in the notebook directory. When starting a notebook server from the command line, you can also open a particular notebook directly, bypassing the dashboard, with jupyter notebook my_notebook.ipynb.

An open notebook has exactly one interactive session connected to an IPython kernel, which will execute code by the user and communicate back results. This kernel remains active if the web browser window is closed, and reopening the same notebook from the dashboard will reconnect the web application to the same kernel. In the dashboard, notebooks with an active kernel have a Shutdown button next to them, whereas notebooks without an active kernel will have a Delete button in its place.

A code cell allows you to edit and write new code, with full syntax highlighting and tab completion. By default, the language associated with a code cell is Python, but other languages, such as Julia and R, can be handled using cell magic commands. When a code cell is executed, the code that it contains is sent to the kernel associated with the notebook. The results that are returned from this computation are then displayed in the notebook as the cell's output. The output is not limited to text: many other forms of output are possible, including matplotlib figures and HTML tables (pandas). This is known as IPython's rich display capability.

Raw cells provide a place in which you can write output directly. Raw cells are not evaluated by the notebook.

The normal workflow in a notebook is quite similar to a standard IPython session, with the difference that you can edit cells in-place multiple times until you obtain the desired results, rather than having to rerun separate scripts with the %run magic command.

Running a Jupyter Notebook Server

The Jupyter notebook web application is based on a server-client structure. The notebook server uses a two-process kernel architecture based on ZeroMQ, as well as Tornado for serving HTTP requests. This document describes how you can secure a notebook server and how to run it on a public interface. This is not the multi-user server you are looking for. This should only be used by someone who wants access to their personal machine. You can protect your notebook server with a simple single password by configuring the NotebookApp.password setting in jupyter_notebook_config.py. You can create a Jupyter notebook config file with the following command:

$ jupyter notebook --generate-config

You can prepare a hashed password using the function notebook.auth.security.passwd(), and you can add that password to your jupyter_notebook_config.py. The default location for the jupyter_notebook_config.py file is your Jupyter folder in your home directory, ~/.jupyter.

When using a password, it is a good idea to also use SSL with a web certificate so that your hashed password is not sent unencrypted by your browser.

Security in Jupyter Notebook server

Since access to the Jupyter notebook server means access to running arbitrary code, it is important to restrict access to the notebook server. For this reason, notebook 4.3 introduces token-based authentication that is on by default. When token authentication is enabled, the notebook uses a token to authenticate requests. As Jupyter notebooks become more popular for sharing and collaboration, the potential for malicious people to attempt to exploit the notebook for their nefarious purposes increases. IPython 2.0 introduced a security model to prevent execution of untrusted code without explicit user input. The whole point of Jupyter is arbitrary code execution. We have no desire to limit what can be done with a notebook, which would negatively impact its utility. Unlike other programs, a Jupyter notebook document includes output. Unlike other documents, that output exists in a context that can execute code (via JavaScript). The security problem we need to solve is that no code should ever execute just because a user has opened a notebook that they did not write. Like any other program, once a user decides to execute code in a notebook, it is considered trusted, and should be allowed to do anything.

"Our" Security Model

When a notebook is executed, a signature is computed from a digest of the notebook's contents plus a secret key. This is stored in a database, writable only by the current user. By default, this is located at:

~/.local/share/jupyter/nbsignatures.db  # Linux
~/Library/Jupyter/nbsignatures.db       # OS X
%APPDATA%/jupyter/nbsignatures.db       # Windows

Each signature represents a series of outputs which were produced by code the current user executed, and are therefore trusted. When you open a notebook, the server computes its signature and checks if it is in the database. If a match is found, HTML and JavaScript output in the notebook will be trusted at load, otherwise it will be untrusted. Any output generated during an interactive session is trusted. A notebook's trust is updated when the notebook is saved. Users can explicitly trust a notebook with the trust option: jupyter trust /path/to/notebook.ipynb.
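The "digest of the contents plus a secret key" idea can be sketched with the standard hmac module. This is an illustration of the concept only, assuming HMAC-SHA256 over a sorted-key JSON dump; the function names, the in-memory set standing in for nbsignatures.db, and the example key are all hypothetical, and Jupyter's real signing code may differ in every detail.

```python
import hashlib
import hmac
import json


def sign_notebook(notebook, secret):
    """Return a hex signature over a canonical JSON form of the notebook."""
    payload = json.dumps(notebook, sort_keys=True).encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def is_trusted(notebook, secret, known_signatures):
    """A notebook is trusted when its signature is already in the store."""
    signature = sign_notebook(notebook, secret)
    return any(hmac.compare_digest(signature, known) for known in known_signatures)


secret = b"example-secret-key"  # hypothetical key, for illustration only
notebook = {"cells": [{"cell_type": "code", "source": "1 + 1", "outputs": ["2"]}]}

# Stands in for the signature database written when the user runs the notebook.
known_signatures = {sign_notebook(notebook, secret)}

# Someone else edits the stored output; the signature no longer matches.
tampered = {"cells": [{"cell_type": "code", "source": "1 + 1", "outputs": ["<script>"]}]}

trusted_original = is_trusted(notebook, secret, known_signatures)
trusted_tampered = is_trusted(tampered, secret, known_signatures)
```

Without the secret key, a third party cannot produce a signature that matches an altered notebook, so changed output is treated as untrusted on load.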

Styling the notebook can only be done via either custom.css or CSS in HTML output. The latter only has an effect if the notebook is trusted, otherwise the output will be sanitized just like Markdown.

Configuring the Notebook Frontend

This document is a rough explanation of how you can persist some configuration options for the notebook JavaScript.

The example below shows how to change the default setting indentUnit for CodeMirror Code Cells:

var cell = Jupyter.notebook.get_selected_cell();
var config = cell.config;
var patch = {
      CodeCell:{
        cm_config:{indentUnit:2}
      }
    }
config.update(patch)

You can enter the previous snippet in your browser’s JavaScript console once. Then reload the notebook page in your browser. Now, the preferred indent unit should be equal to two spaces. The custom setting persists and you do not need to reissue the patch on new notebooks.

Under the hood, Jupyter will persist the preferred configuration settings in ~/.jupyter/nbconfig/<section>.json, where <section> depends on the page where the configuration is issued. <section> can take values like notebook, tree, and editor. A common section contains configuration settings shared by all pages.
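To picture what ends up in such a section file, here is a small sketch that merges a patch into a JSON file named after the section. The helper names recursive_update and update_section are hypothetical, the recursive merge is my assumption about how a patch combines with existing settings, and a temporary directory is used instead of ~/.jupyter/nbconfig so that nothing real is modified.

```python
import json
import tempfile
from pathlib import Path


def recursive_update(target, patch):
    """Merge patch into target, descending into nested dictionaries."""
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            recursive_update(target[key], value)
        else:
            target[key] = value
    return target


def update_section(config_dir, section, patch):
    """Apply a patch to <config_dir>/<section>.json and return the new content."""
    path = Path(config_dir) / f"{section}.json"
    current = json.loads(path.read_text()) if path.exists() else {}
    recursive_update(current, patch)
    path.write_text(json.dumps(current, indent=2))
    return current


with tempfile.TemporaryDirectory() as config_dir:
    update_section(
        config_dir, "notebook", {"CodeCell": {"cm_config": {"lineNumbers": True}}}
    )
    merged = update_section(
        config_dir, "notebook", {"CodeCell": {"cm_config": {"indentUnit": 2}}}
    )
```

After the second call, merged holds both settings under CodeCell.cm_config, which mirrors how the indentUnit patch would sit next to settings that were already stored.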

Extending the Notebook

Certain subsystems of the notebook server are designed to be extended or overridden by users. These documents explain these systems, and how to override the notebook's defaults with custom behavior:

Contents API

The Jupyter Notebook web application provides a graphical interface for creating, opening, renaming, and deleting files in a virtual filesystem. The ContentsManager class defines an abstract API for translating these interactions into operations on a particular storage medium. The default implementation, FileContentsManager, uses the local filesystem of the server for storage and straightforwardly serializes notebooks into JSON. Users can override these behaviors by supplying custom subclasses of ContentsManager. This section describes the interface implemented by ContentsManager subclasses. We refer to this interface as the Contents API.

ContentsManager methods represent virtual filesystem entities as dictionaries, which we refer to as models. ContentsManager methods represent the locations of the filesystem resources as API-style paths. Such paths are interpreted as relative to the root directory of the notebook server. The default ContentsManager is designed for users running the notebook as an application on a personal computer.

File Save Hooks

You can configure functions that are run whenever a file is saved. There are two hooks available.
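As an illustration of a save hook, here is a sketch of a function that strips outputs from code cells before a notebook is written. It assumes the hook is called with the file model as a keyword argument named model, where model is a dictionary with type and content keys, and that it would be registered in jupyter_notebook_config.py (shown only as a comment, since the config object c exists only inside that file). The sample model is made up.

```python
def scrub_output_pre_save(model, **kwargs):
    """Remove outputs and execution counts from code cells before saving."""
    # Only notebooks are touched; plain files and directories pass through.
    if model["type"] != "notebook":
        return
    # This sketch only understands the version 4 notebook layout.
    if model["content"]["nbformat"] != 4:
        return
    for cell in model["content"]["cells"]:
        if cell["cell_type"] != "code":
            continue
        cell["outputs"] = []
        cell["execution_count"] = None


# Assumed registration inside jupyter_notebook_config.py:
# c.FileContentsManager.pre_save_hook = scrub_output_pre_save

# Made-up model to exercise the hook outside of a running server.
sample_model = {
    "type": "notebook",
    "content": {
        "nbformat": 4,
        "cells": [
            {"cell_type": "markdown", "source": "notes"},
            {
                "cell_type": "code",
                "source": "1 + 1",
                "outputs": [{"output_type": "execute_result"}],
                "execution_count": 3,
            },
        ],
    },
}
scrub_output_pre_save(model=sample_model, path="example.ipynb")
scrubbed_cell = sample_model["content"]["cells"][1]
```

The hook mutates the model in place and returns nothing, so the notebook that reaches disk no longer carries output that someone else would have to trust.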

Custom Request Handlers

The notebook webserver can be interacted with using a well-defined RESTful API. You can define custom RESTful API handlers in addition to the ones provided by the notebook. The notebook webserver is written in Python, hence your server extension should be written in Python too.
