Reading About Various Python Sandboxing Options
I am finally getting around to implementing sandboxed Python. I am going to read about three options for sandboxing Python that I read about in the "Sandboxing Python and Linux Jailing" note.
Restricted Python
The idea behind Restricted Python
Python is a Turing-complete programming language. Offering a Python interface to users in a web context is a potential security risk. Web frameworks and Content Management Systems (CMS) want to offer their users as much extensibility as possible through the web (TTW). This also means granting permission to add functionality via a Python script. Additional preventative measures should be taken to ensure the integrity of the application and the server itself, according to information security best practice, and unrelated to RestrictedPython.
RestrictedPython defines a safe subset of the Python programming language. Defining a secure subset of the language involves restricting the EBNF elements and explicitly allowing or disallowing language features. Much of the power of a programming language derives from its standard and contributed libraries, so any calling of these methods must also be checked and potentially restricted. RestrictedPython generally disallows calls to any library that is not explicitly whitelisted. Any Python code that should be executed has to be explicitly checked before the generated byte code is executed by the interpreter. Python itself offers three methods that provide such a workflow:
- compile() which compiles the source code to byte code
- exec / exec() which executes the byte code in the interpreter
- eval / eval() which executes a byte code expression
Restricted Python offers a replacement for the Python builtin function compile(). This Python function is defined as:
compile(source, filename, mode [, flags [, dont_inherit]])
The definition of the compile() method has changed over time, but its relevant parameters source and mode still remain. There are three valid string values for mode: exec, eval, and single. For RestrictedPython this compile() method is replaced by:
RestrictedPython.compile_restricted(source, filename, mode [, flags [, dont_inherit]])
The primary parameter source has to be a string or ast.AST instance. Both methods return compiled byte code that the interpreter can execute, or raise exceptions if the provided source code is invalid. As compile and compile_restricted just compile the provided source code to byte code, this alone is not sufficient for a sandboxed environment, because calls to libraries are still available. The two methods / statements, exec / exec() and eval / eval(), take two parameters, globals and locals, which reference the Python builtins. By modifying and restricting the available modules, methods, and constants in globals and locals, we can limit the possible calls. RestrictedPython offers a way to define a policy which allows developers to protect access to attributes. This works by defining restricted versions of print, getattr, setattr, and import. Also, RestrictedPython provides three predefined, limited versions of Python's __builtins__: safe_builtins, limited_builtins (provides restricted sequence types), and utility_builtins (provides access to the standard modules math, random, string, and to sets).
Install Restricted Python
$ pip install RestrictedPython
Basic Usage
The general workflow to execute Python code that is loaded within a Python program is:
source_code = """
def do_something():
    pass
"""
byte_code = compile(source_code, filename='<inline code>', mode='exec')
exec(byte_code)
do_something()
With RestrictedPython, that workflow should be as straightforward as possible:
from RestrictedPython import compile_restricted
source_code = """
def do_something():
    pass
"""
byte_code = compile_restricted(
    source_code,
    filename='<inline code>',
    mode='exec'
)
exec(byte_code)
do_something()
Providing defined dictionaries for exec() should be used in the context of RestrictedPython. compile_restricted uses a predefined policy that checks and modifies the source code and checks it against a restricted subset of the Python language. The compiled source code is still executed against the fully available set of library modules and methods. Python's exec() takes three parameters: code, which is the compiled byte code; globals, which is the global dictionary; and locals, which is the local dictionary. By limiting the entries in the globals and locals dictionaries, you restrict access to the available library modules and methods. Typically there is a defined set of allowed modules, methods, and constants used in that context. RestrictedPython provides three predefined built-ins for that: safe_builtins, limited_builtins, and utility_builtins. So you end up using:
from RestrictedPython import compile_restricted
from RestrictedPython import safe_builtins
from RestrictedPython import limited_builtins
from RestrictedPython import utility_builtins
source_code = """
def do_something():
    pass
"""
try:
    byte_code = compile_restricted(
        source_code,
        filename='<inline code>',
        mode='exec'
    )
    exec(byte_code, {'__builtins__': safe_builtins}, None)
except SyntaxError as e:
    pass
A common advanced usage is to define your own restricted builtins dictionary. RestrictedPython requires some predefined names in globals in order to work properly.
To use classes in Python 3: __metaclass__ must be set; set it to type to use no custom metaclass. __name__ must also be set, as classes need a namespace to be defined in; it is the name of the module the class is defined in, and you may set it to an arbitrary string.
To use for statements and comprehensions: _getiter_ must point to an iter implementation. As an unguarded variant you might use RestrictedPython.Eval.default_guarded_getiter(). _iter_unpack_sequence_ must point to RestrictedPython.Guards.guarded_iter_unpack_sequence().
To use getattr: You have to provide an implementation for it. RestrictedPython.Guards.safer_getattr() can be a starting point.
Usage in frameworks and Zope
One major issue with using compile_restricted directly in a framework is that you have to use try-except statements to handle problems, and it might be a bit harder to provide useful information to the user. RestrictedPython provides four specialized compile_restricted methods: compile_restricted_exec, compile_restricted_eval, compile_restricted_single, and compile_restricted_function. These four methods return a named tuple (CompileResult) with four elements: code, a <code> object or None if errors is not empty; errors, a tuple of error messages; warnings, a list of warnings; and used_names, a dictionary mapping collected used names to True. These details can be used to inform the user about the compiled source code. Modifying the builtins is straightforward; it is just a dictionary containing the available library elements. Modification normally means removing elements from existing builtins or adding allowed elements by copying from globals. For frameworks, it could possibly also be used to change the handling of specific Python language elements.
Policies and Builtins
RestrictedPython provides a way to define policies by redefining restricted versions of print, getattr, setattr, import, etc. As shortcuts, it offers three stripped-down versions of Python's __builtins__:
Predefined builtins
- safe_builtins
- a safe set of builtin modules and functions
- limited_builtins
- restricted sequence types (range, list, and tuple)
- utility_builtins
- access to standard modules like math, random, string and set
safe_globals is a shortcut for { '__builtins__': safe_builtins }, as this is the way globals have to be provided to the exec function to actually restrict access to the builtins provided by Python.
Guards
RestrictedPython predefines several guarded access and manipulation methods:
- safer_getattr
- guarded_setattr
- guarded_delattr
- guarded_iter_unpack_sequence
- guarded_unpack_sequence
Those and additional methods rely on a helper construct, full_write_guard, which is intended to help implement immutable and semi-mutable objects and attributes.
Implementing a Policy
RestrictedPython only provides the raw material for restricted execution. To actually enforce any restrictions, you need to supply a policy implementation by providing restricted versions of print, getattr, setattr, import, etc. These restricted implementations are hooked up by providing a set of specially named objects in the global dict that you use for the execution of code. Specifically:
- _print_ is a callable object that returns a handler for print statements. This handler must have a write method that accepts a single string argument, and must return a string when called. RestrictedPython.PrintCollector.PrintCollector is a suitable implementation.
- _write_ is a guard function taking a single argument. If the object passed to it may be written to, it should be returned; otherwise the guard function should raise an exception. _write_ is typically called on an object before a setattr operation.
- _getattr_ and _getitem_ are guard functions, each of which takes two arguments. The first is the base object to be accessed, while the second is the attribute name or item index that will be read. The guard function should return the attribute or subitem, or raise an exception.
- __import__ is the normal Python import hook, and should be used to control access to Python packages and modules.
- __builtins__ is the normal Python builtins dictionary, which should be weeded down to a set that cannot be used to get around your restrictions. A usable "safe" set is RestrictedPython.Guards.safe_builtins.
PyPy
PyPy is a fast, compliant alternative implementation of the Python language, and a replacement for CPython. It is built using the RPython language that was co-developed with it. The main reason to use it instead of CPython is speed: it generally runs faster. PyPy implements Python 2 and Python 3. It supports all of the core language and most of the commonly used Python standard library modules. Main features:
- Speed: The main executable comes with a Just-In-Time compiler. It is very fast running most benchmarks - including very large and complicated Python applications, not just 10-liners. The case where PyPy works best is when executing long-running programs where a significant fraction of the time is spent executing Python code.
- Memory Usage: Memory-hungry Python programs (several hundreds of MBs or more) might end up taking less space than they do in CPython.
- Stackless: Support for Stackless and greenlets are now integrated in the normal PyPy.
- Other Features: other languages have been implemented that make use of the RPython toolchain: Prolog, Smalltalk, JavaScript, Io, Scheme, and Gameboy
- Sandboxing: PyPy's sandboxing is a working prototype for the idea of running untrusted user programs. Unlike other sandboxing approaches for Python, PyPy's does not try to limit language features considered "unsafe". Instead, PyPy replaces all calls to external libraries (C or platform) with a stub that communicates with an external process handling the policy.
To run the sandboxed process, you need to get the full sources and build pypy-sandbox from it (see Building from source). These instructions give you a pypy-c that you should rename to pypy-sandbox to avoid future confusion. Then run:
$ cd pypy/sandbox
$ pypy_interact.py path/to/pypy-sandbox
$ # don't confuse it with pypy/goal/pyinteractive.py!
This gets you a fully sandboxed interpreter, in its own filesystem hierarchy (try os.listdir('/')). For example, you would run an untrusted script as:
$ mkdir virtualtmp
$ cp untrusted.py virtualtmp/
$ pypy_interact.py --tmp=virtualtmp pypy-sandbox /tmp/untrusted.py
Goals and Architecture Overview
PyPy aims to provide a compliant, flexible and fast implementation of the Python language which uses the RPython toolchain to enable new advanced high-level features without having to encode the low-level details. This Python implementation is written in RPython as a relatively simple interpreter, in some respects easier to understand than CPython, the C reference implementation of Python. PyPy uses its high level and flexibility to quickly experiment with features or implementation techniques in ways that would, in a traditional approach, require pervasive changes to the source code. PyPy's Python Interpreter is written in RPython and implements the full Python language. This interpreter very closely emulates the behavior of CPython. It contains the following key components:
- a bytecode compiler responsible for producing Python code objects from the source code of a user application
- a bytecode evaluator responsible for interpreting Python code objects
- a standard object space, responsible for creating and manipulating the Python objects seen by the application
The bytecode compiler is the preprocessing phase that produces a compact bytecode format via a chain of flexible passes (tokenizer, lexer, parser, abstract syntax tree builder, bytecode generator). The bytecode evaluator interprets this bytecode. It does most of its work by delegating all actual manipulations of user objects to the object space. The latter can be thought of as the library of built-in types. It defines the implementation of the user objects, like integers and lists, as well as the operations between them, like addition or truth-value-testing.
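For intuition, the analogous compile-then-evaluate split in CPython can be inspected with the stdlib dis module, which shows the bytecode that the compiler phase produces:

```python
import dis

# compile() runs the tokenizer/parser/AST-builder/bytecode-generator chain
# and returns a code object.
code_obj = compile("a = 1 + 2", "<demo>", "exec")

# The evaluator later interprets instructions such as LOAD_CONST and STORE_NAME.
dis.dis(code_obj)
opnames = [instr.opname for instr in dis.get_instructions(code_obj)]
print(opnames)
```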
Layers
RPython
RPython is the language in which we write interpreters. Not the entire PyPy project is written in RPython, only the parts that are compiled in the translation process. The interesting point is that RPython has no parser; it is compiled from the live Python objects, which makes it possible to do all kinds of metaprogramming during import time. The RPython standard library can be found in the rlib subdirectory.
Translation
The translation toolchain is the part that takes care of translating RPython to flow graphs and then to C. There is more written about it in the architecture document. It lives in the rpython directory: flowspace, annotator, and rtyper.
PyPy Interpreter
This is in the pypy directory. pypy/interpreter is a standard interpreter for Python written in RPython. The fact that it is RPython is not apparent at first. Built-in modules are written in pypy/modules/*. Some modules that CPython implements in C are simply written in pure Python; they are in the top-level lib_pypy directory. The standard library of Python (with a few changes to accommodate PyPy) is in lib-python.
JIT Compiler
Just-in-Time Compiler (JIT): we have a tracing JIT that traces the interpreter written in RPython, rather than the user program that it interprets. As a result it applies to any interpreter, i.e. any language. But getting it to work correctly is not trivial: it requires a small number of precise "hints" and possibly some small refactorings of the interpreter. The JIT itself also has several almost independent parts: the tracer itself in rpython/jit/metainterp, the optimizer in rpython/jit/metainterp/optimizer that optimizes a list of residual operations, and the backend in rpython/jit/backend/<machine-name> that turns it into machine code.
Garbage Collectors
Garbage Collectors (GC): as you may notice if you are used to CPython's C code, there are no Py_INCREF/Py_DECREF equivalents in the RPython code. Garbage collection in RPython is inserted during translation.
Downloading and Installing PyPy
Just like CPython, you need a base interpreter environment and then can install extra packages. The choices for installing the base interpreter are:
- Use conda
- Use your distribution package manager
- Use Homebrew
- Use the builtin tarballs
- Build from source
Instructions for installing additional modules and for installing PyPy inside an env can be found on this page.
You cannot import any extension module in a sandboxed PyPy. Even the built-in modules available are very limited. Sandboxing in PyPy is a good proof of concept, but it currently requires some work from a motivated developer. Until then, it can only be used for "pure Python" examples: programs that import mostly nothing (or only pure Python modules, recursively).
Jupyter Notebook
- Jupyter Notebook Read the Docs
- Running a Notebook Server Read the Docs
- IPython Rich Display Capability
The notebook extends the console-based approach to interactive computing in a qualitatively new direction, providing a web-based application suitable for capturing the whole computation process: developing, documenting, and executing code, as well as communicating the results. The Jupyter Notebook combines two components:
- A web application: a browser-based tool for interactive authoring of documents which combine explanatory text, mathematics, computations and their rich media output
- Notebook documents: a representation of all content visible in the web application, including inputs and outputs of the computations, explanatory text, mathematics, images, and rich media representations of objects
Main Features of the Web Application
- In-browser editing for code, with automatic syntax highlighting, indentation, and tab completion/introspection
- The ability to execute code from the browser, with the results of computations attached to the code which generated them
- Displaying the result of computation using rich media representations
- In-browser editing for rich text using the Markdown markup language
- The ability to include mathematical notation within markdown cells using LaTeX
Notebook Documents
Notebook documents contain the inputs and outputs of an interactive session as well as additional text that accompanies the code but is not meant for execution. In this way, notebook files can serve as a complete computational record of a session, interleaving executable code with explanatory text, mathematics, and rich representations of resulting objects. These documents are internally JSON files and are saved with the .ipynb extension. Since JSON is a plain text format, they can be version-controlled and shared with colleagues. Notebooks may be exported to a range of static formats, including HTML via the nbconvert command. Any .ipynb notebook document available from a public URL can be shared via the Jupyter Notebook Viewer (nbviewer). This service loads the notebook document from the URL and renders it as a static web page. In effect, nbviewer is simply nbconvert as a web service.
You can start running a notebook server from the command line using the following command:
$ jupyter notebook
This will print some information about the notebook server in your console, and open a web browser to the URL of the web application (by default, http://127.0.0.1:8888). The landing page of the Jupyter notebook web application, the dashboard, shows the notebooks currently available in the notebook directory. When starting a notebook server from the command line, you can also open a particular notebook directly, bypassing the dashboard, with jupyter notebook my_notebook.ipynb.
An open notebook has exactly one interactive session connected to an IPython kernel, which will execute code sent by the user and communicate back results. This kernel remains active if the web browser window is closed, and reopening the same notebook from the dashboard will reconnect the web application to the same kernel. In the dashboard, notebooks with an active kernel have a Shutdown button next to them, whereas notebooks without an active kernel have a Delete button in its place.
A code cell allows you to edit and write new code, with full syntax highlighting and tab completion. By default, the language associated with a code cell is Python, but other languages, such as Julia and R, can be handled using cell magic commands. When a code cell is executed, the code that it contains is sent to the kernel associated with the notebook. The results returned from this computation are then displayed in the notebook as the cell's output. The output is not limited to text: many other forms of output are also possible, including matplotlib figures and HTML tables (as used by pandas). This is known as IPython's rich display capability.
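Rich output is driven by special representation methods on the objects a cell returns; here is a plain-Python sketch of the _repr_html_ hook that IPython's display machinery looks for (the RichTable class is made up for illustration):

```python
class RichTable:
    """Object carrying an HTML representation; in a notebook, displaying an
    instance would render the table instead of the plain repr."""

    def __init__(self, rows):
        self.rows = rows

    def _repr_html_(self):
        # Build a minimal HTML table from the rows.
        cells = ''.join(f'<tr><td>{row}</td></tr>' for row in self.rows)
        return f'<table>{cells}</table>'

table = RichTable(['alpha', 'beta'])
print(table._repr_html_())
```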
Raw cells provide a place in which you can write output directly. Raw cells are not evaluated by the notebook.
The normal workflow in a notebook is quite similar to a standard IPython session, with the difference that you can edit cells in-place multiple times until you obtain the desired results, rather than having to separate scripts with the %run magic command.
Running a Jupyter Notebook Server
The Jupyter notebook web application is based on a server-client structure. The notebook server uses a two-process kernel architecture based on ZeroMQ, as well as Tornado for serving HTTP requests. This document describes how you can secure a notebook server and how to run it on a public interface. This is not the multi-user server you are looking for; it should only be used by someone who wants remote access to their personal machine. You can protect your notebook server with a simple single password by configuring the NotebookApp.password setting in jupyter_notebook_config.py. You can create a Jupyter notebook config file with the following command:
$ jupyter notebook --generate-config
You can prepare a hashed password using the function notebook.auth.security.passwd(), and add that password to your jupyter_notebook_config.py. The default location for jupyter_notebook_config.py is your Jupyter folder in your home directory, ~/.jupyter.
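A stdlib sketch of the salted-hash format this produces; the exact scheme ('algorithm:salt:hexdigest', with the digest taken over the passphrase plus the salt) is an assumption about notebook.auth's format, and the function name is mine:

```python
import hashlib
import secrets

def hash_passphrase(passphrase, algorithm='sha1'):
    # Assumed scheme: 12-hex-char random salt, digest over passphrase + salt,
    # serialized as 'algorithm:salt:hexdigest'.
    salt = secrets.token_hex(6)
    h = hashlib.new(algorithm)
    h.update(passphrase.encode('utf-8') + salt.encode('ascii'))
    return ':'.join((algorithm, salt, h.hexdigest()))

print(hash_passphrase('s3cret'))
```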
When using a password, it is a good idea to also use SSL with a web certificate, so that your hashed password is not sent unencrypted by your browser.
Security in Jupyter Notebook server
Since access to the Jupyter notebook server means access to running arbitrary code, it is important to restrict access to the notebook server. For this reason, notebook 4.3 introduces token-based authentication that is on by default. When token authentication is enabled, the notebook uses a token to authenticate requests. As Jupyter notebooks become more popular for sharing and collaboration, the potential for malicious people to attempt to exploit the notebook for their nefarious purposes increases. IPython 2.0 introduced a security model to prevent execution of untrusted code without explicit user input. The whole point of Jupyter is arbitrary code execution; we have no desire to limit what can be done with a notebook, which would negatively impact its utility. Unlike other programs, a Jupyter notebook document includes output, and unlike other documents, that output exists in a context that can execute code (via JavaScript). The security problem we need to solve is that no code should ever execute just because a user has opened a notebook that they did not write. Like any other program, once a user decides to execute code in a notebook, it is considered trusted, and should be allowed to do anything.
"Our" Security Model
- Untrusted HTML is always sanitized
- Untrusted JavaScript is never executed
- HTML and JavaScript in Markdown cells are never trusted
- Outputs generated by the user are trusted
- Any other HTML or JavaScript is never trusted
- The central question of trust is "Did the current user do this?"
When a notebook is executed, a signature is computed from a digest of the notebook's contents plus a secret key. This is stored in a database, writeable only by the current user. By default, this is located at:
~/.local/share/jupyter/nbsignatures.db # Linux
~/Library/Jupyter/nbsignatures.db # OS X
%APPDATA%/jupyter/nbsignatures.db # Windows
Each signature represents a series of outputs which were produced by code the current user executed, and which are therefore trusted. When you open a notebook, the server computes its signature and checks whether it is in the database. If a match is found, HTML and JavaScript output in the notebook will be trusted at load; otherwise it will be untrusted. Any output generated during an interactive session is trusted. A notebook's trust is updated when the notebook is saved. Users can explicitly trust a notebook with the trust option: jupyter trust /path/to/notebook.ipynb.
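A minimal stdlib sketch of this signing scheme; the key and notebook content below are made-up placeholders, whereas the real server manages per-user keys and stores the signatures in nbsignatures.db:

```python
import hashlib
import hmac

secret_key = b'per-user-secret'    # placeholder; the server manages the real key
notebook_bytes = b'{"cells": []}'  # placeholder serialized notebook content

# Sign the notebook content with the secret key.
signature = hmac.new(secret_key, notebook_bytes, hashlib.sha256).hexdigest()

def is_trusted(content, stored_signature):
    """Recompute the signature and compare it against the stored one."""
    computed = hmac.new(secret_key, content, hashlib.sha256).hexdigest()
    return hmac.compare_digest(computed, stored_signature)

print(is_trusted(notebook_bytes, signature))     # True: content unchanged
print(is_trusted(b'{"cells": [1]}', signature))  # False: content changed
```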
Styling the notebook can only be done via either custom.css or CSS in HTML output. The latter only has an effect if the notebook is trusted; otherwise the output will be sanitized just like Markdown.
Configuring the Notebook Frontend
This document is a rough explanation on how you can persist some configuration options for the notebook JavaScript. The frontend configuration system works as follows:
- get a handle of a configurable JavaScript object
- access its configuration attribute
- update its configuration attribute with a JSON patch
The example below shows how to change the default setting indentUnit for CodeMirror Code Cells:
var cell = Jupyter.notebook.get_selected_cell();
var config = cell.config;
var patch = {
    CodeCell: {
        cm_config: {indentUnit: 2}
    }
};
config.update(patch);
You can enter the previous snippet in your browser’s JavaScript console once. Then reload the notebook page in your browser. Now, the preferred indent unit should be equal to two spaces. The custom setting persists and you do not need to reissue the patch on new notebooks.
Under the hood, Jupyter persists the preferred configuration settings in ~/.jupyter/nbconfig/<section>.json, with <section> taking various values (such as notebook, tree, and editor) depending on the page where the configuration is issued. A common section contains configuration settings shared by all pages.
Extending the Notebook
Certain subsystems of the notebook server are designed to be extended or overridden by users. These documents explain these systems, and how to override the notebook's defaults with custom behavior:
Contents API
The Jupyter Notebook web application provides a graphical interface for creating, opening, renaming, and deleting files in a virtual filesystem. The ContentsManager class defines an abstract API for translating these interactions to operations on a particular storage medium. The default implementation, FileContentsManager, uses the local filesystem of the server for storage and straightforwardly serializes notebooks into JSON. Users can override these behaviors by supplying custom subclasses of ContentsManager. This section describes the interface implemented by ContentsManager subclasses. We refer to this interface as the Contents API.
ContentsManager methods represent virtual filesystem entities as dictionaries, which we refer to as models. ContentsManager methods represent the locations of the filesystem resources as API-style paths. Such paths are interpreted as relative to the root directory of the notebook server. The default ContentsManager is designed for users running the notebook as an application on a personal computer.
File Save Hooks
You can configure functions that are run whenever a file is saved. There are two hooks available:
- ContentsManager.pre_save_hook runs on the API path and the model with content. This can be used for things like stripping output that people don't want to commit as noise to version control.
- FileContentsManager.post_save_hook runs on the filesystem path and model without content. This could be used to commit changes after every save.
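A sketch of a pre-save hook that clears code-cell outputs before every save; the function name is mine, and the model structure follows the notebook JSON format:

```python
def strip_output_pre_save(model, **kwargs):
    """Clear outputs of all code cells so saved notebooks stay diff-friendly."""
    if model.get('type') != 'notebook':
        return  # only act on notebook models
    for cell in model['content'].get('cells', []):
        if cell.get('cell_type') == 'code':
            cell['outputs'] = []
            cell['execution_count'] = None

# Hypothetical wiring in jupyter_notebook_config.py:
# c.ContentsManager.pre_save_hook = strip_output_pre_save
```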
Custom Request Handlers
The notebook webserver can be interacted with using a well-defined RESTful API. You can define custom RESTful API handlers in addition to the ones provided by the notebook. The notebook webserver is written in Python, hence your server extension should be written in Python too.