Sandboxing Python and Linux Jailing
I need to look into different ways to sandbox Python code on the server, for several reasons: to implement better versions of Jupyter Notebooks to HTML and TeX to HTML, and to execute user-submitted and AI-generated Python code.
Introduction
Developing a web service that accepts Python code from users and runs it on the server is not itself technically challenging. The challenges are ensuring that the web service can successfully run a user's code, and protecting the web service from that code.
Ensuring that a user's code runs properly is best handled by publishing information about the environment the code will run in, such as the Python version and the libraries available. This places the onus on the user to make sure that their code runs correctly, and it is the approach taken by GitHub Actions runners, Travis CI, and AppVeyor.
The most critical challenge is protecting the web service from the user's code.
- The code might have an infinite loop which consumes CPU resources
- It might be implemented with infinite recursion that devours the server's memory
- The code might maliciously try to write large files of junk, consuming available disk space.
RestrictedPython offers an approach to running untrusted code. Rather than creating a sandbox or secure environment, it uses customizable policies to determine a restricted subset of the Python language that can be executed. It is hosted on GitHub.
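A minimal sketch of the model, assuming the RestrictedPython package is installed and using its compile_restricted and safe_globals helpers (the snippet contents are my own):
from RestrictedPython import compile_restricted, safe_globals

# A benign snippet compiles and runs against the restricted builtins.
byte_code = compile_restricted("result = 3 * 14", filename="<user_code>", mode="exec")
namespace = {}
exec(byte_code, dict(safe_globals), namespace)
print(namespace["result"])  # 42

# Policy violations (here, access to a dunder attribute) are rejected at compile time.
try:
    compile_restricted("secret = ().__class__", filename="<user_code>", mode="exec")
except SyntaxError as err:
    print("rejected:", err)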
The PyPy Python implementation provides an alternative model for sandboxing Python. PyPy sandboxing offers sandboxing at a level comparable to that offered by operating systems. A trusted Python program spawns a subprocess which runs untrusted code using a sandboxed version of PyPy. This version of PyPy serializes all input / output to a standard input/output pipe. The trusted Python program then determines which I/O accesses are permitted or not. Controls on the amount of CPU time and RAM that the untrusted code can consume can also be imposed.
There are various Linux OS features that can be used to control the execution of processes, to restrict the files and other I/O resources a process has access to, and to put constraints on their CPU and memory consumption. These could be used to control the execution of code submitted by the users:
chroot
- Can be used to restrict the parts of the file system that can be accessed by a non-root process and any sub-processes that it spawns. The process cannot access files outwith this restricted file system, which is termed a chroot jail.
- This might be good for the to-HTML converters, since I need the user to upload a .zip file for them to work correctly.
ulimit
- Command which can put limits on the CPU, number of processes, number of open files, and memory available to a user.
- RedHat: How To Set ulimit values
seccomp
- Kernel facility that can isolate a process from a system's resources, allowing it to only access open file descriptors and to exit. If the process attempts any other system calls (e.g. to open another file) it is killed. This is used by Docker.
seccomp-bpf
- Provides added flexibility to seccomp. It allows filter programs to be written which determine which system calls should be available and to which processes.
AppArmor
- Kernel security module that allows for access control to network, socket and file resources to be configured and enforced for specific programs.
SELinux
- Kernel security module for access control, which performs a similar function to AppArmor, though with richer, more complex configuration.
Docker
Docker exploits Linux kernel resource (CPU, memory, block I/O, network) isolation and virtualization features to allow independent "containers" to run on a single Linux server. Each Docker container offers a basic version of Linux. It is possible to set up a web service that receives code from a user, starts up a Docker container loaded with an image that can run the code, runs the user's code, retrieves the outputs, returns these to the user, and shuts down the container. An example of this model in practice was Remote Interview, which used Docker to provide a service allowing job interview candidates to compile and run source code via a web service. Their framework, a mixture of JavaScript and shell scripts, was released on GitHub as the open-source framework CompileBox.
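As a sketch of that request/response loop driven from Python (the image tag, limits, and paths here are illustrative assumptions, not whatever CompileBox actually used):
import pathlib
import subprocess
import tempfile

def run_untrusted(code: str, timeout: int = 10) -> str:
    """Run user-submitted code in a throwaway container and return its stdout."""
    with tempfile.TemporaryDirectory() as workdir:
        pathlib.Path(workdir, "main.py").write_text(code)
        proc = subprocess.run(
            ["docker", "run", "--rm",          # delete the container on exit
             "--network", "none",              # no network access
             "--memory", "256m",               # cap RAM
             "--cpus", "0.5",                  # cap CPU
             "--pids-limit", "64",             # guard against fork bombs
             "--read-only",                    # read-only root file system
             "--tmpfs", "/tmp",                # but allow scratch space in /tmp
             "-v", f"{workdir}:/code:ro",      # mount the user's code read-only
             "python:3.12-slim",               # any image matching the advertised environment
             "python", "/code/main.py"],
            capture_output=True, text=True, timeout=timeout,
        )
        return proc.stdout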
Jupyter Notebooks
There are some ways in which Jupyter Notebooks can be used to isolate code.
- Interactive Notebooks: Sharing the Code
- How did we serve more than 20,000 IPython Notebooks for Nature Readers?
tmpnb
tmpnb is a temporary notebook service, which allows a deployer to configure the CPU quota and memory limits for each Docker container and how long to wait before closing a notebook down if it is idle.
Multi-server Architecture
Adopting an architecture whereby the web service runs on one server (either a physical or virtual machine) and each user's code is executed on a separate server greatly reduces, or even removes altogether, the risk that a specific user's code affects either the running of the web service or other users' code.
systemd-nspawn can be used to run a command or operating system in a light-weight container. systemd itself also supports IP accounting and access lists to manage outgoing network access.
chroot
chroot is an operation on Unix and Unix-like operating systems that changes the apparent root directory for the current running process and its children. A program that is run in such a modified environment cannot name (and therefore normally cannot access) files outside the designated directory tree. The modified environment is called a chroot jail.
An early use of the term "jail" as applied to chroot appeared in 1991. To make the concept useful for virtualization, FreeBSD expanded it and, in its 4.0 release in 2000, introduced the jail command.
In 2008, LXC (upon which Docker was later built) adopted the "container" terminology; containers then gained wider popularity in 2013.
Uses
A chroot environment can be used to create and host a separate virtualized copy of the software system. This can be useful for:
- Testing and Development
- A test environment can be set up in the chroot for software that would otherwise be too risky to deploy on a production system
- Dependency Control
- Software can be developed, built and tested in a chroot populated with its expected dependencies. This can prevent some kinds of linkage skew that can result from developers building products with different sets of program libraries installed.
- Compatibility
- Legacy software must sometimes be run in a chroot because their supporting libraries or data files may otherwise clash in name or linkage with those of the host system.
- Recovery
- Should a system be rendered unbootable, a chroot can be used to move back into the damaged environment after bootstrapping from an alternate root file system.
- Privilege Separation
- Programs are allowed to carry open file descriptors (for files, pipelines and network connections) into the chroot, which can simplify jail design by making it unnecessary to leave working files inside the chroot directory.
Limitations
The chroot mechanism is not intended to defend against intentional tampering by privileged (root) users. chrooted programs should relinquish root privileges as soon as practical after chrooting, or other mechanisms - such as FreeBSD jails - should be used instead. The chroot mechanism is not intended by itself to be used to block low-level access to system devices by privileged users. It is not intended to restrict the use of resources like I/O, bandwidth, disk space, or CPU time. Tools like Jailkit help ease and automate the jailing process. Only the root user can perform a chroot.
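A minimal sketch of that advice in Python, assuming a prepared jail tree at the placeholder path /srv/jail and an unprivileged nobody account:
import os
import pwd

os.chroot("/srv/jail")   # requires root; /srv/jail is a placeholder jail tree
os.chdir("/")            # make sure the working directory is inside the new root

# Relinquish root privileges as soon as practical after chrooting.
nobody = pwd.getpwnam("nobody")
os.setgroups([])
os.setgid(nobody.pw_gid)
os.setuid(nobody.pw_uid)

# From here the process sees /srv/jail as / and can no longer regain root.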
Linux Host Kernel Virtual File System and Configuration Files
To have a functional chroot environment in Linux, the kernel virtual file systems and configuration files also have to be mounted / copied from host to chroot.
$ # Mount Kernel Virtual File Systems
$ TARGETDIR="/mnt/chroot"
$ mount -t proc proc $TARGETDIR/proc
$ mount -t sysfs sysfs $TARGETDIR/sys
$ mount -t devtmpfs devtmpfs $TARGETDIR/dev
$ mount -t tmpfs tmpfs $TARGETDIR/dev/shm
$ mount -t devpts devpts $TARGETDIR/dev/pts
$
$ # Copy /etc/hosts
$ /bin/cp -f /etc/hosts $TARGETDIR/etc/
$
$ # Copy /etc/resolv.conf
$ /bin/cp -f /etc/resolv.conf $TARGETDIR/etc/resolv.conf
$
$ # Link /etc/mtab
$ chroot $TARGETDIR rm /etc/mtab 2> /dev/null
$ chroot $TARGETDIR ln -s /proc/mounts /etc/mtab
Virtualization
In computing, virtualization (sometimes abbreviated v12n) is a series of technologies that allows the division of physical computing resources into a series of virtual machines, operating systems, processes, or containers.
In hardware virtualization, the host machine is the physical machine on which the virtualization takes place, and the guest machine is the virtual machine. The words host and guest are used to distinguish the software that runs on the physical machine from the software that runs on the virtual machine. The software or firmware that creates a virtual machine on the host hardware is called a hypervisor or virtual machine monitor.
Operating-system-level virtualization, also known as containerization, refers to an operating system feature in which the kernel allows the existence of multiple isolated user-space instances.
Containerization
In software engineering, containerization is operating-system-level virtualization or application-level virtualization over multiple network resources so that software applications can run in isolated user spaces called containers in any cloud or non-cloud environment, regardless of type or vendor.
Each container is basically a fully functional and portable cloud or non-cloud computing environment surrounding the application and keeping it independent of other environments running in parallel. Individually, each container simulates a different software application and runs isolated processes by bundling related configuration files, libraries, and dependencies. Containerization has been widely adopted on cloud computing platforms like AWS, Azure, and Google Cloud, and it is used by the US Department of Defense as a way of more rapidly developing and deploying software updates.
Container orchestration or container management is mostly used in the context of application containers. Implementations providing such orchestration include Kubernetes and Docker Swarm.
FreeBSD
FreeBSD is a free and open-source Unix-like operating system descended from the Berkeley Software Distribution (BSD).
FreeBSD maintains a complete system, delivering a kernel, device drivers, userland utilities, and documentation, as opposed to Linux only delivering a kernel and drivers, and relying on third-parties such as GNU for system software.
FreeBSD Jail
The jail mechanism is an implementation of FreeBSD's OS-level virtualization that allows system administrators to partition a FreeBSD-derived computer system into several independent mini-systems called jails, all sharing the same kernel, with very little overhead. FreeBSD jails mainly aim at three goals:
- Virtualization: Each jail is a virtual environment running on the host machine with its own files, processes, user and superuser accounts. From the jailed process, the environment is almost indistinguishable from a real system.
- Security: Each jail is sealed from the others, thus providing an additional level of security.
- Ease of Use: The limited scope of a jail allows system administrators to delegate several tasks which require superuser access without handing out complete control over the system.
Unlike the chroot jail, which only restricts processes to a particular view of the filesystem, the FreeBSD jail mechanism restricts the activities of a process in a jail with respect to the rest of the system. In effect, jailed processes are sandboxed.
LXC
Linux Containers (LXC) is an operating system level virtualization method for running multiple isolated Linux systems (containers) on a control host using a single Linux kernel.
The Linux kernel provides the cgroups functionality that allows limitation and prioritization of resources (CPU, memory, block I/O, network, etc.) without the need for starting any virtual machines, and also the namespace isolation functionality that allows complete isolation of an application's view of the operating environment, including process trees, networking, user IDs, and mounted file systems.
cgroups
cgroups (abbreviated from control groups) is a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, etc.) of a collection of processes.
cgroups was developed by engineers at Google and was merged into the mainline Linux kernel in 2008.
Features
- Resource Limiting: groups can be set to not exceed a configured memory limit, which also includes the file system cache, I/O bandwidth limit, CPU quota limit, CPU set limit, or maximum open files
- Prioritization: some groups may get a larger share of CPU utilization or disk I/O throughput
- Accounting: measures a group's resource usage, which may be used, for example, for billing purposes
- Control: freezing groups of processes, their checkpointing and restarting
A control group is a collection of processes that are bound by the same criteria and associated with a set of parameters or limits. These groups can be hierarchical, meaning that each group inherits limits from its parent group. The kernel provides access to multiple controllers (also called subsystems) through the cgroup interface. Control groups can be used in multiple ways:
- By accessing the cgroup virtual file system manually
- By creating and managing groups on the fly using tools like cgcreate, cgexec, and cgclassify (from libcgroup)
- Through the "rules engine daemon" that can automatically move processes of certain users, groups, or commands to cgroups as specified
- Indirectly through other software that uses cgroups, such as Docker, LXC, systemd
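As an illustration of the first route, a sketch in Python, assuming a cgroup v2 unified hierarchy mounted at /sys/fs/cgroup, root privileges, and the relevant controllers enabled in the parent's cgroup.subtree_control (the group name pysandbox is made up):
import os

cg = "/sys/fs/cgroup/pysandbox"
os.makedirs(cg, exist_ok=True)       # creating a directory creates the group

with open(os.path.join(cg, "memory.max"), "w") as f:
    f.write(str(256 * 1024 * 1024))  # 256 MiB memory ceiling
with open(os.path.join(cg, "cpu.max"), "w") as f:
    f.write("50000 100000")          # 50 ms of CPU time per 100 ms period
with open(os.path.join(cg, "pids.max"), "w") as f:
    f.write("64")                    # guard against fork bombs

# Move a process into the group by writing its PID to cgroup.procs.
with open(os.path.join(cg, "cgroup.procs"), "w") as f:
    f.write(str(os.getpid()))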
Redesign
- Namespace isolation
- A related feature (to cgroups) of the Linux kernel is namespace isolation, where groups of processes are separated such that they cannot "see" resources in other groups.
- Unified hierarchy
- Kernel Memory control groups
- cgroup awareness of OOM killer
Docker
Docker is a set of platform as a service (PaaS) products that use OS-level virtualization to deliver software in packages called containers.
Docker is a tool that is used to automate the deployment of applications in lightweight containers so that applications can work efficiently in different environments in isolation.
Containers are isolated from one another and bundle their own software, libraries, and configuration files; they can communicate with each other through well-defined channels. Containers use fewer resources than virtual machines because they share a single OS kernel. When running on Linux, Docker uses the resource isolation features of the Linux kernel and a union-capable file system to allow containers to run on a single Linux instance. Because Docker containers are lightweight, a single server or virtual machine can run several containers simultaneously.
Components
- Software
- The Docker daemon, called dockerd, is a persistent process that manages Docker containers and handles container objects. The daemon listens for requests sent via the Docker Engine API. The Docker client program, called docker, provides a command-line interface that allows users to interact with Docker daemons.
- Objects
- Docker objects are various entities used to assemble an application in Docker. The main classes of Docker objects are images, containers, and services.
- A Docker container is a standardized, encapsulated environment that runs applications. A container is managed using the Docker API or CLI. It is a process created from an image.
- A Docker image is a read-only template used to build containers. Images are used to store and ship applications; in the same analogy, an image is to a container as a program is to a process.
- A Docker service allows containers to be scaled across multiple Docker daemons. The result is known as a swarm, a set of cooperating daemons that communicate through the Docker API.
- Registries
- A Docker registry is a repository for Docker images. Docker clients connect to registries to download ("pull") images for use or upload ("push") images that they have built. Registries can be public or private. The main public registry is Docker Hub.
# Example Dockerfile
ARG CODE_VERSION=latest
FROM ubuntu:${CODE_VERSION}
COPY ./examplefile.txt /examplefile.txt
ENV MY_ENV_VARIABLE="example_value"
RUN apt-get update
# Mount a directory from the Docker volume
# Note: This is usually specified in the 'docker run' command.
VOLUME ["/myvolume"]
# Expose a port (22 for SSH)
EXPOSE 22
Tools
- Docker Compose
- a tool for defining and running multi-container Docker applications. It uses YAML files to configure the application's services and performs the creation and start-up process of all the containers with a single command. The docker-compose CLI utility allows users to run commands on multiple containers at once; for example, building images, scaling containers, running containers that were stopped, and more.
- Docker Swarm
- provides native clustering functionality for Docker containers, which turns a group of Docker engines into a single virtual Docker engine.
File Descriptors
In Unix and Unix-like computer operating systems, a file descriptor (FD, less frequently fildes) is a process-unique identifier (handle) for a file or other input/output resource, such as a pipe or network socket.
File descriptors typically have non-negative integer values, with negative values being reserved to indicate "no value" or error conditions.
In the traditional implementation of Unix, file descriptors index into a per-process file descriptor table maintained by the kernel, that in turn indexes into a system-wide table of files opened by all processes, called the file table. This table records the mode with which the file (or other resource) has been opened: for reading, writing, appending, and possibly other modes.
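A small illustration using only the Python standard library (the path /etc/hostname is just a convenient file to open):
import os

fd = os.open("/etc/hostname", os.O_RDONLY)   # returns a small non-negative integer
print(fd)                                    # typically 3, since 0-2 are stdin/stdout/stderr
data = os.read(fd, 64)                       # I/O goes through the descriptor, not the path
os.close(fd)                                 # releases the per-process table entry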
ulimit
User limits restrict the use of system-wide resources. ulimit provides control over the resources available to the shell and to processes started by it, on systems that allow such control. The soft limit is the value that the kernel enforces for the corresponding resource; the hard limit acts as a ceiling for the soft limit.
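Python exposes the same limits through the standard-library resource module; a sketch of capping a child process before it runs untrusted code (untrusted.py is a placeholder):
import resource
import subprocess
import sys

def limit_resources():
    # Set (soft, hard) limits in the child just before it execs (POSIX only).
    resource.setrlimit(resource.RLIMIT_CPU, (5, 5))                      # CPU seconds
    resource.setrlimit(resource.RLIMIT_AS, (512 * 2**20, 512 * 2**20))   # address space, bytes
    resource.setrlimit(resource.RLIMIT_NOFILE, (32, 32))                 # open file descriptors

subprocess.run([sys.executable, "untrusted.py"], preexec_fn=limit_resources)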
seccomp / seccomp-bpf
seccomp (short for secure computing) is a computer security facility in the Linux kernel. seccomp allows a process to make a one-way transition into a "secure" state where it cannot make any system calls except exit(), sigreturn(), read(), and write() to already-open file descriptors. Should it attempt any other system calls, the kernel will either just log the event or terminate the process with SIGKILL or SIGSYS.
seccomp-bpf is an extension to seccomp that allows filtering of system calls using a configurable policy implemented using Berkeley Packet Filter rules. It is used by OpenSSH and the Google Chrome/Chromium web browsers on ChromeOS and Linux.
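Strict mode can be entered from Python through prctl(2) via ctypes. A minimal sketch, assuming x86-64 Linux (the raw exit syscall number is architecture-specific); note that CPython itself may issue forbidden syscalls such as mmap while allocating memory, in which case the kernel kills the child:
import ctypes
import os

PR_SET_SECCOMP = 22       # constants from <linux/prctl.h> and <linux/seccomp.h>
SECCOMP_MODE_STRICT = 1
SYS_exit = 60             # raw exit(2) syscall number on x86-64

libc = ctypes.CDLL(None, use_errno=True)

pid = os.fork()
if pid == 0:
    # Child: enter strict seccomp; only read, write, exit, and sigreturn remain.
    if libc.prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) != 0:
        os._exit(1)
    os.write(1, b"hello from inside seccomp\n")   # write(2) is still allowed
    # os._exit() would call exit_group(2), which strict mode forbids, so make
    # the plain exit(2) syscall directly. Any other syscall gets SIGKILL.
    libc.syscall(SYS_exit, 0)
else:
    os.waitpid(pid, 0)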
AppArmor
AppArmor is an effective and easy-to-use Linux application security system. AppArmor proactively protects the operating system and applications from external or internal threats, even zero-day attacks, by enforcing good behavior and preventing both known and unknown application flaws from being exploited.
Many Linux distributions ship with AppArmor. Run aa-status to see if your Linux distribution already has AppArmor integrated.
$ aa-status
apparmor module is loaded.
SELinux
Security-Enhanced Linux (SELinux) is a Linux security module that provides a mechanism for supporting access control security policies, including mandatory access controls.
SELinux is a set of kernel modifications and user-space tools that have been added to various Linux distributions. Its architecture strives to separate enforcement of security decisions from the security policy, and streamlines the amount of software involved with security policy enforcement. The key concepts underlying SELinux can be traced to several earlier projects by the United States National Security Agency (NSA).
systemd-nspawn
$ systemd-nspawn [OPTIONS...] [COMMAND [ARGS...] ]
$ systemd-nspawn --boot [OPTIONS...] [ARGS...]
systemd-nspawn may be used to run a command or OS in a light-weight namespace container. In many ways, it is similar to chroot, but more powerful, since it fully virtualizes the file system hierarchy, as well as the process tree, the various IPC subsystems, and the host and domain name.
It may be invoked on any directory tree containing an operating system tree, using the --directory= command-line option.
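A sketch of driving it from Python with subprocess; the paths are placeholders:
import subprocess

subprocess.run([
    "systemd-nspawn",
    "--directory=/srv/containers/sandbox",   # placeholder OS tree (built with e.g. debootstrap)
    "--private-network",                     # loopback only, no outgoing network access
    "python3", "/untrusted/main.py",         # placeholder path inside the container
], timeout=60)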