Reading the ffmpeg Docs

I want to learn more about the capabilities of ffmpeg. Similar to why I am reading the ImageMagick docs, I want to implement a Lambda Layer that has ffmpeg so that I can manipulate video and audio files using a Lambda function.

Date Created:
Last Edited:

Resources



Definitions


  • Rescaling
    • The process of changing the video size. Several rescaling options and algorithms are available. This is usually a lossy process.
  • Pixel Format Conversion
    • The process of converting the image format and colorspace of the image, for example from planar YUV420P to RGB24 packed. It also handles packing conversion, that is, converting from packed layout (all pixels belonging to distinct planes interleaved in the same buffer) to planar layout (all samples belonging to the same plane stored in a dedicated buffer or "plane").
  • Resampling
    • The process of changing the audio rate, for example from a high sample rate of 44100Hz to 8000Hz. Audio conversion from high to low sample rate is a lossy process. Several resampling options and algorithms are available.
  • Format Conversion
    • Process of converting the type of samples, for example from 16-bit signed samples to unsigned 8-bit or float samples. It also handles packing conversion, when passing from packed layout (all samples belonging to distinct channels interleaved in the same buffer), to planar layout (all samples belonging to the same channel stored in a dedicated buffer or "plane")
  • Rematrixing
    • The process of changing the channel layout, for example from stereo to mono. When the input channels cannot be mapped to the output streams, the process is lossy, since it involves different gain factors and mixing.
  • Multiplexing (Muxing)
    • Video processing technique that combines multiple audio, video, caption, and metadata streams into a single container file. The container file can be in various formats, such as mp4, avi, mkv, or webm, and acts as a digital suitcase that keeps all parts of the media data synchronized for playback.
  • Demultiplexing (Demuxing)
    • The process of making available for separate processing the components of an audio/video transport stream that was combined during the muxing process.
  • Video Coding Format
    • Content representation format (encoded format for converting a specific type of data to displayable information) of digital video content. It typically uses a standardized video compression algorithm.
  • Codec
    • A codec is a device or computer program that encodes or decodes a data stream or signal. It is a blend of a coder and a decoder.
  • Video Codec
    • A video codec is software or hardware that compresses and decompresses digital video. In the context of video compression, codec is a blend of encoder and decoder, while a device that only compresses is typically called an encoder, and one that only decompresses is called a decoder.
  • Encoder
    • An encoder in digital electronics is a one-hot to binary converter. If there are 2^n input lines, and at most one of them will ever be high, the binary code of this 'hot' line is produced on the n-bit output lines.
  • Audio Codec
    • An audio codec is a device or computer program capable of encoding or decoding a digital data stream (a codec) that encodes or decodes audio. It implements an algorithm that compresses or decompresses digital audio data according to a given audio file or streaming media audio coding format.
  • Multimedia container format
    • File used to identify and interleave different data types. Simpler container formats can contain different types of audio formats, while more advanced container formats can contain multiple audio and video streams, subtitles, chapter information, and meta-data - along with synchronization information needed to play back the various streams together. In most cases, most of the metadata and the synchro chunks are specified by the container format.
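To make a couple of the audio definitions above concrete, here is a hedged sketch of a command (the file names are my own hypothetical examples) that performs both resampling and rematrixing via the ffmpeg CLI:

# Resample the audio to 8000 Hz (-ar) and downmix stereo to mono (-ac); both steps are lossy
ffmpeg -i input.wav -ar 8000 -ac 1 output.wav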

About ffmpeg


ffmpeg is the leading multimedia framework, able to decode, encode, transcode, mux, demux, stream, filter, and play pretty much anything that humans and machines have created. It supports everything from the most obscure ancient formats up to the cutting edge. It is highly portable: ffmpeg compiles, runs, and passes its testing infrastructure across Linux, macOS, Windows, and other build environments.

It contains libavcodec, libavutil, libavformat, libavfilter, libavdevice, libswscale, and libswresample, which can be used by applications, as well as ffmpeg, ffplay, and ffprobe, which can be used by end users for transcoding and playing.


FFmpeg Tools


ffmpeg


Synopsis
ffmpeg [global_options] {[input_file_options] -i input_url}...{[output_file_options] output_url}...
Description

ffmpeg is a universal media converter. It can read a variety of inputs - including live grabbing/recording devices - filter, and transcode them into a plethora of output formats.

It reads from an arbitrary number of input "files" (which can be regular files, pipes, network streams, grabbing devices, etc.) specified by the -i option, and writes to an arbitrary number of output "files", which are specified by a plain output URL. Anything found on the command line which cannot be interpreted as an option is considered an output URL.

Each input / output can, in principle, contain any number of streams/types (video/audio/subtitle/attachment/data). The allowed number / type of streams may be limited by the container format. Selecting which stream from which inputs will go into which output is either done automatically or with the -map option.

To refer to input files, you use indices (0-based). Each stream within the file is referred to by its index as well (0-based).

As a general rule, options are applied to the next specified file. Therefore, order is important. Exceptions are global options.

Do not mix input and output files: first specify all input files, then all output files. Also, do not mix options that belong to different files; options are reset between files.

Examples:

# Convert an input media file to a different format, by re-encoding media streams
ffmpeg -i input.avi output.mp4
# Set the video bitrate of the output file to 64 kbit/s
ffmpeg -i input.avi -b:v 64k -bufsize 64k output.mp4
# Force the frame rate of the output file to 24 fps
ffmpeg -i input.avi -r 24 output.mp4
# Force the frame rate of the input file (valid for raw formats only) to 1 fps and the frame rate of the output file to 24 fps
ffmpeg -r 1 -i input.m2v -r 24 output.mp4
Detailed Description

The transcoding process in ffmpeg for each output can be described as follows:

ffmpeg calls the libavformat library (containing demuxers) to read input files and get packets containing encoded data from them. When there are multiple input files, ffmpeg tries to keep them synchronized by tracking the lowest timestamp on any active input stream.

Encoded packets are then sent to the decoder. The decoder produces uncompressed frames which can be processed further by filtering. After filtering, the frames are passed to the encoder, which encodes them and outputs encoded packets. Finally, those are passed to the muxer, which writes the encoded packets to the output file.
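Putting those steps together, the per-output pipeline described above is roughly:

input file -> demuxer -> encoded packets -> decoder -> decoded frames -> filtering -> encoder -> encoded packets -> muxer -> output file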

Filtering

Before encoding, ffmpeg can process raw audio and video frames using filters from the libavfilter library. Several chained filters form a filtergraph.

Simple Filtergraphs

Simple filtergraphs are those that have exactly one input and output, both of the same type.

Simple filtergraphs are configured with the per-stream -filter option (with -vf and -af aliases for video and audio, respectively). A simple filtergraph for video can look like the example below.
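As a minimal sketch (input.mp4 and output.mp4 are hypothetical file names), a simple video filtergraph that rescales the stream can be applied with the per-stream video filter option:

# Apply a simple filtergraph (a single scale filter) to rescale the video to 1280x720
ffmpeg -i input.mp4 -vf "scale=1280:720" output.mp4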

Complex Filtergraphs

Complex filtergraphs are those which cannot be described as a simple linear processing chain applied to one stream. That is the case, for example, when the graph has more than one input and/or output, or when the output stream type is different from the input.
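A hedged example (the file names are assumptions): overlaying an image onto a video takes two inputs and produces one video output, so it requires a complex filtergraph via -filter_complex rather than -vf:

# Overlay logo.png on top of the video, 10 pixels from the top-left corner
ffmpeg -i input.mp4 -i logo.png -filter_complex "overlay=10:10" output.mp4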

Stream Copy

Stream copy mode is a mode selected by supplying the copy parameter to the -codec option. It makes ffmpeg omit the decoding and encoding step for the specified stream, so it does only demuxing and muxing. It is useful for changing the container format or modifying the container-level metadata.
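For example (file names assumed), remuxing from AVI to Matroska without touching the encoded streams:

# Change only the container format; no decoding or re-encoding takes place
ffmpeg -i input.avi -c copy output.mkv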

Stream Selection

ffmpeg provides the -map option for manual control of stream selection in each output file. Users can skip the -map and let ffmpeg perform automatic stream selection. The -vn / -an / -sn / -dn options can be used to skip the inclusion of video, audio, subtitle, and data streams respectively, whether manually mapped or automatically selected, except for those streams which are outputs of the complex filtergraphs.

In the absence of any map options for a particular output, ffmpeg inspects the output format to check which types of streams can be included in it, viz. video, audio, and/or subtitles. For each acceptable stream type, ffmpeg will pick one stream from all the inputs based on the following criteria (a -map example follows this list):

  • for video, it is the stream with the highest resolution
  • for audio, it is the stream with the most channels
  • for subtitles, it is the first subtitle stream found but there's a caveat. The output format's default subtitle encoder can be either text-based or image-based, and only a subtitle stream of the same type will be chosen.
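A sketch of manual stream selection with -map (the file names are assumptions): take the video from the first input and the audio from the second:

# 0:v selects the video stream(s) of input 0, 1:a selects the audio stream(s) of input 1
ffmpeg -i A.avi -i B.mp4 -map 0:v -map 1:a output.mkv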
Options

All the numerical options accept a string representing a number as input, which may be followed by one of the SI unit prefixes, for example: 'K', 'M', or 'G'.

If 'i' is appended to the SI unit prefix, the complete prefix will be interpreted as a unit prefix for binary multiples; for example, '1K' means 1000 while '1Ki' means 1024.

Options which do not take arguments are Boolean options, and set the corresponding value to true. They can be set to false by prefixing the option name with "no". For example, -nofoo will set the Boolean option with name -foo to false.
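For instance, -stats (printing of encoding progress/statistics) is a Boolean option that is on by default, so prefixing it with "no" turns it off (file names are assumptions):

# Disable the default progress/statistics output during encoding
ffmpeg -nostats -i input.avi output.mp4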

Options that take arguments support a special syntax where the argument given on the command line is interpreted as a path to the file from which the actual argument value is loaded. To use this feature, add a forward slash before the option name:

ffmpeg -i INPUT -/filter:v filter.script OUTPUT # will load a filtergraph description from the file named filter.script

Some options are applied per-stream, like bitrate and codec. Stream specifiers are used to precisely specify which stream(s) a given option belongs to.

A stream specifier is a string generally appended to the option name and separated from it by a colon. E.g., -codec:a:1 ac3 contains the a:1 stream specifier, which matches the second audio stream. It selects the ac3 codec for the second audio stream.

A stream specifier can match several streams, so that the option is applied to them all. E.g., the stream specifier in -b:a 128k matches all audio streams.

An empty stream specifier matches all streams. For example, -codec copy or -codec: copy would copy all the streams without reencoding. Possible forms of stream specifiers:

  • stream_index
    • Matches the stream with this index. E.g., -threads:1 4 would set the thread count for the second stream to 4.
  • stream_type[:additional_stream_specifier]
    • stream_type is one of the following: 'v' or 'V' for video, 'a' for audio, 's' for subtitle, 'd' for data, 't' for attachments. 'V' only matches video streams which are not attached pictures, video thumbnails, or cover art.
  • g:group_specifier[:additional_stream_specifier]
    • Matches streams which are in the group with the specifier group_specifier.
  • p:program_id[:additional_stream_specifier]
    • Matches streams which are in the program with the id program_id.
  • #stream_id or i:stream_id
    • Matches the stream by its format-specific ID (e.g., the PID in MPEG-TS containers).
  • m:key[:value]
    • Matches streams with the metadata tag key having the specified value. If value is not given, it matches streams that contain the given tag with any value.
  • u
    • Matches streams with usable configuration: the codec must be defined and essential information such as the video dimensions or audio sample rate must be present.
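Putting stream specifiers together, a hedged sketch (the file names and codec choices are my assumptions, not from the docs):

# Copy the video stream, encode the first audio stream as AAC at 128 kbit/s, and drop subtitles
ffmpeg -i input.mkv -c:v copy -c:a:0 aac -b:a:0 128k -sn output.mp4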
I'm not going to go through all the options (there are many). I will look at specific options when I need them.

ffplay


Synopsis
ffplay [options] [input_url]
Description

FFplay is a very simple and portable media player using FFmpeg libraries and the SDL library. It is mostly used as a testbed for the various FFmpeg APIs.

I don't really need to use this, so I'm skipping it.

ffprobe


Synopsis
ffprobe [options] input_url
Description
  • ffprobe gathers information from multimedia streams and prints it in human-readable fashion.
  • For example, it can be used to check the format of the container used by a multimedia stream and the format and type of each media stream contained in it.
  • ffprobe will return a positive exit code if the input URL is not specified or if the file cannot be opened.
  • If no output is specified, ffprobe will write to STDOUT.
  • Options are used to list some of the formats supported by ffprobe or for specifying which information to display, and for setting how ffprobe will show it.
  • ffprobe output is designed to be easily parsable by a textual filter, and consists of one or more sections of a form defined by the selected writer, which is specified by the output_format option.
  • Sections may contain other nested sections and are identified by a name (which may be shared by other sections) and a unique name.
  • Metadata tags stored in the container or in the streams are recognized and printed in the corresponding format: "FORMAT", "STREAM", "STREAM_GROUP_STREAM", or "PROGRAM_STREAM" section.
Writers
  • Writers have options which can:
    • check for invalid strings in the output (string_validation)
    • replace invalid strings in the output (string_validation_replacement)
  • You can write output in CSV, flat, INI, JSON, and XML formats (each writer prints to STDOUT by default)
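A hedged ffprobe example (the input file name is an assumption): print the container format and per-stream details using the JSON writer:

# Inspect input.mp4 and emit the format and stream sections as JSON
ffprobe -v quiet -output_format json -show_format -show_streams input.mp4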

FFmpeg Libraries for Developers


libavutil


  • The libavutil library is a utility library to aid portable multimedia programming. It contains string functions, random number generators, data structures, additional mathematical functions, cryptography, and multimedia functionality (like enumerations for pixel and sample formats).

libavcodec


  • The libavcodec library provides a generic encoding/decoding framework and contains multiple decoders and encoders for audio, video, and subtitle streams, and several bitstream filters.

libavformat


  • The libavformat library provides a generic framework for multiplexing and demultiplexing (muxing and demuxing) audio, video, and subtitle streams. It encompasses multiple muxers and demuxers for multimedia container formats.

libavdevice


  • The libavdevice library provides a generic framework for grabbing from and rendering common multimedia input/output devices, and supports several input and output devices, including Video4Linux2, DShow, and ALSA.

libavfilter


  • The libavfilter library provides a generic audio/video filtering framework containing several filters, sources, and sinks.

libswscale


  • The libswscale library performs highly optimized scaling and colorspace and pixel format conversion operations. Specifically, the library performs the Rescaling and Pixel Format Conversion operations defined above.

libswresample


  • The libswresample library performs highly optimized audio resampling, rematrixing, and sample format conversion operations. Specifically, the library provides the Resampling, Format Conversion, and Rematrixing operations defined above.
