Reading the ffmpeg Docs
I want to learn more about the capabilities of ffmpeg. For the same reason I am reading the ImageMagick docs, I want to build an AWS Lambda Layer that bundles ffmpeg so that I can manipulate video and audio files from a Lambda function.
Resources
Definitions
- Rescaling
- The process of changing the video size. Several rescaling options and algorithms are available. This is usually a lossy process.
- Pixel Format Conversion
- The process of converting the image format and colorspace of the image, for example from planar YUV420P to RGB24 packed. It also handles packing conversion, that is, converting from packed layout (all pixels belonging to distinct planes interleaved in the same buffer) to planar layout (all samples belonging to the same plane stored in a dedicated buffer or "plane").
- Resampling
- The process of changing the audio sample rate, for example from a high sample rate of 44100 Hz to 8000 Hz. Converting from a high to a low sample rate is a lossy process. Several resampling options and algorithms are available.
- Format Conversion
- The process of converting the type of samples, for example from 16-bit signed samples to unsigned 8-bit or float samples. It also handles packing conversion, when passing from packed layout (all samples belonging to distinct channels interleaved in the same buffer) to planar layout (all samples belonging to the same channel stored in a dedicated buffer or "plane").
- Rematrixing
- The process of changing the channel layout, for example from stereo to mono. When the input channels cannot be mapped to the output streams, the process is lossy, since it involves different gain factors and mixing.
- Multiplexing (Muxing)
- A video processing technique that combines multiple audio, video, caption, and metadata streams into a single container file. The container can be in various formats, such as MP4, AVI, MKV, or WebM, and acts as a digital suitcase that keeps all parts of the media data synchronized for playback.
- Demultiplexing (Demuxing)
- The process of separating the components of an audio/video transport stream that were combined during muxing, so that each component can be processed separately.
- Video Coding Format
- The content representation format of digital video content, i.e. the encoded format used to store or transmit video so that it can later be decoded into displayable frames (H.264 and VP9 are examples). It typically uses a standardized video compression algorithm.
- Codec
- A codec is a device or computer program that encodes or decodes a data stream or signal. It is a blend of a coder and a decoder.
- Video Codec
- A video codec is software or hardware that compresses and decompresses digital video. In the context of video compression, codec is a blend of encoder and decoder, while a device that only compresses is typically called an encoder, and one that only decompresses is called a decoder.
- Encoder
- In multimedia, an encoder is software or hardware that converts raw (uncompressed) audio or video into a coded format, usually compressing it; a decoder performs the reverse. (In digital electronics, the word also names an unrelated circuit that converts a one-hot input on 2^n lines into an n-bit binary code.)
- Audio Codec
- An audio codec is a device or computer program that encodes or decodes a digital audio data stream. It implements an algorithm that compresses or decompresses digital audio data according to a given audio file or streaming media audio coding format.
- Multimedia container format
- A file format used to identify and interleave different data types. Simpler container formats can contain different types of audio formats, while more advanced container formats can contain multiple audio and video streams, subtitles, chapter information, and metadata, along with the synchronization information needed to play back the various streams together. In most cases, the metadata and the synchronization chunks are specified by the container format.
About ffmpeg
ffmpeg is the leading multimedia framework, able to decode, encode, transcode, mux, demux, stream, filter, and play pretty much anything that humans and machines have created. It supports a very wide range of formats. It is highly portable: ffmpeg compiles, runs, and passes its testing infrastructure (FATE) across Linux, macOS, Windows, the BSDs, and other build environments.
It contains the libraries libavcodec, libavutil, libavformat, libavfilter, libavdevice, libswscale, and libswresample, which can be used by applications. It also provides the ffmpeg, ffplay, and ffprobe tools, which end users can use for transcoding, playing, and inspecting media.
FFmpeg Tools
ffmpeg
Synopsis
ffmpeg [global_options] {[input_file_options] -i input_url}...{[output_file_options] output_url}...
Description
ffmpeg is a universal media converter. It can read a wide variety of inputs, including live grabbing/recording devices, filter them, and transcode them into a plethora of output formats.
It reads from an arbitrary number of input "files" (which can be regular files, pipes, network streams, grabbing devices, etc.) specified by the -i option, and writes to an arbitrary number of output "files", which are specified by a plain output URL. Anything found on the command line which cannot be interpreted as an option is considered to be an output URL.
Each input or output can, in principle, contain any number of streams of different types (video/audio/subtitle/attachment/data). The allowed number and type of streams may be limited by the container format. Selecting which streams from which inputs go into which output is done either automatically or with the -map option.
Input files and the streams within them are referred to by 0-based indices. For example, 2:3 refers to the fourth stream in the third input file.
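As a general rule, options are applied to the next specified file. Therefore, order is important, and the same option can appear multiple times on the command line, with each occurrence applying to the next input or output file. Global options (e.g. the verbosity level) are the exception and should be specified first.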
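Do not mix input and output files: specify all input files first, then all output files. Likewise, do not mix options that belong to different files. Options apply only to the next input or output file and are reset between files.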
Examples:
# Convert an input media file to a different format by re-encoding its media streams
ffmpeg -i input.avi output.mp4
# Set the video bitrate of the output file to 64 kbit/s
ffmpeg -i input.avi -b:v 64k -bufsize 64k output.mp4
# Force the frame rate of the output file to 24 fps
ffmpeg -i input.avi -r 24 output.mp4
# Force the frame rate of the input file (valid for raw formats only) to 1 fps and the frame rate of the output file to 24 fps
ffmpeg -r 1 -i input.m2v -r 24 output.mp4
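Since options apply to the next file, code that shells out to ffmpeg (for example, from the Lambda function) has to emit arguments in exactly that order. Below is a minimal Python sketch of that rule. The helpers build_command and run_ffmpeg are hypothetical names of my own. Running the command assumes an ffmpeg binary on PATH, for instance one provided by the Layer.

```python
import shutil
import subprocess
from typing import Sequence


def build_command(
    inputs: Sequence[tuple[str, Sequence[str]]],
    outputs: Sequence[tuple[str, Sequence[str]]],
    global_options: Sequence[str] = (),
) -> list[str]:
    """Build an ffmpeg argv: global options, then every input, then every output.

    Each (url, options) pair places its options immediately before the file they
    apply to, mirroring ffmpeg's rule that options apply to the next file.
    """
    argv = ["ffmpeg", *global_options]
    for url, opts in inputs:
        argv += [*opts, "-i", url]  # input options precede -i
    for url, opts in outputs:
        argv += [*opts, url]        # output options precede the output URL
    return argv


def run_ffmpeg(argv: list[str]) -> None:
    """Run the command, raising CalledProcessError on a non-zero exit."""
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError(f"{argv[0]} not found on PATH")
    subprocess.run(argv, check=True)


# The last example above: 1 fps on the raw input, 24 fps on the output.
cmd = build_command([("input.m2v", ["-r", "1"])], [("output.mp4", ["-r", "24"])])
print(" ".join(cmd))
```

Keeping each file's options attached to that file in the data structure makes it hard to accidentally mix input and output options.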
Detailed Description
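The transcoding process in ffmpeg for each output can be summarized as the following pipeline: input file → demuxer → encoded packets → decoder → decoded frames → (optional) filters → encoder → encoded packets → muxer → output file.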
ffmpeg calls the libavformat library (containing demuxers) to read input files and get packets containing encoded data from them. When there are multiple input files, ffmpeg tries to keep them synchronized by tracking the lowest timestamp on any active input stream.
Encoded packets are then sent to the decoder. The decoder produces uncompressed frames, which can be processed further by filtering. After filtering, the frames are passed to the encoder, which encodes them and outputs encoded packets. Finally, those are passed to the muxer, which writes the encoded packets to the output file.
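The "lowest timestamp" rule can be pictured with a toy model. This is not ffmpeg's real scheduler. It is a Python sketch that assumes each input yields packet timestamps (in seconds) already in order. The reader always takes the next packet from whichever input is furthest behind, which is exactly what heapq.merge does. The input names are hypothetical.

```python
import heapq
from typing import Iterable, Iterator


def _tag(name: str, timestamps: Iterable[float]) -> Iterator[tuple[float, str]]:
    """Label every timestamp with the input it came from."""
    for ts in timestamps:
        yield (ts, name)


def interleave(inputs: dict[str, Iterable[float]]) -> list[tuple[float, str]]:
    """Read packets across inputs, always from the one with the lowest next timestamp.

    Each input's timestamps must already be sorted; heapq.merge then keeps the
    combined sequence sorted, so no input runs ahead of the others.
    """
    return list(heapq.merge(*(_tag(name, ts) for name, ts in inputs.items())))


order = interleave({
    "input0": [0.0, 0.04, 0.08],           # e.g. 25 fps video
    "input1": [0.0, 0.021, 0.043, 0.064],  # e.g. audio frames
})
print([name for _, name in order])
```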
Filtering
Before encoding, ffmpeg can process raw audio and video frames using filters from the libavfilter library. Several chained filters form a filter graph. ffmpeg distinguishes between two types of filtergraphs: simple and complex.
Simple Filtergraphs
Simple filtergraphs are those that have exactly one input and output, both of the same type.
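Simple filtergraphs are configured with the per-stream -filter option (with -vf and -af aliases for video and audio, respectively). A simple filtergraph for video can look like this: input → deinterlace → scale → output.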
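In a simple filtergraph, the filters are written as one comma-separated chain. Below is a Python sketch that builds a deinterlace-then-scale chain. The choice of yadif as the deinterlacer and a 1280x720 target is mine, not from the text. The sketch uses -filter:v; -vf is its video shorthand.

```python
def simple_video_filter_args(src: str, dst: str, filters: list[str]) -> list[str]:
    """Build an ffmpeg argv that applies a linear chain of video filters."""
    if not filters:
        raise ValueError("a filter chain needs at least one filter")
    chain = ",".join(filters)  # commas link filters into a single linear chain
    return ["ffmpeg", "-i", src, "-filter:v", chain, dst]


args = simple_video_filter_args("input.mp4", "output.mp4", ["yadif", "scale=1280:720"])
print(" ".join(args))
```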
Complex Filtergraphs
Complex filtergraphs are those which cannot be described as simply a linear processing chain applied to one stream. This is the case, for example, when the graph has more than one input and/or output, or when the output stream type is different from the input. Complex filtergraphs are configured with the -filter_complex option.
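Overlaying one input on another is a typical complex case. Two video inputs feed one filter, so a linear per-stream chain cannot express it. Below is a Python sketch that assumes the overlay filter with an x:y offset, passed via -filter_complex. The file names and the 10-pixel offset are placeholders. Labelled pads such as [0:v] and [out] connect the graph to the inputs and to -map.

```python
def overlay_args(main: str, logo: str, dst: str, x: int = 10, y: int = 10) -> list[str]:
    """Build an ffmpeg argv that draws `logo` on top of `main` at (x, y)."""
    # [0:v] and [1:v] are the video streams of the first and second inputs;
    # the filter's output is labelled [out] so it can be mapped explicitly.
    graph = f"[0:v][1:v]overlay={x}:{y}[out]"
    return [
        "ffmpeg", "-i", main, "-i", logo,
        "-filter_complex", graph,
        "-map", "[out]",  # take video from the filtergraph output
        "-map", "0:a?",   # keep audio from the first input if it has any
        dst,
    ]


args = overlay_args("video.mp4", "logo.png", "output.mp4")
print(" ".join(args))
```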
Stream Copy
Stream copy mode is selected by supplying the copy parameter to the -codec option. It makes ffmpeg omit the decoding and encoding step for the specified stream, so it does only demuxing and muxing. Because nothing is decoded or re-encoded, it is fast and there is no quality loss. It is useful for changing the container format or modifying container-level metadata.
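A remux that only changes the container, and optionally sets a container-level title, is a one-liner in stream copy mode. Below is a Python sketch in which remux_args is a hypothetical helper; -metadata key=value is ffmpeg's option for setting metadata. Whether the target container accepts the copied codecs depends on the streams.

```python
from typing import Optional


def remux_args(src: str, dst: str, title: Optional[str] = None) -> list[str]:
    """Build an ffmpeg argv that copies all streams into a new container."""
    argv = ["ffmpeg", "-i", src, "-codec", "copy"]  # no decode/encode, just demux + mux
    if title is not None:
        argv += ["-metadata", f"title={title}"]     # container-level metadata
    argv.append(dst)
    return argv


print(" ".join(remux_args("input.mkv", "output.mp4")))
```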
Stream Selection
ffmpeg provides the -map option for manual control of stream selection in each output file. Users can skip -map and let ffmpeg perform automatic stream selection. The -vn / -an / -sn / -dn options can be used to skip the inclusion of video, audio, subtitle, and data streams respectively, whether manually mapped or automatically selected, except for those streams which are outputs of complex filtergraphs.
In the absence of any -map options for a particular output, ffmpeg inspects the output format to check which types of streams can be included in it, viz. video, audio, and/or subtitles. For each acceptable stream type, ffmpeg will pick one stream from all the inputs based on the following criteria:
- for video, it is the stream with the highest resolution
- for audio, it is the stream with the most channels
- for subtitles, it is the first subtitle stream found but there's a caveat. The output format's default subtitle encoder can be either text-based or image-based, and only a subtitle stream of the same type will be chosen.
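The selection criteria can be sketched in Python. This is a simplification under my own assumptions. Streams are dicts shaped like ffprobe's JSON output (index, codec_type, width, height, channels), listed in index order. Ties go to the lowest index, because Python's max keeps the first maximum. The text-versus-image subtitle caveat is ignored.

```python
def pick_default_streams(streams: list[dict], allowed: set[str]) -> dict[str, int]:
    """Return {stream type: chosen stream index} for the types the output accepts."""
    def of_type(kind: str) -> list[dict]:
        return [s for s in streams if s["codec_type"] == kind]

    chosen: dict[str, int] = {}
    video, audio, subs = of_type("video"), of_type("audio"), of_type("subtitle")
    if "video" in allowed and video:
        # highest resolution, measured here as width * height
        chosen["video"] = max(video, key=lambda s: s["width"] * s["height"])["index"]
    if "audio" in allowed and audio:
        # most channels
        chosen["audio"] = max(audio, key=lambda s: s["channels"])["index"]
    if "subtitle" in allowed and subs:
        # first subtitle stream found
        chosen["subtitle"] = subs[0]["index"]
    return chosen


streams = [
    {"index": 0, "codec_type": "video", "width": 1280, "height": 720},
    {"index": 1, "codec_type": "audio", "channels": 2},
    {"index": 2, "codec_type": "video", "width": 1920, "height": 1080},
    {"index": 3, "codec_type": "audio", "channels": 6},
    {"index": 4, "codec_type": "subtitle"},
]
print(pick_default_streams(streams, {"video", "audio"}))
```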
Options
All the numerical options accept a string representing a number as input, which may be followed by one of the SI unit prefixes, for example: 'K', 'M', or 'G'.
If 'i' is appended to the SI unit prefix, the complete prefix will be interpreted as a unit prefix for binary multiples, which are based on powers of 1024 instead of powers of 1000.
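Below is a Python sketch of how such suffixes are interpreted. It is deliberately limited to the K, M, and G prefixes from the text, plus lowercase k as used in values like 64k. ffmpeg's own parser accepts more prefixes, so treat parse_number as a hypothetical helper, not ffmpeg's code.

```python
import re

_DECIMAL = {"k": 1000, "K": 1000, "M": 1000**2, "G": 1000**3}
_BINARY = {"k": 1024, "K": 1024, "M": 1024**2, "G": 1024**3}
_PATTERN = re.compile(r"([+-]?\d+(?:\.\d+)?)([kKMG]i?)?")


def parse_number(text: str) -> float:
    """Parse e.g. '64k' -> 64000.0 or '1Ki' -> 1024.0."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a number with an optional SI prefix: {text!r}")
    value = float(match.group(1))
    suffix = match.group(2)
    if suffix:
        # a trailing 'i' switches from powers of 1000 to powers of 1024
        table = _BINARY if suffix.endswith("i") else _DECIMAL
        value *= table[suffix[0]]
    return value


print(parse_number("64k"), parse_number("1Ki"))
```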
Options which do not take arguments are Boolean options, and set the corresponding value to true. They can be set to false by prefixing the option name with "no". For example, -nofoo will set the Boolean option named foo to false.
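The "no" prefix rule is easy to model. Below is a Python sketch that assumes a known set of Boolean option names; foo is the doc's placeholder. parse_boolean_flag is a hypothetical helper, not ffmpeg's parser.

```python
def parse_boolean_flag(arg: str, boolean_options: set[str]) -> tuple[str, bool]:
    """Map '-foo' to ('foo', True) and '-nofoo' to ('foo', False)."""
    if not arg.startswith("-"):
        raise ValueError(f"not an option: {arg!r}")
    name = arg[1:]
    if name in boolean_options:
        return name, True
    # prefixing a Boolean option's name with "no" sets it to false
    if name.startswith("no") and name[2:] in boolean_options:
        return name[2:], False
    raise ValueError(f"unknown Boolean option: {arg!r}")


print(parse_boolean_flag("-nofoo", {"foo"}))
```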
Options that take arguments support a special syntax where the argument given on the command line is interpreted as a path to the file from which the actual argument value is loaded. To use this feature, add a forward slash (/) immediately before the option name:
ffmpeg -i INPUT -/filter:v filter.script OUTPUT # will load a filtergraph description from the file named filter.script
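In a Lambda function, the script file has to be written somewhere writable, such as /tmp. Below is a Python sketch that assumes the -/filter:v syntax shown above is supported by the ffmpeg build in use. The helper filter_script_args and the file name filter.script are illustrative.

```python
import tempfile
from pathlib import Path


def filter_script_args(src: str, dst: str, filtergraph: str, workdir: str) -> list[str]:
    """Write the filtergraph to a file and reference it with the -/ prefix."""
    script = Path(workdir) / "filter.script"
    script.write_text(filtergraph)
    # "-/filter:v" tells ffmpeg to load the option's value from the file
    return ["ffmpeg", "-i", src, "-/filter:v", str(script), dst]


with tempfile.TemporaryDirectory() as workdir:  # stands in for /tmp on Lambda
    args = filter_script_args("input.mp4", "output.mp4", "scale=640:-2", workdir)
    print(args)
```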
Some options are applied per stream, e.g. bitrate or codec. Stream specifiers are used to precisely specify which stream(s) a given option belongs to.
A stream specifier is a string generally appended to the option name and separated from it by a colon. E.g., -codec:a:1 ac3 contains the a:1 stream specifier, which matches the second audio stream. Therefore, it selects the ac3 codec for the second audio stream.
A stream specifier can match several streams, so that the option is applied to all of them. E.g., the stream specifier in -b:a 128k matches all audio streams.
An empty stream specifier matches all streams. For example, -codec copy or -codec: copy would copy all the streams without re-encoding. Possible forms of stream specifiers:
- stream_index
- Matches the stream with this index. E.g., -threads:1 4 would set the thread count for the second stream to 4.
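- stream_type[:additional_stream_specifier]
- stream_type is one of the following: 'v' or 'V' for video, 'a' for audio, 's' for subtitle, 'd' for data, and 't' for attachments. 'v' matches all video streams, while 'V' only matches video streams which are not attached pictures, video thumbnails, or cover arts. If additional_stream_specifier is used, it matches streams which both have this type and match the additional_stream_specifier; otherwise, it matches all streams of the specified type.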
- g:group_specifier[:additional_stream_specifier]
- Matches streams which are in the group with the specifier group_specifier.
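- p:program_id[:additional_stream_specifier]
- Matches streams which are in the program with the id program_id.
- #stream_id or i:stream_id
- Matches the stream by its stream ID (e.g. the PID in an MPEG-TS container).
- m:key[:value]
- Matches streams with the metadata tag key having the specified value. If value is not given, matches streams that contain the given tag with any value.
- u
- Matches streams with a usable configuration: the codec must be defined, and essential information such as video dimensions or audio sample rate must be present.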
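Below is a Python sketch of how some of the forms listed above resolve against a file's streams. It only covers the empty specifier, stream_index, and stream_type with an optional numeric index. Stream and match_specifier are hypothetical names. The g:, p:, #/i:, m:, and u forms are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    index: int                  # position of the stream in the file (0-based)
    kind: str                   # 'v', 'a', 's', 'd' or 't'
    attached_pic: bool = False  # cover art / thumbnail stored as a video stream


def match_specifier(spec: str, streams: list[Stream]) -> list[int]:
    """Return the indices of the streams a (simplified) stream specifier matches."""
    if spec == "":
        return [s.index for s in streams]  # empty specifier: every stream
    if spec.isdigit():
        return [s.index for s in streams if s.index == int(spec)]  # stream_index
    kind, _, rest = spec.partition(":")
    if kind not in {"v", "V", "a", "s", "d", "t"}:
        raise ValueError(f"form not covered by this sketch: {spec!r}")
    # 'V' is video minus attached pictures; other letters match their type
    of_type = [s for s in streams
               if s.kind == kind.lower() and not (kind == "V" and s.attached_pic)]
    if rest == "":
        return [s.index for s in of_type]  # all streams of that type
    if not rest.isdigit():
        raise ValueError(f"form not covered by this sketch: {spec!r}")
    n = int(rest)                          # n-th stream of that type (0-based)
    return [of_type[n].index] if n < len(of_type) else []


streams = [Stream(0, "v"), Stream(1, "a"), Stream(2, "a"),
           Stream(3, "v", attached_pic=True), Stream(4, "s")]
print(match_specifier("a:1", streams))
```

With these streams, -codec:a:1 ac3 would touch stream 2, while -threads:1 4 would touch stream 1.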
I'm not going to go through all the options (there are many). I will look at specific options when I need them.
ffplay
Synopsis
ffplay [options] [input_url]
Description
FFplay is a very simple and portable media player using FFmpeg libraries and the SDL library. It is mostly used as a testbed for the various FFmpeg APIs.
I don't really need to use this, so I'm skipping it.
ffprobe
Synopsis
ffprobe [options] input_url
Description
- ffprobe gathers information from multimedia streams and prints it in human-readable fashion.
- For example, it can be used to check the format of the container used by a multimedia stream and the format and type of each media stream contained in it.
- ffprobe will return a positive exit code if the input URL is not specified or if the file cannot be opened.
- If no output is specified, ffprobe will write to STDOUT.
- Options are used to list some of the formats supported by ffprobe or for specifying which information to display, and for setting how ffprobe will show it.
- ffprobe output is designed to be easily parsable by a textual filter, and consists of one or more sections of a form defined by the selected writer, which is specified by the output_format option.
- Sections may contain other nested sections, and are identified by a name (which may be shared by other sections) and a unique name.
- Metadata tags stored in the container or in the streams are recognized and printed in the corresponding "FORMAT", "STREAM", "STREAM_GROUP_STREAM", or "PROGRAM_STREAM" section.
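For the Lambda use case, the JSON writer is the easiest to consume. Below is a Python sketch that assumes ffprobe is on PATH when probe is called. In it, -of json selects the writer, -show_format and -show_streams request the FORMAT and STREAM sections, and check=True turns ffprobe's non-zero exit code into an exception. The sample JSON at the end is hand-written for illustration, not captured ffprobe output.

```python
import json
import subprocess


def probe(path: str) -> dict:
    """Run ffprobe on `path` and return its JSON output as a dict."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-of", "json", "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True,  # raises if ffprobe exits non-zero
    )
    return json.loads(result.stdout)


def summarize(info: dict) -> tuple[str, list[str]]:
    """Return the container format name and a 'type:codec' label per stream."""
    container = info["format"]["format_name"]
    kinds = [f'{s["codec_type"]}:{s.get("codec_name", "?")}' for s in info["streams"]]
    return container, kinds


# Hand-written sample shaped like ffprobe's JSON output (not from a real file).
sample = json.loads("""{
  "streams": [
    {"index": 0, "codec_type": "video", "codec_name": "h264"},
    {"index": 1, "codec_type": "audio", "codec_name": "aac"}
  ],
  "format": {"format_name": "mov,mp4,m4a,3gp,3g2,mj2"}
}""")
print(summarize(sample))
```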
Writers
- Writers have options which can:
- check for invalid strings in the output (string_validation)
- replace invalid strings in the output (string_validation_replacement)
- Available writers are default, compact, CSV, flat, INI, JSON, and XML; output goes to STDOUT unless an output file is specified.
FFmpeg Libraries for Developers
libavutil
- The libavutil library is a utility library to aid portable multimedia programming. It contains safe portable string functions, random number generators, data structures, additional mathematical functions, cryptography, and multimedia-related functionality (like enumerations for pixel and sample formats).
libavcodec
- The libavcodec library provides a generic encoding/decoding framework and contains multiple decoders and encoders for audio, video, and subtitle streams, and several bitstream filters.
libavformat
- The libavformat library provides a generic framework for multiplexing and demultiplexing (muxing and demuxing) audio, video, and subtitle streams. It encompasses multiple muxers and demuxers for multimedia container formats.
libavdevice
- The libavdevice library provides a generic framework for grabbing from and rendering common multimedia input/output devices, and supports several input and output devices, including Video4Linux2, DShow, and ALSA.
libavfilter
- The libavfilter library provides a generic audio/video filtering framework containing several filters, sources, and sinks.
libswscale
- The libswscale library performs highly optimized image scaling and colorspace and pixel format conversion operations. Specifically, the library performs Rescaling and Pixel Format Conversion.
libswresample
- The libswresample library performs highly optimized audio resampling, rematrixing, and sample format conversion operations. Specifically, the library provides Resampling, Format Conversion, and Rematrixing.