Automatic Differentiation
Want to read more about automatic differentiation because I have been focused on deep learning the past couple days.
References
Related
- Chain Rule
- In calculus, the chain rule is a formula that expresses the derivative of the composition of two differentiable functions f and g in terms of the derivatives of f and g. More precisely, if h = f ∘ g is the function such that h(x) = f(g(x)) for every x, then the chain rule in Lagrange's notation is h′(x) = f′(g(x)) g′(x). (A worked example follows after this list.)
- Symbolic Differentiation
- Finding the derivative of a given formula with respect to a specified variable by algorithmic manipulation of symbolic mathematical expressions, producing a new formula rather than a numerical value; one of the basic tasks of computer algebra.
- Numerical Differentiation
- Estimate the derivative of a mathematical function or function subroutine using values of the function and perhaps other knowledge about the function.
- Round-off error
- The difference between the result provided by a given algorithm using exact arithmetic and the result produced by the same algorithm using finite-precision, rounded arithmetic.
- Discretization
- The process of transferring continuous functions, models, variables, and equations into discrete counterparts.
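As a quick worked instance of the chain-rule entry above (my own example, not from any source): take f(u) = sin(u) and g(x) = x², so h = f ∘ g:

```latex
h(x) = f(g(x)) = \sin(x^2),
\qquad
h'(x) = f'(g(x))\,g'(x) = \cos(x^2) \cdot 2x
```

At x = 1 this gives h′(1) = 2 cos(1) ≈ 1.0806, a value the code sketches below reproduce.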
Notes
In mathematics and computer algebra, automatic differentiation (auto-differentiation, autodiff, or AD), also called algorithmic differentiation, computational differentiation, or differentiation arithmetic, is a set of techniques for evaluating the partial derivatives of a function specified by a computer program. Automatic differentiation is a subtle and central tool for automating the simultaneous computation of the numerical values of arbitrarily complex functions and their derivatives, with no need for a symbolic representation of the derivative; only the function rule, or an algorithm implementing it, is required. Auto-differentiation is neither numeric nor symbolic, nor a combination of the two. It is also preferable to ordinary numerical methods: in contrast to the more traditional numerical methods based on finite differences, auto-differentiation is, in theory, exact, and in comparison to symbolic algorithms, it is computationally inexpensive.
Automatic differentiation exploits the fact that every computer calculation, no matter how complicated, executes a sequence of elementary arithmetic operations (addition, subtraction, multiplication, division, etc.) and elementary functions (exp, log, sin, cos, etc.). By applying the chain rule repeatedly to these operations, partial derivatives of arbitrary order can be computed automatically, accurately to working precision, and using at most a small constant factor more arithmetic operations than the original program.
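To make the "chain rule over elementary operations" idea concrete, here is a minimal forward-mode sketch using dual numbers (class and function names are my own, not from any particular library):

```python
import math

class Dual:
    """Carries a value and its derivative w.r.t. the seeded input."""
    def __init__(self, val, dot=0.0):
        self.val = val   # function value
        self.dot = dot   # derivative of the value w.r.t. the seed

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (u * v)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def sin(x):
    # chain rule for an elementary function: sin(u)' = cos(u) * u'
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

# Differentiate f(x) = sin(x * x) at x = 1 in a single forward pass.
x = Dual(1.0, 1.0)   # seed: dx/dx = 1
y = sin(x * x)
print(y.val, y.dot)  # sin(1) ≈ 0.8415 and 2*cos(1) ≈ 1.0806
```

Each overloaded operation computes the value and its derivative in lockstep, which is exactly the repeated application of the chain rule described above.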
Automatic differentiation is distinct from symbolic differentiation and numerical differentiation.
- Symbolic differentiation faces the difficulty of converting a computer program into a single mathematical expression and can lead to inefficient code.
- Numeric differentiation (the method of finite differences) can introduce round-off and cancellation errors in the discretization process (a small demonstration follows this list).
- Both of these classical methods are slow at computing the partial derivatives of a function with respect to many inputs, as is needed for gradient-based optimization algorithms.
- Autodiff avoids all of these problems.
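Here is a small sketch of the round-off problem (standard library only): a central finite difference on f(x) = sin(x²) at x = 1, compared against the exact derivative 2·cos(1). The error first shrinks as the step h decreases, then grows again once cancellation between f(x + h) and f(x − h) amplifies round-off:

```python
import math

f = lambda x: math.sin(x * x)
exact = 2 * math.cos(1.0)  # exact derivative of sin(x^2) at x = 1

for h in (1e-2, 1e-5, 1e-8, 1e-11):
    # central difference approximation of f'(1)
    approx = (f(1.0 + h) - f(1.0 - h)) / (2 * h)
    print(f"h={h:.0e}  error={abs(approx - exact):.2e}")
```

Forward-mode autodiff, as in the dual-number sketch above, has no step size to tune and returns the derivative to working precision.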
Autodiff, because of its efficiency and accuracy in computing first- and higher-order derivatives, is a celebrated technique with diverse applications in scientific computing and mathematics. In practice, there are two types of algorithmic differentiation: a forward type and a reverse type. The two types are closely related and complementary, and both have a wide variety of applications in:
- non-linear optimization: the selection of the best element, with regard to some criterion, from a set of available alternatives
- sensitivity analysis: the study of how the uncertainty in the output of a mathematical model or system can be divided and allocated to different sources of uncertainty in its inputs.
- robotics: the interdisciplinary study and practice of the design, construction, operation, and use of robots.
- machine learning: field of study in artificial intelligence concerned with the development and study of statistical algorithms that can learn from data and generalize to unseen data.
- computer graphics: deals with generating images and art with the aid of computers.
- computer vision: deals with how computers can derive high-level understanding from digital images or videos.
Fundamental to automatic differentiation is the decomposition of differentials provided by the chain rule of partial derivatives of composite functions.
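To make the decomposition explicit (standard textbook notation, not from the original note): for a composite y = f(g(h(x))), introduce intermediate variables and factor the derivative with the chain rule:

```latex
w_0 = x, \quad w_1 = h(w_0), \quad w_2 = g(w_1), \quad y = w_3 = f(w_2),
\qquad
\frac{dy}{dx} = \frac{dy}{dw_2} \cdot \frac{dw_2}{dw_1} \cdot \frac{dw_1}{dw_0}
```

The two modes below differ only in the order in which this product of partial derivatives is accumulated.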
Usually, two modes of autodiff are presented:
- forward accumulation
- reverse accumulation
Forward accumulation traverses the chain rule from inside to outside: using the decomposition above, first compute ∂w1/∂x, then ∂w2/∂x, and finally ∂y/∂x. Reverse accumulation traverses from outside to inside: first compute ∂y/∂w2, then ∂y/∂w1, and finally ∂y/∂x.
The value of the partial derivative, called the seed, is propagated forward or backward and is initially ∂x/∂x = 1 (forward mode) or ∂y/∂y = 1 (reverse mode). Forward accumulation evaluates the function and calculates the derivative with respect to one independent variable in a single pass. Reverse accumulation requires the evaluated intermediate values for its partial derivatives, so it evaluates the function first and then calculates the derivatives with respect to all independent variables in an additional backward pass.
Backpropagation of errors in multilayer perceptrons (MLPs) is a special case of reverse accumulation.
In forward accumulation AD, one first fixes the independent variable with respect to which differentiation is performed and computes the derivative of each sub-expression recursively. In reverse accumulation AD, the dependent variable to be differentiated is fixed and the derivative is computed with respect to each sub-expression recursively.
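A minimal reverse-accumulation sketch in the same spirit as the dual-number example (again, hypothetical names rather than a real library's API): the forward pass builds a small computation graph, and the backward pass propagates the seed ∂y/∂y = 1 from the output toward every input:

```python
import math

class Var:
    """Node in a computation graph; grad is filled in by the backward pass."""
    def __init__(self, val, parents=()):
        self.val = val
        self.parents = parents  # pairs of (parent Var, local partial derivative)
        self.grad = 0.0

    def __mul__(self, other):
        # local partials: d(u*v)/du = v, d(u*v)/dv = u
        return Var(self.val * other.val,
                   ((self, other.val), (other, self.val)))

def sin(x):
    # local partial: d(sin u)/du = cos(u)
    return Var(math.sin(x.val), ((x, math.cos(x.val)),))

def backward(node, upstream=1.0):
    # Accumulate the derivative flowing into this node, then apply the
    # chain rule to pass it on to each parent. (Real implementations walk
    # a topological order so shared nodes are processed only once.)
    node.grad += upstream
    for parent, local in node.parents:
        backward(parent, upstream * local)

# One forward pass evaluates the function; one backward pass yields the
# derivatives with respect to all inputs.
x = Var(1.0)
y = sin(x * x)
backward(y)           # seed: dy/dy = 1
print(y.val, x.grad)  # sin(1) ≈ 0.8415 and 2*cos(1) ≈ 1.0806
```

Because the backward pass starts from the output, one sweep produces the derivative with respect to every input at once, which is why this mode is preferred for gradient-based training of models with many parameters.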