Overview

In any neural network, calculus does a lot of the heavy lifting. Imagine we're training a neural network in a world where backpropagation hasn't yet been baked into every library everywhere. When we get to finding gradients, we need the derivative of the loss function with respect to each parameter. In the naïve approach, we'd compute each of those $n$ partial derivatives separately, and each one costs a forward pass through the network, which itself takes time proportional to $n$; that works out to $O(n^2)$ time, where $n$ is the number of parameters. Now, if we're a company training a network with 100 million parameters, this quickly becomes intractable, and with the bias-variance tradeoff out the window these days, we'll need to cut down on the complexity of some part of our training.
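
To make the $O(n^2)$ figure concrete, here's a minimal sketch of one naïve approach, assuming a toy one-layer network with a squared-error loss; the names (`loss`, `naive_gradient`) and the dimensions are made up purely for illustration. Estimating each partial derivative with its own forward pass means $n$ passes, each costing $O(n)$:

```python
import numpy as np

# Toy setup: a single linear layer with a squared-error loss.
# `params` is a flat vector holding all n parameters of the network.
rng = np.random.default_rng(0)
n_in, n_out = 4, 3
x = rng.normal(size=n_in)                # one input example
y = rng.normal(size=n_out)               # its target
params = rng.normal(size=n_in * n_out)   # n = 12 parameters

def loss(params):
    """One forward pass; its cost grows with the number of parameters, O(n)."""
    W = params.reshape(n_out, n_in)
    pred = W @ x
    return 0.5 * np.sum((pred - y) ** 2)

def naive_gradient(params, eps=1e-6):
    """Finite differences: one extra forward pass per parameter,
    so n passes at O(n) each -- O(n^2) work in total."""
    base = loss(params)
    grad = np.zeros_like(params)
    for i in range(params.size):
        bumped = params.copy()
        bumped[i] += eps
        grad[i] = (loss(bumped) - base) / eps
    return grad

print(naive_gradient(params))
```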

Backpropagation is a method for efficiently computing the gradient of the loss function with respect to every parameter of a neural network. It applies to any network built from differentiable operations (in particular, differentiable activation functions), and it is a special case of reverse-mode automatic differentiation. The algorithm cuts the cost of computing the full gradient from the naïve $O(n^2)$ down to $O(n)$!
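
For comparison, here's what backpropagation buys us on the same toy setup (reusing `x`, `y`, `params`, and `naive_gradient` from the sketch above; the `backprop_gradient` name is again just for illustration). A single forward pass plus a single backward application of the chain rule yields every partial derivative at once, in $O(n)$ total work:

```python
def backprop_gradient(params):
    """One forward pass plus one backward pass: every partial derivative
    in O(n) total work."""
    W = params.reshape(n_out, n_in)
    pred = W @ x                    # forward pass
    residual = pred - y             # dLoss/dpred, from the squared-error loss
    grad_W = np.outer(residual, x)  # chain rule: dLoss/dW = residual * x^T
    return grad_W.reshape(-1)       # flatten to match the layout of `params`

# The two methods agree up to finite-difference error; only the cost differs.
print(np.max(np.abs(backprop_gradient(params) - naive_gradient(params))))
```

Both functions return (numerically) the same gradient; the difference is purely in how much work it takes to get it.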

The Algorithm

...