Derivatives are fundamental to the optimization of neural networks. Activation functions introduce non-linearity into an otherwise linear model (y = wx + b), which is nothing but a sequence of linear operations.
There are various types of activation functions: linear, ReLU, LReLU, PReLU, step, sigmoid, tanh, softplus, softmax, and many others.
In this particular story, we will focus on the first-order derivatives of the ReLU, LReLU, sigmoid, and tanh activation functions, as they are critical to optimizing a neural network and learning high-performing network weights (parameters). Feel free to list other activations in the comments that you would like me to include in this story or discuss in a separate story.
- relu(x) — Rectified Linear Unit
- lrelu(x) — Leaky Rectified Linear Unit
- sigmoid(x) — logistic function
- tanh(x) — hyperbolic tangent
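To see why we need the non-linearity at all, here is a minimal NumPy sketch showing that two linear layers stacked without an activation in between collapse into a single linear layer (the shapes and random values below are arbitrary, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))                    # a small batch of inputs
W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=5)
W2, b2 = rng.normal(size=(5, 2)), rng.normal(size=2)

# Two "layers" applied back to back, with no activation in between.
two_layers = (x @ W1 + b1) @ W2 + b2

# The same mapping expressed as one linear layer: still just y = xW + b.
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layers, one_layer))      # True
```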
relu(x): Rectified Linear Unit
The relu(x) function can be written as
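$$\mathrm{relu}(x) = \max(0, x)$$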
and can be expanded as
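$$\mathrm{relu}(x) = \begin{cases} x & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}$$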
Therefore, the first-order derivative can be written as
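$$\frac{d}{dx}\,\mathrm{relu}(x) = \frac{d}{dx}\begin{cases} x & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}$$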
and can be simplified as
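$$\frac{d}{dx}\,\mathrm{relu}(x) = \begin{cases} \frac{d}{dx}(x) & \text{if } x > 0 \\ \frac{d}{dx}(0) & \text{if } x \le 0 \end{cases}$$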
Here, we can see that the individual derivatives can be solved as follows:
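$$\frac{d}{dx}(x) = 1, \qquad \frac{d}{dx}(0) = 0$$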
*Note: the derivative of a constant is 0, not the constant itself.
Therefore, substituting these individual derivatives back into the piecewise form, the final derivative of relu(x) can be written as
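$$\frac{d}{dx}\,\mathrm{relu}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}$$

Strictly speaking, relu(x) is not differentiable exactly at x = 0; assigning the value 0 there, as above, is the convention commonly used in practice.

Below is a minimal NumPy sketch of relu(x) and its first-order derivative (the function names and the relu'(0) = 0 convention are illustrative choices, not part of any particular library):

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: element-wise max(0, x)."""
    return np.maximum(0.0, x)

def relu_derivative(x):
    """First-order derivative: 1 where x > 0, otherwise 0 (relu'(0) = 0 by convention)."""
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))             # 0, 0, 0, 0.5, 2
print(relu_derivative(x))  # 0, 0, 0, 1, 1
```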
lrelu(x): Leaky Rectified Linear Unit
LReLU is an extension of ReLU in which a small leaky term is introduced to prevent the total loss of negative activations, since unsigned (non-zero-centered) activations can lead to the zig-zag update problem discussed in CS231n. Writing the small leak coefficient as α (commonly 0.01), lrelu(x) can be written as
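$$\mathrm{lrelu}(x) = \max(\alpha x, x), \qquad 0 < \alpha < 1$$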
This can be expanded as
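$$\mathrm{lrelu}(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha x & \text{if } x \le 0 \end{cases}$$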
The first-order derivative can be determined as follows:
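$$\frac{d}{dx}\,\mathrm{lrelu}(x) = \begin{cases} \frac{d}{dx}(x) & \text{if } x > 0 \\ \frac{d}{dx}(\alpha x) & \text{if } x \le 0 \end{cases}$$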
Similar to ReLU, the LReLU first-order derivative is:
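$$\frac{d}{dx}\,\mathrm{lrelu}(x) = \begin{cases} 1 & \text{if } x > 0 \\ \alpha & \text{if } x \le 0 \end{cases}$$

A minimal NumPy sketch of lrelu(x) and its first-order derivative (alpha = 0.01 and the behaviour at x = 0 are illustrative assumptions):

```python
import numpy as np

def lrelu(x, alpha=0.01):
    """Leaky ReLU: x where x > 0, otherwise alpha * x."""
    return np.where(x > 0, x, alpha * x)

def lrelu_derivative(x, alpha=0.01):
    """First-order derivative: 1 where x > 0, otherwise alpha."""
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(lrelu(x))             # -0.02, -0.005, 0, 0.5, 2
print(lrelu_derivative(x))  # 0.01, 0.01, 0.01, 1, 1
```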
tanh(x): Hyperbolic Tangent
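The hyperbolic tangent is defined in terms of exponentials as

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$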
It can also be written as a ratio of hyperbolic functions:
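$$\tanh(x) = \frac{\sinh(x)}{\cosh(x)}$$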
The first-order derivative is as follows:
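$$\frac{d}{dx}\tanh(x) = \frac{d}{dx}\left[\frac{\sinh(x)}{\cosh(x)}\right]$$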
Applying the quotient (division) rule of differentiation,
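$$\frac{d}{dx}\tanh(x) = \frac{\cosh(x)\,\frac{d}{dx}\sinh(x) - \sinh(x)\,\frac{d}{dx}\cosh(x)}{\cosh^{2}(x)}$$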
Solving the individual derivatives, i.e., the derivatives of sinh(x) and cosh(x) are cosh(x) and sinh(x) respectively, thus,
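$$\frac{d}{dx}\tanh(x) = \frac{\cosh(x)\cosh(x) - \sinh(x)\sinh(x)}{\cosh^{2}(x)} = \frac{\cosh^{2}(x) - \sinh^{2}(x)}{\cosh^{2}(x)}$$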
Separating the denominator
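$$\frac{d}{dx}\tanh(x) = \frac{\cosh^{2}(x)}{\cosh^{2}(x)} - \frac{\sinh^{2}(x)}{\cosh^{2}(x)}$$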
Simplifying the equation
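$$\frac{d}{dx}\tanh(x) = 1 - \frac{\sinh^{2}(x)}{\cosh^{2}(x)}$$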
Simplifying the hyperbolic ratio, since sinh(x)/cosh(x) = tanh(x), the final first-order derivative is as follows:
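$$\frac{d}{dx}\tanh(x) = 1 - \tanh^{2}(x)$$

As a quick sanity check, here is a minimal NumPy sketch that compares this closed form against a central finite difference (the sample points and step size are arbitrary):

```python
import numpy as np

def tanh_derivative(x):
    """First-order derivative of tanh: 1 - tanh(x) ** 2."""
    return 1.0 - np.tanh(x) ** 2

x = np.linspace(-3.0, 3.0, 7)
eps = 1e-6
numerical = (np.tanh(x + eps) - np.tanh(x - eps)) / (2 * eps)
print(np.allclose(tanh_derivative(x), numerical))  # True
```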
sigmoid(x): Logistic Function
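The sigmoid (logistic) function is defined as

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Applying the chain rule to (1 + e^{-x})^{-1}, the first-order derivative is

$$\frac{d}{dx}\,\sigma(x) = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}}$$

which is usually rewritten in terms of the function itself:

$$\frac{d}{dx}\,\sigma(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)$$

A minimal NumPy sketch of sigmoid(x) and its first-order derivative (function names are illustrative):

```python
import numpy as np

def sigmoid(x):
    """Logistic function: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    """First-order derivative expressed via the function itself: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))             # approx. 0.119, 0.5, 0.881
print(sigmoid_derivative(x))  # approx. 0.105, 0.25, 0.105
```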