Derivatives are fundamental to the optimization of neural networks. Activation functions introduce non-linearity into an otherwise linear model (y = wx + b), which is nothing but a sequence of linear operations.
There are various types of activation functions: linear, ReLU, LReLU, PReLU, step, sigmoid, tanh, softplus, softmax, and many others.
In this particular story, we will focus on the first-order derivatives of the ReLU, LReLU, sigmoid, and tanh activation functions, as they are critical to optimizing a neural network and learning high-performing network weights (parameters). Feel free to list other activations in the comments that you would like me to include in this story or discuss in a separate story.
- relu(x) — Rectified Linear Unit
- lrelu(x) — Leaky Rectified Linear Unit
- sigmoid(x) — logistic function
- tanh(x) — hyperbolic tangent
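To see why we need the non-linearity at all, here is a minimal NumPy sketch showing that two linear layers stacked without an activation in between collapse into a single linear layer (the shapes and random values below are arbitrary, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))                    # a small batch of inputs
W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=5)
W2, b2 = rng.normal(size=(5, 2)), rng.normal(size=2)

# Two "layers" applied back to back, with no activation in between.
two_layers = (x @ W1 + b1) @ W2 + b2

# The same mapping expressed as one linear layer: still just y = xW + b.
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layers, one_layer))      # True
```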
relu(x): Rectified Linear Unit
The relu(x) function can be written as
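$$\mathrm{relu}(x) = \max(0, x)$$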
and can be expanded as
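$$\mathrm{relu}(x) = \begin{cases} x & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}$$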
Therefore, the first-order derivative can be written as
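$$\frac{d}{dx}\,\mathrm{relu}(x) = \frac{d}{dx}\begin{cases} x & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}$$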
and can be simplified as
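$$\frac{d}{dx}\,\mathrm{relu}(x) = \begin{cases} \frac{d}{dx}(x) & \text{if } x > 0 \\ \frac{d}{dx}(0) & \text{if } x \le 0 \end{cases}$$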
Here, we can see that the individual derivatives can be solved as follows:
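$$\frac{d}{dx}(x) = 1, \qquad \frac{d}{dx}(0) = 0$$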
*Note: the derivative of a constant is 0, not the constant itself.
Therefore, substituting these individual derivatives back into the piecewise form, the final derivative of relu(x) can be written as
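$$\frac{d}{dx}\,\mathrm{relu}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}$$

Strictly speaking, relu(x) is not differentiable exactly at x = 0; assigning the value 0 there, as above, is the convention commonly used in practice.

Below is a minimal NumPy sketch of relu(x) and its first-order derivative (the function names and the relu'(0) = 0 convention are illustrative choices, not part of any particular library):

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: element-wise max(0, x)."""
    return np.maximum(0.0, x)

def relu_derivative(x):
    """First-order derivative: 1 where x > 0, otherwise 0 (relu'(0) = 0 by convention)."""
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))             # 0, 0, 0, 0.5, 2
print(relu_derivative(x))  # 0, 0, 0, 1, 1
```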
lrelu(x): Leaky Rectified Linear Unit
LReLU is an extension of ReLU in which a small leaky term is introduced to prevent the total loss of negative activations, since unsigned (non-zero-centered) activations can lead to the zig-zag update problem discussed in CS231n. Writing the small leak coefficient as α (commonly 0.01), lrelu(x) can be written as
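$$\mathrm{lrelu}(x) = \max(\alpha x, x), \qquad 0 < \alpha < 1$$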
This can be expanded as
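$$\mathrm{lrelu}(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha x & \text{if } x \le 0 \end{cases}$$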
The first-order derivative can be determined as follows:
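$$\frac{d}{dx}\,\mathrm{lrelu}(x) = \begin{cases} \frac{d}{dx}(x) & \text{if } x > 0 \\ \frac{d}{dx}(\alpha x) & \text{if } x \le 0 \end{cases}$$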
Similar to ReLU, the LReLU first-order derivative is:
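$$\frac{d}{dx}\,\mathrm{lrelu}(x) = \begin{cases} 1 & \text{if } x > 0 \\ \alpha & \text{if } x \le 0 \end{cases}$$

A minimal NumPy sketch of lrelu(x) and its first-order derivative (alpha = 0.01 and the behaviour at x = 0 are illustrative assumptions):

```python
import numpy as np

def lrelu(x, alpha=0.01):
    """Leaky ReLU: x where x > 0, otherwise alpha * x."""
    return np.where(x > 0, x, alpha * x)

def lrelu_derivative(x, alpha=0.01):
    """First-order derivative: 1 where x > 0, otherwise alpha."""
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(lrelu(x))             # -0.02, -0.005, 0, 0.5, 2
print(lrelu_derivative(x))  # 0.01, 0.01, 0.01, 1, 1
```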
tanh(x): Hyperbolic Tangent
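The hyperbolic tangent is defined in terms of exponentials as

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$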
It can also be written as a ratio of hyperbolic functions:
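$$\tanh(x) = \frac{\sinh(x)}{\cosh(x)}$$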
The first-order derivative is as follows:
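$$\frac{d}{dx}\tanh(x) = \frac{d}{dx}\left[\frac{\sinh(x)}{\cosh(x)}\right]$$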
Applying the quotient (division) rule of differentiation,
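$$\frac{d}{dx}\tanh(x) = \frac{\cosh(x)\,\frac{d}{dx}\sinh(x) - \sinh(x)\,\frac{d}{dx}\cosh(x)}{\cosh^{2}(x)}$$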
Solving the individual derivatives, i.e., the derivatives of sinh(x) and cosh(x) are cosh(x) and sinh(x) respectively, thus,
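$$\frac{d}{dx}\tanh(x) = \frac{\cosh(x)\cosh(x) - \sinh(x)\sinh(x)}{\cosh^{2}(x)} = \frac{\cosh^{2}(x) - \sinh^{2}(x)}{\cosh^{2}(x)}$$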
Separating the denominator
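$$\frac{d}{dx}\tanh(x) = \frac{\cosh^{2}(x)}{\cosh^{2}(x)} - \frac{\sinh^{2}(x)}{\cosh^{2}(x)}$$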
Simplifying the equation
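$$\frac{d}{dx}\tanh(x) = 1 - \frac{\sinh^{2}(x)}{\cosh^{2}(x)}$$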
Simplifying the hyperbolic ratio, since sinh(x)/cosh(x) = tanh(x), the final first-order derivative is as follows:
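$$\frac{d}{dx}\tanh(x) = 1 - \tanh^{2}(x)$$

As a quick sanity check, here is a minimal NumPy sketch that compares this closed form against a central finite difference (the sample points and step size are arbitrary):

```python
import numpy as np

def tanh_derivative(x):
    """First-order derivative of tanh: 1 - tanh(x) ** 2."""
    return 1.0 - np.tanh(x) ** 2

x = np.linspace(-3.0, 3.0, 7)
eps = 1e-6
numerical = (np.tanh(x + eps) - np.tanh(x - eps)) / (2 * eps)
print(np.allclose(tanh_derivative(x), numerical))  # True
```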
sigmoid(x): Logistic Function
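The sigmoid (logistic) function is defined as

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Applying the chain rule to (1 + e^{-x})^{-1}, the first-order derivative is

$$\frac{d}{dx}\,\sigma(x) = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}}$$

which is usually rewritten in terms of the function itself:

$$\frac{d}{dx}\,\sigma(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)$$

A minimal NumPy sketch of sigmoid(x) and its first-order derivative (function names are illustrative):

```python
import numpy as np

def sigmoid(x):
    """Logistic function: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    """First-order derivative expressed via the function itself: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))             # approx. 0.119, 0.5, 0.881
print(sigmoid_derivative(x))  # approx. 0.105, 0.25, 0.105
```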