Derivative of Neural Activation Function

Yash Garg
4 min read · Oct 9, 2019


Derivatives are fundamental to the optimization of neural networks. Activation functions introduce non-linearity into an otherwise linear model (y = wx + b), which is nothing but a sequence of linear operations.
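To make the "sequence of linear operations" point concrete, here is a minimal sketch using scalar layers. The weights and the helper name `linear` are arbitrary choices for illustration, not anything from a particular library. It shows that two stacked linear layers collapse into a single linear layer, and that placing a ReLU between them breaks that equivalence.

```python
def linear(w, b):
    """Return the scalar function x -> w * x + b (hypothetical helper)."""
    return lambda x: w * x + b


# Two illustrative layers with arbitrary weights.
layer1 = linear(2.0, 1.0)
layer2 = linear(-3.0, 0.5)


def stacked(x):
    """Two linear layers applied back to back, with no activation."""
    return layer2(layer1(x))


# The composition is itself linear: w = w2 * w1, b = w2 * b1 + b2.
collapsed = linear(-3.0 * 2.0, -3.0 * 1.0 + 0.5)


def relu(x):
    return max(0.0, x)


def stacked_relu(x):
    """Same two layers, with a ReLU in between."""
    return layer2(relu(layer1(x)))


for x in (-2.0, 0.0, 1.5):
    # Without an activation, the stack and the single layer agree everywhere.
    assert abs(stacked(x) - collapsed(x)) < 1e-12

# With the activation, the stack is no longer the same linear function.
assert stacked_relu(-2.0) != collapsed(-2.0)
```

However many purely linear layers are stacked, the result can be rewritten as one layer; the activation is what lets depth add expressive power.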

There are various types of activation functions: linear, ReLU, LReLU, PReLU, step, sigmoid, tanh, softplus, softmax, and many others.

In this story, we will focus on the first-order derivatives of the ReLU, LReLU, sigmoid, and tanh activation functions, as they are critical to optimizing a neural network so that it learns high-performing weights (parameters). Feel free to list other activations in the comments that you would like me to include in this story or discuss in a separate story.

  1. relu(x) — Rectified Linear Unit
  2. lrelu(x) — Leaky Rectified Linear Unit
  3. sigmoid(x) — logistic function
  4. tanh(x) — hyperbolic tangent

relu(x): Rectified Linear Unit

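relu(x) can be written as

$$\mathrm{relu}(x) = \max(0, x)$$
Rectified Linear Unit

and can be expanded as

$$\mathrm{relu}(x) = \begin{cases} x & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}$$
Expanded form of Rectified Linear Unit

Therefore, the first-order derivative can be written as

$$\frac{d}{dx}\,\mathrm{relu}(x) = \frac{d}{dx} \begin{cases} x & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}$$
First-order derivative in long form

and can be simplified as

$$\frac{d}{dx}\,\mathrm{relu}(x) = \begin{cases} \frac{d}{dx}\,x & \text{if } x > 0 \\ \frac{d}{dx}\,0 & \text{if } x \le 0 \end{cases}$$
Simplifying derivative

Here, we can see that the individual derivatives can be solved as follows:

$$\frac{d}{dx}\,x = 1, \qquad \frac{d}{dx}\,c = 0$$
Standard derivatives of a variable and a constant

*Note: the derivative of a constant is 0, not the constant itself.

Therefore,

$$\frac{d}{dx}\,x = 1 \ \ (x > 0), \qquad \frac{d}{dx}\,0 = 0 \ \ (x \le 0)$$
Derivative of individual cases of Rectified Linear Unit

as

$$\frac{d}{dx}\,0 = \frac{d}{dx}\,c \ \text{ with } c = 0$$
Comparison of individual case to standard derivatives

so the final derivative of relu(x) can be written as

$$\frac{d}{dx}\,\mathrm{relu}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}$$
First-order derivative of ReLU

Strictly speaking, relu(x) is not differentiable at x = 0; assigning the derivative the value 0 there is a common convention.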
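As a sanity check on the ReLU derivative, the following minimal sketch compares the closed form against a central finite difference. The function names (`relu_grad`, `numeric_grad`) and the step size `h` are assumptions made for this illustration, and the derivative at x = 0 is set to 0 by convention.

```python
def relu(x):
    """relu(x) = max(0, x)."""
    return max(0.0, x)


def relu_grad(x):
    """First-order derivative of relu; the value at x == 0 is a convention."""
    return 1.0 if x > 0 else 0.0


def numeric_grad(f, x, h=1e-6):
    """Central finite difference, used only as a numerical sanity check."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Compare away from the kink at x = 0, where the finite difference is unreliable.
for x in (-2.0, -0.5, 0.5, 3.0):
    assert abs(relu_grad(x) - numeric_grad(relu, x)) < 1e-6
```

The check deliberately skips x = 0: a central difference there returns 0.5, which is neither of the two one-sided slopes.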

lrelu(x): Leaky Rectified Linear Unit

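LReLU is an extension of ReLU in which a leaky term is introduced to prevent the total loss of negative activations, since unsigned (non-zero-centered) activations can lead to the zig-zag problem discussed in CS231n.

$$\mathrm{lrelu}(x) = \max(\alpha x, x)$$
Leaky Rectified Linear Unit

Here α is a small positive constant (0 < α < 1, for example 0.01) that sets the slope for negative inputs.

This can be expanded as

$$\mathrm{lrelu}(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha x & \text{if } x \le 0 \end{cases}$$
Leaky ReLU Expanded

The first-order derivative can be determined as follows:

$$\frac{d}{dx}\,\mathrm{lrelu}(x) = \begin{cases} \frac{d}{dx}\,x & \text{if } x > 0 \\ \frac{d}{dx}\,\alpha x & \text{if } x \le 0 \end{cases}$$
Derivative in long form

Similar to ReLU, the LReLU first-order derivative is:

$$\frac{d}{dx}\,\mathrm{lrelu}(x) = \begin{cases} 1 & \text{if } x > 0 \\ \alpha & \text{if } x \le 0 \end{cases}$$
First-order derivative of LReLU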
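The same kind of check works for LReLU. This is a sketch under stated assumptions: the leak slope `alpha` defaults to 0.01, an illustrative value rather than one fixed by the text, and the function names are made up for the example.

```python
def lrelu(x, alpha=0.01):
    """Leaky ReLU: x for positive inputs, alpha * x otherwise (alpha is assumed)."""
    return x if x > 0 else alpha * x


def lrelu_grad(x, alpha=0.01):
    """First-order derivative of lrelu; the value at x == 0 is a convention."""
    return 1.0 if x > 0 else alpha


def numeric_grad(f, x, h=1e-6):
    """Central finite difference, used only as a numerical sanity check."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Negative inputs keep a small, non-zero gradient instead of being zeroed out.
for x in (-2.0, -0.5, 0.5, 3.0):
    assert abs(lrelu_grad(x) - numeric_grad(lrelu, x)) < 1e-6
```

Unlike ReLU, the gradient for negative inputs is `alpha` rather than 0, so those units still receive a (small) update during training.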

tanh(x): Hyperbolic Tangent

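$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
Hyperbolic tangent (tanh) as a function of exponentials

It can also be written in terms of the hyperbolic sine and cosine:

$$\tanh(x) = \frac{\sinh(x)}{\cosh(x)}$$

The first-order derivative is as follows:

$$\frac{d}{dx}\tanh(x) = \frac{d}{dx}\,\frac{\sinh(x)}{\cosh(x)}$$

Applying the differentiation rule of division (the quotient rule),

$$\frac{d}{dx}\,\frac{f(x)}{g(x)} = \frac{g(x)\,f'(x) - f(x)\,g'(x)}{g(x)^{2}}$$
Derivative of a fraction (Rule of Division)

we get

$$\frac{d}{dx}\tanh(x) = \frac{\cosh(x)\,\frac{d}{dx}\sinh(x) - \sinh(x)\,\frac{d}{dx}\cosh(x)}{\cosh^{2}(x)}$$

Solving the individual derivatives (the derivatives of sinh and cosh are cosh and sinh, respectively) gives

$$\frac{d}{dx}\tanh(x) = \frac{\cosh^{2}(x) - \sinh^{2}(x)}{\cosh^{2}(x)}$$

$$\frac{d}{dx}\tanh(x) = \frac{\cosh^{2}(x)}{\cosh^{2}(x)} - \frac{\sinh^{2}(x)}{\cosh^{2}(x)}$$
Separating the denominator

$$\frac{d}{dx}\tanh(x) = 1 - \frac{\sinh^{2}(x)}{\cosh^{2}(x)}$$
Simplifying the equation

$$\frac{d}{dx}\tanh(x) = 1 - \tanh^{2}(x)$$
Simplifying the hyperbolic function

The final first-order derivative is as follows:

$$\frac{d}{dx}\tanh(x) = 1 - \tanh^{2}(x)$$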
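The tanh result can be checked numerically in two ways: against a central finite difference, and against the intermediate form 1 / cosh²(x) that appears before the last simplification. This is a minimal sketch using only Python's standard `math` module; `tanh_grad` and `numeric_grad` are names chosen for the example.

```python
import math


def tanh_grad(x):
    """First-order derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


def numeric_grad(f, x, h=1e-6):
    """Central finite difference, used only as a numerical sanity check."""
    return (f(x + h) - f(x - h)) / (2 * h)


for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    # Agrees with the finite-difference estimate of the slope.
    assert abs(tanh_grad(x) - numeric_grad(math.tanh, x)) < 1e-6
    # Agrees with (cosh^2 - sinh^2) / cosh^2 = 1 / cosh^2.
    assert abs(tanh_grad(x) - 1.0 / math.cosh(x) ** 2) < 1e-12
```

Because the derivative is expressed through tanh(x) itself, a network that has already computed the activation in the forward pass can reuse that value in the backward pass.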

sigmoid(x): Logistic Function

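$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
The sigmoid function

$$\frac{d}{dx}\,\sigma(x) = \frac{d}{dx}\,\frac{1}{1 + e^{-x}}$$
First-order derivative of sigmoid

$$\frac{d}{dx}\,\sigma(x) = \frac{d}{dx}\left(1 + e^{-x}\right)^{-1}$$
Removing the denominator

$$\frac{d}{dx}\,\sigma(x) = -\left(1 + e^{-x}\right)^{-2}\,\frac{d}{dx}\left(1 + e^{-x}\right)$$
Solving the RHS

$$\frac{d}{dx}\,\sigma(x) = -\left(1 + e^{-x}\right)^{-2}\left(\frac{d}{dx}\,1 + \frac{d}{dx}\,e^{-x}\right)$$
Further solving the individual components

$$\frac{d}{dx}\,\sigma(x) = -\left(1 + e^{-x}\right)^{-2}\left(0 - e^{-x}\right)$$
First-order derivative of individual components

$$\frac{d}{dx}\,\sigma(x) = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}}$$
Simplifying the equation

$$\frac{d}{dx}\,\sigma(x) = \frac{1 + e^{-x} - 1}{\left(1 + e^{-x}\right)^{2}}$$
Adding and subtracting "1"

$$\frac{d}{dx}\,\sigma(x) = \frac{1 + e^{-x}}{\left(1 + e^{-x}\right)^{2}} - \frac{1}{\left(1 + e^{-x}\right)^{2}}$$
Rearranging the equation

$$\frac{d}{dx}\,\sigma(x) = \frac{1}{1 + e^{-x}} - \frac{1}{\left(1 + e^{-x}\right)^{2}}$$
Further rearranging the equation

$$\frac{d}{dx}\,\sigma(x) = \frac{1}{1 + e^{-x}}\left(1 - \frac{1}{1 + e^{-x}}\right)$$
Simplifying the equation

$$\frac{d}{dx}\,\sigma(x) = \sigma(x)\left(1 - \sigma(x)\right)$$
Replacing the unit with the sigmoid function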
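The sigmoid result, σ(x)(1 − σ(x)), can be verified in the same way. The sketch below uses the textbook form of the sigmoid with Python's standard `math` module; the function names are chosen for the example, and this naive form is not numerically safe for large negative inputs.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x)).

    Naive form: math.exp(-x) raises OverflowError for large negative x
    (around x < -709), so this is for illustration only.
    """
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """First-order derivative of sigmoid: s * (1 - s), with s = sigmoid(x)."""
    s = sigmoid(x)
    return s * (1.0 - s)


def numeric_grad(f, x, h=1e-6):
    """Central finite difference, used only as a numerical sanity check."""
    return (f(x + h) - f(x - h)) / (2 * h)


for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    # Agrees with the finite-difference estimate of the slope.
    assert abs(sigmoid_grad(x) - numeric_grad(sigmoid, x)) < 1e-6
    # Agrees with the intermediate form e^(-x) / (1 + e^(-x))^2.
    assert abs(sigmoid_grad(x) - math.exp(-x) / (1.0 + math.exp(-x)) ** 2) < 1e-12
```

The derivative peaks at 0.25 when x = 0 and shrinks toward 0 as |x| grows, which is the saturation behaviour usually associated with vanishing gradients.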

[EDIT] Fix minor errors.


Written by Yash Garg

Machine Learning Researcher, Nokia Bell Labs
