7. Differentiation and Gradient

For a given function \(y = f(x)\), the derivative gives the instantaneous rate of change of \(y\) with respect to \(x\) at a particular point (assuming \(f\) is differentiable at \(x\)). Differentiating a function once yields the first-order derivative, which is written in several equivalent ways.

  • \(y' = f'(x)\)

  • \(\dfrac{\mathrm{d} y}{\mathrm{d} x} = \dfrac{\mathrm{d} f(x)}{\mathrm{d} x}\)

  • \(y' = \nabla f(x)\)

The value of the derivative at a particular point is the slope of the line tangent to the curve \(y = f(x)\) at that point.
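The derivative is the limit of a difference quotient, so a symmetric (central) difference with a small step approximates it well. The following is a minimal sketch, assuming a step size of `h = 1e-5`; the helper names `central_difference` and `tangent_line` are hypothetical and not part of numdifftools. It estimates the slope of \(y = x^2\) at \(x = 3\) and builds the tangent line there.

```python
def central_difference(f, x, h=1e-5):
    """Approximate f'(x) with the symmetric quotient (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def tangent_line(f, a, h=1e-5):
    """Return the tangent line to f at x = a as a function of x."""
    slope = central_difference(f, a, h)
    fa = f(a)
    return lambda x: fa + slope * (x - a)


f = lambda x: x ** 2
slope = central_difference(f, 3.0)  # exact value: f'(3) = 2 * 3 = 6
line = tangent_line(f, 3.0)         # tangent line y = 9 + 6 (x - 3)
print(round(slope, 6))              # 6.0
print(round(line(4.0), 6))          # 15.0
```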

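The second-order derivative is the derivative of the first-order derivative: it describes how the rate of change itself changes at a particular point. If \(f\) gives position over time, \(f'\) is velocity and \(f''\) is acceleration. It is denoted as follows.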

  • \(y'' = f''(x)\)

  • \(\dfrac{\mathrm{d}^2 y}{\mathrm{d} x^2} = \dfrac{\mathrm{d}^2 f(x)}{\mathrm{d} x^2}\)

  • \(y'' = \nabla^2 f(x)\)
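The velocity and acceleration reading can be made concrete with finite differences. This is a minimal sketch under stated assumptions: the position function \(s(t) = 4.9\,t^2\) (distance fallen in metres after \(t\) seconds) is a hypothetical example, the step sizes are assumed, and `first_derivative` / `second_derivative` are illustrative helpers, not library functions. The second derivative uses the standard second central difference \((f(x+h) - 2f(x) + f(x-h)) / h^2\).

```python
def first_derivative(f, x, h=1e-5):
    # Symmetric difference quotient: (f(x + h) - f(x - h)) / (2h)
    return (f(x + h) - f(x - h)) / (2 * h)


def second_derivative(f, x, h=1e-3):
    # Second central difference: (f(x + h) - 2 f(x) + f(x - h)) / h^2
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2


# Hypothetical example: distance fallen s(t) = 4.9 t^2 (metres, t in seconds)
s = lambda t: 4.9 * t ** 2
velocity = first_derivative(s, 2.0)       # exact value: s'(2) = 9.8 * 2 = 19.6
acceleration = second_derivative(s, 2.0)  # exact value: s''(t) = 9.8 for every t
print(f"{velocity:.4f} {acceleration:.4f}")  # 19.6000 9.8000
```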

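Computing the derivative of a function and using it to evaluate the rate of change at a particular point is useful for optimization problems. For a differentiable function, every minimum in the interior of the domain lies at a point where the tangent line has slope zero, so candidate minima can be found by solving \(f'(x) = 0\). Such a point may also be a maximum, or a local rather than global minimum; a positive second derivative, \(f''(x) > 0\), confirms a local minimum.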
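Solving \(f'(x) = 0\) and then checking the sign of \(f''\) can be sketched with Newton's method applied to \(f'\), a root-finding technique not used elsewhere in this section. This is a minimal sketch, assuming the hypothetical function \(f(x) = x^4 - 2x^2\) (stationary points at \(-1\), \(0\) and \(1\)), hand-derived \(f'\) and \(f''\), and that \(f''\) stays nonzero along the iteration; `newton_stationary_point` is an illustrative helper name.

```python
def newton_stationary_point(fprime, fsecond, x0, tol=1e-12, max_iter=50):
    """Solve f'(x) = 0 with Newton's method: x <- x - f'(x) / f''(x)."""
    x = x0
    for _ in range(max_iter):
        step = fprime(x) / fsecond(x)  # assumes f''(x) != 0 along the way
        x -= step
        if abs(step) < tol:
            break
    return x


# Hypothetical example: f(x) = x^4 - 2x^2, with stationary points at x = -1, 0, 1
fprime = lambda x: 4 * x ** 3 - 4 * x   # f'(x)
fsecond = lambda x: 12 * x ** 2 - 4     # f''(x)

for start in (2.0, 0.1):
    x_star = newton_stationary_point(fprime, fsecond, start)
    curvature = fsecond(x_star)
    if curvature > 0:
        kind = "local minimum"
    elif curvature < 0:
        kind = "local maximum"
    else:
        kind = "inconclusive"
    print(f"start={start}: x*={x_star:.6f}, f''(x*)={curvature:.2f} ({kind})")
```

Starting from 2.0 the iteration settles at \(x = 1\), where \(f'' = 8 > 0\) (a minimum); starting from 0.1 it settles at \(x = 0\), where \(f'(0) = 0\) as well but \(f'' = -4 < 0\) (a maximum). A zero slope alone does not distinguish the two.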

When there is more than one input variable, the gradient generalizes the derivative: it is the vector of partial derivatives of the function, one with respect to each input variable. For example, consider the following function.

  • \(y = x_0^2 + x_1^2\)

We would like to know how \(y\) changes with respect to \(x_0\), \(\dfrac{\partial y}{\partial x_0}\), and also with respect to \(x_1\), \(\dfrac{\partial y}{\partial x_1}\).

  • \(\dfrac{\partial y}{\partial x_0} = 2 x_0\)

  • \(\dfrac{\partial y}{\partial x_1} = 2 x_1\)
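Each partial derivative can be approximated like a one-variable derivative, by nudging one input at a time while holding the other fixed. A minimal sketch, assuming central differences with `h = 1e-5` and a hypothetical helper `numerical_gradient`, checks the two formulas above at the arbitrary point \((1, -2)\):

```python
def numerical_gradient(f, point, h=1e-5):
    """Approximate each partial derivative with a central difference along one axis."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h   # move only coordinate i forward
        backward[i] -= h  # and backward, leaving the others unchanged
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


y = lambda x: x[0] ** 2 + x[1] ** 2
g = numerical_gradient(y, [1.0, -2.0])
print(g)  # close to [2 * 1.0, 2 * (-2.0)] = [2.0, -4.0]
```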

Collecting both partial derivatives gives the gradient \(\nabla y = (2x_0, 2x_1)\). The second-order partial derivatives are as follows.

  • \(\dfrac{\partial^2 y}{\partial x_0^2} = 2\)

  • \(\dfrac{\partial^2 y}{\partial x_1^2} = 2\)
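These two second-order partials are the diagonal entries of the Hessian matrix. The off-diagonal mixed partials \(\dfrac{\partial^2 y}{\partial x_0 \partial x_1}\) are zero for this function because \(x_0\) and \(x_1\) appear in separate terms. A minimal sketch with a hypothetical `numerical_hessian` helper (central differences, step `h = 1e-4` assumed) approximates the full matrix:

```python
def numerical_hessian(f, point, h=1e-4):
    """Approximate the Hessian of f at point with central differences."""
    n = len(point)

    def at(di, dj, i, j):
        # Evaluate f after moving di*h along axis i and then dj*h along axis j
        shifted = list(point)
        shifted[i] += di * h
        shifted[j] += dj * h
        return f(shifted)

    hessian = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Four-point formula; for i == j it reduces to a second difference with step 2h
            hessian[i][j] = (at(1, 1, i, j) - at(1, -1, i, j)
                             - at(-1, 1, i, j) + at(-1, -1, i, j)) / (4 * h * h)
    return hessian


y = lambda x: x[0] ** 2 + x[1] ** 2
H = numerical_hessian(y, [1.0, -2.0])
print(H)  # should be close to [[2, 0], [0, 2]]
```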

Let’s compute the first- and second-order derivatives of a few functions. We will use numdifftools to evaluate the derivatives and gradients of the following functions.

  • \(y = f(x) = x^2\)

  • \(y = f(x) = \sin(x)\)

  • \(y = f(x) = e^{-x}\)

  • \(y = f(x) = \tanh(x)\)

  • \(y = f(x_0, x_1) = \sin \left( \sqrt{x_0^2 + x_1^2} \right)\)

7.1. \(y = f(x) = x^2\)

This function is a parabola. Its first derivative \(y' = 2x\) is a straight line with positive slope, and its second derivative \(y'' = 2\) is constant.

```python
import pandas as pd
import numpy as np
import numdifftools as nd
import matplotlib.pyplot as plt

def get_derivatives(f, x_min=-10, x_max=10, x_step=0.01):
    """Evaluate f and its first and second derivatives on an evenly spaced grid."""
    return pd.DataFrame({'x': np.arange(x_min, x_max + x_step, x_step)}) \
        .assign(**{
            # Column names such as "y'" are not valid keywords, hence the ** unpacking
            "y": lambda d: f(d['x']),
            # numdifftools approximates the n-th derivative numerically at each grid point
            "y'": lambda d: nd.Derivative(f, n=1)(d['x']),
            "y''": lambda d: nd.Derivative(f, n=2)(d['x'])
        })

def plot_derivatives(df):
    """Plot y in the left panel and y', y'' together in the right panel."""
    fig, ax = plt.subplots(1, 2, figsize=(10, 3))

    df \
        .set_index(['x'])['y'] \
        .plot(kind='line', ax=ax[0], ylabel='y', title=r'$y = f(x)$')

    df \
        .set_index(['x'])[["y'", "y''"]] \
        .plot(kind='line', ax=ax[1], title=r"$y'$ and $y''$")

    fig.tight_layout()
```
```python
_temp = get_derivatives(lambda x: x ** 2)
plot_derivatives(_temp)
```

[Figure: \(y = x^2\) (left) and its derivatives \(y'\) and \(y''\) (right)]
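As an independent cross-check that does not rely on numdifftools, NumPy's `np.gradient` can differentiate the sampled values directly on the same grid (step 0.01 over \([-10, 10]\)). This is a minimal sketch: `np.gradient` uses central differences at interior points and, with `edge_order=2`, second-order one-sided differences at the endpoints, so both derivatives of a quadratic come out exact up to rounding.

```python
import numpy as np

# Same range and spacing as get_derivatives, built with linspace to avoid float drift
x = np.linspace(-10, 10, 2001)
y = x ** 2

# Central differences inside the grid, second-order one-sided differences at the ends
dy = np.gradient(y, x, edge_order=2)
d2y = np.gradient(dy, x, edge_order=2)

print(np.allclose(dy, 2 * x), np.allclose(d2y, 2.0))  # True True
```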

7.2. \(y = f(x) = \sin(x)\)

This function is a sine wave. Its derivatives are also sinusoidal: \(y' = \cos(x)\) is \(y\) shifted left by a quarter period (\(\pi/2\)), and \(y'' = -\sin(x)\) is exactly out of phase with \(y\).

```python
_temp = get_derivatives(lambda x: np.sin(x))
plot_derivatives(_temp)
```

[Figure: \(y = \sin(x)\) (left) and its derivatives \(y'\) and \(y''\) (right)]
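The phase relationships can be checked numerically with NumPy's `np.gradient` on the same grid. This is a minimal sketch; the comparison skips two points at each end of the grid (an assumed margin), where applying the one-sided edge differences twice is noticeably less accurate than the interior formula. Both printed maximum errors should be well below \(10^{-3}\) for this step size.

```python
import numpy as np

x = np.linspace(-10, 10, 2001)  # step 0.01, as in get_derivatives
y = np.sin(x)
dy = np.gradient(y, x, edge_order=2)
d2y = np.gradient(dy, x, edge_order=2)

inner = slice(2, -2)  # skip two points at each end, where stacked edge differences are less accurate
err_first = np.max(np.abs(dy[inner] - np.sin(x[inner] + np.pi / 2)))  # y' = cos(x) = sin(x + pi/2)
err_second = np.max(np.abs(d2y[inner] + y[inner]))                    # y'' = -sin(x) = -y
print(err_first, err_second)
```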

7.3. \(y = f(x) = e^{-x}\)

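This function is a decaying exponential. Its first derivative \(y' = -e^{-x}\) is negative everywhere and rises toward zero as \(x\) increases, while its second derivative \(y'' = e^{-x}\) is identical to \(y\) and decays toward zero.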

```python
_temp = get_derivatives(lambda x: np.exp(-x))
plot_derivatives(_temp)
```

[Figure: \(y = e^{-x}\) (left) and its derivatives \(y'\) and \(y''\) (right)]
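Because \(y\) spans several orders of magnitude on \([-10, 10]\) (from about \(e^{10}\) down to \(e^{-10}\)), a check of \(y' = -y\) and \(y'' = y\) is best done with a relative tolerance. A minimal sketch using NumPy's `np.gradient`, assuming a relative tolerance of \(10^{-3}\) and skipping two points at each end of the grid:

```python
import numpy as np

x = np.linspace(-10, 10, 2001)
y = np.exp(-x)
dy = np.gradient(y, x, edge_order=2)
d2y = np.gradient(dy, x, edge_order=2)

inner = slice(2, -2)  # skip the endpoints, where the stacked one-sided differences are less accurate
print(np.allclose(dy[inner], -y[inner], rtol=1e-3))  # y' = -e^{-x} = -y
print(np.allclose(d2y[inner], y[inner], rtol=1e-3))  # y'' = e^{-x} = y
print(np.all(np.diff(dy) > 0))                       # y' increases monotonically toward 0
```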

7.4. \(y = f(x) = \tanh(x)\)

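This function is the hyperbolic tangent. Its first derivative \(y' = 1 - \tanh^2(x)\) is a bell-shaped curve that peaks at 1 at \(x = 0\) and approaches zero as \(|x|\) grows. Its second derivative \(y'' = -2\tanh(x)\left(1 - \tanh^2(x)\right)\) is positive for \(x < 0\), where \(y'\) is rising, and negative for \(x > 0\), where \(y'\) is falling.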

```python
_temp = get_derivatives(lambda x: np.tanh(x))
plot_derivatives(_temp)
```

[Figure: \(y = \tanh(x)\) (left) and its derivatives \(y'\) and \(y''\) (right)]
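The closed forms \(y' = 1 - \tanh^2(x)\) and \(y'' = -2\tanh(x)\left(1 - \tanh^2(x)\right)\) can be compared against a grid-based numerical derivative, and the sign pattern of \(y''\) checked directly. A minimal sketch using NumPy's `np.gradient` on the same grid; the two-point margin at each end is an assumption to avoid the less accurate stacked edge differences.

```python
import numpy as np

x = np.linspace(-10, 10, 2001)
y = np.tanh(x)
dy_exact = 1 - y ** 2               # y' = sech^2(x) = 1 - tanh^2(x)
d2y_exact = -2 * y * (1 - y ** 2)   # y'' = -2 tanh(x) sech^2(x)

dy = np.gradient(y, x, edge_order=2)
d2y = np.gradient(dy, x, edge_order=2)
inner = slice(2, -2)  # skip two points at each end of the grid
err_first = np.max(np.abs(dy - dy_exact))
err_second = np.max(np.abs(d2y[inner] - d2y_exact[inner]))
print(err_first, err_second)

# y' peaks at x = 0; y'' > 0 to the left of 0 and y'' < 0 to the right
print(x[np.argmax(dy_exact)])
print(np.all(d2y_exact[x < 0] > 0), np.all(d2y_exact[x > 0] < 0))
```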

7.5. \(y = f(x_0, x_1) = \sin \left( \sqrt{x_0^2 + x_1^2} \right)\)

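Finally, we turn to a multivariable function. To visualize its derivatives, we draw them as vector fields (quiver plots): at each point of a grid, an arrow shows the gradient \(\left(\dfrac{\partial y}{\partial x_0}, \dfrac{\partial y}{\partial x_1}\right)\) in one panel and the second-order derivatives computed by `nd.Gradient(f, n=2)` in the other.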

```python
# y = sin(sqrt(x0^2 + x1^2)); x[0] and x[1] may be scalars or arrays
f = lambda x: np.sin(np.sqrt(np.power(x[0], 2.0) + np.power(x[1], 2.0)))
f1 = nd.Gradient(f, n=1)  # first-order partials (dy/dx0, dy/dx1)
f2 = nd.Gradient(f, n=2)  # second-order derivatives along each input axis

# 101 x 101 grid over [-5, 5] x [-5, 5]
x0, x1 = np.meshgrid(np.arange(-5, 5.1, 0.1), np.arange(-5, 5.1, 0.1))
print(f'{x0.shape=}, {x1.shape=}')

# Flatten the grid into one (x0, x1) pair per point
x0, x1 = np.ravel(x0), np.ravel(x1)
print(f'{x0.shape=}, {x1.shape=}')

# Evaluate the derivatives point by point on NumPy arrays; this avoids positional
# indexing on label-indexed pandas rows (r[0]), which newer pandas deprecates
points = np.column_stack([x0, x1])
grad1 = np.array([f1(p) for p in points])  # shape (10201, 2)
grad2 = np.array([f2(p) for p in points])  # shape (10201, 2)

_df = pd.DataFrame({
    'x0': x0,
    'x1': x1,
    'y': f([x0, x1]),
    'g10': grad1[:, 0],
    'g11': grad1[:, 1],
    'g20': grad2[:, 0],
    'g21': grad2[:, 1]
})
```

```
x0.shape=(101, 101), x1.shape=(101, 101)
x0.shape=(10201,), x1.shape=(10201,)
```
```python
fig, ax = plt.subplots(1, 2, figsize=(10, 5))

# One arrow per grid point: first-order gradient (left), second-order derivatives (right)
ax[0].quiver(_df['x0'], _df['x1'], _df['g10'], _df['g11'])
ax[1].quiver(_df['x0'], _df['x1'], _df['g20'], _df['g21'])

for a in ax:
    a.set_xlabel(r'$x_0$')
    a.set_ylabel(r'$x_1$')
ax[0].set_title(r"$y'$")
ax[1].set_title(r"$y''$")

fig.tight_layout()
```

[Figure: quiver plots over \((x_0, x_1)\) of \(y'\) (left) and \(y''\) (right)]
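Away from the origin, the chain rule gives this gradient in closed form, \(\dfrac{\partial y}{\partial x_i} = \cos(r)\,\dfrac{x_i}{r}\) with \(r = \sqrt{x_0^2 + x_1^2}\). It points radially and vanishes on the circles where \(\cos(r) = 0\), i.e. \(r = \pi/2 + k\pi\); at the origin itself the surface has a cone-like kink and is not differentiable. A minimal pure-Python sketch, with hypothetical helper names and an assumed step of `h = 1e-6`, compares the closed form with central differences at an arbitrary point:

```python
import math


def f(x):
    # y = sin(sqrt(x0^2 + x1^2)), the same function as in the notebook cell
    return math.sin(math.hypot(x[0], x[1]))


def analytic_gradient(x):
    # Chain rule: dy/dx_i = cos(r) * x_i / r with r = sqrt(x0^2 + x1^2); undefined at r = 0
    r = math.hypot(x[0], x[1])
    return [math.cos(r) * x[0] / r, math.cos(r) * x[1] / r]


def central_gradient(func, x, h=1e-6):
    # One symmetric difference quotient per coordinate axis
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += h
        down[i] -= h
        grad.append((func(up) - func(down)) / (2 * h))
    return grad


point = [1.5, -2.0]  # r = 2.5, an arbitrary point away from the origin
print(analytic_gradient(point))
print(central_gradient(f, point))
```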