7. Conditional Bivariate Gaussians

Let’s learn about bivariate conditional gaussian distributions.

7.1. Distribution

For two jointly Gaussian variables, \(X_1\) and \(X_2\), the conditional distribution of \(X_1\) given \(X_2 = a\) is as follows.

\(X_1 | X_2=a \sim \mathcal{N}\left( \mu_1 + \dfrac{\sigma_1}{\sigma_2}\rho(a - \mu_2), (1 - \rho^2)\sigma_1^2 \right)\),

where

  • \(\mu_1\) is the mean of \(X_1\)

  • \(\mu_2\) is the mean of \(X_2\)

  • \(\sigma_1\) is the standard deviation of \(X_1\)

  • \(\sigma_2\) is the standard deviation of \(X_2\)

  • \(\rho\) is the correlation between \(X_1\) and \(X_2\)

A couple of things to note here.

  • \(X_1 | X_2=a\) follows a univariate normal distribution whose second parameter, \((1 - \rho^2)\sigma_1^2\), is a variance

  • Notice the \(X_2=a\) part? This means that every density evaluation at \(X_1=b\) will be considered at \(X_2=a\)
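As a quick sanity check of the formula above, the sketch below computes the conditional mean and variance for a set of made-up parameters and confirms that the resulting univariate density equals the joint density divided by the marginal density of \(X_2\). The parameter values and the helper names (`normal_pdf`, `bivariate_pdf`, `conditional_params`) are assumptions for illustration only; it uses only the standard library so it can be run on its own.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bivariate_pdf(x1, x2, mu_1, mu_2, sigma_1, sigma_2, rho):
    """Joint density of a bivariate normal at (x1, x2)."""
    z1 = (x1 - mu_1) / sigma_1
    z2 = (x2 - mu_2) / sigma_2
    q = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho * rho)
    norm_const = 2.0 * math.pi * sigma_1 * sigma_2 * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * q) / norm_const


def conditional_params(a, mu_1, mu_2, sigma_1, sigma_2, rho):
    """Mean and variance of X_1 given X_2 = a."""
    mean = mu_1 + (sigma_1 / sigma_2) * rho * (a - mu_2)
    var = (1.0 - rho ** 2) * sigma_1 ** 2
    return mean, var


# Illustrative parameters (assumed for this example only).
mu_1, mu_2, sigma_1, sigma_2, rho = 0.0, 1.0, 2.0, 0.5, 0.8
a, b = 1.5, 1.0

cond_mean, cond_var = conditional_params(a, mu_1, mu_2, sigma_1, sigma_2, rho)

# Density of X_1 = b given X_2 = a, computed two ways.
# Note the square root: the formula gives a variance, the pdf needs a standard deviation.
from_formula = normal_pdf(b, cond_mean, math.sqrt(cond_var))
from_ratio = (bivariate_pdf(b, a, mu_1, mu_2, sigma_1, sigma_2, rho)
              / normal_pdf(a, mu_2, sigma_2))

print(cond_mean, cond_var)
print(from_formula, from_ratio)
```

The two printed densities should agree up to floating-point error.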

7.2. Simulation

Let’s simulate two variables \(X\) and \(Y\).

  • \(X \sim \mathcal{N}(1, 1)\)

  • \(Y \sim \mathcal{N}(1 + 3.5 \times X, 1)\)

Note that \(Y\) depends on \(X\), but \(X\) does not depend on \(Y\). The working assumption here is that this dependency makes the conditional models asymmetric, with \(P(Y|X) > P(X|Y)\) (the probability of \(Y\) given \(X\) should be greater than the probability of \(X\) given \(Y\)). Such an asymmetry, \(P(Y|X) \neq P(X|Y)\), is useful for reasoning about causality. Of course, we can manipulate the dependency and distributions so that \(P(Y|X) = P(X|Y)\), but in real life such relationships are rather rare.

import numpy as np


N = 1000

x = np.random.normal(1, 1, N)
y = np.random.normal(1 + 3.5 * x, 1, N)

data = np.vstack([x, y]).T
means = data.mean(axis=0)
mins = data.min(axis=0)
maxs = data.max(axis=0)
cov = np.cov(data.T)
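# element-wise square root; only the diagonal entries are standard deviations
std = np.sqrt(cov)
cor = np.corrcoef(data.T)

print(means)
print('')
print(mins)
print('')
print(maxs)
print('')
print('stddev matrix')
print(std)
print('')
print('correlation matrix')
print(cor)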
[1.01277839 4.52863965]

[-2.31079823 -8.33142158]

[ 3.92919388 17.57679341]

stddev matrix
[[0.98156075 1.8353206 ]
 [1.8353206  3.5714234 ]]

correlation matrix
[[1.        0.9608716]
 [0.9608716 1.       ]]
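The sample estimates above can be compared with the values implied by the simulation design. Since \(Y = 1 + 3.5X + \epsilon\) with \(\epsilon \sim \mathcal{N}(0, 1)\) independent of \(X\), we have \(E[Y] = 1 + 3.5 \times 1\), \(\mathrm{Var}(Y) = 3.5^2 \mathrm{Var}(X) + 1\) and \(\mathrm{Cov}(X, Y) = 3.5 \mathrm{Var}(X)\). A minimal sketch of that arithmetic follows; the variable names are chosen for illustration.

```python
import math

# Design of the simulation: X ~ N(1, 1), Y = 1 + 3.5 * X + noise, noise ~ N(0, 1).
mean_x, sd_x = 1.0, 1.0
intercept, slope, sd_noise = 1.0, 3.5, 1.0

mean_y = intercept + slope * mean_x
var_y = slope ** 2 * sd_x ** 2 + sd_noise ** 2
cov_xy = slope * sd_x ** 2
rho = cov_xy / (sd_x * math.sqrt(var_y))

print('mean of Y   :', mean_y)
print('stddev of Y :', math.sqrt(var_y))
print('correlation :', rho)
```

These values are close to the sample estimates printed above (about 4.53, 3.57 and 0.961), as one would expect with N = 1000.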

7.3. Modeling conditional bivariate gaussian

After we simulate the data, we can estimate the means, variances, standard deviations and correlations from it. Then, we can build a model of the conditional Gaussian. Below, we use a class, CondNormal, for this model.

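import numpy as np
from scipy.stats import norm


class CondNormal(object):
    """Conditional distribution of X_1 given X_2 = a for a bivariate Gaussian."""

    def __init__(self, m_1, m_2, s_1, s_2, p):
        self.m_1 = m_1  # mean of X_1
        self.m_2 = m_2  # mean of X_2
        self.s_1 = s_1  # standard deviation of X_1
        self.s_2 = s_2  # standard deviation of X_2
        self.p = p      # correlation between X_1 and X_2

    def log_pdf(self, a, b):
        """Log density of X_1 = b given X_2 = a."""
        m = self.m_1 + (self.s_1 / self.s_2) * self.p * (a - self.m_2)
        # (1 - p^2) * s_1^2 is the conditional variance; scipy's `scale`
        # expects a standard deviation, so we take the square root.
        s = np.sqrt(1.0 - np.power(self.p, 2.0)) * self.s_1
        # logpdf avoids taking the log of a density that underflowed to 0
        return norm.logpdf(b, loc=m, scale=s)

    def empirical_log_proba(self, data):
        """Sum of log densities over (a, b) pairs."""
        return sum(self.log_pdf(a, b) for a, b in data)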

Note the following.

  • p_x_y is the model of \(X | Y = a\) or \(Y \rightarrow X\), denoted \(\mathcal{N}_{X|Y}\)

  • p_y_x is the model of \(Y | X = a\) or \(X \rightarrow Y\), denoted \(\mathcal{N}_{Y|X}\)

p_x_y = CondNormal(means[0], means[1],
                   std[0][0], std[1][1], cor[0][1])
p_y_x = CondNormal(means[1], means[0],
                   std[1][1], std[0][0], cor[1][0])
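The conditional mean in each model is a straight line in the conditioning value. For `p_y_x`, the slope is \(\rho \sigma_Y / \sigma_X\), which should come out near the 3.5 used in the simulation. The sketch below plugs in the sample estimates printed earlier; they are hard-coded copies of those numbers so that the block runs on its own.

```python
# Sample estimates copied from the output in the simulation section.
mean_x, mean_y = 1.01277839, 4.52863965
sd_x, sd_y = 0.98156075, 3.5714234
rho = 0.9608716

# Conditional mean of Y given X = a is intercept + slope * a.
slope = rho * sd_y / sd_x
intercept = mean_y - slope * mean_x

# Conditional variance of Y given X = a does not depend on a.
cond_var = (1.0 - rho ** 2) * sd_y ** 2

print('slope        :', slope)
print('intercept    :', intercept)
print('residual var :', cond_var)
```

The results land close to the slope 3.5, intercept 1 and noise variance 1 of the simulation.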

7.4. Evaluate models

Here, we evaluate the log probability of the data given the two models. Recall Bayes’ Theorem.

\(P(M|D) = \dfrac{P(D|M)P(M)}{P(D)}\)


  • \(M\) is the model (e.g. \(\mathcal{N}_{X|Y}\) and \(\mathcal{N}_{Y|X}\))

  • \(D\) is the data

\(P(D)\) is the normalizing constant and drops out; \(P(M)\) is assumed to be uniform and also drops out. Thus, the following.

\(P(M|D) \propto P(D|M)\)

In theory, the likelihood is

\(P(D|M) = P(d_1|M) \times P(d_2|M) \times \cdots \times P(d_n|M)\)

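In practice, we work with the log-likelihood, which has the same maximizer and avoids numerical underflow,

\(\log P(D|M) = \log P(d_1|M) + \log P(d_2|M) + \cdots + \log P(d_n|M) = \sum_{i=1}^{n} \log P(d_i|M)\)

where a higher score is better. The scores are usually negative, but because \(P(d_i|M)\) is a density rather than a probability, it can exceed 1 and individual log terms can be positive.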
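The reason for summing logs rather than multiplying densities is numerical: a product of many small numbers underflows to zero in floating point. A small sketch, using a made-up list of 1,000 identical density values (an assumption for illustration only):

```python
import math

# Hypothetical per-observation densities P(d_i | M); the values are made up.
densities = [0.01] * 1000

# The direct product underflows to 0.0 in double precision.
product = 1.0
for p in densities:
    product *= p

# The sum of logs stays finite and can still be compared across models.
log_likelihood = sum(math.log(p) for p in densities)

print(product)
print(log_likelihood)
```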

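Below, we score both models on the same data and compare \(\log P(D|\mathcal{N}_{X|Y})\) with \(\log P(D|\mathcal{N}_{Y|X})\); the model with the higher score fits the data better.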

# each pair is (conditioning value a, evaluated value b)
print('X|Y:', p_x_y.empirical_log_proba([(v[1], v[0]) for v in data]))
print('Y|X:', p_y_x.empirical_log_proba([(v[0], v[1]) for v in data]))
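Because \(P(M)\) is assumed to be uniform, the two log-likelihood scores can also be turned into posterior model probabilities by normalizing them. The sketch below does this by subtracting the maximum score before exponentiating, which avoids underflow. The two scores are placeholder values, not results from the run above, and `posterior_from_log_scores` is a hypothetical helper.

```python
import math


def posterior_from_log_scores(log_scores):
    """Normalize log-likelihoods into posterior probabilities under a uniform prior."""
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


# Placeholder scores for two models (made up for illustration).
log_scores = [-1410.0, -1408.0]
posterior = posterior_from_log_scores(log_scores)
print(posterior)
```

Even a small gap in log-likelihood translates into a clear preference: here a difference of 2 puts most of the posterior mass on the second model.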