# 7. Conditional Bivariate Gaussians

Let’s learn about conditional distributions of bivariate Gaussian random variables.

## 7.1. Distribution

For two jointly Gaussian variables, $$X_1$$ and $$X_2$$, the conditional distribution of $$X_1$$ given $$X_2=a$$ is defined as follows.

$$X_1|X_2=a \sim \mathcal{N}\left( \mu_1 + \dfrac{\sigma_1}{\sigma_2}\rho(a - \mu_2),\ (1 - \rho^2)\sigma_1^2 \right),$$

where

• $$\mu_1$$ is the mean of $$X_1$$

• $$\mu_2$$ is the mean of $$X_2$$

• $$\sigma_1$$ is the standard deviation of $$X_1$$

• $$\sigma_2$$ is the standard deviation of $$X_2$$

• $$\rho$$ is the correlation between $$X_1$$ and $$X_2$$

A couple of things to note here.

• $$X_1|X_2=a$$ is a univariate normal distribution

• Notice the $$X_2=a$$ part? This means that every probability evaluation of $$X_1=b$$ will be considered at $$X_2=a$$ (see the sketch after this list)
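
To make the formula concrete, below is a minimal sketch (the helper name conditional_params is ours, not part of the original code) that computes the conditional mean and variance of $$X_1$$ given $$X_2=a$$ from the five parameters above.

import numpy as np

def conditional_params(m_1, m_2, s_1, s_2, p, a):
    # conditional mean: mu_1 + (sigma_1 / sigma_2) * rho * (a - mu_2)
    m = m_1 + (s_1 / s_2) * p * (a - m_2)
    # conditional variance: (1 - rho^2) * sigma_1^2
    v = (1.0 - p ** 2) * s_1 ** 2
    return m, v

# standard bivariate normal with correlation 0.5, conditioned on X_2 = 1
print(conditional_params(0.0, 0.0, 1.0, 1.0, 0.5, 1.0))  # (0.5, 0.75)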

## 7.2. Simulation

Let’s simulate two variables $$X$$ and $$Y$$.

• $$X \sim \mathcal{N}(1, 1)$$

• $$Y \sim \mathcal{N}(1 + 3.5 \times X, 1)$$

Note that $$Y$$ is dependent on $$X$$, but $$X$$ is not dependent on $$Y$$. The dependency of $$Y$$ on $$X$$ implies $$P(Y|X) > P(X|Y)$$ (the probability of $$Y$$ given $$X$$ should be greater than the probability of $$X$$ given $$Y$$). This implication is useful for causality since there is asymmetry: $$P(Y|X) \neq P(X|Y)$$. Of course, we can manipulate the dependencies and distributions so that $$P(Y|X) = P(X|Y)$$, but such relationships are rather rare in real life.

[1]:

import numpy as np

np.random.seed(37)

N = 1000

# sample X ~ N(1, 1) and Y ~ N(1 + 3.5X, 1)
x = np.random.normal(1, 1, N)
y = np.random.normal(1 + 3.5 * x, 1, N)

data = np.vstack([x, y]).T  # N x 2 matrix, one row per (x, y) sample
means = data.mean(axis=0)
mins = data.min(axis=0)
maxs = data.max(axis=0)
cov = np.cov(data.T)        # 2 x 2 covariance matrix
std = np.sqrt(cov)          # elementwise square root; the diagonal holds the standard deviations
cor = np.corrcoef(data.T)   # 2 x 2 correlation matrix

print('means')
print(means)
print('')
print('mins')
print(mins)
print('')
print('maxs')
print(maxs)
print('')
print('stddev matrix')
print(std)
print('')
print('correlation matrix')
print(cor)

means
[1.01277839 4.52863965]

mins
[-2.31079823 -8.33142158]

maxs
[ 3.92919388 17.57679341]

stddev matrix
[[0.98156075 1.8353206 ]
[1.8353206  3.5714234 ]]

correlation matrix
[[1.        0.9608716]
[0.9608716 1.       ]]


## 7.3. Modeling the conditional bivariate Gaussian

After we simulate the data, we can estimate the means, variances, standard deviations, and correlations from the data. Then, we can build a model of the conditional Gaussian. Below, we use a class, CondNormal, to model the conditional Gaussian.

[2]:

from scipy.stats import norm
import pandas as pd

class CondNormal(object):
    def __init__(self, m_1, m_2, s_1, s_2, p, zero=0.0000001):
        self.m_1 = m_1    # mean of the conditioned variable
        self.m_2 = m_2    # mean of the conditioning variable
        self.s_1 = s_1    # standard deviation of the conditioned variable
        self.s_2 = s_2    # standard deviation of the conditioning variable
        self.p = p        # correlation between the two variables
        self.zero = zero  # threshold for vanishingly small densities

    def pdf(self, a, b):
        # conditional mean and variance, per the formula in Section 7.1
        m = self.m_1 + (self.s_1 / self.s_2) * self.p * (a - self.m_2)
        s = (1.0 - np.power(self.p, 2.0)) * np.power(self.s_1, 2.0)
        p = norm.pdf(b, loc=m, scale=s)
        # take the log only when the density is above the zero threshold
        p = np.log(p) if p >= self.zero else p
        return p if pd.notna(p) else 0.0

    def empirical_log_proba(self, data):
        # total log score of the data under this conditional model
        return sum([self.pdf(a, b) for a, b in data])


Note the following.

• p_x_y is the model of $$X | Y = a$$ or $$Y \rightarrow X$$, denoted $$\mathcal{N}_{X|Y}$$

• p_y_x is the model of $$Y | X = a$$ or $$X \rightarrow Y$$, denoted $$\mathcal{N}_{Y|X}$$

[3]:

p_x_y = CondNormal(means[0], means[1], std[0][0], std[1][1], cor[0][1])
p_y_x = CondNormal(means[1], means[0], std[1][1], std[0][0], cor[1][0])


## 7.4. Evaluate models

Here, we evaluate the log probability of the data under each of the two models. Remember Bayes’ Theorem.

$$P(M|D) = \dfrac{P(D|M)P(M)}{P(D)}$$

where

• $$M$$ is the model (e.g. $$\mathcal{N}_{X|Y}$$ and $$\mathcal{N}_{Y|X}$$)

• $$D$$ is the data

$$P(D)$$ is the normalizing constant and drops out; $$P(M)$$ is assumed to be uniform and also drops out. Thus, the following.

$$P(M|D) \propto P(D|M)$$

In theory, the likelihood is

$$P(D|M) = P(d_1|M) \times P(d_2|M) \times \cdots \times P(d_n|M)$$

In practice, multiplying many small probabilities underflows to zero, so we score each model with its log-likelihood instead,

$$\log P(D|M) = \sum_i \log P(d_i|M)$$

where a higher score is better (here, all the scores will be negative since the log of $$x \in (0, 1]$$ is $$\leq 0$$). Because the log is monotone increasing, ranking models by log-likelihood is the same as ranking them by likelihood.
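
As a quick illustration of why we use logs, the product of many small likelihood terms underflows to zero in floating point while the sum of their logs stays finite. Below is a minimal sketch with made-up values.

import numpy as np

probs = np.full(1000, 1e-4)   # 1,000 small likelihood terms
print(np.prod(probs))         # 0.0 -- the raw product underflows
print(np.sum(np.log(probs)))  # about -9210.34 -- the log-likelihood is finite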

We see below that $$P(D|\mathcal{N}_{X|Y}) < P(D|\mathcal{N}_{Y|X})$$.

[4]:

p_x_y.empirical_log_proba([(v[1], v[0]) for v in data])

[4]:

-2262.259913305025

[5]:

p_y_x.empirical_log_proba([(v[0], v[1]) for v in data])

[5]:

-1407.7504048724836