7. Conditional Bivariate Gaussians
Let’s learn about conditional bivariate Gaussian distributions.
7.1. Distribution
For two jointly Gaussian variables, \(X_1\) and \(X_2\), the distribution of \(X_1\) given \(X_2 = a\) is defined as follows.
\(X_1 \mid X_2=a \sim \mathcal{N}\left( \mu_1 + \dfrac{\sigma_1}{\sigma_2}\rho(a - \mu_2), (1 - \rho^2)\sigma_1^2 \right)\),
where
\(\mu_1\) is the mean of \(X_1\)
\(\mu_2\) is the mean of \(X_2\)
\(\sigma_1\) is the standard deviation of \(X_1\)
\(\sigma_2\) is the standard deviation of \(X_2\)
\(\rho\) is the correlation between \(X_1\) and \(X_2\)
A couple of things to note here.
\(X_1 \mid X_2=a\) follows a univariate normal distribution
Notice the \(X_2=a\) part? It means that every density evaluation at \(X_1=b\) is conditioned on \(X_2=a\), as the sketch below shows
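To make the formula concrete, here is a minimal sketch; the parameter values (\(\mu_1 = 1\), \(\mu_2 = 2\), \(\sigma_1 = 1\), \(\sigma_2 = 2\), \(\rho = 0.8\)) and the points \(a\) and \(b\) are made up for illustration.
import numpy as np
from scipy.stats import norm

# made-up parameters for illustration
m_1, m_2 = 1.0, 2.0  # means of X_1 and X_2
s_1, s_2 = 1.0, 2.0  # standard deviations of X_1 and X_2
rho = 0.8            # correlation between X_1 and X_2

a = 3.0  # condition on X_2 = a
b = 1.5  # evaluate the density of X_1 at b

# conditional mean and variance from the formula above
m = m_1 + (s_1 / s_2) * rho * (a - m_2)
v = (1.0 - rho**2) * s_1**2

# norm.pdf takes the standard deviation as its scale
p = norm.pdf(b, loc=m, scale=np.sqrt(v))
print(p)  # density of X_1 = 1.5 given X_2 = 3.0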
7.2. Simulation
Let’s simulate two variables \(X\) and \(Y\).
\(X \sim \mathcal{N}(1, 1)\)
\(Y \sim \mathcal{N}(1 + 3.5 \times X, 1)\)
Note that \(Y\) depends on \(X\), but \(X\) does not depend on \(Y\). The dependency of \(Y\) on \(X\) implies \(P(Y|X) > P(X|Y)\) (the probability of \(Y\) given \(X\) should be greater than the probability of \(X\) given \(Y\)). This asymmetry, \(P(Y|X) \neq P(X|Y)\), is useful for reasoning about causality. Of course, we can manipulate the dependency and distributions so that \(P(Y|X) = P(X|Y)\), but, in real life, such relationships are rather rare.
[1]:
import numpy as np

np.random.seed(37)

N = 1000

# simulate X ~ N(1, 1) and Y ~ N(1 + 3.5X, 1)
x = np.random.normal(1, 1, N)
y = np.random.normal(1 + 3.5 * x, 1, N)

data = np.vstack([x, y]).T

# estimate summary statistics from the sample
means = data.mean(axis=0)
mins = data.min(axis=0)
maxs = data.max(axis=0)
cov = np.cov(data.T)
std = np.sqrt(cov)  # element-wise square root; the diagonal holds the standard deviations
cor = np.corrcoef(data.T)
print('means')
print(means)
print('')
print('mins')
print(mins)
print('')
print('maxs')
print(maxs)
print('')
print('stddev matrix')
print(std)
print('')
print('correlation matrix')
print(cor)
means
[1.01277839 4.52863965]
mins
[-2.31079823 -8.33142158]
maxs
[ 3.92919388 17.57679341]
stddev matrix
[[0.98156075 1.8353206 ]
[1.8353206 3.5714234 ]]
correlation matrix
[[1. 0.9608716]
[0.9608716 1. ]]
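As a sanity check (a sketch added here, not part of the original notebook), the correlation implied by the simulation can be derived in closed form. With \(Y = 1 + 3.5X + \epsilon\), \(\mathrm{Var}(X) = 1\), and \(\mathrm{Var}(\epsilon) = 1\), we get \(\mathrm{Cov}(X, Y) = 3.5\), \(\mathrm{Var}(Y) = 3.5^2 + 1 = 13.25\), and \(\rho = 3.5 / \sqrt{13.25} \approx 0.9615\), close to the estimated \(0.9609\).
import numpy as np

# Y = 1 + 3.5 X + e, with X ~ N(1, 1) and e ~ N(0, 1)
var_x = 1.0
var_e = 1.0
b = 3.5

cov_xy = b * var_x            # Cov(X, Y) = b Var(X)
var_y = b**2 * var_x + var_e  # Var(Y) = b^2 Var(X) + Var(e)
rho = cov_xy / np.sqrt(var_x * var_y)
print(rho)  # about 0.9615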
7.3. Modeling the conditional bivariate Gaussian
After we simulate the data, we can estimate the means, variances, standard deviations, and correlations from the data. Then, we can build a model of the conditional Gaussian. Below, we use a class, CondNormal, to model the conditional Gaussian.
[2]:
from scipy.stats import norm
import pandas as pd

class CondNormal(object):
    def __init__(self, m_1, m_2, s_1, s_2, p, zero=0.0000001):
        self.m_1 = m_1  # mean of the conditioned variable
        self.m_2 = m_2  # mean of the conditioning variable
        self.s_1 = s_1  # standard deviation of the conditioned variable
        self.s_2 = s_2  # standard deviation of the conditioning variable
        self.p = p      # correlation
        self.zero = zero  # threshold below which a density is treated as zero

    def pdf(self, a, b):
        # conditional mean and variance, per the formula in Section 7.1
        m = self.m_1 + (self.s_1 / self.s_2) * self.p * (a - self.m_2)
        s = (1.0 - np.power(self.p, 2.0)) * np.power(self.s_1, 2.0)
        # evaluate the normal density at b (the conditional variance is passed as the scale)
        p = norm.pdf(b, loc=m, scale=s)
        # take the log unless the density falls below the zero threshold
        p = np.log(p) if p >= self.zero else p
        # guard against NaNs
        return p if pd.notna(p) else 0.0

    def empirical_log_proba(self, data):
        # sum of log-densities over all (a, b) pairs
        return sum([self.pdf(a, b) for a, b in data])
Note the following.
p_x_y is the model of \(X | Y = a\) or \(Y \rightarrow X\), denoted \(\mathcal{N}_{X|Y}\)
p_y_x is the model of \(Y | X = a\) or \(X \rightarrow Y\), denoted \(\mathcal{N}_{Y|X}\)
[3]:
# argument order: (m_1, m_2, s_1, s_2, p)
p_x_y = CondNormal(means[0], means[1],
                   std[0][0], std[1][1], cor[0][1])
p_y_x = CondNormal(means[1], means[0],
                   std[1][1], std[0][0], cor[1][0])
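As a quick usage check (a sketch, not part of the original notebook), the conditional mean that p_y_x implies at \(X = 1\) should be close to the data-generating value \(1 + 3.5 \times 1 = 4.5\).
# conditional mean of Y given X = 1, using the estimated parameters
a = 1.0
m = p_y_x.m_1 + (p_y_x.s_1 / p_y_x.s_2) * p_y_x.p * (a - p_y_x.m_2)
print(m)  # should be close to 1 + 3.5 * 1 = 4.5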
7.4. Evaluate models
Here, we evaluate the log probability of the data given each of the two models. Remember Bayes’ Theorem.
\(P(M|D) = \dfrac{P(D|M)P(M)}{P(D)}\)
where
\(M\) is the model (e.g. \(\mathcal{N}_{X|Y}\) and \(\mathcal{N}_{Y|X}\))
\(D\) is the data
\(P(D)\) is the normalizing constant
Since \(P(D)\) drops out, and \(P(M)\) is assumed to be uniform and also drops out, we have the following.
\(P(M|D) \propto P(D|M)\)
In theory, the likelihood is
\(P(D|M) = P(d_1|M) \times P(d_2|M) \times \cdots \times P(d_n|M)\)
In practice, we score models with the log-likelihood, since the log is monotonic and the product of many small probabilities underflows to zero in floating point (see the sketch after this passage):
\(\log P(D|M) = \sum_i \log P(d_i|M)\)
where a higher score is better (all the scores will be negative since the log of \(x \in [0, 1]\) is \(\leq 0\)).
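To see the underflow concretely, here is a minimal sketch (not part of the original notebook; the numbers are made up): the product of a thousand probabilities of \(0.01\) underflows to zero, while the sum of their logs is perfectly representable.
import numpy as np

# 1000 likelihood terms of 0.01 each
probs = np.full(1000, 0.01)
print(np.prod(probs))         # 0.0 due to floating-point underflow
print(np.sum(np.log(probs)))  # about -4605.17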
We see below that \(P(D|\mathcal{N}_{X|Y}) < P(D|\mathcal{N}_{Y|X})\).
[4]:
p_x_y.empirical_log_proba([(v[1], v[0]) for v in data])
[4]:
-2262.259913305025
[5]:
p_y_x.empirical_log_proba([(v[0], v[1]) for v in data])
[5]:
-1407.7504048724836
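Since \(-1407.75 > -2262.26\), the data are more likely under \(\mathcal{N}_{Y|X}\) than under \(\mathcal{N}_{X|Y}\), which agrees with the direction \(X \rightarrow Y\) used to simulate the data.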