# 16. Gaussian Hidden Markov Models

Gaussian Hidden Markov Models, `GHHMs`

, are a type of HMMs where you have \(Z\) states generating a sequence \(X\) of values that are Gaussian distributed. \(Z\) is said to be `hidden`

since you never observe the state (at any particular time), and \(X\) is said to be observed since you can see the sequence. The hidden states have a transition matrix \(A\) that specifies the conditional probability of moving from the `i-th`

state to the `j-th`

state, denoted typically
as, \(a_{ij}\). The probability of an observation is Gaussian distributed with \(\theta\) parameters. Typically, you may have prior knowledge of the initial state of the system, and this is represented by a probability vector \(\pi\). A GHMM model, \(\lambda\), is defined as follows.

\(\lambda = (A, \theta, \pi)\)

We are usually interested in solving three fundamental problems with a HMM (including GHHM).

`evaluation`

: Given \(X\) and \(\lambda = (A, \theta, \pi)\), what is \(P(X|\lambda)\)?`recognition`

: Given \(X\) and \(\lambda = (A, \theta, \pi)\), what is the best hidden state sequence \(Z\) that best explains \(X\)?`training`

: Given \(X\), how do we learn the model parameters \(\lambda = (A, \theta, \pi)\) to maximize \(P(X|\lambda)\)?

These problems are well understood and there are canonical algorithms to solve each.

`evaluation`

: Evaluation is solved by the Foward-Backward algorithm.`recognition`

: Recognition is solved by dynamic programming algorithms known as Viterbi algorithm.`training`

: Training is solved by by the Baum-Welch algorithm, a special instance of the Expectation-Maximization (EM) algorithm.

## 16.1. Data

Let’s sample data from a fictitious GHHM.

\(A = \left[\begin{matrix}0.7 & 0.3 \\ 0.4 & 0.6\end{matrix}\right]\)

\(\theta = [\mathcal N(1, 1), \mathcal N(5, 1)]\)

\(\pi = [0.6, 0.4]\)

We will trace the hidden states in \(O_z\) and the observed values in \(O_x\).

```
[1]:
```

```
import numpy as np
from scipy.stats import norm
from random import choice
import bisect
import pandas as pd
np.random.seed(37)
p = np.array([0.6, 0.4])
Z = np.array([[0.7, 0.3], [0.4, 0.6]])
A = Z.cumsum(axis=1)
T = [norm(1, 1), norm(5, 1)]
O_z = []
O_x = []
n_iters = 100_000
z = bisect.bisect_left(p.cumsum(), np.random.random())
x = T[z].rvs()
for it in range(n_iters):
O_z.append(z)
O_x.append(x)
z = bisect.bisect_left(A[z], np.random.random())
x = T[z].rvs()
O_z = np.array(O_z)
O_x = np.array(O_x)
```

By tracing the sequence of hidden states in \(O_z\), we can compute the marginal probabilties (also called the unconditional probabilities) of the states in \(Z\).

```
[2]:
```

```
s = pd.Series(O_z).value_counts().sort_index()
s / s.sum()
```

```
[2]:
```

```
0 0.57421
1 0.42579
dtype: float64
```

Here’s a snippet of the sequence of \(Z\) in \(O_z\).

```
[3]:
```

```
O_z
```

```
[3]:
```

```
array([1, 1, 1, ..., 0, 1, 1])
```

Here’s a snippet of the sequence of \(X\) in \(O_x\).

```
[4]:
```

```
O_x
```

```
[4]:
```

```
array([ 3.62328106, 4.83910832, 4.3358552 , ..., -0.55832225,
3.61938788, 4.89798618])
```

We can visualize the distribution of \(O_x\). Clearly, the resulting density estimation of \(X\) is a Gaussian mixture.

```
[5]:
```

```
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
_ = pd.Series(O_x).plot(kind='kde', title='Density Estimation of X')
```

## 16.2. Training

We will use hmmlearn to illustrate how to solve the three fundamental problems above. First, we will learn the model using the `fit()`

function. The model is specified with `n_components=2`

to represent the number of hidden states, but, typically, we do not how many hidden states there actually are. In this running example, we know the number of hidden states because we created the GHMM.

```
[6]:
```

```
from hmmlearn import hmm
Q = O_x.reshape(O_x.shape[0], 1)
model = hmm.GaussianHMM(n_components=2, n_iter=2_000).fit(Q)
mus = np.ravel(model.means_)
sigmas = np.ravel(np.sqrt([np.diag(c) for c in model.covars_]))
P = model.transmat_
```

This result shows the estimated \(\theta\).

```
[7]:
```

```
mus, sigmas
```

```
[7]:
```

```
(array([1.00006561, 4.98655358]), array([1.00133963, 1.00569075]))
```

Here is the estimated \(A\).

```
[8]:
```

```
P
```

```
[8]:
```

```
array([[0.70078564, 0.29921436],
[0.40153297, 0.59846703]])
```

In general, the order of the elements in the estimated \(\theta\) and \(A\) will not correspond to the original ones. We simply got `lucky`

here that the states/elements corresponded!

## 16.3. Recognition

Recognition is solved using the `predict()`

method, which predicts the sequence of hidden state in \(Z\) from the sequence of observations in \(X\).

```
[9]:
```

```
hidden_states = model.predict(Q)
s = pd.Series(hidden_states).value_counts().sort_index()
s / s.sum()
```

```
[9]:
```

```
0 0.57365
1 0.42635
dtype: float64
```

You can also use the `decode()`

method too, which will return `(score, hidden state sequence)`

.

```
[10]:
```

```
_, hidden_states = model.decode(Q)
s = pd.Series(hidden_states).value_counts().sort_index()
s / s.sum()
```

```
[10]:
```

```
0 0.57365
1 0.42635
dtype: float64
```

## 16.4. Evaluation

The `score()`

method will return the log probability of the observed sequence.

```
[11]:
```

```
model.score(Q)
```

```
[11]:
```

```
-200384.94195070988
```