11. Dynamic Bayesian Networks, Hidden Markov Models

A Hidden Markov Model (HMM) is a special type of Bayesian Network (BN) called a Dynamic Bayesian Network (DNB). We will show how the two are related. A HMM may be represented in either matrix form for computation for as a graph for understanding the states and transitions. A DBN is a BN used to model time series data and can be used to model a HMM.

11.1. Hidden Markov Model

Below is a HMM with \(T\) transition matrix and \(E\) emission matrix. \(T\) specifies the transition probabilities between the hidden states and \(E\) specifies the output probabilities given the hidden state. Note that \(T\) is the hidden state of the weather (rainy or sunny) and \(E\) is the observation/output of the activity (walk, shop and clean).

[1]:
import numpy as np

T = np.array([[0.7, 0.3], [0.4, 0.6]]).cumsum(axis=1)

E = np.array([
    [0.1, 0.4, 0.5],
    [0.6, 0.3, 0.1]
]).cumsum(axis=1)

After we define \(T\) and \(E\), we can sample from \(T\) then \(E\) and keep count of the number of times we observe each hidden state and observed output. What we get at the end (after normalization) is the marginal probabilities of the hidden states and outputs. Note how we do not need the start probabilities and we can just pick a random starting hidden state.

[2]:
from random import choice
import bisect

np.random.seed(37)

C_t = np.zeros(T.shape[1])
C_e = np.zeros(E.shape[1])

n_iters = 100_000

t = choice(range(T.shape[0]))
e = bisect.bisect_left(E[t], np.random.random())

for it in range(n_iters):
    C_t[t] += 1
    C_e[e] += 1

    t_p, e_p = np.random.random(2)

    t = bisect.bisect_left(T[t], t_p)
    e = bisect.bisect_left(E[t], e_p)

C_t = C_t / C_t.sum()
C_e = C_e / C_e.sum()

C_t, C_e
[2]:
(array([0.57367, 0.42633]), array([0.31635, 0.35542, 0.32823]))

11.2. Dynamic Bayesian Network

We can model a HMM with a DBN. The DBN structure is endless, but we do not have to create an endless DBN structure. In fact, we only need the structure of the DNB at two consecutive time slices, and we assume that the structure within a time slice does not change (intra-relationship) and the relationship from the current to the next time slice does not change (inter-relationship). When we represent a DBN with two time slices, the DBN is said to be rolled up, but when we perform inference, we can unroll the DBN.

After we define the DBN, we observe the marginal probabilities of the hidden and output states at time 0, \(t_0\), and time 1, \(t_1\).

[3]:
from pybbn.graph.dag import Bbn
from pybbn.graph.edge import Edge, EdgeType
from pybbn.graph.jointree import EvidenceBuilder
from pybbn.graph.node import BbnNode
from pybbn.graph.variable import Variable
from pybbn.pptc.inferencecontroller import InferenceController

# create the nodes
w_curr = BbnNode(Variable(0, 'weather', ['rainy', 'sunny']), [0.6, 0.4])
a_curr = BbnNode(Variable(1, 'activity', ['walk', 'shop', 'clean']), [0.1, 0.4, 0.5, 0.6, 0.3, 0.1])
w_next = BbnNode(Variable(2, 'weather_next', ['rainy', 'sunny']), [0.7, 0.3, 0.4, 0.6])
a_next = BbnNode(Variable(3, 'activity_next', ['walk', 'shop', 'clean']), [0.1, 0.4, 0.5, 0.6, 0.3, 0.1])

# create the network structure
bbn = Bbn() \
    .add_node(w_curr) \
    .add_node(a_curr) \
    .add_node(w_next) \
    .add_node(a_next) \
    .add_edge(Edge(w_curr, a_curr, EdgeType.DIRECTED)) \
    .add_edge(Edge(w_curr, w_next, EdgeType.DIRECTED)) \
    .add_edge(Edge(w_next, a_next, EdgeType.DIRECTED))

# convert the BBN to a join tree
join_tree = InferenceController.apply(bbn)

for node in [w_curr, w_next, a_curr, a_next]:
    potential = join_tree.get_bbn_potential(node)
    print(node)
    print(potential)
    print('-' * 15)
0|weather|rainy,sunny
0=rainy|0.60000
0=sunny|0.40000
---------------
2|weather_next|rainy,sunny
2=rainy|0.58000
2=sunny|0.42000
---------------
1|activity|walk,shop,clean
1=walk|0.30000
1=shop|0.36000
1=clean|0.34000
---------------
3|activity_next|walk,shop,clean
3=walk|0.31000
3=shop|0.35800
3=clean|0.33200
---------------

We can perform inference throughout time on a DBN by using the probabilities of the next hidden states as the new start probabilities to the next time slice. Only the start probabilities propagate and the other conditional probabilities remain fixed. In the code below, we unroll the DBN from \(t_0\) to \(t_{20}\).

[4]:
import pandas as pd

def get_probs(jt, node):
    entries = jt.get_bbn_potential(node).entries
    return [e.value for e in entries]

p = get_probs(join_tree, w_curr)
jt = InferenceController.reapply(join_tree, {w_curr.id: p})

probs = [p]

for _ in range(20):
    p = get_probs(jt, w_next)
    jt = InferenceController.reapply(jt, {w_curr.id: p})

    probs.append(p)

pd.DataFrame(probs, columns=['rainy', 'sunny'])
[4]:
rainy sunny
0 0.600000 0.400000
1 0.580000 0.420000
2 0.574000 0.426000
3 0.572200 0.427800
4 0.571660 0.428340
5 0.571498 0.428502
6 0.571449 0.428551
7 0.571435 0.428565
8 0.571430 0.428570
9 0.571429 0.428571
10 0.571429 0.428571
11 0.571429 0.428571
12 0.571429 0.428571
13 0.571429 0.428571
14 0.571429 0.428571
15 0.571429 0.428571
16 0.571429 0.428571
17 0.571429 0.428571
18 0.571429 0.428571
19 0.571429 0.428571
20 0.571429 0.428571

As you can see, the marginal probabilities for the hidden and observed states have stabilized.

[5]:
for node in [w_curr, w_next, a_curr, a_next]:
    potential = jt.get_bbn_potential(node)
    print(node)
    print(potential)
    print('-' * 15)
0|weather|rainy,sunny
0=rainy|0.57143
0=sunny|0.42857
---------------
2|weather_next|rainy,sunny
2=rainy|0.57143
2=sunny|0.42857
---------------
1|activity|walk,shop,clean
1=walk|0.31429
1=shop|0.35714
1=clean|0.32857
---------------
3|activity_next|walk,shop,clean
3=walk|0.31429
3=shop|0.35714
3=clean|0.32857
---------------