9. Causal Inference
Let’s learn about causal inference using Bayesian Belief Networks (BBN).
[1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import networkx as nx
import warnings
plt.style.use('seaborn')
warnings.filterwarnings('ignore')
np.random.seed(37)
9.1. Gender, drug and recovery
Imagine that we have an illness or disease, and that some patients were given a drug as treatment while others were not. We kept good records of three variables.
gender: whether the patient was female or male
drug: whether the patient took the drug or not
recovery: whether the patient recovered from the illness
We can model the interactions of these variables using a BBN.
9.1.1. Structure
The structure of a BBN is a directed acyclic graph (DAG). We will visualize the structure later, but for now, the relationships in the model are as follows.
gender \(\rightarrow\) drug
gender \(\rightarrow\) recovery
drug \(\rightarrow\) recovery
9.1.2. Parameters
For each variable, we need to define local probability models, and, here, they come in the form of conditional probability tables.
[2]:
gender_probs = [0.49, 0.51]
print(gender_probs)
[0.49, 0.51]
[3]:
drug_probs = [0.23323615160349853, 0.7667638483965015,
              0.7563025210084033, 0.24369747899159663]
print(drug_probs[0:2])
print(drug_probs[2:4])
[0.23323615160349853, 0.7667638483965015]
[0.7563025210084033, 0.24369747899159663]
[4]:
recovery_probs = [0.31000000000000005, 0.69,
                  0.27, 0.73,
                  0.13, 0.87,
                  0.06999999999999995, 0.93]
print(recovery_probs[0:2])
print(recovery_probs[2:4])
print(recovery_probs[4:6])
print(recovery_probs[6:8])
[0.31000000000000005, 0.69]
[0.27, 0.73]
[0.13, 0.87]
[0.06999999999999995, 0.93]
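A note on how these flat lists are read (the ordering below follows from how the numbers are used later in this notebook; treat it as a working assumption about the CPT layout). Each list is laid out row by row over the parent configurations, with the node’s own states within each row.
gender_probs encodes \(P(\text{gender})\): [female, male]
drug_probs encodes \(P(\text{drug} \mid \text{gender})\): one row per gender, [false, true] within each row, e.g. \(P(\text{drug}=\text{true} \mid \text{gender}=\text{female}) \approx 0.767\)
recovery_probs encodes \(P(\text{recovery} \mid \text{gender}, \text{drug})\): rows ordered (female, false), (female, true), (male, false), (male, true), e.g. \(P(\text{recovery}=\text{true} \mid \text{gender}=\text{male}, \text{drug}=\text{true}) = 0.93\)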
9.1.3. Bayesian network
With the structure and parameters defined, now we can build a BBN.
[5]:
from pybbn.graph.dag import Bbn
from pybbn.graph.edge import Edge, EdgeType
from pybbn.graph.jointree import EvidenceBuilder
from pybbn.graph.node import BbnNode
from pybbn.graph.variable import Variable
from pybbn.pptc.inferencecontroller import InferenceController
X = BbnNode(Variable(1, 'drug', ['false', 'true']), drug_probs)
Y = BbnNode(Variable(2, 'recovery', ['false', 'true']), recovery_probs)
Z = BbnNode(Variable(0, 'gender', ['female', 'male']), gender_probs)
bbn = Bbn() \
    .add_node(X) \
    .add_node(Y) \
    .add_node(Z) \
    .add_edge(Edge(Z, X, EdgeType.DIRECTED)) \
    .add_edge(Edge(Z, Y, EdgeType.DIRECTED)) \
    .add_edge(Edge(X, Y, EdgeType.DIRECTED))
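As an optional sanity check on the parameters, we can run exact inference on this network right away. This is a small sketch using the InferenceController imported above; the marginal for recovery should come out at roughly 0.80, consistent with the sampled data we look at below.
# sketch: exact inference on the original (unmanipulated) network
jt = InferenceController.apply(bbn)
print(jt.get_posteriors())  # recovery=true should be roughly 0.80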
9.1.4. Visualize Bayesian network structure
Let’s see the structure of this network.
[6]:
g, labels = bbn.to_nx_graph()
[7]:
fig, ax = plt.subplots(figsize=(10, 5))
pos = nx.nx_agraph.graphviz_layout(g, prog='dot', args='-Kdot')
params = {
    'node_color': 'r',
    'node_size': 1000,
    'node_shape': 'o',
    'alpha': 0.5,
    'pos': pos,
    'ax': ax
}
_ = nx.drawing.nx_pylab.draw_networkx_nodes(g, **params)
params = {
    'labels': labels,
    'font_size': 20,
    'font_color': 'k',
    'font_family': 'monospace',
    'font_weight': 'bold',
    'pos': pos,
    'ax': ax
}
_ = nx.drawing.nx_pylab.draw_networkx_labels(g, **params)
params = {
    'width': 1.5,
    'alpha': 0.8,
    'edge_color': 'b',
    'arrowsize': 30,
    'pos': pos,
    'ax': ax
}
_ = nx.drawing.nx_pylab.draw_networkx_edges(g, **params)
_ = ax.set_title('Gender, drug and recovery')
plt.tight_layout()
9.2. Sample data
Let’s sample some data from the BBN and observe some summary statistics about the sampled data. Additionally, the data will help us answer whether taking the drug really helps recovery.
[8]:
from pybbn.sampling.sampling import LogicSampler
sampler = LogicSampler(bbn)
data = pd.DataFrame(sampler.get_samples(n_samples=10000, seed=37))
data = data.rename(columns=labels)
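To get a feel for the sample, we can optionally peek at the first few rows and tabulate recovery rates by gender and drug; this is a small sketch using plain pandas on the data frame we just built.
# sketch: inspect the sampled data and the recovery rate for each (gender, drug) combination
print(data.head())
print(pd.crosstab([data.gender, data.drug], data.recovery, normalize='index'))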
9.3. Simpson’s paradox
Overall, the sampled population recovered at a high rate (80%).
[9]:
s = data.recovery.value_counts().sort_index()
s /= s.sum()
fig, ax = plt.subplots(figsize=(7, 4))
_ = s.plot.bar(ax=ax, color=['b', 'r'])
_ = ax.set_title('General recovery')
_ = ax.set_ylabel('%')
If we segment recovery by whether or not a patient takes the drug, we find that
82% of patients recovered if they did NOT take the drug, compared to 78% if they did take the drug (not taking the drug appears to increase your chance of recovery), and
18% of patients did NOT recover if they did NOT take the drug, compared to 22% if they did take the drug (not taking the drug appears to decrease your chance of not recovering).
This comparison might lead one to conclude that NOT taking the drug gives us a better chance of recovery.
[10]:
s0 = data[data.drug == 'false'].recovery.value_counts().sort_index()
s0 /= s0.sum()
s1 = data[data.drug == 'true'].recovery.value_counts().sort_index()
s1 /= s1.sum()
fig, ax = plt.subplots(figsize=(7, 4))
_ = pd.DataFrame(
    [[s0[0], s1[0]], [s0[1], s1[1]]],
    columns=['drug=false', 'drug=true'],
    index=['recovered=false', 'recovered=true']).plot.bar(ax=ax, color=['b', 'r'])
Now, we segment based on gender. Notice that
for both females and males, taking the drug did increase recovery, and
for both females and males, taking the drug did decrease non-recovery.
This observation contradicts the one we had before segmenting on gender. Before, not taking the drug appeared helpful; now, after segmenting on gender, not taking the drug appears harmful. This contradiction is Simpson's paradox. So is taking the drug helpful or harmful?
[11]:
df = data[data.gender == 'female']
s0 = df[df.drug == 'false'].recovery.value_counts().sort_index()
s0 /= s0.sum()
s1 = df[df.drug == 'true'].recovery.value_counts().sort_index()
s1 /= s1.sum()
f_df = pd.DataFrame(
    [[s0[0], s1[0]], [s0[1], s1[1]]],
    columns=['drug=false', 'drug=true'],
    index=['recovered=false', 'recovered=true'])
df = data[data.gender == 'male']
s0 = df[df.drug == 'false'].recovery.value_counts().sort_index()
s0 /= s0.sum()
s1 = df[df.drug == 'true'].recovery.value_counts().sort_index()
s1 /= s1.sum()
m_df = pd.DataFrame(
    [[s0[0], s1[0]], [s0[1], s1[1]]],
    columns=['drug=false', 'drug=true'],
    index=['recovered=false', 'recovered=true'])
fig, ax = plt.subplots(1, 2, figsize=(15, 4))
_ = f_df.plot.bar(ax=ax[0], color=['b', 'r'])
_ = m_df.plot.bar(ax=ax[1], color=['b', 'r'])
_ = ax[0].set_title('Female recovery')
_ = ax[1].set_title('Male recovery')
9.4. do-operator
Dr. Judea Pearl says that we can find out whether taking the drug really helps or harms recovery (direction) and by how much (magnitude). Dr. Pearl starts by distinguishing between conditional and interventional probabilities.
The conditional probabilities are what we are observing and interpreting above. Conditional probabilities are the result of merely filtering through the data.
The interventional probabilities are the ones that result if we can somehow manipulate or intervene in the treatment process. Interventional probabilities are the result of changing the system/process, and thus, the data.
As Dr. Pearl puts it:
When we condition on a variable, we change nothing; we merely narrow our focus to the subset of cases in which the variable takes the value we are interested in. What changes, then, is our perception about the world, not the world itself.
Among many other amazing results, his causal calculus is able to frame interventional probabilities in terms of conditional probabilities, and it directly addresses causal inference with observational data (which is quite often controversial to use for drawing causal conclusions, as opposed to experimental data). The main idea behind causal inference, or estimating causal impact, is the do-operator. The do-operator is the simplest thing to do (no pun intended), but it takes some careful and critical thinking to see why it actually works. In a nutshell, the do-operator is a graphical procedure where we remove the incoming links to the node (e.g. drug) whose causal impact on another node (e.g. recovery) we are trying to estimate (removing the links from that node's parents). This operation is called graph surgery, as we are cutting the graph and removing edges. Once we have the manipulated graph from the do-operation, we are able to estimate causal impact. Note that the do-operator enables us to estimate causal impact because it removes spurious paths.
With the do-operator, we can estimate causal impact as follows (Pearl calls this equation the adjustment formula).
\(P_m(Y=y|X=x) = P(Y=y|\text{do}(X=x)) = \displaystyle \sum_z P(Y=y|X=x, Z=z)P(Z=z)\)
Where
\(P_m(Y=y|X=x)\) is the interventional probability of \(Y\) given \(X\) (uses manipulated model)
In our running example,
\(X\) is drug,
\(Y\) is recovery, and
\(Z\) is gender.
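The formula holds because the graph surgery removes only the edge into drug: in the manipulated model, gender's distribution is unchanged, \(P_m(Z=z) = P(Z=z)\), and recovery still responds to its parents in the same way, \(P_m(Y=y|X=x, Z=z) = P(Y=y|X=x, Z=z)\); summing over \(z\) then yields the adjustment formula.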
Again, note that the interventional probability \(P_m\) is defined entirely in terms of \(P\), the conditional probability (the right-hand side involves only quantities from the original, unmanipulated model). Wow! Since we have a way to compute \(P_m\) in terms of \(P\), we do not really have to perform graph surgery. We can leave the graph structure, and its edges, as they are. What we need is to find nodes to condition on that block the spurious paths. The backdoor criterion can help us identify which set of variables to focus on. Stated by Pearl:
Backdoor Criterion: Given an ordered pair of variables (X, Y) in a directed acyclic graph G, a set of variables Z satisfies the backdoor criterion relative to (X, Y) if no node in Z is a descendant of X, and Z blocks every path between X and Y that contains an arrow into X.
Equivalently, a set of variables \(Z\) that contains no descendants of \(X\) satisfies the backdoor criterion if \(Z\) d-separates \(X\) and \(Y\) in the graph obtained by removing the edges going out of \(X\).
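As a rough sanity check, we can test this criterion for \(Z = \{\text{gender}\}\) relative to (drug, recovery) on the networkx graph g built earlier. This is only a sketch; it assumes g is a directed graph and that your networkx version provides nx.d_separated.
# sketch: check the backdoor criterion for Z={gender} relative to (drug, recovery)
# assumes the DiGraph `g` and the id-to-name mapping `labels` from bbn.to_nx_graph()
name_to_id = {name: node_id for node_id, name in labels.items()}
x, y, z = name_to_id['drug'], name_to_id['recovery'], name_to_id['gender']

# (1) no node in Z is a descendant of X
assert z not in nx.descendants(g, x)

# (2) Z blocks every backdoor path: remove the edges going out of X, then Z must
#     d-separate X and Y in the resulting graph (requires networkx >= 2.4)
g_cut = g.copy()
g_cut.remove_edges_from(list(g_cut.out_edges(x)))
assert nx.d_separated(g_cut, {x}, {y}, {z})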
9.5. Causal impact of drug on recovery
Let’s see if we can use the adjustment formula to estimate the causal impact of drug on recovery. The data we sampled earlier will help us compute the conditional and marginal probabilities required. In expanded form, the adjustment formula looks like the following.
\(P(Y=1|\text{do}(X=1)) = P(Y=1|X=1,Z=1)P(Z=1) + P(Y=1|X=1,Z=0)P(Z=0)\)
\(P(Y=1|\text{do}(X=0)) = P(Y=1|X=0,Z=1)P(Z=1) + P(Y=1|X=0,Z=0)P(Z=0)\)
\(P(Y=1|\text{do}(X=1))\) is the interventional probability of recovery given that we have taken the drug, and \(P(Y=1|\text{do}(X=0))\) is the interventional probability of recovery given that we have NOT taken the drug.
The average causal effect (ACE) is given as follows.
\(\text{ACE} = P(Y=1|\text{do}(X=1)) - P(Y=1|\text{do}(X=0))\)
The ACE is the expected causal impact on recovery of using the drug.
[12]:
def get_prob(q):
    return data.query(q).shape[0] / N

N = data.shape[0]
n = get_prob('recovery=="true" and drug=="true" and gender=="male"')
d = get_prob('drug=="true" and gender=="male"')
p_y1_x1z1 = n / d
n = get_prob('recovery=="true" and drug=="true" and gender=="female"')
d = get_prob('drug=="true" and gender=="female"')
p_y1_x1z0 = n / d
p_z1 = data[data.gender == 'male'].shape[0] / N
p_z0 = data[data.gender == 'female'].shape[0] / N
do_y1x1 = p_y1_x1z1 * p_z1 + p_y1_x1z0 * p_z0
print(f'P(Y=1|do(X=1)) = {do_y1x1:.5f}')
P(Y=1|do(X=1)) = 0.82931
[13]:
n = get_prob('recovery=="true" and drug=="false" and gender=="male"')
d = get_prob('drug=="false" and gender=="male"')
p_y1_x0z1 = n / d
n = get_prob('recovery=="true" and drug=="false" and gender=="female"')
d = get_prob('drug=="false" and gender=="female"')
p_y1_x0z0 = n / d
do_y1x0 = p_y1_x0z1 * p_z1 + p_y1_x0z0 * p_z0
print(f'P(Y=1|do(X=0)) = {do_y1x0:.5f}')
P(Y=1|do(X=0)) = 0.77281
[14]:
ace = do_y1x1 - do_y1x0
print(f'ACE = {ace:.5f}')
ACE = 0.05650
We get an increase of about 6% (direction and magnitude) in recovery if we take the drug. Notice that we did not have to actually perform graph surgery or apply the do-operator; guided by the backdoor criterion, we used only conditional and marginal probabilities.
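For reference, the three cells above can be folded into one small helper over the sampled data frame. This is just a sketch; the names p and do_x are ours, not part of pybbn.
# sketch: adjustment formula over the sampled data; `p` and `do_x` are illustrative helpers
def p(q):
    return data.query(q).shape[0] / data.shape[0]

def do_x(x):
    # P(Y=1|do(X=x)) = sum_z P(Y=1|X=x, Z=z) P(Z=z)
    return sum(
        p(f'recovery=="true" and drug=="{x}" and gender=="{z}"') /
        p(f'drug=="{x}" and gender=="{z}"') * p(f'gender=="{z}"')
        for z in ['female', 'male'])

print(f'ACE = {do_x("true") - do_x("false"):.5f}')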
9.6. Manipulated graph
Alternatively, we can actually apply the do-operator to get a manipulated graph and simply compute the interventional probabilities directly.
[15]:
# notice the parameter change in X (drug or drug probabilities)
gender_probs = [0.49, 0.51]
drug_probs = [0.5, 0.5]
recovery_probs = [0.31000000000000005, 0.69,
                  0.27, 0.73,
                  0.13, 0.87,
                  0.06999999999999995, 0.93]
X = BbnNode(Variable(1, 'drug', ['false', 'true']), drug_probs)
Y = BbnNode(Variable(2, 'recovery', ['false', 'true']), recovery_probs)
Z = BbnNode(Variable(0, 'gender', ['female', 'male']), gender_probs)
# notice the missing edge Z -> X
bbn = Bbn() \
    .add_node(X) \
    .add_node(Y) \
    .add_node(Z) \
    .add_edge(Edge(Z, Y, EdgeType.DIRECTED)) \
    .add_edge(Edge(X, Y, EdgeType.DIRECTED))
join_tree = InferenceController.apply(bbn)
join_tree.get_posteriors()
[15]:
{'drug': {'false': 0.5, 'true': 0.5},
'recovery': {'false': 0.1931, 'true': 0.8069},
'gender': {'female': 0.49, 'male': 0.51}}
\(P(Y=1|\text{do}(X=1)) = 0.832\)
[16]:
ev = EvidenceBuilder() \
    .with_node(join_tree.get_bbn_node_by_name('drug')) \
    .with_evidence('true', 1.0) \
    .build()
join_tree.unobserve_all()
join_tree.set_observation(ev)
join_tree.get_posteriors()
[16]:
{'drug': {'false': 0.0, 'true': 1.0},
'recovery': {'false': 0.16799999999999998, 'true': 0.8320000000000001},
'gender': {'female': 0.49, 'male': 0.51}}
\(P(Y=1|\text{do}(X=0)) = 0.782\)
[17]:
ev = EvidenceBuilder() \
    .with_node(join_tree.get_bbn_node_by_name('drug')) \
    .with_evidence('false', 1.0) \
    .build()
join_tree.unobserve_all()
join_tree.set_observation(ev)
join_tree.get_posteriors()
[17]:
{'drug': {'false': 1.0, 'true': 0.0},
'recovery': {'false': 0.21820000000000003, 'true': 0.7817999999999999},
'gender': {'female': 0.49, 'male': 0.51}}
\(\text{ACE} = P(Y=1|\text{do}(X=1)) - P(Y=1|\text{do}(X=0)) = 0.8320 - 0.7818 \approx 0.05\), which is not too far off from the estimate we obtained with the adjustment formula on the sampled data.
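For convenience, the two evidence-setting cells above can be wrapped in a small helper that returns the interventional probability of recovery for either drug value. This sketch reuses only the pybbn calls already shown; the name p_recovery_given_do is our own.
# sketch: interventional probability of recovery from the manipulated join tree
def p_recovery_given_do(drug_value):
    ev = EvidenceBuilder() \
        .with_node(join_tree.get_bbn_node_by_name('drug')) \
        .with_evidence(drug_value, 1.0) \
        .build()
    join_tree.unobserve_all()
    join_tree.set_observation(ev)
    return join_tree.get_posteriors()['recovery']['true']

ace = p_recovery_given_do('true') - p_recovery_given_do('false')
print(f'ACE = {ace:.5f}')  # about 0.05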