2. Differential Diagnosis of COVID-19 with Bayesian Belief Networks
Let’s see if a Bayesian Belief Network (BBN) is able to diagnose the COVID-19 virus with any reasonable success. The idea is that a patient presents some symptoms, and we must reason diagnostically from the symptoms back to the cause. The BBN is taken from BayesiaLab’s Differential Diagnosis model.
2.1. Data
The data is taken from the COVID-19 open line list (the Hubei and outside-Hubei sheets). We will first load both sets of data and concatenate them.
[1]:
import pandas as pd
inside = pd.read_csv('./covid/data/00/COVID19_2020_open_line_list - Hubei.csv', low_memory=False)
outside = pd.read_csv('./covid/data/00/COVID19_2020_open_line_list - outside_Hubei.csv', low_memory=False)
outside = outside.drop(['data_moderator_initials'], axis=1)
data = pd.concat([inside, outside])
2.2. Data Transformation
We will apply transformations to the data, primarily on the symptoms. There are only about 200 unique symptom strings across all the COVID-19 patients. We map these 200 unique symptoms, in a many-to-many fashion, to the 32 broad symptom categories listed below (an illustrative sketch of the mapping structure follows the list).
abdominal_pain
anorexia
anosmia
chest
chills
coronary
diarrhoea
digestive
discharge
dizziness
dry_cough
dryness
dyspnea
eye
fatigue
fever
headache
lungs
malaise
mild
muscle
myelofibrosis
nasal
nausea
respiratory
running_nose
sneezing
sore_throat
sputum
sweating
walking
wheezing
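The mapping itself lives in symptom-mapping.json, whose exact contents are not reproduced here. The snippet below is only a hypothetical illustration of its many-to-many structure, assuming each broad category maps to the raw symptom strings it covers, and that a raw string may appear under more than one category.

# Hypothetical illustration of the structure of symptom-mapping.json;
# the category names are real, the raw strings are made up for illustration.
symptom_map_example = {
    'dry_cough': ['dry cough', 'cough (dry)'],
    'respiratory': ['cough (dry)', 'shortness of breath'],
    'sore_throat': ['sore throat', 'throat pain'],
}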
[2]:
import json
import itertools
from datetime import datetime
with open('./covid/data/00/symptom-mapping.json', 'r') as f:
    symptom_map = json.load(f)

def tokenize(s):
    # split a raw symptom string into lower-cased tokens; missing values map to None
    if s is None or isinstance(s, float) or len(s) < 1 or pd.isna(s):
        return None
    try:
        delim = ';' if ';' in s else ','
        return [t.strip().lower() for t in s.split(delim) if len(t.strip()) > 0]
    except:
        return s

def map_to_symptoms(s):
    # map a single raw symptom token to one or more broad categories
    if s.startswith('fever') or s.startswith('low fever'):
        return ['fever']
    return [k for k, v in symptom_map.items() if s in v]

d = data[['symptoms']].dropna(how='all').copy(deep=True)
print(d.shape)

d.symptoms = d.symptoms.apply(lambda s: tokenize(s))
d.symptoms = d.symptoms.apply(lambda tokens: [map_to_symptoms(s) for s in tokens] if tokens is not None else None)
d.symptoms = d.symptoms.apply(lambda arrs: None if arrs is None else list(itertools.chain(*arrs)))

# one indicator column per broad symptom category
for s in symptom_map.keys():
    d[s] = d.symptoms.apply(lambda arr: 0 if arr is None else 1 if s in arr else 0)

d = d.drop(['symptoms'], axis=1)
print(d.shape)
(656, 1)
(656, 32)
[3]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
plt.style.use('seaborn')
v = [d[d[c] == 1].shape[0] for c in d.columns]
s = pd.Series(v, d.columns)
fig, ax = plt.subplots(figsize=(15, 5))
_ = s.plot(kind='bar', ax=ax, title=f'Frequency of symptoms, n={d.shape[0]}')
plt.tight_layout()
2.3. Bayesian Belief Network
The BBN structure assumes the symptoms are conditionally independent of one another given the disease (a naïve structure), and we know this assumption is wrong. However, if we do not make this independence assumption, there are many more parameters to estimate and/or elicit. As for the parameters, according to the original authors of this BBN, they are taken from a variety of sources.
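To make the independence assumption concrete, the posterior over the disease given observed symptoms \(s_1, \ldots, s_k\) factorizes roughly as

\[ P(\text{disease} \mid s_1, \ldots, s_k) \propto P(\text{disease}) \prod_{i=1}^{k} P(s_i \mid \text{disease}), \]

where the prior \(P(\text{disease})\) is itself obtained by marginalizing over the flu_shot node. This is why only one conditional probability table per symptom needs to be specified.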
The following are the symptom variables (or nodes) in the BBN.
anosmia
chills
diarrhoea
dry_cough
dyspnea
fatigue
fever
headache
muscle
nasal
nausea
running_nose
sneezing
sore_throat
sputum
wheezing
In addition to these symptom nodes, the BBN has a disease node and a flu_shot node.
2.3.1. BBN structure
[4]:
from pybbn.graph.dag import Bbn
from pybbn.pptc.inferencecontroller import InferenceController
import json
with open('./covid/naive.json', 'r') as f:
    bbn = Bbn.from_dict(json.load(f))

join_tree = InferenceController.apply(bbn)
The following shows the BBN structure. The disease node points to all the symptoms, and the flu_shot node points to the disease node. The disease node has the following values/states.
no_virus
rhinovirus
hmpv (Metapneumovirus)
hrsv (Respiratory syncytial)
influenza
covid19 (COVID-19)
[5]:
from pybbn.generator.bbngenerator import convert_for_drawing
import networkx as nx
import warnings
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    graph = convert_for_drawing(bbn)
    pos = nx.nx_agraph.graphviz_layout(graph, prog='neato')

    plt.figure(figsize=(15, 8))
    plt.subplot(121)
    labels = dict([(k, node.variable.name) for k, node in bbn.nodes.items()])
    nx.draw(graph, pos=pos, with_labels=True, labels=labels)
    plt.title('BBN DAG')
2.3.2. BBN Parameters
The following shows the marginal posteriors of the nodes.
[6]:
def potential_to_series(potential):
    def get_entry_kv(entry):
        arr = [(k, v) for k, v in entry.entries.items()]
        arr = sorted(arr, key=lambda tup: tup[0])
        return arr[0][1], entry.value

    tups = [get_entry_kv(e) for e in potential.entries]
    return pd.Series([tup[1] for tup in tups], [tup[0] for tup in tups])

series = [(node, potential_to_series(join_tree.get_bbn_potential(node))) for node in join_tree.get_bbn_nodes()]

n_cols = 3
n_rows = int(len(series) / n_cols)

fig, axes = plt.subplots(n_rows, n_cols, figsize=(10, 20))
axes = np.ravel(axes)
for ax, (node, s) in zip(axes, series):
    s.plot(kind='bar', ax=ax, title=f'{node.variable.name}')
plt.tight_layout()
2.4. Diagnosis
Now we are ready to make diagnoses using the BBN. The transformed Hubei data has 32 symptom categories; however, only 16 of these symptoms are modeled in the BBN.
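As a quick sanity check, we can compare the transformed symptom columns against the node names in the BBN. This is a minimal sketch, assuming d and bbn from the cells above; it is not part of the original analysis.

# which of the 32 transformed symptom columns have a matching node in the BBN?
bbn_names = {node.variable.name for node in bbn.nodes.values()}
modeled = sorted(set(d.columns) & bbn_names)
print(len(modeled))  # expected: 16
print(modeled)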
[7]:
%%time
from pybbn.graph.jointree import EvidenceBuilder
names = [
'anosmia', 'sputum', 'muscle', 'chills', 'fever',
'wheezing', 'nasal', 'fatigue', 'headache', 'sore_throat',
'dry_cough', 'diarrhoea', 'dyspnea', 'nausea', 'sneezing',
'running_nose'
]
predictions = []
for i, r in d.iterrows():
    fields = [name for name in names if r[name] == 1]

    # set each observed symptom as hard evidence (state 't')
    join_tree.unobserve_all()
    if len(fields) > 0:
        bbn_nodes = [join_tree.get_bbn_node_by_name(f) for f in fields]
        evidences = [EvidenceBuilder().with_node(n).with_evidence('t', 1.0).build() for n in bbn_nodes]
        join_tree.update_evidences(evidences)

    # read off the posterior over the disease node
    disease = join_tree.get_bbn_node_by_name('disease')
    disease_potential = join_tree.get_bbn_potential(disease)
    s = potential_to_series(disease_potential)
    predictions.append(s)
CPU times: user 6.85 s, sys: 40.2 ms, total: 6.89 s
Wall time: 6.93 s
[8]:
predictions = pd.DataFrame(predictions)
predictions
[8]:
 | no_virus | rhinovirus | hmpv | hrsv | influenza | covid19 |
---|---|---|---|---|---|---|
0 | 0.021350 | 0.011572 | 0.040865 | 0.058689 | 0.465734 | 0.401790 |
1 | 0.194664 | 0.056940 | 0.028707 | 0.085324 | 0.197598 | 0.436766 |
2 | 0.010166 | 0.013346 | 0.032700 | 0.040446 | 0.838125 | 0.065217 |
3 | 0.000525 | 0.017707 | 0.047804 | 0.135938 | 0.748414 | 0.049613 |
4 | 0.194664 | 0.056940 | 0.028707 | 0.085324 | 0.197598 | 0.436766 |
... | ... | ... | ... | ... | ... | ... |
651 | 0.001141 | 0.020118 | 0.034012 | 0.069237 | 0.875403 | 0.000089 |
652 | 0.242781 | 0.058875 | 0.031008 | 0.055796 | 0.327462 | 0.284078 |
653 | 0.242781 | 0.058875 | 0.031008 | 0.055796 | 0.327462 | 0.284078 |
654 | 0.021350 | 0.011572 | 0.040865 | 0.058689 | 0.465734 | 0.401790 |
655 | 0.242781 | 0.058875 | 0.031008 | 0.055796 | 0.327462 | 0.284078 |
656 rows × 6 columns
2.5. Diagnosis Performance
All the records/patients in the Hubei dataset are positively tested COVID-19 patients. Thus, we have no non-COVID-19 patients, and so we will avoid using performance measures that require negative examples.
2.5.1. Quasi-proper scoring rules
We will try using average precision and plot the precision-recall curve. Note the absurdity of doing so when only positive examples are available. These performance measures are so-called quasi-proper scoring rules.
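To see why this is absurd, consider a tiny sketch (not part of the original analysis): when every label is positive, precision is 1 at every threshold, so average precision is 1 no matter what scores the model produces.

# with all-positive labels, the scores are irrelevant: AP is always 1.0
from sklearn.metrics import average_precision_score
import numpy as np

print(average_precision_score(np.ones(4), [0.9, 0.6, 0.4, 0.1]))  # 1.0
print(average_precision_score(np.ones(4), [0.1, 0.1, 0.1, 0.1]))  # 1.0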
[9]:
from sklearn.metrics import average_precision_score
y_true = np.ones(predictions.shape[0])
y_pred = predictions.covid19
ap = average_precision_score(y_true, y_pred)
print(f'average precision score is {ap:.5f}')
average precision score is 1.00000
[10]:
from sklearn.metrics import precision_recall_curve
pre, rec, _ = precision_recall_curve(y_true, y_pred)
fig, ax = plt.subplots(figsize=(15, 5))
_ = ax.step(rec, pre, color='b', alpha=0.5, where='post', label='PR curve')
_ = ax.set_xlabel('recall')
_ = ax.set_ylabel('precision')
_ = ax.set_title('Precision-Recall Curve')
2.5.2. Proper scoring rule
Instead, we use a proper scoring rule such as the Brier loss. The Brier score is in the range \([0, 1]\), where a value closer to 0 is better; it is essentially the mean squared difference between the true probability and the predicted one. As you can see below, the Brier score is about 0.49. Is this value good or bad? It sits right in the middle of the range: the model is not clearly useless, but it is not clearly useful either.
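As a reminder, for binary outcomes \(y_i \in \{0, 1\}\) with predicted probabilities \(p_i\), the Brier score is

\[ \text{BS} = \frac{1}{N} \sum_{i=1}^{N} (p_i - y_i)^2 . \]

Since every label here is positive (\(y_i = 1\)), it reduces to the mean of \((1 - p_i)^2\): it simply measures how far the predicted COVID-19 probabilities are from 1.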
[11]:
from sklearn.metrics import brier_score_loss
bsl = brier_score_loss(y_true, y_pred)
print(f'brier score loss = {bsl:.5f}')
brier score loss = 0.48920
2.5.3. Agreement
Here, we take a different approach to judging the BBN’s diagnostic reliability: we compare the counts of patients predicted to have COVID-19 against the empirical counts.
First, we create strata based on the observed and unique combinations of symptoms and observe the empirical number of patients with such co-symptoms.
Second, for each unique combination of symptoms observed, we present such symptoms as evidence to the model and allow it to give us the probability of having COVID-19.
Third, we multiply the probability by the total number of patients observed across all the strata.
Lastly, we compare the agreement between the numbers predicted by the BBN and the empirical ones.
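For example (with hypothetical numbers), if a stratum’s symptoms yield \(P(\text{covid19}) = 0.40\) and 589 patients are observed across all strata, then the predicted count for that stratum is \(\lceil 0.40 \times 589 \rceil = 236\).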
[12]:
def get_symptom_combinations(r):
    # the sorted list of modeled symptoms that this patient presents
    fields = sorted([name for name in names if r[name] == 1])
    return fields

def get_query(combination):
    # build a pandas query string that matches exactly this combination of modeled symptoms
    p_tokens = combination.split(',')
    n_tokens = [n for n in names if n not in p_tokens]

    p_tokens = [f'{t}==1' for t in p_tokens]
    n_tokens = [f'{t}==0' for t in n_tokens]

    tokens = p_tokens + n_tokens
    query = ' and '.join(tokens)
    return query

combinations = [get_symptom_combinations(r) for _, r in d.iterrows()]
combinations = [c for c in combinations if len(c) > 0]
combinations = [','.join(c) for c in combinations]
combinations = sorted(list(set(combinations)))
print(f'number of combinations {len(combinations)}')

queries = [get_query(c) for c in combinations]

# we lose 67 patients who have none of the 16 modeled symptoms
strata = pd.DataFrame([(c, d.query(q).shape[0]) for c, q in zip(combinations, queries)], columns=['stratum', 'n'])
strata['n_symptoms'] = strata.stratum.apply(lambda s: len(s.split(',')))
print(f'number of patients {strata.n.sum()}')
number of combinations 103
number of patients 589
This is the distribution of the unique combinations of co-symptoms. Note that some symptoms may show up only by themselves.
[13]:
fig, ax = plt.subplots(figsize=(20, 5))
s = pd.Series(strata.n.values, strata.stratum.values)
_ = s.plot(kind='bar', ax=ax, title=f'Frequency of all symptom combinations, n={strata.n.sum()}')
In this graph, we remove strata that have only one symptom, to reduce the visual skew.
[14]:
s = strata[strata.n_symptoms > 1]
fig, ax = plt.subplots(figsize=(20, 5))
s = pd.Series(s.n.values, s.stratum.values)
_ = s.plot(kind='bar', ax=ax, title=f'Frequency of symptom combinations (more than 1), n={strata.n.sum()}')
Now we feed the symptoms in each stratum to the BBN and estimate the predicted counts of patients with COVID-19.
[15]:
import math
predictions = []
for i, r in strata.iterrows():
    fields = r.stratum.split(',')

    # present the stratum's symptoms as hard evidence (state 't')
    join_tree.unobserve_all()
    if len(fields) > 0:
        bbn_nodes = [join_tree.get_bbn_node_by_name(f) for f in fields]
        evidences = [EvidenceBuilder().with_node(n).with_evidence('t', 1.0).build() for n in bbn_nodes]
        join_tree.update_evidences(evidences)

    disease = join_tree.get_bbn_node_by_name('disease')
    disease_potential = join_tree.get_bbn_potential(disease)
    s = potential_to_series(disease_potential)
    predictions.append(s)

predictions = pd.DataFrame(predictions)

# predicted count per stratum = ceil(P(covid19 | stratum symptoms) * total observed patients)
n = strata.n.sum()
preds = pd.DataFrame([(math.ceil(n * p), c) for p, c in zip(predictions.covid19, strata.n)], columns=['y_pred', 'y_true'])
Below, we visualize the predicted number of patients with COVID-19 versus the empirical numbers. We use Pearson, Kendall, and Spearman correlations. The latter two are rank correlations and may be used to gauge the agreement between the ranked predicted and empirical frequencies/counts. For all these correlation measures, the higher the value, the better the agreement. As can be seen below, there is positive agreement, and in some sense, especially with the Spearman correlation, the agreement is strong.
Note that the few dots to the right correspond to strata with a single symptom. This observation is not surprising: since the BBN assumes independence between the symptoms, we should expect good agreement between the predicted and empirical counts for strata with only one symptom.
[16]:
from scipy.stats import spearmanr, kendalltau, pearsonr
spearman = spearmanr(preds.y_true, preds.y_pred).correlation
kendall = kendalltau(preds.y_true, preds.y_pred).correlation
pearson = pearsonr(preds.y_true, preds.y_pred)[0]
fig, ax = plt.subplots(figsize=(10, 5))
_ = ax.scatter(preds.y_true, preds.y_pred)
_ = ax.set_title(f'Counts of patients predicted to have COVID-19 vs empirical counts\npearson={pearson:.2f}, spearman={spearman:.2f}, kendall={kendall:.2f}')
_ = ax.set_xlabel('empirical counts')
_ = ax.set_ylabel('predicted counts')
[17]:
x = preds / preds.sum()
spearman = spearmanr(x.y_true, x.y_pred).correlation
kendall = kendalltau(x.y_true, x.y_pred).correlation
pearson = pearsonr(x.y_true, x.y_pred)[0]
fig, ax = plt.subplots(figsize=(10, 5))
_ = ax.scatter(x.y_true, x.y_pred)
_ = ax.set_title(f'Probabilities of patients predicted to have COVID-19 vs empirical counts\npearson={pearson:.2f}, spearman={spearman:.2f}, kendall={kendall:.2f}')
_ = ax.set_xlabel('empirical probability')
_ = ax.set_ylabel('predicted probability')
Here is the mean squared difference between the predicted probabilities (normalized frequencies) and the empirical ones. Wow! Almost zero!
[18]:
x.apply(lambda r: (r.y_pred - r.y_true)**2, axis=1).mean()
[18]:
0.0012162588488363877
Here is the Brier score for the predicted probabilities. Remember, the Brier loss ranges over \([0, 1]\), and the lower, the better. Judged this way, the model looks very bad at diagnosing COVID-19.
[19]:
brier_score_loss(np.ones(x.shape[0]), x.y_pred)
[19]:
0.9807848118843429
Here is the Brier score for the empirical probabilities. Whew! They fare just as badly: because each column is normalized over 103 strata, every value is tiny, so both Brier scores are close to 1 by construction. These last two results suggest that this way of judging the BBN is not appropriate.
[20]:
brier_score_loss(np.ones(x.shape[0]), x.y_true)
[20]:
0.9820609112681514
2.6. Misc
Ignore the code below; it prints out all the unique symptoms in the Hubei data, which was useful for the symptom-mapping exercise.
[21]:
# x = [tokenize(s) for s in data.symptoms if s is not None]
# x = [tokens for tokens in x if tokens is not None and len(tokens) > 0]
# n = len(x)
# x = list(itertools.chain(*[item for item in x]))
# for i, s in enumerate(sorted(list(set(x)))):
# print(f'{i}, {s}')