2. Differential Diagnosis of COVID-19 with Bayesian Belief Networks
Let’s see if a Bayesian Belief Network (BBN) is able to diagnose the COVID-19 virus with any reasonable success. The idea is that a patient presents some symptoms, and we must reason diagnostically from the symptoms back to the cause. The BBN is taken from BayesiaLab’s Differential Diagnosis model.
2.1. Data
The data is taken from the COVID-19 open line list (the Hubei and outside-Hubei sheets). We will first load both sets of data and concatenate them.
[1]:
import pandas as pd
inside = pd.read_csv('./covid/data/00/COVID19_2020_open_line_list - Hubei.csv', low_memory=False)
outside = pd.read_csv('./covid/data/00/COVID19_2020_open_line_list - outside_Hubei.csv', low_memory=False)
outside = outside.drop(['data_moderator_initials'], axis=1)
data = pd.concat([inside, outside])
2.2. Data Transformation
We will apply transformations to the data, primarily on the symptoms. There are only about 200 unique symptom strings across all the COVID-19 patients. We map these 200 unique symptoms, in a many-to-many fashion, to the 32 broad symptom categories listed below (an illustrative sketch of the mapping structure follows the list).
abdominal_pain
anorexia
anosmia
chest
chills
coronary
diarrhoea
digestive
discharge
dizziness
dry_cough
dryness
dyspnea
eye
fatigue
fever
headache
lungs
malaise
mild
muscle
myelofibrosis
nasal
nausea
respiratory
running_nose
sneezing
sore_throat
sputum
sweating
walking
wheezing
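The mapping itself lives in symptom-mapping.json, whose exact contents are not reproduced here. The snippet below is only a hypothetical illustration of its many-to-many structure, assuming each broad category maps to the raw symptom strings it covers, and that a raw string may appear under more than one category.

# Hypothetical illustration of the structure of symptom-mapping.json;
# the category names are real, the raw strings are made up for illustration.
symptom_map_example = {
    'dry_cough': ['dry cough', 'cough (dry)'],
    'respiratory': ['cough (dry)', 'shortness of breath'],
    'sore_throat': ['sore throat', 'throat pain'],
}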
[2]:
import json
import itertools
from datetime import datetime
with open('./covid/data/00/symptom-mapping.json', 'r') as f:
    symptom_map = json.load(f)

def tokenize(s):
    # split a raw symptom string into lower-cased tokens; missing values map to None
    if s is None or isinstance(s, float) or len(s) < 1 or pd.isna(s):
        return None
    try:
        delim = ';' if ';' in s else ','
        return [t.strip().lower() for t in s.split(delim) if len(t.strip()) > 0]
    except:
        return s

def map_to_symptoms(s):
    # map a single raw symptom token to one or more broad categories
    if s.startswith('fever') or s.startswith('low fever'):
        return ['fever']
    return [k for k, v in symptom_map.items() if s in v]

d = data[['symptoms']].dropna(how='all').copy(deep=True)
print(d.shape)

d.symptoms = d.symptoms.apply(lambda s: tokenize(s))
d.symptoms = d.symptoms.apply(lambda tokens: [map_to_symptoms(s) for s in tokens] if tokens is not None else None)
d.symptoms = d.symptoms.apply(lambda arrs: None if arrs is None else list(itertools.chain(*arrs)))

# one indicator column per broad symptom category
for s in symptom_map.keys():
    d[s] = d.symptoms.apply(lambda arr: 0 if arr is None else 1 if s in arr else 0)

d = d.drop(['symptoms'], axis=1)
print(d.shape)
(656, 1)
(656, 32)
[3]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
plt.style.use('seaborn')
v = [d[d[c] == 1].shape[0] for c in d.columns]
s = pd.Series(v, d.columns)
fig, ax = plt.subplots(figsize=(15, 5))
_ = s.plot(kind='bar', ax=ax, title=f'Frequency of symptoms, n={d.shape[0]}')
plt.tight_layout()
2.3. Bayesian Belief Network
The BBN structure assumes the symptoms are conditionally independent of one another given the disease (a naïve structure), and we know this assumption is wrong. However, if we do not make this independence assumption, there are many more parameters to estimate and/or elicit. As for the parameters, according to the original authors of this BBN, they are taken from a variety of sources.
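To make the independence assumption concrete, the posterior over the disease given observed symptoms \(s_1, \ldots, s_k\) factorizes roughly as

\[ P(\text{disease} \mid s_1, \ldots, s_k) \propto P(\text{disease}) \prod_{i=1}^{k} P(s_i \mid \text{disease}), \]

where the prior \(P(\text{disease})\) is itself obtained by marginalizing over the flu_shot node. This is why only one conditional probability table per symptom needs to be specified.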
The following are the symptom variables (or nodes) in the BBN.
anosmia
chills
diarrhoea
dry_cough
dyspnea
fatigue
fever
headache
muscle
nasal
nausea
running_nose
sneezing
sore_throat
sputum
wheezing
In addition to these symptom nodes, the BBN has a disease node and a flu_shot node.
2.3.1. BBN structure
[4]:
from pybbn.graph.dag import Bbn
from pybbn.pptc.inferencecontroller import InferenceController
import json
with open('./covid/naive.json', 'r') as f:
    bbn = Bbn.from_dict(json.load(f))

join_tree = InferenceController.apply(bbn)
The following shows the BBN structure. The disease node points to all the symptoms, and the flu_shot node points to the disease node. The disease node has the following values/states.
no_virus
rhinovirus
hmpv (Metapneumovirus)
hrsv (Respiratory syncytial)
influenza
covid19 (COVID-19)
[5]:
from pybbn.generator.bbngenerator import convert_for_drawing
import networkx as nx
import warnings
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    graph = convert_for_drawing(bbn)
    pos = nx.nx_agraph.graphviz_layout(graph, prog='neato')

    plt.figure(figsize=(15, 8))
    plt.subplot(121)
    labels = dict([(k, node.variable.name) for k, node in bbn.nodes.items()])
    nx.draw(graph, pos=pos, with_labels=True, labels=labels)
    plt.title('BBN DAG')
2.3.2. BBN Parameters
The following shows the marginal posteriors of the nodes.
[6]:
def potential_to_series(potential):
    def get_entry_kv(entry):
        arr = [(k, v) for k, v in entry.entries.items()]
        arr = sorted(arr, key=lambda tup: tup[0])
        return arr[0][1], entry.value

    tups = [get_entry_kv(e) for e in potential.entries]
    return pd.Series([tup[1] for tup in tups], [tup[0] for tup in tups])

series = [(node, potential_to_series(join_tree.get_bbn_potential(node))) for node in join_tree.get_bbn_nodes()]

n_cols = 3
n_rows = int(len(series) / n_cols)

fig, axes = plt.subplots(n_rows, n_cols, figsize=(10, 20))
axes = np.ravel(axes)
for ax, (node, s) in zip(axes, series):
    s.plot(kind='bar', ax=ax, title=f'{node.variable.name}')
plt.tight_layout()
2.4. Diagnosis
Now we are ready to make diagnoses using the BBN. The transformed Hubei data has 32 symptom categories; however, only 16 of these symptoms are modeled in the BBN.
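As a quick sanity check, we can compare the transformed symptom columns against the node names in the BBN. This is a minimal sketch, assuming d and bbn from the cells above; it is not part of the original analysis.

# which of the 32 transformed symptom columns have a matching node in the BBN?
bbn_names = {node.variable.name for node in bbn.nodes.values()}
modeled = sorted(set(d.columns) & bbn_names)
print(len(modeled))  # expected: 16
print(modeled)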
[7]:
%%time
from pybbn.graph.jointree import EvidenceBuilder
names = [
'anosmia', 'sputum', 'muscle', 'chills', 'fever',
'wheezing', 'nasal', 'fatigue', 'headache', 'sore_throat',
'dry_cough', 'diarrhoea', 'dyspnea', 'nausea', 'sneezing',
'running_nose'
]
predictions = []
for i, r in d.iterrows():
    fields = [name for name in names if r[name] == 1]

    # set each observed symptom as hard evidence (state 't')
    join_tree.unobserve_all()
    if len(fields) > 0:
        bbn_nodes = [join_tree.get_bbn_node_by_name(f) for f in fields]
        evidences = [EvidenceBuilder().with_node(n).with_evidence('t', 1.0).build() for n in bbn_nodes]
        join_tree.update_evidences(evidences)

    # read off the posterior over the disease node
    disease = join_tree.get_bbn_node_by_name('disease')
    disease_potential = join_tree.get_bbn_potential(disease)
    s = potential_to_series(disease_potential)
    predictions.append(s)
CPU times: user 6.85 s, sys: 40.2 ms, total: 6.89 s
Wall time: 6.93 s
[8]:
predictions = pd.DataFrame(predictions)
predictions
[8]:
 | no_virus | rhinovirus | hmpv | hrsv | influenza | covid19 |
---|---|---|---|---|---|---|
0 | 0.021350 | 0.011572 | 0.040865 | 0.058689 | 0.465734 | 0.401790 |
1 | 0.194664 | 0.056940 | 0.028707 | 0.085324 | 0.197598 | 0.436766 |
2 | 0.010166 | 0.013346 | 0.032700 | 0.040446 | 0.838125 | 0.065217 |
3 | 0.000525 | 0.017707 | 0.047804 | 0.135938 | 0.748414 | 0.049613 |
4 | 0.194664 | 0.056940 | 0.028707 | 0.085324 | 0.197598 | 0.436766 |
... | ... | ... | ... | ... | ... | ... |
651 | 0.001141 | 0.020118 | 0.034012 | 0.069237 | 0.875403 | 0.000089 |
652 | 0.242781 | 0.058875 | 0.031008 | 0.055796 | 0.327462 | 0.284078 |
653 | 0.242781 | 0.058875 | 0.031008 | 0.055796 | 0.327462 | 0.284078 |
654 | 0.021350 | 0.011572 | 0.040865 | 0.058689 | 0.465734 | 0.401790 |
655 | 0.242781 | 0.058875 | 0.031008 | 0.055796 | 0.327462 | 0.284078 |
656 rows × 6 columns
2.5. Diagnosis Performance
All the records/patients in the Hubei dataset are positively tested COVID-19 patients. Thus, we have no non-COVID-19 patients, and so we will avoid using performance measures that require negative examples.
2.5.1. Quasi-proper scoring rules
We will try using average precision and plot the precision-recall curve. Note the absurdity of doing so when only positive examples are available. These performance measures are so-called quasi-proper scoring rules.
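To see why this is absurd, consider a tiny sketch (not part of the original analysis): when every label is positive, precision is 1 at every threshold, so average precision is 1 no matter what scores the model produces.

# with all-positive labels, the scores are irrelevant: AP is always 1.0
from sklearn.metrics import average_precision_score
import numpy as np

print(average_precision_score(np.ones(4), [0.9, 0.6, 0.4, 0.1]))  # 1.0
print(average_precision_score(np.ones(4), [0.1, 0.1, 0.1, 0.1]))  # 1.0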
[9]:
from sklearn.metrics import average_precision_score
y_true = np.ones(predictions.shape[0])
y_pred = predictions.covid19
ap = average_precision_score(y_true, y_pred)
print(f'average precision score is {ap:.5f}')
average precision score is 1.00000
[10]:
from sklearn.metrics import precision_recall_curve
pre, rec, _ = precision_recall_curve(y_true, y_pred)
fig, ax = plt.subplots(figsize=(15, 5))
_ = ax.step(rec, pre, color='b', alpha=0.5, where='post', label='PR curve')
_ = ax.set_xlabel('recall')
_ = ax.set_ylabel('precision')
_ = ax.set_title('Precision-Recall Curve')
2.5.2. Proper scoring rule
Instead, we use a proper scoring rule such as the Brier loss. The Brier score is in the range \([0, 1]\), where a value closer to 0 is better; it is essentially the mean squared difference between the true probability and the predicted one. As you can see below, the Brier score is about 0.49. Is this value good or bad? It sits right in the middle of the range: the model is not clearly useless, but it is not clearly useful either.
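As a reminder, for binary outcomes \(y_i \in \{0, 1\}\) with predicted probabilities \(p_i\), the Brier score is

\[ \text{BS} = \frac{1}{N} \sum_{i=1}^{N} (p_i - y_i)^2 . \]

Since every label here is positive (\(y_i = 1\)), it reduces to the mean of \((1 - p_i)^2\): it simply measures how far the predicted COVID-19 probabilities are from 1.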
[11]:
from sklearn.metrics import brier_score_loss
bsl = brier_score_loss(y_true, y_pred)
print(f'brier score loss = {bsl:.5f}')
brier score loss = 0.48920
2.5.3. Agreement
Here, we take a different approach to judging the BBN’s diagnostic reliability: we compare the counts of patients predicted to have COVID-19 against the empirical counts.
First, we create strata based on the observed and unique combinations of symptoms and observe the empirical number of patients with such co-symptoms.
Second, for each unique combination of symptoms observed, we present such symptoms as evidence to the model and allow it to give us the probability of having COVID-19.
Third, we multiply the probability by the total number of patients observed across all the strata.
Lastly, we compare the agreement between the numbers predicted by the BBN and the empirical ones.
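For example (with hypothetical numbers), if a stratum’s symptoms yield \(P(\text{covid19}) = 0.40\) and 589 patients are observed across all strata, then the predicted count for that stratum is \(\lceil 0.40 \times 589 \rceil = 236\).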
[12]:
def get_symptom_combinations(r):
    # the sorted list of modeled symptoms that this patient presents
    fields = sorted([name for name in names if r[name] == 1])
    return fields

def get_query(combination):
    # build a pandas query string that matches exactly this combination of modeled symptoms
    p_tokens = combination.split(',')
    n_tokens = [n for n in names if n not in p_tokens]

    p_tokens = [f'{t}==1' for t in p_tokens]
    n_tokens = [f'{t}==0' for t in n_tokens]

    tokens = p_tokens + n_tokens
    query = ' and '.join(tokens)
    return query

combinations = [get_symptom_combinations(r) for _, r in d.iterrows()]
combinations = [c for c in combinations if len(c) > 0]
combinations = [','.join(c) for c in combinations]
combinations = sorted(list(set(combinations)))
print(f'number of combinations {len(combinations)}')

queries = [get_query(c) for c in combinations]

# we lose 67 patients who have none of the 16 modeled symptoms
strata = pd.DataFrame([(c, d.query(q).shape[0]) for c, q in zip(combinations, queries)], columns=['stratum', 'n'])
strata['n_symptoms'] = strata.stratum.apply(lambda s: len(s.split(',')))
print(f'number of patients {strata.n.sum()}')
number of combinations 103
number of patients 589
This is the distribution of the unique combinations of co-symptoms. Note that some symptoms may show up only by themselves.
[13]:
fig, ax = plt.subplots(figsize=(20, 5))
s = pd.Series(strata.n.values, strata.stratum.values)
_ = s.plot(kind='bar', ax=ax, title=f'Frequency of all symptom combinations, n={strata.n.sum()}')
In this graph, we remove strata that have only one symptom, to reduce the visual skew.
[14]:
s = strata[strata.n_symptoms > 1]
fig, ax = plt.subplots(figsize=(20, 5))
s = pd.Series(s.n.values, s.stratum.values)
_ = s.plot(kind='bar', ax=ax, title=f'Frequency of symptom combinations (more than 1), n={strata.n.sum()}')
Now we feed the symptoms in each stratum to the BBN and estimate the predicted counts of patients with COVID-19.
[15]:
import math
predictions = []
for i, r in strata.iterrows():
    fields = r.stratum.split(',')

    # present the stratum's symptoms as hard evidence (state 't')
    join_tree.unobserve_all()
    if len(fields) > 0:
        bbn_nodes = [join_tree.get_bbn_node_by_name(f) for f in fields]
        evidences = [EvidenceBuilder().with_node(n).with_evidence('t', 1.0).build() for n in bbn_nodes]
        join_tree.update_evidences(evidences)

    disease = join_tree.get_bbn_node_by_name('disease')
    disease_potential = join_tree.get_bbn_potential(disease)
    s = potential_to_series(disease_potential)
    predictions.append(s)

predictions = pd.DataFrame(predictions)

# predicted count per stratum = ceil(P(covid19 | stratum symptoms) * total observed patients)
n = strata.n.sum()
preds = pd.DataFrame([(math.ceil(n * p), c) for p, c in zip(predictions.covid19, strata.n)], columns=['y_pred', 'y_true'])
Below, we visualize the predicted number of patients with COVID-19 versus the empirical numbers. We use Pearson, Kendall, and Spearman correlations. The latter two are rank correlations and may be used to gauge the agreement between the ranked predicted and empirical frequencies/counts. For all these correlation measures, the higher the value, the better the agreement. As can be seen below, there is positive agreement, and in some sense, especially with the Spearman correlation, the agreement is strong.
Note that the few dots to the right correspond to strata with a single symptom. This observation is not surprising: since the BBN assumes independence between the symptoms, we should expect good agreement between the predicted and empirical counts for strata with only one symptom.
[16]:
from scipy.stats import spearmanr, kendalltau, pearsonr
spearman = spearmanr(preds.y_true, preds.y_pred).correlation
kendall = kendalltau(preds.y_true, preds.y_pred).correlation
pearson = pearsonr(preds.y_true, preds.y_pred)[0]
fig, ax = plt.subplots(figsize=(10, 5))
_ = ax.scatter(preds.y_true, preds.y_pred)
_ = ax.set_title(f'Counts of patients predicted to have COVID-19 vs empirical counts\npearson={pearson:.2f}, spearman={spearman:.2f}, kendall={kendall:.2f}')
_ = ax.set_xlabel('empirical counts')
_ = ax.set_ylabel('predicted counts')
[17]:
x = preds / preds.sum()
spearman = spearmanr(x.y_true, x.y_pred).correlation
kendall = kendalltau(x.y_true, x.y_pred).correlation
pearson = pearsonr(x.y_true, x.y_pred)[0]
fig, ax = plt.subplots(figsize=(10, 5))
_ = ax.scatter(x.y_true, x.y_pred)
_ = ax.set_title(f'Probabilities of patients predicted to have COVID-19 vs empirical counts\npearson={pearson:.2f}, spearman={spearman:.2f}, kendall={kendall:.2f}')
_ = ax.set_xlabel('empirical probability')
_ = ax.set_ylabel('predicted probability')
Here is the mean squared difference between the predicted probabilities (normalized frequencies) and the empirical ones. Wow! Almost zero!
[18]:
x.apply(lambda r: (r.y_pred - r.y_true)**2, axis=1).mean()
[18]:
0.0012162588488363877
Here is the Brier score for the predicted probabilities. Remember, the Brier loss ranges over \([0, 1]\), and the lower, the better. Judged this way, the model looks very bad at diagnosing COVID-19.
[19]:
brier_score_loss(np.ones(x.shape[0]), x.y_pred)
[19]:
0.9807848118843429
Here is the Brier score for the empirical probabilities. Whew! They fare just as badly: because each column is normalized over 103 strata, every value is tiny, so both Brier scores are close to 1 by construction. These last two results suggest that this way of judging the BBN is not appropriate.
[20]:
brier_score_loss(np.ones(x.shape[0]), x.y_true)
[20]:
0.9820609112681514
2.6. Misc
Ignore the code below; it prints out all the unique symptoms in the Hubei data, which was useful for the symptom-mapping exercise.
[21]:
# x = [tokenize(s) for s in data.symptoms if s is not None]
# x = [tokens for tokens in x if tokens is not None and len(tokens) > 0]
# n = len(x)
# x = list(itertools.chain(*[item for item in x]))
# for i, s in enumerate(sorted(list(set(x)))):
# print(f'{i}, {s}')