How Viruses Spread...
(still under construction)
April 11, 2020. A cold Easter weekend in Toronto, Canada.
​
Like three-quarters of the Earth's population, I am quarantined at home. We are in a "lockdown". We are told to stay at home and practice "social distancing." All "non-essential" businesses are closed, as are theaters, parks, sporting events and restaurants, among others. People who do not comply with these orders are fined.
What happened?
​
Briefly, this is the story.
​
In December of 2019, in the city of Wuhan, China, people started to die from an apparent new disease. The victims showed the typical signs of the flu. Early tests revealed a new type of coronavirus; the disease it causes was christened "COVID-19." At first, health officials like the W.H.O. told us that there was nothing to worry about. However, the virus started to spread through the city of Wuhan and the province of Hubei. The Chinese took action, which resulted in a lockdown of most cities in China. The virus then spread westward to the Middle East (Iran) and Europe, eventually reaching North America. Countries like Italy and Spain, which were hit hardest, started to implement lockdowns of their populations. The USA, Canada, India and most of the world followed suit. As of today, China is easing its lockdown as the infection seems to have petered out. Fueled by Hollywood movies like "Contagion" and a sensationalist mainstream media, all this has caused fear among the population and led to irrational behavior, like stockpiling toilet paper.
​
Being a mathematician at heart and someone who has written a lot of simulation code I wanted to understand the mechanisms of a virus spread. How can we model it? How can we quantify it? How can we simulate it? How can we analyze the results and make decisions? Many questions that many of you might have as well.
​
I was particularly intrigued by the following tweet exchange that I subsequently posted on LinkedIn.
It piqued my interest as it mentions "thousands of lines of undocumented C". What? That code was developed at Imperial College London, and it predicted that half a million people would die if no action was taken by the British government. It swayed Boris Johnson's government to implement a total lockdown. After the lockdown was put in effect, news came out that the simulation had overstated the impact. The estimate was revised to twenty thousand, or perhaps even as low as five thousand. That is up to two orders of magnitude off! Having worked with simulation code, I wasn't surprised by the discrepancies. This is parameter sensitivity. What shocked me was that the government took the code that seriously. Especially since a lockdown is a decision with potentially disastrous consequences: it can cripple the economy of an entire country. At the time of this writing, one million Canadians and 17 million Americans have lost their jobs because of the lockdown.
​
I had to figure this out. I know math, I know simulation and I know how to code. Ha, in C! But also in other languages like JavaScript, as shown below. So I was good to go. And since I was forcefully quarantined, I could take a break from Netflix and YouTube and do some independent research.
​
Finding the math behind the simulation was easy. After some googling the following is what I found out.
​
Firstly, the seminal work in this area is the 1927 paper by Kermack and McKendrick.
​
​
The paper is somewhat mathematically dense, but the resulting methods are not. Keep in mind that this paper was written in 1927, well before the advent of the modern computer. The authors therefore focus on finding analytical solutions.
The science that studies the spread of viruses is called epidemiology. There are many epidemiological models, but the most basic one is the SIR model.
​
The SIR model
​
SIR is an acronym for Susceptible, Infected and Recovered. It separates the population at a certain point in time into three groups:

- S : Susceptible: the fraction of the population which is susceptible to catching the virus (initially equal to 1).
- I : Infected: the fraction of the population that is infected (initially equal to the number infected divided by the total population).
- R : Recovered: the fraction of the population that has recovered from an infection and is immune for life (initially equal to 0).
​
Evidently we have that:
​
S + I + R = 1.
​
This model assumes that once you have recovered you are immune to this particular virus for life. The model has two transition rates:

- b : the fraction of susceptibles that become infected per day
- k : the fraction of infected that recover per day
​
The dynamics of the variables can be represented by the following transition graph:
We model the fractions of the population using three arrays S[], I[] and R[]. These fractions vary over time according to the transition rates as follows:
​
var S = [], I = [], R = [];    // fractions of the population
S[0] = 1;
I[0] = InitialInfected / population;
R[0] = 0;
for (var i = 1; i < numDays; i++) {
  S[i] = S[i-1] - b * I[i-1] * S[i-1];                // newly infected leave S
  I[i] = I[i-1] + b * I[i-1] * S[i-1] - k * I[i-1];   // gains from S, losses to R
  R[i] = R[i-1] + k * I[i-1];                         // recovered accumulate
}
​
For the mathematically savvy: this is an explicit integration scheme of an underlying ODE with unit time step. (Our implementation actually uses ten sub steps with time step 0.1 for stability for high rates).
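For concreteness, here is the same update packaged as a minimal self-contained sketch with the sub-stepping just described. The function name `simulateSIR` and the call signature are my own; any parameter values you feed it are illustrative, not fitted to a real virus:

```javascript
// Minimal self-contained sketch of the sub-stepped SIR update.
// b, k and the initial infected fraction I0 are caller-supplied inputs.
function simulateSIR(b, k, I0, numDays, steps) {
  var dt = 1.0 / steps;
  var S = [1.0], I = [I0], R = [0.0];
  for (var i = 1; i < numDays; i++) {
    var s = S[i - 1], inf = I[i - 1], r = R[i - 1];
    for (var j = 0; j < steps; j++) {
      var newInf = b * inf * s;   // susceptibles becoming infected
      var newRec = k * inf;       // infected recovering
      s   = s   - dt * newInf;
      inf = inf + dt * (newInf - newRec);
      r   = r   + dt * newRec;
    }
    S.push(s); I.push(inf); R.push(r);
  }
  return { S: S, I: I, R: R };
}
```

Note that each sub-step moves exactly as much mass out of one compartment as into the next, so S + I + R stays constant over time, as it should.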
In this simple model we can extract the number of people dying from the virus by multiplying the infected by a death percentage:
​
dead[i] = deathPercentage * I[i] * population;
​
Obviously, if both b and k are zero nothing happens. When k is zero no one ever recovers, so the infection initially grows exponentially until the entire population is infected.
​
Please experiment with the following sliders to get a feel for how these rates influence the spread of a virus:
This simple model allows us to make some general qualitative remarks.
Initially, the infection increases exponentially. This usually creates a panic in the population: a fear that death is imminent. The sensationalist press further preys on these fears, focusing mainly on anecdotal evidence. Fear sells. For people not familiar with epidemiological models the curve seems to be on an apocalyptic path of death. Instead of reassuring the population, the press brings in "experts" to explain the concept of an exponential curve.
Another fear factor is that the lethality rate of a virus is often over-estimated. Remember that the lethality rate is the number of deaths due to the virus (the numerator) divided by the number of infected (the denominator):
​
lethality = dead / infected.
​
Do not confuse this with a mortality rate, which has the entire population as its denominator. The number of deaths is known, while the number of infected is estimated. Usually only the infected who show symptoms of the virus are counted. This is further complicated by the fact that these symptoms are similar to those of the common flu. Not to mention the asymptomatics: people who are infected but show no signs of being infected. Virus tests are also still somewhat unreliable and may return false positives or false negatives. The number of deaths due to the virus is also problematic, since many of the deceased can have comorbid conditions. For example, consider a terminally ill cancer patient who gets infected by the virus in the hospital and dies. His death is then counted as a COVID-19 death, inflating the lethality rate.
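A toy calculation makes the denominator problem concrete. All the numbers below are made up purely for illustration:

```javascript
// Made-up numbers purely to illustrate the denominator problem.
var deaths = 200;
var confirmedInfected = 10000;   // only the tested, symptomatic cases
var trueInfected = 100000;       // including untested asymptomatics (assumed)

// Same number of deaths, different denominators:
var apparentLethality = deaths / confirmedInfected;  // 2%
var trueLethality = deaths / trueInfected;           // 0.2%
```

With a tenfold undercount of infections, the apparent lethality is ten times the true one.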
The initial lethality rate given by the W.H.O. was 4% or even higher. That is problematic!
​
(update: 19/04/2020. A Stanford study now puts it at around 0.2%, just like the common flu).
​
However, pandemics are generally short-lived, as shown by the red curve. It increases exponentially, then peaks and drops off exponentially. At the end the entire population is recovered and immune. And that is it for the pandemic! It is over. That is what happens every flu season, as shown in this figure.
This natural cycle is called herd immunization.
If someone starts slurping the same bat soup or feasting on a pangolin hot pot, he will not infect anyone else because everyone is already immune. The cycle then repeats with another flu variant in the next season. Some seasons are more lethal than others, and usually no one pays attention. COVID-19, on the other hand, is supposedly no common flu. It is more closely related to the SARS virus from 2003, for which no vaccine exists.
​
A vaccine, if available, can help accelerate this herd immunization. A vaccine has its own risks and side effects. More importantly, there is no guarantee a safe vaccine will be found for the COVID-19 virus soon. No vaccine has been found for any of the coronavirus family yet. And it is not for lack of trying. Developing vaccines is potentially a very lucrative business. Especially if mandatory vaccinations are legislated.
​
Below we will see what effects a lockdown could have on the spread of the virus and the number of fatalities.
​
​
The SEIRD model with Lockdown
​
Ultimately we are interested in the number of deaths caused by the virus. The SIR model only gives the fraction of people that die as a percentage of the infected. Visually this corresponds to a simple scaling of the red curve. This is a gross oversimplification. Deaths usually occur some time after the infection occurred.
​
A better model is therefore needed.
​
We can improve our model by introducing two new variables: the fraction of the population that has been exposed (E) to the virus and the fraction of the population that is dead (D). This is the so-called SEIRD model. In this model the spread of the virus is modeled by the rates shown in the following interaction graph.
The corresponding simulation follows that of the SIR code and is given by:
​
S[0] = initS / 100.0;
E[0] = 0;
I[0] = InitialInfected / population;
R[0] = 0;
D[0] = 0;
for (var i = 1; i < numDays; i++) {
  var Sm = S[i-1], Em = E[i-1], Im = I[i-1], Rm = R[i-1], Dm = D[i-1];
  S[i] = Sm - rateSE*Em*Sm - rateSI*Im*Sm + rateRS*Rm;
  E[i] = Em + rateSE*Em*Sm + rateSI*Im*Sm - rateEI*Em - rateER*Em;
  I[i] = Im + rateEI*Em - rateIR*Im - rateID*Im;
  R[i] = Rm + rateER*Em + rateIR*Im - rateRS*Rm;
  D[i] = Dm + rateID*Im;
}
​
This code is like the SIR model just more sophisticated. We now wish to add the effect of a lockdown.
​
What is a lockdown?
​
I guess you would know, because pretty much the entire world except for Sweden is in a lockdown right now. The goal is to minimize the interaction between people in order to minimize the spread of the virus. The virus spreads through the air and through shared surfaces. Consequently, you are supposed to stay at home (if you have one). You can leave your home only for essentials like breathing fresh air and stockpiling food, toilet paper and liquor. When outside, you have to keep a distance of roughly two meters from fellow lockups (Canadians: that is the length of a hockey stick). This is called "social distancing." The goal is to minimize the exposure rate rateSE and the infection rate rateSI.
We model the lockdown with a start date startLockDown and an end date endLockDown. During this period we multiply the exposure rate and the infection rate by a damping factor dampLockDown. Therefore the second and third lines in the for loop have to be replaced by:
​
var rateSEL = rateSE;
var rateSIL = rateSI;
​
if(i>startLockDown && i<=endLockDown) {
rateSEL *= dampLockDown;
rateSIL *= dampLockDown;
}
​
S[i] = Sm - rateSEL*Em*Sm - rateSIL*Im*Sm + rateRS*Rm;
E[i] = Em + rateSEL*Em*Sm + rateSIL*Im*Sm - rateEI*Em - rateER*Em;
​
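Putting the pieces together, the whole SEIRD loop with lockdown damping can be written as one self-contained function. This is a sketch under my own naming conventions; all rates are supplied by the caller and any values used to exercise it are illustrative, not fitted:

```javascript
// Self-contained sketch of the SEIRD simulation with a lockdown window.
// All parameters (rates, lockdown dates, damping) come from the caller.
function simulateSEIRD(p, numDays) {
  var S = [1.0], E = [0.0], I = [p.initialInfected], R = [0.0], D = [0.0];
  for (var i = 1; i < numDays; i++) {
    var Sm = S[i-1], Em = E[i-1], Im = I[i-1], Rm = R[i-1], Dm = D[i-1];
    var rateSEL = p.rateSE, rateSIL = p.rateSI;
    if (i > p.startLockDown && i <= p.endLockDown) {
      rateSEL *= p.dampLockDown;   // damp exposure during the lockdown
      rateSIL *= p.dampLockDown;   // damp infection during the lockdown
    }
    S.push(Sm - rateSEL*Em*Sm - rateSIL*Im*Sm + p.rateRS*Rm);
    E.push(Em + rateSEL*Em*Sm + rateSIL*Im*Sm - p.rateEI*Em - p.rateER*Em);
    I.push(Im + p.rateEI*Em - p.rateIR*Im - p.rateID*Im);
    R.push(Rm + p.rateER*Em + p.rateIR*Im - p.rateRS*Rm);
    D.push(Dm + p.rateID*Im);
  }
  return { S: S, E: E, I: I, R: R, D: D };
}
```

Running it twice, once with dampLockDown = 1 (no lockdown) and once with a small damping value, lets you compare the final death tolls directly.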
The web app below, written in JavaScript, implements the SEIRD model. We have exposed all the parameters of the model. You can experiment with the parameters to create different epidemics. As a benevolent dictator, you are supposed to minimize the number of deaths shown at the top in the rosy box.
Lockdown Strategies
​
We now report on some experiments we have done with our SEIRD lockdown simulator. Feel free to do so on your own and come to your own conclusions! As a benevolent and rational dictator, our goal is to minimize the number of deaths. In our experiments we fix the parameters of the SEIRD model and only vary the variables that affect the lockdown. We assume that the SEIRD parameters have been estimated for a particular virus. How to estimate these parameters is the topic of the section below. With no lockdown, the so-called "laissez faire" approach, we get 63,874 deaths out of a million and the following SEIRD curves:
One could call this the "Swedish" approach. The virus disappears through herd immunization and at the end the population is completely immune to the virus. There is no rebound (in all these experiments we assume that immunity is for life: the rate RS is zero). In this scenario there are no deaths of despair caused by the lockdown. Now consider the opposite case of a country where everyone is always in a state of lockdown. Fortunately, no such country exists. But this could be the case in a dystopian future where all communication between humans is virtual. In this case only the people who ate bat soup might die, and the rest of the populace never gets infected as they are protected in their virtual bubbles. This results in zero deaths and flat curves as shown below.
Now, like most countries on earth, we implement a lockdown. We assume that the social distancing and quarantine are not perfect, and we use a damping value of 0.1. We also assume that we have no prior knowledge of the start date of the infection. The lockdown policy is usually introduced when there is a sudden rise in the number of deaths and infections. Therefore a lockdown usually starts during the first exponential rise phase. Here we show the effect of varying the start of the lockdown.
Not surprisingly, the earlier we start the lockdown the more lives we save. The virus started in China, and therefore lockdown measures were taken after the infection rate had peaked. That is shown in the second graph. Their lockdown did not have a dramatic effect on the number of fatalities. Other countries like Taiwan and Singapore had a heads-up and started an early lockdown. And indeed their number of deaths is lower. European countries and North America were somewhat late in adopting the lockdown as per W.H.O. instructions. These graphs seem to indicate that the best strategy to lower the fatality rate is to start a lockdown as soon as possible.
​
Or so it seems...
​
Notice that in our two graphs the lockdown goes on indefinitely. Until perhaps an effective vaccine is discovered. This strategy is of course not sustainable. The devastating effect on the economy and ensuing deaths of despair would largely outnumber the lives saved from the virus.
Bottom line: this lockdown has to come to an end!
What happens when the lockdown ends too early? Well, we then run the risk that the number of infected and deaths start to increase again. We demonstrate this in the following graphs.
We observe that starting an early lockdown can lead to a reinfection that results in more deaths than a lockdown of the same length that starts later. This is because in the first case not enough people have become immune to the virus. Once the strict "social distancing" is lifted, people mingle and the virus infects again. Notice that the lockdown has not had a substantial effect on the number of dead compared to the "laissez faire" approach. However, a lockdown negatively affects the economy and the well-being of the populace, resulting in additional deaths of despair. Those effects might be considerable even long after the lockdown has ended.
Finally, we want to see the effect of the damping during the lockdown. Moderate damping actually characterizes the "Swedish" approach. It is not a "laissez faire" free-for-all. Some stores and schools are closed and only gatherings of up to 50 people are allowed. On the other hand, most shops, bars and restaurants remain open, with people exercising reasonable "social distancing" that is not strictly enforced. This has the advantage that the economy and the well-being of the Swedes are not affected as dramatically as in the case of a total lockdown.
The top curves show the effect of a total lockdown. Once the lockdown is lifted, the spread occurs again. In the "Swedish" model the damping is less severe and more people have time to recover and become immune. As you can see, that approach results in fewer deaths. This seems counter-intuitive at first, but that is the whole point of doing mathematical modeling. Strict lockdowns need to be longer. That is the sad truth. This results in a destruction of the economy and many deaths of despair. It also goes against the fundamental tenets of a free society. A lockdown can even be considered unconstitutional. Humans have evolved to socialize, not to be confined in virtual bubbles.
​
A strict lockdown is a trap from which it is hard to escape. It either results in a never-ending lockdown or a repetition of shorter lockdowns. It is a major threat to a free society.
​
These are my conclusions. Please experiment yourself and refine the models as you well please. Unlike the Imperial College code, this code is freely available in JavaScript embedded in this web page (press F12 to see the source).
Comparison to the JHU Data
​
UPDATE: 28/04/2020. A study from Singapore has done exactly what I was going to do in this section! Check out the link below. Hopefully you will have enough math background to understand what this study does. As a bonus, it allows one to predict the end of the pandemic for various countries.
​
​
UPDATE: 13/05/2020. The web page above has "internalized" its results, which are therefore no longer available. That convinced me to finish my work and create my own fits.
​
ORIGINAL + FITTING WORK (as of 13/05/2020)
​
So far we have just played around with mathematical models. The derivation "makes sense" and follows the phenomenological description of the spread of a virus. But does the math actually match measured data for a real virus outbreak?
​
Welcome to the world of parameter-fitting.
​
We will first focus on the simple SIR model which has only two parameters: the spread rate "b" and the recovery rate "k". Each pair of values determines the three S, I and R curves as shown in the demo above.
We wish to estimate the values of these parameters for a given country from actual data. Of course the SIR model's fit won't be exact: the data is noisy, the model leaves out many factors, etc. But hopefully it will be "close enough." Making that last statement precise is the art and science of modeling.
​
First we need some real data!
​
Since January 22, 2020, Johns Hopkins University has been releasing COVID-19 virus data for every country in the world. Specifically, it releases a daily time series for every country with the following three variables:
​
- confirmed : # of people who tested positive
- recovered : # of people who recovered
- deaths : # of people who died.
​
These variables map easily to the variables of the SIR model. For more information see the JHU web page
​
https://systems.jhu.edu/research/public-health/ncov/
​
The daily data has been conveniently parsed into JSON format by Rodrigo Pombo. This is how you can access it in your JavaScript (modify it to match your needs; this code snippet just prints the data to the console).
​
fetch("https://pomber.github.io/covid19/timeseries.json")
  .then(response => response.json())
  .then(data => {
    data["China"].forEach(
      ({ date, confirmed, recovered, deaths }) =>
        console.log(`${date} C=${confirmed} R=${recovered} D=${deaths}`)
    );
  });
​
In the plots below I use the number of daily deaths. The reason is that this is a solid number, despite the fact that it is probably over-estimated because of comorbidity: a lot of patients die with the virus, not because of the virus. In the SIR model the number of daily deaths is assumed to be proportional to the number of infected.
​
To fit the SIR model to the data we use a very simple non-linear least-squares approach. Our SIR model does not admit an analytical solution, so to evaluate the function at a point "x" we have to run the simulation up to day x. The function also depends on the parameters p = [b, k]:
function funSIR(x, p) {
  var steps = 10;              // sub-steps per day for stability
  var dt = 1.0 / steps;
  var S = 1.0;                 // susceptible fraction
  var y = data[0] / maxData;   // normalized infected, seeded from the data (globals)
  for (var i = 1; i <= x; i++) {
    var S0 = S;
    var y0 = y;
    for (var k = 0; k < steps; k++) {
      S = S0 + dt * (-p[0] * S0 * y0);
      y = y0 + dt * ( p[0] * S0 * y0 - p[1] * y0);
      S0 = S;
      y0 = y;
    }
  }
  return y;
}
​
The function we are trying to minimize is the squared difference between the SIR function and the "death" data data[i]:
​
function funCost(p) {
  var sum = 0;
  for (var i = 0; i < nData; i++) {
    var diff = funSIR(i, p) - data[i];
    sum += diff * diff;
  }
  return sum;
}
​
To fit the parameters we use a very simple multivariate unconstrained optimization technique. The optimizer is initialized with a step for each parameter. Then, separately for each parameter, the step is added and the cost function is evaluated. If this cost is smaller than the current one, we keep the new parameter value and increase the step: we are going in the right direction! If not, we decrease the step and reverse its sign. For readers who know MATLAB, this is a simple version of "fminsearch". We give the full implementation here in JavaScript.
​
function fitFunc(p0, step, maxIter) {
  var n = p0.length;
  for (var it = 0; it < maxIter; it++) {
    for (var j = 0; j < n; j++) {
      var p1 = p0.slice();    // copy the current parameters
      p1[j] += step[j];       // try a step in parameter j
      if (funCost(p1) < funCost(p0)) {
        step[j] *= 1.2;       // improvement: keep it and speed up
        p0 = p1.slice();
      } else {
        step[j] *= -0.5;      // no improvement: shrink and reverse
      }
    }
  }
  return p0;
}
​
Simple, no? ("slice" allows you to copy arrays, don't ask me why).
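A quick way to sanity-check this pipeline is to fit data that the SIR model itself generated, so that the true parameters are known. The sketch below repeats the three functions in self-contained form; the only change is that funSIR takes the initial value explicitly instead of reading globals, and the "true" parameters [0.4, 0.1] and y0init are arbitrary illustrative choices:

```javascript
// Self-contained sanity check: fit synthetic data generated by the model.
// pTrue = [0.4, 0.1] and y0init are arbitrary illustrative values.
function funSIR(x, p, y0init) {
  var steps = 10, dt = 1.0 / steps;
  var S = 1.0, y = y0init;
  for (var i = 1; i <= x; i++) {
    var S0 = S, y0 = y;
    for (var k = 0; k < steps; k++) {
      S = S0 + dt * (-p[0] * S0 * y0);
      y = y0 + dt * ( p[0] * S0 * y0 - p[1] * y0);
      S0 = S; y0 = y;
    }
  }
  return y;
}

var y0init = 1e-3, nData = 60;
var pTrue = [0.4, 0.1];                 // parameters we try to recover
var data = [];
for (var d = 0; d < nData; d++) data.push(funSIR(d, pTrue, y0init));

function funCost(p) {
  var sum = 0;
  for (var i = 0; i < nData; i++) {
    var diff = funSIR(i, p, y0init) - data[i];
    sum += diff * diff;
  }
  return sum;
}

function fitFunc(p0, step, maxIter) {
  for (var it = 0; it < maxIter; it++) {
    for (var j = 0; j < p0.length; j++) {
      var p1 = p0.slice();
      p1[j] += step[j];
      if (funCost(p1) < funCost(p0)) { step[j] *= 1.2; p0 = p1.slice(); }
      else { step[j] *= -0.5; }
    }
  }
  return p0;
}

var pFit = fitFunc([0.2, 0.2], [0.05, 0.05], 100);
```

Since the optimizer only ever accepts steps that lower the cost, the fitted parameters can never be worse than the starting guess, and the cost at the generating parameters is exactly zero.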
​
Following is a simple JavaScript program that allows you to view the data for a particular country. The data is from the JHU database. The plot is fitted using the SIR model and the simple optimizer given above. If you do not like the fit, try fitting the curve manually by varying the "b" and "k" parameters. We also print the "R0" constant, made famous by the movie "Contagion". Instead of a huge scrolling menu you have to manually type in the country's name. Beware! It is case sensitive! *
​
The data is updated daily by JHU. So come back daily to increase the number of repeat visitors.
​
​
* Sometimes the data is displayed twice. Try re-entering the name until it works (it doesn't affect the plot).
​
I use the fitted data curve to estimate both the last day someone dies and the total number of deaths. The last day of death is the first day when the curve falls below 1. The total number of deaths is simply the total integral of the curve. It is computed by summing up all the daily estimated deaths (height of the curve).
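The two estimates can be sketched directly on a fitted curve sampled daily. The fittedDaily numbers below are a made-up stand-in for an actual fit:

```javascript
// Made-up stand-in for a fitted daily-deaths curve (absolute numbers).
var fittedDaily = [2, 10, 40, 90, 120, 90, 40, 10, 2, 0.5, 0.1];

// Last day of death: the first day the fitted curve falls below one death.
var lastDay = -1;
for (var t = 1; t < fittedDaily.length; t++) {
  if (fittedDaily[t] < 1) { lastDay = t; break; }
}

// Total deaths: the integral of the curve, i.e. the sum of the daily values.
var totalDeaths = 0;
for (var t2 = 0; t2 < fittedDaily.length; t2++) totalDeaths += fittedDaily[t2];
```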
​
I also added a control for the number of data points actually used to compute the fit. This is the "% fit" slider. By default the fit is performed on the entire data set (100%). I added this feature to assess how accurate the predictions are compared to what actually happened. By decreasing the percentage you are effectively going back in time: the curve is fit only to the first percentage of the total data points.
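The slider logic itself is one line of array slicing. A sketch with made-up stand-in data:

```javascript
// Sketch of the "% fit" slider: fit only the first fraction of the data.
// The dailyData values are made up for illustration.
var dailyData = [1, 3, 8, 20, 45, 80, 110, 120, 100, 70];
var fitPercent = 50;   // the slider value in percent
var nUsed = Math.floor(dailyData.length * fitPercent / 100);
var usedData = dailyData.slice(0, nUsed);   // "going back in time"
```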
​
After I read studies about T-cells, and a study by Imperial College London (again!) showing that in some cases the susceptibility to this virus is as low as 20%, I added a new parameter: % susceptible, which you can vary using the slider. In this case the curve will be fitted with this proportion of susceptibles.
​
The lockdown period, when available, is shown as a shaded light purple band at the bottom of the plot. I did not use this data to adjust the SIR model as I did for the SEIRD model above. Use it as an indication of the lockdown's possible effect on the resulting curve.
UPDATE: 02/07/2021. The JHU data format has changed once again, and the pandemic has been going on for way too long, so I would have to rescale the time axes and rewrite the JavaScript code. All the methodology is still relevant. But with all the variants and talk of a second, third, fourth wave, ... which all contradict the mathematical models used in this paper, there is no point. A more sophisticated model would be required for the fit, one that accounts for lockdowns (one is given above), the number of PCR test cycles used, the political party in charge, the level of fear in the population, the number of people vaccinated, etc.

You get the point: analyzing this pandemic goes beyond simple mathematical models. But these models give the fundamental behavior of the spread of a single virus with no human interference. It rises exponentially at first, reaches a maximum and then tapers off. And that is it. Then this process repeats with another virus. That is why each year there is a flu season: it starts in the late fall, peaks in the winter, drops off in the spring, and the summer is practically flu free.
Now imagine that we create a test called flu-PCR that detects the presence of any flu-like virus. If someone tests positive with flu-PCR, we say that that person is infected with the all-encompassing virus that we arbitrarily call flu-VIRUS. Also, if someone dies with a positive flu-PCR test, we say that that person died from flu-VIRUS. flu-VIRUS automatically has many waves, actually an indefinite number of them. Of course, the flu-PCR test has many false positives. This means that perfectly healthy people are being diagnosed as having flu-VIRUS. Consequently, the numbers of flu-VIRUS infections and deaths are highly inflated. flu-PCR is also a test that depends on many manually set parameters and can therefore be controlled to adjust the number of positives.