Sandipan DeyOct 13, 2017
Measuring Semantic Relatedness using the Distance and the Shortest Common Ancestor and Outcast Detection with Wordnet Digraph in Python
The following problem appeared as an assignment in the Algorithm Course (COS 226) at Princeton University taught by Prof. Sedgewick. The description of the problem is taken from the assignment itself. However, in the assignment, the implementation is supposed to be in java, in this article a python implementation will be described instead. Instead of using nltk, this implementation is going to be from scratch.
The Problem
- WordNet is a semantic lexicon for the English language that computational linguists and cognitive scientists use extensively. For example, WordNet was a key component in IBM’s Jeopardy-playing Watson computer system.
- WordNet groups words into sets of synonyms called synsets. For example, { AND circuit, AND gate } is a synset that represent a logical gate that fires only when all of its inputs fire.
- WordNet also describes semantic relationships between synsets. One such relationship is the is-a relationship, which connects a hyponym (more specific synset) to a hypernym (more general synset). For example, the synset { gate, logic gate } is a hypernym of { AND circuit, AND gate } because an AND gate is a kind of logic gate.
- The WordNet digraph. The first task is to build the WordNet digraph: each vertex v is an integer that represents a synset, and each directed edge v→w represents that w is a hypernym of v.
- The WordNet digraph is a rooted DAG: it is acyclic and has one vertex—the root— that is an ancestor of every other vertex.
- However, it is not necessarily a tree because a synset can have more than one hypernym. A small subgraph of the WordNet digraph appears below.

The WordNet input file formats
The following two data files will be used to create the WordNet digraph. The files are in comma-separated values (CSV) format: each line contains a sequence of fields, separated by commas.
- List of synsets: The file synsets.txt contains all noun synsets in WordNet, one per line. Line i of the file (counting from 0) contains the information for synset i.
- The first field is the synset id, which is always the integer i;
- the second field is the synonym set (or synset); and
- the third field is its dictionary definition (or gloss), which is not relevant to this assignment.
For example, line 36 means that the synset { AND_circuit
, AND_gate
} has an id number of 36 and its gloss is a circuit in a computer that fires only when all of its inputs fire
. The individual nouns that constitute a synset are separated by spaces. If a noun contains more than one word, the underscore character connects the words (and not the space character).
- List of hypernyms: The file hypernyms.txt contains the hypernym relationships. Line i of the file (counting from 0) contains the hypernyms of synset i.
The WordNet data type
Implement an immutable data type WordNet
with the following API:

- The Wordnet Digraph contains 76066 nodes and 84087 edges, it’s very difficult to visualize the entire graph at once, hence small subgraphs will be displayed as and when required relevant to the context of the examples later.
- The sca() and the distance() between two nodes v and w are implemented using bfs (bread first search) starting from the two nodes separately and combining the distances computed.
Performance requirements
- The data type must use space linear in the input size (size of synsets and hypernyms files).
- The constructor must take time linearithmic (or better) in the input size.
- The method
isNoun()
must run in time logarithmic (or better) in the number of nouns.
- The methods
distance()
and sca()
must make exactly one call to the length()
and ancestor()
methods in ShortestCommonAncestor
, respectively.
The Shortest Common Ancestor
- An ancestral path between two vertices v and w in a rooted DAG is a directed pathfrom v to a common ancestor x, together with a directed path from w to the same ancestor x.
- A shortest ancestral path is an ancestral path of minimum total length.
- We refer to the common ancestor in a shortest ancestral path as a shortest common ancestor.
- Note that a shortest common ancestor always exists because the root is an ancestor of every vertex. Note also that an ancestral path is a path, but not a directed path.

Basic performance requirements
The data type must use space proportional to E + V, where E and V are the number of edges and vertices in the digraph, respectively. All methods and the constructor must take time proportional to E+ V (or better).
Measuring the semantic relatedness of two nouns
Semantic relatedness refers to the degree to which two concepts are related. Measuring semantic relatedness is a challenging problem. For example, let’s consider George W. Bushand John F. Kennedy (two U.S. presidents) to be more closely related than George W. Bush and chimpanzee (two primates). It might not be clear whether George W. Bush and Eric Arthur Blair are more related than two arbitrary people. However, both George W. Bush and Eric Arthur Blair (a.k.a. George Orwell) are famous communicators and, therefore, closely related.
Let’s define the semantic relatedness of two WordNet nouns x and y as follows:
- A = set of synsets in which x appears
- B = set of synsets in which y appears
- distance(x, y) = length of shortest ancestral path of subsets A and B
- sca(x, y) = a shortest common ancestor of subsets A and B
This is the notion of distance that we need to use to implement the distance()
and sca()
methods in the WordNet
data type.

Finding semantic relatedness for some example nouns with the shortest common ancestor and the distance method implemented
apple and potato (distance 5 in the Wordnet Digraph, as shown below)

As can be seen, the noun entity is the root of the Wordnet DAG.
beer and diaper (distance 13 in the Wordnet Digraph)

beer and milk (distance 4 in the Wordnet Digraph, with SCA as drink synset), as expected since they are more semantically closer to each other.

bread and butter (distance 3 in the Wordnet Digraph, as shown below)

cancer and AIDS (distance 6 in the Wordnet Digraph, with SCA as disease as shown below, bfs computed distances and the target distance between the nouns are also shown)

car and vehicle (distance 2 in the Wordnet Digraph, with SCA as vehicle as shown below)

cat and dog (distance 4 in the Wordnet Digraph, with SCA as carnivore as shown below)

cat and milk (distance 7 in the Wordnet Digraph, with SCA as substance as shown below, here cat is identified as Arabian tea)

Einstein and Newton (distance 2 in the Wordnet Digraph, with SCA as physicist as shown below)

Leibnitz and Newton (distance 2 in the Wordnet Digraph, with SCA as mathematician)

Gandhi and Mandela (distance 2 in the Wordnet Digraph, with SCA as national_leader synset)
laptop and internet (distance 11 in the Wordnet Digraph, with SCA as instrumentation synset)
school and office (distance 5 in the Wordnet Digraph, with SCA as construction synset as shown below)

bed and table (distance 3 in the Wordnet Digraph, with SCA as furniture synset as shown below)

Tagore and Einstein (distance 4 in the Wordnet Digraph, with SCA as intellectual synset as shown below)

Tagore and Gandhi (distance 8 in the Wordnet Digraph, with SCA as person synset as shown below)

Tagore and Shelley (distance 2 in the Wordnet Digraph, with SCA as author as shown below)

text and mining (distance 12 in the Wordnet Digraph, with SCA as abstraction synset as shown below)

milk and water (distance 3 in the Wordnet Digraph, with SCA as food, as shown below)
Outcast detection
Given a list of WordNet nouns x1, x2, …, xn, which noun is the least related to the others? To identify an outcast, compute the sum of the distances between each noun and every other one:
di = distance(xi, x1) + distance(xi, x2) + … + distance(xi, xn)
and return a noun xt for which dt is maximum. Note that distance(xi, xi) = 0, so it will not contribute to the sum.
Implement an immutable data type Outcast
with the following API:

Examples
As expected, potato is the outcast in the list of the nouns shown below (a noun with maximum distance from the rest of the nouns, all of which except potato are fruits, but potato is not). It can be seen from the Wordnet Distance heatmap from the next plot, as well as the sum of distance plot from the plot following the next one.

Again, as expected, table is the outcast in the list of the nouns shown below (a noun with maximum distance from the rest of the nouns, all of which except table are mammals, but table is not). It can be seen from the Wordnet Distance heatmap from the next plot, as well as the sum of distance plot from the plot following the next one.

Finally, as expected, bed is the outcast in the list of the nouns shown below (a noun with maximum distance from the rest of the nouns, all of which except bed are drinks, but bed is not). It can be seen from the Wordnet Distance heatmap from the next plot, as well as the sum of distance plot from the plot following the next one.

Measuring Semantic Relatedness using the Distance and the Shortest Common Ancestor and Outcast Detection with Wordnet Digraph in Python
Sandipan DeyOct 4, 2017
Some Variational Image Processing: Poisson Image Editing and its applications in Python
Poisson Image Editing
The goal of Poisson image editing is to perform seamless blending of an object or a texturefrom a source image (captured by a mask image) to a target image. We want to create a photomontage by pasting an image region onto a new background using Poisson image editing. This idea is from the P´erez et al’s SIGGRAPH 2003 paper Poisson Image Editing.
The following figures describe the basic theory. As can be seen, the problem is first expressed in the continuous domain as a constrained variational optimization problem and then can be solved using a discrete Poisson solver

As described in the paper , the main task of Poisson image editing is to solve a huge linear system Ax = b (where I is the new unknown image and S and T are the known images).
Seamless Cloning
The following images are taken from an assignment , where the Poisson image editing had to be used to blend the source inside the mask inside the target image. The next few figures show the result obtained.
Source Image
https://sandipanweb.files.wordpress.com/2017/10/bear.png?w=150&h=82 150w” sizes=”(max-width: 701px) 100vw, 701px” />
Mask Image
https://sandipanweb.files.wordpress.com/2017/10/mask.png?w=150&h=82 150w” sizes=”(max-width: 687px) 100vw, 687px” />
Target Image
Output Gray-Scale Image with Poisson Image Editing
https://sandipanweb.files.wordpress.com/2017/10/pe_waterpool.png?w=150&h=82 150w” sizes=”(max-width: 676px) 100vw, 676px” />
The next animation shows the solution gray-scale images obtained at different iterations using Conjugate Gradient method when solving the linear system of equations.

Output Color Image with Poisson Image Editing
The next animation shows the solution color images obtained at different iterationsusing Conjugate Gradient method to solve the linear system of equations, applying the discrete Poisson solver on each channel.

The following images are taken from another on computational photography. Again, the same technique is used to blend the octopus from the source image to the target image.
Source Image
Mask Image
Target Image
Output Image
The next animation shows the solution color images obtained at different iterationsusing Conjugate Gradient method to solve the linear system of equations, applying the discrete Poisson solver on each channel.

Again, Poisson gradient domain seamless cloning was used to blend the penguins’ image inside the following target image with appropriate mask.
Source Image

Target Image 
Output Image
The next animation again shows the solution color images obtained at different iterations using Conjugate Gradient method to solve the linear system of equations, applying the discrete Poisson solver on each channel.

The next figures show how a source bird image is blended into the target cloud image with seamless cloning.
Source Image

Target Image
Output gray-scale image
The next animation shows the solution gray-scale images obtained at the first few iterations using Conjugate Gradient method when solving the linear system of equations.

Finally, the next figures show how the idol of the Goddess Durga is blended into the target image of the city of kolkata with seamless cloning. As can be seen, since the source mask had its own texture and there is a lots of variations in the background texture, the seamless cloning does not work that well.
Source Image

Target Image

Output Image

The next animation shows the solution gray-scale images obtained at the first few iterations using Conjugate Gradient method when solving the linear system of equations.

The next animation again shows the solution color images obtained at different iterations using Conjugate Gradient method to solve the linear system of equations, applying the discrete Poisson solver on each channel.

Gradient Mixing: Inserting objects with holes
Again, the next figures are taken from the same paper, this time the source objects contain holes. As can be seen from the following results, the seamless cloning does not work well in this case for inserting the object with holes into the target, the target texture is lost inside the mask after blending.
Source Image Target Image

Output Image with Poisson Seamless Cloning

The next animation again shows the solution color images obtained at the first few iterations using Conjugate Gradient method while solving the linear system of equations for seamless cloning, applying the discrete Poisson solver on each channel.

Using the mixing gradients method the blending result obtained is far better, as shown below, it preserves the target texture.
Output Image with Poisson Mixing Gradients

The next animation again shows the solution color images obtained at the first few iterations using Conjugate Gradient method while solving the linear system of equations for mixing gradients, applying the discrete Poisson solver on each channel.

The next few figures show the results obtained using mixing gradients on another set of images, the seamless cloning does not work well in this case, but mixing gradientworks just fine.
Source Image

Target Image

Output Image with mixing gradients

The next animation again shows the solution color images obtained at the first few iterations using Conjugate Gradient method while solving the linear system of equations for mixing gradients, applying the discrete Poisson solver on each channel.

Creating Cartoon-like images: Texture Flattening
As illustrated in the paper, by retaining only the gradients at edge locations, before integrating with the Poisson solver, one washes out the texture of the selected region, giving its contents a flat aspect.
The following figures show the output cartoon-like image obtained using texture flattening, using the canny-edge detector to generate the mask.
Source Image

Mask Image created with Canny edge detector

Output cartoon obtained with texture flattening from the source with the mask

The next animation shows the solution color images obtained at the first few iterationsusing Conjugate Gradient method while solving the linear system of equations for texture flattening, applying the discrete Poisson solver on each channel.

Again, the next figures show the output cartoon-like image obtained using texture flattening, using the canny-edge detector to generate the mask on my image.
Source Image

Output image obtained with texture flattening

The next animation again shows the solution color images obtained at the first few iterations using Conjugate Gradient method while solving the linear system of equations for texture flattening, applying the discrete Poisson solver on each channel.

Some Variational Image Processing: Poisson Image Editing and its applications in Python
Sandipan DeyJul 21, 2017
SIR Epidemic model for influenza A (H1N1): Modeling the outbreak of the pandemic in Kolkata, West Bengal, India, 2010
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Summary
In this report, the spread of the pandemic influenza A (H1N1) that had an outbreak in Kolkata, West Bengal, India, 2010 is going to be simulated. The basic epidemic SIR model will be used, it describes three populations: a susceptible population, an infected population, and a recovered population and assumes the total population (sum of these 3 populations) as fixed over the period of study.
There are two parameters for this model: namely the attack rate (β) per infected person per day through contacts and the recovery rate (α). Initially there will be a small number of infected persons in the population. Now the following questions are to be answered with the simulation / analysis:
1. Whether the number of infected persons increase substantially, producing an epidemic, or the flue will fizzle out.
2. Assuming there is an epidemic, how will it end? Will there still be any susceptibles left when it is over?
3. How long will the epidemic last?
The Euler’s method will be primarily used to solve the system of differential equations for the SIR model and compute the equilibrium points (along with some
analytic solution attempts for a few simplified special cases). Here are the
conclusions obtained from the simulations:
1. When the recovery rate α is ≈ 0 or very very low compared to the attack rate β, the flu will turn out to be an epidemic and the entire population will be infected first (the higher β is the quicker the epidemic outbreak).
2. To be more precise, when the initial susceptible population S(0) is greater than the inverse of the basic reproduction number 1/ R0 = α / β, a proper epidemic will break out.
3. When the initial susceptible population S(0) is less than the inverse of the basic reproduction number 1/R0=α/β, then a proper epidemic will never break out.
4. If the initial susceptible population is non-zero, in the end (at equilibrium) there will always be some susceptible population.
5. When there is an epidemic, it will eventually end in the equilibrium point with 0 infected population, how fast it reaches the equilibrium depends upon the recovery rate (the higher α is the quicker the infection removal).
6. The time to reach the equilibrium can be computed using Euler’s method, it depends on the parameters α (the higher the quicker) and β (the higher the quicker) and the initial infected populated size I(0) (the higher the quicker).

Introduction
In 2010, the pandemic influenza A (H1N1) had an outbreak in Kolkata, West Bengal, India. An increased number of cases with influenza like illness (ILI) were reported in Greater Kolkata Metropolitan Area (GKMA) during July and August 2010, as stated in [3]. The main motivation for this research project will be to understand the spread of the pandemic, compute the equilibrium points and find the impact of the initial values of the infected rate and the attack / recovery rate parameters on the spread of the epidemic, using simulations using the basic epidemic SIR model.
The Euler’s method will be primarily used to solve the system of differential equations
for SIR model and compute the equilibrium points. First a few simplified special cases will be considered and both analytic and numerical methods (with Euler method) will be used to compute the equilibrium points. Then the solution for the generic model will be found. As described in [6], the SIR model can also be effectively used (in a broader
context) to model the propagation of computer virus in computer networks, particularly for the networks with Erdos-Renyi type random graph topology.
SIR Epidemic Model
The SIR model is an epidemiological model that computes the theoretical number of people infected with a contagious illness in a closed population over time. One of the basic one strain SIR models is Kermack-McKendrick Model. The Kermack-McKendrick Model is used to explain the rapid rise and fall in the number of infective patients observed in epidemics. It assumes that the population size is fixed (i.e., no births, no deaths due to disease nor by natural causes), incubation period of the infectious agent
is instantaneous, and duration of infectivity is the same as the length of the disease. It also assumes a completely homogeneous population with no age, spatial, or social structure.
The following figure 2.1 shows an electron microscope image of the re-assorted H1N1influenza virus photographed at the CDC Influenza Laboratory. The viruses are 80 − 120 nm in diameter [1].

2.1 Basic Mathematical Model
The starting model for an epidemic is the so-called SIR model, where S stands for susceptible population, the people that can be infected. I is the already infected population, the people that are contagious, and R stands for the recovered population, people who are not contagious any more.
2.1.1 Differential Equations
The SIR model can be defined using the following ordinary differential equations 2.1:

• The terms dS/dt , dI/dt , dR/dt in the differential equations indicate the rate of change of the susceptible population size, the infected population size and the recovered population size, respectively.
• The terms β and α indicate the attack rate (number of susceptible persons get infected per day) and the recovery rate of the flu (inverse of the number of days a person remains infected), respectively.
• High value of α means a person will be infected by the flu for less number of days and high value of β means that the epidemic will spread quickly.
• Also, as can be seen from below, from the differential equations it can be shown that the population (S + I + R) is assumed to be constant.

2.1.2 Collected data, units and values for the constants
• As can be seen from the following figure 2.2, the focus of this analysis will be limited to the population in Kolkata Metropolitan Corporation (KMC, XII) area where the population can be assumed to be ≈ 4.5 million or 4500 thousands, as per [7].
• Units
– All population (S, I, R) units will be in thousands persons (so that total population
N = 4500).
– As can be derived from the differential equations 2.1, the unit of β will be in
10^(−6) /persons/day (β = 25 will mean 25 persons in a million gets infected by
susceptible-infected contact per infected person per day).
– Similarly, the units of α will be in 10^(−3) / day (α = 167 will mean 167 × 10^(−3) /
day gets recovered from the flu per day).
• The attack rate is 20-29/100000 and the number of days infected (i.e. the inverse of recovery rate) = 5−7 days on average (with a few exceptions), as per [3].
• Typical values for β and α can be assumed to be 25 /person / day and 10^3/6 ≈ 167 / day, respectively.

2.2 Simplified Model 1 (with α = 0)
• At first a simplified model is is created assuming that α = 0 (/ day) and that R = 0, so once infected, a person stays contagious for ever. Because S(t) + I(t) + R(t) = S(t) + I(t) = N is constant (since population size N is fixed), S(t) can be eliminated and a single differential equation in just I(t) is obtained as shown in the equation below 2.2.

• Also, let the (fixed) population size N = 4500 = S(0) + I(0), (in thousand persons), initially the number of persons infected = I(0) = 1 (in thousand persons) and number of persons susceptible S(0) = N −I(0) = 4499 (in thousand persons), respectively. Let β = 25 × 10^(−6) /persons / day) to start with.
2.2.1 Analytic Solution
• The analytic solution can be found by following the steps shown in the Appendix A and the final solution is shown in the below equations 2.3:

• The following figure 2.3 shows the logistic (bounded) growth in I(t) (in thousands persons) w.r.t. the time (in days) for different values of attack rate β×10^(−6) (/ person / day). As expected, the higher the attack rate, the quicker all the persons in the population become infected.

2.2.2 Finding the equilibrium points for I
• The equilibrium points are the points where the rate of change in I is zero, the points that satisfy the following equation
https://sandipanweb.files.wordpress.com/2017/07/f5.png?w=150&h=86 150w” sizes=”(max-width: 236px) 100vw, 236px” />
• Considering a small neighborhood of the equilibrium point at I = 0, it can be seen from the figure 2.4 that whenever I > 0, dI/dt > 0, so I increases and goes away from the equilibrium point.
• Hence, the equilibrium point at I = 0 is unstable.
• At I = N = 4500 (in thousand persons) it is a stable equilibrium. As can be seen from the following figure 2.4, in a small neighborhood of the equilibrium point at I = 4500, it always increases / decreases towards the equilibrium point.
• In a small ε > 0 neighborhood at I = 4500 (in thousand persons),
1. dI/dt > 0, so I increases when I <= 4500 − ε .
2. dI/dt > 0, so I decreases when I >= 4500 + ε .
• The same can be observed from the direction fields from the figure 2.5.
• Hence, the equilibrium at I = 4500 is stable.


2.2.3 Numerical Solution with the Euler’s Method
• The algorithm (taken from the course slides) shown in the following figure 2.6 will be used for numerical computation of the (equilibrium) solution using the Euler’s method.

• As can be seen from the figure 2.6, then the infection at the next timestep can be (linearly) approximated (iteratively) by the summation of the the infection current timestep with the product of the difference in timestep and the derivative of the infection evaluated at the current timestep.
2.2.3.1 Finding the right step size (with β = 25 × 10^(−6)/person/day)
• In order to decide the best step size for the Euler method, first the Euler’s method is run with different step sizes as shown in the figure 2.7.
• As can be seen from the following table 2.1 and the figure 2.7, the largest differences in the value of I (with two consecutive step sizes) occurs around 78 days:


• As can be seen from the table in the Appendix B, the first time when the error becomes less than 1 person (in thousands) is with the step size 1/512 , hence this step size will be used for the Euler method.
2.2.3.2 Computing the (stable) equilibrium point
• Now, this timestep will be used to solve the problem to find the equilibrium time teq(in days). Find teq such that N − I(teq) < ε = 10^(−6) , the solution obtained is teq = 272.33398 days ≈ 273 days.
• Now, from the analytic solution 2.3 and the following figure 2.8, it can be verified that the teq solution that the Euler’s method obtained is pretty accurate (to the ε tolerance).

2.2.3.3 Results with β = 29 × 10^(−6) / person / day, I(0) = 1 person
• Following the same iterations as above, the steepest error is obtained at t = 67 days in this case, as shown in the figure 2.9.
• The first time when error becomes less than one person for t = 67 days with the Euler ‘s method is with step size 1/512 again.
• The solution obtained is teq = 234.76953125 days ≈ 235 days, so the equilibrium (when all the population becomes infected) is obtained earlier as expected, since the attack rate β is higher.

2.2.3.4 Results with β = 25 × 10−6 / person / day, with different initial values for infected persons (I(0))
• Following the same iterations as above, the equilibrium point is computed using the Euler’s method with different values of initial infected population I(0), as shown in the figure 2.10.
• The solutions obtained are teq = 272.33, 258.02, 251.85, 248.23, 245.66, 245.66 days for I(0) = 1, 5, 10, 15, 20 days, respectively. So the equilibrium is obtained earlier when the initial infected population size is higher, as expected.

2.3 Simplified Model 2 (with β = 0)
• Next, yet another simplified model is considered by assuming that β = 0 and that α > 0, so the flu can no more infect anyone (susceptible, if any, possibly because everyone got infected), an infected person recovers from flu with rate α. This situation can be described again with a single differential equation in just I(t) as shown in the equation below 2.4.
https://sandipanweb.files.wordpress.com/2017/07/f14.png?w=150&h=30 150w” sizes=”(max-width: 351px) 100vw, 351px” />
• Also, let the the entire population be infected, N = 4500 = I(0), (in thousand persons), initially the number of persons susceptible = S(0) = 0, respectively. Let α = 167 × 10^(−3)
(/ day) to start with.
2.3.1 Analytic Solution
• The analytic solution can be found by following the steps shown in the below equations 2.5:

• The following figure 2.11 shows the exponential decay in I(t) (in thousand persons) w.r.t. the time (in days) for different values of recovery rate α × 10^(−3) (/ day). As expected, the higher the recovery rate, the quicker all the persons in the population get rid of the infection.

• Now, I(t) + R(t) = N (since S(t) = 0 forever, since no more infection) and I(0) = N, combining with the above analytic solution I(t) = I(0).exp(−αt) = N.exp(−αt), the following equation is obtained:

• The following figure 2.12 shows the growth in R(t) (in thousand persons) w.r.t. the time (in days) for different values of recovery rate α × 10^(−3) (/ day). As expected, the higher the recovery rate, the quicker all the persons in the population move to the removed state.

2.3.2 Numerical Solution with the Euler’s Method
2.3.2.1 Solution with α = 167 × 10−3 / day
• Following the same iterations as above, the steepest error is obtained at t = 6 in this case, as shown in the figure 2.16.
• The first time when error becomes less than one person for t = 67 with the Euler’s method is with step size 1/256 .
• The solution obtained with the Euler’s method is 133.076171875 days ≈ 133 days to remove the infection from population with 10^(−6) tolerance. From the analytic solution,
I(133) = N.exp(−αt) = 1.016478E−06, similar result is obtained.
2.3.2.2 Results
The following figure 2.16 shows the solutions obtained with different step sizes using the Euler’s method.

2.4 Generic Model (with α, β > 0)
First, the numeric solution will be attempted for the generic model (using the Euler’s method) and then some analytic insights will be derived for the generic model.
2.4.1 Numerical Solution with the Euler’s Method
• The following algorithm 2.14 shown in the next figure is going to be used to obtain the solution using Euler method (the basic program for Euler’s method, adapted to include three dependent variables and three differential equations).

• As can be seen from the figure 2.14, first the vector X(0) is formed by combining the three variables S, I, R at timestep 0. Then value of the vector at the next timestep can be (linearly) approximated (iteratively) by the (vector) summation of the vector value at the current timestep with the product of the difference in timestep and the derivative of the
vector evaluated at the current timestep.
2.4.1.1 Equilibrium points
• At the equilibrium point,

There will be no infected person at the equilibrium point (infection should get removed).
• As can be seen from the following figure 2.15 also, I = 0 is an equilibrium point, which is quite expected, since in the equilibrium all the infected population will move to the removed state.
• Also, at every point the invariant S + I + R = N holds.
• In this particular case shown in figure 2.15, the susceptible population S also becomes 0 at equilibrium (since all the population got infected initially, all of them need to move to removed state) and R = N = 4500 (in thousand persons).

2.4.1.2 Results with Euler method
• As explained in the previous sections, the same iterative method is to find the right stepsize for the Euler method. The minimum of the two stepsizes determined is
∆t = 1/512 day and again this stepsize is going to be used for the Euler’s method.
• The following figures show the solutions obtained with different values of α, β with the initial infected population size I(0) = 1 (in thousand persons). Higher values for the parameter β obtained from the literature are used for simulation, since β = 25 × 10^(−6) /person /day is too small (with the results not interesting) for the growth of the epidemic using the Euler’s method (at least till ∆t = 1/ 2^15), after which the iterative Euler’s
method becomes very slow).
• As can be seen, from the figures 2.16, 2.17 and 2.19, at equilibrium, I becomes zero.
• The solution (number of days to reach equilibrium) obtained at α = 167×10^(−3) /day and β = 25×10^(−5) /person /day is teq = 143.35546875 ≈ 144 days with I(0) = 1 (in thousand persons), the corresponding figure is figure 2.16.
• The solution (number of days to reach equilibrium) obtained at α = 167 × 10^(−3) /day and β = 5 × 10^(−5) /person /day is teq ≈ 542 days with I(0) = 1 (in thousand persons), the corresponding figure is figure 2.17.
• Hence, higher the β value, the equilibrium is reached much earlier.
• The solution obtained at α = 500 × 10^(−3) /day and β = 25 × 10^(−5) /person /day is
teq ≈ 78 days with I(0) = 1 (in thousand persons), the corresponding figure is figure 2.19.
• Hence, higher the α value, the equilibrium is reached earlier.
• The solution obtained at α = 167×10^(−3) /day and β = 25×10^(−5) /person /day is
teq = 140 days with I(0) = 10. Hence, as expected, higher the number of initial infected population size, quicker the equilibrium is reached.
• At equilibrium, S does not necessarily become close to zero, since sometimes the entire population may not get infected ever, as shown in the figure 2.17, where at equilibrium the susceptible population is non-zero.




• As can be seen from the phase planes from following figure 2.21, at equilibrium, the infected population becomes 0.

2.4.2 Analytic Solution and Insights
2.4.2.1 Basic Reproduction Number (R0)
The basic reproduction number (also called basic reproduction ratio) is defined by
R0 = β / α (unit is /day). As explained in [2], this ratio is derived as the expected number of new infections (these new infections are sometimes called secondary infections) from a single infection in a population where all subjects are susceptible. How the dynamics of the system depends on R0 will be discussed next.
2.4.2.2 The dynamics of the system as a function of R0
• By dividing the first equation by the third in 2.1, as done in [2], the following equation is obtained:

• Now, at t → ∞, the equilibrium must have been already reached and all infections must have been removed, so that lim (t→∞) I(t) = 0.
• Also, let R∞ = lim (t→∞) R(t).
• Then from the above equation 2.7, R∞ = N − S(0).exp(R0.(R∞−R(0)))
.
• As explained in [2], the above equation shows that at the end of an epidemic, unless
S(0) = 0, not all individuals of the population have recovered, so some must remain susceptible.
• This means that the end of an epidemic is caused by the decline in the number of infected individuals rather than an absolute lack of susceptible subjects [2].
• The role of the basic reproduction number is extremely important, as explained in [2]. From the differential equation, the following equation can be obtained:

• S(t) > 1/R0 ⇒ dI(t)/dt > 0 ⇒ there will be a proper epidemic outbreak with an increase of the number of the infectious (which can reach a considerable fraction of the population).
• S(t) < 1 R0 ⇒ dI(t) dt < 0 ⇒ independently from the initial size of the susceptible population the disease can never cause a proper epidemic outbreak.
• As can be seen from the following figures 2.21 and 2.22 (from the simulation results obtained with Euler method), when S(0) > 1/R0 , there is a peak in the infection curve, indicating a proper epidemic outbreak.
• Also, from the figures 2.21 and 2.22, when S(0) > 1/R0 , the higher the the gap between S(0) and 1/R0 , the higher the peak is (the more people get infected) and the quicker the peak is attained.
• Again, from the figure 2.22, when 4490 = S(0) < 1/R0 = 5000, it never causes a proper epidemic outbreak .
https://sandipanweb.files.wordpress.com/2017/07/f30.png?w=197&h=300 197w” sizes=”(max-width: 619px) 100vw, 619px” />
• Again, by dividing the second equation by the first in 2.1, the following equation is obtained:

• As can be noticed from the above figure 2.23 that because the formulas differ only by an additive constant, these curves are all vertical translations of each other.
• The line I(t) = 0 consists of equilibrium points.
• Starting out at a point on one of these curves with I(t) > 0, as time goes on one needs to travel along the curve to the left (because dS/dt < 0), eventually approaching at some positive value of S(t).
• This must happen since on any of these curves, as I(t) → ∞, as S(t) → 0, from equation 2.8.
• So the answer to question (2) is that the epidemic will end as with approaching some positive value and thus there must always be some susceptibles left over.
• As can be seen from the following figure 2.24 (from the simulation results obtained with the Euler’s method), when S(0) > 1/R0 , lesser the the gap between S(0) and 1/R0 , the higher the population remains susceptible at equilibrium (or at t → ∞).

Conclusions
In this report, the spread of the pandemic influenza A (H1N1) that had an outbreak in Kolkata, West Bengal, India, 2010 was simulated using the basic epidemic SIR model.Initially there will be a small number of infected persons in the population, most of the population had susceptible persons (still not infected but prone to be infected) and zero removed persons. Given the initial values of the variables and the parameter (attack and recovery rates of the flu) values, the following questions were attempted to be answered with the simulation / analysis:
1. Whether the number of infected persons increase substantially, producing an epidemic, or the flue will fizzle out.
2. Assuming there is an epidemic, how will it end? Will there still be any susceptibles left when it is over?
3. How long will the epidemic last?
The following conclusions are obtained after running the simulations with
different values of the parameters and the initial values of the variables:
1. When the recovery rate α is ≈ 0 or very very low compared to the attack rate β (so that R0 = β / α >> 1) and I(0) > 1, the flu will turn out to be an epidemic and the entire population will be infected first (the higher β is the quicker the epidemic break out).
2. To be more precise, when the initial susceptible population S(0) is greater than the inverse of the basic reproduction number 1/R0 = α / β, a proper epidemic will break out.
3. When the initial susceptible population S(0) is less than the inverse of the basic reproduction number 1/R0 = α/β, then a proper epidemic will never break out.
4. If the initial susceptible population is non-zero, in the end (at equilibrium) there will always be some susceptible population.
5. When there is an epidemic, it will eventually end in the equilibrium point with 0 infected population, how fast it reaches the equilibrium depends upon the recovery rate (the higher α is the quicker the infection removal).
6. The time to reach the equilibrium can be computed using Euler method, it depends on the parameters α (the higher the quicker) and β (the higher the quicker) and the initial infected populated size I(0) (the higher the quicker).
7. Scope of improvement: The SIR model could be extended to The Classic Endemic Model [5] where the birth and the death rates are also considered for the population (this will be particularly useful when a disease takes a long time to reach the equilibrium state).
https://sandipanweb.files.wordpress.com/2017/07/f35.png?w=137 137w, https://sandipanweb.files.wordpress.com/2017/07/f35.png?w=274 274w” sizes=”(max-width: 669px) 100vw, 669px” />


SIR Epidemic model for influenza A (H1N1): Modeling the outbreak of the pandemic in Kolkata, West Bengal, India, 2010
Recent Comments