# An Open Science Project on Statistics: Doing the power analysis, equivalence test, NHST t-test, TOST and computing the Bayes Factor to compare the IMDB Ratings of the most recent movies by the legendary film directors Satyajit Ray and Akira Kurosawa (in R)

The following appeared as a project assignment (using the Open Science Framework) in the Coursera course Improving Your Statistical Inference. The project is available here.

First we need to do pre-registration to control the (type-I) error rates and reduce publication bias, as required by the OSF and shown below:

## Theoretical hypothesis

The theoretical hypothesis we are going to test is the following: both Satyajit Ray (from Kolkata, India) and Akira Kurosawa (from Japan) are great directors, both of them won the Academy Award for their Lifetime Achievement. Because they are both great, the movies they directed are equally good.

## Dependent Variables to be measured

• The dependent variables to be measured are the IMDB ratings (scores) and the number of users who rated each movie.
• First, an IMDB search will be run separately for each of the two legendary directors to get all the hits.
• Then the search results will be sorted by release date, and the 29 most recent full movies (excluding documentaries / TV series) will be used.
• So in this case, we shall use the last 29 movies each of Satyajit Ray and Akira Kurosawa directed (excluding documentaries / TV series), as of the moment we did the IMDB search.
• The following table shows the data collected for Satyajit Ray movies.
| Movie | Rating (Out of 10) | # Users Rated | Release Year |
|---|---|---|---|
| The Stranger (Agantuk) | 8.1 | 1760 | 1991 |
| Shakha Prosakha | 7.6 | 453 | 1990 |
| Ganashatru | 7.3 | 662 | 1989 |
| Ghare Baire | 7.7 | 812 | 1984 |
| Hirak Rajar Deshe | 8.8 | 1387 | 1980 |
| Jai Baba Felunath | 7.9 | 1086 | 1979 |
| Shatranj Ke Khilari | 7.8 | 2370 | 1977 |
| Jana Aranya | 8.3 | 887 | 1976 |
| Sonar Kella | 8.5 | 1308 | 1974 |
| Distant Thunder (Ashani Sanket) | 8.2 | 908 | 1973 |
| Company Limited (Seemabaddha) | 8.0 | 782 | 1971 |
| Pratidwandi | 8.2 | 1051 | 1970 |
| Days and Nights in the Forest (Aranyer Din Ratri) | 8.3 | 1720 | 1970 |
| Goopy Gayen Bagha Bayen | 8.8 | 1495 | 1969 |
| Chiriyakhana | 7.2 | 477 | 1967 |
| Nayak | 8.3 | 1974 | 1966 |
| Mahapurush | 7.3 | 719 | 1965 |
| The Coward (Kapurush) | 7.8 | 858 | 1965 |
| Charulata | 8.3 | 3597 | 1964 |
| Mahanagar | 8.3 | 2275 | 1963 |
| Abhijaan | 8.0 | 781 | 1962 |
| Kanchenjungha | 8.0 | 706 | 1962 |
| Teen Kanya | 8.2 | 991 | 1961 |
| Devi | 8.0 | 1407 | 1960 |
| The World of Apu (Apur Sansar) | 8.2 | 8058 | 1959 |
| The Music Room (Jalshaghar) | 8.1 | 3872 | 1958 |
| Paras-Pathar | 7.8 | 723 | 1958 |
| Aparajito | 8.2 | 7880 | 1956 |
| Pather Panchali | 8.4 | 15799 | 1955 |
• The following table shows the data collected for Akira Kurosawa movies.
| Movie | Rating (Out of 10) | # Users Rated | Release Year |
|---|---|---|---|
| Rhapsody in August | 7.3 | 5131 | 1991 |
| Dreams | 7.8 | 19373 | 1990 |
| Ran | 8.2 | 84277 | 1985 |
| Kagemusha | 8.0 | 25284 | 1980 |
| Dersu Uzala | 8.3 | 18898 | 1975 |
| Dodes’ka-den | 7.5 | 4839 | 1970 |
| Red Beard | 8.3 | 12295 | 1965 |
| High and Low | 8.4 | 19989 | 1963 |
| Sanjuro | 8.2 | 22296 | 1962 |
| Yojimbo | 8.3 | 80906 | 1961 |
| The Bad Sleep Well | 8.1 | 8082 | 1960 |
| The Hidden Fortress | 8.1 | 25980 | 1958 |
| The Lower Depths | 7.5 | 3776 | 1957 |
| Throne of Blood | 8.1 | 34723 | 1957 |
| I Live in Fear | 7.4 | 3090 | 1955 |
| Seven Samurai | 8.7 | 247406 | 1954 |
| Ikiru | 8.3 | 46692 | 1952 |
| The Idiot | 7.4 | 3533 | 1951 |
| Rashomon | 8.3 | 112668 | 1950 |
| Scandal | 7.4 | 2580 | 1950 |
| Stray Dog | 7.9 | 11789 | 1949 |
| The Quiet Duel | 7.5 | 2131 | 1949 |
| Drunken Angel | 7.8 | 7422 | 1948 |
| One Wonderful Sunday | 7.3 | 1988 | 1947 |
| Waga seishun ni kuinashi | 7.2 | 2158 | 1946 |
| Asu o tsukuru hitobito | 6.6 | 119 | 1946 |
| The Men Who Tread on the Tiger’s Tail | 6.8 | 2567 | 1945 |
| Zoku Sugata Sanshirô | 6.2 | 1419 | 1945 |
| Zoku Sugata Sanshirô | 5.8 | 1229 | 1944 |

## Justify the sample size

• We want to predict no difference, and thus we shall do a power analysis for an equivalence test. We want to be pretty sure that we can reject our smallest effect size of interest, so we shall design a study with 84% power. For this educational assignment, we do not collect a huge amount of data.
• As long as we can exclude a large effect (Cohen’s d=0.8 or larger) we shall be happy for this assignment.
• The power analysis estimates the sample size we need to show that the difference between the ratings for movies directed by Satyajit Ray and Akira Kurosawa is smaller than Cohen’s d = 0.8 (assuming the true effect size is 0, with an α of 0.05, and aiming for 84% power): 29 movie ratings from Satyajit Ray and 29 movie ratings from Akira Kurosawa, as can be seen from the following R code and the figures.
• The α-level I found acceptable is 0.05.
• We performed a two-sided test.
• We used 84% power for this study.
• The effect size expected is 0.78948 ≈ 0.8, as shown below.
• Given that Satyajit Ray directed a total of 29 full movies, we can only collect 29 observations for him; we also collected an equal amount of sample data (29 movies) for each of the directors.
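The sample-size arithmetic above can be sketched without R. Below is a minimal Python version using the normal approximation (a sketch, not the course's code); the post's figure of 29 per group comes from the exact t-distribution computation (e.g. R's pwr package), which adds roughly one subject per group on top of this approximation.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.84):
    """Approximate per-group sample size for a two-sided, two-sample
    t-test, via the normal approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical z for two-sided alpha
    z_power = NormalDist().inv_cdf(power)          # z for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

print(n_per_group(0.8))  # 28 under the normal approximation
```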

The following theory is going to be used for the statistical tests:

## Results

• As can be seen from above, the sample size required to obtain 84% power is 29.

## Specify the statistical test to conduct

• We need to translate our theoretical hypothesis to a statistical hypothesis.
• Let’s calculate the (90%) CI around the effect size.
• When the 90% CI falls below and excludes a Cohen’s d of 0.8, we can consider the ratings of the movies directed by Satyajit Ray and Akira Kurosawa as equivalent.

• As can be seen from the NHST test above, the effect is statistically significant, since the 90% confidence interval around the effect size does not contain 0.
• Also, the TOST procedure results shown above indicate that the observed effect size d = 0.69 was not significantly within the equivalence bounds of d = −0.8 and d = 0.8: t(29) = 2.86, p = 0.997.
• Also, the 90% CI (0.24, 1.14) around the effect size includes a Cohen’s d of 0.8; hence, we cannot consider the ratings of the movies directed by Satyajit Ray and Akira Kurosawa to be equivalent.
• Hence, the effect is statistically significant, but not statistically equivalent.
• Supporting the alternative with Bayes factors: as can be seen from the following results, the Bayes factor of 50.17844 increases our belief in the alternative hypothesis (H1) over the null hypothesis (H0), starting with a small prior belief of 0.2 in the effect size.
• The following code is taken from the course itself and modified as required; it was originally written by © Professor Daniel Lakens, 2016 and licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (https://creativecommons.org/licenses/by-nc-sa/4.0/).
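The decision logic above, applied to the reported summary numbers (d = 0.69, 90% CI (0.24, 1.14), bounds ±0.8, BF = 50.17844, prior belief 0.2), can be sketched as follows; the function and variable names are illustrative, not the course code.

```python
def nhst_and_tost(ci_low, ci_high, low_bound, high_bound):
    # NHST: significant if the CI around the effect size excludes 0
    significant = not (ci_low <= 0 <= ci_high)
    # TOST: equivalent if the CI lies entirely inside the equivalence bounds
    equivalent = low_bound < ci_low and ci_high < high_bound
    return significant, equivalent

sig, eq = nhst_and_tost(0.24, 1.14, -0.8, 0.8)
print(sig, eq)  # True False: significant, but not equivalent

# Updating the belief in H1 with the Bayes factor
prior_p = 0.2
bf_10 = 50.17844
posterior_odds = (prior_p / (1 - prior_p)) * bf_10
posterior_p = posterior_odds / (1 + posterior_odds)
print(round(posterior_p, 3))  # ≈ 0.926
```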

# SIR Epidemic model for influenza A (H1N1): Modeling the outbreak of the pandemic in Kolkata, West Bengal, India in 2010 (Simulation in Python & R)

This appeared as a project in the edX course DelftX: MathMod1x Mathematical Modelling Basics and the project report can be found here. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

### Summary

In this report, the spread of the pandemic influenza A (H1N1) that had an outbreak in Kolkata, West Bengal, India, in 2010 is going to be simulated. The basic epidemic SIR model will be used; it describes three populations: a susceptible population, an infected population and a recovered population, and assumes the total population (the sum of these 3 populations) to be fixed over the period of study.

There are two parameters for this model: namely the attack rate (β) per infected person per day through contacts and the recovery rate (α). Initially there will be a small number of infected persons in the population. Now the following questions are to be answered with the simulation / analysis:

1. Will the number of infected persons increase substantially, producing an epidemic, or will the flu fizzle out?
2. Assuming there is an epidemic, how will it end? Will there still be any susceptibles left when it is over?
3. How long will the epidemic last?

Euler’s method will be primarily used to solve the system of differential equations for the SIR model and to compute the equilibrium points (along with some analytic solution attempts for a few simplified special cases). Here are the conclusions obtained from the simulations:

1. When the recovery rate α is ≈ 0 or very low compared to the attack rate β, the flu will turn out to be an epidemic and the entire population will be infected first (the higher β is, the quicker the epidemic outbreak).

2. To be more precise, when the initial susceptible population S(0) is greater than the inverse of the basic reproduction number 1/ R0 = α / β, a proper epidemic will break out.

3. When the initial susceptible population S(0) is less than the inverse of the basic reproduction number 1/R0=α/β, then a proper epidemic will never break out.

4. If the initial susceptible population is non-zero, in the end (at equilibrium) there will always be some susceptible population.

5. When there is an epidemic, it will eventually end in the equilibrium point with 0 infected population; how fast it reaches the equilibrium depends upon the recovery rate (the higher α is, the quicker the infection removal).

6. The time to reach the equilibrium can be computed using Euler’s method; it depends on the parameters α (the higher, the quicker) and β (the higher, the quicker) and the initial infected population size I(0) (the higher, the quicker).

### Introduction

In 2010, the pandemic influenza A (H1N1) had an outbreak in Kolkata, West Bengal, India. An increased number of cases with influenza like illness (ILI) were reported in Greater Kolkata Metropolitan Area (GKMA) during July and August 2010, as stated in [3]. The main motivation for this research project will be to understand the spread of the pandemic, compute the equilibrium points and find the impact of the initial values of the infected rate and the attack / recovery rate parameters on the spread of the epidemic, through simulations with the basic epidemic SIR model.

Euler’s method will be primarily used to solve the system of differential equations for the SIR model and to compute the equilibrium points. First a few simplified special cases will be considered, and both analytic and numerical methods (with Euler’s method) will be used to compute the equilibrium points. Then the solution for the generic model will be found. As described in [6], the SIR model can also be effectively used (in a broader context) to model the propagation of computer viruses in computer networks, particularly for networks with Erdos-Renyi type random graph topology.

### SIR Epidemic Model

The SIR model is an epidemiological model that computes the theoretical number of people infected with a contagious illness in a closed population over time. One of the basic one-strain SIR models is the Kermack-McKendrick Model. The Kermack-McKendrick Model is used to explain the rapid rise and fall in the number of infective patients observed in epidemics. It assumes that the population size is fixed (i.e., no births, and no deaths due to disease or natural causes), that the incubation period of the infectious agent is instantaneous, and that the duration of infectivity is the same as the length of the disease. It also assumes a completely homogeneous population with no age, spatial, or social structure.

The following figure 2.1 shows an electron microscope image of the re-assorted H1N1 influenza virus photographed at the CDC Influenza Laboratory. The viruses are 80 − 120 nm in diameter [1].

2.1 Basic Mathematical Model

The starting model for an epidemic is the so-called SIR model, where S stands for susceptible population, the people that can be infected. I is the already infected population, the people that are contagious, and R stands for the recovered population, people who are not contagious any more.

2.1.1 Differential Equations

The SIR model can be defined using the following ordinary differential equations 2.1:
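The equations referenced as 2.1 did not survive extraction; a standard statement of the SIR system, consistent with the parameter definitions below, is:

```latex
\frac{dS}{dt} = -\beta S I, \qquad
\frac{dI}{dt} = \beta S I - \alpha I, \qquad
\frac{dR}{dt} = \alpha I,
\qquad\text{so that}\qquad
\frac{d}{dt}\left(S + I + R\right) = 0 .
```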

• The terms dS/dt, dI/dt and dR/dt in the differential equations indicate the rate of change of the susceptible population size, the infected population size and the recovered population size, respectively.

• The terms β and α indicate the attack rate (the number of susceptible persons who get infected per infected person per day) and the recovery rate of the flu (the inverse of the number of days a person remains infected), respectively.

• A high value of α means a person stays infected with the flu for fewer days, and a high value of β means that the epidemic will spread quickly.

• Also, as can be seen below, from the differential equations it can be shown that the total population (S + I + R) is constant.

2.1.2 Collected data, units and values for the constants

• As can be seen from the following figure 2.2, the focus of this analysis will be limited to the population in the Kolkata Metropolitan Corporation (KMC, XII) area, where the population can be assumed to be ≈ 4.5 million or 4500 thousands, as per [7].

Units

– All population sizes (S, I, R) will be in thousands of persons (so that the total population N = 4500).
– As can be derived from the differential equations 2.1, the unit of β will be 10^(−6) /person/day (β = 25 means 25 persons in a million get infected by susceptible-infected contact per infected person per day).
– Similarly, the unit of α will be 10^(−3) /day (α = 167 means a fraction 167 × 10^(−3) of the infected population recovers per day).

• The attack rate is 20-29 per 100,000 and the number of days infected (i.e. the inverse of the recovery rate) is 5-7 days on average (with a few exceptions), as per [3].

• Typical values for β and α can therefore be taken as 25 × 10^(−6) /person/day and 10^3/6 ≈ 167 × 10^(−3) /day, respectively.

2.2 Simplified Model 1 (with α = 0)

• At first a simplified model is created by assuming that α = 0 (/day) and R = 0, so once infected, a person stays contagious forever. Because S(t) + I(t) + R(t) = S(t) + I(t) = N is constant (since the population size N is fixed), S(t) can be eliminated and a single differential equation in just I(t) is obtained, as shown in equation 2.2 below.

• Also, let the (fixed) population size be N = 4500 = S(0) + I(0) (in thousand persons), with the initial number of infected persons I(0) = 1 and the number of susceptible persons S(0) = N − I(0) = 4499 (both in thousand persons). Let β = 25 × 10^(−6) (/person/day) to start with.

2.2.1 Analytic Solution

• The analytic solution can be found by following the steps shown in Appendix A; the final solution is shown in equations 2.3 below.

• The following figure 2.3 shows the logistic (bounded) growth in I(t) (in thousand persons) w.r.t. time (in days) for different values of the attack rate β × 10^(−6) (/person/day). As expected, the higher the attack rate, the quicker all the persons in the population become infected.
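The equations referenced as 2.2 and 2.3 did not survive extraction; with α = 0 and S = N − I, a consistent reconstruction of the single equation and its logistic solution is:

```latex
\frac{dI}{dt} = \beta I \left(N - I\right), \qquad
I(t) = \frac{N\, I(0)}{I(0) + \left(N - I(0)\right) e^{-\beta N t}} .
```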

2.2.2 Finding the equilibrium points for I

• The equilibrium points are the points where the rate of change in I is zero, i.e. the points that satisfy dI/dt = βI(N − I) = 0, namely I = 0 and I = N.

• Considering a small neighborhood of the equilibrium point at I = 0, it can be seen from the figure 2.4 that whenever I > 0, dI/dt > 0, so I increases and goes away from the equilibrium point.

• Hence, the equilibrium point at I = 0 is unstable.

• At I = N = 4500 (in thousand persons) there is a stable equilibrium. As can be seen from the following figure 2.4, in a small neighborhood of the equilibrium point at I = 4500, I always increases / decreases towards the equilibrium point.

• In a small ε > 0 neighborhood of I = 4500 (in thousand persons),

1. dI/dt > 0, so I increases, when I <= 4500 − ε.
2. dI/dt < 0, so I decreases, when I >= 4500 + ε.

• The same can be observed from the direction fields from the figure 2.5.

• Hence, the equilibrium at I = 4500 is stable.

2.2.3 Numerical Solution with the Euler’s Method

• The algorithm (taken from the course slides) shown in the following figure 2.6 will be used for numerical computation of the (equilibrium) solution using the Euler’s method.

• As can be seen from the figure 2.6, the infection at the next timestep can be (linearly) approximated (iteratively) as the sum of the infection at the current timestep and the product of the timestep size and the derivative of the infection evaluated at the current timestep.

2.2.3.1 Finding the right step size (with β = 25 × 10^(−6)/person/day)

• In order to decide the best step size for the Euler method, first the Euler’s method is run with different step sizes as shown in the figure 2.7.

• As can be seen from the following table 2.1 and the figure 2.7, the largest differences in the value of I (with two consecutive step sizes) occurs around 78 days:

• As can be seen from the table in the Appendix B, the first time when the error becomes less than 1 person (in thousands) is with the step size 1/512 , hence this step size will be used for the Euler method.

2.2.3.2 Computing the (stable) equilibrium point

• Now, this stepsize will be used to solve the problem of finding the equilibrium time teq (in days): find teq such that N − I(teq) < ε = 10^(−6). The solution obtained is teq = 272.33398 days ≈ 273 days.

• Now, from the analytic solution 2.3 and the following figure 2.8, it can be verified that the teq solution that the Euler’s method obtained is pretty accurate (to the ε tolerance).
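The Euler iteration used here can be sketched in a few lines of Python (a sketch, not the report's original code), with the parameters as stated: β = 25 × 10⁻⁶, N = 4500, I(0) = 1 (thousand persons), step ∆t = 1/512, tolerance ε = 10⁻⁶.

```python
def euler_teq(beta=25e-6, N=4500.0, I0=1.0, dt=1/512, eps=1e-6):
    """Euler's method for dI/dt = beta * I * (N - I); returns the first
    time t (in days) at which N - I(t) < eps."""
    I, t = I0, 0.0
    while N - I >= eps:
        I += dt * beta * I * (N - I)  # Euler update
        t += dt
    return t

print(euler_teq())  # ≈ 272.33 days, matching the value quoted above
```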

2.2.3.3 Results with β = 29 × 10^(−6) / person / day, I(0) = 1 person

• Following the same iterations as above, the steepest error is obtained at t = 67 days in this case, as shown in the figure 2.9.

• The first time the error becomes less than one person (in thousands) at t = 67 days with the Euler’s method is with step size 1/512 again.

• The solution obtained is teq = 234.76953125 days ≈ 235 days, so the equilibrium (when all the population becomes infected) is obtained earlier as expected, since the attack rate β is higher.

2.2.3.4 Results with β = 25 × 10^(−6) / person / day, with different initial values for infected persons (I(0))

• Following the same iterations as above, the equilibrium point is computed using the Euler’s method with different values of initial infected population I(0), as shown in the figure 2.10.

• The solutions obtained are teq = 272.33, 258.02, 251.85, 248.23, 245.66, 245.66 days for I(0) = 1, 5, 10, 15, 20 (in thousand persons), respectively. So the equilibrium is reached earlier when the initial infected population size is higher, as expected.

2.3 Simplified Model 2 (with β = 0)

• Next, yet another simplified model is considered by assuming that β = 0 and α > 0, so the flu can no longer infect anyone (possibly because everyone already got infected), and an infected person recovers from the flu at rate α. This situation can again be described with a single differential equation in just I(t), as shown in equation 2.4 below.

• Also, let the entire population be infected initially, N = 4500 = I(0) (in thousand persons), so that the number of susceptible persons is S(0) = 0. Let α = 167 × 10^(−3) (/day) to start with.

2.3.1 Analytic Solution

• The analytic solution can be found by following the steps shown in the below equations 2.5:

• The following figure 2.11 shows the exponential decay in I(t) (in thousand persons)  w.r.t. the time (in days) for different values of recovery rate α × 10^(−3)  (/ day). As expected, the higher the recovery rate, the quicker all the persons in the population get rid of the infection.

• Now, I(t) + R(t) = N (since S(t) = 0 forever, since no more infection) and I(0) = N, combining with the above analytic solution I(t) = I(0).exp(−αt) = N.exp(−αt), the following equation is obtained:

• The following figure 2.12 shows the growth in R(t) (in thousand persons) w.r.t. the time (in days) for different values of recovery rate α × 10^(−3) (/ day). As expected, the higher the recovery rate, the quicker all the persons in the population move to the removed state.
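The equations referenced as 2.4 and 2.5 and the R(t) relation above did not survive extraction; a consistent reconstruction from the definitions above is:

```latex
\frac{dI}{dt} = -\alpha I, \qquad
I(t) = I(0)\, e^{-\alpha t} = N e^{-\alpha t}, \qquad
R(t) = N - I(t) = N \left(1 - e^{-\alpha t}\right) .
```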

2.3.2 Numerical Solution with the Euler’s Method

2.3.2.1 Solution with α = 167 × 10−3 / day

• Following the same iterations as above, the steepest error is obtained at t = 6 days in this case, as shown in the figure 2.16.

• The first time the error becomes less than one person at t = 6 days with the Euler’s method is with step size 1/256.

• The solution obtained with the Euler’s method is 133.076171875 days ≈ 133 days to remove the infection from the population with 10^(−6) tolerance. From the analytic solution, I(133) = N.exp(−αt) = 1.016478 × 10^(−6), so a similar result is obtained.
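A quick analytic cross-check of these numbers (assuming α = 0.167 /day, N = 4500 thousand persons and the 10⁻⁶ tolerance on I):

```python
import math

N, alpha, eps = 4500.0, 0.167, 1e-6
# I(t) = N * exp(-alpha * t) < eps  =>  t > ln(N / eps) / alpha
t_analytic = math.log(N / eps) / alpha
print(t_analytic)  # ≈ 133.1 days, close to the Euler value above
print(N * math.exp(-alpha * 133.076171875))  # ≈ 1e-6 (thousand persons)
```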

2.3.2.2 Results

The following figure 2.16 shows the solutions obtained with different step sizes using the Euler’s method.

2.4 Generic Model (with α, β > 0)

First, the numeric solution will be attempted for the generic model (using the Euler’s method) and then some analytic insights will be derived for the generic model.

2.4.1 Numerical Solution with the Euler’s Method

• The following algorithm 2.14 shown in the next figure is going to be used to obtain the solution using Euler method (the basic program for Euler’s method, adapted to include three dependent variables and three differential equations).

• As can be seen from the figure 2.14, first the vector X(0) is formed by combining the three variables S, I, R at timestep 0. The value of the vector at the next timestep can then be (linearly) approximated (iteratively) as the (vector) sum of the vector value at the current timestep and the product of the timestep size and the derivative of the vector evaluated at the current timestep.

2.4.1.1 Equilibrium points

• At the equilibrium point, dS/dt = dI/dt = dR/dt = 0, which forces I = 0: there will be no infected person at the equilibrium point (the infection should get removed).

• As can be seen from the following figure 2.15 also, I = 0 is an equilibrium point, which is quite expected, since in the equilibrium all the infected population will move to the removed state.

• Also, at every point the invariant S + I + R = N holds.

• In this particular case shown in figure 2.15, the susceptible population S also becomes 0 at equilibrium (since all the population got infected initially, all of them need to move to removed state) and R = N = 4500 (in thousand persons).
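The three-variable Euler scheme described above can be sketched as follows (a sketch with the parameter values quoted in the next subsection: α = 167 × 10⁻³ /day, β = 25 × 10⁻⁵ /person/day, I(0) = 1 thousand, ∆t = 1/512; the stopping criterion I < 10⁻⁶ is an assumption on my part).

```python
def sir_euler(alpha=0.167, beta=25e-5, N=4500.0, I0=1.0, dt=1/512, eps=1e-6):
    """Euler's method for the full SIR system; returns (t_eq, S, I, R)
    at the first time the infected population drops below eps."""
    S, I, R, t = N - I0, I0, 0.0, 0.0
    while I >= eps:
        dS = -beta * S * I
        dI = beta * S * I - alpha * I
        dR = alpha * I
        S, I, R = S + dt * dS, I + dt * dI, R + dt * dR
        t += dt
    return t, S, I, R

t_eq, S, I, R = sir_euler()
print(t_eq, S)  # t_eq ≈ 143 days; some susceptibles remain (S > 0)
```

Note that the invariant S + I + R = N is preserved by every Euler step, since the three increments sum to (numerically almost exactly) zero.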

2.4.1.2 Results with Euler method

• As explained in the previous sections, the same iterative method is used to find the right stepsize for the Euler method. The minimum of the two stepsizes determined is ∆t = 1/512 day, and again this stepsize is going to be used for the Euler’s method.

• The following figures show the solutions obtained with different values of α, β with the initial infected population size I(0) = 1 (in thousand persons). Higher values of the parameter β obtained from the literature are used for the simulation, since β = 25 × 10^(−6) /person /day is too small (with results not interesting) for the growth of the epidemic using the Euler’s method (at least till ∆t = 1/2^15, after which the iterative Euler’s method becomes very slow).

• As can be seen, from the figures 2.16, 2.17 and 2.19, at equilibrium, I becomes zero.

• The solution (number of days to reach equilibrium) obtained at α = 167×10^(−3) /day and β = 25×10^(−5) /person /day is teq = 143.35546875 ≈ 144 days with I(0) = 1 (in thousand persons), the corresponding figure is figure 2.16.

• The solution (number of days to reach equilibrium) obtained at α = 167 × 10^(−3) /day and β = 5 × 10^(−5) /person /day is teq ≈ 542 days with I(0) = 1 (in thousand persons), the corresponding figure is figure 2.17.

• Hence, the higher the β value, the earlier the equilibrium is reached.

• The solution obtained at α = 500 × 10^(−3) /day and β = 25 × 10^(−5) /person /day is
teq ≈ 78 days with I(0) = 1 (in thousand persons), the corresponding figure is figure 2.19.

• Hence, the higher the α value, the earlier the equilibrium is reached.

• The solution obtained at α = 167×10^(−3) /day and β = 25×10^(−5) /person /day is teq = 140 days with I(0) = 10. Hence, as expected, the higher the initial infected population size, the quicker the equilibrium is reached.

• At equilibrium, S does not necessarily become close to zero, since sometimes the entire population may never get infected, as shown in figure 2.17, where at equilibrium the susceptible population is non-zero.

• As can be seen from the phase planes from following figure 2.21, at equilibrium, the infected population becomes 0.

2.4.2 Analytic Solution and Insights

2.4.2.1 Basic Reproduction Number (R0)

The basic reproduction number (also called the basic reproduction ratio) is defined by R0 = β / α (with units of /person here, since β is per person per day and α is per day). As explained in [2], this ratio is derived as the expected number of new infections (these new infections are sometimes called secondary infections) from a single infection in a population where all subjects are susceptible. How the dynamics of the system depend on R0 will be discussed next.

2.4.2.2 The dynamics of the system as a function of R0

• By dividing the first equation by the third in 2.1, as done in [2], the following equation is obtained:

• Now, as t → ∞, the equilibrium must already have been reached and all infections must have been removed, so that lim (t→∞) I(t) = 0.

• Also, let R∞ = lim (t→∞) R(t).

• Then from the above equation 2.7, R∞ = N − S(0).exp(−R0.(R∞ − R(0))).
• As explained in [2], the above equation shows that at the end of an epidemic, unless S(0) = 0, not all individuals of the population have recovered, so some must remain susceptible.

• This means that the end of an epidemic is caused by the decline in the number of infected individuals rather than an absolute lack of susceptible subjects [2].

• The role of the basic reproduction number is extremely important, as explained in [2]. From the differential equation, the following equation can be obtained:

S(t) > 1/R0 ⇒ dI(t)/dt > 0 ⇒ there will be a proper epidemic outbreak, with an increase in the number of the infectious (which can reach a considerable fraction of the population).

S(t) < 1/R0 ⇒ dI(t)/dt < 0 ⇒ independently of the initial size of the susceptible population, the disease can never cause a proper epidemic outbreak.

• As can be seen from the following figures 2.21 and 2.22 (from the simulation results obtained with Euler method), when S(0) > 1/R0 , there is a peak in the infection curve, indicating a proper epidemic outbreak.

• Also, from the figures 2.21 and 2.22, when S(0) > 1/R0, the higher the gap between S(0) and 1/R0, the higher the peak (the more people get infected) and the quicker the peak is attained.

• Again, from the figure 2.22, when 4490 = S(0) < 1/R0 = 5000, a proper epidemic outbreak never occurs.

• Again, by dividing the second equation by the first in 2.1, the following equation is obtained:

• As can be noticed from the above figure 2.23, because the formulas differ only by an additive constant, these curves are all vertical translations of each other.

• The line I(t) = 0 consists of equilibrium points.

• Starting out at a point on one of these curves with I(t) > 0, as time goes on one travels along the curve to the left (because dS/dt < 0), eventually approaching some positive value of S(t).

• This must happen since on any of these curves, I(t) → −∞ as S(t) → 0, from equation 2.8, so the curve must cross the line I = 0 at some positive value of S.

• So the answer to question (2) is that the epidemic will end with S(t) approaching some positive value, and thus there must always be some susceptibles left over.

• As can be seen from the following figure 2.24 (from the simulation results obtained with the Euler’s method), when S(0) > 1/R0, the smaller the gap between S(0) and 1/R0, the larger the population that remains susceptible at equilibrium (i.e., as t → ∞).
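The final-size relation above can be solved numerically by fixed-point iteration; the following sketch uses the parameter values from the simulations (α = 0.167, β = 25 × 10⁻⁵, S(0) = 4499, R(0) = 0, populations in thousands) and confirms that some susceptibles remain at equilibrium.

```python
import math

alpha, beta, N, S0 = 0.167, 25e-5, 4500.0, 4499.0
r0 = beta / alpha  # the text's R0 = β/α

# Fixed-point iteration for S_inf = S0 * exp(-R0 * (N - S_inf)),
# i.e. the final-size relation with R_inf = N - S_inf and R(0) = 0
S = S0
for _ in range(200):
    S = S0 * math.exp(-r0 * (N - S))
print(S)  # ≈ 5.4 (thousand persons): S stays strictly positive
```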

### Conclusions

In this report, the spread of the pandemic influenza A (H1N1) that had an outbreak in Kolkata, West Bengal, India, in 2010 was simulated using the basic epidemic SIR model. Initially there were a small number of infected persons in the population, most of the population was susceptible (still not infected but prone to be infected) and there were zero removed persons. Given the initial values of the variables and the parameter values (the attack and recovery rates of the flu), the following questions were attempted to be answered with the simulation / analysis:

1. Will the number of infected persons increase substantially, producing an epidemic, or will the flu fizzle out?

2. Assuming there is an epidemic, how will it end? Will there still be any susceptibles left when it is over?

3. How long will the epidemic last?

The following conclusions are obtained after running the simulations with different values of the parameters and the initial values of the variables:

1. When the recovery rate α is ≈ 0 or very low compared to the attack rate β (so that R0 = β / α >> 1) and I(0) > 1, the flu will turn out to be an epidemic and the entire population will be infected first (the higher β is, the quicker the epidemic breaks out).

2. To be more precise, when the initial susceptible population S(0) is greater than the inverse of the basic reproduction number 1/R0 = α / β, a proper epidemic will break out.

3. When the initial susceptible population S(0) is less than the inverse of the basic reproduction number 1/R0 = α/β, then a proper epidemic will never break out.

4. If the initial susceptible population is non-zero, in the end (at equilibrium) there will always be some susceptible population.

5. When there is an epidemic, it will eventually end in the equilibrium point with 0 infected population; how fast it reaches the equilibrium depends upon the recovery rate (the higher α is, the quicker the infection removal).

6. The time to reach the equilibrium can be computed using Euler’s method; it depends on the parameters α (the higher, the quicker) and β (the higher, the quicker) and the initial infected population size I(0) (the higher, the quicker).

7. Scope of improvement: The SIR model could be extended to The Classic Endemic Model [5] where the birth and the death rates are also considered for the population (this will be particularly useful when a disease takes a long time to reach the equilibrium state).

# Deep Learning with TensorFlow in Python

The following problems appeared in the first few assignments in the Udacity course Deep Learning (by Google). The descriptions of the problems are taken from the assignments.

## Classifying the letters with notMNIST dataset

Let’s first learn about simple data curation practices and familiarize ourselves with some of the data that are going to be used for deep learning with TensorFlow. The notMNIST dataset will be used for the Python experiments. This dataset is designed to look like the classic MNIST dataset, while looking a little more like real data: it’s a harder task, and the data is a lot less ‘clean’ than MNIST.

### Preprocessing

First the dataset needs to be downloaded and extracted to a local machine. The data consists of characters rendered in a variety of fonts as 28×28 images. The labels are limited to ‘A’ through ‘J’ (10 classes). The training set has about 500k labelled examples and the test set about 19,000. Given these sizes, it should be possible to train models quickly on any machine.

Let’s take a peek at some of the data to make sure it looks sensible. Each exemplar should be an image of a character A through J rendered in a different font. Display a sample of the images downloaded.

Now let’s load the data in a more manageable format. Let’s convert the entire dataset into a 3D array (image index, x, y) of floating-point values, normalized to have approximately zero mean and standard deviation ~0.5 to make training easier down the road. A few images might not be readable; we’ll just skip them. Also, the data is expected to be balanced across classes; let’s verify that. The following output shows the dimensions of the ndarray for each class.

(52909, 28, 28)
(52911, 28, 28)
(52912, 28, 28)
(52911, 28, 28)
(52912, 28, 28)
(52912, 28, 28)
(52912, 28, 28)
(52912, 28, 28)
(52912, 28, 28)
(52911, 28, 28)

Now some more preprocessing is needed:

1. First the training data (for different classes) needs to be merged and pruned.
2. The labels will be stored into a separate array of integers from 0 to 9.
3. Also let’s create a validation dataset for hyperparameter tuning.
4. Finally the data needs to be randomized. It’s important to have the labels well shuffled for the training and test distributions to match.
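The randomization step above boils down to shuffling images and labels with one shared permutation; a minimal sketch (the helper name and fixed seed are illustrative assumptions):

```python
import numpy as np

def randomize(dataset, labels, seed=133):
    """Shuffle images and labels together with one shared permutation."""
    perm = np.random.RandomState(seed).permutation(labels.shape[0])
    return dataset[perm], labels[perm]
```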

After preprocessing, let’s peek at a few samples from the training dataset; the next figure shows how it looks.

Here is how the validation dataset looks (a few samples chosen).

Here is how the test dataset looks (a few samples chosen).

### Trying a few off-the-shelf classifiers

Let’s get an idea of what an off-the-shelf classifier can give us on this data. It’s always good to check that there is something to learn, and that it’s a problem that is not so trivial that a canned solution solves it. Let’s first train a simple LogisticRegression model from sklearn (using default parameters) on this data using 5000 training samples.

The following figure shows the output from the logistic regression model trained, its accuracy on the test dataset (also the confusion matrix) and a few test instances classified wrongly (predicted labels along with the true labels) by the model.

### Deep Learning with TensorFlow

Now let’s progressively train deeper and more accurate models using TensorFlow. Again, we need to do the following preprocessing:

1. Reformat into a shape that’s more adapted to the models we’re going to train: the data as a flat matrix,
2. the labels as 1-hot encodings.
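Both reformatting steps can be sketched in a few lines of numpy (a sketch of the idea; the helper name is an assumption):

```python
import numpy as np

def reformat(dataset, labels, image_size=28, num_labels=10):
    """Flatten images to row vectors and turn integer labels into 1-hot rows."""
    data = dataset.reshape(-1, image_size * image_size).astype(np.float32)
    one_hot = (np.arange(num_labels) == labels[:, None]).astype(np.float32)
    return data, one_hot
```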

Now let’s train a multinomial logistic regression using simple stochastic gradient descent.

TensorFlow works like this:

First we need to describe the computation that is to be performed: what the inputs, the variables, and the operations look like. These get created as nodes over a computation graph.

Then we can run the operations on this graph as many times as we want by calling session.run(), providing it outputs to fetch from the graph that get returned.

1. Let’s load all the data into TensorFlow and build the computation graph corresponding to our training.
2. Let’s use stochastic gradient descent training (with ~3k steps), which is much faster. The graph will be similar to batch gradient descent, except that instead of holding all the training data into a constant node, we create a Placeholder node which will be fed actual data at every call of session.run().
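What each `session.run()` call computes for the multinomial logistic regression can be written out in framework-agnostic numpy; the following is a sketch of one minibatch SGD step on the softmax cross-entropy loss (not the assignment's TensorFlow code, and the learning rate is illustrative):

```python
import numpy as np

def sgd_step(W, b, X, y_onehot, lr=0.5):
    """One minibatch SGD step of multinomial logistic regression."""
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True)      # for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)        # softmax
    grad_logits = (probs - y_onehot) / X.shape[0]    # gradient of mean cross-entropy
    W -= lr * X.T @ grad_logits
    b -= lr * grad_logits.sum(axis=0)
    loss = -np.mean(np.sum(y_onehot * np.log(probs + 1e-12), axis=1))
    return W, b, loss
```

In the TensorFlow version the same forward pass, loss and update are nodes in the graph, and the Placeholder node is fed a fresh minibatch `X, y_onehot` at every step.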

### Logistic Regression with SGD

The following shows the fully connected computation graph and the results obtained.

### Neural Network with SGD

Now let’s turn the logistic regression example with SGD into a 1-hidden layer neural network with rectified linear units nn.relu() and 1024 hidden nodes. As can be seen from the below results, this model improves the validation / test accuracy.

### Regularization with Tensorflow

Previously we trained a logistic regression and a neural network model with Tensorflow. Let’s now explore the regularization techniques.

Let’s introduce and tune L2 regularization for both logistic and neural network models. L2 regularization amounts to adding a penalty on the norm of the weights to the loss. In TensorFlow, we can compute the L2 loss for a tensor t using nn.l2_loss(t).
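For reference, nn.l2_loss(t) computes half the sum of squared entries; a numpy equivalent:

```python
import numpy as np

def l2_loss(t):
    """numpy equivalent of TensorFlow's nn.l2_loss: sum(t**2) / 2."""
    return 0.5 * np.sum(np.square(t))
```

The regularized objective is then of the form total_loss = cross_entropy + λ * l2_loss(weights).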

The right amount of regularization improves the validation / test accuracy, as can be seen from the following results.

### L2 Regularized Logistic Regression with SGD

The following figure recapitulates the simple network without any hidden layer, with softmax outputs.

The next figures visualize the weights learnt for the 10 output neurons at different steps using SGD and L2 regularized loss function (with λ=0.005). As can be seen below, the weights learnt are gradually capturing the different features of the letters at the corresponding output neurons.

As can be seen, the test accuracy also gets improved to 88.3% with L2 regularized loss function (with λ=0.005).

The following results show the accuracy and the weights learnt for a couple of different values of λ (0.01 and 0.05 respectively). As can be seen, with higher values of λ, the logistic regression model tends to underfit and test accuracy decreases.

### L2 Regularized Neural Network with SGD

The following figure recapitulates the neural network with a single hidden layer with 1024 nodes, with relu intermediate outputs. The L2 regularizations applied on the loss function for the weights learnt at the input and the hidden layers are λ1 and λ2, respectively.

The next figures visualize the weights learnt for 225 randomly selected input neurons (out of 28×28) at different steps using SGD and the L2 regularized loss function (with λ1=λ2=0.01). As can be seen below, the weights learnt are gradually capturing (as the SGD steps increase) the different features of the letters at the corresponding output neurons.

If the regularization parameters used are λ1=λ2=0.005, the test accuracy increases to 91.1%, as shown below.

### Overfitting

Let’s demonstrate an extreme case of overfitting. Restrict your training data to just a few batches. What happens?

Let’s restrict the training data to n=5 batches. The following figures show how it increases the training accuracy to 100% (along with a visualization of a few randomly-picked input weights learnt) and decreases the validation and the test accuracy, since the model is not generalized enough.

### Dropouts

Introduce Dropout on the hidden layer of the neural network. Remember: Dropout should only be introduced during training, not evaluation, otherwise our evaluation results would be stochastic as well. TensorFlow provides nn.dropout() for that, but you have to make sure it’s only inserted during training.
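The inverted-dropout scaling that nn.dropout() performs can be sketched in numpy as follows (a sketch, with an illustrative keep probability; at evaluation time the activations pass through unchanged):

```python
import numpy as np

def dropout(a, keep_prob=0.6, train=True, rng=None):
    """Inverted dropout: zero each unit with prob 1-keep_prob and scale the
    survivors by 1/keep_prob, so the expected activation is unchanged."""
    if not train:
        return a                     # no stochasticity at evaluation time
    rng = rng or np.random.RandomState(0)
    mask = rng.binomial(1, keep_prob, size=a.shape)
    return a * mask / keep_prob
```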

What happens to our extreme overfitting case?

As we can see from the below results, introducing a dropout rate of 0.4 increases the validation and test accuracy by reducing the overfitting.

Till this point the highest accuracy obtained on the test dataset using a single hidden-layer neural network is 91.1%. More hidden layers / other techniques (e.g., exponential decay of the learning rate) can be used to improve the accuracy further (to be continued…).

# Some NLP: Probabilistic Context Free Grammar (PCFG) and CKY Parsing in Python

This problem appeared as an assignment in the coursera course Natural Language Processing (by Stanford) in 2012. The following description of the problem is taken directly from the assignment description.

In this article, a probabilistic parser will be built by implementing the CKY parser. The Manually Annotated Sub-Corpus (MASC) from the American National Corpus (ANC): http://www.anc.org/MASC/Home.html will be used for this purpose.

## Instructions

First, we need to learn a PCFG from the training trees. Since the training set is hand-parsed this learning is very easy. We need to simply set

q(N_j → ζ) = C(N_j → ζ) / C(N_j),

where C(N_j → ζ) is the count observed for that production in the data set and C(N_j) is the total count of the nonterminal N_j. While we could consider smoothing rule rewrite probabilities, it is sufficient to just work with un-smoothed MLE probabilities for rules. (Doing anything else makes things rather more complex and slow, since every rewrite would have a nonzero probability, so let’s get things working with an un-smoothed grammar before considering adding smoothing!)
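The MLE estimate above is just relative-frequency counting over the productions read off the training trees; a minimal sketch (the (lhs, rhs) pair representation is an assumption):

```python
from collections import Counter

def mle_rule_probs(productions):
    """Un-smoothed MLE: q(lhs -> rhs) = C(lhs -> rhs) / C(lhs).
    `productions` is a list of (lhs, rhs) pairs read off the training trees."""
    rule_counts = Counter(productions)
    lhs_counts = Counter(lhs for lhs, _ in productions)
    return {rule: count / lhs_counts[rule[0]]
            for rule, count in rule_counts.items()}
```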

At the very beginning, all the train and the test trees are read in. The training trees are going to be used to construct a PCFG parser, by learning the PCFG grammar. The parser is then used to predict trees for the sentences in the test set. For each test sentence, the parse given by the parser is evaluated by comparing the constituents it generates with the constituents in the hand-parsed version. From this, precision, recall and the F1 score are calculated.

There are the following basic data structures:

• ling.Tree: CFG tree structures (ling.Trees.PennTreeRenderer)

• Lexicon: Pre-terminal productions and probabilities

• Grammar, UnaryRule, BinaryRule: CFG rules and accessors

Most parsers require grammars in which the rules are at most binary branching. Hence, first we need to binarize the trees and then construct a Grammar out of them using MLE.

The first job is to build a CKY parser using this PCFG grammar learnt. Scan through a few of the training trees in the MASC dataset to get a sense of the range of inputs. Something worth noticing is that the grammar has relatively few non-terminal symbols but thousands of rules, many ternary-branching or longer. Currently there are 38 MASC train files and 11 test files.

Once we have a parser working on the treebank, the next task is to improve upon the supplied grammar by adding 1st / 2nd-order vertical markovization. This means using parent annotation symbols like NP^S to indicate a subject noun phrase instead of just NP.

## The Dynamic Programming Algorithm (CKY) for Parsing

The following figure shows the CKY algorithm to compute the best possible (most probable) parse tree for a sentence using the PCFG grammar learnt from the training dataset.
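A minimal sketch of the CKY chart filling over a binarized PCFG, in log-probability space (unary-closure handling and backpointer reconstruction are omitted for brevity, and the lexicon/rule formats here are assumptions, not the assignment's data structures):

```python
import math
from collections import defaultdict

def cky(words, lexicon, binary_rules, start='S'):
    """Best (max log-prob) parse score of `words` under a binarized PCFG.
    lexicon: {(tag, word): prob}; binary_rules: list of (parent, left, right, prob)."""
    n = len(words)
    chart = defaultdict(dict)               # chart[(i, j)][symbol] = best log-prob
    for i, w in enumerate(words):           # initialize spans of length 1 with tags
        for (tag, word), p in lexicon.items():
            if word == w:
                chart[(i, i + 1)][tag] = math.log(p)
    for span in range(2, n + 1):            # grow spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):       # try every split point
                for parent, left, right, p in binary_rules:
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        score = (math.log(p) + chart[(i, k)][left]
                                 + chart[(k, j)][right])
                        if score > chart[(i, j)].get(parent, -math.inf):
                            chart[(i, j)][parent] = score
                            # a backpointer (k, left, right) would be stored here
    return chart[(0, n)].get(start)
```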

The following animation (prepared from the lecture slides of the same course) shows how the chart for CKY is constructed using dynamic programming for  a small set of PCFG grammar.

## Evaluation

For this assignment we will use the average F1 score to evaluate the correctness of the CKY parser, although in essence we ought to know from the output on the development set (devtest) whether the parser is implemented correctly. The following figure shows an example evaluation:
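Precision, recall and F1 over labelled constituents can be computed as follows (a sketch, with constituents represented as (label, start, end) triples, which is an assumption about the representation):

```python
def parse_scores(guessed, gold):
    """Precision, recall and F1 over labelled constituents,
    each given as a set of (label, start, end) triples."""
    guessed, gold = set(guessed), set(gold)
    correct = len(guessed & gold)
    p = correct / len(guessed) if guessed else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```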

## Results

(1) First let’s use a toy minimal training dataset containing just 3 POS-tagged trees, and a dev/test dataset with a single test sentence (with ground-truth), to start with. The following figure shows all the training trees.  There are just enough productions in the training set for the test sentence to have an ambiguity due to PP-attachment.

The following figure shows the PCFG learnt from these training trees. Now let’s try to parse a single test sentence ‘cats scratch walls with claws’ with the CKY parser and using the PCFG grammar learnt. The following figure shows the manual (gold) parse tree and the best (most probable) parse tree using the CKY dynamic programming algorithm.

(2) Now, let’s use a much larger training dataset, MASC (with a total of 3595 annotated training trees); a few of them are shown in the next figure.

Let’s use all these 3595  POS-tagged training trees to learn the PCFG grammar.

• ~10k lexicon rules producing terminals (with non-zero probabilities) are learnt; some of them are shown below:
• ~4.5k binary rules (with non-zero probabilities) are learnt; some of them are shown below:
• ~3.5k unary rules (with non-zero probabilities) are learnt; some of them are shown below:

Then let’s evaluate/compare the best parse trees obtained (guessed) with CKY for a few test sentences from the dev/test dataset using the PCFG learnt, against the manually (gold) parsed test trees (there are 355 of them), using precision, recall and the F1 score. A few of the test sentence parses are shown in the following figures.

## Vertical Markovization

The independence assumptions of a PCFG are often too strong. In order to indicate the dependency on the parent non-terminal in a tree, the grammar rules can be re-written depending on the past k ancestor nodes, denoting order-k vertical Markovization, as explained in the following figure; this often increases the accuracy of the parse trees.
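First-order vertical markovization is just a relabelling pass over the training trees before rule extraction; a minimal sketch (the nested-tuple tree format is an assumption, and pre-terminal handling is simplified):

```python
def annotate_parents(tree, parent=None):
    """1st-order vertical markovization: relabel each non-root nonterminal
    with its parent, e.g. NP under S becomes NP^S. Trees are nested tuples
    (label, child, ...) with plain strings as words."""
    if isinstance(tree, str):
        return tree                      # words are left untouched
    label, children = tree[0], tree[1:]
    new_label = label if parent is None else label + '^' + parent
    return (new_label,) + tuple(annotate_parents(c, label) for c in children)
```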

This time ~14k lexicon rules producing terminals (with non-zero probabilities), ~6k binary rules and ~5k unary rules are learnt; some of them are shown below:

The following figure shows the best parse trees obtained with CKY for a sentence using PCFG learnt with and without vertical Markovization from the minimal dataset.

Similarly, using the MASC dataset, as can be seen for the following particular test sentence, the CKY parser performs much better with the PCFG learnt from the vertically Markovized training trees:

A few more parse trees guessed by the CKY using the PCFG  learnt from the vertically Markovized training trees are shown below:

The markdown file can be found here.

# Some more Image Processing: Otsu’s Method, Hough Transform and Motion-based Segmentation with Python and OpenCV

Some of the following problems appeared in the lectures and the exercises in the coursera course Image Processing (by Northwestern University). Some of the following descriptions of the problems are taken from the exercises’ descriptions.

## 1. Otsu’s method for automatic thresholding to get binary images

We need to find a threshold to binarize an image, by separating the background from the foreground. Using Otsu’s method we can automatically find the globally optimal threshold, by maximizing the between-class variance. The following figure shows the outline for the technique.
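The between-class variance maximization can be sketched with cumulative histogram sums (a minimal numpy sketch for 8-bit images, not the course's reference code):

```python
import numpy as np

def otsu_threshold(image):
    """Gray level maximizing the between-class variance sigma_b^2(k)."""
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                   # gray-level probabilities
    omega = np.cumsum(p)                    # class-0 probability up to level k
    mu = np.cumsum(p * np.arange(256))      # cumulative mean up to level k
    mu_t = mu[-1]                           # global mean
    with np.errstate(divide='ignore', invalid='ignore'):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0.0    # levels where one class is empty
    return int(np.argmax(sigma_b))
```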

The following figures show the thresholds computed and the output binary images obtained by using Otsu’s method on a few gray-scale low-contrast images:

The following figures show the output binary images obtained by using Otsu’s method on a few gray-scale high-contrast images obtained using histogram equalization:

The following figures show the distribution of the between-class variance for different images and the threshold chosen by Otsu’s method:

## 2. Hough Transform to find lines in images

The Hough transform can be used to find lines in an image by taking votes from each of the pixels for all possible orientations of lines, for different values of the parameters in a straight line’s equation. The following figure describes the method.
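The voting scheme can be sketched directly from the normal-form line equation x·cosθ + y·sinθ = ρ (a minimal numpy sketch with one-degree angular resolution; OpenCV's cv2.HoughLines does the same accumulation internally):

```python
import numpy as np

def hough_votes(binary):
    """Accumulate votes in (rho, theta) space: every foreground pixel votes for
    all lines x*cos(theta) + y*sin(theta) = rho passing through it."""
    h, w = binary.shape
    diag = int(np.ceil(np.hypot(h, w)))              # largest possible |rho|
    thetas = np.deg2rad(np.arange(-90, 90))          # one-degree resolution
    acc = np.zeros((2 * diag + 1, len(thetas)), dtype=int)
    ys, xs = np.nonzero(binary)
    for x, y in zip(xs, ys):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, np.arange(len(thetas))] += 1
    return acc, thetas, diag
```

Thresholding `acc` then keeps only the most voted-for (ρ, θ) cells, i.e. the most prominent lines.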

The following figures show a couple of binary images and the votes obtained by the image pixels in the parametric space of any straight line, and how, by thresholding on the votes, we can find the most prominent line(s) in the image (marked in green). The parameter values marked with arrow(s) represent the most prominent line(s) in the image.

The experiment on the following image is inspired by the youtube video How Hough Transform works (by Thales Sehn Korting) which can be found here. Again, by thresholding on the votes by the pixels for the different values of the straight line parameters, we can find the most prominent line(s) in the image (marked in green). The parameter values marked with arrow(s) represent the most prominent line(s) in the image.

The next figure shows all the steps in how the Hough transform can be used on any (gray-level) image to find lines (edges). As expected, the more the voting threshold is increased, the fewer lines / edges are detected (the more prominent ones, colored red, are the ones the Hough transform keeps).

## 3. Motion based Segmentation using Accumulative Difference Image

In this problem we basically want to separate the moving objects in consecutive frames of a video from the non-moving objects. The following figures show the problem statement that appeared in an assignment in the same course, along with the theory to be used:
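The absolute accumulative difference image can be sketched as a per-pixel count of thresholded frame differences against a reference frame (a minimal numpy sketch; the threshold value is illustrative):

```python
import numpy as np

def absolute_adi(frames, threshold=20):
    """Absolute ADI: per pixel, count the frames whose absolute difference
    from the reference (first) frame exceeds the threshold."""
    reference = frames[0].astype(int)
    adi = np.zeros(reference.shape, dtype=int)
    for frame in frames[1:]:
        adi += np.abs(frame.astype(int) - reference) > threshold
    return adi
```

Pixels with large ADI values belong to moving objects; thresholding the ADI separates them from the static background.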

The following animation shows the motion of a moving rectangle.

The next figure shows how the motion-based  segmentation using ADI-based techniques can be applied to separate out the moving rectangle from the static background.

Again, the next animations show how the moving objects can be segmented from the non-moving ones from the consecutive frames of a video.

First the frames from the video are binarized using Otsu’s method, as shown below.

Then absolute ADI is applied to separate out the moving objects (here the students), as shown below:

The next video is from some past cricket match with Sachin Tendulkar batting (taken from youtube) and the following one is the motion-segmented video with ADI:

The next video is captured at a  live performance of a dance-drama written by Tagore and the following one is the motion-segmented video with ADI:

The markdown file can be found here.

# Some Image Processing, Information and Coding Theory with Python

Some of the following problems appeared in the exercises in the coursera course Image Processing (by Northwestern University). The following descriptions of the problems are taken directly from the assignment’s description.

## 1. Some Information and Coding Theory

### Computing the Entropy of an Image

The next figure shows the problem statement. Although it was originally implemented in MATLAB, in this article a python implementation is going to be described.
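The entropy of an image is the Shannon entropy of its gray-level histogram; a minimal numpy sketch for 8-bit images:

```python
import numpy as np

def image_entropy(image):
    """Shannon entropy (bits per pixel) of the gray-level distribution."""
    hist = np.bincount(image.ravel(), minlength=256)
    p = hist[hist > 0] / image.size          # probabilities of occurring levels
    return -np.sum(p * np.log2(p))
```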

### Histogram Equalization and Entropy

Histogram equalization is a well-known image transformation technique to improve the contrast of an image. The following figure shows the theory for the technique: each pixel is transformed by the CDF of the image, and as can be shown, the output image is expected to follow a uniform distribution (and thereby have the highest entropy) over the pixel intensities (as proved), considering a continuous pixel density. But since the pixel values are discrete (integers), the result obtained is only near-uniform.
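For 8-bit images the transform is just a table lookup through the scaled empirical CDF; a minimal numpy sketch:

```python
import numpy as np

def equalize(image):
    """Histogram equalization: map every pixel through the scaled empirical CDF."""
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = np.cumsum(hist) / image.size               # empirical CDF per gray level
    return np.round(255 * cdf).astype(np.uint8)[image]
```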

The following figures show a few images and the corresponding equalized images and how the PMF and CDF changes.

### Image Data Compression with Huffman and Lempel Ziv (LZ78) Coding

Now let’s implement the following couple of compression techniques:

1. Huffman Coding
2. Lempel-Ziv Coding (LZ78)

and compress a few images and their histogram equalized versions and compare the entropies.

The following figure shows the theory and the algorithms to be implemented for these two source coding techniques:

Let’s now implement the Huffman Coding algorithm to compress the data for a few gray-scale images.
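The greedy tree construction can be sketched compactly with a heap: repeatedly merge the two least frequent nodes, prefixing '0' to codes in one subtree and '1' in the other (a sketch of the algorithm, not the step-by-step implementation that produced the figures below; tie-breaking here is arbitrary):

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a Huffman code table from the symbol frequencies of `data`."""
    heap = [[freq, [sym, '']] for sym, freq in Counter(data).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)            # two least frequent subtrees
        hi = heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = '0' + pair[1]         # extend codes in the left subtree
        for pair in hi[1:]:
            pair[1] = '1' + pair[1]         # extend codes in the right subtree
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return dict(heap[0][1:])
```

More frequent symbols end up with shorter codes, and no code is a prefix of another.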

The following figures show how the tree is getting constructed with the Huffman Coding algorithm (the starting, final and a few intermediate steps) for the following low-contrast image beans. Here we have alphabets from the set {0,1,…,255}, only the pixels present in the image are used. It takes 44 steps to construct the tree.

The following table shows the Huffman Codes for different pixel values for the above low-contrast image beans.

index pixel code
15 108 0
3 115 1
20 109 10
1 112 11
32 107 100
44 110 101
18 96 110
11 95 111
21 120 1000
35 91 1001
6 116 1010
24 123 1011
2 98 1100
13 100 1101
17 94 1111
12 119 10000
4 114 10001
42 103 10011
16 106 10100
37 132 10101
41 130 10111
5 117 11000
26 125 11010
40 99 11011
38 118 11100
0 113 11101
22 121 11111
39 131 101011
27 127 101111
31 102 110011
25 124 110101
23 122 110111
33 104 111011
34 105 111111
29 129 1001111
30 93 1010101
7 137 1010111
14 255 1101011
28 128 1101111
36 133 11010111
10 134 11101011
9 135 101010111
43 90 1001010111
8 136 10001010111
19 89 100001010111
Let’s now repeat the Huffman-tree construction for the following histogram-equalized image beans. The goal is:
1. First construct the tree for the equalized image.
2. Use the tree to encode the image data and then compare the compression ratio with the one obtained using the same algorithm with the low-contrast image.
3. Find if the histogram equalization increases / reduces the compression ratio or equivalently the entropy of the image.

The following figures show how the tree is getting constructed with the Huffman Coding algorithm (the starting, final and a few intermediate steps) for the image beans. Here we have alphabets from the set {0,1,…,255}; only the pixels present in the image are used. It takes 40 steps to construct the tree; also, as can be seen from the following figures, the tree constructed is structurally different from the one constructed on the low-contrast version of the same image.

The following table shows the Huffman Codes for different pixel values for the above high-contrast image beans.

index pixel code
13 119 0
11 250 1
5 133 10
0 150 11
8 113 100
32 166 101
36 60 110
40 30 111
25 208 1000
38 16 1001
21 181 1010
33 224 1011
9 67 1100
37 80 1101
27 244 1111
23 201 10000
17 174 10001
6 89 10011
34 106 10100
24 141 10101
28 246 10111
22 188 11000
10 235 11010
30 71 11011
2 195 11100
3 158 11101
1 214 11111
31 228 100001
15 84 101011
12 251 101111
39 18 110011
4 219 110111
14 93 111011
26 99 111111
19 253 1000001
7 238 1001111
18 21 1010111
29 241 1101111
35 248 1110011
20 0 10101111
16 255 110101111

The following figure shows the compression ratio for different images using Huffman and LZ78 codes, both on the low-contrast and high contrast images (obtained using histogram equalization). The following observations can be drawn from the comparative results shown in the following figures (here H represents Huffman and L represents LZ78):

1. The entropy of an image stays almost the same after histogram equalization
2. The compression ratio with Huffman / LZ78 also stays almost the same with an image with / without histogram equalization.
3. The Huffman codes achieve higher compression in some cases than LZ78.

The following shows the first few symbol/code pairs for the dictionary obtained with LZ78 algorithm, with the alphabet set as {0,1,..,255} for the low-contrast beans image (there are around 18k codes in the dictionary for this image):

 index symbol (pixels) code
7081 0 0
10325 1 1
1144 2 10
7689 4 100
8286 8 1000
7401 16 10000
8695 32 100000
10664 64 1000000
5926 128 10000000
7047 128,128 1000000010000000
1608 128,128,128 100000001000000010000000
12399 128,128,128,128 10000000100000001000000010000000
3224 128,128,128,128,128 1000000010000000100000001000000010000000
3225 128,128,128,128,129 1000000010000000100000001000000010000001
3231 128,128,128,128,127 100000001000000010000000100000001111111
12398 128,128,128,129 10000000100000001000000010000001
12401 128,128,128,123 1000000010000000100000001111011
12404 128,128,128,125 1000000010000000100000001111101
12403 128,128,128,127 1000000010000000100000001111111
1609 128,128,129 100000001000000010000001
8780 128,128,129,128 10000000100000001000000110000000
7620 128,128,130 100000001000000010000010
2960 128,128,130,125 1000000010000000100000101111101
2961 128,128,130,127 1000000010000000100000101111111
8063 128,128,118 10000000100000001110110
1605 128,128,121 10000000100000001111001
6678 128,128,121,123 100000001000000011110011111011
1606 128,128,122 10000000100000001111010
1601 128,128,124 10000000100000001111100
1602 128,128,125 10000000100000001111101
4680 127,127,129 1111111111111110000001
2655 127,127,130 1111111111111110000010
1254 127,127,130,128 111111111111111000001010000000
7277 127,127,130,129 111111111111111000001010000001
9087 127,127,120 111111111111111111000
9088 127,127,122 111111111111111111010
595 127,127,122,121 1111111111111111110101111001
9089 127,127,123 111111111111111111011
9090 127,127,124 111111111111111111100
507 127,127,124,120 1111111111111111111001111000
508 127,127,124,125 1111111111111111111001111101
9091 127,127,125 111111111111111111101
9293 127,127,125,129 11111111111111111110110000001
2838 127,127,125,132 11111111111111111110110000100
1873 127,127,125,116 1111111111111111111011110100
9285 127,127,125,120 1111111111111111111011111000
9284 127,127,125,125 1111111111111111111011111101
10394 127,127,125,125,131 111111111111111111101111110110000011
4357 127,127,125,125,123 11111111111111111110111111011111011
4358 127,127,125,125,125 11111111111111111110111111011111101
6615 127,127,125,125,125,124 111111111111111111101111110111111011111100
9283 127,127,125,127 1111111111111111111011111111
9092 127,127,127 111111111111111111111
1003 127,127,127,129 11111111111111111111110000001
7031 127,127,127,130 11111111111111111111110000010
1008 127,127,127,121 1111111111111111111111111001
1009 127,127,127,122 1111111111111111111111111010
1005 127,127,127,125 1111111111111111111111111101
2536 127,127,127,125,125 11111111111111111111111111011111101
5859 127,127,127,127 1111111111111111111111111111

12454 rows × 2 columns
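The dictionary growth behind these tables can be sketched with the classic LZ78 loop: scan the input, extend the current phrase while it is still in the dictionary, and otherwise emit (index of longest known prefix, next symbol) and add the new phrase (a minimal sketch over an abstract symbol stream, not the exact encoder used above):

```python
def lz78_encode(data):
    """LZ78: emit (dictionary index, next symbol) pairs, growing the dictionary."""
    dictionary = {'': 0}
    w = ''
    out = []
    for c in data:
        if w + c in dictionary:
            w += c                               # extend the current phrase
        else:
            out.append((dictionary[w], c))       # longest known prefix + new symbol
            dictionary[w + c] = len(dictionary)
            w = ''
    if w:                                        # flush a trailing known phrase
        out.append((dictionary[w], ''))
    return out
```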

The next table shows the dictionary obtained for the high-contrast beans image using LZ78 algorithm:

 index symbol (pixels) code
7003 0 0
11341 0,0 00
5300 0,0,16 0010000
5174 0,0,16,16 001000010000
10221 0,0,16,16,16 00100001000010000
10350 0,16 010000
9145 0,16,0 0100000
1748 0,16,0,0 01000000
12021 0,16,16 01000010000
7706 0,16,16,16 0100001000010000
6343 0,16,16,16,16 010000100001000010000
12355 0,16,16,16,16,18 01000010000100001000010010
6346 0,16,16,16,18 010000100001000010010
12019 0,16,18 01000010010
8496 0,16,18,18 0100001001010010
3459 0,67 01000011
8761 0,67,119 010000111110111
10345 0,18 010010
12295 0,18,80 0100101010000
5934 0,255 011111111
1356 0,255,214 01111111111010110
10272 1 1
1108 2 10
7611 4 100
8187 8 1000
7323 16 10000
9988 16,0 100000
8625 32 100000
10572 64 1000000
2806 16,0,0 1000000
7284 255,224 1111111111100000
10857 255,113 111111111110001
7286 255,228 1111111111100100
9053 255,228,228 111111111110010011100100
7734 255,235 1111111111101011
10853 255,119 111111111110111
1315 255,238 1111111111101110
7105 255,238,238 111111111110111011101110
10212 255,238,238,238 11111111111011101110111011101110
11699 255,30 1111111111110
6120 255,60 11111111111100
6846 255,241 1111111111110001
10205 255,241,241 111111111111000111110001
11894 255,241,241,251 11111111111100011111000111111011
9428 255,60,60 11111111111100111100
7768 255,60,60,60 11111111111100111100111100
111 255,60,60,60,67 111111111111001111001111001000011
5613 255,60,60,60,30 1111111111110011110011110011110
8713 255,60,60,60,30,30 111111111111001111001111001111011110
11729 255,60,60,60,30,30,30 11111111111100111100111100111101111011110
6848 255,244 1111111111110100
6850 255,246 1111111111110110
6845 255,248 1111111111111000
9401 255,248,248 111111111111100011111000
880 255,250 1111111111111010
881 255,251 1111111111111011
3725 255,251,251 111111111111101111111011
11316 255,251,251,235 11111111111110111111101111101011
879 255,253 1111111111111101
7215 255,253,253 111111111111110111111101

12394 rows × 2 columns

## 2. Spatial Domain Watermarking: Embedding images into images

The following figure describes the basics of spatial domain digital watermarking:
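Embedding a binary watermark into a chosen bit plane of the host amounts to clearing that bit and OR-ing in the watermark bit (a minimal numpy sketch for 8-bit images; the helper name is an assumption):

```python
import numpy as np

def embed_bit_plane(host, watermark_bits, bit=7):
    """Embed a binary watermark into bit plane `bit` of the host's upper-left
    corner (bit 7 = most significant, most visible; bit 0 = least significant)."""
    out = host.copy()
    h, w = watermark_bits.shape
    region = out[:h, :w]
    region &= np.uint8(~(1 << bit) & 0xFF)                 # clear the target bit
    region |= (watermark_bits.astype(np.uint8) << bit)     # write the watermark bit
    return out
```

The watermark can be recovered by reading back the same bit plane: `(out >> bit) & 1`.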

The next figures show the watermark image and the grayscale host image to be used for embedding the watermark image inside the host image.

The next animations show the embedding of the most significant bit for each pixel of the watermark image inside the gray-scale host image at the upper-left corner at the bits 0,1,…,7 respectively, for each pixel in the host image. As expected, when embedded into a higher significant bit of the host image, the watermark becomes more prominent than when embedded into a lower significant bit.

The next figure shows how the distribution of  the pixel intensities change when embedding into different bits.

# Some Image and Video Processing: Motion Estimation with Block-Matching in Videos, Noisy and Motion-blurred Image Restoration with Inverse Filter in Python and OpenCV

The following problems appeared in the exercises in the coursera course Image Processing (by Northwestern University). The following descriptions of the problems are taken directly from the exercises’ descriptions.

## 1. Analysis of Image Quality after applying an nxn Low Pass Filter (LPF) for different n

The next figure shows the problem statement. Although it was originally implemented in MATLAB, in this article a python implementation is going to be described.

The following figure shows how the images get more and more blurred after the application of the nxn LPF as n increases.

The following figure shows how the quality of the transformed image decreases when compared to the original image, when an nxn LPF is applied and how the quality (measured in terms of PSNR) degrades as n (LPF kernel width) increases.
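The PSNR quality measure used here can be computed as follows (a minimal numpy sketch for 8-bit images):

```python
import numpy as np

def psnr(reference, degraded, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = np.mean((reference.astype(float) - degraded.astype(float)) ** 2)
    return float('inf') if mse == 0 else 10 * np.log10(peak ** 2 / mse)
```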

## 2. Changing Resolution of an Image with Down/Up-Sampling

The following figure describes the problem:

The following steps are needed to be followed:

1. Smooth the original image with a 3×3 LPF (box1) kernel.
2. Downsample (choose pixels corresponding to every odd rows and columns)
3. Upsample the image (double the width and height)
4. Use the kernel for convolution with the upsampled image to obtain the final image.

Although the original implementation used MATLAB, the following results come from a python implementation.

As the kernel size n increases, the quality of the final image obtained by down/up-sampling the original image decreases, as shown in the following figure.

## 3. Motion Estimation in Videos using Block matching between consecutive video frames

The following figure describes the problem:

For example, we are provided with an input image with the known location of an object (a face, marked with a rectangle), as shown below.

Now we are provided with another image, which is the next frame extracted from the same video but with the face unmarked, as shown below. The problem is that we have to locate the face in this next frame and mark it using a simple block matching technique (and thereby estimate the motion).

As shown below, using just the simple block matching, we can mark the face in the very next frame.
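Exhaustive block matching can be sketched as a brute-force SSD minimization over a search window around the block's previous position (a minimal numpy sketch; the search radius is illustrative):

```python
import numpy as np

def match_block(template, frame, prev_pos, radius=8):
    """Slide `template` over a search window centred on its previous position
    and return the (row, col) displacement minimizing the SSD."""
    th, tw = template.shape
    py, px = prev_pos
    best_ssd, best_pos = None, prev_pos
    for y in range(max(0, py - radius), min(frame.shape[0] - th, py + radius) + 1):
        for x in range(max(0, px - radius), min(frame.shape[1] - tw, px + radius) + 1):
            ssd = np.sum((frame[y:y + th, x:x + tw].astype(float) - template) ** 2)
            if best_ssd is None or ssd < best_ssd:
                best_ssd, best_pos = ssd, (y, x)
    return best_pos
```

The displacement between `prev_pos` and the returned position is the estimated motion vector for that block.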

Now let’s play with the following two videos. The first one is the video of some students working on a university corridor, as shown below (obtained from youtube). We extract some consecutive frames, mark a face in one image and use that image to mark the faces in the remaining consecutive frames, thereby marking the entire video and estimating the motion using the simple block matching technique only.

The following figure shows the frame with the face marked; now we shall use this image and the block matching technique to estimate the motion of the student in the video, by marking his face in all the consecutive frames and reconstructing the video, as shown below.

The second video is the video of the Google CEO Mr. Pichai talking, as shown below (obtained from youtube), again extract some consecutive frames, mark his face in one image and use that image to mark all the faces in the remaining frames that are consecutive to each other, thereby mark the entire video and estimate the motion using the simple block matching technique only.

The following figure shows the frame with the face marked; now we shall use this image and the block matching technique to estimate the motion of the Google CEO in the video, by marking his face in all the consecutive frames and reconstructing the video, as shown below.

The most interesting thing above is that we have not used any sophisticated image features such as HOG / SIFT, and still the simple block matching did a pretty decent job.

## 4. Using Median Filter to remove salt-and-pepper noise from an image

The following figure describes the problem:

The following figure shows the original image, the noisy image and images obtained after applying the median filter of different sizes (nxn, for different values of n):

As can be seen from the following figure, the optimal median filter size is 5×5, which generates the highest quality output, when compared to the original image.
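The n×n median filter can be sketched directly with a sliding window (a minimal numpy sketch; reflection padding at the borders is an implementation choice, not the course's):

```python
import numpy as np

def median_filter(image, n=3):
    """n x n median filter; borders are handled with reflection padding."""
    pad = n // 2
    padded = np.pad(image, pad, mode='reflect')
    out = np.empty_like(image)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.median(padded[i:i + n, j:j + n])
    return out
```

Isolated salt or pepper pixels never make it to the window median, so they are removed without the blurring a mean filter would cause.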

## 5. Using Inverse Filter to Restore noisy images with Motion Blurs

The following figure shows the description of the problem:

The following figure shows the theory behind the inverse filters in the (frequency) spectral domain.

The following are the steps for restoring a motion-blurred and noise-corrupted image using the Inverse Filter:

1. Generate the restoration filter in the frequency domain (with the Fast Fourier Transform) from the frequency response of the motion blur, using the threshold T.
2. Get the spectrum of the blurred and noise-corrupted image (the input to restoration).
3. Compute the spectrum of the restored image by multiplying the restoration filter with the spectrum of the blurred noisy image (multiplication in the frequency domain corresponds to convolution in the spatial domain).
4. Generate the restored image from its spectrum (with the inverse Fourier Transform).
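The steps above can be sketched as follows (a minimal numpy sketch of a thresholded pseudo-inverse filter; zeroing the filter where |H| < T is one common choice for the threshold rule):

```python
import numpy as np

def inverse_filter_restore(degraded, H, T=0.1):
    """Thresholded (pseudo-)inverse filter: invert the blur spectrum H only
    where |H| >= T, so noise at near-zero frequencies of H is not amplified."""
    G = np.fft.fft2(degraded)                 # spectrum of the degraded image
    R = np.zeros_like(H, dtype=complex)       # restoration filter
    mask = np.abs(H) >= T
    R[mask] = 1.0 / H[mask]
    return np.real(np.fft.ifft2(G * R))       # multiply spectra, invert the FFT
```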

The following figures show the inverse filter applied with different values of the threshold T (from 0.1 to 0.9, in that order) to restore the noisy / blurred image.

The following figure shows how the Restored PSNR and the ISNR vary with different values of the threshold T.

# Some more Computational Photography: Creating Video Textures in Python and OpenCV

The following problem appeared as an assignment in the Coursera Course Computational Photography (by Georgia Tech, 2013). The following description of the problem is taken directly from the assignment’s description.

## Introduction

In this article, we shall be applying our computational photography magics to video, with the purpose of creating video textures, or infinitely looping pieces of video. The input and output for the homework assignment is provided as a folder of images. The reasoning behind this, and suggestions for moving between different formats of video are given in the video Appendix in section 4.

## Video Volume and SSD

Let’s get more familiar with the 4d coordinate system of a video volume (time x row x column x channel). To compute SSD, we need to find an image distance between every pair of frames in the video. SSD stands for sum of squared differences, which, as the name suggests, requires taking the pixelwise difference of the two frames, squaring it, and summing all the squared differences together.
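A sketch of the pairwise SSD over a video volume (the function name and the tiny test volume are my own, not the assignment's starter code):

```python
import numpy as np

def compute_ssd(frames):
    """SSD between every pair of frames in a (time, row, col, channel) volume."""
    t = frames.shape[0]
    flat = frames.reshape(t, -1).astype(np.float64)
    ssd = np.empty((t, t))
    for i in range(t):
        # pixelwise difference, squared, summed over all pixels and channels
        ssd[i] = ((flat - flat[i]) ** 2).sum(axis=1)
    return ssd

frames = np.zeros((3, 2, 2, 3))      # 3 frames of a tiny 2x2 RGB video
frames[1] = 1.0                      # frame 1 differs by 1 at all 12 values
ssd = compute_ssd(frames)
```

The result is a symmetric t×t matrix with zeros on the diagonal, since every frame is at distance zero from itself.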

## Diff2

This function takes in a difference matrix created by ssd, and updates it with dynamic information. The intuition behind this is as follows: when considering the transition cost from frame i to frame j, we should not only look at the frames themselves, but also consider the preceding and following frames. So, we have:

Or, in other words, we are going to take a weighting function, and sum across the ssd outputs of the preceding and following frames in order to update the transition costs with dynamic information. We are going to use a binomial filter for this purpose. The following figure shows the basics of a binomial filter:
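This weighting can be sketched as follows; the window half-width k = 2 (a 5-tap binomial filter) and the treatment of the borders are my own choices:

```python
import numpy as np
from math import comb

def binomial_weights(n):
    """Row (n-1) of Pascal's triangle, normalized to sum to 1."""
    w = np.array([comb(n - 1, k) for k in range(n)], dtype=np.float64)
    return w / w.sum()

def diff2(diff1, k=2):
    """Transition cost i -> j re-weighted by the costs of the k preceding
    and following frame pairs; borders without a full window become inf."""
    w = binomial_weights(2 * k + 1)          # [1, 4, 6, 4, 1] / 16 for k = 2
    t = diff1.shape[0]
    out = np.full_like(diff1, np.inf, dtype=np.float64)
    for i in range(k, t - k):
        for j in range(k, t - k):
            out[i, j] = sum(w[m + k] * diff1[i + m, j + m] for m in range(-k, k + 1))
    return out
```

Since the weights sum to one, a constant cost matrix is left unchanged inside the valid region.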

## Find the biggest loop and synthesize loop

### find the biggest loop

Now that we have the costs of transitioning from one frame to another, we should find a suitable loop for our video texture. Simply taking the smallest transition distance here might not be desirable: what if the resulting loop is only one frame long? In order to state within the code that loop size matters, we will use a trick that is frequent in engineering disciplines. We are going to find the transition which is optimal under the metric: score(s, f) = α * (f − s) − diff2[f, s].

Note the two terms of the metric. The first is the difference between the final and starting frame of our loop. This term is large when the loop is large. The second term is the output of our diff2 function, which tells us the cost of transitioning from finish to start. Subtracting this term turns it into a ‘smoothness’ parameter. It is larger when the transition is less noticeable.

The last bit of wizardry is the alpha parameter. Because the size of the loop and the transition cost are likely to be in very different units, we introduce a new parameter to make them comparable. We can manipulate alpha to control the tradeoff between loop size and smoothness. Large alphas prefer large loop sizes, and small alphas prefer smoother transitions. The find biggest loop function has to compute this score for every choice of s and f, and return the s and f that correspond to the largest score.
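A brute-force sketch of this search (the function name is mine; ties are broken in favour of the first pair found):

```python
import numpy as np

def find_biggest_loop(diff2_mat, alpha):
    """Pick (s, f) maximizing score(s, f) = alpha * (f - s) - diff2[f, s]."""
    t = diff2_mat.shape[0]
    best_score, best = -np.inf, (0, 0)
    for s in range(t):
        for f in range(s + 1, t):
            score = alpha * (f - s) - diff2_mat[f, s]
            if score > best_score:
                best_score, best = score, (s, f)
    return best
```

For example, with all transition costs zero except a very expensive jump from frame 3 back to frame 0, the longest loop is no longer the best one.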

### synthesize loop

The finishing step is to take our video volume and turn it back into a series of images, now cropped to contain only the loop we found. Let’s implement a function that does just that. It is pretty much the inverse of the video volume function implemented earlier, except that this time we start with a full video volume and return a list of only the image frames between start and finish (inclusive).
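The cropping step itself is a one-liner (the function name is mine):

```python
def synthesize_loop(video_volume, start, finish):
    """Crop the volume to the frames between start and finish, inclusive."""
    return [video_volume[i] for i in range(start, finish + 1)]
```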

The following figure explains how we are going to find the video texture from an input candle video with 90 frames.

## Input Video

Let’s use the following candle video as the input video and extract 100 frames from it. As can be seen, there are some jumps from the end to the beginning of the video, and our aim is to remove the jump and create a smooth video texture that can be played infinitely many times without any jump.

The following figure shows the 100×100 distance matrix (diff1) computed in between any two video frames:

The following figure shows the distance matrix (diff2) computed after taking into account the transition costs (weights) and distances of the adjacent frames in a window when computing the distance between any two frames.
Finally, the following figure shows the distance matrix (diff3) computed after taking into account the tradeoff in between the length of the video texture and the smoothness of transition between the start and the end frame (with the controlling parameter alpha=90).

## Output Video Texture

The following video shows the output video texture produced (by choosing the start and end video frames i and j respectively that give the best score according to the above matrix). As can be seen, the video now contains no jump and is pretty smooth.

The markdown file can be found here.

# Some more Computational Photography: Merging and Blending Images using Gaussian and Laplacian Pyramids in Python and OpenCV

The following problem appeared as an assignment in the coursera course Computational Photography (by Georgia Tech, 2013). The following description of the problem is taken directly from the assignment’s description.

## Introduction

In this article, we shall be putting together a pyramid blending pipeline that will allow us to blend any two images with a mask. The blending procedure takes the two images and the mask, and splits them into their red, green and blue channels. It then blends each channel separately.

Then a laplacian pyramid will be constructed for the two images. It will then scale the mask image to the range [0,1] and construct a gaussian pyramid for it. Finally, it will blend the two pyramids and collapse them down to the output image.

Pixel values of 255 in the mask image are scaled to the value 1 in the mask pyramid, and assign the strongest weight to the image labeled ‘white’ during blending. Pixel values of 0 in the mask image are scaled to the value 0 in the mask pyramid, and assign the strongest weight to the image labeled ‘black’ during blending.

The following figures describe the pyramid blending process with a couple of sample images.

## Reduce

This function takes an image and subsamples it down to a quarter of the size (dividing the height and width by two). Before we subsample the image, however, we need to first smooth out the image.

The following 5×5 generating kernel will be used for convolution, for the reduce and later for the expand function.
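A sketch of the reduce step, using the standard 1D generating filter (with a = 0.4) whose outer product gives the 5×5 kernel; applying it separably to rows and then columns is equivalent and cheaper. The helper names are my own:

```python
import numpy as np

def generating_filter(a=0.4):
    """The 5-tap filter whose outer product is the 5x5 generating kernel."""
    return np.array([0.25 - a / 2.0, 0.25, a, 0.25, 0.25 - a / 2.0])

def smooth(img, w=generating_filter()):
    """Separable smoothing: filter the rows, then the columns."""
    tmp = np.apply_along_axis(lambda r: np.convolve(r, w, mode='same'), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, w, mode='same'), 0, tmp)

def reduce_layer(img):
    """Smooth first, then drop every other row and column."""
    return smooth(img.astype(np.float64))[::2, ::2]
```

Because the filter taps sum to one, the interior of a constant image passes through the reduce step unchanged.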

## Expand

This function takes an image and supersamples it to four times the size (multiplying the height and width by two). After increasing the size, we have to interpolate the missing values by running over it with a smoothing filter.
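The expand step can be sketched the same way; the factor of 4 compensates for the three zeros inserted around every original pixel (the 5-tap filter below is the a = 0.4 generating filter again, and the names are mine):

```python
import numpy as np

W = np.array([0.05, 0.25, 0.4, 0.25, 0.05])      # 5-tap generating filter, a = 0.4

def smooth(img):
    """Separable smoothing: filter the rows, then the columns."""
    tmp = np.apply_along_axis(lambda r: np.convolve(r, W, mode='same'), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, W, mode='same'), 0, tmp)

def expand_layer(img):
    """Insert zeros between pixels, smooth to interpolate the missing values,
    and scale by 4 so that the overall brightness is preserved."""
    up = np.zeros((img.shape[0] * 2, img.shape[1] * 2))
    up[::2, ::2] = img
    return 4.0 * smooth(up)
```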

## Gaussian Pyramid

This function takes an image and builds a pyramid out of it. The first layer of this pyramid is the original image, and each subsequent layer of the pyramid is the reduced form of the previous layer. Within the code, these pyramids are represented as lists of arrays, so pyramid = [layer0, layer1, layer2, …]. The reduce function implemented in the previous part needs to be used in order to implement this function.
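The pyramid construction is a simple loop over reduce. To keep this sketch self-contained, a plain 2×2 block average stands in for the smoothed-subsample reduce described above:

```python
import numpy as np

def reduce_layer(img):
    """Stand-in reduce: 2x2 block average (the article smooths before subsampling)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def gauss_pyramid(image, levels):
    """pyramid[0] is the input; each next layer is the reduced previous layer."""
    pyramid = [np.asarray(image, dtype=np.float64)]
    for _ in range(levels):
        pyramid.append(reduce_layer(pyramid[-1]))
    return pyramid
```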

## Laplacian Pyramid

This function takes a gaussian pyramid constructed by the previous function, and turns it into a laplacian pyramid. Like with gaussian pyramids, laplacian pyramids are represented as lists and the expand function implemented in the previous part needs to be used in order to implement this function.
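In code, each laplacian layer is a gaussian layer minus the expanded next layer, and the smallest gaussian layer is kept as-is. A nearest-neighbour upsample stands in here for the smoothed expand:

```python
import numpy as np

def expand_layer(img):
    """Stand-in expand: nearest-neighbour 2x upsampling (the article smooths too)."""
    return np.kron(img, np.ones((2, 2)))

def lapl_pyramid(gauss_pyr):
    """Each layer is the gaussian layer minus the expanded next layer;
    the top (smallest) gaussian layer is kept unchanged."""
    pyr = [g - expand_layer(nxt)[:g.shape[0], :g.shape[1]]
           for g, nxt in zip(gauss_pyr[:-1], gauss_pyr[1:])]
    pyr.append(gauss_pyr[-1])
    return pyr
```

For a constant image, every difference layer is zero and only the top layer carries the image content.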

## Blend

In this part, the pipeline will be set up by implementing the actual blend function, and then implementing a collapse function that will allow us to disassemble our laplacian pyramid into an output image.

Blend function takes three pyramids:

• white – a laplacian pyramid of an image
• black – a laplacian pyramid of another image
• mask – a gaussian pyramid of the mask image
It should perform an alphablend of the two laplacian pyramids according to the mask pyramid. So, we need to blend each pair of layers together using the mask of that layer as the weight. Pixels where the mask is equal to 1 should be taken from the white image, pixels where the mask is 0 should be taken from the black image. Pixels with value 0.5 in the mask should be an equal blend of the white and black images, etc.
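The per-layer alpha blend is a one-liner over the three pyramids (list representation as above; the name `blend` matches the text):

```python
import numpy as np

def blend(white_pyr, black_pyr, mask_pyr):
    """Per-layer alpha blend: where the mask is 1 take the white image,
    where it is 0 take the black image; in between, mix linearly."""
    return [m * w + (1.0 - m) * b
            for w, b, m in zip(white_pyr, black_pyr, mask_pyr)]
```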

## Collapse

This function is given a laplacian pyramid, and is expected to ‘flatten’ it to an image. We need to take the top layer, expand it, and then add it to the next layer. This results in a pyramid of one level less than we started with. We continue this process until we are left with only a single layer.
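The collapse loop can be sketched as follows; as before, a nearest-neighbour upsample stands in for the smoothed expand, and the crop handles odd-sized layers:

```python
import numpy as np

def expand_layer(img):
    """Stand-in expand: nearest-neighbour 2x upsampling."""
    return np.kron(img, np.ones((2, 2)))

def collapse(lapl_pyr):
    """Expand the top layer and add it to the next layer below,
    repeating until a single full-size image remains."""
    out = lapl_pyr[-1]
    for layer in reversed(lapl_pyr[:-1]):
        out = layer + expand_layer(out)[:layer.shape[0], :layer.shape[1]]
    return out
```

Collapsing the laplacian pyramid of a constant image recovers the constant image, as expected.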

## Results (some images are taken from the same Computational Photography Course)

The following figures show a few pairs of input images, the masks, the blended output images, along with the gaussian and laplacian pyramids.


### Blended Output Image

The next example shows how the pyramid pipeline can be used to edit an image background.

### Blended Output Image

As can be seen from the output image above, the pyramid blending pipeline did a good job in blending the two images.

### Blended Output Image

The markdown file can be found here.

# Some Image Processing and Computational Photography: Convolution, Filtering and Edge Detection with Python and OpenCV

The following problems appeared as an assignment in the coursera course Computational Photography (by Georgia Institute of Technology). The following descriptions of the problems are taken directly from the assignment’s descriptions.

## Introduction

In this article, we shall be playing around with images, filters, and convolution. We will begin by building a function that performs convolution. We will then experiment with constructing and applying a variety of filters. Finally, we will see one example of how these ideas can be used to create interesting effects in our photos, by finding and coloring edges in our images.

## Filtering

In this section, let’s apply a few filters to some images. Filtering basically means replacing each pixel of an image by the linear combination of its neighbors. We need to understand the following concepts in this context:

1. Kernel (mask) for a filter: defines which neighbors to be considered and what weights are to be given.
2. Cross-Correlation vs. Convolution: determines how the kernel is going to be applied to the neighboring pixels to compute the linear combination.

### Convolution

The following figure describes the basic concepts of cross-correlation and convolution. Basically convolution flips the kernel before applying it to an image.
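The two operations can be sketched directly in NumPy (zero padding and the tiny impulse / kernel example are my own choices). Note how an impulse image reproduces the kernel under convolution, but the flipped kernel under cross-correlation:

```python
import numpy as np

def cross_correlate(img, kernel):
    """Slide the kernel as-is over the image (borders zero-padded)."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.empty(img.shape, dtype=np.float64)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = (padded[i:i + kh, j:j + kw] * kernel).sum()
    return out

def convolve(img, kernel):
    """Convolution = cross-correlation with the kernel flipped in both axes."""
    return cross_correlate(img, kernel[::-1, ::-1])

impulse = np.zeros((5, 5)); impulse[2, 2] = 1.0   # impulse response image
kernel = np.arange(9, dtype=np.float64).reshape(3, 3)
```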

## Cross-Correlation vs. Convolution

The following figures show how a custom kernel applied to an impulse response image changes the image, both with cross-correlation and with convolution.

The impulse response image is shown below:

The below figure shows a 3×3 custom kernel to be applied on the above impulse response image.

The following figure shows the output images after applying cross-correlation and convolution with the above kernel on the above impulse response image. As can be seen, convolution produces the desired output.

The following figures show the application of the same kernel on some grayscale images and the output images after convolution.

Now let’s apply the following 5×5 flat box2 kernel shown below.

The following figures show the application of the above box kernel on some grayscale images and the output images after convolution; notice how the output images are blurred.

The following figures show the application of the box kernels of different size on the grayscale image lena and the output images after convolution. As expected, the blur effect increases as the size of the box kernel increases.

## Gaussian Filter

The following figure shows an 11×11 Gaussian Kernel generated by taking the outer product of the densities of two 1D i.i.d. Gaussians with mean 0 and s.d. 3.
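Such a kernel can be generated in a few lines (the normalization of the 1D density before the outer product keeps the 2D kernel summing to one):

```python
import numpy as np

def gaussian_kernel(size=11, sigma=3.0):
    """size x size kernel as the outer product of two 1D Gaussian densities."""
    x = np.arange(size) - size // 2
    g = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    g /= g.sum()                  # normalize the 1D density
    return np.outer(g, g)
```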

Here is how the impulse response image (enlarged) looks after the application of the above Gaussian Filter.

The next figure shows the effect of Gaussian filtering / smoothing (blur) on some images.

The following figure shows the 11×11 Gaussian Kernels generated with 1D i.i.d. Gaussians with different bandwidths (s.d. values).

The following figures show the application of the Gaussian kernels of different bandwidth on the following grayscale image and the output images after filtering. As expected, the blur effect increases as the bandwidth of the Gaussian kernel increases.

## Sharpen Filter

The following figure shows an 11×11 Sharpen Kernel generated by subtracting a Gaussian kernel (with s.d. 3) from a scaled impulse response kernel with 2 at the center.
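The construction "2 × impulse minus Gaussian" can be sketched as below; since the Gaussian sums to 1, the sharpen kernel sums to 1 as well, so flat regions keep their brightness while the center pixel is boosted:

```python
import numpy as np

def sharpen_kernel(size=11, sigma=3.0):
    """Scaled impulse (2 at the center) minus a normalized Gaussian kernel."""
    x = np.arange(size) - size // 2
    g = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    g /= g.sum()
    gauss = np.outer(g, g)
    impulse = np.zeros((size, size))
    impulse[size // 2, size // 2] = 2.0
    return impulse - gauss
```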

The next figure shows the effect of Sharpen filtering on some images.

The following figure shows the 11×11 Sharpen Kernels with different bandwidths (s.d. values).

The following figures show the application of the Sharpen kernels of different bandwidths on the following grayscale image and the output images after filtering. As expected, the sharpening effect increases as the bandwidth of the Gaussian kernel increases.

## Median filter

The last thing we shall get a feel for is nonlinear filtering. So far, we have been doing everything by multiplying the input image pixels by various coefficients and summing the results together. A median filter works in a very different way, by simply choosing a single value from the surrounding patch in the image.

The next figure shows the effect of Median filtering on some images. As expected, with an 11×11 mask, some of the images get quite blurred.

## Drawing with edges

At a high level, we will be finding the intensity and orientation of the edges present in the image using convolutional filters, and then visualizing this information in an aesthetically pleasing way.

The first thing to do is to find the gradient of the image. A gradient is a fancy way of saying “rate of change”. The following figure shows the basic concepts about image gradients.

### Edge Detection

In order to find the edges in our image, we are going to look for places where pixels are rapidly changing in intensity. The following figure shows the concept:

### The Sobel Filter

There are a variety of ways for doing this, and one of the most standard is through the use of Sobel filters, which have the following kernels:

If we think about the x sobel filter being placed on a strong vertical edge:

Considering the yellow location, the values on the right side of the kernel get mapped to brighter, and thus larger, values. The values on the left side of the kernel get mapped to darker pixels which are close to zero. The response in this position will be large and positive.

Compare this with the application of the kernel to a relatively flat area, like the blue location. The values on both sides are about equal, so we end up with a response that is close to zero. Thus, the x-direction sobel filter gives a strong response for vertical edges. Similarly, the y-direction sobel filter gives a strong response for horizontal edges.

The steps for edge detection:

1. Convert the image to grayscale.
2. Blur the image with a gaussian kernel. The purpose of this is to remove noise from the image, so that we find responses only to significant edges, instead of small local changes that might be caused by our sensor or other factors.
3. Apply the two sobel filters to the image.

### Edge orientations and magnitudes

Now we have the rate at which the image is changing in the x and y directions, but it makes more sense to talk about images in terms of edges, and their orientations and intensities. As we will see, we can use some trigonometry to transform between these two representations.
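The whole chain, from the Sobel responses to edge magnitude and orientation, can be sketched as below (`np.hypot` and `np.arctan2` do the trigonometry; the tiny vertical-edge test image and helper names are my own, and `filter2d` is a plain cross-correlation stand-in for `cv2.filter2D`):

```python
import numpy as np

# Sobel kernels for horizontal (x) and vertical (y) intensity changes
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def filter2d(img, kernel):
    """Minimal cross-correlation with zero padding (stand-in for cv2.filter2D)."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.empty(img.shape)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = (padded[i:i + kh, j:j + kw] * kernel).sum()
    return out

# a strong vertical edge: dark left half, bright right half
img = np.zeros((9, 9)); img[:, 5:] = 255.0
gx, gy = filter2d(img, SOBEL_X), filter2d(img, SOBEL_Y)
magnitude = np.hypot(gx, gy)                     # edge intensity
orientation = np.degrees(np.arctan2(gy, gx))     # edge angle
```

On this vertical edge the y-response vanishes while the x-response is large and positive, so the magnitude is entirely in the x direction and the orientation is 0 degrees.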

First, let’s see how our sobel filters respond to edges at different orientations from the following figures:

The red arrow on the right shows the relative intensities of the response from the x and y sobel filter. We see that as we rotate the edge, the intensity of the response slowly shifts from the x to the y direction. We can also consider what would happen if the edge got less intense:

Let’s apply the sobel filters on the following butterfly image (taken from the slides of the same computational photography course).

The following figures show the result of applying the sobel filters on the above butterfly image. Notice that the angle of the edges show the direction (in red) for a few edge vectors for which the magnitude of the edge vectors were high (F is the image).

Again, let’s apply the sobel filters on the following tiger image (taken from the slides of the same computational photography course).

The following figures show the result of applying the sobel filters on the above tiger image. Notice that the angle of the edges show the direction (in red) for a few edge vectors for which the magnitude of the edge vectors were high.

Next, let’s apply the sobel filters on the following zebra image (taken from the slides of the same computational photography course).

The following figures show the result of applying the sobel filters on the above zebra image. Notice that the angle of the edges show the direction (in red) for a few edge vectors for which the magnitude of the edge vectors were high.

Finally, let’s apply the sobel filters on the following image of mine.

The following figures show the result of applying the sobel filters on my image. Notice that the angle of the edges show the direction (in red) for a few edge vectors for which the magnitude of the edge vectors were high.

The following figure shows the horizontal (H_x) and vertical (H_y) kernels for a few more filters such as Prewitt and Roberts along with Sobel.

### The Prewitt Filter

The following figures show the result of applying the prewitt filters on the butterfly image. Notice that the angle of the edges show the direction (in red) for a few edge vectors for which the magnitude of the edge vectors were high (F is the image).

### The Roberts Filter

The following figures show the result of applying the roberts filters on the butterfly image. Notice that the angle of the edges show the direction (in red) for a few edge vectors for which the magnitude of the edge vectors were high (F is the image).

### The LOG Filter

The following figure shows the Gaussian Kernels with different bandwidths, a Sharpen Kernel, the LOG (Laplacian of Gaussian) Kernel, a very useful kernel for edge detection in images, and the DOG (Difference of Gaussian) Kernel.

The following figures show the results of applying the LOG filter on the above images.

### The DOG Filter

The Difference of Gaussian (DOG) kernel can also be used to find edges. The following figure shows the results of applying DOG (with s.d. 2 and 5) on the above images:
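A DOG kernel with these two bandwidths can be generated as the difference of two normalized Gaussian kernels; since each summand sums to one, the difference sums to zero, so flat image regions give no response while edges do:

```python
import numpy as np

def dog_kernel(size=11, s1=2.0, s2=5.0):
    """Difference of two normalized Gaussian kernels with s.d. s1 and s2."""
    x = np.arange(size) - size // 2
    def gauss2d(s):
        g = np.exp(-x ** 2 / (2.0 * s ** 2))
        g /= g.sum()
        return np.outer(g, g)
    return gauss2d(s1) - gauss2d(s2)
```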

### The Canny Edge Detector

The following figure shows the Canny Edge Detection algorithm:

The following figures show the results of applying the Canny Edge Detection algorithm on the above images (with the intensity thresholds as 50 and 100 respectively).

### Mapping to color

We now have the magnitude and orientation of the edge present at every pixel. However, this is still not really in a form that we can examine visually. The angle varies from 0 to 180, and we don’t know the scale of the magnitude. What we have to do is to figure out a way to transform this information to some sort of color values to place in each pixel.

The edge orientations can be pooled into four bins: edges that are roughly horizontal, edges that are roughly vertical, and edges at 45 degrees to the left and to the right. Each of these bins is assigned a color (vertical is yellow, etc.). Then the magnitude is used to dictate the intensity of the edge. For example, a roughly vertical edge of moderate intensity would be set to a medium yellow color, i.e. the value (0, 100, 100).
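One way to sketch this per-pixel mapping is below. Only "vertical is yellow" (BGR (0, 255, 255)) comes from the text; the other three bin colors, the function name, and the 45-degree-wide binning are illustrative choices:

```python
import numpy as np

def orientation_to_color(angle_deg, magnitude, max_mag):
    """Pool an edge angle (0..180 degrees) into 4 bins and scale the bin's
    BGR color by the edge magnitude."""
    colors = {
        'horizontal': np.array([255, 0, 255]),   # illustrative choice
        'diag_right': np.array([0, 255, 0]),     # illustrative choice
        'vertical':   np.array([0, 255, 255]),   # yellow in BGR, as in the text
        'diag_left':  np.array([255, 255, 0]),   # illustrative choice
    }
    # 45-degree-wide bins centered on 0, 45, 90, 135; angles near 180 wrap back
    bins = ['horizontal', 'diag_right', 'vertical', 'diag_left', 'horizontal']
    idx = int((angle_deg % 180) + 22.5) // 45
    scale = magnitude / max_mag                  # edge intensity in [0, 1]
    return (colors[bins[idx]] * scale).astype(np.uint8)
```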

The following figures show some of the edges extracted from some of the images previously used as inputs and color-mapped using sobel filter:

The markdown file can be found here: https://github.com/sandipan/Blogs/blob/master/comp_photo.md