Monday, July 06, 2015

Vyapam Case: Statistically Probable?

The Vyapam scam seems to be all over the news these days.

One mystifying aspect of the scam is that approximately 35-40 people connected with it have reportedly died since it came to light and the investigation started. The dead include witnesses as well as accused lodged in various jails. Going by the Wikipedia entry, I could count 33 deaths, starting from November 2009.

https://en.wikipedia.org/wiki/Vyapam_scam

Since the number of people under investigation in the scam is sufficiently large (reported to be more than two thousand), this is a fit case for statistical analysis: we can check with reasonable accuracy whether 40 deaths are likely in a set of 2000 randomly selected individuals over a period of 5 years.

So first we need to know the probability that a person aged X dies within a year. This is given by the 8-year rule: the probability of an individual dying in a given year doubles roughly every eight years of age. The actual probability varies across populations, but the 8-year doubling holds approximately everywhere. Here is a good article that explains the idea.

Am I Going To Die This Year? A Mathematical Puzzle
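
To make the 8-year rule concrete, here is a minimal Python sketch. The function name and the reference probability are hypothetical and chosen only for illustration; real figures should come from a mortality table.

```python
def annual_death_probability(age, ref_age, ref_probability, doubling_years=8.0):
    """Scale a known annual death probability at ref_age to another age,
    assuming the probability doubles every `doubling_years` years of age."""
    return ref_probability * 2 ** ((age - ref_age) / doubling_years)

# Hypothetical reference value (NOT from any table): 0.1% per year at age 25.
REF_AGE = 25
REF_PROBABILITY = 0.001

for age in (25, 33, 41, 49, 57):
    print(age, annual_death_probability(age, REF_AGE, REF_PROBABILITY))
```

Each step of 8 years doubles the figure, so over the 32 years from age 25 to age 57 the annual probability grows 16-fold under this rule.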

Since the probability of dying, or "mortality rate", is very important for life insurance companies, standardized tables exist for different countries. The official table for India, as published by IRDA, is given here:

Indian Assured Lives Mortality

Assuming that the 2000 accused and witnesses are evenly distributed across ages 21 to 61, I divided the whole age span into 5 categories of 8 years each. I approximated each category by its median age and took the mortality figure for that age from the actuarial table given above. The expected number of deaths over 5 years among 2000 randomly chosen individuals comes out to 35, as shown in the table below.

Expected Mortality for 2000 individuals across 5 years

So the total number of deaths expected in any randomly selected set of 2000 individuals in India over a span of 5 years is approximately 35. This assumes that their ages are evenly spread between 21 and 61, which is reasonable since those who are too young or too old can be safely excluded from being "linked" to the scam.
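
The arithmetic behind this estimate can be sketched in a few lines of Python. The median ages follow from splitting 21 to 61 into five 8-year bands. The rates below are placeholders that merely follow the doubling pattern; they are not the values from the actuarial table, so the result is close to but not the same as the 35 above. Substitute the table's figures for each median age to reproduce the actual calculation.

```python
def expected_deaths(people_per_band, annual_rates, years):
    """Expected deaths = people x annual death probability x years, summed
    over age bands. Ignores aging and the shrinking group, which is a
    small effect at these rates."""
    return sum(people_per_band * rate * years for rate in annual_rates)

TOTAL_PEOPLE = 2000
YEARS = 5

# Median ages of the five bands 21-29, 29-37, 37-45, 45-53, 53-61.
BAND_MEDIAN_AGES = [25, 33, 41, 49, 57]

# PLACEHOLDER annual death probabilities, one per band (NOT table values).
PLACEHOLDER_RATES = [0.0005, 0.001, 0.002, 0.004, 0.008]

people_per_band = TOTAL_PEOPLE / len(BAND_MEDIAN_AGES)
total = expected_deaths(people_per_band, PLACEHOLDER_RATES, YEARS)
print(total)
```

Note how the oldest band dominates: under the doubling pattern it contributes about as many expected deaths as the other four bands combined.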

Conclusion:
The only limited conclusion we can draw is that the number of deaths is pretty close to what is expected. There may still be some foul play at work, and some of those who died may have been murdered, but those would have to be a small fraction of the 35-40 dead in the past 5 years. Hence, we cannot infer a conspiracy just from the number of dead, as our prime-time media circus seems to be implying. In fact, if far fewer (say 5) or many more (say 60) of the accused had died within the past 5 years, it would have been an obvious statistical anomaly.
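
One way to put rough numbers on "close to expected" versus "obvious anomaly" is to treat the death count as a Poisson variable with mean 35. That model is an assumption added here for illustration (it treats deaths as independent events) and is not part of the calculation above. A minimal sketch using only the standard library:

```python
import math

def poisson_pmf(k, mean):
    """P(X = k) for a Poisson variable, computed in log space."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def prob_at_most(k, mean):
    """P(X <= k)."""
    return sum(poisson_pmf(i, mean) for i in range(k + 1))

def prob_at_least(k, mean):
    """P(X >= k)."""
    return 1.0 - prob_at_most(k - 1, mean)

EXPECTED = 35  # expected deaths over 5 years, from the estimate above

print("P(40 or more deaths):", prob_at_least(40, EXPECTED))
print("P(5 or fewer deaths):", prob_at_most(5, EXPECTED))
print("P(60 or more deaths):", prob_at_least(60, EXPECTED))
```

Under this assumption, 40 or more deaths is an unremarkable outcome, while 5 or fewer, or 60 or more, would each be very unlikely by chance alone.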

That of course leaves the problem of the high proportion of claimed accidental deaths (road accidents etc.), which also looks suspicious. That is also worth looking into ...