Monday, September 30, 2013

Lies, Damned Lies and Statistics

Statistics is the science of producing unreliable facts from reliable figures. - Evan Esar
Admission: I absolutely respect our RBI governor Raghuram Rajan.
Confession: I am not a statistician, but I attended a course on Econometrics during my MBA. And, from what little I learned there, I can tell you that you can torture your data until it says what you want it to say.
The recent report on a composite state development index, prepared under the chairmanship of Mr. Raghuram Rajan, was, for lack of a better word, a half-assed job. It is complete noise. It hurts when someone you admire produces work like this.

I don’t really understand what objective it serves. I am assured it is not a political one, although there was a brief Twitter war over Gujarat being called a less developed state. You can read the full report here.
During my MBA classes, our professor always warned us about situations where a statistical exercise throws out a result that looks nonsensical. In that case, you should take a hard look at the variables used: check for double counting of data, the high correlation fallacy, and simply using the wrong variables. Well, it seems all three errors happened in this report.
Many experts in the field have already reviewed it, and some of them disagree with it, including the sole dissenting voice on the panel, Dr. Shaibal Gupta. You can read some of the reviews and criticism here, here, here and here. The key criticisms of the report are highlighted below:
·       Why include the SC/ST share of the population when it is an independent variable and all the other variables are dependent (outcome) variables? Independent here means that it is beyond the power of any state government to control the size of the SC/ST population in its state. So why assign points to a state based on this variable?
·       Why do the connectivity index, SC/ST population, female literacy and education all get the same weightage, when we know that some matter more than others?
·       In the real world, can Gujarat be ranked the same as Mizoram and fare worse than Tripura? Maybe I am blind to the development happening in our Northeastern states, but I don’t know anyone who wants to go there to work and live.

·       Why not use per capita income instead of monthly consumption, when per capita income also factors in the employment opportunities available in the state? Migrants sending money back to their home state could simply drive up consumption there, and that may not truly reflect the needs of the residents.
·       How reliable is the monthly consumption data when the national surveys used to collect it have credibility issues?
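
To see why the weightage question matters, here is a minimal sketch in Python. Everything in it is made up for illustration: the three states are imaginary, the scores are not from the report, the function names are my own, and I am assuming an index where a lower score means a more developed state. It only shows that the same data can produce a different ranking once you stop treating every variable as equally important.

```python
# Hypothetical scores for three imaginary states (0 = best, 1 = worst).
# None of these numbers come from the report.
scores = {
    "State A": {"consumption": 0.2, "female_literacy": 0.6, "connectivity": 0.7},
    "State B": {"consumption": 0.6, "female_literacy": 0.3, "connectivity": 0.4},
    "State C": {"consumption": 0.5, "female_literacy": 0.5, "connectivity": 0.6},
}


def composite_index(state_scores, weights):
    """Weighted average of a state's scores; weights need not sum to 1."""
    total_weight = sum(weights.values())
    weighted_sum = sum(state_scores[name] * w for name, w in weights.items())
    return weighted_sum / total_weight


def rank_states(all_scores, weights):
    """Return state names ordered from lowest (best) to highest index."""
    return sorted(all_scores, key=lambda s: composite_index(all_scores[s], weights))


# Equal weightage for every variable.
equal_weights = {"consumption": 1, "female_literacy": 1, "connectivity": 1}

# An alternative (equally arbitrary) scheme that counts consumption three times as much.
consumption_heavy = {"consumption": 3, "female_literacy": 1, "connectivity": 1}

print(rank_states(scores, equal_weights))      # ['State B', 'State A', 'State C']
print(rank_states(scores, consumption_heavy))  # ['State A', 'State B', 'State C']
```

With equal weights State B comes out on top; weight consumption more heavily and State A overtakes it. Neither scheme is "right", which is exactly the point: the ranking depends on a choice the index has to justify.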

PS: There is no data available, at least in the report, that lets you compare how each state fared on the different variables used to construct the index.