It’s unclear who really said that “there are three kinds of lies: lies, damned lies, and statistics”. But the reality is that statistics are often misunderstood and misused.
In Statistics Done Wrong: The Woefully Complete Guide, author Alex Reinhart, a statistics instructor and PhD student at Carnegie Mellon University, makes the case that most people, even those in the sciences, are misusing statistics and, quite frankly, don’t know how to do statistics. It’s a bold claim, but this is a bold book, and a fascinating one at that.
The premise of the book is that scientific progress depends on good research, and good research needs good statistics. But statistical analysis is tricky to get right, even for very smart people. His conclusion is that many people, including scientists who should know better (among them those who publish in world-class peer-reviewed journals such as Nature and Science), are doing statistics wrong. In fact, very wrong.
And it’s not just that the people doing the statistics lack the knowledge. The book notes that there is huge pressure on medical professionals and scientists to get good results.
The book’s 12 chapters are an enjoyable and relatively easy read. The author tackles subjects such as Pseudoreplication: Choose Your Data Wisely, The p Value and the Base Rate Fallacy, Model Abuse and more. The book assumes no prior knowledge of statistics, so it’s a good book for everyone, regardless of their p value.
A point Reinhart makes a number of times is that even when people get their statistical numbers and figures seemingly right, they often misinterpret and misuse the output. An example he gives is a traffic study on the safety of allowing right turns on red lights. The Highway Commissioner who requested the study wrote about its findings that “we can discern no significant hazard to motorists or pedestrians from implementation of right turns on red”. To the untrained eye, the conclusion makes sense. But Reinhart writes that the Commissioner turned statistical insignificance into practical insignificance. This is a significant mistake, one that happens often and whose consequences can be significant.
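To make the gap between statistical and practical insignificance concrete, here is a minimal Python sketch. The crash counts are hypothetical numbers chosen for illustration and are not taken from the book or from the traffic study, and the helper name diff_confidence_interval is likewise made up. It uses a simple normal-approximation interval for the difference between two proportions.

```python
import math

def diff_confidence_interval(events_a, n_a, events_b, n_b, z=1.96):
    """Approximate 95% interval for (rate_b - rate_a), normal approximation."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    diff = p_b - p_a
    # Standard error of the difference between two independent proportions
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff - z * se, diff + z * se

# Hypothetical data: 20 crashes per 1000 intersections-years before the
# change, 30 per 1000 after. These counts are invented for illustration.
low, high = diff_confidence_interval(20, 1000, 30, 1000)

# The interval contains zero, so the increase is "not statistically significant"...
not_significant = low < 0 < high
print(f"interval for the change in crash rate: ({low:.4f}, {high:.4f})")
print(f"statistically significant: {not not_significant}")
# ...yet its upper end is larger than the original 2% crash rate itself.
print(f"upper bound exceeds baseline rate: {high > 20 / 1000}")
```

Under these assumed counts the interval spans everything from a slight decrease to a more than doubled crash rate. The data cannot rule out a large hazard, which is a very different statement from saying there is none.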
One does not have to have much of a background in statistics to enjoy the book, as Reinhart does a good job of keeping the scary statistical math to a minimum.
Chapter 12 concludes the book with the question of what can be done. Reinhart notes that there are no easy answers and that change will not be easy. The reality is that since most research articles have poor statistical power and researchers have the freedom to choose among analysis methods to get favorable results, we are mathematically determined to get a plethora of false positives.
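The arithmetic behind that claim can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the 10% base rate of true hypotheses, the power values and the 0.05 threshold are illustrative inputs rather than figures quoted from the book, and the function name false_positive_share is invented here. It also covers only the low-power half of the argument; freedom in choosing analysis methods would push the share higher still.

```python
def false_positive_share(base_rate, power, alpha=0.05):
    """Expected fraction of 'significant' results that are false positives.

    base_rate: assumed fraction of tested hypotheses that are actually true
    power:     probability a study detects a true effect
    alpha:     significance threshold (false positive rate per null test)
    """
    true_positives = base_rate * power
    false_positives = (1 - base_rate) * alpha
    return false_positives / (true_positives + false_positives)

# Assumed scenario: 1 in 10 tested hypotheses is actually true.
well_powered = false_positive_share(base_rate=0.10, power=0.80)
under_powered = false_positive_share(base_rate=0.10, power=0.20)

print(f"power 0.80: {well_powered:.0%} of significant results are false")
print(f"power 0.20: {under_powered:.0%} of significant results are false")
```

Under these assumptions roughly a third of significant findings are false even with well-powered studies, and the share rises to about two thirds when power drops to 0.20.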
The book does not have a specific information security slant to it. The closest book in making statistics and data work within information security is the superb Data-Driven Security: Analysis, Visualization and Dashboards by Jay Jacobs and Bob Rudis reviewed here.
Statistics Done Wrong: The Woefully Complete Guide is a truly enjoyable book and will forever change the way you view statistics, research findings and those daily radio and TV notices which erroneously proclaim “recent studies indicate...”