Statistics are used by all but understood by few. In fact, studies have shown that 94% of people have little to no understanding of statistical methods. OK, that last statistic was made up; I wrote it to make a point, though. I could post something like that on this blog and people would believe it, and possibly even repeat it. The sad thing is that it probably isn't far from reality. In social science and neuroscience research, we use statistics to understand data and support hypotheses. This post will serve as a statistical primer. I will not discuss how to calculate statistics; rather, I will write about the underlying assumptions and theory of statistics, and about how to properly use and understand them (and hopefully avoid misusing them). I hope to help you become a more informed consumer of statistics.
When did we start using statistics and why?
Joel Best wrote a brief history of the use of statistics in his excellent book Damned Lies and Statistics: Untangling Numbers From the Media, Politicians, and Activists. [I urge everyone to read the book to be more informed about statistics. All quotes will be from the book. It offers only a superficial treatment of actual statistical methods – which he acknowledges – but it provides a good theoretical background for being a critical thinker about statistics]. He states that statistics rose in popularity as governments and social activists wanted ways to track and "influence debates over social issues" (p. 11). Early statistics were used almost exclusively for political purposes, especially to shape social and governmental policies. From the beginning, statistics were used for non-neutral purposes. They gave credibility to arguments.
One erroneous assumption people make is that statistics are neutral and that they represent truth. Statistics are useful for summarizing a lot of data, but most statistical methods rest on assumptions about the underlying data (e.g., that they are normally distributed). Researchers often apply certain statistical methods, and draw conclusions from them, even when those methods are not appropriate for the data. Even simple descriptive statistics (e.g., averages) can lead people to erroneous conclusions.
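To make that last point concrete, here is a minimal sketch (the income figures are invented purely for illustration) of how a plain average can misrepresent skewed data:

```python
from statistics import mean, median

# Hypothetical household incomes: one outlier drags the mean far above
# what a "typical" household earns, while the median barely budges.
incomes = [28_000, 30_000, 32_000, 35_000, 40_000, 1_000_000]

print(f"mean:   {mean(incomes):,.0f}")    # 194,167 -- the "average" income
print(f"median: {median(incomes):,.0f}")  # 33,500  -- the typical income
```

A report citing the mean could truthfully claim an average income near $194,000 even though no one but the outlier earns anywhere near that much.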
People who create statistics all have a purpose for them. Researchers are all biased and have agendas; the agenda may just be to get their research published, or it may be something less benign. Social activists use statistics to create social problems (see p. 14); they are not the cause of the "problem," but they try to raise awareness of it by turning it into a "problem" that we need to pay attention to and solve. This can often be a good thing, but activists are using statistics to lend credibility to their cause (e.g., "According to the World Health Organization, between 12 percent and 25 percent of women around the world have experienced sexual violence at some time in their lives."). Governments also use statistics to defend their positions (e.g., "Crime rates decreased by XX% from last year. See! We are doing our job.") and sometimes to counter the claims of activists.
The media pick up on statistics, and on activists, because they make for a new story and might even be controversial; controversy sells. Businesses likewise use statistics to promote their interests. Nor does every person or organization collect data in the same way: one police station might have different criteria for counting an assault than another does.
Best proposes three general questions to ask whenever you see a statistic used:
- Who created this statistic?
- Why was this statistic created?
- How was this statistic created? (pp. 27-28).
Many times people don't even know enough to ask those questions, or to research the answers to them. After all, as Best points out, we are largely an innumerate society (and this holds true worldwide). Innumeracy is the mathematical equivalent of illiteracy. A majority of people are uncomfortable with even basic mathematics and completely oblivious to statistics. After all, mathematics is abstract and requires a lot of mental effort to use and understand. It is often not taught as well as it could be, and is only reluctantly learned in school. Once out of school, people rarely need more than basic math, and so they forget what they learned. The other problem is that we accept math (and by extension, statistics) as perfect and infallible (Gödel demonstrated, in effect, that this is not the case). Best describes this fallacy:
“We sometimes talk about statistics as though they are facts that simply exist, like rocks, completely independent of people, and that people gather statistics much as rock collectors pick up stones. That is wrong. All statistics are created through people’s actions: people have to decide what to count and how to count it, people have to do the counting and the other calculations, and people have to interpret the resulting statistics, to decide what the numbers mean. All statistics are social products, the results of people’s efforts” (p. 27; emphasis added).
So what do you do when you watch a news program, read an article, or hear an activist or politician quote a statistic? If it makes you go, "Wow!" that is one sign you need to step back and really scrutinize it (which you should do even if it doesn't surprise or scare you). If you agree with the point the show, article, or person is trying to make, then you especially need to step back and critique the statistic. This means you need to understand your own biases. It is easy to seek only confirmation of our hypotheses and beliefs and ignore anything that might contradict them. That tendency is generally adaptive, since it helps us process a lot of information, but it becomes a problem when we don't view statistics critically, especially "bad statistics" (which you can never discover without critiquing them). When you view or read a statistic, that is the time to ask yourself the three questions Best proposed and go from there. You might discover something interesting.
I’ll post more on this subject later.
Reference
Best, J. (2001). Damned Lies and Statistics: Untangling Numbers From the Media, Politicians, and Activists. Berkeley, CA: University of California Press.
The descriptive statistics most people are exposed to are notoriously misunderstood. For example, election polls used to be presented simply as a percentage for or against a candidate until Walter Cronkite (I believe, but it could have been another news anchor) started reporting the margin of error. But even that omits the confidence level of the poll. Honestly, I don't think I have ever seen one reported alongside a poll, even though it is critical to interpreting the poll's accuracy.
And don't get me started on researchers using improper statistics. It seems that, in the life sciences, most authors, reviewers, and journal editors are unaware of how to treat data that violate parametric assumptions. I regularly see papers in which ordinal data have been analyzed with an ANOVA, and no one bats an eyelash. I can't tell you the last time I saw a Kruskal-Wallis or Mann-Whitney U test in a results section.
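For readers who haven't run these tests, here is a minimal sketch using scipy.stats; the group names and ordinal ratings are made up for illustration:

```python
from scipy import stats

# Hypothetical pain ratings (1-5 ordinal scale) from three treatment groups.
group_a = [1, 2, 2, 3, 3, 4]
group_b = [2, 3, 3, 4, 4, 5]
group_c = [1, 1, 2, 2, 3, 3]

# Kruskal-Wallis: rank-based analogue of a one-way ANOVA for 3+ groups.
h, p = stats.kruskal(group_a, group_b, group_c)
print(f"Kruskal-Wallis: H = {h:.2f}, p = {p:.3f}")

# Mann-Whitney U: rank-based analogue of a two-sample t-test.
u, p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"Mann-Whitney U: U = {u:.1f}, p = {p:.3f}")
```

Because both tests work on ranks, they don't pretend that a rating of 4 is "twice" a rating of 2 the way an ANOVA on the raw scores implicitly does.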
The confidence level of most polls is 95%. You can, in part, tell this from the MoE they report: at a 95% confidence level, the MoE is roughly 0.98 divided by the square root of n. For most polls, simply take the square root of the sample size and hit that 1/x button on your calculator… close enough.
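That rule of thumb follows from the usual formula for a proportion near 50%: MoE = 1.96 × √(0.5 × 0.5 / n) = 0.98/√n. A quick sketch (the sample size here is arbitrary):

```python
from math import sqrt

# For a proportion near 0.5 at a 95% confidence level:
#   MoE = 1.96 * sqrt(0.5 * 0.5 / n) = 0.98 / sqrt(n) ~= 1 / sqrt(n)
n = 1_000  # a typical poll sample size (chosen arbitrarily)

exact = 1.96 * sqrt(0.5 * 0.5 / n)
shortcut = 1 / sqrt(n)

print(f"0.98 / sqrt(n): {exact:.4f}")     # ~0.0310, about +/- 3.1 points
print(f"1 / sqrt(n):    {shortcut:.4f}")  # ~0.0316, about +/- 3.2 points
```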
Did I just get pwned? Yes, I think so.
I must amend my earlier comment: I read an article on Friday that included a Mann-Whitney U test. So they do crop up every now and then.