# Statistical Significance…And Insignificance

One of the most damning phrases in the scientific literature is “Not statistically significant.”  In a world where policy is announced by 140 character twits, the wordy “not statistically significant” readily becomes “not significant” and then “irrelevant.”  But “not statistically significant” has a very specific meaning…and only rarely does it mean “irrelevant.”

Formally, we have the following problem:  There is a mass of evidence, and two possible ways that evidence could have been produced, called the null hypothesis and the alternate hypothesis. Your mission, should you choose to accept it, is to decide which hypothesis is correct.

Unfortunately, that’s impossible.  So statisticians take a different approach:  If the evidence is sufficiently unlikely to be generated in the universe where the null hypothesis is true, we’ll reject the null hypothesis and say that the evidence is statistically significant.  But if it is sufficiently likely that the evidence would be generated in a universe where the null hypothesis is true, then we’d say that the evidence is not statistically significant and reject the alternate hypothesis.

For example, consider a coin.  There are two possibilities for the state of the universe:  Either the coin is a fair coin, and will land heads 1/2 the time; or the coin is unfair.  For somewhat technical reasons, “Coin is fair” is the null hypothesis.

Now let’s collect some evidence.  Say we flip the coin 10 times and see it land heads 8 of those 10 times.  Many people might conclude, based on the evidence, that the coin is unfair.

Herein lies the problem.  By concluding the coin is unfair, we have established a guideline for future experiments:  “If a coin lands heads 8 out of 10 times, conclude that the coin is unfair.”  That works great if someone sees a coin land heads 8 times out of 10 flips.  But what if they see it land heads 9 times out of 10 flips?  It seems reasonable to also conclude the coin is unfair.  Likewise a coin that lands heads 10 times in 10 flips.  And if they see the coin land tails 8, 9, or 10 times out of 10, they might also conclude the coin is unfair.  What this means is that if we make a decision based on the evidence, then any evidence that is at least as compelling should lead us to the same conclusion.

Now suppose you have 100 people testing coins.  If every single one of them has a fair coin, then about 10 of them will see that coin land heads (or tails) 8 or more times out of 10!  So about 10% of those testing fair coins will conclude they are unfair coins.

That 10% corresponds to what statisticians call the level of significance.  What’s important to understand is that the level of significance is completely arbitrary:  it’s based on how often you’re willing to make the wrong decision about the null hypothesis.  The lower the level of significance, the more compelling the evidence must be before you reject the null hypothesis.  And if the evidence isn’t sufficiently compelling, you declare the evidence to be “not statistically significant.”

In this case, at a 5% level of significance, you’d need to see the coin land heads (or tails) at least 9 times out of 10.   With 8 heads in 10 flips, you’d say that the evidence for the coin being unfair is not statistically significant.  And yet, most people would hold that this is compelling evidence that you’re dealing with an unfair coin.
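
If you’d like to check these percentages yourself, here is a minimal Python sketch.  It assumes the usual binomial model for a fair coin (every sequence of 10 flips is equally likely), and the function name `two_sided_tail` is just an illustrative label.

```python
from math import comb

def two_sided_tail(n: int, k: int) -> float:
    """Chance that a fair coin shows at least k heads OR at least k tails
    in n flips. Assumes k > n / 2, so the two cases cannot overlap."""
    one_side = sum(comb(n, i) for i in range(k, n + 1))
    return 2 * one_side / 2**n

# Compare the "8 or more" rule with the "9 or more" rule for 10 flips.
for k in (8, 9):
    print(k, two_sided_tail(10, k))
```

For 10 flips, the “8 or more” rule fires 112 times out of 1024 (about 10.9%), while the “9 or more” rule fires 22 times out of 1024 (about 2.1%).  That is why 9 is the smallest count that gets you under a 5% level of significance.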

Here’s another way to look at it.  In the movie Dirty Harry (1971), Clint Eastwood utters one of the most iconic lines in movie history.  In case you’ve been living under a rock for the past 40 years, the setup is that Eastwood (a cop and the title character) is facing down a suspect after a chase.  The assailant has a gun just within reach…but Eastwood has his gun drawn and pointed at the suspect.  The problem is:

“Did he fire six shots or only five?” Well to tell you the truth in all this excitement I kinda lost track myself. But being this is a .44 Magnum, the most powerful handgun in the world and would blow your head clean off, you’ve gotta ask yourself one question: “Do I feel lucky?”  Well, do ya, punk?

From the dialog, we can assume there are five confirmed shots.  There are two possibilities:  Either the gun is empty, or the gun has one more round in it.

Let the null hypothesis be “The gun has been emptied.”  A statistically informed punk might reason thusly:  “It is sufficiently likely that ‘five confirmed shots’ could be produced by a now-empty gun.  Therefore the evidence that the gun has one more round is not statistically significant.”  Consequently, they would reject the alternate hypothesis (that the gun has one more round).  In practice, this means they would proceed as if the gun was empty.

(Alert readers will note that the argument works both ways, if we interchange the null and alternate hypotheses.  True enough…but as I said, there are somewhat technical reasons for which hypothesis is the null hypothesis, and if this were my statistics course, I’d use this discussion as a lead-in to how we decide.)

# The Sixth Wave

Over the past five thousand years, five waves of mathematical innovation have swept the world, leading to rapid advances and significant changes in society:

• The invention of written number (Fertile Crescent, 3000 BC).  This allowed civilization to exist, because if you want to live with more than your extended family, record keeping is essential…and that means keeping track of numerical amounts.
• The invention of geometry (Greece, 300 BC).  Yes, geometry existed before then; what I’m using is the date of Euclid’s Elements, which is the oldest surviving deductive geometry.  The idea that you could, from a few simple principles, deduce an entire logical structure had a profound impact on society.  How important?  Consider a rather famous line:  “We hold these truths to be self-evident…”  The Declaration of Independence reads like a mathematical theorem, proving the necessity of revolution from some simple axioms.
• The invention of algebra (Iraq, 900).  The problem “A number and its seventh make 19; find the number” appears in a 4000-year-old manuscript from ancient Egypt, so finding unknown quantities has a very long history.  What algebra adds is an important viewpoint:  Any of the infinite variety of possible problems can be transformed into one of a small number of types.  Thus, “A farmer has 800 feet of fence and wants to enclose the largest area possible” and “Find a number so the sum of the number and its reciprocal is 8” and “The sum of two numbers is 12 and their product is 20” can all be reduced to $ax^{2} + bx + c = 0$ and solved using the quadratic formula $x = \dfrac{-b \pm \sqrt{b^{2} - 4ac}}{2a}$.
• The invention of calculus (Europe, 1600).  Algebra is the mathematics of what is.  Calculus is the mathematics of how things change.  Calculus makes physics possible, and from physics come chemistry and engineering.
• The invention of statistics (Europe, 1900).  Both algebra and calculus deal with single objects:  a bridge, a number, a moving planet.  But the universe consists of many similar objects:  the human population; the planetary climate; the trash generated by a city.  Statistics aggregates the data on the individual in a way that can be used to describe a population…then uses the information on a population to predict information about an individual.  Everything in modern society, from the pain relievers you use to the road you travel to work, incorporates such a statistical analysis.
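
To see the reduction described in the algebra item at work, here is a small Python sketch.  It is only an illustration: it rewrites two of the quoted problems in the form $ax^{2} + bx + c = 0$ and applies the quadratic formula, and the helper name `solve_quadratic` is hypothetical.

```python
import math

def solve_quadratic(a, b, c):
    """Real roots of a*x**2 + b*x + c = 0 via the quadratic formula.
    Assumes a != 0; returns an empty tuple if there are no real roots."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return ()
    root = math.sqrt(disc)
    return ((-b - root) / (2 * a), (-b + root) / (2 * a))

# Two numbers with sum 12 and product 20:
# x * (12 - x) = 20  becomes  x^2 - 12x + 20 = 0
print(solve_quadratic(1, -12, 20))

# A number plus its reciprocal is 8:
# x + 1/x = 8  becomes  x^2 - 8x + 1 = 0
print(solve_quadratic(1, -8, 1))
```

Two problems that sound nothing alike end up going through exactly the same three lines of arithmetic, which is the point of the algebraic viewpoint.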

Many people, myself included, believe we are on the verge of a sixth wave.  That sixth wave will have the transformative power of calculus and statistics, and fundamentally reshape society.

The sixth wave is based around discrete mathematics.  That’s not mathematics you whisper in dark corners.  Rather, it’s the mathematics of things that can be counted as opposed to measured.  For example, length is continuous:  a length can have any value, and no one looks at you strangely if you say “I traveled 1.38924 miles today…”  (You might get some strange looks, but it’s because you specified the distance so precisely and not because of the distance itself)  But if you continued “…and met 2.35 people,” you would get strange looks, because the number of people you meet is a counted number:  it’s a discrete quantity.

How important is discrete mathematics?  If calculus is the basis for physics and engineering, then linear algebra is the basis for discrete mathematics.  But a first-year calculus student would have a hard time solving even a simple question in statics (the physics of structures).  In contrast, Google’s search algorithm is based on mathematics learned in the first half of a standard college linear algebra course.
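
As a rough illustration of that last claim, here is a toy Python sketch of PageRank, the ranking method usually associated with Google’s original search engine.  That this is the algorithm meant is an assumption, and the three-page “web,” the damping value of 0.85, and the function name are all made up for the example; the real system is far more elaborate.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Toy PageRank by repeated redistribution of rank along links.
    `links` maps each page to the list of pages it links to.
    Assumes every page has at least one outgoing link."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline share of the total rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            share = damping * rank[page] / len(outgoing)
            for target in outgoing:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_web)
print(ranks)
```

Each pass through the loop is a matrix-vector multiplication in disguise, and the ranks settle toward an eigenvector of that matrix.  In this toy web, page C ends up ranked highest because both other pages link to it.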

I’ll talk more about this later.  But if you’re interested in learning some linear algebra, the video lectures for the course I teach are available on YouTube.

# The Geometry of Floods, Part One

In the aftermath of Hurricane Harvey, there are bound to be questions asked about the wisdom of building in flood plains.

Part of being a mathematician is asking “What can math say?” A lot of what math can say is embedded in actuarial tables and flood insurance premiums, and phrases like “500 year flood.” Those are good topics, but I’m teaching calculus this semester, so my mind turns to geometry.

Consider two homes. One is built some distance from a river, but only a few feet above the normal water level. Another is very close, but much higher up. Which one is in greater danger of being flooded?

To answer this, we need to construct a mathematical model. We do this by making some assumptions about the world, then following the math. Since I’ll assume you haven’t taken differential equations and calculus, we’ll construct a relatively simple model based on geometry.

We’ll make the following assumptions:

• The normal river surface has constant width $w$. This is unrealistic…but you get what you pay for: a more realistic model is more complex.  (Apologies if that’s not how geologists speak:  I think the last geology course I took listed the Pleistocene as “current events”…)
• The land between the river and House 1 has a gradual but constant slope, and it’s like this for the entirety of the river. I’ll use my expert drawing skills to show you what I mean:
• House 2 is built on the riverbanks.  Again, my artistic skills lend to the following:

As above, we’re assuming the river looks like that for its entirety.
• We’ll model the storm by dumping volume $V$ of water into each river, and seeing how far it rises up the banks.  In particular, what we want to know is that if the water level rises up $x$ above the riverbed around House 2, how far up does it rise up around House 1:
[Figures: the river near House 2 (left) vs. the river near House 1 (right)]

So here’s the mathematical task:  Suppose the river rises height $x$ above its normal level (the figure on the left).  How far $l$ does it extend past its normal banks (the figure on the right)?

Conversely (mathematicians love this phrase), if you build $l$ away from the normal river bank, how much of a rise above the normal river level are you insulated from?

Now I’m a mathematician…but I’m also a teacher.  And I would be remiss in my duties if I gave you the answer right away.  So mull these over, and I’ll return to the topic next week…

# First Day Jitters

Today is the first day of classes for me.

I’ve been teaching in one form or another for 20 plus years. I’ve survived the Great Calculus War, and even spent time in the Technology Underground engaging in acts of passive resistance against the ancien régime. And yet the start of every term is fraught with anxiety for me: What am I going to do this semester?

I don’t know how common this problem is among my colleagues. I mention it because it underscores what I believe to be an important fact of life: Never be too comfortable. This doesn’t mean you should go out and buy ill-fitting shoes and eat kale 24/7. Rather, it means that a little discomfort is good, because the general human reaction is to try and change things to become more comfortable.

So I’m anxious about what I’m going to teach this semester. That’s good, because it forces me to look at what I’m teaching and ask myself the all-important question: Why would anyone want to listen to me talk about this subject? (I don’t have an answer for this term yet…which is why the anxiety persists.)

On the other side, learning mathematics is all about anxiety too. The critical question is not whether you suffer from math anxiety; it’s what you do about it. Remember that the human reaction is to change to become more comfortable in whatever situation you’re in. For all too many people, the reaction to math anxiety is to avoid math.

And that’s fine, if you live in a stone age society where mathematics can be left to specialists. (Go ahead, ask me about paleomathematics…) But in the modern world, it’s not practical to avoid mathematics. It’s possible, in the same way that it’s possible to avoid reading. But you won’t get very far, and you condemn yourself to being a second or third class citizen.

Instead, the way to fight math anxiety is to accept the discomfort…and push through it anyway. The most important lesson we can learn in life is that we can survive a little discomfort, and when we get through to the other side, we are better for it. And soon enough, you’ll find yourself addicted and actively seeking the discomfort because you know you can get through it.

Psychologists no doubt have a litany of strategies for dealing with anxiety disorders. But here’s my suggestion for dealing with math anxiety: Do a little math, every day. The good thing about math is that it’s something you can do in the clamor of your own mind as you go through daily life.

Count things: that’s the beginning of mathematics. Don’t look at the line at Starbucks; count how many people are there. If you do this often enough, you’ll start to think about more efficient ways to count: you’ll find yourself counting by twos, threes, and fives. You’ll also develop what many have called number sense: the ability to estimate quantities with reasonable accuracy.

If the cost is linear, then an 8 + 4 = 12 foot board would cost $3.23 + $1.90 = $5.13. Here the difference between what the board should cost and what it does cost is only a few cents, so we might reasonably conclude that the cost of lumber is linear.

Of course, only politicians and pundits base their conclusions on one piece of evidence. What about a 16-foot board? If the cost is linear, then a 12 + 4 = 16 foot board should cost $7.16 + $1.90 = $9.06, and we find it costs $9.19, which is reasonably close.

One final note: Even if the cost of lumber is linear, we can’t expect that it will always be linear. Since the 8-foot board costs $3.23, we’d expect a 4-foot board to cost $3.23 – $1.90 (subtract $2, return 10 cents) = $1.33, and a 0-foot board would cost $1.33 – $1.90 = -$0.57. But if you go to the checkout with ten 0-foot boards, the cashier is not going to hand you a five spot.

And remember, lumber doesn’t grow on trees. (Yes, I meant it…wood grows on trees, but no tree in the forest grows 16-foot boards.) A 200-foot board might cost significantly more than linearity would suggest.

# Obesity, Poverty, and National Security

According to the internet, if you ate only ramen, you’d save thousands of dollars each year in food. That sounds great, except there’s a problem: ramen lacks a wide range of essential nutrients and vitamins. You’d lose your teeth to scurvy, a lack of vitamin D would cause your bones to become brittle and easily broken, you’d suffer night blindness from a lack of vitamin A, and you’d be tired all the time from a lack of iron and the B vitamins. In short, all the money you saved on food, and much, much, more, would be spent on increased medical care.

The problem is that eating healthy is costly. And this leads to a national security crisis.
If you want the short version, I’ve summarized the key points in a ten-minute video.

A little more mathematics: Food buyers face what mathematicians call a constrained optimization problem: they have to meet certain caloric and nutritional goals (the constraints), which define a feasible region. Generally speaking, any point in the feasible region defines a solution to the problem; what you want to do is to find the optimal solution.

The optimal solution is generally determined by the objective function. For example, if you lived off $x$ packages of ramen and $y$ eggs, the important objective function might be the total cost of your meals. At 15 cents a pack of ramen and 20 cents an egg, the objective function has the form $L = 0.15x + 0.20y$, and we might want to minimize the value of the objective function. In the following, I’ll assume you want to minimize the value of the objective function; the arguments are similar if you’re trying to maximize the value (for example, if you’re designing a set of roads, you might want to maximize the traffic flow through a town center).

There’s a theorem in mathematics that says the optimal solution will be found on the boundary of the feasible region. The intuition behind this theorem is the following: Imagine any point inside the feasible region. If you change any one of the coordinates while leaving the others the same, the value of the objective function will generally change. The general idea is to move in the direction that decreases the objective function, and continue moving in that direction until you hit the boundary of the feasible region. At this point, you can’t move any further in that direction. But you can try one of the other directions. Repeating this process allows us to find the optimal solution.

We can go further. Suppose our objective function is linear (like the cost function). Then the same analysis tells us the optimal solution will be found at a vertex of the feasible region.
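
To make the vertex claim concrete, here is a minimal Python sketch for the ramen-and-egg example.  The prices come from the objective function above, but the nutrition numbers and daily requirements are invented purely for illustration, and the helper names are hypothetical.  The sketch intersects every pair of boundary lines, keeps the intersection points that satisfy all the constraints, and picks the cheapest one.

```python
from itertools import combinations

# Each constraint is (a, b, c), meaning a*x + b*y >= c,
# where x = packs of ramen and y = eggs.
# The nutrition figures are made up for illustration only.
CONSTRAINTS = [
    (400, 80, 2000),  # calories: 400 per pack, 80 per egg, need 2000
    (3, 6, 42),       # protein (g): 3 per pack, 6 per egg, need 42
    (1, 0, 0),        # x >= 0
    (0, 1, 0),        # y >= 0
]

def cost(x, y):
    """Objective function from the text: 15 cents a pack, 20 cents an egg."""
    return 0.15 * x + 0.20 * y

def feasible(x, y, eps=1e-9):
    """True if (x, y) satisfies every constraint, up to rounding error."""
    return all(a * x + b * y >= c - eps for a, b, c in CONSTRAINTS)

def vertices():
    """Intersect each pair of boundary lines; keep the feasible points."""
    found = []
    for (a1, b1, c1), (a2, b2, c2) in combinations(CONSTRAINTS, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundaries never meet
        x = (c1 * b2 - c2 * b1) / det  # Cramer's rule
        y = (a1 * c2 - a2 * c1) / det
        if feasible(x, y):
            found.append((x, y))
    return found

best = min(vertices(), key=lambda v: cost(*v))
print(best, cost(*best))
```

With these made-up numbers, the cheapest vertex is 4 packs of ramen and 5 eggs, at a cost of about $1.60.
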
This suggests an elegant way to solve linear optimization problems:

• Graph the feasible region and locate all the vertices. Generally speaking, the constraints are themselves linear functions, so (in our ramen and egg example) the feasible region will be a polygon.
• Evaluate the objective function at each vertex.
• Choose the vertex that minimizes the value of the objective function.

Easy, huh? Except…

• If you have $n$ commodities, you have to work in $\mathbb{R}^{n}$.
• This means the feasible region will be some sort of higher-dimensional solid.
• This also means that finding the vertices of the feasible region will require solving systems of $n$ equations in $n$ unknowns.

In 1945, George Stigler did such an analysis to find a minimal cost diet that met caloric and nutritional requirements. To make the problem tractable, he focused on a diet consisting of just seven food items: wheat flour; evaporated milk; cabbage; spinach; dried navy beans; pancake flour; and pork liver. “Thereafter the procedure is experimental because there does not appear to be any direct method of finding the minimum of a linear function subject to linear conditions.”

The problem is that with seven items, you’re working with hyperplanes in $\mathbb{R}^{7}$, and the constraints will give you hundreds of vertices to check.

Note the date: 1945. What Stigler didn’t know is that there was a method for finding the minimum value easily. But that’s a story for another post…

# The Most Important Letter

A question came up on Quora about what letter’s removal would have the greatest impact on the English language. The obvious answer is “E”, since it’s by far the most common letter in English.

But let’s consider that. Can you writ a comprhnsibl sntnc that dosnt us ths lttr? Ys, you can! So it’s not clear that “E” is all that important.

So let’s do some mathematics. The key question is: How much information does a given letter provide?

Consider the following: I’m thinking of a color.
You know the color is either red, green, blue, or fuchsia. (I have no idea what color fuchsia is…I just like the word.) Your goal is to determine the color I’m thinking of by asking a sequence of Yes/No questions.

One way you could do this is by asking “Are you thinking of red or green?” If the answer is “Yes”, then you might ask “Are you thinking of red?” If the answer is “Yes”, then you know the color is red; if the answer is “No,” then you know the color is green (since I answered “Yes” to the first question). On the other hand, if I answered “No” to the first question, then you know I was thinking of blue or fuchsia, so you might ask “Are you thinking of blue?” A “Yes” tells you I’m thinking blue; a “No” tells you I’m thinking fuchsia.

Now reverse it. If you know I’m thinking of the color red, then you have the answer to two Yes/No questions. We say that “red” has an information content of two bits.

So far so good. But suppose I’m somewhat dull and can’t think of any color other than red. In that case, you already know what color I’m thinking of, and don’t need to ask any questions. In this situation, “red” has an information content of zero bits.

As an intermediate case, suppose that half the time I think of “red,” one-fourth the time I think of “blue”, and one-eighth the time I think of “green” and one-eighth the time I think of “fuchsia.” Then you might ask a different sequence of questions:

• Are you thinking of red? (Half the time, I’ll answer “Yes”, so the answer “red” gives you the answer to one question: it’s 1 bit of information.)
• If the answer is “No,” then “Are you thinking of blue?” Half the time this question is asked (remember it will only be asked if the answer to the first question is “No”), the answer will be “Yes,” so the answer “blue” gives you the answer to two questions: it’s 2 bits of information.
• If the answer is “No,” then the final question “Are you thinking of green?” Again, half the time this question is asked, the answer will be “Yes,” which tells you that “green” is worth 3 bits; meanwhile, the answer “No” means I’m thinking of fuchsia, so “fuchsia” is also worth 3 bits.

It might seem difficult to determine the information content of an answer, because you have to come up with the questions. But a little theory goes a long way. The best questions we could ask are those where half the answers are “Yes” and the other half are “No.” What this means is that if $n$ is the answer to the question $p_{n}$ of the time, then the information content of the answer $n$ will be $-\log_{2} p_{n}$. Thus, if “red” is the color half the time, then “red” has an information content of $-\log_{2} (1/2) = 1$ bit.

So what does this mean? “E” makes up about 12.7% of the letters in an English text. But this means that knowing a letter is “E” answers very few questions. So the letter E contains about 3 bits of information. In contrast, “Z” only makes up 0.07% of the letters in an English text, so knowing a letter is “Z” answers many questions. So the letter Z contains about 10.4 bits of information (the maximum).

At first glance, this suggests that “Z” may be the most important letter in the English language: losing the letter “Z” will lose the most information. However, there’s a secondary consideration: “Z” doesn’t often appear in a text. So every “Z” you drop from a text loses a lot of information…but you don’t drop that many.

And here’s where the greater prevalence of “E” comes in. While the letter “E” only gives you about 3 bits of information, it’s common enough that dropping the letter “E” from a text will lose you more information overall.

For example, suppose you had a 10,000 character message. Of these 10,000 characters, you might expect to find 7 Zs, and losing them would lose you about 77 bits of information.
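
The arithmetic here is easy to reproduce.  The following Python sketch computes $-\log_{2} p$ for a letter, along with the expected number of bits lost by deleting every copy of that letter from a message.  The frequency for “Z” is taken to be 0.074%, an assumed value that rounds to the figure quoted above; the function names are illustrative.

```python
from math import log2

def information_bits(p):
    """Information content, in bits, of an outcome with probability p."""
    return -log2(p)

def bits_lost(p, message_length):
    """Expected bits lost by dropping every copy of a letter that makes up
    a fraction p of a message with message_length characters."""
    return p * message_length * information_bits(p)

FREQ_E = 0.127    # from the text
FREQ_Z = 0.00074  # assumption: rounds to the 0.07% quoted in the text

for name, p in (("E", FREQ_E), ("Z", FREQ_Z)):
    print(name, round(information_bits(p), 1), round(bits_lost(p, 10_000)))
```

The same function reproduces the answers from the color game: `information_bits(1/4)` is 2 and `information_bits(1/8)` is 3.
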
In contrast, there would be almost 1300 Es, and losing them would lose about 3800 bits of information.

# Exact is Not Accurate

Numbers. Over the next few years, you’ll be certain to see a barrage of numbers thrown at you. While researching the latest atrocity promoted by the administration, I came across the following tidbit: The average tuition for private schools is $10,003.

Now, if I want to include this in a blog, vlog, Facebook post, or public speech, I have a conundrum.  Compare the two sentences:

• The average tuition for private schools is $10,003.
• The average tuition for private schools is about ten thousand dollars.

The first sounds like I know what I’m talking about: that I’ve done some high-level research and wrestled a number to the ground. The second sounds like I spent thirty seconds on Google. (Actually, the first number was based on thirty seconds on Google.) The difference is that I sound more convincing with the exact figure.

In fact, there’s a story (which might or might not be true) that when the first surveyors found the height of Mount Everest, they came to a value of 29,000 feet…but they published it as 29,002, because that sounded more accurate.

The problem is the exact figure might not be accurate. Consider the two statements:

• The population of the United States is 324,595,182.
• The population of the United States is 325 million.

The first gives an exact number, and sounds very accurate. But it is almost certainly false. In particular, even if the population of the US was 324,595,182 at some point, it is almost certainly not 324,595,182 right now. On the other hand, it’s still about 325 million, and will be so for a while. (I talk about this in my FOCUS article.)

There’s a concept in the sciences called significant figures. The gist of it is this: When I give you a number, I am giving you a guarantee that the non-zero digits of the number are correct. (The zeroes are a little more complicated: if you want a crash course on significant figures, here’s the video I have my students watch.)

• If I claim 324,595,182, then I’m guaranteeing each and every digit is exact…and if the population is 324,595,183, then I’ve fed you misinformation.
• If I claim 325 million, then I’m guaranteeing that the population is somewhere between 324,500,000 and 325,499,999 (since anything in this range would round to 325 million).

What’s the big deal? One problem with statistics is that people don’t believe them. You’ve heard the quote: “There are three types of lies: Lies, damned lies, and statistics.” I suspect part of the reason is that if someone says “The average tuition at private schools is $10,003,” a listener can respond with “But at our school, it’s $7500, so how do you get an average of $10,003?”  This generally leads to a discussion of how to calculate averages, and often degenerates into accusations of skewed samples.

On the other hand, if you say “The average tuition at private schools is around $10,000,” then to the person who says “But we only pay $7500,” the response is “Which is around $10,000.” By avoiding the mechanics of computing the number, we focus on the value itself.

# Math for Democracy

I surrender.

I’ve been trying to keep this blog politics free, or at least minimize the politics: when I talked about the I focused on estimating the crowd size and not on the reasons behind it.

I’m still going to minimize the politics. But it’s clear that we’re heading towards a major crisis. I’m not talking about the person in the White House, or Russian interference, or anything that minor. I’m talking about the denial of basic fact-finding.

You’ve heard the term “fake news.” The problem is that most Americans get their information from one or two sources, which they don’t verify. If those sources are unreliable, then they’re going to get a warped view of the world.

So I have a new mantra: Five minutes a day. Take five minutes a day to track down a fact. You might start with the news story, but don’t end with it. Who did they interview? If they’re reporting on a piece of research, track down the original article and check out the legitimacy of the publisher. If they’re reporting on an incident, go to the local newspapers and see what their coverage is. If they’re talking about waste in government spending, go to USAspending.gov and see how your money is spent.

So let’s talk about that. One of the promises of the new administration is to drastically curtail the U.S. Department of Education, returning control of schools to the states. Sounds good, right?

But go to USAspending.gov to see how the Department of Education actually spends your money. Note that I’m giving you the source, so you should feel free to check my claims. (A guaranteed way to identify something as “not a fact” is the lack of a source: If there’s no source, it’s not a fact. Keep in mind this does not work in reverse: you can cite a source and still spew non-facts.)

Most federal agencies suck in a lot of taxpayer dollars…and then shovel them back to the states in the form of grants. Find the government department you’re interested in, then download the grants database: this tells you who they’ve given money to, and how much. You can import it into Excel, or download it as a CSV and use your own spreadsheet software. Then the fun begins…

You can sort the grants by any category you want. The cost of the elected President’s recent trips to Mar-a-Lago has been in the news: current estimates for the three weekend trips (out of five weekends in office) are around $12 million, so here are a few grants made by the Department of Education that are around this much.  I’ve deliberately chosen programs that benefit states where Trump support was very strong:  yes, New York, California, and other states get money from the Department of Education, so of course we’re concerned…the point is that states that supported Trump need to be even more concerned, because here are some of the things they’re going to lose:

• Nevada: $9,928,139 for Vocational Rehabilitation training. Nevada has received almost $200 million in grants from the Department of Education since January 1, 2016.
• Kansas: $10,669,790 for Department for Children and Families for Vocational Rehabilitation training. Kansas has received more than $210 million in grants since January 1, 2016.
• Texas: $11,187,178 to Bexar County Texas for “Impact Aid.” The army base Fort Sam Houston occupies a good part of Bexar County, and this land can’t be taxed, impacting the county’s ability to pay for schools. That’s money local taxpayers don’t have to pay. The Department of Education has given more than $600 million in Impact Aid grants since October 2016, reducing tax burdens around the country.

Now for some math.  On a dollar basis, California, Texas, and New York have received the most from the Department of Education.  But they’re also the biggest states in the country.  An easily googlable fact is the population of these states; if you divide how much each state gets by its population, you obtain a per capita figure.

These are interesting.  A few more states that stand to lose big if Trump eliminates the Department of Education:

• Alabama: $15,912,537 for preschool programs. Alabama received more than $400 million in grants. On a per person basis, that’s 26% more than Connecticut gets.
• Louisiana: $9,177,379 for preschool education programs. Louisiana has received nearly $500 million in grants. On a per-person basis, that’s 37% more than California receives.
• West Virginia: $9,828,491 for vocational and rehabilitation services. West Virginia has received more than $160 million from the Department of Education. On a per person basis, that’s 50% more than Massachusetts.

# Lies, Damned Lies, and Statistics

We all know the quote:  “There are three types of lies: lies, damned lies, and statistics.”

But like many things that are short enough to tweet, this statement is misleading.

Statistics don’t lie.  People do, generally by omitting key pieces of information.   Any statistic worth repeating should include two other numbers.   If these are missing, the whole truth is being kept from you.

The two numbers to look for are:

• The sample size.  This is the number of cases examined.  If you base a conclusion on one example, you’re a politician or a pundit, relying on anecdotal evidence and shouting instead of facts and logic.  While a large sample won’t guarantee reliability,  a small sample will almost always be untrustworthy.
• The p-value.   This is a little more complicated,  but roughly speaking, it measures how convinced you should be.   A  small p-value (0.05 or less) means the evidence is very convincing.

I’ll talk more about these later.  Until then,  remember: if someone doesn’t give you these values,  they’re not telling you the whole truth.