Exact is Not Accurate


Over the next few years, you’ll be certain to see a barrage of numbers thrown at you.  While researching the latest atrocity promoted by the administration, I came across the following tidbit:  The average tuition for private schools is $10,003.

Now, if I want to include this in a blog, vlog, Facebook post, or public speech, I have a conundrum.  Compare the two sentences:

  • The average tuition for private schools is $10,003.
  • The average tuition for private schools is about ten thousand dollars.

The first sounds like I know what I’m talking about:  that I’ve done some high-level research and wrestled a number to the ground.  The second sounds like I spent thirty seconds on Google.  (Actually, the first number was based on thirty seconds on Google.)

The difference is that I sound more convincing with the exact figure.  In fact, there’s a story (which might  or might not be true) that when the first surveyors found the height of Mount Everest, they came to a value of 29,000 feet…but they published it as 29,002, because that sounded more accurate.

The problem is the exact figure might not be accurate.  Consider the two statements:

  • The population of the United States is 324,595,182.
  • The population of the United States is 325 million.

The first gives an exact number, and sounds very accurate.  But it is almost certainly false.  In particular, even if the population of the US was 324,595,182 at some point, it is almost certainly not 324,595,182 right now.  On the other hand, it’s still about 325 million, and will be so for a while.  (I talk about this in my FOCUS article.)

There’s a concept in the sciences called significant figures.  The gist of it is this:  When I give you a number, I am giving you a guarantee that the non-zero digits of the number are correct.  (The zeroes are a little more complicated:  if you want a crash course on significant figures, here’s the video I have my students watch.)

  • If I claim 324,595,182, then I’m guaranteeing each and every digit is exact…and if the population is 324,595,183, then I’ve fed you misinformation.
  • If I claim 325 million, then I’m guaranteeing that the population is somewhere between 324,500,000 and 325,499,999 (since anything in this range would round to 325 million).
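
To make the rounding guarantee concrete, here is a minimal Python sketch.  The function names are my own, chosen for illustration, and the code assumes positive whole numbers rounded half up:

```python
def round_to_sig_figs(value, sig_figs):
    """Round a positive whole number to the given number of significant figures."""
    digits = len(str(value))                    # how many digits the number has
    unit = 10 ** max(digits - sig_figs, 0)      # size of the last digit we keep
    return ((value + unit // 2) // unit) * unit # round half up, integer math only


def implied_range(rounded, unit):
    """Smallest and largest whole numbers that round to `rounded` at this unit."""
    return rounded - unit // 2, rounded + unit // 2 - 1


print(round_to_sig_figs(324_595_182, 3))        # 325000000
print(implied_range(325_000_000, 1_000_000))    # (324500000, 325499999)
```

The exact figure is a claim about nine digits; the rounded figure is a claim about three, which is why it stays true much longer.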

What’s the big deal?  One problem with statistics is that people don’t believe them.  You’ve heard the quote:  “There are three types of lies:  Lies, damned lies, and statistics.” I suspect part of the reason is that if someone says “The average tuition at private schools is $10,003,” they can respond with “But at our school, it’s $7500, so how do you get an average of $10,003?”   This generally leads to a discussion of how to calculate averages, and often degenerates into accusations of skewed samples.

On the other hand, if you say “The average tuition at private schools is around $10,000,” then to the person who says “But we only pay $7500,” the response is “Which is around $10,000.”  By avoiding the mechanics of computing the number, we focus on the value itself.


Math for Democracy

I surrender.

I’ve been trying to keep this blog politics free, or at least minimize the politics:  when I talked about the March Across the Hudson, I focused on estimating the crowd size and not on the reasons behind it.

I’m still going to minimize the politics.  But it’s clear that we’re heading towards a major crisis.  I’m not talking about the person in the White House, or Russian interference, or anything that minor.  I’m talking about the denial of basic fact-finding.

You’ve heard the term “fake news.”  The problem is that most Americans get their information from one or two sources, which they don’t verify.  If those sources are unreliable, then they’re going to get a warped view of the world.  So I have a new mantra:

Five minutes a day.

Take five minutes a day to track down a fact.  You might start with the news story, but don’t end with it.  Who did they interview?  If they’re reporting on a piece of research, track down the original article and check out the legitimacy of the publisher.  If they’re reporting on an incident, go to the local newspapers and see what their coverage is.  If they’re talking about waste in government spending, go to USAspending.gov and see how your money is spent.

So let’s talk about that.  One of the promises of the new administration is to drastically curtail the U.S. Department of Education, returning control of schools to the states. Sounds good, right?  But go to USAspending.gov to see how the Department of Education actually spends your money.  Note that I’m giving you the source, so you should feel free to check my claims.   (A guaranteed way to identify something as “not a fact” is the lack of a source:  If there’s no source, it’s not a fact.  Keep in mind this does not work in reverse:  you can cite a source and still spew non-facts.)

Most federal agencies suck in a lot of taxpayer dollars…and then shovel them back to the states in the form of grants.  Find the government department you’re interested in, then download the grants database: this tells you who they’ve given money to, and how much.  You can import it into Excel, or download it as a CSV and  use your own spreadsheet software.  Then the fun begins…

You can sort the grants by any category you want.  The cost of the elected President’s recent trips to Mar-a-Lago has been in the news:  current estimates for the three weekend trips (out of five weekends in office) are around $12 million, so here are a few grants made by the Department of Education that are around this much.  I’ve deliberately chosen programs that benefit states where Trump support was very strong:  yes, New York, California, and other states get money from the Department of Education, so of course we’re concerned…the point is that states that supported Trump need to be even more concerned, because here are some of the things they’re going to lose:

  • Nevada: $9,928,139 for Vocational Rehabilitation training. Nevada has received almost $200 million in grants from the Department of Education since January 1, 2016.
  • Kansas: $10,669,790 for Department for Children and Families for Vocational Rehabilitation training. Kansas has received more than $210 million in grants since January 1, 2016.
  • Texas: $11,187,178 to Bexar County Texas for “Impact Aid.” The army base Fort Sam Houston occupies a good part of Bexar County, and this land can’t be taxed, impacting the county’s ability to pay for schools. That’s money local taxpayers don’t have to pay.   The Department of Education has given more than $600 million in Impact Aid grants since October 2016, reducing tax burdens around the country.

Now for some math.  On a dollar basis, California, Texas, and New York have received the most from the Department of Education.  But they’re also the biggest states in the country.  An easily googlable fact is the population of these states; if you divide how much each state gets by its population, you obtain a per capita figure.

These are interesting.  A few more states that stand to lose big if Trump eliminates the Department of Education:

  • Alabama: $15,912,537 for preschool programs. Alabama received more than $400 million in grants. On a per person basis, that’s 26% more than Connecticut gets.
  • Louisiana: $9,177,379 for preschool education programs. Louisiana has received nearly $500 million in grants. On a per person basis, that’s 37% more than California receives.
  • West Virginia: $9,828,491 for vocational and rehabilitation services. West Virginia has received more than $160 million from the Department of Education. On a per person basis, that’s 50% more than Massachusetts.
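
If you want to run this kind of per capita comparison yourself, the arithmetic is short.  Here is a rough Python sketch; the two states and all four numbers are made-up placeholders (not figures from USAspending.gov), so swap in the real grant totals and populations you look up:

```python
def per_capita(total_dollars, population):
    """Dollars received per resident."""
    return total_dollars / population


def percent_more(a, b):
    """How much larger a is than b, as a percentage of b."""
    return (a / b - 1) * 100


# HYPOTHETICAL figures, for illustration only.
grants = {"State A": 400_000_000, "State B": 250_000_000}
population = {"State A": 5_000_000, "State B": 4_000_000}

a = per_capita(grants["State A"], population["State A"])   # 80.0 dollars per person
b = per_capita(grants["State B"], population["State B"])   # 62.5 dollars per person

print(f"State A gets {percent_more(a, b):.0f}% more per person than State B")
```

Note that State A’s total is 60% larger than State B’s, but its per person figure is only 28% larger:  the population matters.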


Lies, Damned Lies, and Statistics

We all know the quote:  “There are three types of lies: lies, damned lies, and statistics.”

But like many things that are short enough to tweet, this statement is misleading.

Statistics don’t lie.  People do, generally by omitting key pieces of information.   Any statistic worth repeating should include two other numbers.   If these are missing, the whole truth is being kept from you. 

The two numbers to look for are:

  • The sample size.  This is the number of cases examined.  If you base a conclusion on one example, you’re a politician or a pundit, relying on anecdotal evidence and shouting instead of facts and logic.  While a large sample won’t guarantee reliability,  a small sample will almost always be untrustworthy. 
  • The p-value.   This is a little more complicated,  but roughly speaking, it measures how convinced you should be.   A  small p-value (0.05 or less) means the evidence is very convincing. 

I’ll talk more about these later.  Until then,  remember: if someone doesn’t give you these values,  they’re not telling you the whole truth. 
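
To see how the two numbers interact, here is a small Python sketch using a made-up example of my own:  testing whether a coin is fair.  The p-value here is the chance that a fair coin would give a result at least this lopsided.  Both experiments below come up 60% heads; only the sample size differs.

```python
from math import comb


def two_sided_p_value(heads, flips):
    """Exact chance that a fair coin is at least this lopsided in `flips` tosses."""
    extreme = max(heads, flips - heads)
    # Count the outcomes at least as lopsided in one direction...
    tail = sum(comb(flips, k) for k in range(extreme, flips + 1))
    # ...then double to cover both directions (heads-heavy and tails-heavy).
    return min(1.0, 2 * tail / 2 ** flips)


small = two_sided_p_value(6, 10)        # 60% heads, sample size 10
large = two_sided_p_value(600, 1000)    # 60% heads, sample size 1000

print(small)   # 0.75390625: not convincing at all
print(large)   # far below 0.05: very convincing
```

The same "60% heads" is unremarkable with ten flips and overwhelming with a thousand, which is why a statistic quoted without its sample size and p-value is only part of the story.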

Underreporting of Deaths

The White House released a listing of 78 terrorist attacks it claims were underreported by western media. Both the BBC and the New York Times have responded by posting links to the numerous stories they ran on these incidents, debunking the belief that these incidents weren’t reported in detail.

We can go further. The vast majority (56) of the terrorist attacks resulted in one or fewer deaths. In these 56 attacks, only 19 people actually died; the remaining victims were wounded. The articles run by the New York Times on these 19 deaths had an average length of 705 words.

Of course, this number alone doesn’t tell us much. To be meaningful, we need some basis for comparison. One possibility is the average length, in words, of articles on single murders. Unfortunately, there’s no shortage of such articles:

  • On February 6, 2017, a Virginia woman shoots her 6-year-old daughter.
  • On February 3, 2017, a 12-year-old shoots a store clerk in Arkansas.
  • On February 2, 2017, a 14-year-old girl shoots her brother over a video game in Toledo.
  • On February 2, 2017, two men shoot another man during a Craigslist robbery.
  • On January 9, 2017, a Florida police officer is killed.
  • On December 24, 2016, a man in Arkansas shoots at a car for tailgating, killing a toddler.
  • On December 1, 2016, Joe McKnight is killed in what appears to be an incident of road rage.
  • On August 3, 2016, the body of Karina Vetrano is found in a Queens park.

I’m still collecting data, because it seems there’s a journal article here, but the preliminary data is too interesting to ignore.

These eight articles have an average length of 386 words. Actually, this figure is probably higher than the average for single murders: Joe McKnight was an NFL player, and Karina Vetrano’s death received considerably more coverage because of its local nature.

What this suggests is that if you’re killed in a terrorist attack, your death is likely to receive nearly twice as much coverage as it would if you were merely killed as part of an ordinary crime. A similar analysis of stories from the BBC suggests that terrorist attacks get four times as much coverage as ordinary crimes.
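
The New York Times comparison is a single division, shown here as a quick Python check using the two averages quoted above (the variable names are mine):

```python
terror_avg_words = 705   # average length of the NYT articles on the 19 deaths
murder_avg_words = 386   # average length of the eight single-murder articles

ratio = terror_avg_words / murder_avg_words
print(f"{ratio:.2f}")    # 1.83
```

Keep the sample sizes in mind:  eight articles is a small comparison group, which is why this is preliminary.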

Manufacturing and Mining

One of the oft-quoted statistics is that if it were an independent country, California’s economy would be among the ten largest.  In 2015, it was sixth, just behind the UK and just ahead of France.  Texas and New York are also major players:  Texas’s economy is slightly larger than Canada’s (in 10th place), while New York’s is just behind Canada.

However, there’s an important factor:  California (where I was born) is also more populous than New York (where I live) and Texas (where I have relatives).  Thus, while China’s economy is larger than New York’s, China has more people; as a result, the standard of living in China is lower.  From the ground, the important question isn’t “How much does my country make?” but rather “How much do I make?”

For that, you want to look at the per capita figures:  that’s the total GDP divided by the total population.  I won’t do the comparisons for other countries, but only for California, New York, and Texas.  Under this comparison (again for 2015), New York comes out 2nd, California is 10th, and Texas is 13th.  Put another way:  If these states had the same populations, New York’s economy would be about 20% larger than California’s. (I’m using data from Wikipedia, if you want to play with the numbers yourself).

Now, unless you’ve been living under a rock, you know that the United States has a new President who’s rather controversial.   However, one of his campaign promises is that he’ll bring manufacturing and mining back to the US, and much of his appeal is in the so-called “Rust Belt,” where over the past twenty years millions of jobs have been lost.

Part of the argument is that the various free trade agreements made over the past forty years have destroyed manufacturing and mining, by making it easier to ship jobs overseas.  That’s probably true, though almost every economist who’s studied trade has concluded that it also generates quite a lot of jobs here.  Again, the problem from the ground is that the jobs it generates are very different from the jobs that disappear:  It’s little consolation to a steelworker that the financial services industry is booming.

I’m going to take a look at one very specific industry that seemed to support the new President very strongly:  coal.  In fact, one of the very first things the new government has done is ease regulations on coal mining companies regarding what they can dump into streams; this is being hailed as a way to re-open mines and get more miners to work.

Sounds good, right?  Except there’s a problem:  Productivity.

  • In 1985, the US coal industry produced about 900 million tons of coal, and employed about 180,000 workers.
  • In 2015, the US coal industry produced about 900 million tons of coal, and employed about 65,000 workers.

What should be clear from these figures is that coal mining jobs haven’t disappeared because they’ve been shipped overseas:  we’re producing as much coal as we did thirty years ago.  But we’re using one-third as many workers.  So what does this mean?  Reopening mines and restarting coal production will produce a bump in employment.  But the vast majority of coal jobs are never coming back.  
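
The productivity arithmetic behind this can be checked in a few lines of Python, using only the approximate figures from the two bullets above (the variable names are mine):

```python
tons_per_year = 900_000_000   # roughly the same output in 1985 and 2015
workers_1985 = 180_000
workers_2015 = 65_000

per_worker_1985 = tons_per_year / workers_1985   # 5,000 tons per worker
per_worker_2015 = tons_per_year / workers_2015   # about 13,846 tons per worker

productivity_gain = per_worker_2015 / per_worker_1985   # about 2.8 times
workforce_share = workers_2015 / workers_1985           # about 0.36 of the 1985 workforce

print(f"{per_worker_1985:,.0f} -> {per_worker_2015:,.0f} tons per worker")
print(f"{productivity_gain:.1f}x the output per worker, "
      f"{workforce_share:.0%} of the workers")
```

Each miner now produces nearly three times as much coal, so even a return to 1985 production levels needs only about a third of the 1985 workforce.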

This problem is true across the spectrum:  technology is being used to get more done with fewer people.  We can bring back coal mining…but not the jobs.  We can bring back auto production…but not the jobs.  We can bring back textile manufacturing…but not the jobs.  The vast majority of manufacturing jobs are never coming back.

So what can be done?  We can revive the industry…but the jobs won’t be there.  We can take consolation in that other industries are booming…but that doesn’t help the displaced workers.

The only viable solution is education and retraining.  Rather than waste time, money, and effort trying to rebuild an industry that won’t employ many workers, it would be far more useful to spend that time, money, and effort on retraining our workers so they can build their own industries.