The Mathematics of Medicare For All

Single payer is back in the political landscape (though, as even its supporters acknowledge, it has 0 chance of getting anywhere this cycle).  Let’s run the numbers.

First:  The US spends about $10,000 per person per year on health care.  If you follow the link, you’ll see that the actual value is different…but this is a math blog, so I’ll take it as a teachable moment of how to use estimates.

Second:  No one really knows how much MediCare for all would cost, but the most repeated estimate is around $1.5 trillion.  Again, the link gives a slightly different number…

Third:  The US population is about 300 million.  

So divide one by the other, and you’ll find that MediCare for all would cost $5000 per person per year.  So one good argument in favor of such a plan is that most people who buy their own insurance or get it through their employer pay more than this.  Add in the amount the employer chips in, and you’re talking about an incredibly rare situation:  a government policy that benefits both employees (who take home more money) and employers (who spend less on employee benefits).

Now here’s where the numbers get tricky.  If I’m a single person, then I win.  But what if I’m the sole wage earner supporting a family of four?  At $5000 per person, I’m now on the hook for $20,000 in health insurance costs, far more than I would have paid.  Under these conditions, MediCare for all is a losing proposition for me.

Here’s where things get complicated.  We’ll take an example from another context to see why:  If I’m a single person, then I spend about 10% of my income on food.  But if I’m the sole wage earner supporting a family of four, the amount I spend on food will not be 40%.  Instead:

  • If my single income can support a family of 4 in the same lifestyle that it supports a single person, then my food outlay is still going to be around 10%.
  • On the other hand, if my single income can only support one person, then that food budget will expand significantly.

A better way to look at it is through the federal budget (since it will, sooner or later, be paid for by taxes).  Currently, MediCare runs around $500 billion, so MediCare for all would add $1 trillion to the federal budget.  The federal budget itself is around $4 trillion, so we’re talking a 25% increase in the federal budget.  

There are many ways to pay for this, but the simplest is an across-the-board tax increase of 25%.  (Sanders’s plan incorporates a variety of methods to reduce the “average” pain, but that would complicate the analysis below…I’m not a policy wonk.)

Now before you reach for your phone (email, pen and paper) to write your Congresscritter, let’s put this 25% increase in perspective.  Remember that MediCare for all would replace what you’ve spent on health insurance.  So the key equation is

\text{Health Insurance} \overset{?}{>} 25\% \text{Tax Increase}

If your current health insurance costs more than a 25% tax increase would, you win.  Otherwise, you lose.  To determine this, multiply both sides by four:

4 \times \text{Health Insurance} \overset{?}{>} 100\% \text{Tax Increase} = \text{Current Taxes}

This gives you a gauge of whether you win or lose under MediCare for all:

  • Take the amount you spend on health insurance.   Multiply it by 4.  If the amount is greater than your taxes, then you win.
  • Otherwise, you lose.

If you’re making $100,000 a year, you’re probably paying about $20,000 in federal taxes (unless you have a really, really bad accountant…or a really, really good one).  This means that if you’re paying more than $5000 a year in health insurance, you win under MediCare for all.  
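The gauge above fits in a few lines of code.  This is just a sketch of the rule of thumb; the figures are the rough estimates used above, not exact numbers:

```python
# Rough estimates from the discussion above (not exact figures).
TOTAL_COST = 1.5e12      # estimated annual cost of MediCare for all
POPULATION = 3e8         # rough US population

per_person = TOTAL_COST / POPULATION   # about $5000 per person per year

def wins(insurance, federal_taxes):
    """The gauge above: you come out ahead if a 25% tax increase
    costs less than your current health insurance, i.e. if
    4 x insurance exceeds your current federal taxes."""
    return 4 * insurance > federal_taxes

print(per_person)           # 5000.0
print(wins(6000, 20000))    # $6000 insurance, $20,000 taxes: True (you win)
print(wins(4000, 20000))    # $4000 insurance, $20,000 taxes: False (you lose)
```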


God and Probability

In 1710, John Arbuthnot, a Scottish physician, published an article purporting to prove the existence of God.  Arbuthnot’s argument was based on the following:

  • For 82 years, the number of boys born in London had been greater than the number of girls born in London.
  • This is too unlikely to happen by chance.  Therefore, “Divine Providence” arranged it.

To see why this argument isn’t a good one, consider the analogous argument: I have a rock. When I let it go, it can either fall up or fall down. So if I let it go 100 times, the chance of it falling every time is 1 in 2^{100}, which is so small that we may regard this occurrence as impossible.

And yet, the rock falls every time. Does this improbable event prove the existence of “not a sparrow falls” God?

This example should make clear the flaw in Arbuthnot’s argument, as well as those of his modern emulators. The probability assumptions being made are questionable. Arbuthnot assumed boys and girls were born with equal frequency, and when he found they weren’t, concluded God exists. But a better conclusion is that boys and girls aren’t born with equal frequency… because the evidence says they’re not.

This leads to the following conclusion: no probability argument can be used to prove the existence of God. This is because the underlying probabilities are always subject to debate.

For example, if I flip a coin and it lands heads 99 times in a row, I could believe I’ve just witnessed an extremely improbable event. But I’m more inclined to conclude the coin isn’t a fair coin.
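To put numbers on that intuition, compare how likely 99 straight heads is under a fair coin versus under a heavily biased one.  The 99% bias is just an illustrative assumption:

```python
fair = 0.5 ** 99     # probability of 99 straight heads with a fair coin
biased = 0.99 ** 99  # same outcome if the coin lands heads 99% of the time

print(fair)    # astronomically small, around 1.6e-30
print(biased)  # around 0.37: quite plausible
```

The evidence is the same in both cases; what changes is which assumption about the coin makes the evidence plausible.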

Or say we find intelligent alien life out there, a la Star Trek, with one difference: every species we encounter is genetically identical to modern humans. Would this be evidence for a God who made man in His image?

I’ll admit, I’d wonder. But ultimately, I’d question the assumption that the human form is the product of random selection in the genetic lottery: Maybe there are undiscovered laws of physics that favor five-phalanged bipeds without feathers. (Science fiction offers other scenarios, e.g. alien gardeners who make species in the form they want; if you choose to believe God is an alien gardener, then this argument would be compelling.)

Flip Your Class! (Or not)

There’s been some buzz about a new style of teaching called a flipped or inverted class. The proponents make grand claims about how it will revolutionize education. Of course, we’ve heard these claims before, so we’re inclined to be a little hesitant.

Before addressing these claims, let’s consider what the flipped class replaces. In a traditional math class, students go to class to hear a lecture, then go home to do an assignment.

Let’s consider a good, conscientious student. Even if they do the work, the earliest they’ll get feedback on whether they’ve done it correctly is two days after they were exposed to the material: If the lesson was taught on Monday and the homework turned in Tuesday, they won’t get it back until Wednesday. That means that if they didn’t understand the material, they are now two days into new material. This is a common problem in math: If you get behind, you stay behind.

So how can we solve this problem? The best way is to have the student working a problem under guidance, with constant feedback. That way, any misunderstandings or misconceptions can be cleared up before the student moves on to the next section.

The ideal situation is to have the teacher give a lecture, then work with the students to solve problems. That works well…as long as the lecture is short enough to allow time to solve problems. Some of the classes I teach run 3 hours and 45 minutes, so I have time to both lecture and work with students.

The problem is that with shorter classes you have to choose: lecture or student work? The problem gets magnified when you have to “cover” material and complete a set list of topics in a semester.

The solution offered by a flipped class is to move the lecture offline…by putting it online. Then all of class can be spent on student work. This is the heart and soul of a flipped class: students sit through the lecture at home, then do assignments in class with guidance and supervision.

First, let me extol the advantages. There are many:

  • Taking notes in mathematics is like writing down the score of a symphony. Even if you capture everything perfectly, you will still miss the nuances, because math is a process, not a product. Ever try to learn woodworking by reading a book? I have…and amazingly enough, I can still count to 9 and 1/2. But I fixed the belt on our dryer by watching the repair guys on YouTube.
  • Lectures proceed at the pace of the instructor. But that will almost always be too fast for some students and simultaneously too slow for others. Videos can be played at half speed or double time. And if you miss something, there’s always rewind.
  • A good lecture tries to get audience engagement by asking for input: “So what’s the next step?” The problem is that either no one knows, so there’s a long, awkward pause before the lecture continues, or someone does and gives the answer. While the latter is a good thing, in practice only a few students have the opportunity to answer, and the students who might be able to answer, but take a little longer to get there, lose the chance to contribute. With a video, they can hit pause and take as long as they need to collect their thoughts.

Now if you read the foregoing carefully, you’ll realize that, while these are arguments for using math videos, they aren’t necessarily arguments for flipped classes.

And that’s because flipped classes aren’t for everyone. I’ll talk about that in a future post.

Statistical Significance…And Insignificance

One of the most damning phrases in the scientific literature is “Not statistically significant.”  In a world where policy is announced by 140-character twits, the wordy “not statistically significant” readily becomes “not significant” and then “irrelevant.”  But “not statistically significant” has a very specific meaning…and only rarely does it mean “irrelevant.”

Formally, we have the following problem:  There is a mass of evidence, and two possible ways that evidence could have been produced, called the null hypothesis and the alternate hypothesis. Your mission, should you choose to accept it, is to decide which hypothesis is correct.

Unfortunately, that’s impossible.  So statisticians take a different approach:  If the evidence is sufficiently unlikely to be generated in the universe where the null hypothesis is true, we’ll reject the null hypothesis and say that the evidence is statistically significant.  But if it is sufficiently likely that the evidence would be generated in a universe where the null hypothesis is true, then we’d say that the evidence is not statistically significant and reject the alternate hypothesis.

For example, consider a coin.  There are two possibilities for the state of the universe:  Either the coin is a fair coin, and will land heads 1/2 the time; or the coin is unfair.  For somewhat technical reasons, “Coin is fair” is the null hypothesis.

Now let’s collect some evidence.  Say we flip the coin 10 times and see it land heads 8 of those 10 times.  Many people might conclude, based on the evidence, that the coin is unfair.

Herein lies the problem.  By concluding the coin is unfair, we have established a guideline for future experiments:  “If a coin lands heads 8 out of 10 times, conclude that the coin is unfair.”  That works great if someone sees a coin land heads 8 times out of 10 flips.  But what if they see it land heads 9 times out of 10 flips?  It seems reasonable to also conclude the coin is unfair.  Likewise a coin that lands heads 10 times in 10 flips.  And if they see the coin land tails 8, 9, or 10 times out of 10, they might also conclude the coin is unfair.  What this means is that if we make a decision based on the evidence, then any evidence that is at least as compelling should lead us to the same conclusion.

Now suppose you have 100 people testing coins.  If every single one of them has a fair coin, then about 10 of them will see that coin land heads (or tails) 8 or more times out of 10!  So about 10% of those testing fair coins will conclude they are unfair coins.

That 10% corresponds to what statisticians call the level of significance.  What’s important to understand is that the level of significance is completely arbitrary:  it’s based on how often you’re willing to make the wrong decision about the null hypothesis.  The lower the level of significance, the more compelling the evidence must be before you reject the null hypothesis.  And if the evidence isn’t sufficiently compelling, you declare the evidence to be “not statistically significant.”

In this case, at a 5% level of significance, you’d need to see the coin land heads (or tails) at least 9 times out of 10.   With 8 heads in 10 flips, you’d say that the evidence for the coin being unfair is not statistically significant.  And yet, most people would hold that this is compelling evidence that you’re dealing with an unfair coin.
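The tail probabilities behind these cutoffs are a quick computation.  Here is a sketch using the binomial distribution for 10 flips of a fair coin:

```python
from math import comb

def p_value(k, n=10):
    """Two-sided probability that a fair coin lands heads at least k
    times, or at most n - k times, in n flips."""
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return 2 * upper  # by symmetry, the lower tail is equally likely

print(p_value(8))  # about 0.109: not significant at the 5% level
print(p_value(9))  # about 0.021: significant at the 5% level
```

So 8 heads in 10 flips happens by chance about 11% of the time with a fair coin, which is why it fails a 5% significance test even though it feels compelling.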

Here’s another way to look at it.  In the movie Dirty Harry (1971), Clint Eastwood utters one of the most iconic lines in movie history.  In case you’ve been living under a rock for the past 40 years, the setup is that Eastwood (a cop and the title character) is facing down a suspect after a chase.  The assailant has a gun just within reach…but Eastwood has his gun drawn and pointed at the suspect.  The problem is:

“Did he fire six shots or only five?” Well to tell you the truth in all this excitement I kinda lost track myself. But being this is a .44 Magnum, the most powerful handgun in the world and would blow your head clean off, you’ve gotta ask yourself one question: “Do I feel lucky?”  Well, do ya, punk?

From the dialog, we can assume there are five confirmed shots.  There are two possibilities:  Either the gun is empty, or the gun has one more round in it.

Let the null hypothesis be “The gun has been emptied.”  A statistically informed punk might reason thusly:  “It is sufficiently likely that ‘five confirmed shots’ could be produced by a now-empty gun.  Therefore the evidence that the gun has one more round is not statistically significant.”  Consequently, they would reject the alternate hypothesis (that the gun has one more round).  In practice, this means they would proceed as if the gun was empty.


(Alert readers will note that the argument works both ways, if we interchange the null and alternate hypotheses.  True enough…but as I said, there are somewhat technical reasons for which hypothesis is the null hypothesis, and if this were my statistics course, I’d use this discussion as a lead-in to how we decide)

The Sixth Wave

Over the past five thousand years, five waves of mathematical innovation have swept the world, leading to rapid advances and significant changes in society:

  • The invention of written number (Fertile Crescent, 3000 BC).  This allowed civilization to exist, because if you want to live with more than your extended family, record keeping is essential…and that means keeping track of numerical amounts.
  • The invention of geometry (Greece, 300 BC).  Yes, geometry existed before then; what I’m using is the date of Euclid’s Elements, which is the oldest surviving deductive geometry.  The idea that you could, from a few simple principles, deduce an entire logical structure had a profound impact on society.  How important?  Consider a rather famous line:  “We hold these truths to be self-evident…”  The Declaration of Independence reads like a mathematical theorem, proving the necessity of revolution from some simple axioms.
  • The invention of algebra (Iraq, 900).  The problem “A number and its seventh make 19; find the number” appears in a 4000-year-old manuscript from ancient Egypt, so finding unknown quantities has a very long history.  What algebra adds is an important viewpoint:  Any of the infinite variety of possible problems can be transformed into one of a small number of types.  Thus, “A farmer has 800 feet of fence and wants to enclose the largest area possible” and “Find a number so the sum of the number and its reciprocal is 8” and “The sum of two numbers is 12 and their product is 20” can all be reduced to ax^{2} + bx + c = 0 and solved using the quadratic formula x = \dfrac{-b \pm \sqrt{b^{2} - 4ac}}{2a}.
  • The invention of calculus (Europe, 1600).  Algebra is the mathematics of what is.  Calculus is the mathematics of how things change.  Calculus makes physics possible, and from physics comes chemistry and engineering.
  • The invention of statistics (Europe, 1900).  Both algebra and calculus deal with single objects:  a bridge, a number, a moving planet.  But the universe consists of many similar objects:  the human population; the planetary climate; the trash generated by a city.  Statistics aggregates the data on the individual in a way that can be used to describe a population…then uses the information on a population to predict information about an individual.  Everything in modern society, from the pain relievers you use to the road you travel to work, incorporates such a statistical analysis.
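As a quick check of the algebra wave’s claim:  “the sum is 12 and the product is 20” reduces to x^{2} - 12x + 20 = 0, which the quadratic formula dispatches in a few lines (a sketch assuming real roots):

```python
from math import sqrt

def quadratic_roots(a, b, c):
    """Roots of ax^2 + bx + c = 0 via the quadratic formula
    (assumes real roots, i.e. b^2 - 4ac >= 0)."""
    d = sqrt(b ** 2 - 4 * a * c)
    return (-b + d) / (2 * a), (-b - d) / (2 * a)

# "Sum is 12, product is 20" becomes x^2 - 12x + 20 = 0:
print(quadratic_roots(1, -12, 20))  # (10.0, 2.0): 10 + 2 = 12, 10 * 2 = 20
```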

Many people, myself included, believe we are on the verge of a sixth wave.  That sixth wave will have the transformative power of calculus and statistics, and fundamentally reshape society.

The sixth wave is based around discrete mathematics.  That’s not mathematics you whisper in dark corners.  Rather, it’s the mathematics of things that can be counted as opposed to measured.  For example, length is continuous:  a length can have any value, and no one looks at you strangely if you say “I traveled 1.38924 miles today…”  (You might get some strange looks, but it’s because you specified the distance so precisely and not because of the distance itself)  But if you continued “…and met 2.35 people,” you would get strange looks, because the number of people you meet is a counted number:  it’s a discrete quantity.

How important is discrete mathematics?  If calculus is the basis for physics and engineering, then linear algebra is the basis for discrete mathematics.  But a first-year calculus student would have a hard time solving even a simple question in statics (the physics of structures).  In contrast, Google’s search algorithm is based on mathematics learned in the first half of a standard college linear algebra course.
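To illustrate that claim, here is a minimal sketch of the idea behind PageRank-style ranking:  repeatedly multiply a rank vector by a link matrix until it settles down.  The three-page web and the 0.85 damping factor are illustrative assumptions, not Google’s actual system:

```python
# Hypothetical three-page web: page 0 links to 1 and 2,
# page 1 links to 2, and page 2 links back to 0.
links = {0: [1, 2], 1: [2], 2: [0]}
n = 3
damping = 0.85  # standard illustrative damping factor

ranks = [1 / n] * n
for _ in range(100):
    # Each page shares its rank equally among the pages it links to;
    # the (1 - damping) term models a surfer jumping to a random page.
    new = [(1 - damping) / n] * n
    for page, outlinks in links.items():
        for target in outlinks:
            new[target] += damping * ranks[page] / len(outlinks)
    ranks = new

print([round(r, 3) for r in ranks])  # the ranks sum to 1
```

This repeated matrix-vector multiplication (power iteration) is exactly the kind of computation covered early in a linear algebra course.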

I’ll talk more about this later.  But if you’re interested in learning some linear algebra, the video lectures for the course I teach are available on YouTube.

The Geometry of Floods, Part One

In the aftermath of Hurricane Harvey, there are bound to be questions asked about the wisdom of building in flood plains.

Part of being a mathematician is asking “What can math say?” A lot of what math can say is embedded in actuarial tables and flood insurance premiums, and phrases like “500 year flood.” Those are good topics, but I’m teaching calculus this semester, so my mind turns to geometry.

Consider two homes. One is built some distance from a river, but only a few feet above the normal water level. Another is very close, but much higher up. Which one is in greater danger of being flooded?

To answer this, we need to construct a mathematical model. We do this by making some assumptions about the world, then follow the math. Since I’ll assume you haven’t taken differential equations and calculus, we’ll construct a relatively simple model based on geometry.

We’ll make the following assumptions:

  • The normal river surface has constant width w. This is unrealistic…but you get what you pay for: a more realistic model is more complex.  (Apologies if that’s not how geologists speak:  I think the last geology course I took listed the Pleistocene as “current events”…)
  • The land between the river and House 1 has a gradual but constant slope, and it’s like this for the entirety of the river. I’ll use my expert drawing skills to show you what I mean:
  • House 2 is built on the riverbanks.  Again, my artistic skills lend themselves to the following:

    As above, we’re assuming the river looks like that for its entirety.
  • We’ll model the storm by dumping a volume V of water into each river and seeing how far it rises up the banks.  In particular, what we want to know is:  if the water level rises x above the riverbed around House 2, how far up does it rise around House 1:

So here’s the mathematical task:  Suppose the river rises height x above its normal level (the figure on the left).  How far l does it extend past its normal banks (the figure on the right)?

Conversely (mathematicians love this phrase), if you build l away from the normal river bank, how much of a rise above the normal river level are you insulated from?

Now I’m a mathematician…but I’m also a teacher.  And I would be remiss in my duties if I gave you the answer right away.  So mull these over, and I’ll return to the topic next week…


First Day Jitters

Today is the first day of classes for me.

I’ve been teaching in one form or another for 20 plus years. I’ve survived the Great Calculus War, and even spent time in the Technology Underground engaging in acts of passive resistance against the ancien régime. And yet the start of every term is fraught with anxiety for me: What am I going to do this semester?

I don’t know how common this problem is among my colleagues. I mention it because it underscores what I believe to be an important fact of life: Never be too comfortable. This doesn’t mean you should go out and buy ill-fitting shoes and eat kale 24/7. Rather, it means that a little discomfort is good, because the general human reaction is to try and change things to become more comfortable.

So I’m anxious about what I’m going to teach this semester. That’s good, because it forces me to look at what I’m teaching and ask myself the all-important question: Why would anyone want to listen to me talk about this subject? (I don’t have an answer for this term yet…which is why the anxiety persists)

On the other side, learning mathematics is all about anxiety too. The critical question is not whether you suffer from math anxiety; it’s what you do about it. Remember that the human reaction is to change to become more comfortable in whatever situation you’re in. For all too many people, the reaction to math anxiety is to avoid math.

And that’s fine, if you live in a stone age society where mathematics can be left to specialists. (Go ahead, ask me about paleomathematics…) But in the modern world, it’s not practical to avoid mathematics. It’s possible, in the same way that it’s possible to avoid reading. But you won’t get very far, and you condemn yourself to being a second or third class citizen.

Instead, the way to fight math anxiety is to accept the discomfort…and push through it anyway. The most important lesson we can learn in life is that we can survive a little discomfort, and when we get through to the other side, we are better for it. And soon enough, you’ll find yourself addicted, actively seeking out discomfort because you know you can get through it.

Psychologists no doubt have a litany of strategies for dealing with anxiety disorders. But here’s my suggestion for dealing with math anxiety: Do a little math, every day. The good thing about math is that it’s something you can do in the clamor of your own mind as you go through daily life.

Count things: that’s the beginning of mathematics. Don’t look at the line at Starbucks; count how many people are there. If you do this often enough, you’ll start to think about more efficient ways to count: you’ll find yourself counting by twos, threes, and fives. You’ll also develop what many have called number sense: the ability to estimate quantities with reasonable accuracy.

Once you’ve gotten into the habit of counting so that it’s second nature…in other words, once you’re comfortable with it…introduce a little anxiety and start to do arithmetic on a regular basis. There are eight people ahead of you in line at Starbucks; how long is it going to take before you get your coffee? How much is Starbucks making off the people in line? What’s the average wait time? Soon enough, you’ll be running through calculations like “If those eight people are like me, each ordering a $2.35 coffee, then that’s 8 \times 2.35 = \$18.80 in revenue over ten minutes…”

And again, every time you get comfortable…move it up a notch.

Linearity and the Cashier’s Method

Linearity is an important concept in life and mathematics. It’s closely tied to another concept, proportionality, and in fact it’s often confused with that concept.

Two quantities are proportional if k times one quantity corresponds to k times the other.

Below are boards and their prices.  The first picture shows a 12-foot board costing $5.26, while the second shows a 16-foot board costing $7.16.

Now 16 feet is 1 and 1/2 times as much as 12 feet, so we might ask:  Is $7.16 1 and 1/2 times as much as $5.26?  There are several ways we could answer this question.  Just for fun, and to build up your mathematical abilities, let’s do it without a calculator.

Here’s one way:  1 and 1/2 times as much as $5.26 is 1 times $5.26 (which would be $5.26) and 1/2 times $5.26.  There are many ways to get half of $5.26, but here’s how a cashier might do it:  half of $5 is $2.50; half of 26 cents is 13 cents.  So half of $5.26 is $2.50 and 13 cents, or $2.63.

So 1 and 1/2 times $5.26 is $5.26 + $2.63.  Again, a cashier might do it this way:  That’s $5 and $2, for $7; and 26 and 63 cents, for 89 cents.  So 1 and 1/2 times $5.26 would be $7.89.

What does this mean?  It means that if the cost of wood were proportional, then the 16-foot section would cost $7.89.  Instead, it costs $7.16, a little less.  While we might expect some deviation from an exact proportionality, the difference (73 cents, or about 10%) is rather sizable, and so we might conclude the cost of wood is not proportional.

However, it might still be linear.  In many ways, linear is an easier concept:  an extra k corresponds to an extra m.  So here, the extra 4 feet cost an extra $7.16 – $5.26.  Again, think of it like a cashier:  You have $7.16 on the table, and want to remove $5.26.  I see $7.16 as a $5, two $1s, and 16 cents.  So I remove a $5, then make change for one of the $1s to remove 26 cents.  This leaves me with a $1, 74 cents, and 16 cents:  $1.90.

To check for linearity, we need to look at another value.  In this case, an 8-foot board (not shown) will cost $3.23.

If the cost is linear, then an 8 + 4 = 12 foot board would cost $3.23 + $1.90 = $5.13.  Here the difference between what the board should cost and what it does cost is only a few cents, so we might reasonably conclude that the cost of lumber is linear.

Of course, only politicians and pundits base their conclusions on one piece of evidence.  What about a 16-foot board?  If the cost is linear, then a 12 + 4 = 16 foot board should cost $7.16 + $1.90 = $9.06, and we find it costs $9.19, which is reasonably close.

One final note:  Even if the cost of lumber is linear, we can’t expect that it will always be linear.  Since the 8-foot board costs $3.23, we’d expect a 4-foot board to cost $3.23 – $1.90 (subtract $2, return 10 cents) = $1.33, and a 0-foot board would cost $1.33 – $1.90 = $ -0.57.  But if you go to the checkout with ten 0-foot boards, the cashier is not going to hand you a five spot.

And remember, lumber doesn’t grow on trees.  (Yes, I meant it…wood grows on trees, but no tree in the forest grows 16-foot boards)  A 200-foot board might cost significantly more than linearity would suggest.
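The cashier’s reasoning (check whether each extra 4 feet costs about the same) is easy to automate.  The prices are the ones from the example above:

```python
# Board lengths (feet) and prices ($) from the example above.
prices = {8: 3.23, 12: 5.26, 16: 7.16}

# Linear: each extra 4 feet should cost about the same.
lengths = sorted(prices)
increments = [round(prices[b] - prices[a], 2)
              for a, b in zip(lengths, lengths[1:])]
print(increments)  # [2.03, 1.9]: each extra 4 feet costs about $2

# Proportional would mean constant price per foot, which we don't see:
print([prices[l] / l for l in lengths])  # drifts from about 0.40 to 0.45
```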


Obesity, Poverty, and National Security

According to the internet, if you ate only ramen, you’d save thousands of dollars each year in food.

That sounds great, except there’s a problem: ramen lacks a wide range of essential nutrients and vitamins. You’d lose your teeth to scurvy, a lack of vitamin D would cause your bones to become brittle and easily broken, you’d suffer night blindness from a lack of vitamin A, and you’d be tired all the time from a lack of iron and the B vitamins. In short, all the money you saved on food, and much, much more, would be spent on increased medical care.

The problem is that eating healthy is costly. And this leads to a national security crisis.

If you want the short version, I’ve summarized the key points in a ten-minute video:

A little more mathematics:

Food buyers face what mathematicians call a constrained optimization problem: they have to meet certain caloric and nutritional goals (the constraints), which defines a feasible region. Generally speaking, any point in the feasible region defines a solution to the problem; what you want to do is to find the optimal solution.

The optimal solution is generally determined by the objective function. For example, if you lived off x packages of ramen and y eggs, the important objective function might be the total cost of your meals. At 15 cents a pack of ramen and 20 cents an egg, the objective function has the form L = 0.15x + 0.20y, and we might want to minimize the value of the objective function.

In the following, I’ll assume you want to minimize the value of the objective function; the arguments are similar if you’re trying to maximize the value (for example, if you’re designing a set of roads, you might want to maximize the traffic flow through a town center).

There’s a theorem in mathematics that says the optimal solution will be found on the boundary of the feasible region. The intuition behind this theorem is the following: Imagine any point inside the feasible region. If you change any one of the coordinates while leaving the others the same, the value of the objective function will generally change. The general idea is to move in the direction that decreases the objective function, and continue moving in that direction until you hit the boundary of the feasible region.

At this point, you can’t move any further in that direction. But you can try one of the other directions. Repeating this process allows us to find the optimal solution.

We can go further. Suppose our objective function is linear (like the cost function). Then the same analysis tells us the optimal solution will be found at a vertex of the feasible region. This suggests an elegant way to solve linear optimization problems:

  • Graph the feasible region and locate all the vertices. Generally speaking, the constraints are themselves linear functions, so (in our ramen and egg example) the feasible region will be a polygon.
  • Evaluate the objective function at each vertex.
  • Choose the vertex that minimizes the value of the objective function.
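Here is the vertex method on a toy ramen-and-eggs problem.  The calorie and protein constraints are made-up numbers for illustration; only the 15-cent and 20-cent prices come from the example above:

```python
from itertools import combinations

# Constraints in the form a*x + b*y >= c (hypothetical numbers):
#   calories: 190x + 70y >= 2000; protein: 5x + 6y >= 50; x, y >= 0.
constraints = [(190, 70, 2000), (5, 6, 50), (1, 0, 0), (0, 1, 0)]
cost = lambda x, y: 0.15 * x + 0.20 * y  # objective: price of the diet

def intersect(c1, c2):
    """Intersection of the boundary lines a*x + b*y = c, if any."""
    (a1, b1, r1), (a2, b2, r2) = c1, c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None  # parallel boundaries
    return (r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det

def feasible(x, y):
    return all(a * x + b * y >= c - 1e-9 for a, b, c in constraints)

# Vertices are feasible intersections of pairs of boundary lines.
vertices = [p for c1, c2 in combinations(constraints, 2)
            if (p := intersect(c1, c2)) and feasible(*p)]
best = min(vertices, key=lambda p: cost(*p))
print(best, round(cost(*best), 2))  # all ramen: about 10.5 packs, $1.58
```

With only two commodities there are at most six intersections to check; with seven foods and many constraints, the number of candidate vertices explodes, which is exactly Stigler’s complaint below.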

Easy, huh? Except…

  • If you have n commodities, you have to work in \mathbb{R}^{n}.
  • This means the feasible region will be some sort of higher-dimensional solid.
  • This also means that finding the vertices of the feasible region will require solving systems of n equations in n unknowns.

In 1945, George Stigler did such an analysis to find a minimal cost diet that met caloric and nutritional requirements. To make the problem tractable, he focused on a diet consisting of just seven food items: wheat flour; evaporated milk; cabbage; spinach; dried navy beans; pancake flour; and pork liver.

As Stigler put it: “Thereafter the procedure is experimental because there does not appear to be any direct method of finding the minimum of a linear function subject to linear conditions.” The problem is that with seven items, you’re working with hyperplanes in \mathbb{R}^{7}, and the constraints will give you hundreds of vertices to check.

Note the date: 1945. What Stigler didn’t know is that there was a method for finding the minimum value easily. But that’s a story for another post…

The Most Important Letter

A question came up on Quora about what letter’s removal would have the greatest impact on the English language.  The obvious answer is “E”, since it’s by far the most common letter in English.

But let’s consider that.  Can you writ a comprhnsibl sntnc that dosnt us ths lttr?  Ys, you can!  So it’s not clear that “E” is all that important.

So let’s do some mathematics.  The key question is:  How much information does a given letter provide?    Consider the following:  I’m thinking of a color.  You know the color is either red, green, blue, or fuchsia.  (I have no idea what color fuchsia is…I just like the word)  Your goal is to determine the color I’m thinking of by asking a sequence of Yes/No questions.

One way you could do this is by asking “Are you thinking of red or green?”  If the answer is “Yes”, then  you might ask “Are you thinking of red?”  If the answer is “Yes”, then you know the color is red; if the answer is “No,” then you know the color is green (since I answered “Yes” to the first question).  On the other hand, if I answered “No” to the first question, then you know I was thinking of blue or fuchsia, so you might ask “Are you thinking of blue?”  A “Yes” tells you I’m thinking blue; a “No” tells you I’m thinking fuchsia.

Now reverse it.  If you know I’m thinking of the color red, then you have the answer to two Yes/No questions.  We say that “red” has an information content of two bits.
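For equally likely options, the question count is just a base-2 logarithm — a quick sanity check, not part of the original argument:

```python
from math import log2

# four equally likely colors: a halving strategy needs log2(4) questions,
# so each answer is worth that many bits
colors = ["red", "green", "blue", "fuchsia"]
bits_per_answer = log2(len(colors))
print(bits_per_answer)
```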

So far so good.  But suppose I’m somewhat dull and can’t think of any color other than red. In that case, you already know what color I’m thinking of, and don’t need to ask any questions.  In this situation, “red” has an information content of zero bits.

As an intermediate case, suppose that half the time I think of “red,” one-fourth of the time “blue,” one-eighth of the time “green,” and one-eighth of the time “fuchsia.”  Then you might ask a different sequence of questions:

  • Are you thinking of red?  (Half the time, I’ll answer  “Yes”, so the answer “red” gives you the answer to one question:  it’s 1 bit of information)
  • If the answer is “No,” then “Are you thinking of blue?”  Half the time this question is asked (remember it will only be asked if the answer to the first question is “No”), the answer will be “Yes,” so the answer “blue” gives you the answer to two questions:  it’s 2 bits of information.
  • If the answer is “No,” then you ask the final question:  “Are you thinking of green?”  Again, half the time this question is asked, the answer will be “Yes,” which tells you that “green” is worth 3 bits; meanwhile, the answer “No” means I’m thinking of fuchsia, so “fuchsia” is also worth 3 bits.
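We can tally up this strategy to see how well it does on average.  A short sketch:

```python
# questions needed to pin down each color under the strategy above
questions = {"red": 1, "blue": 2, "green": 3, "fuchsia": 3}
probs = {"red": 1/2, "blue": 1/4, "green": 1/8, "fuchsia": 1/8}

# weight each question count by how often that color comes up
average = sum(probs[c] * questions[c] for c in probs)
print(average)
```

On average this strategy needs only 1.75 questions per color, beating the 2 questions required when all four colors were equally likely.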

It might seem difficult to determine the information content of an answer, because you have to come up with the questions.  But a little theory goes a long way.  The best questions we can ask are those where half the answers are “Yes” and the other half are “No.”  What this means is that if the answer n occurs p_{n} of the time, then the information content of the answer n will be -\log_{2} p_{n}.  Thus, if “red” is the color half the time, then “red” has an information content of -\log_{2} (1/2) = 1 bit.

So what does this mean?  “E” makes up about 12.7% of the letters in an English text.  But this means that knowing a letter is “E” answers very few questions.  So the letter E contains about 3 bits of information.  In contrast, “Z” only makes up 0.07% of the letters in an English text, so knowing a letter is “Z” answers many questions.  So the letter Z contains about 10.4 bits of information (the maximum).
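Plugging the quoted frequencies into -\log_{2} p_{n} reproduces these figures:

```python
from math import log2

# letter frequencies quoted above (fractions of letters in English text)
freq = {"E": 0.127, "Z": 0.0007}

# information content of seeing each letter, in bits
bits = {letter: -log2(p) for letter, p in freq.items()}
print(bits)
```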

At first glance, this suggests that “Z” may be the most important letter in the English language:  losing the letter “Z” will lose the most information.  However, there’s a secondary consideration:  “Z” doesn’t often appear in a text.  So every “Z” you drop from a text loses a lot of information…but you don’t drop that many.

And here’s where the greater prevalence of “E” comes in.  While the letter “E” only gives you about 3 bits of information, it’s common enough that dropping the letter “E” from a text will lose you more information overall.  For example, suppose you had a 10,000 character message.  Of these 10,000 characters, you might expect to find 7 Zs, and losing them would lose you about 73 bits of information.  In contrast, there would be almost 1300 Es, and losing them would lose about 3800 bits of information.
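The back-of-envelope comparison runs like this (same frequencies as above):

```python
from math import log2

n_chars = 10_000
freq = {"E": 0.127, "Z": 0.0007}

# expected occurrences of each letter, times the bits each occurrence carries
for letter, p in freq.items():
    count = p * n_chars
    lost = count * -log2(p)
    print(f"{letter}: about {count:.0f} occurrences, about {lost:.0f} bits lost")
```

Dropping the common, low-information “E” costs far more total information than dropping the rare, high-information “Z”.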