# Obesity, Poverty, and National Security

According to the internet, if you ate only ramen, you’d save thousands of dollars each year in food.

That sounds great, except there’s a problem: ramen lacks a wide range of essential nutrients and vitamins. You’d lose your teeth to scurvy from a lack of vitamin C, a lack of vitamin D would leave your bones brittle and easily broken, you’d suffer night blindness from a lack of vitamin A, and you’d be tired all the time from a lack of iron and the B vitamins. In short, all the money you saved on food, and much, much more, would be spent on increased medical care.

The problem is that eating healthy is costly. And this leads to a national security crisis.

If you want the short version, I’ve summarized the key points in a ten-minute video:

A little more mathematics:

Food buyers face what mathematicians call a constrained optimization problem: they have to meet certain caloric and nutritional goals (the constraints), which defines a feasible region. Generally speaking, any point in the feasible region defines a solution to the problem; what you want to do is to find the optimal solution.

The optimal solution is generally determined by the objective function. For example, if you lived off $x$ packages of ramen and $y$ eggs, the important objective function might be the total cost of your meals. At 15 cents a pack of ramen and 20 cents an egg, the objective function has the form $L = 0.15x + 0.20y$, and we might want to minimize the value of the objective function.

In the following, I’ll assume you want to minimize the value of the objective function; the arguments are similar if you’re trying to maximize the value (for example, if you’re designing a set of roads, you might want to maximize the traffic flow through a town center).

There’s a theorem in mathematics that says that, as long as the objective function has no peaks or valleys of its own inside the feasible region (a linear function like our cost function never does), the optimal solution will be found on the boundary of the feasible region. The intuition behind this theorem is the following: Imagine any point inside the feasible region. If you change any one of the coordinates while leaving the others the same, the value of the objective function will generally change. The general idea is to move in the direction that decreases the objective function, and continue moving in that direction until you hit the boundary of the feasible region.

At this point, you can’t move any further in that direction. But you can try one of the other directions. Repeating this process allows us to find the optimal solution.

We can go further. Suppose our objective function is linear (like the cost function). Then the same analysis tells us the optimal solution will be found at a vertex of the feasible region. This suggests an elegant way to solve linear optimization problems:

• Graph the feasible region and locate all the vertices. Generally speaking, the constraints are themselves linear functions, so (in our ramen and egg example) the feasible region will be a polygon.
• Evaluate the objective function at each vertex.
• Choose the vertex that minimizes the value of the objective function.
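The three steps above can be sketched in a few lines of Python. Only the prices (15 cents a pack, 20 cents an egg) come from the example; the calorie and protein figures and the daily targets are made-up numbers chosen for illustration, and the function names are hypothetical. Treat it as a sketch of the vertex method, not a real diet plan.

```python
from itertools import combinations

# Each constraint is (a, b, c), meaning a*x + b*y >= c,
# where x = packs of ramen and y = eggs.
# ASSUMPTION: the nutritional numbers below are invented for illustration.
CONSTRAINTS = [
    (400.0, 80.0, 2000.0),  # calories: assumed 400 per pack, 80 per egg
    (5.0, 7.0, 50.0),       # protein (g): assumed 5 per pack, 7 per egg
    (1.0, 0.0, 0.0),        # x >= 0
    (0.0, 1.0, 0.0),        # y >= 0
]

def cost(x, y):
    """Objective function from the text: L = 0.15x + 0.20y."""
    return 0.15 * x + 0.20 * y

def intersection(c1, c2):
    """Point where two constraint boundary lines cross (None if parallel)."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None
    # Cramer's rule for the 2x2 system.
    x = (r1 * b2 - r2 * b1) / det
    y = (a1 * r2 - a2 * r1) / det
    return x, y

def is_feasible(x, y, tol=1e-9):
    """True if the point satisfies every constraint (up to rounding)."""
    return all(a * x + b * y >= c - tol for a, b, c in CONSTRAINTS)

def feasible_vertices():
    """Step 1: intersect every pair of boundary lines, keep feasible points."""
    points = []
    for c1, c2 in combinations(CONSTRAINTS, 2):
        p = intersection(c1, c2)
        if p is not None and is_feasible(*p):
            points.append(p)
    return points

def cheapest_diet():
    """Steps 2 and 3: evaluate the cost at each vertex and take the minimum."""
    return min(feasible_vertices(), key=lambda p: cost(*p))

if __name__ == "__main__":
    x, y = cheapest_diet()
    print(f"ramen: {x:.2f}, eggs: {y:.2f}, daily cost: ${cost(x, y):.2f}")
```

With these invented numbers the feasible region is unbounded, but because both prices are positive the minimum still sits at a vertex. Here it is the one where the calorie and protein lines cross.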

Easy, huh? Except…

• If you have $n$ commodities, you have to work in $\mathbb{R}^{n}$.
• This means the feasible region will be some sort of higher-dimensional solid.
• This also means that finding the vertices of the feasible region will require solving systems of $n$ equations in $n$ unknowns.
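To get a feel for how quickly this blows up, here is a rough count of how many $n \times n$ systems you might have to solve. It assumes every candidate vertex comes from picking $n$ of the constraint boundaries (the nutritional constraints plus one "you can't buy a negative amount" constraint per food). The function name and the figure of nine nutritional constraints are illustrative assumptions, not something taken from the analysis described here.

```python
from math import comb

def candidate_vertex_systems(n_foods, n_nutrient_constraints):
    """Upper bound on the n-by-n linear systems to solve.

    Each candidate vertex is the intersection of n_foods constraint
    boundaries, chosen from the nutritional constraints plus the
    n_foods non-negativity constraints (x_i >= 0).
    """
    total_constraints = n_nutrient_constraints + n_foods
    return comb(total_constraints, n_foods)

if __name__ == "__main__":
    # Two foods, two nutritional constraints: small enough to do by hand.
    print(candidate_vertex_systems(2, 2))
    # Seven foods with an ASSUMED nine nutritional constraints.
    print(candidate_vertex_systems(7, 9))
```

Most of those intersections turn out to be infeasible, so the number of actual vertices is much smaller than this bound, but each one still has to be solved and checked.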

In 1945, George Stigler did such an analysis to find a minimal cost diet that met caloric and nutritional requirements. To make the problem tractable, he focused on a diet consisting of just seven food items: wheat flour; evaporated milk; cabbage; spinach; dried navy beans; pancake flour; and pork liver.

As Stigler put it: “Thereafter the procedure is experimental because there does not appear to be any direct method of finding the minimum of a linear function subject to linear conditions.” The problem is that with seven items, you’re working with hyperplanes in $\mathbb{R}^{7}$, and the constraints will give you hundreds of vertices to check.

Note the date: 1945. What Stigler didn’t know is that there was a method for finding the minimum value easily. But that’s a story for another post…

# The Most Important Letter

A question came up on Quora about what letter’s removal would have the greatest impact on the English language.  The obvious answer is “E”, since it’s by far the most common letter in English.

But let’s consider that.  Can you writ a comprhnsibl sntnc that dosnt us ths lttr?  Ys, you can!  So it’s not clear that “E” is all that important.

So let’s do some mathematics.  The key question is:  How much information does a given letter provide?  Consider the following:  I’m thinking of a color.  You know the color is either red, green, blue, or fuchsia.  (I have no idea what color fuchsia is…I just like the word.)  Your goal is to determine the color I’m thinking of by asking a sequence of Yes/No questions.

One way you could do this is by asking “Are you thinking of red or green?”  If the answer is “Yes”, then  you might ask “Are you thinking of red?”  If the answer is “Yes”, then you know the color is red; if the answer is “No,” then you know the color is green (since I answered “Yes” to the first question).  On the other hand, if I answered “No” to the first question, then you know I was thinking of blue or fuchsia, so you might ask “Are you thinking of blue?”  A “Yes” tells you I’m thinking blue; a “No” tells you I’m thinking fuchsia.

Now reverse it.  If you know I’m thinking of the color red, then you have the answer to two Yes/No questions.  We say that “red” has an information content of two bits.

So far so good.  But suppose I’m somewhat dull and can’t think of any color other than red. In that case, you already know what color I’m thinking of, and don’t need to ask any questions.  In this situation, “red” has an information content of zero bits.

As an intermediate case, suppose that half the time I think of “red,” one-fourth of the time I think of “blue,” one-eighth of the time I think of “green,” and one-eighth of the time I think of “fuchsia.”  Then you might ask a different sequence of questions:

• Are you thinking of red?  (Half the time, I’ll answer “Yes”, so the answer “red” gives you the answer to one question:  it’s 1 bit of information.)
• If the answer is “No,” then “Are you thinking of blue?”  Half the time this question is asked (remember it will only be asked if the answer to the first question is “No”), the answer will be “Yes,” so the answer “blue” gives you the answer to two questions:  it’s 2 bits of information.
• If the answer is “No,” then the final question is “Are you thinking of green?”  Again, half the time this question is asked, the answer will be “Yes,” which tells you that “green” is worth 3 bits; meanwhile, the answer “No” means I’m thinking of fuchsia, so “fuchsia” is also worth 3 bits.
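As a quick check on this scheme, you can work out the average number of questions it takes by weighting each color’s question count by how often that color comes up:

$$\tfrac{1}{2}(1) + \tfrac{1}{4}(2) + \tfrac{1}{8}(3) + \tfrac{1}{8}(3) = 1.75$$

That’s a little better than the 2 questions the “red or green?” scheme always needs.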

It might seem difficult to determine the information content of an answer, because you have to come up with the questions.  But a little theory goes a long way.  The best questions we could ask are those where half the answers are “Yes” and the other half are “No.”  What this means is that if $n$ is the answer a fraction $p_{n}$ of the time, then the information content of the answer $n$ will be $-\log_{2} p_{n}$.  Thus, if “red” is the color half the time, then “red” has an information content of $-\log_{2} (1/2) = 1$ bit.
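Here is a minimal Python sketch of that formula, applied to the color probabilities from the intermediate case. The function name `information_content` is just a label I’ve chosen for illustration.

```python
from math import log2

def information_content(p):
    """Bits of information in an answer that occurs a fraction p of the time."""
    return -log2(p)

# Probabilities from the intermediate case in the text.
colors = {"red": 1 / 2, "blue": 1 / 4, "green": 1 / 8, "fuchsia": 1 / 8}

# Information content of each color, in bits.
bits = {color: information_content(p) for color, p in colors.items()}

if __name__ == "__main__":
    for color, b in bits.items():
        print(f"{color}: {b:.0f} bits")
```

The results agree with the question-counting argument: 1 bit for red, 2 for blue, and 3 each for green and fuchsia. The same function gives 2 bits when all four colors are equally likely, and 0 bits when red is the only possibility.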

So what does this mean?  “E” makes up about 12.7% of the letters in an English text.  Because it’s so common, knowing a letter is “E” answers very few questions:  the letter E contains only about 3 bits of information.  In contrast, “Z” makes up only about 0.074% of the letters in an English text, so knowing a letter is “Z” answers many questions:  the letter Z contains about 10.4 bits of information (the most of any letter).

At first glance, this suggests that “Z” may be the most important letter in the English language:  losing the letter “Z” will lose the most information.  However, there’s a secondary consideration:  “Z” doesn’t often appear in a text.  So every “Z” you drop from a text loses a lot of information…but you don’t drop that many.

And here’s where the greater prevalence of “E” comes in.  While the letter “E” only gives you about 3 bits of information, it’s common enough that dropping the letter “E” from a text will lose you more information overall.  For example, suppose you had a 10,000-character message.  Of these 10,000 characters, you might expect to find about 7 Zs, and losing them would lose you about 77 bits of information.  In contrast, there would be almost 1300 Es, and losing them would lose about 3800 bits of information.
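The arithmetic behind those totals is just (expected number of occurrences) × (bits per occurrence). Here’s a small sketch; the function name is hypothetical, and the Z frequency is taken as 0.074%, an assumption chosen because it reproduces the 10.4 bits and roughly 77 bits quoted above.

```python
from math import log2

def bits_lost(frequency, message_length):
    """Expected information lost by dropping every copy of one letter.

    frequency is the fraction of characters that are this letter.
    """
    expected_count = frequency * message_length
    bits_per_letter = -log2(frequency)
    return expected_count * bits_per_letter

MESSAGE_LENGTH = 10_000

e_loss = bits_lost(0.127, MESSAGE_LENGTH)    # E: about 12.7% of letters
z_loss = bits_lost(0.00074, MESSAGE_LENGTH)  # Z: ASSUMED 0.074% of letters

if __name__ == "__main__":
    print(f"E: about {e_loss:.0f} bits lost")
    print(f"Z: about {z_loss:.0f} bits lost")
```

The E total comes out a bit under 3800 bits because each E carries slightly less than 3 bits; either way it dwarfs the loss from Z.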