**Making Economic Sense**

by Murray Rothbard

(Contents by Publication Date)

**Chapter 6**

Statistics: Destroyed From Within?


As improbable as this may seem now, I was at one time in college a statistics major. After taking all the undergraduate courses in statistics, I enrolled in a graduate course in mathematical statistics at Columbia with the eminent Harold Hotelling, one of the founders of modern mathematical economics. After listening to several lectures of Hotelling, I experienced an epiphany: the sudden realization that the entire "science" of statistical inference rests on one crucial assumption, and that that assumption is utterly groundless. I walked out of the Hotelling course, and out of the world of statistics, never to return.

Statistics, of course, is far more than the mere collection of data. Statistical *inference* is the conclusions one can draw from that data. In particular, since--apart from the decennial US census of population--we never know all the data, our conclusions must rest on very small samples drawn from the population. After taking our sample or samples, we have to find a way to make statements about the population as a whole. For example, suppose we wish to conclude something about the average height of the American male population. Since there is no way that we can mobilize every male American and measure everyone's height, we take samples of a small number, say 500 people, selected in various ways, from which we presume to say what the average American's height may be.
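The sampling step described above can be illustrated with a short simulation. All of the numbers here (the population size, the "true" height distribution, the seed) are hypothetical, chosen only to show the mechanics of drawing a sample of 500 and computing its mean:

```python
import random

random.seed(42)

# Hypothetical population of one million male "heights" in inches.
# In real inference the statistician never sees this whole list.
population = [random.gauss(69.0, 3.0) for _ in range(1_000_000)]

# Draw a small sample of 500, as in the essay's example.
sample = random.sample(population, 500)
sample_mean = sum(sample) / len(sample)

print(f"sample mean height: {sample_mean:.1f} inches")
```

The entire problem of statistical inference is what, if anything, this single `sample_mean` licenses us to say about the million unseen values.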

In the science of statistics, the way we move from our known samples to the unknown population is to make one crucial assumption: that the samples will, in any and all cases, whether we are dealing with height or unemployment or who is going to vote for this or that candidate, be distributed around the population figure according to the so-called "normal curve."

The normal curve is a symmetrical, bell-shaped curve familiar from all statistics textbooks. Because all samples are assumed to fall around the population figure according to this curve, the statistician feels justified in asserting, from his one or more limited samples, that the height of the American population, or the unemployment rate, or whatever, *is* definitely XYZ within a "confidence level" of 90 or 95%. In short, if, for example, a sample height for the average male is 5 feet 9 inches, 90 or 95 out of every 100 such samples will be within a certain definite range of 5 feet 9 inches. These precise figures are arrived at simply by assuming that all samples are distributed around the population according to this normal curve.
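The "confidence level" calculation can be made concrete. Under the normality assumption the essay describes, a 95% interval is simply the sample mean plus or minus about 1.96 standard errors. This is a standard textbook sketch, not Rothbard's own notation, and the height figures are invented for illustration:

```python
import math

def normal_confidence_interval(sample, z=1.96):
    """95% interval for the population mean, *assuming* that sample
    means are normally distributed around the true mean."""
    n = len(sample)
    mean = sum(sample) / n
    # Sample variance (n - 1 denominator), then standard error.
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    se = math.sqrt(var / n)
    return mean - z * se, mean + z * se

# Illustrative heights in inches, clustered near 5 ft 9 in (69 in).
heights = [68.0, 70.5, 69.2, 71.0, 67.8, 69.5, 70.1, 68.4, 69.9, 68.9]
low, high = normal_confidence_interval(heights)
print(f"95% interval: {low:.2f} to {high:.2f} inches")
```

The narrowness and the "95 out of 100" interpretation of this interval both stand or fall with the normal-curve assumption Rothbard is attacking; the arithmetic itself guarantees nothing.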

It is because of the properties of the normal curve, for example, that the election pollsters could assert, with overwhelming confidence, that Bush was favored by a certain percentage of voters, and Dukakis by another percentage, all within "three percentage points" or "five percentage points" of "error." It is the normal curve that permits statisticians not to claim absolute knowledge of all population figures precisely but instead to claim such knowledge within a few percentage points.
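The pollsters' "three percentage points of error" comes from the same normal-curve machinery, applied to a proportion rather than a mean. A minimal sketch, with an invented 53% support figure and a sample of 1,000 respondents:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a polled proportion p
    from a sample of size n -- the pollsters' 'plus or minus' figure."""
    return z * math.sqrt(p * (1 - p) / n)

# Illustrative poll: 53% favor one candidate, 1,000 people sampled.
moe = margin_of_error(0.53, 1000)
print(f"margin of error: plus or minus {moe * 100:.1f} percentage points")
```

With a sample of about a thousand voters this formula yields roughly three percentage points, which is why that figure recurs in election coverage; the formula, again, presupposes the normal approximation.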

Well, what is the evidence for this vital assumption of distribution around a normal curve? None whatever. It is a purely mystical act of faith. In my old statistics text, the only "evidence" for the universal truth of the normal curve was the statement that if good riflemen shoot to hit a bullseye, the shots will tend to be distributed around the target in something like a normal curve. On this incredibly flimsy basis rests an assumption vital to the validity of all statistical inference.

Unfortunately, the social sciences tend to follow the same law that the late Dr. Robert Mendelsohn has shown is adopted in medicine: never drop any procedure, no matter how faulty, until a better one is offered in its place. And now it seems that the entire fallacious structure of inference built on the normal curve has been rendered obsolete by high-tech.

Ten years ago, Stanford statistician Bradley Efron used high-speed computers to generate "artificial data sets" based on an original sample, and to make the millions of numerical calculations necessary to arrive at a population estimate without using the normal curve, or any other arbitrary, mathematical assumption of how samples are distributed about the unknown population figure. After a decade of discussion and tinkering, statisticians have agreed on methods of practical use of this "bootstrap" method, and it is now beginning to take over the profession. Stanford statistician Jerome H. Friedman, one of the pioneers of the new method, calls it "the most important new idea in statistics in the last 20 years, and probably the last 50."
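The bootstrap idea the essay describes can be sketched in a few lines: resample the observed data with replacement many times, and read the interval straight off the resampled means, with no normal-curve assumption. This is a minimal percentile-bootstrap illustration on invented, deliberately skewed data, not Efron's original code:

```python
import random

random.seed(0)

def bootstrap_interval(sample, reps=10_000, alpha=0.05):
    """Percentile bootstrap interval for the mean: resample the data
    with replacement, and take the middle (1 - alpha) share of the
    resampled means. No distributional assumption is imposed."""
    n = len(sample)
    means = sorted(
        sum(random.choices(sample, k=n)) / n for _ in range(reps)
    )
    lo = means[int(reps * alpha / 2)]
    hi = means[int(reps * (1 - alpha / 2)) - 1]
    return lo, hi

# Skewed, decidedly non-bell-shaped data (illustrative only).
data = [1, 1, 2, 2, 3, 3, 4, 5, 8, 21]
print(bootstrap_interval(data))
```

Because the interval is read directly from the resampled means, it can come out asymmetric when the data are skewed, which is exactly the case the normal-curve method handles badly.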

At this point, statisticians are finally willing to let the cat out of the bag. Friedman now concedes that "data don't always follow bell-shaped curves, and when they don't, you make a mistake" with the standard methods. In fact, he added that "the data frequently are distributed quite differently than in bell-shaped curves." So that's it; now we find that the normal curve Emperor has no clothes after all. The old mystical faith can now be abandoned; the Normal Curve god is dead at long last.