‘science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation.’

The quote in the headline is taken directly from this article in Science News, for which I’ve posted an excerpt below. I found this article interesting for two reasons. 1- It challenges the use of statistical methods that have recently come into question in climate science, such as Mann’s tree ring proxy hockey stick and the Steig et al. statistical assertion that Antarctica is warming. 2- It pulls no punches in pointing out that over-reliance on statistical methods can produce competing results from the same base data. Skeptics might ponder this famous quote:

“If your experiment needs statistics, you ought to have done a better experiment.” – Lord Ernest Rutherford

There are many more interesting quotes about statistics here.

– Anthony

UPDATE: Luboš Motl has a rebuttal also worth reading here. I should make it clear that my position is not that we should discard statistics, but that we shouldn’t over-rely on them to tease out signals so weak they may or may not be significant. Nature leaves plenty of tracks, and as Lord Rutherford points out, better experiments make those tracks clear. – A

==================================

Odds Are, It’s Wrong – Science fails to face the shortcomings of statistics

By Tom Siegfried

March 27th, 2010; Vol. 177 #7 (p. 26)

[Figure: A P value is the probability of an observed (or more extreme) result arising only from chance. Credit: S. Goodman, adapted by A. Nandy]

For better or for worse, science has long been married to mathematics. Generally it has been for the better. Especially since the days of Galileo and Newton, math has nurtured science. Rigorous mathematical methods have secured science’s fidelity to fact and conferred a timeless reliability to its findings.

During the past century, though, a mutant form of math has deflected science’s heart from the modes of calculation that had long served so faithfully. Science was seduced by statistics, the math rooted in the same principles that guarantee profits for Las Vegas casinos. Supposedly, the proper use of statistics makes relying on scientific results a safe bet. But in practice, widespread misuse of statistical methods makes science more like a crapshoot.

It’s science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation. Statistical tests are supposed to guide scientists in judging whether an experimental result reflects some real effect or is merely a random fluke, but the standard methods mix mutually inconsistent philosophies and offer no meaningful basis for making such decisions. Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted. As a result, countless conclusions in the scientific literature are erroneous, and tests of medical dangers or treatments are often contradictory and confusing.
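To make that distinction concrete, here is a minimal Python sketch of a standard significance test; the effect size, sample sizes, and 0.05 threshold are illustrative assumptions, not anything from the article:

```python
# Minimal sketch of a null-hypothesis significance test.
# All parameters (effect size 0.2, n = 30, alpha = 0.05) are
# illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(loc=0.0, scale=1.0, size=30)
treatment = rng.normal(loc=0.2, scale=1.0, size=30)  # small real effect

t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"p = {p_value:.3f}")

# The P value is the probability of data at least this extreme
# *given that the null hypothesis is true*. It is not the
# probability that the null hypothesis is true, and crossing
# (or missing) 0.05 settles nothing by itself.
```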

Replicating a result helps establish its validity more securely, but the common tactic of combining numerous studies into one analysis, while sound in principle, is seldom conducted properly in practice.
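The “combining numerous studies” the article alludes to is meta-analysis. A sketch of the textbook fixed-effect (inverse-variance) combination, with invented study numbers, shows how mechanical the pooling step is, and how much rides on its assumptions:

```python
# Fixed-effect (inverse-variance) meta-analysis sketch.
# The effect estimates and standard errors are invented.
import numpy as np

effects = np.array([0.30, 0.10, 0.25, -0.05])  # per-study estimates
ses = np.array([0.15, 0.10, 0.20, 0.12])       # per-study standard errors

weights = 1.0 / ses**2                          # weight by precision
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))
print(f"pooled effect = {pooled:.3f} +/- {pooled_se:.3f}")

# The arithmetic is sound only if the studies are independent,
# unbiased, and estimating the same true effect -- the conditions
# that, per the article, are seldom met in practice.
```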

Experts in the math of probability and statistics are well aware of these problems and have for decades expressed concern about them in major journals. Over the years, hundreds of published papers have warned that science’s love affair with statistics has spawned countless illegitimate findings. In fact, if you believe what you read in the scientific literature, you shouldn’t believe what you read in the scientific literature.

“There is increasing concern,” declared epidemiologist John Ioannidis in a highly cited 2005 paper in PLoS Medicine, “that in modern research, false findings may be the majority or even the vast majority of published research claims.”

Ioannidis claimed to prove that more than half of published findings are false, but his analysis came under fire for statistical shortcomings of its own. “It may be true, but he didn’t prove it,” says biostatistician Steven Goodman of the Johns Hopkins University School of Public Health. On the other hand, says Goodman, the basic message stands. “There are more false claims made in the medical literature than anybody appreciates,” he says. “There’s no question about that.”

Nobody contends that all of science is wrong, or that it hasn’t compiled an impressive array of truths about the natural world. Still, any single scientific study alone is quite likely to be incorrect, thanks largely to the fact that the standard statistical system for drawing conclusions is, in essence, illogical. “A lot of scientists don’t understand statistics,” says Goodman. “And they don’t understand statistics because the statistics don’t make sense.”

====================================

Read much more of this story here at Science News


238 Comments
Steve Goddard
March 22, 2010 9:12 pm

Anthony,
I don’t need a Stevenson screen, because it all averages out with Monte Carlo statistics. ;^)
Seriously though, it makes no difference which way I am riding, day or night – it is always warmer downtown.
REPLY: True dat. I made the west-to-east transect drive on Colfax Ave once with a car that had a thermometer. Same result. – A

Editor
March 22, 2010 9:20 pm

Steve Goddard (12:19:23)

Willis,
Drop it, please. Fort Collins has grown much faster than Boulder, particularly around the weather station.

When people start asking someone to please “drop it”, most people can draw the obvious conclusion … This is particularly true when Boulder grew faster than Fort Collins for 30 years, from 1940 to 1970, but there was no effect on the increasing difference between the two stations … so your hypothesis cannot be proven in the way that you have chosen. If your theory were correct, the temperature difference should have shown a big jump starting in 1970 … but it did no such thing.
Note that this does not mean that your hypothesis is false. It just means that you can’t prove it by population figures as you claim. Now if you can show logarithmically increasing growth from 1897 to 2010 around the Fort Collins site, and no corresponding growth around the Boulder site, you might have something. Until then, it’s just math-free handwaving.

Steve Goddard
March 22, 2010 9:34 pm

Willis,
I am not sure why you are invoking the third person in your “most people” claim. The fact that you are having difficulty seeing something is your own business.
This overlay of the 1895-2008 temperature records makes it painfully obvious that somewhere between 1950 and 1975 a divergence started and has accelerated since. This corresponds to Fort Collins’ period of rapid growth. Boulder’s population has actually decreased during the last decade.
http://docs.google.com/View?id=ddw82wws_468cpnbv7fd
And as Tom Moriarty pointed out, the Boulder station faces open space on one side, meaning it is less affected by population growth than the Fort Collins station – which is downtown.

Steve Goddard
March 22, 2010 9:46 pm

Willis,
I’m not sure why it isn’t clear to you, but the points of the article are:
1. Fort Collins temperatures have risen strongly in correspondence to rapid growth of the city. Fort Collins has increased in size by 300% over the last few decades.
2. Boulder temperatures have risen much less, and the city has grown much less. Boulder has grown less than 50% during that same time period.
You claim to be doing some sort of precise mathematics, but at the same time you chose to arbitrarily subtract about two degrees from all post-1941 Boulder temperatures, to try to prove your point.

Editor
March 23, 2010 2:05 am

Steve Goddard (21:46:20):

Willis,
I’m not sure why it isn’t clear to you, but the points of the article are:
1. Fort Collins temperatures have risen strongly in correspondence to rapid growth of the city. Fort Collins has increased in size by 300% over the last few decades.
2. Boulder temperatures have risen much less, and the city has grown much less. Boulder has grown less than 50% during that same time period.

When we look at the Fort Collins minus Boulder dataset, we see:
Trend 1942-1970 = 0.2°C/decade
Trend 1970-present = 0.2°C/decade
Therefore, there is no change in 1970, despite the pre- and post-1970 differences in the growth of the cities.
Therefore, the difference in the trends and the difference in the growth of the cities are not related.

You claim to be doing some sort of precise mathematics, but at the same time you chose to arbitrarily subtract about two degrees from all post-1941 Boulder temperatures, to try to prove your point.

I haven’t a clue what you are talking about. I have not subtracted two degrees from post-1941 Boulder temperatures, I haven’t touched them at all. I adjusted the difference dataset (Fort Collins – Boulder) by 0.6°C in January 1941, to correct for a mathematically identified discontinuity. Period. End of story. No other adjustments. Your two degree adjustment doesn’t exist.
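For readers following the arithmetic, here is a sketch of the calculation Willis describes: removing a step discontinuity from a difference series and comparing sub-period trends. The series below is synthetic; the real station data are not reproduced here.

```python
# Sketch: adjust a station-difference series for a step
# discontinuity, then fit trends to two sub-periods.
# The series is synthetic, standing in for Fort Collins - Boulder.
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1897, 2009, dtype=float)
diff = (0.02 * (years - years[0])          # steady drift, deg C/yr
        + 0.6 * (years >= 1941)            # step discontinuity in 1941
        + rng.normal(0.0, 0.3, years.size))

diff_adj = diff - 0.6 * (years >= 1941)    # remove the identified step

def decadal_trend(yrs, vals):
    """Least-squares slope, in degrees C per decade."""
    return 10.0 * np.polyfit(yrs, vals, 1)[0]

for lo, hi in [(1942, 1970), (1970, 2008)]:
    m = (years >= lo) & (years <= hi)
    print(f"{lo}-{hi}: {decadal_trend(years[m], diff_adj[m]):+.2f} degC/decade")
# Equal sub-period trends mean no change at the breakpoint --
# which is the point being made about 1970.
```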

Editor
March 23, 2010 2:12 am

Steve Goddard (21:34:37):

… This overlay of the 1895-2008 temperature records makes it painfully obvious that somewhere between 1950 and 1975 a divergence started and has accelerated since. This corresponds to Fort Collins’ period of rapid growth. Boulder’s population has actually decreased during the last decade.
http://docs.google.com/View?id=ddw82wws_468cpnbv7fd

Oh, great, you’re back to “look at this diagram, it’s painfully obvious” … no, it’s not obvious at all.
We are discussing the difference between the two city temperatures. If you want to graph something, graph what we are discussing – graph the difference between the two city temperatures, 1897 to present.
Then point to where the trend starts.
I await your graph …

Steve Goddard
March 23, 2010 5:44 am

Willis,
I have posted the difference graph you are asking for many times in the last two weeks, in posts specifically directed at you.
https://spreadsheets.google.com/oimg?key=0AnKz9p_7fMvBdElxNDA4Vlh2OGhvOUdEX1N0bm1CeWc&oid=2&v=1269347628911
The trend from 1895-1965 is 0.011 with low significance
The trend from 1966-2008 is 0.031 with high significance
The UHI effect is very apparent during the last 40-50 years.
OTOH, you wanted to prove a linear trend through the entire series, and in order to do that you made an adjustment to all post-1941 Boulder data. How very Hansenesque.
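A sketch of how such sub-period trends and their significance are typically computed (the data below are synthetic, and the units of the 0.011 and 0.031 figures are not stated in the comment, so none are assumed):

```python
# Sketch: least-squares trend and p-value for each sub-period,
# as in the "low/high significance" claims above. Synthetic data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
years = np.arange(1895, 2009, dtype=float)
series = (0.011 * (years - 1895)                      # early drift
          + 0.020 * np.clip(years - 1965, 0.0, None)  # steeper after 1965
          + rng.normal(0.0, 0.4, years.size))

for lo, hi in [(1895, 1965), (1966, 2008)]:
    m = (years >= lo) & (years <= hi)
    fit = stats.linregress(years[m], series[m])
    print(f"{lo}-{hi}: slope = {fit.slope:+.3f}, p = {fit.pvalue:.3g}")
```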

Steve Goddard
March 23, 2010 7:54 am

Willis,
Hopefully we can agree on these points.
1. Temperatures in Fort Collins have increased much more than they have in Boulder over the last 50 years.
2. Temperatures in Fort Collins started increasing rapidly about 40-50 years ago.
3. Population in Fort Collins has increased much more than it has in Boulder over the last 50 years.
4. As Tom Moriarty pointed out, the Boulder station is probably less sensitive to population growth than the Fort Collins station, due to its proximity to open space.
You are attempting to do very precise math to prove a long-term trend based on a major post-1941 correction of your own devising. This in itself is a mistake (the trend doesn’t exist), but just as bad – the Boulder station history shows many moves and changes prior to 1980, which puts any sort of precision in the trash bin. Your analysis is flawed and you are missing the forest for the trees.

Charlie Barnes
March 23, 2010 10:10 am

I’ve read the article but not all of the responses – so this might be duplication.
I agree with the thrust of the article in that there is much misuse and misinterpretation of statistical analysis in research, usually, but not only, by those with inadequate training, experience or even motivation to ‘think statistically’ (a little knowledge can be a dangerous thing). The apparent belief that a P-value of 5% (or any other value) somehow represents a hard and fast dividing line between truth and falsehood is a major difficulty, but it is by no means the only one in reaching valid conclusions.
However, I don’t think that this means that the statistical approach should be abandoned. By analogy, the (supposed) principle of English law is that an accused person is innocent until ‘proven guilty’. In a trial, the court hears the evidence and then either acquits the defendant or finds him or her guilty, beyond reasonable doubt; that is, rejects the null hypothesis of innocence.
There are thus two correct outcomes to the trial – if actually innocent, the accused is acquitted or, if actually guilty, the accused is convicted. There are also two possible incorrect outcomes; that is, an innocent person is wrongly convicted or a guilty person goes free.
Society generally views either of these latter two outcomes as undesirable. A solicitor of my acquaintance (social, not professional!) argued vehemently that the trial process should be such that an innocent person could never be convicted (equivalent to a P-value of zero). He refused to accept that the only way to ensure this would be to acquit everybody, whatever the evidence. This would mean, as a consequence, that there would be no need for any of the trappings of the current criminal justice system. The lynch mob would reign supreme.
Against that scenario, civilised society would probably prefer the current justice system with its equivalent of non-zero P-values and the occasional acquittal of felons (corresponding to less than 100% power of a statistical test).
The difficulty lies not with statistical methods of analysis per se (nor with the design of the associated data collection process), whether frequentist or Bayesian, but with the many users who have inadequate knowledge of the shortcomings of the techniques and the related pitfalls. It is not easy to see how the situation can be improved without throwing away the benefits of properly applied statistical design of data collection and analysis. The ready availability of powerful software makes things worse because it reduces the need for the analyst to think very much about their analysis and its validity.
A related but separate issue is the deliberate misuse and falsification of the interpretation of observational data. Could this apply to the topics of man-made global warming, its averred consequential climate change and the dire predictions of its ‘inevitable’ effects?
Charlie Barnes
P.S. A slight nit-pick – probability (as indicated by the shaded P-value) is measured by the area under the curve between two horizontal (abscissa) values. The height of the curve refers to the probability density (function) rather than probability itself. CB.
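Charlie’s courtroom analogy maps directly onto Type I and Type II error rates, and the trade-off he describes is easy to demonstrate by simulation. A minimal sketch, with all parameters chosen purely for illustration:

```python
# Simulate the two error types in the courtroom analogy: convicting
# the innocent (Type I, set by alpha) and acquitting the guilty
# (Type II, the complement of power). Parameters are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
alpha, n, n_trials = 0.05, 30, 5_000

def reject_rate(true_effect):
    """Fraction of simulated two-sample t-tests with p < alpha."""
    hits = sum(
        stats.ttest_ind(rng.normal(0.0, 1.0, n),
                        rng.normal(true_effect, 1.0, n)).pvalue < alpha
        for _ in range(n_trials)
    )
    return hits / n_trials

print(f"Type I rate (innocent 'convicted'): {reject_rate(0.0):.3f}")  # ~alpha
print(f"Power (guilty rightly 'convicted'): {reject_rate(0.5):.3f}")
# Driving the Type I rate to zero (alpha = 0) acquits everyone,
# whatever the evidence -- the solicitor's demand taken literally.
```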

George E. Smith
March 23, 2010 10:36 am

“”” Brian G Valentine (17:47:01) :
Hi George how are you? Been a while.
Statistics are only properly applied to “random” processes in some way, or applied to determine what, if any, meaning “random” has. That’s all they are. “””
Hey Brian, I’ve been noticing your shingle pop up now and then; and meaning to contact you. With the help of some very nice helpful folks, I have been slowly refreshing a lot of what I forgot from 50 years of lack of use.
Prof Will Happer at Princeton, has been particularly helpful and gracious.
The more I get into this, the more convinced I become, that “It’s the Water !”
I never thought I would have to relearn Quantum Mechanics, just to go outside, and see if it is cloudy or humid.
You might remember I once asked you how the hell the CO2 band at 15 microns came up as a comb of equally spaced lines; and you replied it was “harmonics”.
Silly me, never dreamed that the molecular energy levels would also be quantized (duh!). As I told Prof Happer, I quit chemistry about one year too soon, to focus on the Physics and Radio-Physics.
If you still have my e-mail address, drop me a line; I’ll look in the archives for yours.
George

phlogiston
March 23, 2010 6:54 pm

This article sounds like it is written by someone fundamentally anti-science – no shortage of those in the “intelligentsia”. The author expresses shock at the discovery that 95% probability of something being true means 5% chance of it NOT being true. His jowl-flapping indignation at the basic principles of probability is given no coherent factual basis.
Application of statistics to the scientific method arises from the complexity of natural phenomena and the need to assess the significance of observations in the face of such complexity. When studying factor “A”, the role of all the other factors “B … Z” must either be excluded by experimental design or allowed for in the statistical calculation of the result’s strength (plus initial calculation of the study design, number of subjects needed, etc.). This attack on statistics in science seems to reflect an absence of basic understanding of, or curiosity towards, natural phenomena and their highly complex nature.
And the philosophical arguments against statistics in science appear to be on a level with the famous quote from Douglas Adams’ “Life, the Universe and Everything” – the conversation among the philosopher custodians of the Deep Thought computer (which found the answer to life, the universe and everything to be 42 but gave no +/- error bars):
“We’re philosophers. But we might not be!”

Steve Goddard
March 23, 2010 8:48 pm

One particularly important area of statistics is the use of Gaussians from Monte Carlo simulations. Many areas of science, engineering, medicine and business could not survive without top-notch random number generators. In high school I heard one of the famous Polish mathematicians say “there is no such thing as a perfect random number generator, and even if there were, it would be impossible to prove it.”
Even a lousy random number generator would do better than the Met Office seasonal predictions. Why? Because they always predict warming during a period of cooling.
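As a small illustration of how directly Monte Carlo results lean on the generator behind them, here is the textbook pi-by-sampling estimate (a standard example, not anything from the comment):

```python
# Monte Carlo estimate of pi: sample the unit square and count
# points inside the quarter circle. Any bias in the random
# number generator shows up directly as bias in the estimate.
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000
x, y = rng.random(n), rng.random(n)
inside = np.count_nonzero(x**2 + y**2 <= 1.0)
print(f"pi ~ {4.0 * inside / n:.4f}")
# As the quoted mathematician warns, no generator can be proven
# perfectly random -- Monte Carlo answers inherit that caveat.
```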

George E. Smith
March 24, 2010 11:02 am

Steve,
Arguably, with a perfect “random number” generator, it would be impossible to predict what the next number generated would be. No matter how long a sequence of numbers had already been generated, nothing would give you a clue as to the next number.
Thinking of Gaussian White Noise as a sequence of random numbers that have a Gaussian distribution, one can then make the argument that no signal contains more information than Gaussian White Noise; there is no redundancy in such a signal at all, so it is nothing but information about the signal.
Of course it is also totally useless information; but information nonetheless.
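George’s description – no sample of Gaussian white noise giving any clue to the next – corresponds to an autocorrelation near zero at every nonzero lag, which is easy to check numerically. A sketch, with the sample size chosen arbitrarily:

```python
# Check the white-noise claim numerically: if past samples carry
# no information about the next one, the sample autocorrelation
# should be near zero at every nonzero lag.
import numpy as np

rng = np.random.default_rng(5)
noise = rng.normal(0.0, 1.0, size=100_000)

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

for lag in (1, 2, 10):
    print(f"lag {lag:2d}: {autocorr(noise, lag):+.5f}")  # all near zero
```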
