'science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation.'

The quote in the headline is direct from this article in Science News for which I’ve posted an excerpt below. I found this article interesting for two reasons. 1- It challenges use of statistical methods that have come into question in climate science recently, such as Mann’s tree ring proxy hockey stick and the Steig et al statistical assertion that Antarctica is warming. 2- It pulls no punches in pointing out an over-reliance on statistical methods can produce competing results from the same base data. Skeptics might ponder this famous quote:

“If your experiment needs statistics, you ought to have done a better experiment.” – Lord Ernest Rutherford

There are many more interesting quotes about statistics here.

– Anthony

UPDATE: Luboš Motl has a rebuttal also worth reading here. I should make it clear that my position is not that we should discard statistics, but that we shouldn’t over-rely on them to tease out signals that are so weak they may or may not be significant. Nature leaves plenty of tracks, and as Lord Rutherford points out better experiments make those tracks clear. – A

==================================

Odds Are, It’s Wrong – Science fails to face the shortcomings of statistics

By Tom Siegfried

March 27th, 2010; Vol.177 #7 (p. 26)

[Figure: P value] A P value is the probability of an observed (or more extreme) result arising only from chance. Credit: S. Goodman, adapted by A. Nandy

For better or for worse, science has long been married to mathematics. Generally it has been for the better. Especially since the days of Galileo and Newton, math has nurtured science. Rigorous mathematical methods have secured science’s fidelity to fact and conferred a timeless reliability to its findings.

During the past century, though, a mutant form of math has deflected science’s heart from the modes of calculation that had long served so faithfully. Science was seduced by statistics, the math rooted in the same principles that guarantee profits for Las Vegas casinos. Supposedly, the proper use of statistics makes relying on scientific results a safe bet. But in practice, widespread misuse of statistical methods makes science more like a crapshoot.

It’s science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation. Statistical tests are supposed to guide scientists in judging whether an experimental result reflects some real effect or is merely a random fluke, but the standard methods mix mutually inconsistent philosophies and offer no meaningful basis for making such decisions. Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted. As a result, countless conclusions in the scientific literature are erroneous, and tests of medical dangers or treatments are often contradictory and confusing.

Replicating a result helps establish its validity more securely, but the common tactic of combining numerous studies into one analysis, while sound in principle, is seldom conducted properly in practice.

Experts in the math of probability and statistics are well aware of these problems and have for decades expressed concern about them in major journals. Over the years, hundreds of published papers have warned that science’s love affair with statistics has spawned countless illegitimate findings. In fact, if you believe what you read in the scientific literature, you shouldn’t believe what you read in the scientific literature.

“There is increasing concern,” declared epidemiologist John Ioannidis in a highly cited 2005 paper in PLoS Medicine, “that in modern research, false findings may be the majority or even the vast majority of published research claims.”

Ioannidis claimed to prove that more than half of published findings are false, but his analysis came under fire for statistical shortcomings of its own. “It may be true, but he didn’t prove it,” says biostatistician Steven Goodman of the Johns Hopkins University School of Public Health. On the other hand, says Goodman, the basic message stands. “There are more false claims made in the medical literature than anybody appreciates,” he says. “There’s no question about that.”

Nobody contends that all of science is wrong, or that it hasn’t compiled an impressive array of truths about the natural world. Still, any single scientific study alone is quite likely to be incorrect, thanks largely to the fact that the standard statistical system for drawing conclusions is, in essence, illogical. “A lot of scientists don’t understand statistics,” says Goodman. “And they don’t understand statistics because the statistics don’t make sense.”

====================================

Read much more of this story here at Science News

238 Comments

mark in austin

March 20, 2010 12:23 am

hmm…interesting. i wish there were a bit more details so we gained some tools in our tool belt. this is too vague to do anything with beyond speculate. maybe some rigorous statistical analysis would be helpful? ; )

supercritical

March 20, 2010 12:36 am

Well, it looks like Francis Bacon (the inventor of Science) warned of this problem.
“mathematics …. ought only to give definiteness to natural philosophy, not to generate or give it birth. From a natural philosophy pure and unmixed, better things are to be expected.“

Phillip Bratby

March 20, 2010 12:44 am

Typo: It should be “Steig et al”

Paul Z.

March 20, 2010 12:56 am

Dear Friends of Truth,
Please do not allow our governments to ram down our throats a new carbon tax and emissions trading ponzi scheme that is based on the pseudoscience of the IPCC. How can you trade something you can’t see or hold?
The reality is, despite the daily lies spouted in the mainstream media, there is no conclusive evidence of man-made global warming caused by CO2 (a harmless gas plants need to make food). The only conclusive evidence we have is of IPCC-linked scientists, bankers, and politicians who all have their hands in the cookie jar of carbon commissions.
I ask you to judge these proponents of man-made global warming on these three simple rules:
1) Tell the truth.
2) Don’t hide or spin the truth.
3) Admit and take responsibility for your errors.
Have any of these following individuals fulfilled these three simple rules?
– Al Gore
– Rajendra Pachauri
– Michael Mann
– Phil Jones
– Kevin Rudd
– Ed Miliband
– Barack Obama
Have these people shown integrity and responsibility in the conduct of their affairs? Should we base the entire overhaul of our economic system and way of life on the words of these people?
When I was young, I was taught this and it still rings true today:
“Your word is your bond,
Once broken, the trust is gone.”
Please do not allow the greedy bankers, politicians, yes-man scientists, and unelected UN bureaucrats to dictate how we should live our lives.
Already, the UN secretary Ban Ki-Moon is pushing for global carbon taxation. THEY ARE TRYING TO SNEAK THIS UNDER OUR NOSES. For example, see here:
http://www.theeastafrican.co.ke/news/IMF%20proposes%20climate%20change%20kitty%20/-/2558/878408/-/yco3d3z/-/
The rich and wealthy people don’t care. Laws that affect the middle class don’t apply to them. “Let them eat carbon” for all they care. Remember, most of these rich people own the hedge funds and venture capital that are all heavily invested in “green” technology, and they are lobbying hard for carbon emissions trading because they are going to make a lot of money at the expense of middle-class taxpayers.
The issue of global warming was never about saving the environment. It’s all about scaring, extorting, and controlling middle-class taxpayers. For the UN, this carbon emissions ponzi scheme is the perfect cash cow to fund their New World Order agenda: there is no need for accountability, and it cannot be prosecuted by law.
Even worse, they are now attempting the unforgivable, which is to brainwash and indoctrinate our children into believing the global warming hogwash. Hitler youth, anyone? You can try and scare me but HANDS OFF MY KIDS. Growing up is hard enough, they don’t need the added burden and guilt of a false ideology.
It’s time to close the IPCC. It’s time to close the UN. Warn everyone you know about how the UN is hijacking our democracy and pushing their one world government agenda, so they can one day control the masses as they wish. So much power in the hands of a few unelected people, how can abuse and corruption not take place?
PLEASE BE VIGILANT. Our way of life is currently under serious threat from this unelected clique of elites under the guise of man-made global warming. THEY ARE TRYING AND WILL KEEP TRYING TO SNEAK CARBON TAXATION AND CARBON EMISSIONS TRADING UNDER OUR NOSES.
Please write to or call your representatives and tell them that you do not accept the pseudoscience of the IPCC/UN. Call for independent inquiries into the various AGW-related scandals erupting now but conveniently being swept under the carpet. Ask the AG to investigate Al Gore for fraud. Vote out any representative who continues to push this false religion of AGW.
Thank you.
Special note on Obama: When Obama was first elected, I had great hopes for him and his administration. In recent times, it has dawned on me that he is just a self-serving politician no different from Al Gore, out to make a quick buck at the expense of American taxpayers. Don’t be fooled by the current healthcare reform nonsense–this is just a smokescreen for Obama and his wealthy patrons’ real agenda: to push through a carbon emissions trading system in the US. Note how Obama tries to stay away from this issue at the same time instructing the EPA to regulate CO2. Further note how Obama played a crucial role in founding the Chicago Climate Exchange, of which now Rajendra Pachauri and Maurice Strong are board members.

Ibrahim

March 20, 2010 1:03 am

There’s a new joke in the statistics world:
lies, big lies, statistics and taminos….

FatBigot

March 20, 2010 1:05 am

On any issue of cause-and-effect the position is either that A will cause B or that A will not cause B. It might be that A will cause B provided certain external factors are in place or that A will not cause B unless certain external factors are absent. But the position is always either yes or no, it is not and never can be a percentage.
One can collate a list containing all possible scenarios starting with “A happens, will it cause B?”, then “A happens and external factor Z is present, will A cause B?” (and so on through the alphabet), then “A happens with external factors Z & Y present, will A cause B?” (and so on for all possible combinations of external factors). The answer is always yes or no even if we cannot assess whether it is yes or no.
To say there is a 25%, 50%, 75% or even 99% chance is to say “we don’t know”.
The answer can only be “yes 75% of the time but no 25% of the time” if one can identify that which causes three quarters of occasions to give an affirmative response and one quarter to give a negative, and that can only be done by further refining the external factors. One then has a more detailed analysis and a longer list of yeses and noes.
For so long as the answer to a cause-and-effect question is qualified with a percentage (whether it is expressed as “we expect it to happen 75% of the time” or “we are 75% sure it will happen every time”), the truthful answer is “we don’t know”.
My unscientific mind tells me that is the position in relation to every issue about the effect of increased atmospheric carbon dioxide on climate.
Incidentally, at one time I was required to give statistical responses. There is a scheme for allowing people of limited means to pursue litigation by having their legal costs paid by “Legal Aid”. In order for a civil claim to be allowed to be funded at taxpayers’ expense it used to be necessary to have a formal written opinion from a barrister approving the use of public funds, which is where I became involved. At one time about fifteen or twenty years ago they required the chance of success to be expressed as a percentage, 75% or higher and you got funding.
It was utterly absurd because there were so many variables, the most obvious of which were – how likely was it that your client was telling the truth, how many supporting witnesses would have to be believed for him to win, which witnesses were likely to be believed, how would your client come across compared to the opposing litigant and who was the judge going to be? It was impossible to give accurate weight to any of those variables so any percentage view expressed was completely meaningless. I refused to play the game without qualifying my opinions to reflect the variables applicable to the particular case.
And you know what? However much I qualified my opinions, funding was granted if the figure 75 featured anywhere in my writing and refused every time that figure did not appear. There was never a case in which I said there was a 75% chance of success. A civil service clerk was assigned to read opinions and look for “75” and then “ding” the bell rang and money was made available. It was completely absurd but a fine example of how box-ticking bureaucracy works.

Daniel H

March 20, 2010 1:06 am

Here’s a quote from a man who knows a lot about the value of using statistics to influence public policy:

Fundamentally, the frequentist paradigm assumes that the underlying probability distribution is known and asks whether our observations are consistent with the known distribution. In reality, the underlying distribution is unknown (or only partially known), yet we want to know whether our hypothesis is likely to be true based on our observations, which are often incomplete. Thus, determining the likelihood of our hypothesis is easier said than done. An alternative is to use Bayesian, or subjective, probabilities that compile all the information we can possibly bring to bear on the problem, including, but not limited to, direct measurements and statistics on various components of the problem. Use of these methods can be extremely controversial. Some frequentist die-hards believe that if we can’t measure it directly, it isn’t science, what I playfully call “the tyranny of the null hypothesis.” However, the belief that the frequentist paradigm is superior to the subjective paradigm is epistemological advocacy; in short, a bias. In fact, dogmatic adherence to a frequentist paradigm limits the dissemination of valuable expert judgment that doesn’t fit into conventional evaluation of scientific knowledge, yet is crucial information for both scientific understanding and social processes like decision-making.

— Dr. Stephen H. Schneider, February 2005
http://stephenschneider.stanford.edu/Mediarology/MediarologyFrameset.html

Stan

March 20, 2010 1:11 am

Interesting that this rise in scientific reliance on statistics has coincided with the rise of “social sciences” – which rely entirely on statistics.
Personally, I don’t have much time for social science which I consider to be barely short of quackery. This alone wouldn’t be so bad if it was just gullible individuals being taken in by snake oil doctors – it’s their money, they can waste it how they like – but governments increasingly use social science (and their dubious reports) to justify various expensive policies – and this is particularly happening with global warming – and governments pay for these projects with our money.

Adam Gallon

March 20, 2010 1:23 am

The much-vaunted “p” value is highly likely to be much abused.
Some years back, when working as a medical rep, I talked to a Professor of Cardiology.
Medical trials invariably use p values, with p<.05 being "significant".
He said that the .05 level was used for studies with only a few tens of subjects; the more subjects in the study, the higher the significance level needed for a result to be considered "proof".
Lies, damned lies & statistics?

Roger Knights

March 20, 2010 1:32 am

MODS! TYPO in article:
6th line, change “Stieg” to “Steig”

JER0ME

March 20, 2010 1:45 am

As a born mathematician, I have revelled in many aspects of the subject. I must admit that imaginary numbers and the like made me feel a bit queasy, but I ‘took it like a man’, and accepted it all in the end once I saw the benefit of something that seemed so wrong.
But statistics? I have never, ever, been even slightly comfortable with them. It is all so easy to manipulate, even for very bright people. I have enormous respect for those that can delve into this area and come out with any kind of truth. It is all so easy to be misled, and, indeed, to mislead.
I have a strong belief that mathematics is a pure subject in its own right. It also forms the basis, or foundation, for physics. Without mathematics we cannot accurately describe the physical world.
Chemistry then rests on top of Physics, as we eventually find that we cannot explain or describe Chemistry without Physics. So further, Biology rests in exactly the same way on Chemistry, we also find.
Where does statistics come into the equation? Pretty much nowhere IMO.
Of course, it is almost certainly possible to prove me wrong … with statistics…

Leif Svalgaard

March 20, 2010 1:48 am

It’s science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation
Clearly not written by a scientist. We validate a hypothesis by its predictions or explanatory power or even ‘usefulness’ [even if actually not the correct one – e.g. the Bohr atom]. Statistics is only used as a rough guide to whether the result is worth looking into further. Now, if a prediction has been made, statistics can be used as a rough gauge of how close to the observations the prediction came, but the ultimate test is if the predictions hold up time after time again. This is understood by scientists, but often not by Joe Public [his dirtiest secret perhaps 🙂 ].

Richard Graves

March 20, 2010 2:03 am

Watts new? Lies. Damn lies. Statistics? Old saying I thought.

John Peter

March 20, 2010 2:11 am

May be slightly off topic but perhaps still relevant. As I speak Leif Svalgaard’s original tongue I found this on Danish newspaper Berlingske Tidende:
http://www.berlingske.dk/verden/delstater-sender-usas-klima-til-domstol
Essentially the article states that at least 15 US states have raised court action against the USA Government over its demands that they reduce the emission of CO2 and other greenhouse gasses. The states accuse the EPA of basing their rules on erroneous information from the UN’s Climate Panel. The states say that if the EPA does not re-evaluate its analysis they will get a court of law to stop the new regulations etc.
Not heard about proposed court action by the states against the federal government before. Maybe this would be worth further investigation and perhaps a separate thread. Perhaps the misuse of statistics could be a key ingredient in the battle of US states against the Federal Government and EPA.

Vuk etc.

March 20, 2010 2:13 am

“Lies, damned lies, and statistics” is a phrase describing the persuasive power of numbers, particularly the use of statistics to bolster weak arguments, and the tendency of people to disparage statistics that do not support their positions.
The term was popularized in the United States by Mark Twain (among others), who attributed it to the 19th Century British Prime Minister Benjamin Disraeli (1804-1881): “There are three kinds of lies: lies, damned lies, and statistics.” However, the phrase is not found in any of Disraeli’s works and the earliest known appearances were years after his death. Other coiners have therefore been proposed. The most plausible, given current evidence, is Charles Wentworth Dilke (1843-1911).
http://en.wikipedia.org/wiki/Lies,_damned_lies,_and_statistics

JEF

March 20, 2010 2:15 am

One issue is the effect size of your findings. If you have a large dataset, you have the power to detect differences. But there’s a practical aspect to this also. Even if an effect is small but significant, how important is it in the scheme of things?
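The point about effect size and sample size can be illustrated with a minimal sketch. It assumes a two-sample z-test with equal group sizes and unit variance in both groups, and it treats the observed difference as exactly equal to the effect size d; the function name and the numbers are hypothetical, chosen only for illustration.

```python
import math

def two_sample_z_p_value(effect_size_d: float, n_per_group: int) -> float:
    """Two-sided p-value for an observed difference in means of d standard
    deviations, with n observations in each of two groups (unit variance assumed)."""
    # Standard error of the difference in means is sqrt(2 / n), so z = d * sqrt(n / 2).
    z = effect_size_d * math.sqrt(n_per_group / 2.0)
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))

if __name__ == "__main__":
    tiny_effect = 0.02  # hypothetical: a difference of 2% of one standard deviation
    for n in (100, 10_000, 1_000_000):
        p = two_sample_z_p_value(tiny_effect, n)
        print(f"n per group = {n:>9,d}: p = {p:.3g}")
```

The same tiny effect is nowhere near significant with 100 subjects per group and overwhelmingly significant with a million, yet its practical importance has not changed at all.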

John Edmondson

March 20, 2010 2:20 am

“Lies, damned lies and statistics”

Pingo

March 20, 2010 2:23 am

(I’m not a statistician)

Mike Haseler

March 20, 2010 2:28 am

I’m sorry, this article really is about Mickey Mouse statistics. The real temperature signal is a complex time-dependent signal with time-dependent noise and time-dependent potential drivers (CO2, methane). Moreover, the length of the signal is quite inadequate to distinguish easily between “normal” natural variation and “abnormal” – and this kind of statistical analysis is entirely the wrong way to approach such a complex subject.
The proper way to analyse this situation is to characterise the “normal” (pre-CO2) signal in terms of the frequency distribution of the natural variation (which approximates to 1/f^n noise) and then to compare this “normal” signal to the signal under test (the post-CO2 signal) and see whether the frequency components of the post-CO2 signal are statistically inconsistent with the normal signal.
Where this approach differs dramatically from the noddy statistics used by climate alarmists is that they take a short-term signal, work out the “variation” based on Gaussian (white/non-frequency-dependent) noise and then in a quite delusional way say: “look the temperature has gone above the ‘normal’ variation”. This is hocus pocus BS. 1/f noise will always exceed this bogus ‘normal’ variation, because in 1/f type noise the long-term variation is always much greater than the short-term noise, so the short-term measurement of variation is wholly biased toward the much smaller short-term variation, and fails to account for the fact that each time you make a longer-term sample, it will show more variation, and hence longer periods will always appear to have shifted from this Mickey Mouse “norm”.
To restate that another way: if you sample the climate and wait a fairly short time, the climate will always exceed the (noddy statistics) ‘normal’ variation.
Now that’s the theory, now if only someone could tell me how to calculate the Fourier-statistical analysis of short time series, ….
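The claim that frequency-dependent noise shows more variation over longer windows can be checked with a rough simulation. This is only a sketch under stated assumptions: it uses a random walk (1/f^2 noise) as a stand-in for 1/f^n noise, synthetic data rather than any temperature record, and hypothetical window lengths of 16 and 1024 samples. It is not the Fourier-based analysis asked about above.

```python
import random
import statistics

def white_noise(n, rng):
    """Independent Gaussian samples: no frequency dependence."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def random_walk(n, rng):
    """Cumulative sum of white noise: a simple 1/f^2 ("red") noise series."""
    total, series = 0.0, []
    for _ in range(n):
        total += rng.gauss(0.0, 1.0)
        series.append(total)
    return series

def mean_window_std(series, window):
    """Average standard deviation over consecutive non-overlapping windows."""
    stds = [statistics.pstdev(series[i:i + window])
            for i in range(0, len(series) - window + 1, window)]
    return statistics.mean(stds)

def long_to_short_ratio(series, short_window=16, long_window=1024):
    """How much larger the variation looks in long windows than in short ones."""
    return mean_window_std(series, long_window) / mean_window_std(series, short_window)

if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    n = 8192
    print("white noise, long/short std ratio:", round(long_to_short_ratio(white_noise(n, rng)), 2))
    print("random walk, long/short std ratio:", round(long_to_short_ratio(random_walk(n, rng)), 2))
```

For white noise the ratio stays close to 1, so a short sample gives a fair picture of the "normal" variation. For the random walk the ratio is several times larger, so a spread estimated from a short window badly understates what longer periods will show.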

TinyCo2

March 20, 2010 2:40 am

Another example of how the scientific process is being corrupted came out this week. A report in the BMJ (British Medical Journal) highlighted the problem of authors with undeclared financial interests in a particular drug or its rival expressing views about it. It all sounds depressingly familiar.
http://www.independent.co.uk/life-style/health-and-families/health-news/glaxo-funded-backers-of-danger-drug-1923852.html
Writing in the British Medical Journal, the Mayo Clinic researchers say: “In the heat of the [drug] controversy, patients and clinicians alike were exposed to many arguments on both sides of the debate. How could interpretation of the same evidence result in disparate and impassioned positions? We aimed to determine whether financial conflicts of interest with pharmaceutical manufacturers could be fuelling this fire. From our findings, it appears that the answer is yes.”
Mohammad Murad, assistant professor of medicine, who led the study, said yesterday he had been disappointed at the low rates of disclosure of financial conflicts of interest, given the clear link between them and the authors’ views. “This thing [the influence of financial links] could be subconscious,” he said. “We are not saying it is necessarily deliberate. But the implication is that there should be better disclosure. People [with financial links to the companies] should realise they are probably biased and as readers we should be aware of probable bias.”

Willis Eschenbach

Editor

March 20, 2010 2:41 am

An excellent article. My rule of thumb is, if you can’t see it in a graph, it’s not real. Which is much like the quotation above, “If your experiment needs statistics, you ought to have done a better experiment.” – Lord Ernest Rutherford.
One place where statistics are egregiously misused in climate science is in the search for the ever-elusive “fingerprint” of the postulated human effect on the climate. People search assiduously through a variety of datasets, looking for some proof that humans are affecting the climate. Occasionally, someone finds one that is significant at the p < 0.05 level.
What’s the problem with that? Well, as you look through datasets, the odds of finding one with a spurious (occurring by random chance) p value less than 0.05 go up rapidly. Here’s the odds of finding a spurious “fingerprint” by random chance given how many datasets you’ve examined:
1 dataset, 5% (as you’d expect, that’s what p < 0.05 means)
2 datasets, 10%
3 datasets, 14%
4 datasets, 19%
5 datasets, 23%
6 datasets, 26%
7 datasets, 30%
8 datasets, 34%
9 datasets, 37%
10 datasets, 40%
11 datasets, 43%
12 datasets, 46%
Once you’ve looked at a dozen datasets, it’s almost fifty/fifty that you’ll find a spurious result. Given the hundreds of scientists out there examining datasets looking for the “fingerprint” …
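The percentages listed above match 1 − 0.95^n, the chance of at least one spurious result among n tests. The sketch below reproduces them under one assumption the list does not state: that the datasets are statistically independent. The Bonferroni threshold (alpha divided by n) is included as one standard, conservative correction; it is an addition for illustration, not something proposed in the comment.

```python
def familywise_false_positive_rate(n_datasets: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n independent tests gives p < alpha by chance."""
    return 1.0 - (1.0 - alpha) ** n_datasets

def bonferroni_threshold(n_datasets: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the overall false positive rate at or below alpha."""
    return alpha / n_datasets

if __name__ == "__main__":
    for n in range(1, 13):
        rate = familywise_false_positive_rate(n)
        per_test = bonferroni_threshold(n)
        print(f"{n:2d} datasets: {rate:4.0%} chance of a spurious hit "
              f"(Bonferroni per-test threshold {per_test:.4f})")
```

If the datasets are correlated, as many climate datasets are, the true chance of a spurious hit is lower than these figures, so they are best read as an upper bound on how fast the problem grows.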

Paul Vaughan

March 20, 2010 2:49 am

Basing judgement on laughably untenable assumptions is not sensible. Smirking high-priests of the Church of Statistics have the wool well over the uncritical eyes of the publishing masses.
“Too much of this p-value stuff.” (anonymous statistician devoted to common sense)

Sou

March 20, 2010 2:52 am

Statistics are indispensable for much scientific research, but a lot of researchers are not specialists in this field. The general public has much less of an understanding. Most in the media have almost no understanding, as evidenced by their reporting of health matters for instance.
Quality research institutions employ specialist statisticians and biometricians who provide quality control of statistics in research projects, to make sure experiments are properly designed, analyses are robust and conclusions are sound.
(Amateur statisticians are the cause of much confusion and misinformation in climate science.)

toyotawhizguy

March 20, 2010 3:00 am

My Calculus prof taught:
“The biggest liars in this world are politicians and statistics”.
Putting that aside, when properly used, statistics help us to comprehend our data. It’s when statistics serve as a substitute for data that the science gets corrupted.
In addition, to obtain a high level of confidence in science:
– A large quantity of studies are needed. If a high percentage of the later studies are not in agreement, this is a signal that more studies are required.
– Shortcomings and errors in early studies must not be repeated in later studies.
– All studies should employ blind analysis techniques. When studies involve human subjects, these must be double blind.
– Early studies, even if found deficient, shouldn’t be discarded completely, rather should be compiled as the basis for later and improved studies, however all studies should be scrutinized for bias and even fraud. (There have been cases where a study involved scientific fraud, and the fraud was not detected and documented until more than 10 years later.)
– Scientists must be willing to modify their hypothesis if not supported by the data. All hypotheses must survive the “test by fire”.
Acting on conclusions based on a single study is anti-science. For example, the alarmist ban of Saccharin by Canada in 1977 was based on a single study of injections of the artificial sweetener Saccharin into lab rats that was several hundreds of times higher in concentration (dose to body mass) than the normal levels seen in human consumers. The same variety of rat was later found to develop cancer when the study was repeated sans Saccharin, i.e. were injected with the same doses of pure water with the dose of Saccharin completely removed.

Louis Hissink

March 20, 2010 3:04 am

In the mining industry, many reconciliations of what we mined against what we initially estimated was in the ground using statistics have led to a pretty rigorous geostatistical methodology, with some basic axioms defined.
1. Intensive variables are never to be averaged in isolation, but must always be used to factor an extensive variable (volume) to yield a physically real, countable number. This includes using meteorological temperature readings to factor an extensive variable, be it some volume of air, or an area of land surface defined by a specific characteristic. Aggregating temperatures into cells of lat/longs might well create a plausible number, but it’s physically meaningless – temperature of what? An abstraction?
2. Samples of physical matter need to be, every thing else being equal, of equal volume otherwise the problem of sample volume variance comes into play. Mostly it does but in some cases it doesn’t, and we don’t know why.
I’ve rejected the AGW hypothesis from the start not because the hypothesis might be true, but because the initial data aggregation and analysis was wrong. If I used variables the way climate science uses them to estimate the metal content of a mine prior to mining, I would be bankrupt very quickly.
It’s interesting to note that it’s the engineers, including the exploration and mining geologists who are essentially geo-engineers, who are deconstructing the man-made global warming hypothesis from their real-world experience of what actually works and what doesn’t.
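The first axiom above, that an intensive variable should only be averaged by weighting it with an extensive one, can be shown with a small worked example. The ore blocks, grades, and tonnages below are entirely hypothetical, and tonnes are used as the extensive variable in place of volume purely for simplicity.

```python
def simple_mean(values):
    """Plain arithmetic average, ignoring how much material each value represents."""
    return sum(values) / len(values)

def weighted_mean(values, weights):
    """Average of an intensive quantity (e.g. grade) weighted by an extensive one (e.g. tonnes)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

if __name__ == "__main__":
    # Hypothetical ore blocks: grade in grams per tonne, size in tonnes.
    grades = [10.0, 2.0, 1.0]
    tonnes = [100.0, 1000.0, 4000.0]

    contained_metal = sum(g * t for g, t in zip(grades, tonnes))  # grams, a countable quantity
    print(f"contained metal:        {contained_metal:.0f} g")
    print(f"simple average grade:   {simple_mean(grades):.2f} g/t")
    print(f"tonnage-weighted grade: {weighted_mean(grades, tonnes):.2f} g/t")
```

The unweighted average overstates the grade roughly threefold here because the small, rich block counts as much as the large, poor one. Only the weighted figure, multiplied back by total tonnage, recovers the metal actually in the ground.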