'science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation.'

The quote in the headline is direct from this article in Science News for which I’ve posted an excerpt below. I found this article interesting for two reasons. 1- It challenges use of statistical methods that have come into question in climate science recently, such as Mann’s tree ring proxy hockey stick and the Steig et al statistical assertion that Antarctica is warming. 2- It pulls no punches in pointing out an over-reliance on statistical methods can produce competing results from the same base data. Skeptics might ponder this famous quote:

“If your experiment needs statistics, you ought to have done a better experiment.” – Lord Ernest Rutherford

There are many more interesting quotes about statistics here.

– Anthony

UPDATE: Luboš Motl has a rebuttal also worth reading here. I should make it clear that my position is not that we should discard statistics, but that we shouldn’t over-rely on them to tease out signals that are so weak they may or may not be significant. Nature leaves plenty of tracks, and as Lord Rutherford points out, better experiments make those tracks clear. – A

==================================

Odds Are, It’s Wrong – Science fails to face the shortcomings of statistics

By Tom Siegfried

March 27th, 2010; Vol.177 #7 (p. 26)

P value: A P value is the probability of an observed (or more extreme) result arising only from chance. S. Goodman, adapted by A. Nandy
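
As a rough illustration of that definition, here is a minimal sketch that computes a one-sided P value for a coin-tossing experiment. The function name and the 60-heads-in-100-tosses scenario are hypothetical and are not taken from the article; the calculation assumes independent tosses and a fair coin under the null hypothesis.

```python
from math import comb


def one_sided_p_value(heads: int, flips: int, p_heads: float = 0.5) -> float:
    """Probability of seeing `heads` or more heads in `flips` tosses by chance alone."""
    return sum(
        comb(flips, k) * p_heads**k * (1 - p_heads) ** (flips - k)
        for k in range(heads, flips + 1)
    )


# Hypothetical experiment: 60 heads in 100 tosses of a supposedly fair coin.
p = one_sided_p_value(60, 100)
print(f"P value = {p:.4f}")
```

The result falls below the conventional 0.05 threshold, yet it only says how often a fair coin would do this; it does not say how likely it is that the coin is biased.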

For better or for worse, science has long been married to mathematics. Generally it has been for the better. Especially since the days of Galileo and Newton, math has nurtured science. Rigorous mathematical methods have secured science’s fidelity to fact and conferred a timeless reliability to its findings.

During the past century, though, a mutant form of math has deflected science’s heart from the modes of calculation that had long served so faithfully. Science was seduced by statistics, the math rooted in the same principles that guarantee profits for Las Vegas casinos. Supposedly, the proper use of statistics makes relying on scientific results a safe bet. But in practice, widespread misuse of statistical methods makes science more like a crapshoot.

It’s science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation. Statistical tests are supposed to guide scientists in judging whether an experimental result reflects some real effect or is merely a random fluke, but the standard methods mix mutually inconsistent philosophies and offer no meaningful basis for making such decisions. Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted. As a result, countless conclusions in the scientific literature are erroneous, and tests of medical dangers or treatments are often contradictory and confusing.

Replicating a result helps establish its validity more securely, but the common tactic of combining numerous studies into one analysis, while sound in principle, is seldom conducted properly in practice.

Experts in the math of probability and statistics are well aware of these problems and have for decades expressed concern about them in major journals. Over the years, hundreds of published papers have warned that science’s love affair with statistics has spawned countless illegitimate findings. In fact, if you believe what you read in the scientific literature, you shouldn’t believe what you read in the scientific literature.

“There is increasing concern,” declared epidemiologist John Ioannidis in a highly cited 2005 paper in PLoS Medicine, “that in modern research, false findings may be the majority or even the vast majority of published research claims.”

Ioannidis claimed to prove that more than half of published findings are false, but his analysis came under fire for statistical shortcomings of its own. “It may be true, but he didn’t prove it,” says biostatistician Steven Goodman of the Johns Hopkins University School of Public Health. On the other hand, says Goodman, the basic message stands. “There are more false claims made in the medical literature than anybody appreciates,” he says. “There’s no question about that.”

Nobody contends that all of science is wrong, or that it hasn’t compiled an impressive array of truths about the natural world. Still, any single scientific study alone is quite likely to be incorrect, thanks largely to the fact that the standard statistical system for drawing conclusions is, in essence, illogical. “A lot of scientists don’t understand statistics,” says Goodman. “And they don’t understand statistics because the statistics don’t make sense.”

====================================

Read much more of this story here at Science News

238 thoughts on “'science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation.'”

  1. hmm…interesting. i wish there were a bit more details so we gained some tools in our tool belt. this is too vague to do anything with beyond speculate. maybe some rigorous statistical analysis would be helpful? ; )

  2. Well, it looks like Francis Bacon (the inventor of Science) warned of this problem.
    “mathematics …. ought only to give definiteness to natural philosophy, not to generate or give it birth. From a natural philosophy pure and unmixed, better things are to be expected.”

  3. Dear Friends of Truth,
    Please do not allow our governments to ram down our throats a new carbon tax and emissions trading Ponzi scheme that is based on the pseudoscience of the IPCC. How can you trade something you can’t see or hold?
    The reality is, despite the daily lies spouted in the mainstream media, there is no conclusive evidence of man-made global warming caused by CO2 (a harmless gas plants need to make food). The only conclusive evidence we have is of IPCC-linked scientists, bankers, and politicians who all have their hands in the cookie jar of carbon commissions.
    I ask you to judge these proponents of man-made global warming on these three simple rules:
    1) Tell the truth.
    2) Don’t hide or spin the truth.
    3) Admit and take responsibility for your errors.
    Have any of these following individuals fulfilled these three simple rules?
    – Al Gore
    – Rajendra Pachauri
    – Michael Mann
    – Phil Jones
    – Kevin Rudd
    – Ed Miliband
    – Barack Obama
    Have these people shown integrity and responsibility in the conduct of their affairs? Should we base the entire overhaul of our economic system and way of life on the words of these people?
    When I was young, I was taught this and it still rings true today:
    “Your word is your bond,
    Once broken, the trust is gone.”
    Please do not allow the greedy bankers, politicians, yes-man scientists, and unelected UN bureaucrats to dictate how we should live our lives.
    Already, the UN secretary Ban Ki-Moon is pushing for global carbon taxation. THEY ARE TRYING TO SNEAK THIS UNDER OUR NOSES. For example, see here:
    http://www.theeastafrican.co.ke/news/IMF%20proposes%20climate%20change%20kitty%20/-/2558/878408/-/yco3d3z/-/
    The rich and wealthy people don’t care. Laws that affect the middle class don’t apply to them. “Let them eat carbon” for all they care. Remember, most of these rich people own the hedge funds and venture capital that are all heavily invested in “green” technology — and they are lobbying hard for carbon emissions trading because they are going to make a lot of money at the expense of middle-class taxpayers.
    The issue of global warming was never about saving the environment. It’s all about scaring, extorting, and controlling middle class taxpayers. For the UN, this carbon emissions Ponzi scheme is the perfect cash cow to fund their New World Order agenda–no need for accountability and cannot be prosecuted by law.
    Even worse, they are now attempting the unforgivable, which is to brainwash and indoctrinate our children into believing the global warming hogwash. Hitler youth, anyone? You can try and scare me but HANDS OFF MY KIDS. Growing up is hard enough, they don’t need the added burden and guilt of a false ideology.
    It’s time to close the IPCC. It’s time to close the UN. Warn everyone you know about how the UN is hijacking our democracy and pushing their one world government agenda, so they can one day control the masses as they wish. So much power in the hands of a few unelected people, how can abuse and corruption not take place?
    PLEASE BE VIGILANT. Our way of life is currently under serious threat from this unelected clique of elites under the guise of man-made global warming. THEY ARE TRYING AND WILL KEEP TRYING TO SNEAK CARBON TAXATION AND CARBON EMISSIONS TRADING UNDER OUR NOSES.
    Please write to or call your representatives and tell them that you do not accept the pseudoscience of the IPCC/UN. Call for independent inquiries into the various AGW-related scandals erupting now but conveniently being swept under the carpet. Ask the AG to investigate Al Gore for fraud. Vote out any representative who continues to push this false religion of AGW.
    Thank you.
    Special note on Obama: When Obama was first elected, I had great hopes for him and his administration. In recent times, it has dawned on me that he is just a self-serving politician no different from Al Gore, out to make a quick buck at the expense of American taxpayers. Don’t be fooled by the current healthcare reform nonsense–this is just a smokescreen for Obama and his wealthy patrons’ real agenda: to push through a carbon emissions trading system in the US. Note how Obama tries to stay away from this issue at the same time instructing the EPA to regulate CO2. Further note how Obama played a crucial role in founding the Chicago Climate Exchange, of which now Rajendra Pachauri and Maurice Strong are board members.

  4. On any issue of cause-and-effect the position is either that A will cause B or that A will not cause B. It might be that A will cause B provided certain external factors are in place or that A will not cause B unless certain external factors are absent. But the position is always either yes or no, it is not and never can be a percentage.
    One can collate a list containing all possible scenarios starting with “A happens, will it cause B?”, then “A happens and external factor Z is present, will A cause B?” (and so on through the alphabet), then “A happens with external factors Z & Y present, will A cause B?” (and so on for all possible combinations of external factors). The answer is always yes or no even if we cannot assess whether it is yes or no.
    To say there is a 25%, 50%, 75% or even 99% chance is to say “we don’t know”.
    The answer can only be “yes 75% of the time but no 25% of the time” if one can identify that which causes three quarters of occasions to give an affirmative response and one quarter to give a negative, and that can only be done by further refining the external factors. One then has a more detailed analysis and a longer list of yeses and noes.
    For so long as the answer to a cause-and-effect question is qualified with a percentage (whether it is expressed as “we expect it to happen 75% of the time” or “we are 75% sure it will happen every time”), the truthful answer is “we don’t know”.
    My unscientific mind tells me that is the position in relation to every issue about the effect of increased atmospheric carbon dioxide on climate.
    Incidentally, at one time I was required to give statistical responses. There is a scheme for allowing people of limited means to pursue litigation by having their legal costs paid by “Legal Aid”. In order for a civil claim to be allowed to be funded at taxpayers’ expense it used to be necessary to have a formal written opinion from a barrister approving the use of public funds, which is where I became involved. At one time about fifteen or twenty years ago they required the chance of success to be expressed as a percentage, 75% or higher and you got funding.
    It was utterly absurd because there were so many variables, the most obvious of which were – how likely was it that your client was telling the truth, how many supporting witnesses would have to be believed for him to win, which witnesses were likely to be believed, how would your client come across compared to the opposing litigant and who was the judge going to be? It was impossible to give accurate weight to any of those variables so any percentage view expressed was completely meaningless. I refused to play the game without qualifying my opinions to reflect the variables applicable to the particular case.
    And you know what? However much I qualified my opinions, funding was granted if the figure 75 featured anywhere in my writing and refused every time that figure did not appear. There was never a case in which I said there was a 75% chance of success. A civil service clerk was assigned to read opinions and look for “75” and then “ding” the bell rang and money was made available. It was completely absurd but a fine example of how box-ticking bureaucracy works.

  5. Here’s a quote from a man who knows a lot about the value of using statistics to influence public policy:

    Fundamentally, the frequentist paradigm assumes that the underlying probability distribution is known and asks whether our observations are consistent with the known distribution. In reality, the underlying distribution is unknown (or only partially known), yet we want to know whether our hypothesis is likely to be true based on our observations — which are often incomplete. Thus, determining the likelihood of our hypothesis is easier said than done. An alternative is to use Bayesian, or subjective, probabilities that compile all the information we can possibly bring to bear on the problem, including, but not limited to, direct measurements and statistics on various components of the problem. Use of these methods can be extremely controversial. Some frequentist die-hards believe that if we can’t measure it directly, it isn’t science, what I playfully call “the tyranny of the null hypothesis.” However, the belief that the frequentist paradigm is superior to the subjective paradigm is epistemological advocacy; in short, a bias. In fact, dogmatic adherence to a frequentist paradigm limits the dissemination of valuable expert judgment that doesn’t fit into conventional evaluation of scientific knowledge, yet is crucial information for both scientific understanding and social processes like decision-making.

    — Dr. Stephen H. Schneider, February 2005
    http://stephenschneider.stanford.edu/Mediarology/MediarologyFrameset.html

  6. Interesting that this rise in scientific reliance on statistics has coincided with the rise of “social sciences” – which rely entirely on statistics.
    Personally, I don’t have much time for social science which I consider to be barely short of quackery. This alone wouldn’t be so bad if it was just gullible individuals being taken in by snake oil doctors – it’s their money, they can waste it how they like – but governments increasingly use social science (and their dubious reports) to justify various expensive policies – and this is particularly happening with global warming – and governments pay for these projects with our money.

  7. The much-vaunted “p” value is highly likely to be much abused.
    Some years back, when working as a medical rep, I talked to a Professor of Cardiology.
    Medical trials invariably use p values, with p<.05 being "significant".
    He said that the .05 level was used for studies with only a few tens of subjects; the more subjects in the study, the higher the significance value needed to be considered "proof".
    Lies, damned lies & statistics?

  8. As a born mathematician, I have revelled in many aspects of the subject. I must admit that imaginary numbers and the like made me feel a bit queasy, but I ‘took it like a man’, and accepted it all in the end once I saw the benefit of something that seemed so wrong.
    But statistics? I have never, ever, been even slightly comfortable with them. It is all so easy to manipulate, even for very bright people. I have enormous respect for those that can delve into this area and come out with any kind of truth. It is all so easy to be misled, and, indeed, to mislead.
    I have a strong belief that mathematics is a pure subject in its own right. It also forms the basis, or foundation, for physics. Without mathematics we cannot accurately describe the physical world.
    Chemistry then rests on top of Physics, as we eventually find that we cannot explain or describe Chemistry without Physics. So further, Biology rests in exactly the same way on Chemistry, we also find.
    Where does statistics come into the equation? Pretty much nowhere IMO.
    Of course, it is almost certainly possible to prove me wrong … with statistics…

  9. It’s science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation
    Clearly not written by a scientist. We validate a hypothesis by its predictions or explanatory power or even ‘usefulness’ [even if actually not the correct one – e.g. the Bohr atom]. Statistics is only used as a rough guide to whether the result is worth looking into further. Now, if a prediction has been made, statistics can be used as a rough gauge of how close to the observations the prediction came, but the ultimate test is if the predictions hold up time after time again. This is understood by scientists, but often not by Joe Public [his dirtiest secret perhaps 🙂 ].

  10. May be slightly off topic but perhaps still relevant. As I speak Leif Svalgaard’s original tongue I found this in the Danish newspaper Berlingske Tidende:
    http://www.berlingske.dk/verden/delstater-sender-usas-klima-til-domstol
    Essentially the article states that at least 15 US states have raised court action against the USA Government over its demands that they reduce the emission of CO2 and other greenhouse gasses. The states accuse the EPA of basing their rules on erroneous information from the UN’s Climate Panel. The states say that if the EPA does not re-evaluate its analysis they will get a court of law to stop the new regulations etc.
    Not heard about proposed court action by the states against the federal government before. Maybe this would be worth further investigation and perhaps a separate thread. Perhaps the misuse of statistics could be a key ingredient in the battle of US states against the Federal Government and EPA.

  11. “Lies, damned lies, and statistics” is a phrase describing the persuasive power of numbers, particularly the use of statistics to bolster weak arguments, and the tendency of people to disparage statistics that do not support their positions.
    The term was popularized in the United States by Mark Twain (among others), who attributed it to the 19th Century British Prime Minister Benjamin Disraeli (1804-1881): “There are three kinds of lies: lies, damned lies, and statistics.” However, the phrase is not found in any of Disraeli’s works and the earliest known appearances were years after his death. Other coiners have therefore been proposed. The most plausible, given current evidence, is Charles Wentworth Dilke (1843-1911).
    http://en.wikipedia.org/wiki/Lies,_damned_lies,_and_statistics

  12. One issue is the effect size of your findings. If you have a large dataset, you have the power to detect differences. But there’s a practical aspect to this also. Even if an effect is statistically significant, when its size is small, how important is it in the scheme of things?
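
To make the point about dataset size concrete, here is a minimal sketch using a two-sample z-test. The function name, the effect of one hundredth of a standard deviation, and the group sizes are all hypothetical, and the test assumes normally distributed data with a known, equal standard deviation in both groups.

```python
from math import erfc, sqrt


def two_sample_z_p_value(mean_diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided P value for a difference in means, assuming known equal SDs."""
    standard_error = sd * sqrt(2.0 / n_per_group)
    z = abs(mean_diff) / standard_error
    return erfc(z / sqrt(2.0))  # two-sided tail area of the standard normal


effect = 0.01  # hypothetical difference: one hundredth of a standard deviation
for n in (100, 10_000, 1_000_000):
    print(n, two_sample_z_p_value(effect, 1.0, n))
```

With 100 subjects per group the P value is close to 1, while with a million per group the same tiny effect is overwhelmingly "significant". The effect size has not changed, only the power to detect it.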

  13. I’m sorry, this article really is about Mickey Mouse statistics. The real temperature signal is a complex time-dependent signal with time-dependent noise and time-dependent potential drivers (CO2, methane). Moreover, the length of the signal is quite inadequate to distinguish easily between “normal” natural variation and “abnormal” – and this kind of statistical analysis is entirely the wrong way to approach such a complex subject.
    The proper way to analyse this situation is to characterise the “normal” (pre-CO2) signal in terms of the frequency distribution of the natural variation (which approximates to 1/f^n noise) and then to compare this “normal” signal to the signal under test (the post-CO2 signal) and see whether the frequency components of the post-CO2 signal are statistically inconsistent with the normal signal.
    Where this approach differs dramatically from the noddy statistics used by climate alarmists is that they take a short-term signal, work out the “variation” based on Gaussian (white/non-frequency-dependent) noise and then in a quite delusional way say: “look the temperature has gone above the ‘normal’ variation”. This is hocus pocus BS. 1/f noise will always exceed this bogus ‘normal’ variation, because in 1/f type noise the long-term variation is always much greater than the short-term noise, so the short-term measurement of variation is wholly biased toward the much smaller short-term variation, and fails to account for the fact that each time you make a longer-term sample, it will show more variation, and hence longer periods will always appear to have shifted from this Mickey Mouse “norm”.
    To restate that another way: if you sample the climate and wait a fairly short time, the climate will always exceed the (noddy statistics) ‘normal’ variation.
    Now that’s the theory, now if only someone could tell me how to calculate the Fourier-statistical analysis of short time series, ….
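
The claim that longer samples of 1/f-type noise show more variation can be checked with a toy simulation. This is only a sketch under stated assumptions: it uses a random walk, which has a 1/f^2 spectrum, as a stand-in for the 1/f^n noise described above, and it is not a Fourier analysis of any real temperature record. The function name, seed, and series length are arbitrary choices.

```python
import random
import statistics


def mean_window_std(series, window):
    """Average standard deviation over consecutive non-overlapping windows."""
    chunks = [series[i:i + window] for i in range(0, len(series) - window + 1, window)]
    return statistics.fmean(statistics.pstdev(chunk) for chunk in chunks)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
steps = [rng.gauss(0.0, 1.0) for _ in range(20_000)]

white = steps  # white noise: flat spectrum, no frequency dependence
red = []       # random walk: cumulative sum of the steps, 1/f^2 noise
total = 0.0
for s in steps:
    total += s
    red.append(total)

for window in (10, 100, 1000):
    print(window,
          round(mean_window_std(white, window), 2),
          round(mean_window_std(red, window), 2))
```

For the white series the spread is roughly the same at every window length, while for the random walk it keeps growing as the window lengthens, so a "normal" range estimated from short windows will be exceeded over longer ones.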

  14. Another example of how the scientific process is being corrupted came out this week. A report in the BMJ (British Medical Journal) highlighted the problem of authors having undeclared financial interests in a particular drug or its rival when they expressed views about it. It all sounds depressingly familiar.
    http://www.independent.co.uk/life-style/health-and-families/health-news/glaxo-funded-backers-of-danger-drug-1923852.html
    Writing in the British Medical Journal, the Mayo Clinic researchers say: “In the heat of the [drug] controversy, patients and clinicians alike were exposed to many arguments on both sides of the debate. How could interpretation of the same evidence result in disparate and impassioned positions? We aimed to determine whether financial conflicts of interest with pharmaceutical manufacturers could be fuelling this fire. From our findings, it appears that the answer is yes.”
    Mohammad Murad, assistant professor of medicine, who led the study, said yesterday he had been disappointed at the low rates of disclosure of financial conflicts of interest, given the clear link between them and the authors’ views. “This thing [the influence of financial links] could be subconscious,” he said. “We are not saying it is necessarily deliberate. But the implication is that there should be better disclosure. People [with financial links to the companies] should realise they are probably biased and as readers we should be aware of probable bias.”

  15. An excellent article. My rule of thumb is, if you can’t see it in a graph, it’s not real. Which is much like the quotation above, “If your experiment needs statistics, you ought to have done a better experiment.” – Lord Ernest Rutherford.
    One place where statistics are egregiously misused in climate science is in the search for the ever-elusive “fingerprint” of the postulated human effect on the climate. People search assiduously through a variety of datasets, looking for some proof that humans are affecting the climate. Occasionally, someone finds one that is significant at the p < 0.05 level.
    What’s the problem with that? Well, as you look through datasets, the odds of finding one with a spurious (occurring by random chance) p value less than 0.05 go up rapidly. Here are the odds of finding a spurious “fingerprint” by random chance given how many datasets you’ve examined:
    1 dataset, 5% (as you’d expect, that’s what p < 0.05 means)
    2 datasets, 10%
    3 datasets, 14%
    4 datasets, 19%
    5 datasets, 23%
    6 datasets, 26%
    7 datasets, 30%
    8 datasets, 34%
    9 datasets, 37%
    10 datasets, 40%
    11 datasets, 43%
    12 datasets, 46%
    Once you’ve looked at a dozen datasets, it’s almost fifty/fifty that you’ll find a spurious result. Given the hundreds of scientists out there examining datasets looking for the “fingerprint” …
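
The figures in the list above are consistent with the formula 1 − 0.95^n, which holds if each dataset is an independent test at the 0.05 level. That independence is an assumption of this sketch rather than something stated in the comment, and the function name is hypothetical.

```python
def chance_of_spurious_hit(datasets: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `datasets` independent tests passes by chance."""
    return 1.0 - (1.0 - alpha) ** datasets


for n in range(1, 13):
    print(f"{n:2d} datasets: {chance_of_spurious_hit(n):.0%}")
```

Rounded to whole percentages, this reproduces the list from 5% for one dataset to 46% for twelve. If the datasets are correlated, as related climate records often are, the true figures would differ.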

  16. Basing judgement on laughably untenable assumptions is not sensible. Smirking high-priests of the Church of Statistics have the wool well over the uncritical eyes of the publishing masses.
    “Too much of this p-value stuff.” (anonymous statistician devoted to common sense)

  17. Statistics are indispensable for much scientific research, but a lot of researchers are not specialists in this field. The general public has much less of an understanding. Most in the media have almost no understanding as evidenced by their reporting of health matters for instance.
    Quality research institutions employ specialist statisticians and biometricians who provide quality control of statistics in research projects, to make sure experiments are properly designed, analyses are robust and conclusions are sound.
    (Amateur statisticians are the cause of much confusion and misinformation in climate science.)

  18. My Calculus prof taught:
    “The biggest liars in this world are politicians and statistics”.
    Putting that aside, when properly used, statistics help us to comprehend our data. It’s when statistics serve as a substitute for data that the science gets corrupted.
    In addition, to obtain a high level of confidence in science:
    – A large quantity of studies are needed. If a high percentage of the later studies are not in agreement, this is a signal that more studies are required.
    – Shortcomings and errors in early studies must not be repeated in later studies.
    – All studies should employ blind analysis techniques. When studies involve human subjects, these must be double blind.
    – Early studies, even if found deficient, shouldn’t be discarded completely, rather should be compiled as the basis for later and improved studies, however all studies should be scrutinized for bias and even fraud. (There have been cases where a study involved scientific fraud, and the fraud was not detected and documented until more than 10 years later.)
    – Scientists must be willing to modify their hypothesis if not supported by the data. All hypotheses must survive the “test by fire”.
    Acting on conclusions based on a single study is anti-science. For example, the alarmist ban of Saccharin by Canada in 1977 was based on a single study of injections of the artificial sweetener Saccharin into lab rats that was several hundreds of times higher in concentration (dose to body mass) than the normal levels seen in human consumers. The same variety of rat was later found to develop cancer when the study was repeated sans Saccharin, i.e. were injected with the same doses of pure water with the dose of Saccharin completely removed.

  19. In the mining industry, many reconciliations of what we mined against what we initially estimated was in the ground using statistics have led to a pretty rigorous geostatistical methodology, with some basic axioms defined.
    1. Intensive variables are never to be averaged in isolation, but must always be used to factor an extensive variable (volume) to yield a physically real, countable number. This includes using meteorological temperature readings to factor an extensive variable, be it some volume of air, or an area of land surface defined by a specific characteristic. Aggregating temperatures into cells of lat/longs might well create a plausible number, but it’s physically meaningless – temperature of what? An abstraction?
    2. Samples of physical matter need to be, everything else being equal, of equal volume, otherwise the problem of sample volume variance comes into play. Mostly it does but in some cases it doesn’t, and we don’t know why.
    I’ve rejected the AGW hypothesis from the start not because the hypothesis might be true, but because the initial data aggregation and analysis was wrong. If I used variables the way climate science uses them to estimate the metal content of a mine prior to mining, I would be bankrupt very quickly.
    It’s interesting to note that it’s the engineers, including the exploration and mining geologists who are essentially geo-engineers, who are deconstructing the man-made global warming hypothesis from their real-world experience of what actually works and what doesn’t.
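
Axiom 1 above can be illustrated with a toy calculation. The two ore blocks, their grades, and their volumes are invented numbers, and the function names are hypothetical; the sketch only contrasts a plain average of an intensive variable (grade) with one weighted by an extensive variable (volume).

```python
def simple_mean(values):
    """Plain average that ignores how much material each value represents."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Average of an intensive variable weighted by an extensive one."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical ore blocks: grade in grams per cubic metre, size in cubic metres.
grades = [10.0, 1.0]
volumes = [1_000.0, 99_000.0]

print(simple_mean(grades))             # treats both blocks as equal
print(weighted_mean(grades, volumes))  # contained metal divided by total volume
```

The plain average comes out at 5.5 while the volume-weighted figure is about 1.09, roughly a fivefold overestimate of the metal actually in the ground.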

  20. Great article. Flavour of the month in medicine is the QRisk score for assessing risk of CHD, similar to the Framingham risk score. Once all of my patient’s CHD risk factors are inputted, the computer magically calculates the patient’s risk, expressed as a percentage; based upon which we decide to prescribe statins or not.
    There are many flaws with this. Say a patient scores 20%. What is the accuracy of that score? We don’t know. In order to test the validity, we would have to do a prospective study recruiting a statistically significant number of patients with a QRisk of 20% and measuring their CHD outcomes over the following 10 years.
    There are many other problems with the studies used to justify statin use for primary prevention of coronary heart disease. The fact that we give patients a percentage risk score, of whose accuracy or confidence limits we have no idea, is pure voodoo medicine.

  21. “This means that it is 95 percent certain that the observed difference between groups, or sets of samples, is real and could not have arisen by chance.”

    “That interpretation commits an egregious logical error (technical term: “transposed conditional”): confusing the odds of getting a result (if a hypothesis is true) with the odds favoring the hypothesis if you observe that result.”

    Someone ought to tell the IPCC that!
    So, statistically, it’s impossible to distinguish a hockey stick from a pogo stick.
    Good article, and a list of books to read.
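
The "transposed conditional" quoted above can be shown with Bayes' rule. The inputs here are hypothetical and are not taken from the article: it is assumed that 10% of tested hypotheses are true, that a real effect is detected 80% of the time, and that the significance threshold is 0.05. The function name is likewise made up.

```python
def prob_effect_is_real(prior: float, power: float, alpha: float) -> float:
    """P(hypothesis true | significant result), computed by Bayes' rule."""
    true_positives = prior * power           # real effects that reach significance
    false_positives = (1.0 - prior) * alpha  # null effects that reach it by chance
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs: 10% of hypotheses true, 80% power, 0.05 threshold.
print(prob_effect_is_real(prior=0.10, power=0.80, alpha=0.05))
```

Under these assumptions a significant result is real about 64% of the time, not 95%. The answer depends heavily on the prior, which the P value alone never supplies.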

  22. I have an amateur interest in archaeology and ancient history. Even here use and misuse of statistics is commonplace. For example some seeds from a site might be radiocarbon tested. For whatever reason it is commonplace to get a range, often a wide range, of dates from the samples tested.
    To get a date for the site what is then done is either a mean is taken of all of the dates and published as the date of the site or a few ‘anomalous’ dates are ignored and the mean date then calculated.
    This is all very understandable and is probably the best that can be done with the present imperfect science. But only one of the readings at most can be the true date, possibly one of the anomalous ones, and all of the dates might be wrong.
    Mathematically it has seemed to me to be like trying to measure the height of your son by averaging the heights of all of the other boys in his school.

  23. Rutherford did physics. His dictum is laughable in, say, agricultural research.
    As for the shaky foundation of stats: yes, reasonably well known i.e. I’ve known for 40 years -:). But not much to do with the lousy (or even dishonest) methods that pollute Climate Science.

  24. Yes pure math gives what appears to be the comfort of certainty and although many pure scientists believe this to be absolutely true, most philosophers can show that no system of logic is both complete and consistent.
    That being the case, that purity is only relatively true and there is an element of uncertainty inherent in all calculations and proofs.
    Statistics at least states from the start that it is dealing with probabilities and not absolutes.
    I do agree with the author that, bunching together a whole load of probablys from disparate research findings is just as likely to compound errors in conclusions as it is to diminish them.
    However, there isn’t really any gold standard to work from that can be equally applied to all fields of enquiry and he needs to get a little more comfortable with uncertainty.
    Nor should there be. Man the toolmaker has to develop tools that will give results most of the time in every field of endeavour. Some are better than others but I’m not throwing my nutcracker away just because it won’t crack almonds. I hit them with it instead!

  25. one quotation missing is from Winston Churchill
    “There are three types of lie; lies, damned lies and statistics!”

  26. Hmmm, regardless of the numbers and quality of data sets how can one model something like climate if all the subtleties aren’t understood?
    I’m sorry for asking what might be a seriously dumb question but I keep tripping over the butterfly effect…

  27. It’s a frustrating thing, to see a thick black line graph representing the average of a model’s predictions over 50 cycles. I think it should be a legal requirement to graph climate model projections in a more directly representative fashion. If you plot the result of a model you ran 50 times, all the plotted points of all the model’s run-time projections should be shown at 1/50th density.
    The result will look like the end of a frayed piece of string, but it will at least be more visually representative of the confidence that should be placed in the model’s predictions. There would be nothing visually discernible at the right-hand side of a graph plotted this way, and that’s exactly what it should look like.

  28. Stats can be excellent in the right disciplines and if deployed appropriately – without this branch of maths, agriculture could not have progressed nearly as far as it has. Natural biological variation needs appropriate tools to determine effects of treatments on yield, growth, etc.
    In this case, we have the effective methods, which have been fully tested for 150+ years and are responsible for advances which feed billions of people. However, these tools depend on utterly scrupulous and trustworthy scientists not massaging data for their own theories and Lysenkoist delusions.
    Dishonest pseudoscientists using fabricated and worthless techniques damage every area of science they contact. Climate fraud risks public confidence in so many respects – I’m disappointed in the likes of The Royal Society for being so utterly spineless in failing to defend their own constituencies.

  29. “Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted.” … Matches are a good thing, but in the hands of children or the careless can burn the house down.
    Thanks for the compendium of statistics quotes. Missing was one of my all-time favorites:
    “There are three kinds of lies: lies, damned lies and statistics.”
    -Benjamin Disraeli (attributed by Mark Twain)

  30. Very informative. It comports with my actual experiences over the course of my career. Knowing I was to become a scientist, I elected to take courses in experiment design, statistics, and computer programming. Thought I was “normal”. When I worked at NASA, I found I was one of the very very very few.
    So often, I look at how statistics are used in climate related studies (even those I would like to support) and, without being able to put my finger on it, think — this has to be a misuse of statistics.

  31. I think the author is too hard on “maths” rather than “incompetence”. As he states:
    “Experts in the math of probability and statistics are well aware of these problems and have for decades expressed concern about them in major journals.”
    …Sounds like a certain Canadian blogger!

  32. the diagram of p values is ok but one vital piece of information is left out. The p value is the probability of an observed (or more extreme) result arising only from chance but UNDER THE NULL HYPOTHESIS.
    There is a venerable literature on the null hypothesis and a venerable tradition, followed by generations of weak students, of ignoring it and what methodologically it means.
    The method of science (summarised by Popper’s idea of ‘conjectures and refutations’) is neatly caught by the concept of the null hypothesis but consideration of its proper use seems to have gone out of fashion in some quarters.

  33. What drives me nuts are statistical analyses of rare, unevenly distributed but probably not random events.
    If there are only about 6 Category 5 hurricanes hitting the US mainland in a century, then you’re almost certainly going to get more in one half than in the other, even if it means nothing at all.
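The point about small counts can be checked with a little arithmetic. The sketch below assumes each hurricane is independent and equally likely to land in either half of the century, which is an illustrative simplification and not a claim about real hurricane records; the function name is hypothetical.

```python
from math import comb

def prob_uneven_split(n_events: int) -> float:
    """Chance that n independent events, each equally likely to fall in
    either half of a century, do NOT split exactly evenly."""
    if n_events % 2 == 1:
        # An odd number of events can never split evenly.
        return 1.0
    # Probability of an exact n/2 versus n/2 split under a fair coin model.
    even_split = comb(n_events, n_events // 2) / 2 ** n_events
    return 1.0 - even_split

for n in (5, 6, 7):
    print(n, "events -> P(uneven split) =", round(prob_uneven_split(n), 4))
```

Under these assumptions, six events split unevenly about two times in three, and any odd count splits unevenly every time, so a lopsided century by itself says very little.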

  34. After reading some of the “random walk” posts recently, I thought I’d try a little experiment, which consisted of writing a bit of C code which generated pseudo-temperature records, based on a random +- 0.1 annual deviation from the previous year, centered around 15C, and with a bias factor that made the temperature drift towards 15C if it starts drifting away.
    So this 15-minute job produced 10,000 years of temperature records which I imported into a spreadsheet and drew some pictures.
    It’s basically a boring average close to 15, with lots of apparent noise between 13 and 17C. But zoom in a bit, and you see features like little ice ages, medieval warm periods, “hockey stick” features, and all sorts.
    And apply some of the trend analysis functions to selected parts of the “noise” and it finds all sorts of things.
    And it’s all random.
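A rough Python sketch of the experiment described above, for anyone who wants to try it. The original was C and is not shown, so everything here is an assumption: the yearly step is drawn uniformly from ±0.1, the pull back towards 15C is a hypothetical factor of 0.001 per year, and the function names are made up for illustration.

```python
import random

def simulate_temperatures(years=10_000, mean=15.0, step=0.1, pull=0.001, seed=0):
    """Mean-reverting random walk: each year moves by a random amount in
    [-step, +step], then drifts back towards `mean` by a fraction `pull`
    of the current departure. `pull` is an assumed value."""
    rng = random.Random(seed)
    temp = mean
    series = []
    for _ in range(years):
        temp += rng.uniform(-step, step)   # random annual deviation
        temp += pull * (mean - temp)       # bias back towards the mean
        series.append(temp)
    return series

def trend_per_century(window):
    """Ordinary least-squares slope of a yearly series, in degrees per 100 years."""
    n = len(window)
    x_mean = (n - 1) / 2
    y_mean = sum(window) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(window))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return 100 * num / den

series = simulate_temperatures()
# Fit a straight line to each non-overlapping 100-year window of pure noise.
trends = [trend_per_century(series[s:s + 100]) for s in range(0, len(series) - 99, 100)]
print(f"mean {sum(series) / len(series):.2f}, min {min(series):.2f}, max {max(series):.2f}")
print(f"steepest 100-year trends: {min(trends):+.2f} and {max(trends):+.2f} C per century")
```

Changing the seed or the pull factor gives a different-looking "climate history" each time, and a trend line can be fitted to any chosen stretch of it.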

  35. A bit off topic but vaguely relevant. Next to this article on my computer this morning was an ad for the London Speakers Bureau advertising the services of Rajendra Pachauri. What words in the article could have prompted the link: “dirtiest secret”, “flimsy foundation” or “countless illegitimate findings”? I wonder.

  36. KISS comes to mind…Keep It Simple Stupid.
    When you start to get into more complex math married to science, remember that scientists are not mathematicians.
    At times, even simple math will lose people when you are trying to show something or prove a point.
    My wife will go into a fog when I show interesting points of science as it is not her interest. As well, I’ll tune out when she talks about cooking and recipes.
    We all have our different areas that will pique our interest, or disinterest which will turn the foggy eyes on.

  37. If using P-values correctly is so “problematical,” then perhaps the alternative, Bayesian statistics, should be considered.
    E.T. Jaynes wrote a famous book “Probability Theory – The Logic of Science” and I note that the term “P-value” does not even appear in the index. Maybe the world would be better understood without P-values.

  38. This is why a growing number of young scientists are Bayesians.
    One of the crucial features of the Bayesian view is that a probability is assigned to a hypothesis, whereas under the frequentist view, a hypothesis is typically rejected or not rejected without directly assigning a probability.
    http://en.wikipedia.org/wiki/Bayesian_probability
    So the statistical approach with the “flimsy foundations” is known as the frequentist approach. The same one that says “the debate is over”: the hypothesis that GHG are causing a dangerous linear warming has been accepted. By contrast Bayesians (that most of us here are without even realising it) will work out how probable it is that the dangerous GHG hypothesis is correct, and compare this probability to competing climate theories.

  39. Agree strongly. When dealing with something as complex as human-drug interaction, even when you use statistics rigorously you can often get some very odd results, hence the almost weekly ‘wine is good/bad for you’ headlines.
    To grace something as intangible as climatology with the moniker “science” is to give it a source of legitimacy that it really shouldn’t have. Maybe “climate prediction” or “climate educated guesswork” would be closer to the mark.

  40. In almost anything related to human beings direct experiment is not possible so statistical analysis of measurements is all we have & probably all we can have.
    Sometimes we rely on worse than that. For example, the LNT theory that low level radiation is harmful relies on absolutely no evidence whatsoever, whereas the opposing theory of hormesis (that at low levels it is good for you) has a considerable amount of statistical evidence in humans & an unquestioned base in experimental results of animals & plants. Once again we see real science being ignored for the sake of politics.

  41. Actually the major problem is the ‘Gaussian’ assumption. One can only use statistics if you know the error distribution; typically scientists assume that their error distribution is Gaussian and use statistics (packages that they don’t understand) to define standard errors and confidence levels.
    A large number of processes are not Gaussian, probably the majority of things that are measured in biology are not. In biology one tends to have bimodal populations where you have a significant overlap between members of the two populations.
    It is rather like trying to find the average speed of human movement in New York: some are sedate (mean 0 mph), some people are walking (mean 4 mph), some are on the subway (mean 8 mph) and some are in cars (mean 11 mph). During the course of the day the average speed changes, but mostly due to the transition from one state to another. The movement of sub-populations between states is very difficult to test, but calculating the mean is easy.
    The best thing to do is to experimentally populate or depopulate a state, negating the need for statistics. Clever experimental design is a lot better than clever statistics.

  42. FatBigot (01:05:20) :
    I think this depends on the nature of the problem, where an uncertain answer can be completely correct.
    For example, try running across a busy fast-moving road. Once you’ve got across, the answer to “did a car kill me” is clearly “no”. Or else, if you failed to cross, the answer is clearly “yes”.
    So if the question is “will I be killed running across a busy road” a probabilistic answer seems completely reasonable, but in your view the answer can only be “I don’t know”.
    If the question is “is it dangerous to run across a busy road” then the answer is clearly “yes” despite the possibility of survival.

  43. richard (05:14:44) : “To grace something as intangible as climatology with the moniker “science” is to give it a source of legitimacy that it really shouldn’t have. Maybe “climate prediction” or “climate educated guesswork” would be closer to the mark.”
    I concur. Climatology has far more in common with Astrology than Geology. Whoever would call Astrology a science? In the immortal words of Maureen Lipman in the 80s British Telecom TV ad, “You get an ‘ology’? You’re a SCIENTIST!” – Youtube: http://www.youtube.com/watch?v=vEfKEzX9QLE
    :o)

  44. Close enough, ‘IS’ government work… because they then have to review and update their findings of fact, that will then need to be kept from the public. It is all about the ‘O’ flow. All of this needs to take place first, before the scientists throw it all out to save ‘SPACE’. You know, they are moving into a bigger and better facility after they hold the ‘ACLUE Meeting’ in Madrid…

  45. @ Willis Eschenbach (02:41:51)
    Thank you for that. I got a warning in one of my stats classes about “fishing expeditions” through the data looking for results. I was told that I would always find -something- significant in the data somewhere if I tweaked my tests enough.
    But finding significance on your umpteenth attempt missed the whole point: if your hypothesis was right and the test result was truly significant, it would have jumped out at you on the first try.
    Each try after that is like doubling the size of your net and draining half the pond looking for that monster catfish that you just know is in there somewhere.
    When you finally do find a minnow flopping around in the mud at the bottom of the drained pond, the stats will still show it to be significant, and guess what gets submitted for publication.
    That’s why the point in the article about replication is so important. And perhaps why stonewalling on providing data for replication has been the biggest problem with climatology.
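The “fishing expedition” warning can be put into numbers. This is a minimal sketch that assumes every extra test is independent and run at the usual 5% significance level on data with no real effect. Tweaked tests on the same data are not truly independent, so treat the figures as a rough guide only; the function name is hypothetical.

```python
def chance_of_false_catch(n_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n independent tests comes out
    'significant' at level alpha when there is no real effect at all."""
    return 1 - (1 - alpha) ** n_tests

for n in (1, 5, 10, 20, 50):
    print(f"{n:>2} tries -> {chance_of_false_catch(n):.0%} chance of a spurious 'significant' result")
```

Under these assumptions, twenty tries already give better-than-even odds of landing a minnow that passes for a catfish.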

  46. Having used statistical methods in my research, I always distilled my results down to one aspect of analysis.
    Signal to noise ratio.
    If the effect could not be attributed to random fluctuations (no matter what the cycle state currently affecting the result) then it was a real effect and could be reproduced under all conditions.

  47. First read “Odds are, it’s wrong” http://www.sciencenews.org/view/feature/id/57091/title/Odds_Are,_Its_Wrong That looks like it is the piece from where Anthony Watts got his excerpt.
    Then, if you have more than half a day, follow the comments of VS on Bart’s blog here: http://ourchangingclimate.wordpress.com/2010/03/01/global-average-temperature-increase-giss-hadcru-and-ncdc-compared/#comment-1216
    Interesting discussion until Tamino’s trolls join the fray.
    See Bishop Hill’s comment on March 17th

  48. Anthony,
    We should see if we could get VS to post an intro/overview of econometrics here. The real/worst issue with statistics and climate science is the picking and choosing of what methods to use – how many model tweaks do you think Mann went through before he got his Hockey Stick *just* right?
    According to VS, CO2 and temperature don’t pass the first – mandatory – econometric test for correlation. In his field, if you don’t get past that, the theory is invalidated – no matter what fundamental theory underlies it or how similar/correlated the graphs look. *Real* statistics conflicts with *their* statistics – and advanced econometrics, especially applied to climate data sets, are much more appropriate than Statistics for Science I

  49. Juandos (3:58:07): You wrote: “Regardless of the numbers and quality of data sets how can one model something like climate if all the subtleties aren’t understood? I’m sorry for asking what might be a seriously dumb question but I keep tripping over the butterfly effect…”
    Not dumb in the least. I am beginning to firm up these two ideas:
    (i) That in many areas of modern life qualitative understanding is being shouldered aside by quantitative methods, to the detriment of common sense and ‘feel’. The lawyer’s piece above (Fat Bigot, 1:05:20) is a perfect example.
    (ii) That the warmists and the sceptics stand either side of a profound philosophical gulf. They are determinists, confident that the forecasts are founded on such solid science and such solid initial conditions that the future of the climate is more predictable than it actually is. We are Chaoticists, conscious of “known unknowns” and wondering whether there remain “unknown unknowns” yet to emerge.
    I recently tried to discuss this philosophical divide with a bunch of warmists, but was labelled a know-nothing-numpty.

  50. FatBigot (01:05:20) :
    To say there is a 25%, 50%, 75% or even 99% chance is to say “we don’t know”.
    The answer can only be “yes 75% of the time but no 25% of the time” if one can identify that which causes three quarters of occasions to give an affirmative response and one quarter to give a negative, and that can only be done by further refining the external factors. One then has a more detailed analysis and a longer list of yeses and noes.

    Which is what troubled Erwin Schrodinger. But with those little electromagnetic globules (technical term) they can’t do any better.
    ——————-
    There are ways in which it is good to be a ‘frequentist,’ but not at my age.

  51. Oh I get it now, when natural scientists say they don’t trust statistics, they mean they don’t trust the use of quantitative probability techniques to prove hypotheses.
    Daniel H’s quote from Schneider finds me (if I understand Schneider correctly) in sympathy with Schneider’s view (oh dear!) – which itself reminds me of a forgotten text by the young John Maynard Keynes, A Treatise on Probability 1921.
    Keynes demolishes the frequency theory of probability as found from Bernoulli to Laplace. His critique of Laplace’s approach to probability (and stats generally) is marvelous:

    It seemed to follow from the Laplacian doctrine that the primary qualification for one who would be well informed was an equally balanced ignorance. p85

    It’s like: What are the chances of tossing heads? Ignorance/frequency says it’s 1/2. OK, so now I have tossed heads 15 times, and you get a million bucks if you guess the 16th toss. On that knowledge, what do you choose? He notices that while the frequency doctrine prevailed in his time in science, it did not prevail in the field of insurance – where failure meant insolvency.
    Keynes was a founder of modern economics (and not in the narrow way you might think from the term ‘Keynesian’) and one of the big things he advocated was statistical indicators – as accurate public knowledge upon which investors could make decisions. Injecting this knowledge into the market would help to avoid panics and bubbles based on rumour and misinformation. So here we have a social scientist who would approve of this critique of the over-reliance on quantitative probability in validation of hypotheses but also an advocate of making the primary data publicly available… I think this social scientist is on our side.

  52. Holy cow, it is a silly article.
    There is a substantial portion of science where work without statistics would be almost impossible – and be sure that you’re hearing this from a person who almost always used “non-statistical” arguments about everything. That people make errors or add their biases or misinterpret findings can’t reduce the importance of statistics. People make mistakes, misinterpretations, and distortions outside statistics, too.
    The notion that statistics itself should be blamed for these human problems or that it is inconsistent because of them is preposterous. Even in the most accurate disciplines, like particle physics, it’s inevitable to work with statistics. It’s a large part of the job. And people usually don’t make flagrant errors because scientists in this discipline don’t suck.
    One can have his opinions about the ideal methodology and/or required confidence level, but dismissing all of statistics is surely throwing the baby out with the bath water.

  53. channon (03:54:04) :
    “Yes pure math gives what appears to be the comfort of certainty and although many pure scientists believe this to be absolutely true, most philosophers can show that no system of logic is both complete and consistent.
    That being the case, that purity is only relatively true and there is an element of uncertainty inherent in all calculations and proofs. […]”
    Sooo… when is it that 1pebble + 1pebble doesn’t equal 2pebbles? Did I make an error in logic somewhere?

  54. You can never use a probability to prove a hypothesis. You can only use it to reject a hypothesis with a certain probability of being wrong. Further than that, the methods of sample survey are non-trivial. You cannot just take a batch of data from wherever you might get it and regard it as a state. It represents a sample. Errors introduced by poor sample survey designs can overwhelm any purported results of the survey.

  55. There is an excellent book, “How to Lie with Statistics” by Darrell Huff, first published in 1954.
    Chapter 5 is titled ‘The Gee-Whiz Graph’ and the writer goes on to describe how a graph that doesn’t have an impact can be made to have an impact. One way is to remove the space at the bottom from 0 to where the data makes an appearance on the graph. This immediately isolates any slope and makes it look more important. Next is to expand the scale on the left to increase the incline.
    This book was written before AGW and I read it from an unrelated recommendation. So when I see the usual graphs of increasing CO2 in a graph I see the lower bit from 0 to where the data makes an appearance has been removed, and the scale has increased to ensure the gradient is steep. So what should the scale on the left be? I would have thought that as what we are measuring is parts per million then the scale should be a million? I tried that and the problem was I couldn’t see the CO2 on the graph!
    In addition we hear that CO2 has increased by 30%, but compared to what? I would argue that the CO2 properties of the atmosphere as a whole has increased by 1% of 1%. Have I got this wrong?
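The two ways of stating the CO2 change can be put side by side with simple arithmetic. The sketch below takes the 30% figure from the comment above and assumes a starting concentration of 280 ppm, which is an illustrative baseline chosen here and not a number from the comment.

```python
baseline_ppm = 280.0        # assumed starting concentration, for illustration only
relative_increase = 0.30    # the "30%" figure quoted in the comment

current_ppm = baseline_ppm * (1 + relative_increase)
absolute_change_ppm = current_ppm - baseline_ppm

# Express the same change as a share of the whole atmosphere (parts per million / 1e6).
share_of_atmosphere = absolute_change_ppm / 1_000_000

print(f"relative to CO2 itself: +{relative_increase:.0%}")
print(f"absolute change: {absolute_change_ppm:.0f} ppm")
print(f"relative to the whole atmosphere: {share_of_atmosphere:.4%}")
```

With these inputs the change is 84 ppm, a bit under 0.01% of the atmosphere, so “30%” and “about 1% of 1%” describe the same change against different baselines. Which baseline matters physically is a separate question that the arithmetic cannot settle.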

  56. A little bit of fun here: Years ago I was listening to NPR (yes, I still do occasionally, but with a VERY jaundiced eye/ear these days! I’ve learned…how completely left biased they are) and they gave a report on the attempts in N.J. to raise the “standard tests scores” for high school graduates.
    The announcer said this: “Despite 4 years of efforts, 50% of all students still fall below the mean on the standard tests…”
    I almost drove off the road (I was in my car) I laughed so hard.
    My thought: “We know which side of the MEAN this announcer fell on…”
    Remember: 78% of all statistics are actually made up on the spot. (Like that one.)

  57. Interesting article, and proof that “auditing” is necessary when statistics are involved.
    It is nice to get the perspectives in that article from scientists outside the climatology furball. How will the AGW proponents attack them –funded by Exxon-Mobil? Oh, I know. . . “associated with industries that are significant CO2 producers!” As if there are any that aren’t.

  58. Alas, not even calculus is perfect:
    I was devastated when I was shown Gabriel’s Horn:
    Gabriel’s Horn is obtained by rotating the curve y = 1/x around the x axis for 1 ≤ x < ∞. Remarkably, the resulting surface of revolution has a finite volume and an infinite surface area. It is interesting to note that as the horn extends to infinity, the volume of the horn approaches pi.
    After having so many things in the world betray me, I thought at least math is perfect and truthful… I was crushed.
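For anyone who wants to see where the two results come from, here is the standard calculation for the solid of revolution of y = 1/x on x ≥ 1, using the usual disc formula for volume and the surface-of-revolution formula for area:

```latex
\begin{align*}
V &= \pi \int_{1}^{\infty} \left(\frac{1}{x}\right)^{2} dx
   = \pi \left[-\frac{1}{x}\right]_{1}^{\infty}
   = \pi, \\[4pt]
A &= 2\pi \int_{1}^{\infty} \frac{1}{x}\sqrt{1 + \frac{1}{x^{4}}}\; dx
   \;\ge\; 2\pi \int_{1}^{\infty} \frac{dx}{x}
   = 2\pi \lim_{b \to \infty} \ln b
   = \infty .
\end{align*}
```

The volume integrand falls off like 1/x², which converges, while the area integrand is at least 1/x, which does not. The math is consistent; it is only intuition that gets betrayed.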

  59. I still say:
    Show me the data,
    Show me how you measured the data,
    Show me how you manipulated the data,
    and I will show you the truth.

  60. My favourite variation on the famous Mark Twain quote:
    There are three kinds of liars: Liars, Outliers, and Out-and-out Liars.
    Maybe we should amend Twain to say that there are three kinds of lies: Lies, Statistics and Climate Science.
    Fun rhetoric aside, as a professional mathematician I have hated statistics. In my own studies I maneuvered my course choices at UBC to avoid the subject, and took my PhD in Pure Math. I was, therefore, not prepared when the first university that hired me asked me to teach Statistics. I guess they liked that so much they started giving me two statistics classes at a time, both introductory statistics and a second-year course in multivariate analysis of variance. The next place I taught also handed me a Stats load. Where I am now, Statistics is a separate department, and I like it that way.
    All that said, I wish to pour a bit of cold water on the discussion by saying that modern statistics itself is not poorly founded but rather poorly understood or deliberately abused by many who would draft statistics to preconceived ends. This includes many in the “hard” sciences, and even more in the soft sciences “humanities” and medicine. What we need, however, is not to ridicule and curtail the use of statistics but more, and better, rigour in its use, and more scrutiny of how conclusions are drawn from it.
    There are two kinds of statistics: Descriptive statistics (which is 100% accurate — it merely describes the characteristics of a data set, but may be used to “lie” simply by selective reporting) and Inferential statistics (in which sampled data is used to infer facts about unsampled data or predictions about the behavior of that which is sampled, such as weather or climate systems).
    Inferential statistics is not wrong by virtue of having built-in uncertainty, but the uncertainty is an integral part of any conclusions one draws with it, and more rigour must be applied in the analysis of uncertainty than in the principal figures themselves. In inferential statistics there are more places where one can make errors (deliberate or accidental) and misinterpret or otherwise simply get conclusions wrong. This is why people like Steve McIntyre, who devote great energy and care to scrutinizing others’ use of statistics on matters of great importance, are heroes.
    As Willis points out above, it is easy to create a surface impression that statistics is done right. A rule of thumb in the world of polling is that a properly randomized poll of about 1,000 citizens (regardless of the size of the population) on a binary question (will you vote X in the upcoming election) gives a result that is accurate to within 3%, 19 times out of 20. This means that the actual number reported (52% of voters sampled say they will vote X) can be expected to be within 3% of the actual proportion of the population (so in fact you can expect 49 to 55% with something like 95% certainty).
    But this also means that, if you carried out this poll 20 times there is a very good chance that one of the 20 will give a number MORE than 3% from the actual proportion. This makes it easy to cherry-pick results to suit one’s desired conclusion (which necessitates strict controls over statistical methodology).
    But this is not the only way to screw up results. As Anthony has well established in climate science, it is easy to advertently or inadvertently introduce bias by systematic problems in sampling methodology. In my polling example, a telephone poll may be problematic: How are the telephone numbers selected? Will the fact that you won’t get responses from people unwilling to give time to answer telephone polls have an effect on your results (for example, what if you’re doing a consumer survey on how people feel about telemarketers)? A phone survey leaves out people who don’t have the financial means to own a phone, or who have no permanent address (not a good way to sample the homeless). A land-phone survey will miss those who rely only on cell phones, skype and internet-based phone services, a growing demographic, heavily skewed to young people and retirees. And so on.
    It’s important to remember that figures don’t lie — but at the same time, liars can figure! And that’s the critical datum here.
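The polling arithmetic in the comment above can be checked with the usual normal-approximation margin of error for a proportion. This sketch assumes a simple random sample, a worst-case split of p = 0.5 and a 95% level (z ≈ 1.96); the function names are hypothetical.

```python
from math import ceil, sqrt

Z_95 = 1.96  # normal quantile for "19 times out of 20"

def margin_of_error(n: int, p: float = 0.5, z: float = Z_95) -> float:
    """Half-width of the approximate 95% interval for a sampled proportion."""
    return z * sqrt(p * (1 - p) / n)

def sample_size_for(margin: float, p: float = 0.5, z: float = Z_95) -> int:
    """Smallest sample size whose margin of error is at most `margin`."""
    return ceil((z / margin) ** 2 * p * (1 - p))

def chance_some_poll_misses(polls: int, coverage: float = 0.95) -> float:
    """Chance that at least one of several independent polls lands outside its margin."""
    return 1 - coverage ** polls

print(f"n needed for +/-3%: {sample_size_for(0.03)}")
print(f"margin with n=30:   +/-{margin_of_error(30):.1%}")
print(f"P(at least one of 20 polls misses): {chance_some_poll_misses(20):.0%}")
```

Under these assumptions, ±3% at 19 times out of 20 takes a sample of roughly a thousand respondents, whereas a sample of 30 gives closer to ±18%. Run twenty such polls and there is about a 64% chance that at least one lands outside its stated margin, which is the cherry-picking opportunity described above.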

  61. Leif Svalgaard (01:48:47) :
    It’s science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation
    Clearly not written by a scientist. We validate a hypothesis by its predictions or explanatory power or even ‘usefulness’ [even if actually not the correct one – e.g. the Bohr atom]. Statistics is only used as a rough guide to whether the result is worth looking into further. Now, if a prediction has been made, statistics can be used as a rough gauge of how close to the observations the prediction came, but the ultimate test is if the predictions hold up time after time again. This is understood by scientists, but often not by Joe Public [his dirtiest secret perhaps 🙂 ].

    Leif, you are too much the idealist here. You are exactly right, of course, in the role of prediction or explanatory power or utility in “validating” hypotheses. (Be careful, though. We do not “validate” hypotheses; strictly speaking we fail to invalidate them. But I know you know that.)
    But did you read the entire article? It is actually quite good, and is calling attention to something that is quite true: the misuse and frequent misunderstanding of the results of statistical tests.
    Let’s apply this discussion to the IPCC exercise. What, exactly, are they predicting? As I understand it, nothing. So how is that science?
    More pointedly, I’d like to see some serious discussion of the IPCC’s use of statistics in its “Treatment of Uncertainty.” They take the outcome of statistical tests and interpret them in this way:
    “Where uncertainty in specific outcomes is assessed using expert judgment and statistical analysis of a body of evidence (e.g. observations or model results), then the following likelihood ranges are used to express the assessed probability of occurrence: virtually certain >99%; extremely likely >95%; very likely >90%; likely >66%; more likely than not >50%; about as likely as not 33% to 66%; unlikely <33%; very unlikely <10%; extremely unlikely <5%; exceptionally unlikely <1%.”

    I think there is indeed a “dirty little secret” here that the article we’re discussing exposes. The IPCC “Treatment of Uncertainty” is based on the fallacy of the “transposed conditional.” When a paper publishes a statistical result with, say, a 95% “level of significance,” that doesn’t mean that there is a 95% likelihood that the finding is 95% “true.” Properly constructed, it means that there is a 95% likelihood that the finding is “not not true” or “not false.” In statistics (conditional reasoning), the fact that something is “not false” does not make it “true.” Or, to use the terminology of the IPCC, the fact that there is a 95% probability that something is “not unlikely” does not mean that it is 95% likely.
    Formally, the problem here is that in most cases we cannot properly state the conditions under which something is likely to be true. So we cannot say, with any degree of measurable certainty, whether or not our results were a fluke. All the “null hypothesis” can do is assign a probability to the likelihood that our results are not a fluke. But if the results are a fluke, the fact that it was that one time in twenty is little consolation. And since it only takes one fluke to disprove an hypothesis, we don’t stop with one test. We keep testing, and testing. Or, as you put it, we use our 95% (or whatever) statistically significant result only “as a rough guide to whether the result is worth looking into further.” And I agree further that the goal is a prediction, and that prediction cannot be validated or invalidated by statistics. It is validated or invalidated by observation. Either the prediction holds true, or it does not.
    When I started out doing statistical research for publication (back in the 1990’s), a 95% level of significance was a kind of threshold for concluding the possibility of a meaningful inference (in my field, at least). A 90% level of significance was considered a weak finding in support of an hypothesis. I think this was an intuitive way of avoiding the “fallacy of the transposed conditional.” So when I see the IPCC treating >66% as “likely” I find it revolting. And yet these are so-called scientists!
    Give the author a little slack. He’s calling attention to a real problem. He may have not said everything exactly the way you would have, but that doesn’t mean everything he said is unworthy of your consideration.
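The “transposed conditional” discussed above can be made concrete with Bayes’ rule. The sketch below is illustrative only: the prior share of tested hypotheses that are really true and the test’s power are made-up inputs, and the function name is hypothetical.

```python
def prob_true_given_significant(prior: float, power: float = 0.8, alpha: float = 0.05) -> float:
    """P(hypothesis is true | result is significant), via Bayes' rule.

    prior: assumed fraction of tested hypotheses that are really true
    power: P(significant | hypothesis true)
    alpha: P(significant | hypothesis false), i.e. the significance level
    """
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)

for prior in (0.5, 0.1, 0.01):
    print(f"prior {prior:>4}: P(true | significant) = {prob_true_given_significant(prior):.2f}")
```

With these assumed inputs, a result that clears the 5% threshold is about 94% likely to be real when half of all tested hypotheses are true, 64% when one in ten are, and about 14% when one in a hundred are. The significance level is the same in all three cases, which is why it cannot be read off directly as the probability that the finding is true.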

  62. I hit the post button too quickly.
    1) When I bold Leif’s “rough guide” comment, I’m supplying the emphasis.
    2) I started out publishing in the 1970’s, not 1990’s.

  63. This has been an issue between scientists and statisticians for a long time. For anyone who wishes to explore the issue from the statistical side, I’ll recommend the American Statistical Association http://www.amstat.org/ . There have been many papers and discussion of climate and the use of statistics therein. Suggest searching the site for “climate” and/or “Bayes + climate”.
    In addition there is a meeting coming up on this very issue:
    Richard L. Smith, L. Mark Berliner, and Peter Guttorp, the authors of the article: Statisticians Comment on Status of Climate Change Science in the March issue of Amstat News, are going to discuss their article online LIVE from noon to 1 p.m. EST on Wednesday, March 31. Be sure to check back here or Amstat News online on the 31st for the link!
    That said, there is also considerable disagreement within the statistics community about philosophy and methods used to analyze climate data. Bayesian vs. Orthodox, for example.
    My own personal philosophy is: If you ask a fuzzy question, you’ll get a fuzzy answer.

  64. FatBigot (01:05:20) :
    “To say there is a 25%, 50%, 75% or even 99% chance is to say “we don’t know”. ”
    I believe that the existence of the classical “path” can be pregnantly formulated as follows: The “path” comes into existence only when we observe it.
    –Heisenberg, in uncertainty principle paper, 1927
    The more precisely the position is determined, the less precisely the momentum is known in this instant, and vice versa.
    –Heisenberg, uncertainty paper, 1927
    Although Heisenberg was referring to the character of particles in the subatomic world, he shed a great light on our scientific conundrum. We can determine the path of past climate by ferreting out historical evidence in tree rings, sediments, ice cores, etc. But that path did not exist until we looked for it and created it. The path does not exist as a physical entity because it is simply a human creation. As we try to accurately pin down the exact conditions that exist today, we lose our ability to see where we are going. It’s as if we assume there is a linearity to climate and we simply need to draw the future path from the direction of the past + current data. But Heisenberg points out that you can’t know both at the same time!
    Despite the Heisenberg uncertainty principle, the understanding of quantum mechanics has given us computers, cell phones, etc. We can function quite well in a world of uncertainty. All the statistics do is measure the degree of uncertainty in a chaotic world. As Einstein demonstrated, “It’s all relative”.

  65. Being grounded in metrology (the science of measurement) should be a prerequisite for anyone before applying any advanced statistical tests, and that includes scientists.
    The surface station debacle is the perfect example. Reading through the blogjacks at Lucia’s, after all the statistical game playing, the bottom line is unless first principles of metrology are understood and applied to the instrumentation, it is a meaningless exercise to evaluate the data statistically.
    That’s the problem I have with slick data miners like Tamino and a few threads over at Lucia’s. After all the statistical game playing, not one single analysis employed has addressed the problem of each individual station’s integrity.

  66. Blackbarry (07:14:30) :

    “It’s all relative”

    Yes indeed. That is why a handful of years ago, we often heard “We must act because the science is certain”. We now hear “We must act because of uncertainty, and we just don’t know how bad it could be!”
    It’s all relative to how the headlines are going.

  67. I can’t believe that so many here think this Science News article is worth anything. Their “barking dog” experiments are such obvious straw men. Nobody would conduct an experiment that way. And these jokers keep returning to that same bad experimental design to “prove” their point. It’s not proof, and, is one more example of bad science. It’s pretty funny that so many people here think these guys are right just because they have an anti-establishment tone to their article. They’re also the establishment, and, you guys who think this article was good have been fooled yet again.
    The complaint that many of you have, namely, that enough experimentation will lead to statistical significance by chance, is already addressed by the ANOVA test or Bonferroni correction to the t-test. In short, statisticians already know about it and raise the bar for statistical significance when multiple trials are performed. It’s true that many people don’t follow the rules, but, that’s the fault of the experimenter, not the field of statistics.
    This Science News article is an example of people speaking about statistics who don’t know what they’re doing or who are misrepresenting the field. Nobody uses p = .05 anymore unless they have really rare data. We like to see p = 0.01 or 0.001. Also, the Bayesian crap these guys are pushing leads to trivial results in trivial situations but madness in difficult situations. In my experience, people use these trendy methods to get funding, not to get results…. just like the CRU does.
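
    For readers who have not met the Bonferroni correction mentioned above, a minimal sketch of the arithmetic. It assumes the tests are independent, which real experiments often are not; the function names are made up for illustration.

```python
def familywise_error(alpha, m):
    """Chance of at least one false positive across m independent tests,
    each run at threshold alpha, when no real effect exists anywhere."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(alpha, m):
    """Per-test threshold that keeps the family-wise error at or below alpha."""
    return alpha / m


m = 20  # illustrative number of comparisons
print("uncorrected:", round(familywise_error(0.05, m), 3))
print("per-test threshold:", bonferroni_threshold(0.05, m))
print("corrected:", round(familywise_error(bonferroni_threshold(0.05, m), m), 3))
```

    Twenty uncorrected tests at p = .05 give a better-than-even chance of a spurious “finding”; dividing the threshold by the number of tests pulls the overall rate back under 5%.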

  68. IMHO, Bayesians have a significant epistemological advantage over frequentists – they start with a more explicit causal model, not an associative model. Consequently they have to be more explicit about causal mechanisms.
    Climate science has always struck me as being underspecified – the dendrochronology mess is a perfect example.
    The danger of Bayesian models is attaching significance to subjective probabilities generated by experts with agendas.

  69. Statistics are a valuable tool that can be used to investigate the secrets of science. Trouble occurs when they are used to develop proof of a scientific theory. One of the techniques that the Japanese used when using statistics to improve their manufacturing capabilities was to ask why 5 times. Example, the global average for February was the second warmest in 32 years, according to Christy and Spencer. Why? On his website, Roger Pielke, Sr, reports that this anomaly is all due to Greenland being much warmer than normal in February. http://pielkeclimatesci.wordpress.com/2010/03/19/an-example-of-why-a-global-average-temperature-anomaly-is-not-an-effective-metric-of-climate/
    Why was Greenland much warmer than normal? Empirical data and statistics can be used as a tool to investigate this question. In this case there are 4 more whys to answer.

  70. I’ll second Stephen Skinner’s recommendation of the book, still available, “How to Lie with Statistics”. One of the great things about the book is that it shows how statistics can be manipulated to a particular view. I have seen that book on many desks of designers, especially in the medical field, for many years.
    The general rule we had for data analysis was to look at a good time history chart, use signal conditioning methods to extract additional information, and resort to stats at the end, to see if there were any “droppings”.
    It is most interesting to see the proponents of AGW using stats first, almost as a religion, and not even mentioning looking at charts and graphs to see if all the data sets make sense. However, stats do provide a cover if one does not understand the underlying principles.

  71. When my children were learning algebra…
    they said to me:
    “Why do I have to learn this? I’m never going to use it.”
    My answer:
    So that no one can fool you with algebra.

  72. Statistics are fantastic for showing coincidental relationships

    JER0ME (01:45:36) :
    As a born mathematician, I have revelled in many aspects of the subject. I must admit that imaginary numbers and the like made me feel a bit queasy, but I ‘took it like a man’, and accepted it all in the end once I saw the benefit of something that seemed so wrong.
    But statistics? I have never, ever, been even slightly comfortable with them. It is all so easy to manipulate, even for very bright people. I have enormous respect for those that can delve into this area and come out with any kind of truth. It is all so easy to be misled, and, indeed, to mislead.
    I have a strong belief that mathematics is a pure subject in its own right. It also forms the basis, or foundation, for physics. Without mathematics we cannot accurately describe the physical world.
    Chemistry then rests on top of Physics, as we eventually find that we cannot explain or describe Chemistry without Physics. So further, Biology rests in exactly the same way on Chemistry, we also find.
    Where does statistics come into the equation? Pretty much nowhere IMO.
    Of course, it is almost certainly possible to prove me wrong … with statistics…

    Are you asking me to give the “odds” that you are wrong? LOL
    I can think of many ways statistics are misused, misunderstood, wrongfully applied to probabilities as being proof and often just wrong in the “science” of global warming.

  73. I was in on the beginning of using computers to store data. I had my sights on a career in Forestry. As I pursued my study, it became clear that the life of a Forester, Field Biologist/Botanist, Ranger, etc., was to be mostly spent sorting data and putting it into a database, with little real field work. Reality was not as important as the stats and trends: “Make the data fit the hypothesis.” I was troubled by that. Finished a general Biology B.S. and left the field. Went into aviation, where I did more in the field than I did while in study…
    I think that academia sometimes crawls into that ivory tower and cuts the rope ladder that is used to get up there; trouble is, you cannot get down either…

  74. I really don’t know anything about statistics, but I always wonder about the results of many medical studies, especially those that have a very limited sample size, say only 40 or so female nursing students at one college. The results of these studies are often reported in the media as if they are valid for everyone in general.

  75. Sou (02:52:56) :
    (…)
    (Amateur statisticians are the cause of much confusion and misinformation in climate science.)

    True. The work of Mann, Jones et al has been exposed as having such amateurism. You would be wise to ignore them.

  76. Basil (07:02:14) :
    calling attention to something that is quite true: the misuse and frequent misunderstanding of the results of statistical tests.
    We, of course, basically agree. What I was trying to say is that the misuse is rarely done by the scientists themselves, as we know how the method works. I am one of them and know hundreds of those critters, and do not know a single one that has a misunderstanding of this. The misuse is done by people trying to use a scientific ‘finding’ for their own purposes.
    That said, scientists are also people [I state this at the 95% confidence level] so some may try to use statistics to misrepresent the significance of a finding i.e. to fool others rather than fooling themselves. One of the best [worst?] examples of that is in this famous paper [cited – at least – 216 times]: http://www.ukssdc.ac.uk/wdcc1/papers/nature.html
    where they state that they found an “unprecedentedly high and significant correlation”: “The correlation coefficient is 0.91, for which the significance level is (100-4.3*10^-11)%. “. Note the clever way the ridiculous significance level is expressed. Had they used the equivalent form 99.99999999996 % it would have jumped out at you that something was amiss [there were only about 30 data points]. In fact, when I pointed that out to them in http://www.leif.org/research/Reply%20to%20Lockwood%20IDV%20Comment.pdf paragraph 21, they lamely admitted [in http://www.eiscat.rl.ac.uk/Members/mike/publications/pdfs/sub/239_Lockwood_2006JA011640R.pdf ] that “the significance levels of correlations quoted here all make correction for the persistence of the data (from their autocorrelation functions) and hence the effective number of independent samples [Wilkes, 1995]. This correction was not made in the original paper by LEA99 who, as a result, quoted significance values that were too high. However, even with the correction for persistence, all the correlations presented by LEA99 remain significant at greater than the 99.9% level”. A significant(!) number of 9s were dropped.
    The two [comment] papers contain lots of statistics [some bad, some good]. But my point was that we as scientists don’t really believe that statistics ‘prove’ anything, or disprove anything, because as Willis notes: if you can’t see it by eye in the graph, it probably ain’t there.
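
    A rough sketch of why persistence eats so many 9s. It uses the r = 0.91 and roughly 30 points quoted above, the Fisher z approximation for the significance of a correlation, and a common first-order (AR(1)) formula for the effective number of independent samples. The lag-1 autocorrelation of 0.8 is purely hypothetical; it is not taken from either paper, and this is not the method the authors actually used.

```python
import math


def corr_p_value(r, n):
    """Two-sided p-value for a correlation r from n independent samples,
    using the Fisher z approximation (atanh(r) ~ normal, SE = 1/sqrt(n-3))."""
    if n <= 3:
        raise ValueError("need more than 3 effective samples")
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


def effective_n(n, r1_x, r1_y):
    """Approximate number of independent samples when both series behave
    like AR(1) processes with lag-1 autocorrelations r1_x and r1_y."""
    return n * (1.0 - r1_x * r1_y) / (1.0 + r1_x * r1_y)


r, n = 0.91, 30   # values quoted in the comment above
rho = 0.8         # HYPOTHETICAL lag-1 autocorrelation, for illustration only

p_naive = corr_p_value(r, n)
n_eff = effective_n(n, rho, rho)
p_adjusted = corr_p_value(r, n_eff)

print(f"naive:    n = {n},  p = {p_naive:.1e}")
print(f"adjusted: n_eff = {n_eff:.1f},  p = {p_adjusted:.1e}")
```

    Under these assumptions the correlation stays “significant”, but the p-value grows by many orders of magnitude once the 30 points are treated as only a handful of independent ones.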

  77. Ref – Leif Svalgaard (01:48:47) :
    “..Now, if a prediction has been made, statistics can be used as a rough gauge of how close to the observations the prediction came, but the ultimate test is if the predictions hold up time after time again. This is understood by scientists, but often not by Joe Public [his dirtiest secret perhaps 🙂 ].”
    ____________________________
    That’s certainly the way it is supposed to work. (Well.. the way it was supposed to work;-) The problem is that the Scientific Ethic has suffered as much, if not more in some ‘psyentists’, as in every other field of human endeavour. The Guilds do not police their own. Indeed, there are no Guilds anymore. All is chaos. Kind of a “Do Your Own Thing” sort of thing. Dig it, Man? Cool! Oh, neato skeeto!

  78. Leif Svalgaard (01:48:47) : hit the issue pretty well on its head, as did later postings by dearieme, et al. However, I might add that without the use of statistics in process control, and design of experiments, the modern world of low cost, reliable, uniformly manufactured items might never have come to be. If one is offended by the illogic of frequentist approaches, then by all means look at more comprehensible methods such as likelihood ratios.

  79. That was one sensible article. I think I almost cried.
    Statistics is very easy. When properly used it always includes a rational context, which the author explains very well. Rational context: compiling yearly rain water levels during the last hundred years to see what year got the most. Semi-rational: to use the previous result to predict in no uncertain terms the future average, or otherwise draw nutty conclusions from it, like it’s a fifty-fifty chance of more rain. Irrational context: to predict future drought or flooding or other doom and gloom scenarios a hundred years from now using the same rain water data as above.

  80. This is idiotic, especially about the quote about an experiment requiring statistics being a bad experiment. There is no escape from the chore of having to analyze and interpret our data, and statistics is still the best tool for that.
    Kitchen knives can cut you but we still keep cooking.

  81. @ PJB –
    and there’s the problem right there. We can’t reproduce the climate, and the models that are being used, while undoubtedly very clever, are not a match for reality.
    Case in point is the current lack of surface warming. None of the models predicted it so therefore all of the models are unsuitable to base governmental policy on.

  82. Blackbarry (07:14:30) :
    –Heisenberg, uncertainty paper, 1927 […]
    All the statistics do is measure the degree of uncertainty in a chaotic world. As Einstein demonstrated, “It’s all relative”.

    This is a common misunderstanding and has often been misused. The Schroedinger Equation that governs quantum mechanics is completely deterministic and allows no uncertainty whatsoever. The uncertainty principle may perhaps better refer to the difficulty related to how to pin down where a wave is: the bigger the wave, the less sense does it make to state its position with high precision; Think of saying that this monster 50-ft ocean wave off Oahu is 123456.789 inches from the shore.


  83. r (06:54:26) :
    Alas, not even calculus is perfect:
    I was devastated when I was shown Gabriel’s Horn:
    Gabriel’s Horn is obtained by rotating the curve y = 1/x around the x axis for 1 ≤ x < ∞.

    Include Wicked-pedia in that category too (‘not perfect’); they appear to ignore the constraints on ‘x’ in their graphics thereby _not_ accurately illustrating Gabriel’s Horn:
    http://en.wikipedia.org/wiki/Gabriel's_Horn
    Versus a proper depiction (albeit 2D) if one observes the dotted line indicating x=1 :
    http://local.wasp.uwa.edu.au/~pbourke/fun/calculus/
    Or the graphic shown here at bottom-right:
    http://www2.scc-fl.edu/lvosbury/CalculusII_Folder/Calculus_II_Exam_2.htm
    .
    .
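
    For anyone wondering what is so devastating about it, this is the standard textbook calculation for the horn (not taken from any of the links above): the volume converges while the surface area does not.

```latex
V = \pi \int_{1}^{\infty} \frac{1}{x^{2}}\,dx
  = \pi \left[ -\frac{1}{x} \right]_{1}^{\infty} = \pi,
\qquad
A = 2\pi \int_{1}^{\infty} \frac{1}{x}\sqrt{1 + \frac{1}{x^{4}}}\,dx
  \;\ge\; 2\pi \int_{1}^{\infty} \frac{1}{x}\,dx = \infty .
```

    So the horn could be filled with a finite amount of paint, yet no finite amount would cover its surface.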

  84. “Statistics can prove anything!”
    “Absolute certainty is a privilege of uneducated minds-and fanatics. It is, for scientific folk, an unattainable ideal.” Cassius J. Keyser (like this one)
    “CO2 is causing AGW with 95% probability!”
    “We are 100% honest scientists! We did not fudge the data!”
    “Statistics can never “prove” anything. All a statistical test can do is assign a probability to the data you have, indicating the likelihood (or probability) that these numbers come from random fluctuations in sampling. If this likelihood is low, a better decision might be to conclude that maybe these aren’t random fluctuations that are being observed. Maybe there is a systematic, predictable, or understandable, relationship going on? In this case, we reject the initial randomness hypothesis in favor of one that says, “Yes we do have a real relationship here” and then go on to discuss or speculate about this relationship”.
    In short, a gambling game!
    So what will this gambling game look like when the data is corrupted?
    Welcome to the Climate Casino!

  85. Fascinating! Thanks for the post. I think the entire issue can be summed up nicely by this little snippet from the actual article. In talking about the marriage between science and statistics, it said:
    “Whether the future holds a fruitful reconciliation or an ugly separation may depend on forging a shared understanding of probability.”
    It’s always been a rough and ugly marriage, and it may very well end in a painful divorce, but IF, and this is a big IF, the two can find a shared perspective on what is truly meant by probability, then not only will the marriage be saved, but it will be a very positive advancement for both. (just as a good marriage should be!)

  86. DR: “After all the statistical game playing, not one single analysis employed has addressed the problem of each individual station’s integrity.”
    Totally agree, there are just so many ways measurements can be out, and so many ways the readings could be biased in one direction in terms of the trend, that most analysis is completely ridiculous theoretical claptrap. The basic essential prerequisite for an experiment is that comparisons use the same extraneous conditions. Instead this “experiment” has gone from manual measurements using mercury thermometers in an era when there probably wasn’t even a common idea of a universal time, to an era of automated measurement at one universal time.

  87. Various commentators have mentioned sample size in relation to finding significant p-values. I taught my students that they must make the standard p-threshold in their stats more and more stringent as their comparisons within one data set multiplied. Something called ‘the Bonferroni Correction’ should be used, which divides the threshold by the number of comparisons, in practice shifting the value from p<.05 to p<.01 or perhaps p<.001 in statistical testing that really chops a data set into mincemeat. Hans Eysenck had me do that with one of my papers.
    It is correct to say simply that if significance is found it ensures just that the null hypothesis can be rejected, implying that the experimental hypothesis may be accepted, but, as has been said by commentators above, greater certainty can be gained only by replication of the experiment. In the same way that an anecdote is worthless to support any hypothesis, so one experiment that supports a particular hypothesis and its theory is almost as worthless. I know that sounds really hard but if you want to be credible that is the way to go.
    A large data set can be mined to find significance, as has been said, but the larger the data set, the more small effects begin to become significant; it has been said (Bakan) that a statistical significance in a large data set is uninteresting but a dramatic significance in a small data set is much more satisfying, presuming that this small set obeys the prerequisite of having a normal distribution and is otherwise suitable for feeding into the strongest stats test being used (e.g., parametric, interval, etc.). Non-parametric tests such as Chi-squared, Wilcoxon's and so on are as weak as correlations and regressions, these latter tests being most often found being mangled in climate research.
    A 90% confidence interval is not acceptable even in my discipline, psychology.
    If you want to be credible when writing in scientific papers you should never use the terms 'proof', 'proven', 'truth', 'fact', only 'hypothesis supported', acknowledging that your finding, however exciting you think it is, can only be temporary, contingent, waiting to be either knocked down by the next experiment or, if it is lucky, further supported by some other published paper.
    For crusty old scientists emotionally attached to their little theories their careers usually end in tears as those theories are shot down in flames by subsequent work. Being a scientist is a tough life, one that usually ends in being patronised or disregarded. Chin up!
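
    The point above about large data sets can be sketched in a few lines. This assumes a simple one-sample z-test with known standard deviation, and the effect size of 0.02 standard deviations is an invented, deliberately trivial number.

```python
import math


def two_sided_p(effect_sd, n):
    """Two-sided p-value of a one-sample z-test when the observed mean
    differs from the null by effect_sd standard deviations, sample size n."""
    z = effect_sd * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2.0))


effect = 0.02  # hypothetical, practically negligible effect
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}  p = {two_sided_p(effect, n):.3g}")
```

    The same negligible effect goes from “nothing there” to overwhelmingly “significant” purely because n grew, which is why significance in a huge data set says little about whether the effect matters.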

  88. My impression is that ‘statistical hypothesis testing’ is not done in the ‘hard’ sciences such as physics and chemistry, but used to study populations in psychology, ecology, medicine, etc.
    There is a hard hitting academic paper titled “Statistical Hypothesis Testing As Pseudo Science”. The entire paper is in an academic journal, and was once online in its entirety. Only some of it appears online now, here it is
    http://www.npwrc.usgs.gov/resource/methods/hypotest/?C=M%3BO=A

  89. Leif Svalgaard (08:47:00) :
    Basil (07:02:14) :
    We, of course, basically agree.
    Hey, I had intentionally misspelled ‘basically’ as ‘basilally’. And somehow it got corrected…
    REPLY: No good deed goes unpunished, sorry Leif. – Anthony

  90. Then we’re all agreed: “Liars use figures as one of their tools and when they do their figures cannot be trusted.”
    Next Issue: How do we identify these people before they open their mouth or publish something?
    a. Most held Elected Office
    b. Most charge $100K Speaking Fees
    c. Most live in Energy Inefficient Mansions
    d. Most know nothing about climate change
    Next issue: How do we stop these despicable excuses for human beings?
    a. 20 to Life
    b. Beheading
    c. Castration
    d. Draw & Quarter

  91. An example of misleading statistics:
    Water vapor is around 1% of the atmosphere, a molecular constituent of air.
    But the 1% figure is misleading because it is an ‘average’ of the entire volume of air in the atmosphere.
    Averages are a product of statistical work-up.
    But in the real atmosphere water vapor is concentrated in some volumes of air (and constantly moving and forming) and tenuous in other volumes of air.
    And, these varying concentrations of water vapor do have an impact on temperature retention, absorption & release.
    Clearly, a scientist must take into account the specific concentration of water vapor in any given body of air mass.
    Simply considering the average water vapor percentage will not tell the scientist how water vapor acts in the atmosphere.
    One must take into account real time water vapor behavior to understand its contribution to atmosphere behavior, and, thus, climate.
    Averages won’t contribute to that understanding — in fact — the average will mislead.
    Because “averages” in many instances are not how the physical relationships of chemical constituents in a body of gas interact.

  92. This thread is too long, but, I haven’t seen it stated yet:
    Statistics is the science of the behavior of numbers.
    Remember that.
    Rarely does the real world behave like numbers, to which you can assign all sorts of characteristics, and not have any unknown behaviors in your numbers.
    If you doubt the utility of statistics and science, take a couple of courses in the statistical design of experiments. It is breath taking what clever people can do if you let them design the experiments, not call in the clever people after you have done some unorganized experiment and have a bunch of trash data.
    Almost nobody in medical research has any appreciation for statistical design of experiments, or even statistics.
    When I was a freshly minted M.D. and studied stats on my own, I began to read the medical literature from the viewpoint of their statistical work. Mostly just trash. Even large studies were trash. Every study was meant to prove a point. And, this was before drug companies began to pay for research.
    Problem is, these people get rewarded for bad results.
    My brother in law designed software for the Navy once, for a submarine. For the maiden voyage, the software designers dove with the sub. You can be confident he was sure it would work.

  93. James F. Evans (09:30:31) :
    An example of misleading statistics:
    Water vapor is around 1% of the atmosphere, a molecular constituent of air.

    Except that that statement is not an example of statistics, but just a statement of fact. Not every number that is calculated or determined is ‘statistical’. Statistics is about drawing and asserting conclusions from the data, not about the data themselves.

  94. Brent Hargreaves wrote (06:01:39) :

    (ii) That the warmists and the sceptics stand either side of a profound philosophical gulf. They are determinists, confident that the forecasts are founded on such solid science and such solid initial conditions that the future of the climate is more predictable than it actually is. We are Chaoticists, conscious of “known unknowns” and wondering whether there remain “unknown unknowns” yet to emerge.
    I recently tried to discuss this philosophical divide with a bunch of warmists, but was labelled a know-nothing-numpty.

    They are forced into that position because if the climate is chaotic it then follows that, firstly, if we reduce CO2 to pre-industrial levels it will not necessarily return the climate to pre-industrial conditions, and secondly, it places serious doubts on the chaotic computer models’ ability to represent the chaotic climate. I laugh to myself every time someone says that “Weather is chaotic but Climate isn’t.” because chaotic systems are by definition self-similar at every scale, so if on the small temporal scale of weather the system is chaotic it also is on the larger temporal scale of climate. Big chunks of Chaos theory were either discovered or rediscovered by Meteorologists and digital computing devices.


  95. Michael (08:52:07) :
    This is idiotic, especially about the quote about an experiment requiring statistics …

    ‘Needs’, the quote used the word “needs statistics”, not ‘requires’; small, subtle, but important difference I think …
    ‘Needs’ is more akin to an “if necessary” qualifier than the much stricter qualifier ‘requires’, also, if an experiment ‘needs’ statistics to ‘winnow out’ an observation, it probably:
    a) isn’t clear to the naked, unaided eyeball and
    b) requires the use of those ‘statistical’ methods to qualify the result to some singular number (or numbers) by which success or failure is scored.
    Kinda like Climate Science; statistical techniques are needed (AKA necessary) to fudge (‘cool’) the past numbers in order to show ‘warming’ in the present … looking at raw, uncooked data (for clean, un-UHI contaminated sites, for instance) does not indicate the warming that the statistically-cooked, massaged data shows.
    Therefore, statistical techniques are needed in ‘Climate Science’ to ‘prove’ their case, thereby making it a bad experiment, scoring by Rutherford’s rule:

    “If your experiment needs statistics, you ought to have done a better experiment.” – Lord Ernest Rutherford

    .
    .

  96. My pet concern, I have seen it many times in biology/biochemistry, is when people assume the thing they are measuring is normally distributed when that is not at all clear from their data.

  97. Leif Svalgaard (09:25:01) :
    Leif Svalgaard (08:47:00) :
    Basil (07:02:14) :
    We, of course, basically agree.
    Hey, I had intentionally misspelled ‘basically’ as ‘basilally’. And somehow it got corrected…
    REPLY: No good deed goes unpunished, sorry Leif. – Anthony

    Since you didn’t see it, my pun wasn’t any good to begin with 🙂
    Missed you at ctm’s great party last night. Seven police cruisers were standing by outside [only half a block away] the joint.

  98. Decades ago, Dr W. Edwards Deming, who some call the father of Quality Control, recognized that theoretical statisticians were needed to help ensure companies correctly interpreted quality measurements, assignable causes of variability, the significance of trends, etc.
    It’s my impression climate scientists do not engage the statistical community in the formulation or review of their work. They seem to just plug and play statistical tools. Sorry, that’s what it seems. Correct me if I’m wrong.

  99. Luboš Motl (06:26:43)

    Holy cow, it is a silly article.
    There is a substantial portion of science where work without statistics would be almost impossible – and be sure that you’re hearing this from a person who almost always used “non-statistical” arguments about everything. That people make errors or add their biases or misinterpret findings can’t reduce the importance of statistics. People do mistakes, misinterpretations, and distortions outside statistics, too.
    The notion that statistics itself should be blamed for these human problems or that it is inconsistent because of them is preposterous. Even in the most accurate disciplines, like particle physics, it’s inevitable to work with statistics. It’s a large part of the job. And people usually don’t do flagrant errors because scientists in this discipline don’t suck.
    One can have his opinions about the ideal methodology and/or required confidence level, but dismissing all of statistics is surely about the throwing of the baby out with the bath water.

    Lubos, perhaps we are reading different articles, but there’s nothing in the article that I see that says we should throw out all statistics.
    Instead, it says that most of the time people misuse statistics. I agree with that completely. In climate science in particular, we are dealing with non-normal datasets that have a high Hurst coefficient, which makes most statistics unreliable.
    That’s the main problem I see, that statistics are applied improperly.

  100. PJB (05:55:15) :
    “Having used statistical methods in my research, I always distilled my results down to one aspect of analysis.
    Signal to noise ratio.”
    I have always said that Information Theory ( and Parsing) is the only tool for analysis of data streams. Temperature data is a stream in time and space. So you are absolutely correct. It all comes down to Signal to Noise.
    Last night, after I read about the launch of the latest GOES, I started thinking about some of the stuff I had been involved with in the past. Which got me to thinking about digital communications and temperature anomalies. (Okay so I’m weird.) Without understanding the nature of the noise sources and the noise distribution, any analysis becomes garbage. The simple case of calculating the noise power in a modulated analog data channel with Gaussian noise, is non-trivial and a naive analysis will result in a 2.5 dB bias! Introduce non Gaussian, or heaven forbid, non monotonic noise and the results will be completely meaningless.

  101. Probably one area where the statistics are extremely good is in actuarial work. Insurance companies need to predict the probabilities of various outcomes with some accuracy. These will always have the most up to date thinking behind them as it is the difference between profit and loss. They will try hard to eliminate any bias.
    It would be interesting to know how their premiums are changing on large weather events, such as hurricanes, cold snowbound winters, etc.

  102. Another important thing with statistics is to design the experiment so as to provide the outcome you want.
    One widely reported experiment suggested shoot-em-up games made people more violent and aggressive, but their reactions were measured as soon as they came off the computer, when they would have had a lot of adrenaline in their system, rather than a few hours later when they would be their normal selves.
    Obviously it was widely used as ‘proof’ about video games rather than just an example of poor science.

  103. And let’s not forget Disraeli’s famous comment about statistics:
    “There are three kinds of lies: lies, damned lies and statistics”
    -Attributed to Benjamin Disraeli (1804-81), British statesman and Prime Minister (1868, 1874-80), in:
    Mark Twain (Samuel Langhorne Clemens), U.S. writer and humorist (1835-1910), Autobiography, “Notes on Innocents Abroad”

  104. Evans wrote: “An example of misleading statistics:
    Water vapor is around 1% of the atmosphere, a molecular constituent of air.”
    Leif Svalgaard (09:41:53) replied: “Except that that statement is not an example of a statistics, but just a statement of fact.”
    Has Science observed & measured for H2O at every location possible to confirm that indeed that H2O does “average” out to one percent?
    No, but given the confidence in Science’s understanding of averages under the circumstances, Science understands that, given what is present (physical properties) and operative (based on what we know), specific results will happen and/or conditions will exist.

  105. Statistics is a branch of math, no more and no less! Limitations are due to the choices we make as statisticians.
    Who can argue with the Deming Method? Seems to be well-proven over time. Statistical process control in automation is standard, we cannot manufacture in today’s environment without it.
    However, in terms of scientific studies, I am frustrated by small sample sizes, sampling errors, mistreatment of outlier data, tagging on regressors etc.
    Climate science seems to commit all of these sins, and many more, because it is policy driven, not driven by the need for accurate and replicable results.

  106. @joel
    ‘Statistics is the science of the behavior of numbers.’
    Rather the frequency of numbers, since the numbers themselves never behave. :p
    On the rest you are correct I think. Properly used it is a very good tool.

  107. Another statistics quote (from my degree course ages ago): “The generation of random numbers is far too important to be left to chance.”

  108. I remember a study (with policy implications) that assumed (to avoid intractable mathematics) that once a female is pregnant once, that female is never pregnant again.
    The folks selling this stuff were no slouches. If there had been a mathematically tractable way to make realistic assumptions, I’ve no doubt they would have. The complacent attitude that has taken deep root in mainstream science modeling culture: “We tried – now let’s just go with it”.
    People working in policy need to weed out the stuff that fails in “trying too hard” (unsuccessfully) to appear objective. In its lack of sobriety, the drunken “publish or perish” mill has one reliable function: Water down quality.
    i.i.d. = illusory inconvenience distortion …and they told you “independent identically distributed” ….daily – as the base assumption underpinning (literally) almost everything. “Let X1, X2, X3, … ~ i.i.d. …” — and so the web of mass deception began… (mortgage meltdown, climate alarmism, … – what’s next?…)

  109. You can prove anything you want with statistics.
    For example, Mt. Everest doesn’t exist. Mountain climbers always “cherry pick” their start location to achieve the appearance of an uphill slope. Had they correctly used the longest possible trend (the entire planet) they would have known that the average slope of the earth is zero and that mountains don’t really exist. If you shrank the earth to the size of a pool ball, it would be smoother than a pool ball is.
    That argument sounds and is ridiculous, but is exactly analogous to arguments used by people on all sides of the climate debates. When you hear the term “statistically significant” you can be pretty sure that someone is pulling a fast one.

  110. Completely agree with Luboš Motl (06:26:43), hooolllyyyy cow.
    This article is complete.. eh.. nonsense (..other ‘expressions’ actually came to mind). WUWT should stay sharp.
    If anybody is interested in actual statistics, I encourage you to take a look at this thread here:
    http://ourchangingclimate.wordpress.com/2010/03/01/global-average-temperature-increase-giss-hadcru-and-ncdc-compared/
    Statistics just challenged this:
    http://scholar.google.com/scholar?q=OLS+trend+temperature+climate&hl=en&btnG=Search&as_sdt=2001&as_sdtp=on
    With statistics, just as with any other formal method, if something goes wrong, you usually get the pop-up: “Error, hit any user to continue”. This however does not invalidate this (FORMAL) discipline.
    For a case study exemplifying this, and involving Tamino, see also:
    http://ourchangingclimate.wordpress.com/2010/03/01/global-average-temperature-increase-giss-hadcru-and-ncdc-compared/#comment-1643

  111. I read Luboš Motl’s rebuttal and found that it wound up saying almost the same thing as the “…..stands on a flimsy foundation” article. I think Siegfried’s point is that a large number of scientists place faith in a method that they don’t adequately understand and use properly. The scientific journal editors and peers similarly are weak in understanding and application so they let the stuff go through.
    What has happened in the last half century or so is that the “soft” disciplines, which used to be verbal and anecdotal wanted to move into the glow of being a science. Political Science is the king of these beasts and psychology the queen. The texts of psychology used to be prosaic insights into why people behaved the way they do – similar to detective work.
    To become sciences, they had to find a way to quantify all this jive. Statistics at the 101 level became the tool for quantifying and hey, if you asked a random selection of 1000 citizens who they were going to vote for, you could report with 90%(?) confidence and were most often correct. Tree rings and things are either confounded by several variables in addition to temperature, or the right questions aren’t being “asked”.

  112. James F. Evans (11:24:46) :
    Has Science observed & measured for H2O at every location possible to confirm that indeed that H2O does “average” out to one percent?
    It doesn’t have to. We sample the concentration at enough places that we get a good average number. It is only misguided people who think there is something wrong with science that can misrepresent this. Anyway, your statement is not a statistical inference, so is O/T.

  113. Willis Eschenbach (10:48:02) :
    “Instead, it says that most of the time people misuse statistics. I agree with that completely. In climate science in particular, we are dealing with non-normal datasets that have a high Hurst coefficient, which makes most statistics unreliable.”
    We don’t have a ‘non-normal’ dataset with a high ‘Hurst coefficient’. We have a time series containing a unit root. Nothing ‘strange’ about that, many time series contain one.
    I’ve been debating (more like a ‘war’) this with AGWH-proponents for two weeks now at the above link (take a look at the last one, extensive test results posted).
    Really, to use ‘AGWH’ terminology, this is a complete ‘anti-science’ article.
    Bah. Stay sharp WUWT.

  114. I am 99 percent confident that this thread belongs to the top 2 percent of all posts that I have read in WUWT. Thank you 100 percent of all contributors for making statistics something to read about at least once in a lifetime. 🙂
    There was a long thread about “random walk” and statistics elsewhere in the blogosphere but most of the stuff there went over this layman’s head and the thread got derailed by Tamino’s gargoyles, especially his Gargoyle-in-Chief, dhogaza.
    Anyways… So, I’d like to highlight a particular comment here that finally helped me understand what was being discussed there. Could temperature trends be nothing more than a “random walk”? Interesting question. And I think steveta_uk’s DIY experiment gives an interesting answer. An expanded version of the experiment (which can be done safely at any home) is worthy of a separate post on WUWT.

    steveta_uk (04:31:35) :
    After reading some of the “random walk” posts recently, I thought I’d try a little experiment, which consisted of writing a bit of C code which generated pseudo-temperature records, based on a random +- 0.1 annual deviation from the previous year, centered around 15C, and with a bias factor that made the temperature drift towards 15C if it starts drifting away.
    So this 15-minute job produced 10,000 years of temperature records which I imported into a spreadsheet and drew some pictures.
    It’s basically a boring average close to 15, with lots of apparent noise between 13 and 17C. But zoom in a bit, and you see features like little ice ages, medieval warm periods, “hockey stick” features, and all sorts.
    And apply some of the trend analysis functions to selected parts of the “noise” and it finds all sorts of things.
    And it’s all random.
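
    A minimal sketch of the kind of experiment described above, in Python rather than the original C (which was not posted). The uniform shock of up to +-0.1 per year and the pull factor of 0.01 are assumptions made for illustration; the comment does not give the actual bias factor or the shape of the random step.

```python
import random

def pseudo_temperatures(years=10_000, target=15.0, step=0.1, pull=0.01, seed=0):
    """Mean-reverting pseudo-temperature series (all parameters are illustrative)."""
    rng = random.Random(seed)
    temps = [target]
    for _ in range(years - 1):
        prev = temps[-1]
        shock = rng.uniform(-step, step)   # random annual deviation of up to +-0.1
        drift = pull * (target - prev)     # assumed bias nudging the series back toward 15C
        temps.append(prev + shock + drift)
    return temps

series = pseudo_temperatures()
print(f"mean = {sum(series) / len(series):.2f} C")
print(f"min  = {min(series):.2f} C, max = {max(series):.2f} C")
```

    Plotting any few-hundred-year window of `series` is the quickest way to look for the apparent "features" mentioned above; nothing but the random shocks and the pull toward 15C goes into them.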

  115. Steve Goddard (12:04:19) :
    Mountain climbers always “cherry pick” their start location to achieve the appearance of an uphill slope.
    Like when picking when to start a snow cover slope? Putting numbers to something is better than just eyeballing, no?

  116. What we need is a little lateral thinking here:
    Our resident experts here have said that you can’t use statistics to prove anything.
    But Gordon Brown has been using statistics for 13 years to prove that he’s a crook.
    [OK. Some pedant will tell me it wasn’t the statistics, it was the statist.]

  117. Evans (11:24:46) wrote: “Has Science observed & measured for H2O at every location possible to confirm that indeed that H2O does “average” out to one percent?”
    Leif Svalgaard (12:23:18) replied: “It doesn’t have to.”
    As I suggested before, “no,” [it is correct that Science doesn’t need to measure from every possible location] but it is a “statistical” inference.
    A product of a statistical work-up:
    Based on prime observations & measurements and given Science’s understanding of the relevant physical material and conditions.

  118. Guys, for the 100th time, don’t fall for Tamino’s strawmen.
    Nobody claimed temperatures are a *random walk*. We claim that the series contains a *unit root*. A random walk implies a unit root, a unit root doesn’t imply a random walk.
    The presence of a unit root invalidates most ‘trend analysis’ as performed in climate science, because such analysis implicitly assumes the underlying data generating process to be a ‘trend-stationary’ one. However, extensive (formal) testing (see link) has shown this not to be the case (i.e. the series is non-stationary).
    This is a simple implication of a whole body of literature establishing the presence of unit roots in temperature series.
    Read this comment, by Alex, it’s enlightening:
    http://ourchangingclimate.wordpress.com/2010/03/01/global-average-temperature-increase-giss-hadcru-and-ncdc-compared/#comment-1931
    Again, for a detailed analysis of where Tamino ‘went wrong’, and links to all the relevant test results, see:
    http://ourchangingclimate.wordpress.com/2010/03/01/global-average-temperature-increase-giss-hadcru-and-ncdc-compared/#comment-1643
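
    To illustrate why a unit root matters for naive trend tests, here is a small simulation sketch. It uses a driftless random walk only because that is the simplest unit-root process (as noted above, a unit root does not imply a random walk), and it says nothing about the actual temperature records or the tests in the linked thread. The series length, trial count and the |t| > 2 cut-off are all assumptions chosen for illustration.

```python
import random

def trend_t_stat(y):
    """Naive OLS t-statistic for the slope of y against time (assumes i.i.d. errors)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    rss = sum((v - intercept - slope * t) ** 2 for t, v in enumerate(y))
    return slope / (rss / (n - 2) / sxx) ** 0.5

def false_trend_rate(unit_root, n=100, trials=500, seed=0):
    """Share of trend-free simulated series whose naive |t| exceeds 2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        shocks = [rng.gauss(0, 1) for _ in range(n)]
        if unit_root:
            level, y = 0.0, []
            for s in shocks:       # y[t] = y[t-1] + shock: a driftless random walk
                level += s
                y.append(level)
        else:
            y = shocks             # stationary white noise around a constant
        hits += abs(trend_t_stat(y)) > 2
    return hits / trials

print("white noise:", false_trend_rate(unit_root=False))
print("unit root  :", false_trend_rate(unit_root=True))
```

    Neither kind of simulated series contains a deterministic trend, so any "significant" slope is a false alarm; the comparison of the two rates is the point of the sketch.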

  119. @Steve Goddard
    ‘You can prove anything you want with statistics.’
    I say people only think that they can, and usually because they don’t really understand.
    Although I don’t think I really need to explain this to you, then again it is a Saturday. A statistical representation of reality is, and will always only be, the complete raw plot of everything in its own context. So, the reality of the last hundred years’ temperatures is only accurately shown by plotting all temperature readings for that period that were taken at the same hour, on the hour, every day of every year for the last hundred years. The reality that is reconstructed is only for one specific hour on the hour though, no more and no less. Every calculation done on that data is mostly bulls*** unless otherwise accurately stated; most importantly, the calculations won’t ever describe reality, other than on the odd random occasion.

  120. VS (12:25:42) : edit

    We don’t have a ‘non-normal’ dataset with a high ‘Hurst coefficient’.

    Say what? The Hurst coefficients of the global temperature datasets are on the order of 0.8 … how is that not high?

  121. James F. Evans (12:42:19) :
    A product of a statistical work-up:
    Based on prime observations & measurements and given Science’s understanding of the relevant physical material and conditions.

    And is thus entirely valid and the best that humankind can and need to do, right. An example of good use of statistics, in your opinion.
    But you are wrong. It is not a statistical inference, it is a simple coverage-weighted average of the individual measurements. There is no hypothesis to be validated or disproved, etc.

  122. Willis Eschenbach (12:56:01) :
    The Hurst coefficient is not applicable here because the series has a unit root. In other words, it is non-stationary.
    The Hurst coefficient is relevant for stationary series, as is ‘trend inference’.
    I reiterate: the temperature series are NOT trend stationary with a high persistence. It is integrated, or I(1), and contains a unit root. Under the link given above, you will find the test results to support that assertion.
    You seem like a devoted and intelligent individual and I enjoyed many of your earlier posts, so I strongly suggest you read the thread in question (thousands of words, many trolls, but you can start with my links given above).
    Most of the statistics employed in climate science (see scholar link above) is in fact obsolete in the presence of a unit root (which we have established, again, extensive test results posted under link).

  123. Statisticians have a sacred thing called “maximum likelihood”. This thing leads to things like the global financial meltdown when it goes wrong.
    Everything rests on the “i.i.d.” assumption. Get that wrong and here is what happens due to the way the calculations go:
    WRONG MULTIPLIED BY ITSELF N TIMES – i.e. WRONG^N
    Confronted with this observation, a quick-witted statistician countered with, “Well, once you take logs it’s only additive.” He was referring to what statisticians call “log-likelihood” — it’s a log that gets optimized. So his defense was that when the “i.i.d.” assumption fails (which is literally almost all the time in many fields of study), results are only n-times wrong, rather than wrong^n.
    I remember a horrified look on another statistician’s face when I suggested that this practice should not be used where the “i.i.d.” assumption is untenable. Why the look of such horror? A whole paradigm is built on the assumption. It is therefore (in the minds of some) not open to challenge.
    Note to academic statisticians:
    I appreciate the elegant abstract mathematics, but do honesty, integrity, & reality mean anything to you folks? Or is it all about mathematical convenience?
    All that’s on offer to ecologists, physical geographers, economists, climate scientists, etc. through official academic channels (as statistics “outreach” to the wider community) is methods based on untenable assumptions. There’s no benefit in looking to the wrong people for the wrong methods.
    One could spend several lifetimes buried in endless literature that has no application in reality (due to untenable assumptions). I wouldn’t recommend lifting a finger to study such methods formally (but I can see the appeal for those looking to join a smirking guild of deception).

  124. VS (12:44:28) :
    Guys, for the 100th time, don’t fall for Tamino’s strawmen.
    Nobody claimed temperatures are a *random walk*. We claim that the series contains a *unit root*. A random walk implies a unit root, a unit root doesn’t imply a random walk.

    Well, I guess I am one of those guys, and I’m happy to stand corrected. Great to see VS discussing the issue on the WUWT.
    Off topic: Bishop Hill has just reported that the Geological Society is seeking submissions from its members to prepare a position statement with regard to climate change. This was a long time coming. Geologists are finally on the march!

  125. VS (13:14:45)

    Willis Eschenbach (12:56:01) :
    The Hurst coefficient is not applicable here because the series has a unit root. In other words, it is non-stationary.
    The Hurst coefficient is relevant for stationary series, as is ‘trend inference’.

    Thanks, VS, I’ll read the thread in question.

  126. I agree with what Lubos Motl says.
    I worked on an experiment where we “saw” a Higgs at the 2 sigma (95.4499736%) level. You have not heard of it, because three other experiments saw nothing, so only a limit was set.
    In particle physics, to establish a resonance/particle we required 4 sigma (99.993666%).
    One got excited with 3 sigma effects (99.7300204%), but I have seen 4 sigma effects that were not reproduced, because of too many cuts on the data.
    In the end repetition of experiments is what is crucial.
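
    The percentages quoted for 2, 3 and 4 sigma are the two-sided coverage of a Gaussian distribution. A short sketch of how they can be reproduced, assuming Gaussian errors (the function name is made up for this illustration):

```python
import math

def two_sided_coverage(k):
    """Probability that a Gaussian value lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (2, 3, 4, 5):
    inside = two_sided_coverage(k)
    print(f"{k} sigma: {100 * inside:.7f}% inside, {1 - inside:.3e} outside")
```

    These figures only describe the chance of a fluctuation in a single, pre-specified test; they do not account for the "too many cuts on the data" problem mentioned above.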

  127. yep,
    Climate scientists have convinced a lot of people that CO2 is causing changes that can only be detected by teasing a tiny, tiny signal out from very noisy data with arcane statistical techniques, and that once that tiny signal is detected we can conclude that we must restructure the world’s economies to stave off catastrophe. If we wait ’till we can see the damage (sans arcane statistics) it will be too late.
    Seems to me that statistics are the most powerful tool ever employed by scientists.

  128. VS
    I very much enjoyed reading the thread as it unfolded over many days. As you say, there were many trolls, one of which pops up in all sorts of places but mainly on the Guardian stories supporting George Monbiot. As far as I am aware neither Dr Mann nor Jones is particularly statistically literate, which causes problems as their subjects demand a high level in that skill.
    tonyb

  129. anna v (13:36:37) :
    I agree with what Lubos Motl says.
    Me too. No need to throw the baby out with the bathwater.
    Mike Mann, for example, not using statistics correctly does not mean statistics cannot be used correctly.

  130. Statistics: With one foot in a bucket of ice and one in a hot frying pan….well, on average I feel pretty good!

  131. No mention so far that p-tests control the probability of Type I errors (rejecting a valid null) while failing to control the probability of Type II errors (failing to reject an invalid null). These latter can get very large for small sample sizes for even small deviations into the range of the alternative to the null.
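
    A rough sketch of the Type II point, using a one-sided z-test with known unit variance as a stand-in (the comment does not name a specific test, so the test, the 0.2 standard deviation effect and the sample sizes are all assumptions for illustration):

```python
from statistics import NormalDist

def type_ii_error(effect, n, alpha=0.05):
    """Type II error rate of a one-sided z-test with known unit variance.

    effect: true shift of the mean away from the null, in standard deviations.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # rejection threshold fixed by alpha
    return std_normal.cdf(z_crit - effect * n ** 0.5)

for n in (5, 10, 50, 200):
    print(f"n={n:>3}: alpha=0.05, beta={type_ii_error(0.2, n):.2f}")
```

    The Type I rate stays pinned at alpha by construction, while beta depends on the sample size and on how far the truth lies from the null.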

  132. “There is increasing concern,” declared epidemiologist John Ioannidis in a highly cited 2005 paper in PLoS Medicine, “that in modern research, false findings may be the majority or even the vast majority of published research claims.”

    It seems Siegfried bases his conclusions mostly on retrospective studies where one must assume inclusion criteria fit the objectives of the study. In such cases, it is not always the statistical analysis that produces a false result but rather the working assumption that the data is applicable to the degree necessary to avoid false conclusions. In clinical studies, patient enrollment is better controlled to the study’s protocol, making the data quality, validity, and end statistical analysis far more reliable.
    As a general rule, doctors and researchers pay less attention to retrospective studies as compared to clinical studies because the statistical problems of retrospective studies are well understood. There is a growing trend in all areas of medical research to require a qualified statistician to at least check the end analysis of a study submitted for publication. Indeed many IRBs (institutional review boards) require a study to include analysis by a qualified statistician to be approved.
    Giving considerable thought to Siegfried’s implications, I’m hard pressed to imagine a better objective means to qualify the findings in research analysis outside of the traditional statistical tests. In fairness to his points, it is important that the tests be oriented towards disproving the null hypothesis as opposed to proving the working hypothesis. Disproving the null hypothesis is more robust and reliable. Tests to prove the working hypothesis are far more prone to committing Type I errors (rejecting the null hypothesis when the null hypothesis is true). I believe that is where many studies contain serious statistical flaws. I see a lot of studies bent on proving the working hypothesis in climate science, which is why I remain skeptical of their findings.
    Statistics remains the workhorse in formulating study conclusions, unless, of course, you’re into post modern science. In such a case, consensus will be the method of decision. Comparing consensus view of treatment practices vs. study findings I’ve personally worked on, I’ve seen too many times that the consensus view was wrong. I recently co-authored a published abstract that showed conclusively that the consensus wisdom of treating for intra-amniotic hyperechoic matter using antibiotics is not only unwarranted but exposes the patient and fetus to undue risks.

  133. James F. Evans (14:17:21) :
    “-weighted average”
    …is a statistical term…

    Wrong kind of statistics, we were talking about inferences, not description. There comes a time, when you should stop digging. Your example was not wrong use of statistics.

  134. Leif Svalgaard (13:01:34) wrote: “There is no hypothesis to be validated or disproved, etc.”
    Yes, there is, “no hypothesis to be validated or disproved, etc.”, rather, there is a law of physics to be applied, a theory, and all theories started off as hypotheses.

  135. Here’s a nice topical example of “the beauty of statistics.”
    Approximately 2,420,000 Americans die each year. Approximately 18,000 uninsured Americans die each year.
    Therefore, you are 133.44 times more likely to die if you are insured than if you are uninsured.
    We do not need universal healthcare. We need to ban insurance altogether.
    Silly you say? Consider Great Britain where there is universal health insurance. 100% of the population who die do so while insured. Think of the lives that could be saved if G.B. insured only half the population.
    I rest my case.
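
    For anyone checking the arithmetic behind the joke, a small sketch: the 133.44 figure is a ratio of death counts, not of death rates. Only the two numbers quoted above are used; the group sizes needed for a per-capita comparison are not given in the comment, so they are left as parameters of a hypothetical helper.

```python
total_deaths = 2_420_000     # annual US deaths, as quoted in the comment
uninsured_deaths = 18_000    # annual deaths among the uninsured, as quoted
insured_deaths = total_deaths - uninsured_deaths

count_ratio = insured_deaths / uninsured_deaths
print(f"insured/uninsured death counts: {count_ratio:.2f}")

def rate_ratio(insured_population, uninsured_population):
    """Ratio of per-capita death rates; population sizes must be supplied by the caller."""
    insured_rate = insured_deaths / insured_population
    uninsured_rate = uninsured_deaths / uninsured_population
    return insured_rate / uninsured_rate
```

    The count ratio only equals the rate ratio if the two groups are the same size, which is the step the joke quietly skips.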

  136. steveta_uk (04:31:35) :
    After reading some of the “random walk” posts recently, I thought I’d try a little experiment, which consisted of writing a bit of C code which generated pseudo-temperature records
    You might enjoy Dr. Spencer’s two term GCM spoken of in http://www.youtube.com/watch?v=xos49g1sdzo
    He calls it a “Minimalist’s Global Climate Model”. It works more properly than a mere random walk and he gets into white noise, red noise, and the pink noise he uses.

  137. Leif Svalgaard (09:25:01) :
    Leif Svalgaard (08:47:00) :
    Basil (07:02:14) :
    We, of course, basically agree.
    Hey, I had intentionally misspelled ‘basically’ as ‘basilally’. And somehow it got corrected…
    REPLY: No good deed goes unpunished, sorry Leif. – Anthony

    🙂
    But Anthony didn’t correct Leif’s “I am one off…” and I’m not sure if this was intended or not. 🙂 I think Leif meant “I am one of…”
    But Leif is more of a “one off” scientist than I think he realizes. I’m glad the hundreds of scientists he knows are not among those who abuse statistics. But how should the thousands of scientists who saw their papers warped into the IPCC “Treatment of Uncertainty” feel about the “>66% = likely” nonsense? And weren’t the people who created this “Treatment of Uncertainty” supposedly “scientists?”

  138. James F. Evans (15:09:08) :
    rather, there is a law of physics to be applied, a theory, and all theories started off as hypotheses.
    You are rambling, spare us and stop digging.

  139. jdn: This Science News article is an example of people speaking about statistics who don’t know what they’re doing or who are misrepresenting the field. Nobody uses p = .05 anymore unless they have really rare data. We like to see p = 0.01 or 0.001.
    Yes, my BS alarm went red when I read that.
    I still think the article raises important points, though, and it’s highly relevant to WUWT. As far as I’ve been able to tell, a lot of the work in climatology violates two principles of statistics: 1) Samples should be random (the tree-ring guys seem to use subjective judgments of data sets instead of choosing randomly from them) and 2) You can’t reuse data (As far as I can tell, they run tests on the same data sets with which they build their models with exploratory methods like e.g. PCA. You can’t do that, in order to test your model you need a new randomly sampled data set from the same population).

    Nothing is more ridiculous than the use of statistics to claim the probability of global warming. There is no scientific basis for claiming statistics can help predict the amount of warming or anything else based on some previous period of temperature data. How would the climate scientists of the cold 1700s have done, using temperature data from their period and statistics to predict temperatures in the 1800s?
    And as far as computing a global average temperature using sample data sets that are then “homogenized”; how can the larger scientific community not call this out for the pseudo science that it is?

  141. Basil (15:50:07) :
    I think Leif meant “I am one of…”
    Yeah, yeah, ..
    But how should the thousands of scientists who saw their papers warped into the IPCC “Treatment of Uncertainty” feel about the “>66% = likely” nonsense?
    There is probably a disconnect here. >66% is ‘likely’, but scientists take that with several grains of salt anyway, and do not think the word is as ‘strong’ as Joe Public does. For example, in a court of law and in dealings with the IRS, the phrase ‘more likely than not’ is taken to mean ‘indicative’ rather than ‘probable’, e.g. http://www.pwc.com/en_US/us/tax-compliance-services/assets/fin_48_tax_penalty_standard.pdf
    And weren’t the people who created this “Treatment of Uncertainty” supposedly “scientists?”
    see above.

  142. Anthony,
    In the lead paragraph you mention, in summarizing the article that: “2- It pulls no punches in pointing out an over-reliance on statistical methods can produce competing results from the same base data. ”
    Is your point to stress that competition is positive or is your point to stress that it is negative?
    I think it can only be positive.
    John

  143. Someone in this thread said something about “the Gaussian assumption”. I suggest you go back and read about the Central Limit Theorem.
    I have always felt that statistics is one of the most extraordinary achievements of the human mind. But using it well takes a lot of experience and effort. (And it is certainly not a guarantee to _scientific_ significance – in fact, some of the greatest experimental scientists, such as N. Tinbergen or C. S. Sherrington, used little or no statistics in their work.)
    Someone also said something about only accepting results beyond 4 sigma – but what was the sample size? The 0.05 criterion is reasonable, but not if the sample size is so large that even small deviations from expectation are unlikely.
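
    One reading of the sample-size point, sketched with a two-sided z-test: a fixed, tiny deviation from expectation (0.01 standard deviations here, an arbitrary illustrative value) moves from unremarkable to wildly "significant" purely because n grows.

```python
import math

def two_sided_p(deviation, n):
    """Two-sided z-test p-value for a mean that is `deviation` standard deviations off expectation."""
    z = abs(deviation) * math.sqrt(n)
    return math.erfc(z / math.sqrt(2))   # erfc keeps precision in the far tail

for n in (100, 10_000, 1_000_000):
    print(f"n={n:>9,}: p = {two_sided_p(0.01, n):.3g}")
```

    Whether a deviation that small matters scientifically is a separate question from whether it clears 0.05.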

  144. VS, I have now slogged through the entire post you cited, most fascinating.
    My only remaining question is this. Hurst derived this theory and his “Hurst statistic” from a climate dataset, the flow of the Nile. You say that Hurst statistics don’t apply to climate datasets … what am I missing here?
    I also found an interesting paper here about unit roots and the Hurst statistic that I would love to get your comments on if you have time.

  145. Evans (14:17:21) wrote: “-weighted average”
    Leif Svalgaard (15:07:54) replied: “Wrong kind of statistics…”
    Hey, you’re the one using a statistical term. I can’t help it if you use a term and then turn around and say, “wrong kind of statistics…”
    Sounds like you’re shuffling the terms of debate.
    Evans (15:09:08) wrote: “rather, there is a law of physics to be applied, a theory, and all theories started off as hypotheses.”
    Leif Svalgaard (16:05:27) replied: “You are rambling…”
    Sorry, not rambling at all.
    Dr. Svalgaard previously claimed, “There is no hypothesis to be validated or disproved, etc.”, which as I previously stated is correct, and it is so because originally the hypothesis was demonstrated via experiment so many times and in many different experiments that it is deemed a Theory, a set of physical relationships that is so well established that it is deemed a ‘physical law’ or part of what is deemed a sub-set of a physical law.
    Silly Dr. Svalgaard, so determined to disagree, he can’t even recognize when I’m acknowledging that statistical analysis can be applied to physical relationships to derive further understanding.
    The point others and I have made here on this thread about statistics, “There are three kinds of lies: lies, damned lies and statistics”, is that statistics are subject to improper use, application, and interpretation, that is to say, abuse or improper use, whether intentional or unintentional, or they may not convey any useful information.
    Like any powerful tool, and statistics is a tool, as is mathematics, it can be used for profit and understanding or misused for confusion and loss.
    This is true for any tool and statistics is no different.
    Which way do you want it, Dr. Svalgaard, statistics are a useful servant to physics or…what?
    But let’s not obscure my original point (09:30:31): “An example of misleading statistics: Water vapor is around 1% of the atmosphere, a molecular constituent of air. But the 1% figure is misleading because it is an ‘average’ of the entire volume of air in the atmosphere.”
    Of course, the 1% figure for water vapor in the atmosphere is a data point, often referred to as a statistic, perhaps colloquially, but still often used; as I stated above, the 1% figure holds limited utility for the scientist.
    Statistics are often a breakdown of a larger whole into its constituent parts and proportions…hey, that’s what a percentage is, a proportion or part of a whole.
    And, particular mathematical equations can be applied to derive the percentage based on known physical relationships.
    Which side of the argument are you on Dr. Svalgaard?

  146. For a continuous education in statistics follow the link in the sidebar to William Briggs’s website.
    http://wmbriggs.com/blog/
    A recent entry discussed parameters in models and how one can sometimes compute the confidence interval for one of the parameters to show that the value is correct to a small margin of error, GIVEN that the model is correct. The message was of course that statistics said nothing about the correctness of the model.
    For example, I made a Mayan climate model that is exactly the same as the best computer model existing but all outputs are multiplied by zero after 2012. It will match all statistical tests until then so one cannot use statistics to show my false model is any worse than the existing model.
    John von Neumann, “With four parameters I can fit an elephant and with five I can make him wiggle his trunk.”
    With more than five parameters the modelers still cannot get rid of the tropical hot spot from their outputs. Then some argue about the data.

  147. rw (17:23:59) :
    Someone in this thread said something about “the Gaussian Someone also said something about only accepting results beyond 4 sigma – but what was the sample size?
    There is some confusion between inductive statistics and descriptive statistics. The 5-sigma physicist-criteria is almost always about descriptive, not inductive, statistics. One measures a ‘blip’ in the counting rate and wonders if the blip rises above the noise. This is very different from trying to tease a trend out of the data.
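    As a rough illustration of the "blip above the noise" test described above, here is a small Python sketch. It assumes Poisson counting noise and uses the Gaussian approximation (sigma is the square root of the expected background); the counts and function names are hypothetical.

```python
import math

def blip_significance(observed, expected_background):
    """Excess over background in units of sigma (Gaussian approximation
    to Poisson counting noise, so sigma = sqrt(expected_background))."""
    return (observed - expected_background) / math.sqrt(expected_background)

def one_sided_tail(z):
    """Probability that pure Gaussian noise fluctuates up by z sigma or more."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical counts: 10,000 expected from background, 10,500 observed.
z = blip_significance(10500, 10000)
p = one_sided_tail(z)
print(f"excess = {z:.1f} sigma, one-sided tail probability = {p:.2e}")
```

    Note that the sample size enters through the expected count itself: the same 5% excess on a background of 100 counts would be only half a sigma.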

  148. juandos (03:58:07) :
    “Hmmm, regardless of the numbers and quality of data sets how can one model something like climate if all the subtleties aren’t understood?
    I’m sorry for asking what might be a seriously dumb question but I keep tripping over the butterfly effect…”

    So do the climate models!
    ‘Climate’ is driven by deterministic chaos and is in constant change at all spatial and temporal scales. Trends have no meaning in non-linear systems, and can easily be cherry-picked to support or refute any hypothesis tabled. The current style of pseudo-statistical science is a travesty.

  149. VS,
    Thank you for your thread participation over at Bart’s “http://ourchangingclimate.wordpress.com/2010/03/01/global-average-temperature-increase-giss-hadcru-and-ncdc-compared”
    I went through it a few days ago [repeatedly].
    You made me want to be a statistician. I never had that desire before.
    New subject: Can you provide guidance on the standard [frequentist] vs the Bayesian approaches. Is there a fundamental difference, or is it just a difference in emphasis?
    John

  150. VS, a question if I may.
    If a temperature series indeed contains a unit root, then the temperature series must diverge to infinity, and since it cannot go to – infinity due to the laws of thermodynamics, it must go to positive infinity,
    therefore global warming is caused by statistics,
    the following is from the Wikipedia article on unit roots:
    As noted above, a unit root process has a variance that depends on t, and diverges to infinity

  151. Luboš,
    In your ‘rebuttal’ you said,
    “And quite often, your data simply don’t contain enough information to decide. This is not a bug that you should blame on the statistical method. The statistical method is innocent. It is telling you the truth and the truth is that we don’t know. The laymen may often be scared by the idea that we don’t know something – and they often prefer fake and wrong knowledge over admitting that we don’t know – but it’s their illness, their inability to live with what the actual science is telling us (or not telling us, in this case), not a bug of the statistical method.”
    Luboš,
    Good stuff.
    A person who fully faces reality and the many uncertainties, yet does not seek the false security of comfortable nonreality theories/beliefs . . . . that is a special human.
    Your essay is not a rebuttal, it has a stand alone merit. Just edit it slightly to remove the ‘rebuttal’ part.
    Please see if you can do your ‘rebuttal’ post at WUWT by itself, rather than just as a side note to this Tom Siegfried post.
    John

  152. Espen (16:05:52) :
    >I still think the article raises important points, though, and it’s highly relevant to WUWT.
    I liked the topic, but, I felt the authors had an agenda…. and were writing badly in Science News 🙂
    Here’s a nice article if you can see it: http://jrsm.rsmjournals.com/cgi/content/abstract/101/10/507
    They had reviewers examine fake articles in medical research with deliberate errors to see how many of the errors were caught. As you might imagine, the results were not good. Such exercises would be worth doing elsewhere. The climate journals can’t be trusted to self-police. It would have to be done without them knowing they were being tested.

  153. As I see it, statistics have to be applied to (and may only properly be applied to) true indeterminate errors of measurements (every measurement excepting true counting involves an indeterminate error) – that is the only way hypotheses can be tested, when the true indeterminate errors of measurements fall within known deviations of means of measurements.
    Statistics are improperly applied to establishing limits of confidence intervals – when those confidence intervals can be adjusted evidently at will by assumptions made to fit some hypothesis, neglecting assumptions consistent with the hypothesis but altering the confidence intervals over which the hypothesis is taken to be valid

  154. VS (13:37:05) :
    I have read the links you pointed to on series that have a unit root.
    Would I correctly understand a unit root series in this thought example: if I view a time series of the average global temperature over the last 5,000 years as having a slope of zero (no trend), then any offsets caused by processes such as additional CO2 do not create a trend but instead simply move the base up as a one-time permanent offset, creating a new base for the zero-slope linear regression? In other words, after a permanent step move the current regression is unaffected and the slope is still zero, but the regression is now split into two pieces at the time the offset occurred.
    Of course in reality that would have to include the logarithmic nature of CO2’s concentration, the rate of its increase, the sensitivity of the climate system, and CO2’s effect on temperatures itself. But every month there might be a ~0.02 degC permanent step, decreasing as the years go by if CO2 concentrations continue to increase in logarithmic style, but any slope (if there is any slope) would not change from CO2 influence alone.
    Does that type of interpretation of a unit root fit correctly in rough terms? Does someone actually know how to mathematically state this type of example in statistics?

  155. Since you didn’t see it, my pun wasn’t any good to begin with 🙂
    Missed you at ctm’s great party last night. Seven police cruisers were standing by outside [only half a block away] the joint.
    It was great to see you Leif!

  156. VS (13:37:05) :
    Correction:
    I should have not put any explicit rate in my example, the ~0.02/mo was more ~0.02/cy/mo and that is just grabbing an example figure. Just make it read “some positive amount”.

  157. VS (12:25:42) :
    Debating on their turf is always problematic. You should guest post on CA.
    You’ll have more commenters ( UC. Roman, stevemc, hu jeanS) that can actually add to the discussion.

  158. James F. Evans (18:19:24) :
    Which side of the argument are you on Dr. Svalgaard?
    On the right side, of course. What else?
    But I shall not bother with you this time. Ramble on.

  159. There are damned few places online where I’m not the smartest guy in the room. This is, by jeez, one of ’em. It seems to me that if Anthony Watts’ Web site is to be the basis of judgment, the “deniers” are the best educated, most knowledgeable people in the whole AGW discussion.
    I’m wondering if anybody has given thought in this discussion of statistical analysis to the compounding effects of instrumental limits of accuracy upon the datasets being subjected to statistical evaluation in climate research.
    Like some of the other folks commenting here, I’m a physician, and one of the things that was pounded into my skull back when I was young and had hair was that information cannot be relied upon unless you keep always conscious of the degrees to which your measurements can be affected by errors.
    I’ve been paying attention to the “global warming” brouhaha for more than thirty years, and from the outset I’ve wondered about the ways in which the alarmists’ contentions appear to have been invalidated by uncertainties in the ways that measurements have been taken, and how these uncertainties have been addressed.
    Never struck me as having been taken seriously into consideration, and you’d think that with all the statistical hand-waving they’ve been doing over the decades, some consideration of that should have crept in.
    I’d welcome comments along that line. Thanks.

  160. Max Hugoson (06:43:37) :
    A little bit of fun here: Years ago I was listening to NPR (yes, I still do occasionally, but with a VERY jaundiced eye/ear these days! I’ve learned…how completely left biased they are) and they gave a report on the attempts in N.J. to raise the “standard tests scores” for high school graduates.
    The announcer said this: “Despite 4 years of efforts, 50% of all students still fall below the mean on the standard tests…”
    I almost drove off the road (I was in my car) I laughed so hard.
    My thought: “We know which side of the MEAN this announcer fell on…”
    Remember: 78% of all statistics are actually made up on the spot. (Like that one.)
    ====
    Did he mean median? That would be funnier.
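    For anyone wondering about the mean versus median distinction in the joke, a short Python sketch with made-up test scores (the numbers are purely hypothetical) shows that half the students always fall below the median, while the share below the mean depends on how skewed the scores are.

```python
import statistics

# Hypothetical test scores with one very low outlier dragging the mean down.
scores = [0, 70, 75, 80, 85, 90, 95, 100]

mean = statistics.mean(scores)
median = statistics.median(scores)

frac_below_mean = sum(s < mean for s in scores) / len(scores)
frac_below_median = sum(s < median for s in scores) / len(scores)

print(f"mean = {mean}, fraction below mean = {frac_below_mean}")
print(f"median = {median}, fraction below median = {frac_below_median}")
```

    For a symmetric distribution the two coincide, which is why "50% fall below the mean" is unremarkable for roughly bell-shaped scores and guaranteed for the median.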

  161. Rob H (16:15:59) :
    Nothing is more ridiculous than the use of statistics to claim the probability of global warming. There is no scientific basis for claiming statistics can help predict the amount of warming or anything else based on some previous period of temperature data…..
    ======
    You take the no-change extrapolation, and I’ll take the predicted warming.

  162. Wren (23:51:02) :
    You take the no-change extrapolation, and I’ll take the predicted warming.
    And I’ll sit back and adapt to whatever transpires.
    I predict that it’ll get warmer over here in July than it was in February…

  163. The article touches upon the philosophy of science. The frequentists vs bayesians, and deductive reasoning proponents vs inductive reasoning, realists vs non-realists.
    The Bayesians and Inductive reasoning proponents have introduced more subjectivity and untested assumptions to science. Bayesian statistics was pretty much discredited before WW II, but has made a comeback in the late 20th century. In order to handle the uncertainty of science, scientific inference has become reliant on probability, perhaps too reliant. The definition of probability is of course at the heart of the debate.
    It is not statistics that is the problem; the problem is those using statistics to test hypotheses that depend on assumptions which are not true, and who, despite knowing this, pretend to a certainty in their hypothesis, or probability if you like, that is not warranted. Yes, the argument and statistics used may be valid, yet the conclusion reached is only as good as the assumptions, which are not always supported by scientific data, or a good understanding of the science.
    For example, in climate science, whatever is not understood (precipitation efficiency, cosmic rays), or where data is not available (historical cloud cover, TSI), it is ignored, the assumption being that it is not important; or estimations by estimators whose uncertainty is large or even unknown- the assumption being the estimator is accurate enough. False assumptions tend to lead to false conclusions, despite a valid argument, supported by the statistics.
    Science is far removed from reliance on experiments in a lab. Einstein never performed an experiment and had to outsource his math. He eliminated the Aether with a mental eraser, claiming it was not needed to support his theory of special relativity. He backtracked on this a bit in his general theory of relativity, saying the special theory of relativity did not deny the Aether, and in regard to the general theory accepting there may be a different kind of Aether, or as he put it “another thing [in the vacuum], which is not perceptible, [that] must be looked upon as real, to enable acceleration or rotation to be looked upon as something real”.
    In Quantum Mechanics they are now looking at the possibility of a kind of Aether like particle, called the quantum vacuum, which is thought of as a seething froth of real particle-virtual particle pairs going in and out of existence continuously and very rapidly.
    So much of science is in considering things which can not be directly observed. Nobody has ever seen an electron or even an atom, and we seek to explain what happened in the past 100 years, to 4 billion years, to the very beginning (Big Bang). We don’t really know what gravity is, just what it does. If you look at the earth as an apple, we are only guessing as to what is under the skin of the apple, having explored so little of it. Perhaps 10 thousand years from now people will laugh at how little we know, while thinking we know so much. Much like a physicist in the late 19th century claiming the greatest discoveries have all been made.
    Generally though, however you go about it, the test of a scientific theory backed up by statistics or not is can it do more than fit the known observational data and predict that which is unknown (eg the global temperature in 10 years).
    And even this does not prove the theory is true (just useful). For example the Aether theory was considered proven by French physicist Auguste Fresnel in the 19th century when he developed a mathematical version of the theory which was used to predict new optical phenomena. Optical experiments confirmed his predictions, yet Einstein and modern physicists claim there is no Aether, and the experiments disproving the Aether, whatever form it might take, are not as convincing as they would like us to think.

  164. VS:
    If you read this before answering my question above, forget it! I picked up a reference in the last few comments of Charles’ party post leading to BishopHill, which led to some external sites where you have laid out enough links on “unit root” testing to last me at least a week. Thanks for the clarity!
    Here’s a hint, Dr. Spencer described a bounded method using a proper feedback parameter common to climate science GCMs which creates your bounded “random walk” alternate. Check it out.

    Steve:
    Willis:
    You should follow that path too. There is a lot of stat info there. And Willis, I followed your method to detect discontinuities using Excel under “Tale of two cities”, it works great! Still don’t understand how the residual sums and the math beneath create such of a graph, but, it does work fine.

  165. MikeE (10:05:22) Wrote: “My pet concern, I have seen it many times in biology/biochemistry, is when people assume the thing they are measuring is normally distributed when that is not at all clear from their data.”
    The selfsame assumption is made in the stockmarkets. Benoit Mandelbrot demonstrated this in his book ‘The (mis)Behaviour of Markets’. He related the tale of a bunch of wheeler-dealers who invested vast sums on the assumption of normal distributions, made millions, lost more millions when the assumption let them down, and had to be bailed out to prevent a crash (before the recent crash). Movements in chaotic systems are not – repeat not – Gaussian.

  166. pft (00:27:50): “Much like a physicist in the late 19th century claiming the greatest discoveries have all been made.”
    Hah! I knew that ‘the science is settled’ had a familiar ring! I wonder if somebody asked Einstein, ‘Why should I give you my data? You’ll just try to find something wrong with it!’

  167. As usual Luboš Motl is right. Measurement is useless without an estimate of confidence intervals and that is gained by statistical analysis.
    When your doctor gets the results of your blood test, all of the results are accompanied by upper and lower values equal to one standard deviation above and below the mean value observed in the general population. This is how she applies science to diagnosing your ailment.
    For the statistical approach to data analysis to work properly, the doctor or scientist must be willing to reject or accept the hypothesis being tested. This is difficult to do when the financial and career consequences are great.
    Professor Wegman’s criticism of Michael Mann’s use of statistics was that he did not apply the statistical techniques properly. Wegman confirmed the claim by McIntyre and McKitrick that the technique used actually mined the data to generate the “hockey stick”.

  168. @ John Whitman (19:09:00) :

    New subject: Can you provide guidance on the standard [frequentist] vs the Bayesian approaches. Is there a fundamental difference, or is it just a difference in emphasis?
    John

    Edwin Jaynes, “Probability Theory: The Logic of Science” (partial manuscript): http://omega.albany.edu:8008/JaynesBook.html (full book also available at Amazon).
    And a large number of statistical papers by various authors on both general and specific applications — http://bayes.wustl.edu/
    Enjoy.

  169. I think the major problem here is with the social sciences, especially medical studies. A p value of 1% means that if the character you are measuring happens randomly, you’ll get an event as rare as that or rarer 1% of the time.
    Then consider all the factors that can affect the medical testing of a food, drug, etc.: age, sex, weight, frequency of use. Combining a bunch of factors you can easily get 1% results like,
    “Women over 40 drinking over 5 cups of coffee per day reduce their chance of heart attacks by 1/3. This would happen less than 0.5% of the time by chance alone, so the results are significant.”
    In actuality, the first test is really just a “fishing” expedition. Once the results are in, the next step is to run an independent test on the same factors. If you still get that 1% result, there may be something to the test.
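    The arithmetic behind the "fishing expedition" worry can be sketched in Python. The subgroup counts below are invented for illustration, and the calculation assumes the subgroup tests are independent and that there is no real effect anywhere.

```python
def prob_at_least_one_false_positive(alpha, n_tests):
    """Chance that at least one of n independent tests of a true null
    hypothesis comes out 'significant' at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

# Hypothetical breakdown: 4 age bands x 2 sexes x 3 weight classes
# x 4 levels of coffee consumption = 96 subgroups to look at.
n_subgroups = 4 * 2 * 3 * 4
alpha = 0.01

p_any = prob_at_least_one_false_positive(alpha, n_subgroups)
print(f"{n_subgroups} subgroups at alpha={alpha}: "
      f"P(at least one false positive) = {p_any:.2f}")

# An independent follow-up on the ONE subgroup that stood out is a single
# test again, so under the null it passes by chance only alpha of the time.
print(f"P(same subgroup passes a fresh test by chance) = {alpha}")
```

    This is why the independent second test matters: the first pass has dozens of chances to produce a 1% result, the replication has only one.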

  170. steveta_uk (04:31:35) :
    After reading some of the “random walk” posts recently, I thought I’d try a little experiment, which consisted of writing a bit of C code which generated pseudo-temperature records, …

    As have I, repeatedly. I also have access to huge quantities (GBytes) of raw process data.
    I can make up random numbers, use actual process data, induce both negative and positive forcings, examine process data with harmful forcings, and various feedback mechanisms. With the process data, a lot of it has the input data correlated with the output data.
    What was quite interesting was when you got below 500 odd data points, the cause/effect and process forcing was indistinguishable from random actions.
    True, I’m sure from a statistical standpoint this is completely meaningless, but when the data presented with climate schience (sic) whitewashed on top matches the random burp of a computer program, you tend to suspect that the climate models aren’t analyzing data but generating it.
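    A minimal Python sketch of this kind of experiment is below. It is not the C code or the process data mentioned above; the series length of 200, the 500 trials, the fixed seed and the function names are all assumptions. It fits a naive least-squares trend to pure white noise and to pure random walks and counts how often the trend looks "significant" at roughly the 5% level.

```python
import math
import random

def trend_t_stat(ys):
    """t statistic of the OLS slope of ys against time 0, 1, 2, ...
    using the textbook formula that assumes independent errors."""
    n = len(ys)
    mx = (n - 1) / 2.0
    my = sum(ys) / n
    sxx = sum((t - mx) ** 2 for t in range(n))
    sxy = sum((t - mx) * (y - my) for t, y in enumerate(ys))
    b = sxy / sxx
    a = my - b * mx
    ssr = sum((y - (a + b * t)) ** 2 for t, y in enumerate(ys))
    se_b = math.sqrt(ssr / (n - 2) / sxx)
    return b / se_b

def white_noise(rng, n=200):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def random_walk(rng, n=200):
    x, out = 0.0, []
    for _ in range(n):
        x += rng.gauss(0.0, 1.0)   # each step adds to the previous level
        out.append(x)
    return out

def fraction_significant(make_series, n_trials, rng):
    """Share of simulated series whose naive trend test has |t| > 1.96."""
    hits = sum(abs(trend_t_stat(make_series(rng))) > 1.96
               for _ in range(n_trials))
    return hits / n_trials

rng = random.Random(42)
frac_noise = fraction_significant(white_noise, 500, rng)
frac_walk = fraction_significant(random_walk, 500, rng)
print(f"white noise:  {frac_noise:.2f} of series show a 'significant' trend")
print(f"random walks: {frac_walk:.2f} of series show a 'significant' trend")
```

    Neither kind of series has any forcing in it. The naive test behaves for white noise but is badly fooled by random walks, which is one reason short autocorrelated records are hard to tell apart from chance.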

  171. So what are you going to do when you want to research something and you don’t have a gazillion dollars? You do what you can afford, get some preliminary results and write up what you’ve found. But ah ha, others do not see a trend when you see it. How to get everyone to admit whether there is or isn’t a trend? A statistical statement. Then at least people have a basis on which they can argue whether the stats were done right or not. And if they agree the stats are correct, then someone can pursue further research. This is but the first step in finding the truth, not the final determination. The final determination will be made when there is such a large body of data, or when the scientific theory is so solid, that stats are not needed. In the meantime, statistics can provide a guideline as to what areas of research to pursue or not.

  172. Bill Tuttle (00:26:10) :
    Wren (23:51:02) :
    You take the no-change extrapolation, and I’ll take the predicted warming.
    And I’ll sit back and adapt to whatever transpires.
    I predict that it’ll get warmer over here in July than it was in February…
    ============
    That’s not very bold of you, but I wasn’t referring to seasonal changes in temperature.
    A bold prediction would be a prediction of no more global warming(i.e., a no-change extrapolation). I say bold because it doesn’t backcast well.


  173. Wren (10:02:39) :

    A bold prediction would …

    Pls, sully not a thread containing some really good technical posts, references and so forth …

  174. VS, along with all the others, I say thanks for coming on here. I posted this question twice over on Bart’s “Global average temperature increase GISS HadCRU and NCDC compared” but haven’t had an answer from anyone.
    Well nobody bothered to answer my question, so I will ask it again.
    We all know that the Global Temperature Anomaly series is “Corrected”, “Celled”, “Averaged” and “Homogenised”.
    Has anyone looked at a Raw Temperature Series to see if it exhibits the same Statistical characteristics?

  175. Wren (10:02:39) :
    That’s not very bold of you…A bold prediction would be a prediction of no more global warming
    There’s bold, and then there’s rash.
    I haven’t survived four combat zones and two marriages by being rash.

  176. wayne (02:53:10)

    Willis, I followed your method to detect discontinuities using Excel under “Tale of two cities”, it works great! Still don’t understand how the residual sums and the math beneath create such of a graph, but, it does work fine.

    Thanks, wayne. Now if you’d be so kind, post that over on the “Tale of Two Cities” thread, Steve Goddard still doesn’t believe it.

  177. Willis,
    I am glad that you and Leif like your spreadsheet. Nevertheless the entire 1895-1941 “trend” you claim occurred during one year (1920), indicating a discontinuity which your spreadsheet missed.

  178. Steve Goddard (13:28:36)

    Willis,
    I am glad that you and Leif like your spreadsheet. Nevertheless the entire 1895-1941 “trend” you claim occurred during one year (1920), indicating a discontinuity which your spreadsheet missed.

    Trend 1895-1919 (not including 1920) 0.04°C/decade
    Trend 1920-1941 (not including 1920) 0.09°C/decade
    In other words, there is a trend both before and after 1920.
    I’m afraid your eyeball has misled you again. There is a change in the trend in 1920, but a change in the trend != a discontinuity.
    Also, please recall that your claim was that

    The increase in temperatures started around 1970.

    When I showed there was a trend post 1941, your new claim was that there was no trend pre 1941. But please note:
    Your original thesis was that the trend only existed post-1970, and was driven by the differential post-1970 population growth.
    That claim has been resoundingly disproven. We’re now discussing other issues about the temperature record.
    But we should discuss this on the relevant thread. I have cross-posted this there. Thread drift, it burns …
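    The distinction drawn above between a change in trend and a discontinuity can be illustrated with synthetic data. The sketch below does not use the Boulder or Fort Collins records or the spreadsheet under discussion; both series are invented, with slopes of 0.004 and 0.009 °C/year chosen only to echo the per-decade figures quoted, and a 0.3 °C step chosen arbitrarily.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

years = list(range(1895, 1942))

def kinked(year):
    """Continuous series whose slope changes in 1920 (no jump)."""
    if year <= 1920:
        return 0.004 * (year - 1895)
    return 0.004 * 25 + 0.009 * (year - 1920)

def stepped(year):
    """Flat series with a single 0.3 degree jump in 1920 (no trend)."""
    return 0.0 if year < 1920 else 0.3

def segment_slopes(f):
    """Slopes for 1895-1919, 1920-1941 and the whole period."""
    early = [y for y in years if y < 1920]
    late = [y for y in years if y >= 1920]
    return (ols_slope(early, [f(y) for y in early]),
            ols_slope(late, [f(y) for y in late]),
            ols_slope(years, [f(y) for y in years]))

kink_early, kink_late, kink_full = segment_slopes(kinked)
step_early, step_late, step_full = segment_slopes(stepped)

print(f"kinked : early {kink_early:.4f}, late {kink_late:.4f}, full {kink_full:.4f}")
print(f"stepped: early {step_early:.4f}, late {step_late:.4f}, full {step_full:.4f}")
```

    A pure step gives a positive slope over the full period but zero slope within each segment, whereas a change in trend leaves a nonzero slope on both sides. Fitting the segments separately is therefore one simple way to tell the two apart.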

  179. A very petite friend of mine was expecting a baby and was incensed to be told by her health visitor that there must be a problem with the pregnancy because her baby’s size-for-dates was “below average”.
    When Joe Public has such a clear understanding of statistics, it is very easy to make them believe anything.

  180. Willis,
    I don’t know where you are getting your numbers from. The trend from 1895-1919 is negative -0.0056. The trend from 1920-1941 is 0.0097 . This is nowhere near the 0.20 you originally claimed, or your reduced numbers above.
    You are changing the subject of this discussion, which was the fact that your spreadsheet missed the discontinuity in 1920.
    And I had already agreed that there was probably a post mid-1940s trend, which is when the population started to grow rapidly in Fort Collins – supporting the UHI thesis.

  181. Veronica,
    Chances are that temperatures will remain above the GISS average for the indefinite future, even if there is no further increase in temperatures. This will potentially allow Hansen to paint his maps red – forever.

  182. Steve Goddard (16:45:44)

    Willis,
    I don’t know where you are getting your numbers from. The trend from 1895-1919 is negative -0.0056. The trend from 1920-1941 is 0.0097 . This is nowhere near the 0.20 you originally claimed, or your reduced numbers above.
    You are changing the subject of this discussion, which was the fact that your spreadsheet missed the discontinuity in 1920.
    And I had already agreed that there was probably a post mid-1940s trend, which is when the population started to grow rapidly in Fort Collins – supporting the UHI thesis.

    I will answer this on the relevant thread

  183. Willis,
    There is a clear UHI trend in Fort Collins and less so in Boulder. You chose to dispute it because of some irrelevant statistics generated from your spreadsheet. Meaningless statistics is the topic of this thread.

  184. bob (19:57:06), if I may have a stab at answering your question, I think you misinterpret your quote. It is the variance that diverges to infinity, not the series.
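    A quick simulation makes the same point. This Python sketch uses the simplest unit root process, a pure random walk with unit-variance Gaussian steps; the number of walks, the seed and the function name are assumptions. The variance across many realizations grows in proportion to t, while every individual path remains finite at any finite time.

```python
import random
import statistics

def random_walk(n_steps, rng):
    """x_t = x_{t-1} + e_t with e_t ~ N(0, 1), starting from zero."""
    x, path = 0.0, []
    for _ in range(n_steps):
        x += rng.gauss(0.0, 1.0)
        path.append(x)
    return path

rng = random.Random(0)
walks = [random_walk(400, rng) for _ in range(2000)]

# Variance ACROSS realizations at two different times.
var_at_100 = statistics.pvariance([w[99] for w in walks])
var_at_400 = statistics.pvariance([w[399] for w in walks])
largest_excursion = max(abs(x) for w in walks for x in w)

print(f"variance at t=100: {var_at_100:.0f}")
print(f"variance at t=400: {var_at_400:.0f}")
print(f"largest excursion seen in any walk: {largest_excursion:.1f}")
```

    The variance at t=400 comes out about four times the variance at t=100, which is what "diverges with t" means. It is a statement about the growing spread of possible outcomes, not a prediction that any one series runs off to infinity.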

  185. Steve Goddard (18:36:05) :

    Willis,
    There is a clear UHI trend in Fort Collins and less so in Boulder. You chose to dispute it because of some irrelevant statistics generated from your spreadsheet. Meaningless statistics is the topic of this thread.

    I have never denied that both have UHI. It was your claim that the trend started in 1970 that I denied. If you wish to claim that my statistics are “meaningless”, you have to do so mathematically. Nothing is more statistically meaningless than a “because I said so” claim regarding statistics.

  186. You can also drive yourself nuts like I do by picking up self-referencing sentences. Example: “This sentence no verb”.
    Here in the leader we have a Dr Goodman quoted as “There are more false claims made in the medical literature than anybody appreciates,” he says. “There’s no question about that.”
    If the number is more than anybody appreciates, and because he is an anybody, therefore he cannot appreciate that there is a question about it.
    You can keep that one, it’s a good one.

  187. Anthony, heads up:
    “Improving Predictions of Climate Change and its Impacts: Media Briefing
    Wed, 17 Mar 2010 15:05:00 -0500
    NSF invites reporters to participate on Monday, March 22 at 11:00 a.m., EDT
    On March 22 at 11:00 a.m., EDT, officials from the National Science Foundation (NSF) and the U.S. Departments of Agriculture and Energy will discuss the launch of an interagency program aimed at generating predictions of climate change and its impacts at more localized scales and over shorter time periods than have previously been possible. This project represents an historic augmentation of …
    More at http://www.nsf.gov/news/news_summ.jsp?cntn_id=116601&WT.mc_id=USNSF_51&WT.mc_ev=click
    This is an NSF News item.”

  188. Steve Goddard (20:25:49)

    Willis,
    Give me a break.
    You littered the thread with graphs like this one generated from your spreadsheet, which attempted to show that the divergence was linear since 1895, implying that it had nothing to do with UHI.
    http://homepage.mac.com/williseschenbach/.Pictures/boulder_ft_collins_temps.jpg

    Not sure what your point is here, Steve. The graph is accurate, it shows the original data including any possible discontinuities. Once you take out the January 1942 discontinuity that we both agree is there, it looks like this.

    Sure looks like a trend beginning to end to me. If you are saying those graphs are not accurate, post up your own.
    I don’t know why the temperatures of the two towns have diverged for the last century plus. Part of it is quite possibly UHI, but the population trends don’t explain it, they don’t diverge until 1970, and before that Boulder was growing faster. I don’t know why they diverge … so sue me. You tell us why they diverge since (at a minimum) 1942.
    I see no mathematical evidence of a 1920 discontinuity. What month of 1920 do you think it occurred in, and how large do you think it was? And speaking of statistical analysis, why do you think that the residuals test for discontinuities doesn’t show any 1920 discontinuity, as demonstrated here?
    w.

  189. Willis
    I will assume the actual location of both weather stations cited in the graph has been checked and that one hasn’t accidentally been installed in the kitchens of McDonalds 🙂
    We have a similar disconnect locally in some of our coastal towns. One town has the coast next to it AND an estuary so is in effect bounded on two sides by water. The other is bounded by the sea only. Inland are the Highlands of Dartmoor which has a dramatic effect on weather.
    Rain and sunshine and cloud are highly dependent on the direction of the winds. Our predominant winds are south westerlies, which affect both towns pretty equally.
    If another wind direction predominates (as has happened for long periods in the last half century) -particularly one coming over the Dartmoor land mass -this has a fundamental effect on the local weather and affects one town much more than the other.
    I’ve no idea of the topography of the two towns you cite but could changes in prevailing wind, plus UHI and poor location of the weather stations provide some part of the explanation?
    Tonyb

  190. TonyB (01:38:49) :

    Willis
    I will assume the actual location of both weather stations cited in the graph has been checked and that one hasn’t accidentally been installed in the kitchens of McDonalds 🙂

    I’ve no idea of the topography of the two towns you cite but could changes in prevailing wind, plus UHI and poor location of the weather stations provide some part of the explanation?
    Tonyb

    I fear I haven’t a clue, ask Steve, he’s the one that’s making the claims. I’m just analyzing the numbers …

  191. I know quite a few scientists from the bio/chem world, and as a mathematician I can tell you they have no clue about statistics; they just apply random statistical tests to their data without understanding or consideration of the quality and independence of the data…
    They just care about ‘significance’, which may or may not be there, as the test quite often does not make sense. If you don’t understand statistics 100% of the time, don’t bother with it, as you will get things wrong for sure.

  192. Steve Goddard (06:21:02) :

    Willis,
    With both discontinuities (1920 and 1941) removed the graph makes sense. Flat until the mid-1940s when the populations started to grow rapidly. That is the point of the UHI article and I am done trying to explain this to you and your spreadsheet.
    https://spreadsheets.google.com/oimg?key=0AnKz9p_7fMvBdElxNDA4Vlh2OGhvOUdEX1N0bm1CeWc&oid=6&v=1269263839043

    Well, since I asked you how much the discontinuity in 1920 was, and what month it occurred in, and you haven’t answered either one, I’d say you haven’t even started “trying to explain this” as you claim.
    Next, the point of your UHI article was not that the difference between Boulder and Ft. Collins was “Flat until the mid-1940s when the populations started to grow rapidly.” That’s historical revisionism. You started with a chart showing 1970 on, and the statement that:

    The big difference is that Fort Collins has tripled in size over the last 40 years, and Boulder has grown much more slowly.

    Last forty years, that would be since … umm … 1970. That’s where you started. You concluded by saying:

    Conclusion:
    We have two weather stations in similarly sited urban environments. Until 1965 they tracked each other very closely. Since then, Fort Collins has seen a relative increase in temperature which tracks the relative increase in population. UHI is clearly not dead.

    In other words, you didn’t say a steenkin’ thing about 1940 in your article. You claimed the temperature difference started in 1965 (not true), and that it was due to Fort Collins’ faster growth post 1965 (also not true, otherwise it should have gone the other way pre-1965). And you want to lecture me on the misuse of statistics?
    Finally, the point of your whole article was that it was the difference in the two cities’ growth rates that was the cause of the growing temperature disparity.
    But if you now claim that it started in 1940, during 1940-1970 the disparity is reversed. Boulder grew faster than Fort Collins during that time (see above) … so according to your claim, Boulder’s UHI should have grown faster 1940-1970 as well. But it didn’t, so your theory about relative population growth is dead on arrival.
    Do I think that there was UHI in Boulder and Fort Collins? Yes, and I think it was greater in Fort Collins than in Boulder. But you can’t tie that to population growth as you tried to do. The math simply doesn’t work.

  193. Willis,
    Drop it, please. Fort Collins has grown much faster than Boulder, particularly around the weather station. As a result, Fort Collins has seen much more UHI effect. That is exactly what the data shows. The math is fine, it is your use of statistics that is the problem.

  194. H.R. (06:38:17) :
    channon (03:54:04) :
    “Yes pure math gives what appears to be the comfort of certainty and although many pure scientists believe this to be absolutely true, most philosophers can show that no system of logic is both complete and consistent.
    That being the case, that purity is only relatively true and there is an element of uncertainty inherent in all calculations and proofs. […]”
    Sooo… when is it that 1pebble + 1pebble doesn’t equal 2pebbles? Did I make an error in logic somewhere?
    He kicked the stone but missed the point………….

  195. Well what has always seemed odd to me, is that every day, it is a different experiment to measure the local temperature; so naturally you expect to get a different result from the experiment you did the day before; after all, weather comes and goes.
    So what is the point of averaging the result of two entirely different experiments done under possibly different conditions? The answers are supposed to be different, and the average isn’t any more correct for either of the experiments. I don’t mind averaging the result of different runs of exactly the same experiment, run under identical conditions. The average of all of the people on earth doesn’t look any more like anyone you know, than anyone else does.
    Statistics is mostly creating “information” where there is none.

  196. One of the great debates in the election arena is whether statistics can be used to verify elections. In 2004, Kerry was predicted to be the winner by three percent based upon exit polls (of about 14,000 people nationwide as I recall — very large sample). Yet Bush was declared the winner by three percent.
    The reason given by the pollsters for the discrepancy was the Bush voters declined to talk to pollsters more often than Kerry voters declined to talk with them.
    On the other hand, somewhere between 75 and 80% of the votes were cast on computers with no paper trail to audit the vote. The computerized votes were counted by four or five major computer election companies several with strong ties to the Republican party. Public electioneers had turned the elections over to private corporations and public electioneers had no capability of their own to check the equipment or the software.
    So who really got the most votes in 2004? Depends on who you ask. There was a huge fight between pollsters who refused to release their data to the public and a vast number of statisticians who believed the data had proven Kerry had won and the computerized voting systems were full of systemic fraud, made unprovable by no paper trail.
    So is exit polling a proper use of statistics or is it not? Can it verify elections or not? Depends on who you ask and what interests they have in the outcome.
    But next time you vote on a computer without a paper trail to audit it, remember trusting the software engineer to register your vote correctly is like a blind man trusting a person he does not know to mark his ballot correctly.
    I mention this because if this can happen with elections, it can certainly happen with the use of statistics in other areas where a political agenda is at stake, such as climate.

  197. Hi George how are you? Been a while.
    Statistics are only properly applied to “random” processes in some way, or applied to determine what, if any, meaning “random” has. That’s all they are.

  198. davidgmills (17:18:05),
    I had not heard that computer voting companies were accused of throwing the Bush-Kerry election. [Personally, I’m all for paper ballots – with thumbprints.]
    But somehow I doubt that an explosive secret like rigging a national election could be kept between competing companies, with their respective machines in tens of thousands of precincts, and considering the number of people who would have had to be in on the scam. Two people can keep a secret. Three, rarely. More than three, and you might as well blog about it. But I suppose it isn’t impossible.
    That also recalls the Washington state election, which I followed at the time. Christine Gregoire lost to Dino Rossi in the first vote count by a couple of hundred votes.
    State law required a re-count if the vote margin was less than 2,000, IIRC. So they did another machine count. Rossi again won, but the margin was reduced to around 40 votes. I don’t understand how computer voting can come up with different numbers in an identical machine re-count.
    Anyway, Gregoire’s supporters paid for a third recount. [John Kerry personally paid $250,000.]
    Surprise! Christine Gregoire ‘won’ by a hundred and some votes. She was promptly sworn in, even though Rossi produced evidence that hundreds of convicted felons had voted illegally.
    Same thing happened in that long drawn out Coleman-Franken Senate race. When the votes were counted, Coleman won by a small margin, I forget how many votes exactly. Then, after more recount shenanigans in which it was shown that over 2,800 deceased individuals had ‘voted’ [the ballots were traced to an ACORN group; but no charges were ever filed], Al Franken was finally declared the winner.
    Looks like some folks have learned how to game the system. That’s the new millennium democracy in America, where our elected representatives are “deemed” to have voted to pass legislation, without having to actually vote.

  199. Steve Goddard (12:19:23) :
    “Willis, Drop it, please. Fort Collins has grown much faster than Boulder, particularly around the weather station.”
    Steve, that’s a rather qualitative statement. It is not common to find a UHI differential of 4 deg C in under 20 years. Might happen in China when a new megapolis was built in the bush, but it’s quite hard to see that size of change at Boulder. Any idea as to the secondary mechanism that population growth caused? From data I have looked at, which is not many cities, I rather feel that population growth in big cities increases the area of the UHI without much affecting the temperature, which seems to plateau or climb very slowly around the middle of a 1 million people city.

  200. Geoff,
    I have a thermometer on my bike, and ride in and out of downtown Fort Collins (where the station is located) all the time. If the wind is light, I always see at least two degrees difference and have seen as much as five or six.
    It is very easy for me to believe the graph which shows 2.5 degrees of warming in Fort Collins, and the Colorado State Climatologist has told me that he also believes most of the warming is due to UHI.
    REPLY: How do you mount the Stevenson Screen or MMTS on your bike? 😉 -A

  201. Anthony,
    I don’t need a Stevenson screen, because it all averages out with Monte Carlo statistics. ;^)
    Seriously though, it makes no difference which way I am riding, day or night – it is always warmer downtown.
    REPLY: True dat. I made the west to east transect drive on Colfax Ave once with a car that had a thermometer. Same result -A

  202. Steve Goddard (12:19:23)

    Willis,
    Drop it, please. Fort Collins has grown much faster than Boulder, particularly around the weather station.

    When people start asking someone to please “drop it”, most people can draw the obvious conclusion … This is particularly true when Boulder grew faster than Fort Collins for 30 years, from 1940 to 1970, but there was no effect on the increasing difference between the two stations … so your hypothesis cannot be proven in the way that you have chosen. If your theory were correct, the temperature difference should have shown a big jump starting in 1970 … but it did no such thing.
    Note that this does not mean that your hypothesis is false. It just means that you can’t prove it by population figures as you claim. Now if you can show logarithmically increasing growth from 1897 to 2010 around the Fort Collins site, and no corresponding growth around the Boulder site, you might have something. Until then, it’s just math-free handwaving.

  203. Willis,
    I am not sure why you are invoking the third person in your “most people” claim. The fact that you are having difficulty seeing something is your own business.
    This overlay of the 1895-2008 temperature records makes it painfully obvious that somewhere between 1950 and 1975 a divergence started and has accelerated since. This corresponds to Fort Collins’ period of rapid growth. Boulder population has actually decreased during the last decade.
    http://docs.google.com/View?id=ddw82wws_468cpnbv7fd
    And as Tom Moriarty pointed out, the Boulder station faces open space on one side, meaning it is less affected by population growth than the Fort Collins station – which is downtown.

  204. Willis,
    I’m not sure why it isn’t clear to you, but the points of the article are:
    1. Fort Collins temperatures have risen strongly in correspondence to rapid growth of the city. Fort Collins has increased in size by 300% over the last few decades.
    2. Boulder temperatures have risen much less, and the city has grown much less. Boulder has grown less than 50% during that same time period.
    You claim to be doing some sort of precise mathematics, but at the same time you chose to arbitrarily subtract about two degrees from all post-1941 Boulder temperatures, to try to prove your point.

  205. Steve Goddard (21:46:20) :

    Willis,
    I’m not sure why it isn’t clear to you, but the points of the article are:
    1. Fort Collins temperatures have risen strongly in correspondence to rapid growth of the city. Fort Collins has increased in size by 300% over the last few decades.
    2. Boulder temperatures have risen much less, and the city has grown much less. Boulder has grown less than 50% during that same time period.

    When we look at the Fort Collins minus Boulder dataset, we see:
    Trend 1942-1970 = 0.2°C/decade
    Trend 1970-present = 0.2°C/decade
    Therefore, there is no change in 1970, despite the pre- and post-1970 differences in the growth of the cities.
    Therefore, the difference in the trends and the difference in the growth of the cities are not related.

    You claim to be doing some sort of precise mathematics, but at the same time you chose to arbitrarily subtract about two degrees from all post-1941 Boulder temperatures, to try to prove your point.

    I haven’t a clue what you are talking about. I have not subtracted two degrees from post-1941 Boulder temperatures, I haven’t touched them at all. I adjusted the difference dataset (Fort Collins – Boulder) by 0.6°C in January 1941, to correct for a mathematically identified discontinuity. Period. End of story. No other adjustments. Your two degree adjustment doesn’t exist.
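
The comparison described in this comment (remove one step discontinuity from the difference series, then fit separate linear trends on either side of 1970) can be sketched roughly as below. This is only an illustration under stated assumptions: the series is synthetic and annual, not the actual Fort Collins minus Boulder record, the sign and timing of the step are made up for the demonstration, and the helper names `ols_slope`, `remove_step` and `trend_per_decade` are hypothetical.

```python
# Synthetic, hypothetical data: NOT the Boulder / Fort Collins record.

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def remove_step(years, diffs, step_year, step_size):
    """Subtract a constant step from every value at or after step_year."""
    return [d - step_size if y >= step_year else d
            for y, d in zip(years, diffs)]

def trend_per_decade(years, diffs, start, end):
    """OLS trend over start..end (inclusive), expressed per decade."""
    pairs = [(y, d) for y, d in zip(years, diffs) if start <= y <= end]
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    return 10.0 * ols_slope(xs, ys)

# Assumed shape: flat until 1942, then a steady 0.02 deg/yr rise,
# with an artificial 0.6 deg jump inserted at 1941.
years = list(range(1900, 2010))
diffs = [0.02 * max(0, y - 1942) + (0.6 if y >= 1941 else 0.0)
         for y in years]

adjusted = remove_step(years, diffs, step_year=1941, step_size=0.6)
early = trend_per_decade(years, adjusted, 1942, 1970)
late = trend_per_decade(years, adjusted, 1970, 2009)
print(f"1942-1970: {early:.2f} per decade")
print(f"1970-2009: {late:.2f} per decade")
```

If the two printed trends are the same, as they are by construction here, nothing changes at the break year; that is the form of the argument being made about 1970. Whether the real data behave this way depends entirely on the actual series and on whether the step adjustment is justified.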

  206. Steve Goddard (21:34:37) :

    … This overlay of the 1895-2008 temperature records makes it painfully obvious that somewhere between 1950 and 1975 a divergence started and has accelerated since. This corresponds to Fort Collins’ period of rapid growth. Boulder population has actually decreased during the last decade.
    http://docs.google.com/View?id=ddw82wws_468cpnbv7fd

    Oh, great, you’re back to “look at this diagram, it’s painfully obvious” … no, it’s not obvious at all.
    We are discussing the difference between the two city temperatures. If you want to graph something, graph what we are discussing – graph the difference between the two city temperatures, 1897 to present.
    Then point to where the trend starts.
    I await your graph …

  207. Willis,
    I have posted the difference graph you are asking for many times in the last two weeks, in posts specifically directed at you.
    https://spreadsheets.google.com/oimg?key=0AnKz9p_7fMvBdElxNDA4Vlh2OGhvOUdEX1N0bm1CeWc&oid=2&v=1269347628911
    The trend from 1895-1965 is 0.011 with low significance
    The trend from 1966-2008 is 0.031 with high significance
    The UHI effect is very apparent during the last 40-50 years.
    OTOH, you wanted to prove a linear trend through the entire series, and in order to do that you made an adjustment to all Boulder post-1941 data. How very Hansenesque.
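
For readers wondering what "low significance" and "high significance" mean for a trend, one common measure is the t-statistic of the fitted slope: the slope divided by its standard error. The sketch below shows that calculation on synthetic data. The noise level, the slopes, the seed and the function name `slope_and_t` are all assumptions chosen for illustration, and the method shown is not necessarily the one used to produce the figures quoted above.

```python
import math
import random

def slope_and_t(xs, ys):
    """Return the OLS slope and its t-statistic (slope / standard error)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / std_err

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical early period: no underlying trend, only noise.
early_years = list(range(1895, 1966))
early_vals = [rng.gauss(0.0, 0.3) for _ in early_years]

# Hypothetical late period: 0.03 per year underlying trend plus noise.
late_years = list(range(1966, 2009))
late_vals = [0.03 * (y - 1966) + rng.gauss(0.0, 0.3) for y in late_years]

early_slope, early_t = slope_and_t(early_years, early_vals)
late_slope, late_t = slope_and_t(late_years, late_vals)
print(f"early: slope={early_slope:.4f}, t={early_t:.2f}")
print(f"late:  slope={late_slope:.4f}, t={late_t:.2f}")
```

A t-statistic well above roughly 2 in absolute value is conventionally read as a significant trend. Note that this simple formula assumes independent residuals; year-to-year temperature data are usually autocorrelated, which makes real trends less significant than this calculation suggests.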

  208. Willis,
    Hopefully we can agree on these points.
    1. Temperatures in Fort Collins have increased much more than they have in Boulder over the last 50 years.
    2. Temperatures in Fort Collins started increasing rapidly about 40-50 years ago.
    3. Population in Fort Collins has increased much more than it has in Boulder over the last 50 years.
    4. As Tom Moriarty pointed out, the Boulder station is probably less sensitive to population growth than the Fort Collins station, due to its proximity to open space.
    You are attempting to do very precise math to prove a long term trend based on a major post-1941 correction of your own device. This in itself is a mistake (the trend doesn’t exist) but just as bad – the Boulder station history shows many moves and changes prior to 1980 which puts any sort of precision in the trash bin. Your analysis is flawed and you are missing the forest for the tree.

  209. I’ve read the article but not all of the responses – so this might be duplication.
    I agree with the thrust of the article in that there is much misuse and misinterpretation of statistical analysis in research, usually but not only, by those with inadequate training, experience or even motivation to ‘think statistically’ (a little knowledge can be a dangerous thing). The apparent belief that a P-value of 5% (or any other value) somehow represents a hard and fast dividing line between truth and falsehood is a major difficulty, but it is by no means the only one in reaching valid conclusions.
    However, I don’t think that this means that the statistical approach should be abandoned. By analogy, the (supposed) principle of English law is that an accused person is innocent until ‘proven guilty’. In a trial, the court hears the evidence and then either acquits the defendant or finds him or her guilty, beyond reasonable doubt; that is, rejects the null hypothesis of innocence.
    There are thus two correct outcomes to the trial – if actually innocent, the accused is acquitted or, if actually guilty, the accused is convicted. There are also two possible incorrect outcomes; that is, an innocent person is wrongly convicted or a guilty person goes free.
    Society generally views either of these latter two outcomes as undesirable. A solicitor of my acquaintance (social not professional!) argued vehemently that the trial process should be such that an innocent person could never be convicted (equivalent to a P-value of zero). He refused to accept that the only way to ensure this would happen, would be to acquit everybody whatever the evidence. This would mean, as a consequence, that there would be no need for any of the trappings of the current criminal justice system. The lynch mob would reign supreme.
    Against that scenario, civilised society would probably prefer the current justice system with its equivalent of non-zero P-values and the occasional acquittal of felons (corresponding to less than 100% power of a statistical test).
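
The trade-off in the trial analogy can be made concrete with a small calculation. The sketch below assumes a one-sided z-test in which the true effect, when there is one, sits two standard errors above zero; that figure and the function name `power_for_alpha` are illustrative assumptions. Here alpha plays the role of the chance of convicting the innocent, and power the chance of convicting the guilty.

```python
from statistics import NormalDist

def power_for_alpha(alpha, effect_size):
    """Power of a one-sided z-test at significance level alpha.

    effect_size is the true shift, measured in standard errors.
    """
    # Critical value: reject the null only above this threshold.
    threshold = NormalDist().inv_cdf(1.0 - alpha)
    # Probability that the test statistic clears the threshold
    # when the true mean is effect_size rather than zero.
    return 1.0 - NormalDist(mu=effect_size, sigma=1.0).cdf(threshold)

for alpha in (0.05, 0.01, 0.001, 1e-6):
    power = power_for_alpha(alpha, effect_size=2.0)
    print(f"alpha={alpha:g}  power={power:.3f}")
```

As alpha is pushed toward zero the threshold rises without limit and the power falls toward zero, which is the statistical counterpart of acquitting everybody whatever the evidence.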
    The difficulty lies not with statistical methods of analysis per se (and of the design of the associated data collection process), whether frequentist or Bayesian, but with the many users who have inadequate knowledge of the shortcomings of the techniques and the related pitfalls. It is not easy to see how the situation can be improved without throwing away the benefits of properly applied statistical design of data collection and analysis. The ready availability of powerful software makes things worse because it reduces the need for the analyst to think very much about their analysis and its validity.
    A related but separate issue is the deliberate misuse and falsification of the interpretation of observational data. Could this apply to the topics of man-made global warming, its averred consequential climate change and the dire predictions of its ‘inevitable’ effects?
    Charlie Barnes
    P.S. A slight nit-pick – probability (as indicated by the shaded P-value) is measured by the area under the curve between two horizontal (abscissa) values. The height of the curve refers to the probability density (function) rather than probability itself. CB.
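
The distinction in the P.S. between the height of the curve and the area under it can be checked numerically. The sketch below assumes a standard normal distribution, which the article's figure may or may not use; the function names are hypothetical.

```python
import math

def normal_pdf(z):
    """Height of the standard normal curve at z (a density, not a probability)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def upper_tail_area(z):
    """Area under the standard normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def area_between(a, b):
    """Area under the curve between abscissa values a and b (a <= b)."""
    return upper_tail_area(a) - upper_tail_area(b)

z = 1.645
print(f"height at z={z}: {normal_pdf(z):.4f}")              # density
print(f"area beyond z={z}: {upper_tail_area(z):.4f}")       # one-sided P value
print(f"area between -1.96 and 1.96: {area_between(-1.96, 1.96):.4f}")
```

At z = 1.645 the curve's height is about 0.10 while the shaded tail area, the one-sided P value, is about 0.05, so reading the height off the graph would roughly double the apparent probability.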

  210. “”” Brian G Valentine (17:47:01) :
    Hi George how are you? Been a while.
    Statistics are only properly applied to “random” processes in some way, or applied to determine what, if any, meaning “random” has. That’s all they are. “””
    Hey Brian, I’ve been noticing your shingle pop up now and then; and meaning to contact you. With the help of some very nice helpful folks, I have been slowly refreshing a lot of what I forgot from 50 years of lack of use.
    Prof Will Happer at Princeton, has been particularly helpful and gracious.
    The more I get into this, the more convinced I become, that “It’s the Water !”
    I never thought I would have to relearn Quantum Mechanics, just to go outside, and see if it is cloudy or humid.
    You might remember I once asked you how the hell the CO2 band at 15 microns came up as a comb of equally spaced lines; and you replied it was “harmonics”.
    Silly me, never dreamed that the molecular energy levels would also be quantized (duh!). As I told Prof Happer; I quit chemistry about one year too soon; to focus on the Physics, and Radio-Physics.
    If you still have my e-mail address, drop me a line; I’ll look in the archives for yours.
    George

  211. This article sounds like it is written by someone fundamentally anti-science – no shortage of those in the “intelligentsia”. The author expresses shock at the discovery that 95% probability of something being true means 5% chance of it NOT being true. His jowl-flapping indignation at the basic principles of probability is given no coherent factual basis.
    Application of statistics to the scientific method arises from the complexity of natural phenomena and the need to assess significance of observations in the face of such complexity. When studying factor “A”, the role of all the other factors “B …Z” must either be excluded by experimental design or allowed for in the statistical calculation of the result’s strength (plus initial calculation of the study design, number of subjects needed etc.). This attack on statistics in science seems to reflect an absence of basic understanding of or curiosity towards natural phenomena and their highly complex nature.
    And the philosophical arguments against statistics in science appear to be on a level with the famous quote from Douglas Adams’ “Life the Universe and Everything” – the conversation among the philosopher custodians of the deep thought computer (which found the answer to life the universe and everything to be 42 but gave no +/- error bars):
    “We’re philosophers. But we might not be!”

  212. One particularly important area of statistics is the use of Gaussians from Monte Carlo simulations. Many areas of science, engineering, medicine and business could not survive without top notch random number generators. In high school I heard one of the famous Polish mathematicians say “there is no such thing as a perfect random number generator, and even if there was it would be impossible to prove it.”
    Even a lousy random number generator would do better than the Met Office seasonal predictions. Why? Because they always predict warming during a period of cooling.

  213. Steve,
    Arguably, with a perfect “random number” generator, it would be impossible to predict what the next number generated would be. No matter how long a sequence of numbers had already been generated; nothing would give you a clue as to the next number.
    Thinking of Gaussian White Noise, as being a sequence of random numbers, that have a Gaussian distribution; one can then make the argument that no signal contains more information than Gaussian White Noise; there is no redundancy in such a signal at all, so it is nothing but information about the signal.
    Of course it is also totally useless information; but information nonetheless.
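
One way to see the "no redundancy" point is to check that past values of Gaussian white noise carry no linear information about later ones, that is, that the autocorrelation at any non-zero lag is close to zero. The sketch below is a rough illustration only: it uses Python's pseudo-random generator, which is deterministic and therefore not the "perfect" generator discussed above, and the sample size, seed and function name are arbitrary choices.

```python
import random

def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    variance_sum = sum((x - mean) ** 2 for x in series)
    covariance_sum = sum((series[i] - mean) * (series[i + lag] - mean)
                         for i in range(n - lag))
    return covariance_sum / variance_sum

rng = random.Random(0)  # fixed seed so the run is repeatable
noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]

for lag in (1, 2, 5):
    print(f"lag {lag}: autocorrelation = {autocorrelation(noise, lag):+.4f}")
```

The values come out near zero, on the order of one over the square root of the sample size. A near-zero autocorrelation rules out only linear predictability; it does not prove that a sequence is random, which is consistent with the remark quoted in comment 212 that a perfect generator could not be proven perfect.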

Comments are closed.