'Robust' analysis isn't what it is cracked up to be: Top 10 ways to save science from its statistical self

In the wake of what Willis recently pointed out from Nassim Taleb, about how “In fact errors are so convex that the contribution of a single additional variable could increase the total error more than the previous one.”, I thought it relevant to share this evisceration of the over-reliance on statistical techniques in science, especially since our global surface temperature record is entirely a statistical construct.

Excerpts from the Science News article by Tom Siegfried:

Science is heroic. It fuels the economy, it feeds the world, it fights disease. Sure, it enables some unsavory stuff as well — knowledge confers power for bad as well as good — but on the whole, science deserves credit for providing the foundation underlying modern civilization’s comforts and conveniences.

But for all its heroic accomplishments, science has a tragic flaw: It does not always live up to the image it has created of itself. Science supposedly stands for allegiance to reason, logical rigor and the search for truth free from the dogmas of authority. Yet science in practice is largely subservient to journal-editor authority, riddled with dogma and oblivious to the logical lapses in its primary method of investigation: statistical analysis of experimental data for testing hypotheses. As a result, scientific studies are not as reliable as they pretend to be. Dogmatic devotion to traditional statistical methods is an Achilles heel that science resists acknowledging, thereby endangering its hero status in society.

More emphatically, an analysis of 100 results published in psychology journals shows that most of them evaporated when the same study was conducted again, as a news report in the journal Nature recently recounted. And then there’s the fiasco about changing attitudes toward gay marriage, reported in a (now retracted) paper apparently based on fabricated data.

But fraud is not the most prominent problem. More often, innocent factors can conspire to make a scientific finding difficult to reproduce, as my colleague Tina Hesman Saey recently documented in Science News. And even apart from those practical problems, statistical shortcomings guarantee that many findings will turn out to be bogus. As I’ve mentioned on many occasions, the standard statistical methods for evaluating evidence are usually misused, almost always misinterpreted and are not very informative even when they are used and interpreted correctly.

Nobody in the scientific world has articulated these issues more insightfully than psychologist Gerd Gigerenzer of the Max Planck Institute for Human Development in Berlin. In a recent paper written with Julian Marewski of the University of Lausanne, Gigerenzer delves into some of the reasons for this lamentable situation.

Above all else, their analysis suggests, the problems persist because the quest for “statistical significance” is mindless. “Determining significance has become a surrogate for good research,” Gigerenzer and Marewski write in the February issue of Journal of Management. Among multiple scientific communities, “statistical significance” has become an idol, worshiped as the path to truth. “Advocated as the only game in town, it is practiced in a compulsive, mechanical way — without judging whether it makes sense or not.”

Commonly, statistical significance is judged by computing a P value, the probability that the observed results (or results more extreme) would be obtained if no difference truly existed between the factors tested (such as a drug versus a placebo for treating a disease). But there are other approaches. Often researchers will compute confidence intervals — ranges much like the margin of error in public opinion polls. In some cases more sophisticated statistical testing may be applied. One school of statistical thought prefers the Bayesian approach, the standard method’s longtime rival.
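To make that definition concrete, here is a minimal Python sketch of an exact one-sided P value. The numbers are invented for illustration and do not come from the article: 14 of 20 patients improve on a drug, and it is assumed that 50% would improve anyway if the drug did nothing. The function name is also just a label chosen here.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, null_rate: float) -> float:
    """One-sided P value: chance of seeing `successes` or more out of `trials`
    if the true success rate is `null_rate` (that is, if there is no effect)."""
    return sum(
        comb(trials, k) * null_rate**k * (1 - null_rate) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical data: 14 of 20 patients improve; assume 50% would improve anyway.
p = binomial_p_value(14, 20, 0.5)
print(f"P value = {p:.4f}")
```

Note what the number is and is not: it is the probability of data this extreme given no effect, not the probability that there is no effect.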

Why don’t scientists do something about these problems? Contrary motivations! In one of the few popular books that grasp these statistical issues insightfully, physicist-turned-statistician Alex Reinhart points out that there are few rewards for scientists who resist the current statistical system.

“Unfortunate incentive structures … pressure scientists to rapidly publish small studies with slapdash statistical methods,” Reinhart writes in Statistics Done Wrong. “Promotions, tenure, raises, and job offers are all dependent on having a long list of publications in prestigious journals, so there is a strong incentive to publish promising results as soon as possible.”

And publishing papers requires playing the games refereed by journal editors.

“Journal editors attempt to judge which papers will have the greatest impact and interest and consequently those with the most surprising, controversial, or novel results,” Reinhart points out. “This is a recipe for truth inflation.”

Scientific publishing is therefore riddled with wrongness.

Read all of part 1 here


WORTHLESS A P value is the probability of recording a result as large or more extreme than the observed data if there is in fact no real effect. P values are not a reliable measure of evidence.

Excerpts from Part 2:

Statistics is to science as steroids are to baseball. Addictive poison. But at least baseball has attempted to remedy the problem. Science remains mostly in denial.

True, not all uses of statistics in science are evil, just as steroids are sometimes appropriate medicines. But one particular use of statistics — testing null hypotheses — deserves the same fate with science as Pete Rose got with baseball. Banishment.

Numerous experts have identified statistical testing of null hypotheses — the staple of scientific methodology — as a prime culprit in rendering many research findings irreproducible and, perhaps more often than not, erroneous. Many factors contribute to this abysmal situation. In the life sciences, for instance, problems with biological agents and reference materials are a major source of irreproducible results, a new report in PLOS Biology shows. But troubles with “data analysis and reporting” are also cited. As statistician Victoria Stodden recently documented, a variety of statistical issues lead to irreproducibility. And many of those issues center on null hypothesis testing. Rather than furthering scientific knowledge, null hypothesis testing virtually guarantees frequent faulty conclusions.

10. Ban P values

9. Emphasize estimation

8. Rethink confidence intervals

7. Improve meta-analyses

6. Create a Journal of Statistical Shame

5. Better guidelines for scientists and journal editors

4. Require preregistration of study designs

3. Promote better textbooks

2. Alter the incentive structure

1. Rethink media coverage of science

Read the reasoning behind the list in part 2 here

I would add one more to that top 10 list:

0. Ban the use of the word “robust” in science papers.

Given what we’ve just read here and from Nassim Taleb, and since climate science in particular seems to love that word in papers, I think it is nothing more than a projection of ego from the author(s) of many climate science papers, and not a supportable statement of statistical confidence.

One other point, one paragraph in part one from Tom Siegfried said this:

For science is still, in the long run, the superior strategy for establishing sound knowledge about nature. Over time, accumulating scientific evidence generally sorts out the sane from the inane. (In other words, climate science deniers and vaccine evaders aren’t justified by statistical snafus in individual studies.) Nevertheless, too many individual papers in peer-reviewed journals are no more reliable than public opinion polls before British elections.

That ugly label about climate skeptics mars an otherwise excellent article about science. It also suggests Mr. Siegfried hasn’t really looked into the issue with the same questioning (i.e. skepticism) that he did for the abuse of statistics.

Should Mr. Siegfried read this, I’ll point out that many climate skeptics became climate skeptics once we started examining some of the shoddy statistical methods that were used, or outright invented, in climate science papers. The questionable statistical work of Dr. Michael Mann alone (coupled with the unquestioning media hype) has created legions of climate skeptics. Perhaps Mr. Siegfried should spend some time looking at the statistical critiques done by Stephen McIntyre, and tell us how things like a single tree sample or upside down data or pre-screening data beget “robust” climate science before he uses the label “climate deniers” again.


227 thoughts on “'Robust' analysis isn't what it is cracked up to be: Top 10 ways to save science from its statistical self”

    • “0. Ban the use of the word “robust” in science papers.”
      Spot on. This article could simply have been entitled ” Robust analysis isn’t”.
      In fact, it would be a shame to ban it, since it is one of the best early indicators that the author is doing politics not science. One thing you can be sure of when a scientist chooses to describe his results as “robust” is that they are anything but. Robust is a word for politicians and lawyers. Saying a paper is robust is about as convincing as a politician announcing a “robust enquiry”.
      If we wanted a No.0 for that list I’d say ban discussing “trends” in climate papers. Then they’d have to think a bit. It is also what most of the trivial waffle about significance relates to.
      Siegfried rather blows objectivity credentials by assimilating climate deenyiers to vaccinations (why not go the whole crap and cite some Lew-paper reference to moon landings while he’s in there).
      Still, I’m sure he thinks his comparison is “robust”.

      • Hi Mike –
        I don’t disagree with either you or our host AW’s suggested ban on the word “robust” from science dialog. But I will support “robust” as a meaningful word in engineering. For engineering, the terms “robust process” or “robust design” refer to a process or design that is least sensitive to process/design parameter variations compared to alternative process or design parameters. Determining the robust process/design, however, depends on a combination of solid experimental data and a verified and validated model of the process or design. The robust operating point for either is determined by optimization methods or other systems engineering methods.
        My only point is that where defined, as it is in engineering, the word “robust” is not a dirty word! It’s only “dirty” where it is used in vague, undefined ways that obfuscate or project “soundness” that exceeds reality.

  1. Again, this is my thing, this time in my role as a hypothesis tester in the program Dieharder, a random number generator tester that relies heavily on p-values because, well, it is a hypothesis tester testing the null hypothesis “The current random number generator is a perfect random number generator”, which is what one is trying to falsify with a test.
    I’ve also read Statistics Done Wrong and am very sympathetic to its thesis. But even p-values are useful if one recognizes the importance of George Marsaglia’s remark in the Diehard documentation: “p happens”.
    p happens means that one simply should not take a p of 0.05 terribly seriously. That’s a 1 in 20 shot. This sort of thing happens all of the time. Who would play Russian roulette with a gun with 20 chambers? Only a very stupid person, or one where winning came with huge rewards.
    The big question then is when one SHOULD
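The “p happens” point can be illustrated with a toy simulation. This is a minimal sketch and not Dieharder’s actual test suite: it assumes a simple z-test on the sample mean as a stand-in for a real randomness test, and the seed, sample sizes, and function name are arbitrary choices made here. When the null hypothesis is true, roughly 1 run in 20 should still come back with p below 0.05.

```python
import random
from statistics import NormalDist


def mean_test_p_value(samples):
    """Two-sided p-value for the null 'samples are uniform on [0, 1)',
    using the normal approximation to the sample mean."""
    n = len(samples)
    mean = sum(samples) / n
    z = (mean - 0.5) / (1 / (12 * n)) ** 0.5  # a uniform variable has variance 1/12
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(12345)  # arbitrary seed so the run is repeatable
trials = 2000
p_values = [
    mean_test_p_value([rng.random() for _ in range(1000)]) for _ in range(trials)
]
fraction_small = sum(p < 0.05 for p in p_values) / trials
print(f"fraction of runs with p < 0.05: {fraction_small:.3f}")
```

A generator that is working as intended still “fails” at about this rate, which is why a single p of 0.05 says very little on its own.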

    • When it’s replicated by a number of independent measurements (trials). This means it cannot happen in a casino, and unfortunately this cannot be the case in climate science as long as anyone is adjusting any data. It’s not just that the adjustments are biased, but the fact that they are dependent on a “body of knowledge” (sic) that by its very nature lacks the characteristics of independent observation and interpretation.
      Having had a chance to look at Taleb’s draft version (without the benefit of a good cold lager), I believe I see his point, and agree with it (still have not worked through the formal math, but I see where it’s going). In certain types of models it is very possible for the error processes to overwhelm the substantive information; the inclusion of additional variables just compounds the problem. This is certainly consistent with my field (telecommunications), where I work with a lot of nominal variables and find that if we can isolate the main effects correctly then it’s been a good day; anything more than that is just a crap-shoot.

    • A p-value isn’t a probability. It assumes some distribution of an unobservable parameter. In the frequentist world it is not permitted to assign probabilities to unobservables.
      But even if it could be a probability, simply because it’s about an unobservable, it doesn’t say much about the model other than that it might have good parameter values. IOW: P(parameter | model, data). This says nothing about the model other than it has nice parameters. Saying that the model must represent anything in the real world because the parameters are the best is a lot like saying some car will perform well because it was made with high quality screws and has a great paint job.
      What’s really needed is P(model | parameters, data). The only way to get this is to see how well the model predicts. Nothing else will do.

    • Well I like what famous New Zealand scientist Ernest (Lord) Rutherford, had to say about statistics.
      “If you have to use statistics, you should have done a better experiment.”
      I would go even further (strictly my opinion mind you):
      If you are using statistics, you are NOT doing science. You simply are practicing a specific branch of mathematics, albeit a quite rigorous and exact branch of mathematics. No uncertainty exists in the output of statistical calculations. They are always done on a given data set of exact numbers (listed in the data set) So the end result of performing any specific statistical mathematics algorithm is always an exact result.
      As for what that result might mean; well that is pure speculation on the part of the statistician.
      NO physical system is responsive to or even aware of the present value (now) of any statistical computation; they can only respond to the current present value of each and every one of the pertinent physical parameters, and can act only on those values. For example, any statistical average value of some variable almost certainly occurred some time in the past, and the physical system can have no knowledge or memory of what that value is or when it might have occurred.
      So appending a meaning to some statistical math calculation is simply in the eye of the beholder. It is all fiction and it can predict no future event. Well it might suggest how surprised an observer might be, when he eventually learns what a future event is at the time it becomes a now event.
      So as I said, if you are doing statistics, you are not doing science. No experiment can be devised to test the output of your quite fictional calculation. (ALL of mathematics is fictional. Nothing in mathematics exists anywhere in the physical universe.)
      g >> G

      • I think it was George Box (statistician) who said that all models are wrong, but some models are more useful than others. Sure math doesn’t exist in reality, but approximations (i.e., models) are useful, as are estimates of uncertainty.

    • Simple problem.
      I have a product. It is priced at 50 bucks.
      At 50 dollars, 80% of the people who click on the link buy it.
      So, with 100 clicks I get 80 sales and 4000 in revenue.
      I do A/B testing.
      For half the people coming to the site I price it at 55.00, and 40 of 50 people buy it.
      Do I have enough evidence to raise the price? Should I?
      In practical problems it is always easy to determine the SHOULD (when should I bet with 75% confidence,
      80%, 99%?) because one can calculate the cost and benefit of being right or wrong.
      In science, what is the cost of being wrong?
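The cost-and-benefit framing in this comment can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: it treats the 80% conversion rate at $50 as exactly known, uses a normal approximation for the 40-of-50 result, and ignores everything else that affects a real pricing decision. The variable names are invented here.

```python
from statistics import NormalDist

old_price, old_rate = 50.0, 0.80            # from the comment: 80 of 100 buy at $50
new_price, buyers, visitors = 55.0, 40, 50  # from the comment: 40 of 50 buy at $55

# Conversion rate at which the new price earns the same revenue per click.
break_even_rate = old_price * old_rate / new_price

observed_rate = buyers / visitors
std_error = (observed_rate * (1 - observed_rate) / visitors) ** 0.5

# Rough confidence that the true rate at the new price beats break-even
# (normal approximation; treats the 80% baseline as exactly known).
prob_better = NormalDist().cdf((observed_rate - break_even_rate) / std_error)

print(f"break-even conversion rate: {break_even_rate:.3f}")
print(f"approx. confidence the higher price pays off: {prob_better:.2f}")
```

The threshold that matters is set by the revenue at stake, not by a conventional 95%: the higher price only has to keep conversion above the break-even rate.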

      • Steve Mosher

        In (climate) science what is the cost of being wrong?

        Damaging the lives of billions between now and 2100, killing millions each year for 85 years.
        For nothing. To prevent no harm to no people, but to promulgate harm on the all.
        Just so you (those who support the CAGW theories) can “feel good” about your religion of Gaia and death.

      • Hmm increase price to 55 from 50 ? Hmm, whats your COGS. will the increase toss you into a higher tax bracket. NOT so simple…

      • “Damaging the lives of billions between now and 2100, killing millions each year for 85 years.
        For nothing. To prevent no harm to no people, but to promulgate harm on the all.
        Just so you (those who support the CAGW theories) can “feel good” about your religion of Gaia and death.”
        You think that is settled science?
        Saying the Cost is high and that millions will die is just a form of economic catastrophism.
        I suppose you have an econ MODEL to back up that claim… wa.
        A proper skepticism would note that supposed damage from climate change is dependent on models
        Supposed damage from cutting CO2 is based on models.
        It’s rather uncertain.
        The decision about what to do.. is not science.. it can be supported by science.. but the decision is not
        a scientific one. It’s political and pragmatic.
        Its good to have a pen and a phone.

      • “Hmm increase price to 55 from 50 ? Hmm, whats your COGS. will the increase toss you into a higher tax bracket. NOT so simple…”
        Still simple. The point is NOBODY who solves practical problems cares a whit about 95% or 94.9%
        or 99.9999999%
        95% is just a tradition. Not written in stone.

      • “Just so you (those who support the CAGW theories) can “feel good” about your religion of Gaia and death.”
        Are U 95% certain that I believe in CAGW? Oops, U are wrong.
        GAIA? a bunch of crap.
        Try sticking to the topic. The cost of being wrong in science qua science.

      • Steven, you were the one to go off into “marketing” and “economics”,
        Your method was condescending, I used the term “COGS” cost of goods sold.
        In the example you used the percentages are only a small part of the decision making process.
        You had to know I was having a little fun with you. The example you used was not the best for the point you were trying to make.
        Oh and thank you for the reply.

      • Dumb example. What they really do is sell the exact same product under two different brands and model numbers at different prices at the same time. It’s called price discrimination, and it makes marketing go around.
        It sort of like dickering without the dick. Or maybe the other way around.

      • Harold, years ago, I worked for a place that manufactured orthopedic devices. In a test, one of our engineers broke the mold. Sigh, we (the toolmakers) told him it would not withstand a 20% increase in pressure for his test shot. After it split in half, I asked the foreman if he needed me to work through the night to get the replacement mold built. He told me, “No. We manufactured the devices for BOTH of the major competitors, we would just ramp up production for the “other” company until the new mold was built.” God, I love capitalism!
        I laugh to this day.

      • “Steven, you were the one to go off into “marketing” and “economics”,
        yes to illustrate the space where the word SHOULD gets used in making decisions about
        confidence intervals. That is, where values are in play.
        Your method was condescending, I used the term “COGS” cost of goods sold.
        In the example you used the percentages are only a small part of the decision making process.
        Note: I didnt mean for You to answer the question. rather to contemplate those areas where the
        OPs question ( SHOULD) makes sense. I fully understand all the details required in making
        these decisions..
        You had to know I was having a little fun with you. The example you used was not the best for the point you were trying to make.
        The point was simple. Where does the “SHOULD” question make sense
        Oh and thank you for the reply.

      • Have a good night Steven, the wars will wait until tomorrow.
        and again thank you for answering me. and your other comment on “designing chips” has caught my interest. Something to look into, I am ignorant of the subject

      • Well I’m not sure what you mean by being wrong.
        You can mess up an experiment and misread a thermometer. Not to worry; people trying to replicate your results will discover your error. The cost to you is egg on the face.
        You can be wrong by postulating a theory of how some system or process works. Once again; not to worry. Other people doing experiments to test your theory will discover it simply doesn’t work that way. The cost to you is no Nobel Prize award for a meritorious discovery.
        So you change the theory to bring it better into line with what the experimental observations say is actually happening.
        That is how science has always operated.

      • Y’all, Steve is raising a valid point. Quit bashing him just because of history.
        Yes, there is a price of getting it wrong. In this case, it’s my interpretation that the price of being wrong is far less in the do-nothing scenario than in the action scenario. The results of action are more immediate and the effects more certain (notably in increased poverty and decreased ability to raise oneself from it as well as decreased aid from rich nations and damage to the environment due to increased deforestation from biofuels, increased land use from wind farms, and overall reduction in available environmental funds), versus the nebulous and questionable effects of CO2 (which might or might not alter rainfall patterns for the worse in some areas and for the better in others, along with a host of minor issues including negligible increase in temperature and even smaller increase in sea level rise rate).
        It’s not calculable even in theory due to the uncertainties involved. The point that the article is making is that trying to distinguish between tiny discrepancies is pointless and leads to self-delusion. In this case, however, the costs aren’t even close. It’s obviously better to not try and reduce CO2 through wind, solar, or biofuels, and carbon markets are pure financial smokescreens. However, nuclear or hydrological power are useful sources in their own right.

      • You are doing it wrong… You set the price at $80. Then you lower it $5 each time sales start to erode. Now you have the HP pricing model when they released their RPN calculators in the 80’s. All sales above marginal revenue are economic profit. 🙂

    • Glad to see you’re weighing in on this topic, rgb. Especially after seeing Tom Siegfried’s knee-jerk statement.

    • I was fortunate to have George Marsaglia as my advisor in graduate school. A great professor!

    • Were I forced to play Russian Roulette, one using a 20 chamber revolver would be preferred over a standard 6 shot revolver.
      Still, even a standard revolver beats using a semi-auto. One of the funniest news stories I ever read (only the one about the guy who got angry at the soda machine and started rocking it to get his drink is funnier) was about the guy in Chicago who, after seeing Deer Hunter, decided to play the game with an automatic pistol. That’s what I call a self correcting problem.

    • rgbatduke says: “p happens means that one simply should not take a p of 0.05 terribly seriously”
      As always your posts are both insightful and educational … thank you! So I thought I’d pass along what to me is a humorous (or is it sad) anecdote related to the “p-value” comment above. I’ll try to make it brief:
      I spent my career at a major aerospace company that invoked “Six Sigma” in the 90’s. The blackest of black belts issued a “best practice” memorandum describing a “ground-breaking analysis” to improve aeropropulsion engine performance, and the study found “five key controlling variables”.
      So, consider the experimental design: y was engine performance and x was a matrix of 100 variables … ranging from the physically realistic to the improbable (not scientifically identified). The study results identified 5 of the 100 variables as statistically significant at the 95% level using multiple regression (no surprise there). I then challenged the grand-six-sigma guru by talking about just what a 95% confidence level means and Tukey’s teachings regarding the multiple comparison effect. If the 100 variables were simply random variables, the results would likely be the same.
      I’m sure it’s obvious to most, frequentist statistics are predicated on making a pre-defined hypothesis and then testing it; it’s not a hunting license to search for correlations within a database that meet a “p” value that “itself” perverts the notion of probability.
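The multiple comparison effect described in this anecdote is easy to reproduce. The sketch below is a simplified stand-in, not the study’s method: instead of multiple regression it tests each of 100 pure-noise variables against a pure-noise response one at a time, using the Fisher z-transform of the correlation coefficient. The sample size of 200, the seed, and all names are assumptions made here.

```python
import random
from math import atanh, sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(2015)  # arbitrary seed so the run is repeatable
n_obs, n_vars = 200, 100
y = [rng.gauss(0, 1) for _ in range(n_obs)]  # "engine performance": pure noise

significant = 0
for _ in range(n_vars):
    x = [rng.gauss(0, 1) for _ in range(n_obs)]  # a predictor unrelated to y
    z = atanh(pearson_r(x, y)) * sqrt(n_obs - 3)  # Fisher z, about N(0, 1) under the null
    if abs(z) > 1.96:  # two-sided 5% threshold
        significant += 1

# Chance of at least one false positive across 100 independent 5% tests.
familywise = 1 - 0.95**n_vars

print(f"'significant' noise variables: {significant} of {n_vars}")
print(f"chance of at least one false positive: {familywise:.3f}")
```

With 100 candidates and a 5% threshold, about five “key variables” are expected from noise alone, and finding at least one is close to certain.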

    • The appropriate p value needs to be determined by analyzing Type I errors (p value or alpha) and Type II errors (beta). A p value of 0.05 may be far superior to p=0.01 in terms of statistical power if the beta value (probability of Type II error) grows very quickly as the Type I error is decreased by a lower p value.
      In terms of Russian roulette, consider that your p value diminishes as the number of cylinders is raised from say 20 to 100, but the Type II error may grow very fast. Again, if the Russian roulette analogy is interpreted as p = 1/(number of cylinders), then beta may be something such as the cylinder exploding when the cylinder walls become thinner. The point is that in hypothesis testing, as the p value decreases, the beta value tends to increase.
      For a simple discussion of the relation between Type I errors and Type II errors please see:
      http://statistics.about.com/od/Inferential-Statistics/a/Type-I-And-Type-II-Errors.htm and here is an explanation that I think is more understandable:
      Finally a larger random sample size lowers the probability of either error.
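The trade-off between the two error types, and the effect of sample size, can be sketched for the simplest textbook case. The assumptions here are a one-sided z-test of a mean with known unit variance and a hypothetical true effect of 0.5 standard deviations; the function name and the sample sizes are chosen only for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def type_ii_error(alpha: float, effect_size: float, n: int) -> float:
    """Beta (chance of missing a real effect) for a one-sided z-test of a mean
    with known unit variance, true effect `effect_size`, and sample size `n`."""
    critical_z = STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.cdf(critical_z - effect_size * n**0.5)


for n in (20, 80):
    for alpha in (0.05, 0.01):
        beta = type_ii_error(alpha, 0.5, n)
        print(f"n={n:3d}  alpha={alpha:.2f}  beta={beta:.3f}  power={1 - beta:.3f}")
```

At a fixed sample size, tightening alpha raises beta; at a fixed alpha, a larger sample lowers beta.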

  2. great article explains how people in lab coats could publish the notion that inhaling smoke scattered and diffused in the air could be WORSE for a person than inhaling 100% of the smoke directly to the lungs without really anybody saying WUWT!

    • That the smoke that hadn’t gone through the filter could be worse is easy to see, if in fact the filter was actually filtering out harmful particles. But the smoker would breathe the same unfiltered smoke as they sat in the room where they were smoking, so they would be getting it as well as the filtered smoke. So many points of not just WUWT but WTF with the controls on those studies. Too many “ifs” and “assumptions” in those studies.

      • Maybe you’ve never observed someone smoking. Almost all of the “smoke” generated when a cigarette is used comes from being exhaled by the smoker, thus through the cigarette filter (if the cigarette is filtered) and the lung filter of the smoker. (Which is why first hand cigarette smoke causes lung cancer, and 2nd hand doesn’t…at least “statistically” speaking)

      • “Which is why first hand cigarette smoke causes lung cancer”
        I hate to quibble, especially about something as detestable as cigarettes, but I think I must.
        Smoking is associated with an increased risk of lung cancer.
        It is incorrect to say it causes it, as a stand alone statement.
        I hate cigarettes, and do not and have never smoked, but in fact a lot of people smoke all day long, every day, and have done so since they were teens, and do not get cancer. I have four siblings for which this is the case.
        Just sayin’.

      • TRM even a relative novice, like myself, can see that the basis of these experiments is nonsense. I don’t care, hate cigarettes, all you want to do is be sure that science is telling you the truth or something close to it. It does not require that you make up an ad hoc reason that you should BELIEVE the conclusions

      • To some of us it really has little to do with health and everything to do with dealing with the smoker’s bad manners. A smoker’s sense of smell and taste is deadened by smoking. They assume that everyone around them has the same handicap and fail to understand that cigarettes in particular stink. There are tobaccos that don’t, but not in cigarettes. One trick I used to convince my two-pack-a-day dad was to scent trail him in the dark. Cigarette smoke also flavors every bl**** thing you put in your mouth. If you have eaten in a European restaurant, and if you’re a non-smoker, then you have had the wonderful experience of tasting first class wine-plus-ash-tray, truly excellent venison-plus-ash-tray, etc.

    • Though you would have to balance the effect of one cigarette in your mouth against the dozens of cigarettes burning in a public place, and the fact that you are inhaling the secondhand smoke all the time you are in there instead of just for 5min every hour or so.
      In assuming that the one cig you smoke is worse than the many others in the room, you are assuming a linear relationship between smoke concentration and effect. That might be incorrect; after all, the effect of CO2 on the greenhouse effect is not linear but near to saturation. It may be that the effect of smoke is chronic, that continuous exposure to smoke irritates the lungs more than a high concentration for a short time. Without investigation of these factors, no one knows.
      Though, it has been suggested that continuous exposure to lower levels of smoke from unventilated wood fires is what caused hunter-gatherer society humans to have a lifespan of only half the modern one. The traditional line of archaeologists was that this was down to the harsh living conditions, but the reality might just be that one single factor was responsible, a factor which the people themselves failed to recognise and therefore took no action to remedy.
      All of which in principle shows just how difficult it is to obtain meaningful statistical results.

      • I believe he was actually referring to the background PM studies that show 30 ppb particulate matter in the air causes an endless list of health problems. Those are the issue. Secondhand smoke, especially in concentrated areas, is much more supported (in fact, if these suggestions were taken into account, I think the EPA would have needed much less statistical trickery to get it labeled as a health hazard).
        The issue is really the knowledge that a pack a day takes 5-10 years off your life, but the EPA is claiming 30 ppb in the air (a small fraction of the exposure that the smokers get) can be responsible for hundreds of thousands of premature deaths annually.

    • Perhaps Science was playing Russian Roulette with rgbatduke’s 20 chamber revolver with 19 bullets, or, more effectively….playing said roulette with a Glock….which generates a “winner” every time.

      • Sorry but you’d have to explain that one to UK readers:
        A Glock is a pistol
        OK, what’s a ‘pistol’ ?
        A pistol is a short gun.
        Gun? Please explain.

      • Ian,
        A Glock is an Austrian-manufactured semi-automatic handgun. Semiautomatics fire from a spring loaded magazine (cartridges are stacked atop one another) which feeds the chamber of the firearm. There are some differences between single action (the weapon requires the operator to load the first round into the chamber by working the action, normally a slide on a handgun, which also cocks the hammer) and double action (the trigger mechanism cocks the hammer; I believe you still have to work the action to load the chamber, I only shoot SA autos), but essentially, if you have a semiauto pistol, with a round in the chamber and the safety off, the odds are almost 100% that the weapon will discharge. When playing RR those odds are only good for the spectators betting on the action.

    • The problem is that people use the term science about unscientific methods.
      The 500 year old method of inductivism has been demonstrated to be flawed.
      However it seems to be dominating the works by IPCC.
      Not enough people endorse the empirical method of Karl Popper.
      The empirical method is about making precise, falsifiable statements and then putting all effort into trying to falsify the statement. The empirical content of the theory is higher the more the theory forbids. A theory is merited by the attempts at falsification it has survived.
      Popper must be turning in his grave by the confidence and agreement statements by IPCC.

    • And even worse than that – 36 out of 65 Nobel laureates signed an unscientific statement.
      At least there was some reluctance – 29 of them demonstrated some integrity by not signing – giving at least some hope in scientific integrity.

  3. The more you look at it…..the more skeptical you become
    Until you reach the point of the most asinine ridiculous piece of fabricated BS you ever thought you would like to see…………

  4. Offline: What is medicine’s 5 sigma?
    Richard Horton
    DOI: http://dx.doi.org/10.1016/S0140-6736(15)60696-1
    “… this symposium—on the reproducibility and reliability of biomedical research, held at the Wellcome Trust in London last week—touched on one of the most sensitive issues in science today: the idea that something has gone fundamentally wrong with one of our greatest human creations.
    The case against science is straightforward: much of the scientific literature, perhaps half, may simply be untrue. Afflicted by studies with small sample sizes, tiny effects, invalid exploratory analyses, and flagrant conflicts of interest, together with an obsession for pursuing fashionable trends of dubious importance, science has taken a turn towards darkness. As one participant put it, “poor methods get results”. … The apparent endemicity of bad research behaviour is alarming. In their quest for telling a compelling story, scientists too often sculpt data to fit their preferred theory of the world. Or they retrofit hypotheses to fit their data. Journal editors deserve their fair share of criticism too. … Our acquiescence to the impact factor fuels an unhealthy competition to win a place in a select few journals. Our love of “significance” pollutes the literature with many a statistical fairy-tale. We reject important confirmations. Journals are not the only miscreants. Universities are in a perpetual struggle for money and talent, endpoints that foster reductive metrics, such as high-impact publication. National assessment procedures, such as the Research Excellence Framework, incentivise bad practices. And individual scientists, including their most senior leaders, do little to alter a research culture that occasionally veers close to misconduct.
    “One of the most convincing proposals came from outside the biomedical community. Tony Weidberg is a Professor of Particle Physics at Oxford. Following several high-profile errors, the particle physics community now invests great effort into intensive checking and re-checking of data prior to publication. By filtering results through independent working groups, physicists are encouraged to criticise. … Weidberg worried we set the bar for results in biomedicine far too low. In particle physics, significance is set at 5 sigma—a p value of 3 × 10^-7 or 1 in 3·5 million (if the result is not true, this is the probability that the data would have been as extreme as they are). …”
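
For reference, the 5 sigma figure quoted above can be reproduced from the tail of a standard normal distribution. This is a minimal Python sketch, assuming a one-sided Gaussian tail is what is meant; the function names are illustrative and not taken from the Lancet piece.

```python
import math


def one_sided_p(sigma: float) -> float:
    """Upper-tail probability of a standard normal beyond `sigma` standard deviations."""
    return 0.5 * math.erfc(sigma / math.sqrt(2.0))


def two_sided_p(sigma: float) -> float:
    """Probability of landing more than `sigma` standard deviations from the mean, either side."""
    return math.erfc(sigma / math.sqrt(2.0))


for s in (2, 3, 4, 5):
    p1 = one_sided_p(s)
    p2 = two_sided_p(s)
    print(f"{s} sigma: one-sided p = {p1:.3g} (about 1 in {1 / p1:,.0f}), "
          f"two-sided p = {p2:.3g}")
```

On this assumption the one-sided 5 sigma tail comes out near 3 × 10^-7, roughly 1 in 3.5 million, while the familiar 1/20 threshold corresponds to about 2 sigma on a two-sided test.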

    • From the second link, ” insist on replicability statements in grant applications and research papers”
      would go a long way to clean up this mess.

  5. An improvement in the media coverage of science would need better journalists. Not going to happen.

    • At a minimum we need journalists who believe that their job is to inform rather than to educate and influence.

    • Journalists, like scientists, want to eat at the “cool table” in the school cafeteria, so they come up with stories higher-ups want to see, and nothing sells better than imminent catastrophe. Global warming is right up there. The fact that the end of the world keeps getting put off every few years bothers them not at all.

  6. I have not seen any mention of the fact that we are in an interglacial period during which temperature is expected to rise and continue to rise…until it doesn’t, and then back to another ice age.

    • Given the fact that the CAGWers have “high jacked” every Degree K of Interglacial Global Warming (IGW) that has occurred post-1880 …… they dare not make mention of the existence of any IGW.

    • Interglacial T is not “expected” to rise until it doesn’t, and back to another ice age, nor has it. For three thousand years each warm peak has been cooler than the preceding one.

  7. There is not one size fits all for statistics.
    P values at the 1/20 threshold (2 sigma) are not what people like unless the study is really expensive or the benefits are worth banging on ahead (the old hackneyed example is people are dying by holding up a treatment).
    The other time low p-values don’t matter is in early experiments in a paper that are later confirmed (hopefully in the same publication) by different approaches.
    But really, most people push things to the 3 and 4 sigma level. (better than 1/100 to better than 1/1000.) I see no problems when you are in that region, as long as the exp design is okay.
    Estimation is great sometimes, but most tests are experimental against control and estimation often makes no sense.

  8. Science has certainly become tainted by marketing in recent decades, just compare the famous sentence in the Watson-Crick paper on the double helix, which went something like “our results may shed some light on the mechanism for inheritance” with the hype found in many modern climate science papers.
    Science papers should simply describe what the authors have done, let the readers decide if this is Nobel material or worthless junk.

    • I concur many times over. It’s compounded by academic cheer-leading … I’ve seen too many recent dissertations with the word revolutionary or groundbreaking associated with the results. Back when I earned my PhD we were told: 1) This is not your magnum opus, it’s just the start of what can be a good career, 2) The purpose of the dissertation is to establish, to your committee and the senior faculty, that you understand the process of designing and executing quality original research. If there is sound logic behind your research questions and your research design, and yet the null hypothesis fails, then your dissertation research was a success, since we all have learned something.

    • Which is the basis for the Nobel physics, chemistry, and medicine awards often waiting decades to ensure the discoveries stand up to replication. Unfortunately for many this consigns the discoverers to No Nobel, as they are not awarded posthumously.

  9. The greatest problem with the use of P-values is that researchers use them to measure the value of observations as evidence for or against a hypothesis when they are not designed for this purpose, and using them in this manner leads to some very illogical results. Neyman-Pearson inference suffers from the same problem, as does the use of confidence intervals. They are all part and parcel of the same line of thinking.
    Bayesian methods are meant to answer a completely different question. In effect, Bayesian methods tell a person how to modify prior beliefs and remain consistent with probability in light of new data.
    The only method that measures data as evidence for or against a particular hypothesis is the method of likelihood. Most scientific questions can be framed as a test of one hypothesis against another, so likelihood should be the first thought in analysis, but P-values, confidence intervals and Neyman-Pearson inference are simpler to do, and more widely accepted. Well, bleeding patients was widely accepted practice at one time as well.
    No amount of statistics will save science from fraud, bias, bad models, circular reasoning and other logical fallacies, or from the corrupting influence that sources of support, including the Federal government, bring.
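
To make the likelihood idea concrete, here is a minimal Python sketch of a likelihood ratio between two simple hypotheses about a success probability. The binomial setting, the data (14 successes in 20 trials) and the two candidate probabilities are all invented for illustration; none of them comes from the comment above.

```python
import math


def binomial_log_likelihood(p: float, successes: int, trials: int) -> float:
    """Log-likelihood of success probability `p` for binomial data.

    The binomial coefficient is omitted because it cancels in any ratio
    computed on the same data.
    """
    failures = trials - successes
    return successes * math.log(p) + failures * math.log(1.0 - p)


def likelihood_ratio(p_a: float, p_b: float, successes: int, trials: int) -> float:
    """How many times better hypothesis A (p = p_a) explains the data than B (p = p_b)."""
    log_ratio = (binomial_log_likelihood(p_a, successes, trials)
                 - binomial_log_likelihood(p_b, successes, trials))
    return math.exp(log_ratio)


# Hypothetical data: 14 successes in 20 trials.
# Hypothesis A: p = 0.7.  Hypothesis B: p = 0.5.
ratio = likelihood_ratio(0.7, 0.5, successes=14, trials=20)
print(f"The data favour p = 0.7 over p = 0.5 by a factor of {ratio:.2f}")
```

The ratio compares the two stated hypotheses directly on the observed data, which is the kind of head-to-head evidence measure the comment is arguing for.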

    • Statistics always tells you only the properties of numbers you already know. They contain no information about numbers you don’t yet know. And all of the information you will ever have about that set of numbers is contained in the set of numbers itself. Anything added is information about statistical mathematics algorithms, not about the data set numbers.
      It has been said that the signal with the highest possible information content, is white Gaussian noise.
      It is totally unpredictable and no future sample of the signal can be deduced; so it is 100% information about itself. You only know what it is once you have it.

  10. An example of how adding “color” as a variable increases the number of ways to find a false result, making a false result more likely. How can any study with a 1/20 odds of being wrong be called robust?
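
The arithmetic behind that point can be sketched in a few lines of Python. It assumes each extra variable amounts to one more independent test at the 1/20 level, which is a simplification made here for illustration; real tests on the same data are rarely fully independent.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across `n_tests` independent tests,
    each run at significance level `alpha`, when no real effect exists."""
    return 1.0 - (1.0 - alpha) ** n_tests


for n in (1, 5, 20):
    rate = familywise_error_rate(0.05, n)
    print(f"{n:2d} tests at p < 0.05: {rate:.0%} chance of at least one false positive")
```

Under these assumptions, splitting a study into 20 subgroups (20 "colors") pushes the chance of at least one spurious "significant" result to roughly 64%.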

    • Some particle physics experiment might be considered valuable if you simply get the right order of magnitude of an effect. But if you want to know the wavelength of the red Cadmium line measured against the standard meter bar, maybe eight significant digits isn’t close enough. ( I think it is; or was 6438.4696 Angstrom units) But don’t quote me on that.

  11. #s 7, 8, 9, and 10 are wrong. Adoption of these points would make research worse, not better. Testing the null hypothesis is the gold standard and should remain so. Trouble is there are many ways to get to a p value, and the climate game is to choose the one that has the best chance of getting to a p value you like. Worst cases are when there are none that lead to a p value without smelling bad so estimations and confidence intervals are used instead. All four of those points should be trashed for just one: Ban the use of statistical test SHOPPING and instead outline recommended statistical testing based on study design. I would further emphasize that any study that resorts to estimations and confidence intervals should be labeled as snake oil salesmanship and be banned from ever seeing the light of day in a research journal.

    • I do believe you have it exactly right. No amount of research guidelines and formalisms can protect an endeavor
      from unethical behavior or a deliberate effort to deceive. Statistics in general, and wee p-values are not the problem here, as abused as they sometimes are.
      I also like to remind people that ClimateScience! is not science, it is politics. What those people do should not be held against the real sciences. And that is my Robust conclusion.

      • You give me too much credit. My experience with statistics comes from one audited graduate level class (but I did all the assignments and earned an A), running my data through my own purchased Statview SE program for my published research, and an oddity in my brain: math intuition. Any statistician could easily run circles around this one hit wonder.

    • Pam, all you’re suggesting is that there is a ‘gold standard’ statistical method. And there are two corollaries to this:
      1) If such a gold standard exists, then you would already be recommending it.
      2) That statistical correlations can answer scientific causation.
      But no matter how we turn the subject around, statistics will always fail to properly test the null hypothesis. For if there is a relation between two variables, then we can simply construct an experiment, fiddle one, and watch the other dance. Within science there are only three purposes for statistics:
      1) To do initial inquiry to see if some large and unexpected something is unavoidably present and worth chasing further.
      2) To put distributions on the error bounds between the mathematical description of the fiddling mentioned above and the received observational values.
      3) When it is impossible to actually perform an experiment.
      And that last is where we get into the very serious problem of statistics in science: For how can it be science if an experiment is impossible?
      Which is not to say that statistics don’t have their uses. But it only shines when we can state that we know what should or must be the case, because we have engineered it successfully, and then compare it to what we actually get.

      • That is not what I am suggesting at all. There are a variety of methods to determine statistical significance that can accurately determine results that fall outside of chance occurrence. The issue is ill-matched methods with type of research design. If there is a gold standard it is this: Pay great attention to your choice of p-value methods lest your ill-chosen methods lead you down the primrose path to error.

    • “Testing the null hypothesis is the gold standard and should remain so. ”
      Interesting argument or rather interesting LACK of an argument.
      So much of scientific understanding comes in areas where there is no null, that it is odd to make the assertion that you do.
      The “null” is a tool. sometimes handy. other times as the author argues it leads you astray.

  12. “Promotions, tenure, raises, and job offers are all dependent on having a long list of publications in prestigious journals, so there is a strong incentive to publish promising results as soon as possible.”

    It’s worse than that. Before those 4 things are even on a young post-doc’s horizon, publishing for becoming competitive for that first big government research grant must occur in today’s environment. Publishing null hypothesis affirmations is the short path to a job in industry.

  13. “but on the whole, science deserves credit for providing the foundation underlying modern civilization’s comforts and conveniences.”
    Disagree – Fossil Fuels, mainly oil, provided the leap for humanity.

    • Without the products of science, fossil fuels were just gooey black stuff that made a mess of your shoes.

      • MarkW:
        Fossil fuels required the steam engine for their chemical energy to do useful work.
        Science owes more to the steam engine than the steam engine owes to science.
        Attributed to Lawrence Joseph Henderson

    • Fuel science, anyone?
      “Son, I have one word for you. Plastics.”
      +1 to anyone who can name the movie.

    • Wrong. Try the horse collar, stirrups, and selective breeding of animals (horses!) and plants. Societies go through many phases, one advancement the logical outcome of earlier ones. You need some earlier “techs” to successfully use others.
      Note the lack of the wheel in South American societies. This list is but an off-the-cuff snarl. Others can add to or substitute from it. Last, I think it is better to say it is the engineer rather than the scientist who is paramount in human development.

  14. “vaccine evaders”
    So vaccines are prison now? And they openly admit that?
    Or maybe it’s a religion? Apostates will be eradicated?

  15. I gather this has to do with hockey sticks and IPCCs GCMs. Interesting, but beside the point. All these statistical analyses, five significant figure anomalies, results beyond the resolution of the instrument, attempt to back justify CAGW. Something akin to the Wonderland Queen’s verdict first, trial later. What really matters:
    “These questions have been settled by science.” Surgeon General
    IPCC AR5 TS.6 Key Uncertainties. IPCC doesn’t think the science is settled. There is a huge amount of known and unknown unknowns.
    According to IPCC AR5 industrialized mankind’s share of the increase in atmospheric CO2 between 1750 and 2011 is somewhere between 4% and 196%, i.e. IPCC hasn’t got a clue. IPCC “adjusted” the assumptions, estimates and wags until they got the desired mean.
    At 2 W/m^2 CO2’s contribution to the global heat balance is insignificant compared to the heat handling power of the oceans and clouds. CO2’s nothing but a bee fart in a hurricane.
    The hiatus/pause/lull/stasis/slowdown (IPCC acknowledges as fact) makes it pretty clear that IPCC’s GCMs are not credible.
    The APS workshop of Jan 2014 concluded the science is not settled. (Yes, I read it all.)
    Getting through the 1/2014 APS workshop minutes is a 570 page tough slog. During this workshop some of the top climate change experts candidly spoke about IPCC AR5. Basically they expressed some rather serious doubts about the quality of the models, observational data, the hiatus/pause/lull/stasis, the breadth and depth of uncertainties, and the waning scientific credibility of the entire political and social CAGW hysteria. Both IPCC AR5 & the APS minutes are easy to find and download.

  16. Tom Siegfried makes the same mistake classically made by social constructivists (not saying he’s one), which is to confuse and conflate science with scientists.
    Here’s the jump: “But for all its heroic accomplishments, science has a tragic flaw: It does not always live up to the image it has created of itself. Science supposedly stands for allegiance to reason, logical rigor and the search for truth free from the dogmas of authority. Yet science in practice is largely subservient to journal-editor authority, riddled with dogma and oblivious to the logical lapses in its primary method of investigation:…” Typical. Start the sentence about science, finish it with the behavior of (some) scientists.
    Science is about the interplay of a falsifiable theory, and reproducible data. That interplay and the practice of it by scientists, are where all our advances have originated. The fact that individual scientists have foibles is lovely for sociologists, but does not reflect at all on science itself.
    Second, Psychology is not a branch of science. The fact that, “100 results published in psychology journals shows that most of them evaporated when the same study was conducted again” says nothing about a sad state in science.
    Likewise, Epidemiology is not a branch of science, and epidemiological correlation-chasing is not part of the scientific method. It’s standard practice in these ‘science is flawed‘ articles, to immediately offer abuses in Psychology and medical epidemiology as proof texts. But those fields are not part of science.
    That logical disconnect between the accusation (science is flawed) and the evidence (after all, look at Psychology… blah, blah, blah) is typical, and is a kind of intellectual bait-and-switch. Look for it. It’s always there, it’s always a sign the author has an ax to grind about science, and it’s always a dead giveaway that the thesis is bankrupt.
    In science, predictions are deduced from hypotheses and theories. The falsifiability of hypotheses and theories means that they make logically coherent but extremely unlikely statements about how the physical universe works. Predictive statements imply one and only one observable outcome. Physical deductions and predictions imply causality and invite observations and experiments as tests of the causal claim.
    Epidemiological correlations are inductive inferences. They imply no causality and predict no observables. The rooster crows and the sun rises. The correlation is strong. Whoop-de-do. Does anyone think the former is causal to the latter? That’s epidemiology and that’s about the entire causal content of Psychology.
    Science itself is not undercut by the fact that some scientists are foiblicious, or that medical epidemiology is infested with crockness. Those arguing from the latter to the former merely demonstrate a non-understanding of science itself.

    • And those people would be yet another group, which is science journalists. Usually (but not always), they have a very thin understanding of science, and don’t understand how thin it is. So ‘science’ is what they imagine it to be, and like a groupie spurned, when “science” doesn’t behave the way they expect it to, they lash out like this.
      Technology journalism is frequently just as bad, with endless stories about perpetual motion machines, etc.
      The whole journalistic enterprise is a disaster. It’s gotten to the point where the only journalists who can cobble together a coherent sentence are the ones with JD degrees.

    • Pat writes “Science is about the interplay of a falsifiable theory, and reproducible data.”
      Whilst this is true, it’s not specific enough. So in climate “science” there is extensive use of models because largely they have no choice. People like Mosher don’t seem to understand the difference between a model that organizes data into an understandable form (e.g. global temperature anomaly calculation) and a model that makes a projection. He just sees that models are ok in general.
      So when climate science uses projection models it no longer uses data except in the sense that it’s now data from a model and has nothing to do with reality.
      Climate science should be starting with the hypothesis that their GCMs model reality, but they seem to have skipped that step and moved on to the assumption that they do, drawing results relating to reality from them. It’s pretty obvious that they don’t, in so many respects that it’s outrageous to move to the next step of drawing climate conclusions using them.
      At that point climate “science” became non-science for a great many papers. And since they all draw upon one another, the whole field is tainted.

  17. “Above else, their analysis suggests, the problems persist because the quest for “statistical significance” is mindless. “Determining significance has become a surrogate for good research,” Gigerenzer and Marewski write in the February issue of Journal of Management. Among multiple scientific communities, “statistical significance” has become an idol, worshiped as the path to truth. “Advocated as the only game in town, it is practiced in a compulsive, mechanical way — without judging whether it makes sense or not.””
    Climate “Science” has neither the statistical support, nor does it make sense. Climate “science” fails on both fronts.
    1) Statistical analysis of every ice core I’ve looked at covering the Holocene shows that a) we are well below the temperature peak of the past 15k years and b) there is absolutely nothing statistically significant about the temperature variation over the past 50 and 150 years.
    2) Geologic record covering 600 million years demonstrates that atmospheric CO2 was as high as 7,000 PPM, and we never had runaway global temperatures; in fact temperatures never got above 22 degree C for a sustained period of time. We fell into an ice age when CO2 was 4,000 PPM.
    3) Mother Nature isn’t stupid, Earth and life have survived billions of years. The absorption band of CO2 is centered around 15 microns IR, that is consistent with a black body of -80 degree C. Only a small fraction of the radiation at 10 microns IR (The Average Earth Temp) is absorbed by CO2, and as it warms, the peak shifts to the left, and CO2 absorbs even less of the radiation. Unless my eyes deceive me, it looks like CO2 doesn’t even absorb IR at 10 Microns.
    BTW, if CO2 doesn’t absorb at 10 Microns how in the hell can it be the cause of warming? CO2 would have to be absorbing radiation hotter than the earth to warm it. The more I look into this “science” the more nonsensical it becomes.
    It looks like CO2 stops absorbing at 13 microns.
    13 Microns is consistent with a black body of -50 degree C.
    This “science” is pure garbage. How can radiation IR at -50 Degree C warm the globe? What a joke.

    • Well you need a bit of retuning there CO2islife.
      If the earth surface really radiates a 288 K black body like spectrum, then as you say, it peaks (on a wavelength scale graph) at 10 microns (20 times the sun’s peak wavelength).
      But 98% of that total radiated spectrum energy lies between one half of the peak wavelength (5.0 microns) and 8 times the peak wavelength (80 microns). Only 1% remains beyond each of those two end points, and only 25% of the total energy is at wavelengths less than the peak of 10 microns.
      So the spectral radiant emittance at 15 microns; the CO2 band is quite substantial. In fact my handy dandy BB calculator says at 1.5 times the peak wavelength the spectral radiant intensity is 70% of the peak value.
      So indeed CO2 has a lot more than peanuts to feed on.
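
The black body figures in the comment above can be checked against Planck's law. This is a minimal Python sketch assuming an ideal 288 K black body; the function names, the integration grid and the 0.5 micron lower cutoff are choices made here for illustration, and this is not the commenter's own calculator.

```python
import math

H = 6.62607015e-34        # Planck constant, J s
C = 2.99792458e8          # speed of light, m/s
K_B = 1.380649e-23        # Boltzmann constant, J/K
SIGMA = 5.670374419e-8    # Stefan-Boltzmann constant, W m^-2 K^-4
WIEN_B = 2.897771955e-3   # Wien displacement constant, m K


def planck_lambda(wavelength_m: float, temp_k: float) -> float:
    """Black body spectral radiance per unit wavelength (W sr^-1 m^-3)."""
    x = H * C / (wavelength_m * K_B * temp_k)
    return (2.0 * H * C ** 2 / wavelength_m ** 5) / math.expm1(x)


def band_fraction(lo_m: float, hi_m: float, temp_k: float, steps: int = 20000) -> float:
    """Fraction of total emitted power between two wavelengths.

    Uses the trapezoid rule on a log-spaced wavelength grid, then divides
    by the all-wavelength radiance sigma * T^4 / pi.
    """
    log_lo, log_hi = math.log(lo_m), math.log(hi_m)
    d = (log_hi - log_lo) / steps
    total = 0.0
    for i in range(steps + 1):
        lam = math.exp(log_lo + i * d)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * planck_lambda(lam, temp_k) * lam  # d(lambda) = lambda d(ln lambda)
    return total * d / (SIGMA * temp_k ** 4 / math.pi)


T_SURFACE = 288.0
peak = WIEN_B / T_SURFACE  # about 10 microns

ratio = planck_lambda(1.5 * peak, T_SURFACE) / planck_lambda(peak, T_SURFACE)
print(f"peak wavelength: {peak * 1e6:.2f} microns")
print(f"radiance at 1.5x peak relative to peak: {ratio:.2f}")
print(f"fraction between 0.5x and 8x peak: {band_fraction(0.5 * peak, 8 * peak, T_SURFACE):.3f}")
# Emission below 0.5 microns is negligible at 288 K, so it is used as the lower cutoff.
print(f"fraction below the peak: {band_fraction(0.5e-6, peak, T_SURFACE):.3f}")
```

On these assumptions the output agrees with the 70%, 98% and 25% figures quoted in the comment.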

      • George,
        A question for you. If all the radiation from the sun was only at the wavelength of 15 microns, what is the maximum temperature that the earth could reach? If the answer is warmer than -80c then I think you have answered CO2islife’s challenge.
        My understanding of CO2islife’s point is that if you direct a fire-hose of water at a building and the water is at a temperature of say 50c, the building can not warm up to more than 50c – no matter how much water you hit it with.
        I don’t know the answer by the way and I hope that you do.

      • I do have a non scientific theory in that I observe that a microwave cooker can heat water to boiling point even though microwaves have a longer wavelength than 15 microns. But I have to admit I don’t know enough about microwaves to know if they also have a broader spectrum of emissions.

        • ” I observe that a microwave cooker can heat water to boiling point even though microwaves have a longer wavelength that 15 microns. But I have to admit I don’t know enough about microwaves to know if they also have a broader spectrum of emissions.”
          Microwaves heat dipole molecules by making them vibrate, basically friction. Thermal photons (IR) don’t heat like that.
          But I think microwaves show that there’s more than one way photons can transfer energy.
          If the Sun’s blackbody output had a peak wavelength of 15u, it would have a surface temp of about 192K, and the Earth would be limited to a max of 192K.
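
The temperatures being traded in this sub-thread come from Wien's displacement law, T = b / peak wavelength. Here is a minimal Python sketch of that conversion; the function name is illustrative, and the calculation only maps a peak wavelength to a black body temperature. It does not by itself say whether that temperature acts as a limit on heating.

```python
WIEN_B = 2.897771955e-3  # Wien displacement constant, m K


def wien_temperature_k(peak_wavelength_m: float) -> float:
    """Temperature of a black body whose emission (per unit wavelength) peaks at the given wavelength."""
    return WIEN_B / peak_wavelength_m


for microns in (10.0, 13.0, 15.0):
    t_k = wien_temperature_k(microns * 1e-6)
    print(f"{microns:4.0f} microns -> {t_k:6.1f} K ({t_k - 273.15:6.1f} C)")
```

With the standard constant, 15 microns comes out at about 193 K (roughly -80 C) and 13 microns at about 223 K (roughly -50 C), close to the values quoted in the thread.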

      • Well Bernard, that is an interesting question. Not one pertaining to reality though.
        You ask if ALL (my emfarsis) radiation from the sun was 15 micron wavelength, what would be the maximum obtainable Temperature.
        So I’ll get pedantic and take your recipe at face value.
        If ALL of the radiation is 15 micron wavelength then NONE of the radiation is at ANY other frequency or wavelength.
        Ergo, the sun must be a perfectly coherent source of 15 micron wavelength EM radiation. which is 20 THz frequency, so it is a laser, or a maser, whichever you prefer.
        A laser behaves as if it is a near point source with a diameter of the order of one wavelength. Well it has a Gaussian beam profile, with a 1/e waist diameter in that range (or radius). My short term memory can’t recall the exact waist diameter.
        But the point is that the 860,000 mile diameter sun would appear to be a much more distant roughly one micron diameter source.
        Well the real sun is not a coherent source, it is quite incoherent, and its apparent angular diameter as seen from the mean earth sun distance is about 30 arc minutes.
        The optical sine theorem which is based on the second law of thermodynamics, says that it is possible to focus the sun image down to a size which can reach a Temperature equal to the about 6,000 K sun surface Temperature (in air).
        The limiting concentration is 1/sin^2 (0.5 deg.) = 13,132 times (area wise).
        Prof Roland Winston at UC Merced (actually in Atwater) has in fact focused the sun’s image to an areal density of over 50,000 suns; more than 4 times that above limit.
        Well I left out a little factor in the above expression.
        The concentration limit is n^2 / sin^2 (0.5 deg.) where n is the refractive index of the medium in which the image is formed.
        Winston made a solid CPC (Compound Parabolic Concentrator) out of a YAG crystal which has a refractive index of about 2.78 so he got a factor of about 7.7 out of that, and with losses he ended up at I believe 56,000 suns; highest ever achieved. But you can’t ever get all of that energy out of the medium into air. Most of it will be trapped and vaporize your crystal. Dunno how Roland kept his YAG from melting with the real sun, but he would get one hell of a full moon spot (it is NOT an image).
        Winston is one of the original and most regarded gurus of Non Imaging Optics, and one of the inventors of the CPC.
        But in principle, your laser sun can be focused down to a Gaussian spot about 15 microns in diameter, in which case it would be much higher Temperature than 6,000 K.
        But then the sun is not a laser or a coherent source, so nyet on doing that experiment.
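
The concentration arithmetic in the comment above can be sketched in Python. The 0.5 degree angle and the refractive index of 2.78 are the commenter's figures taken at face value; they are inputs here, not values this sketch vouches for, and the function name is illustrative.

```python
import math


def concentration_limit(angle_deg: float, refractive_index: float = 1.0) -> float:
    """Areal concentration limit n^2 / sin^2(angle), the expression used in the comment."""
    return refractive_index ** 2 / math.sin(math.radians(angle_deg)) ** 2


in_air = concentration_limit(0.5)                            # n = 1
in_crystal = concentration_limit(0.5, refractive_index=2.78)

print(f"limit in air:        {in_air:,.0f} suns")
print(f"gain from the index: {2.78 ** 2:.2f}x")
print(f"limit in the medium: {in_crystal:,.0f} suns")
print(f"56,000 suns is {56000 / in_air:.2f} times the in-air limit")
```

Making the angle and index parameters lets a reader plug in other values and see how sensitive the limit is to each.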

      • Thanks Micro6500.
        If you are right then CO2islife is correct in stating that 15 micron back radiation could only warm the earth to 192 K or -81 c. This would appear to be an important conclusion that basically removes CO2 as something to worry about!
        Thanks George also for your reply.
        Though I did not mean for you to launch into an explanation about focusing lasers! Perhaps I should have re-phrased my question to avoid any reference to the sun and ask what the temperature of the earth would be if it was bathed only in 15 micron IR waves? Forget the sun. If Micro6500 is correct then CO2islife is correct and CO2 does not cause global warming. I wonder if you agree with this logic and this conclusion?
        I fear this thread has aged out now but I think it is a topic that should get more discussion.

        • If the Sun had a peak at 15u (it doesn’t; it’s some 6,000 degrees), it would depend on what captures the 15u and whether it’s thermalized. If it is, the warming will be based on the flux in joules, which could warm it to more than 192K.
          So it’s more of an “it depends”. Just as you can heat something with the Sun to far warmer than 6,000 degrees.

      • The sun clearly emits at all the wavelengths in the electromagnetic spectrum that we know about. The key issue is that CO2 absorbs IR at 13-15 microns and not (significantly) at any other wavelengths. Those wavelengths equate to emissions from a black body at a temperature ranging from -50c to -80c. Agreed, even at those low temperatures the black bodies emit at other wavelengths as well, but that doesn’t make any difference as CO2 is only absorbing in the range of 13-15 microns. A body at a given temperature cannot raise the temperature of another body to a higher temperature than its own through its emissions. It doesn’t matter if the emitting body is a million times bigger than the receiving body, it still cannot raise the temperature of the receiving body above its own temperature. It would therefore seem that the effect of CO2 absorbing at 13-15 microns cannot be more than raising the earth’s temperature to -50c to -80c. In other words, CO2 is not a problem. Though not a scientist, I have raised this point several times and have so far not got an explanation of why this is not true.

        • Though not a scientist, I have raised this point several times and have so far not got an explanation of why this is not true.

          I think it’s because there is an “it depends”: if you’re in space and have 2 identical “Black Bodies”, one at 193K, it will not warm the other above 193K.
          But photons carry energy, and when you stream a high power flux, if the flux is absorbed, it can make something far hotter than the temp of the flux, think microwaves, and lasers, even the 6,000 degree Sun can be focused to many times that.
          And then there’s the ability to reduce cooling rate, while the surface of the earth cools to space, anything that sends some of that energy back, cold or not slows that cooling rate.

      • Bernard,
        Courtroom lawyers are very careful to ask exactly the question they want answered; and their other rule is to never ask a question they don’t already know the answer to.
        Same sort of thing works in science.
        If you don’t ask specifically the question you want an answer to then you aren’t likely to get the answer you expected.
        I’d like a dollar for every time somebody told me; “that wasn’t the question I asked.”
        Well to the best of my ability I try to make sure that it certainly is the answer to the question they asked.
        So you did ask what would be the temperature if the sun emitted only 15 micron radiation.
        I gave you an answer to that based on the fact that there were no other conditions.
        So now you posed a different question about the earth, sans sun, being bathed only in 15 micron radiation.
        That is not sufficient information to give any meaningful answer other than the earth would be bathed only in 15 micron radiation.
        What else might happen would be pure speculation.

    • Some typos there.
      (on a wavelength scale graph)
      And (20 times the solar spectrum peak wavelength) Wien’s Displacement Law.

  18. Ban p-values? ‘You have to be joking’ as John McEnroe once said. This would be the equivalent of banning information about the probability of a false positive from a clinical (diagnostic) test. Possibly the biggest problem is bad tests. In some cases, it seems somewhat like researchers testing for lung cancer with a test which happens to be proficient at detecting herpes (and other irrelevant diseases). In more formal language, tests which are actually rejecting ancillary hypotheses not of direct relevance to the claimed results.

  19. This article made me remember a question I’ve had for a while and it has to do with (purported) rising ocean levels. I go at this as an educated layman, not a professional in any related field.
    I have read that one prediction of “global warming”, whether anthropic or not, is that ocean levels will rise. In some science articles I’ve seen, the topic material seemed to assume that “rising ocean levels” was a verified phenomenon. Yet, in other articles that are more skeptical of AGW the exact opposite seems to be assumed.
    First, even if there has been a detectable increase in ocean levels I find it difficult to understand how that fact can somehow make AGW “more true” than otherwise. There seem to be other “predictions” in AGW that have not panned out beyond some reasonable MOE.
    To cut to the chase: Are there trustworthy measurements that indicate ocean levels have indeed risen and over time beyond those one might attribute to non-human causes?
    Even if the answer is “yes” it still does not rescue AGW theory from other serious failings but perhaps others have noticed this one “prediction” turns up time and again in science articles related to AGW.

    • Sea level is rising. The argument is over whether or not there is a rate of change of sea level outside natural parameters at work, presumably from an anthropogenic effect.

  20. During my time I have seen a lot of researchers switch from measuring tolerances and computing the error range of the final number based on those measured tolerances, to statistical analysis. While statistical analysis does offer the ability to come up with a final number with a smaller uncertainty range (90% confidence interval) than the error range, any mistake in the statistical analysis will often put the final number outside the error range and thus make it totally meaningless.
    Statistical analysis is a great tool, but it is a double-edged sword capable of shredding our work with ease. I’m afraid we have gotten lazy over the past few decades.
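One way to picture the difference described in comment 20 is to compare a worst-case tolerance stack with a root-sum-square (RSS) combination. This is only a sketch: the tolerance values are hypothetical, and the RSS figure is only meaningful if the individual errors really are independent, which is exactly the kind of assumption that can go wrong.

```python
import math


def worst_case(tolerances):
    """Error range: every tolerance is assumed to push the same way."""
    return math.fsum(abs(t) for t in tolerances)


def root_sum_square(tolerances):
    """Statistical combination; valid only for independent errors."""
    return math.sqrt(math.fsum(t * t for t in tolerances))


tolerances = [0.1, 0.2, 0.2]  # hypothetical measured tolerances, same units

print("worst case:", worst_case(tolerances))        # about 0.5
print("root sum square:", root_sum_square(tolerances))  # about 0.3
```

The RSS number is always the smaller of the two, which is the appeal; if the errors are actually correlated, the true spread can approach the worst-case figure instead.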

    • All statistical tests contain mathematical assumptions about the data being tested, and the nature of the underlying ‘true’ population from which the data sample is pulled. If any of those underlying assumptions are false, then that particular test is invalid. For example, it was assumed that commodity and stock price movements are Gaussian, so normally distributed. Benoit Mandelbrot showed they behave fractally, so are fat tailed, so not normally distributed. In 1972 I programmed the Kolmogorov-Smirnov test proving this for the entire NYSE over an entire decade of daily price movements, for John Lindner’s paper showing one therefore needed to do all commodity and stock price statistics using a lognormal population assumption.
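A rough sketch of the kind of check described in the reply above: a one-sample Kolmogorov-Smirnov statistic against a fitted normal distribution. The data here are synthetic (a Gaussian sample and a fat-tailed Laplace sample), not NYSE prices, and the function names are illustrative. Because the mean and standard deviation are estimated from the sample, the usual 1.36/sqrt(n) threshold is conservative, so treat it as a rough yardstick only.

```python
import math
import random


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and std dev sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic_vs_normal(sample):
    """Largest gap between the empirical CDF and a normal CDF fitted to the sample."""
    n = len(sample)
    mu = sum(sample) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in sample) / (n - 1))
    d = 0.0
    for i, x in enumerate(sorted(sample)):
        f = normal_cdf(x, mu, sigma)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
gaussian = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Laplace (double exponential) draws: fatter tails than a Gaussian
fat_tailed = [rng.expovariate(1.0) * rng.choice((-1.0, 1.0)) for _ in range(n)]

ks_gauss = ks_statistic_vs_normal(gaussian)
ks_fat = ks_statistic_vs_normal(fat_tailed)
threshold = 1.36 / math.sqrt(n)  # approximate 5% critical value

print(f"Gaussian sample   D = {ks_gauss:.4f}")
print(f"fat-tailed sample D = {ks_fat:.4f}")
print(f"rough threshold     = {threshold:.4f}")
```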

      • And this is precisely the problem with statistical hypothesis testing: It’s comparing a proposed theory against the gubbins of an unknown and undescribed reality. There is no valid manner in which to choose the ‘actual’ probability distribution of reality until you’ve collected enough data. But if you’ve collected enough data, you don’t need null hypothesis testing. The actual data either invalidates the proposed theory or it does not.
        Which is precisely backwards from, say, quality assurance in manufacturing, where all the gubbins of reality are known, described, and laid out on the shop floor. The issue there is not the gubbins, but whether it’s doing what you designed it to do on the fringe ends of things.

  21. The problem isn’t the use of p values. It’s that science and math are too hard for most scientists and mathematicians. If that were not so, it would be perfectly clear to them just what inferences can and cannot justifiably be drawn from a given p value in a given case.

    • It’s that science and math are too hard for most scientists and mathematicians.

      We do seem to be in a bad mood today, don’t we?

    • Speak for yourself.
      Most scientists or mathematicians do not find those subjects to be too hard. That’s how they were able to get degrees in those disciplines.
      Now if the examiners (lazy SOBs) use multiple choice tests and exams; then no wonder they graduate students who aren’t qualified.
      California (and other State) drivers licence written tests are the only multiple choice exams I ever took.
      You can’t test somebody’s knowledge by giving them the answer and asking them if it is correct.

  22. Was it not Sir Ernest Rutherford who said, “If your experiment requires statistics, you need to design a better experiment”?
    Of course this over reliance on “modified statistical methods” is the heart of the doomsayers’ creed.
    Coulda, woulda, shoulda, so give me all your money, enslave your children unto me.
    Nice try by the author, Siegfried, but who is he addressing with the denier schtick?
    Is that for the consensus crew or just a demonstration of his own bias?
    However given the history of the Team IPCC ™ this use of statistics to give false credibility to meaningless numbers, is deliberate.
    Science was nothing but a useful garb to cloak their mendacity in.

    • john robertson: Was it not Sir Ernest Rutherford; who said, “If your experiment requires statistics, you need to design a better experiment”?
      He did, but it is an incomplete thought: he did not say how anyone was supposed to know how to do a better experiment. A fuller idea is to use the statistical analysis from the last experiment to do a better job designing the next experiment.

    • No, it was Lord Rutherford, not Sir Rutherford, and he said “If you have to use statistics, you should have done a better experiment.”

  23. Vaca=cow. Vaccine=a generic word derived from the brilliant idea of using cowpox to stimulate immunity to smallpox. Vaccine evaders=those who avoid doses of actual data to immunize against superstition.

    • Anti-vaxxers don’t know how short the half-life of methane is in the atmosphere.

  24. The Lancet editorial has justifiably caused a lot of stir. The scientific method is a process. Scientific knowledge is what the process produces. If the product is bad, then the production process is not good. Many causes upthread.
    One that has not yet been mentioned. It used to be (like when I was learning econometrics) that doing statistical analyses meant getting your hands dirty. Understand the math, write code to do the calculations. Plenty of things could go wrong, but not understanding what you were doing was not one of them. Understanding things like heteroskedasticity, kurtosis, autocorrelation… These days you just put data into a stats package and menu select the stats you want. And that ability to do without first understanding leads to a lot of really bad stuff. Eschenbach’s recent post on autocorrelation’s impact on effective N (and hence the significance of any statistical procedure) is one of many examples of what can go wrong.
    My personal favorite in climate science is Dessler’s 2010 paper claiming via OLS a statistically significant positive cloud feedback (it’s touted on NASA’s website) in a near perfect data scatter (near perfect defined as r^2 of 0.02!). Dessler flunks stats 101. So do his peer reviewers. So does Science, which published this junk paper including the scatterplot. Essay Cloudy Clouds.
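On the effective-N point raised earlier in comment 24: a common textbook approximation for a series with lag-1 autocorrelation r1 (an AR(1) assumption) is N_eff = N(1 - r1)/(1 + r1). The sketch below uses that approximation; it is not necessarily the calculation used in the post mentioned, and the function names are illustrative.

```python
def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a sequence of numbers."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
    den = sum((x - mean) ** 2 for x in series)
    return num / den


def effective_sample_size(n, r1):
    """AR(1) approximation: N_eff = N * (1 - r1) / (1 + r1)."""
    return n * (1.0 - r1) / (1.0 + r1)


# Hypothetical case: 120 monthly values with strong persistence (r1 = 0.9)
print(effective_sample_size(120, 0.9))  # about 6.3 effectively independent points
```

With so few effectively independent points, a significance test that assumes N = 120 will overstate the evidence considerably.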

  25. I am not a climate scientist, but reading that here and elsewhere, I think there are very few climate scientists for whom the words don’t constitute an oxymoron. More accurately they are economists, psychologists, social and political studies proponents … who wish to claim they are scientists for whatever public credibility and influence that may gain.
    At my former university, some faculty have proposed a new masters degree program in social “sciences” that has not a single course in research designs, methods, or statistics. For them significance will be a belief system without any probability. Gradual students will be reading/viewing materials for which they have not the ability to judge, but will only acquire feelings. Several senior faculty have just taken early retirement, having issues with these ethics of the modern university. I left over a decade ago, having fought this trend for several decades and recognizing much more meaningful things to do.
    Whenever I read “Scientists have found, believe, discover … “, that bilge only merits publication in National Enquirer or other grocery checkout rags. We actually purchase those for recreational reading while sitting in the outhouse at camp. (composting toilet, by the way)
    On a more topical note, I see very little evidence that climate science has any clue about the nature of its various data. How are temperature or other data distributed? Is any sample independent? What is the extent of auto-correlation among the samples? Statistics requires knowledge of the underlying nature of the data before you go drilling.
    As soon as one invents a mathematical formula through statistical correlation:
    Global Ave T = 0.0005 × CO2 + baseline, one has created a spurious mathematical correlation with no justification for its basis in nature. The end justifies the means. I see very little natural justification in this “field”.
    There is a huge chasm between statistical significance and worldly importance. A 2 degree C temperature change isn’t even weak horseradish. OK /end rant

    • Nothing wrong with teaching statistics. It’s at least as interesting as teaching Origami.

  26. “My personal favorite in climate science is Dessler’s 2010 paper claiming via OLS a statistically significant positive cloud feedback (it’s touted on NASA’s website) in a near perfect data scatter (near perfect defined as r^2 of 0.02!). Dessler flunks stats 101. So do his peer reviewers. So does Science, which published this junk paper including the scatterplot. Essay Cloudy Clouds.”
    I’ve been promoting the idea of creating a Scientific Data and Conclusion Validation Agency to perform double blind tests on research funded by the government and used to form public policy. It would be much like the FDA is to drugs, and the EPA is to chemical approval. We simply need to apply the same rigorous analysis that the FDA does to drugs and the SEC does to stock market firms. Global Warming, err, Climate Change has proven itself to be the most corrupting and fraudulent movement in my lifetime. The science, data and models simply don’t support the conclusions. They have published results that prove that. Simply applying double blind statistical analysis to climate change research, and prosecuting a few of the lead fraudsters will end this nonsense of CO2 driven climate change forever. Demand Congress look into the scientific practices being funded by our tax dollars.
    Billions spent, and these nitwits can’t even create a computer model to demonstrate their fraud. Bernie Madoff could have done a better job with a lot less money.

    • Actually, for many climate papers you don’t need to do any statistical replication. The conclusions are somehow falsified nonparametrically from first principles. McIntyre just provided an example of upside down varve use. Karl’s paper relied on Huang’s 2015 ocean buoy temperature adjustment of 0.12C. Huang did not give a confidence interval. Huang used the method of Kennedy 2011, who also computed 0.12C, BUT plus or minus 1.7C! GIGO, another deliberate choice to fudge results. There are many other similar examples in various climate essays in my ebook. Marcott’s mess, Shakun’s mess, Cazenove’s ‘statistical’ explanation for a supposed SLR slowdown, Fabricius OA coral studies, Thomas extinction estimates that became the sole basis for the AR4 estimates, Bebber poleward spread of plant pests, and many more.

  27. Reading the headline, I thought this might be about robust statistics, a favourite topic of mine. In short, robust statistics cannot be thrown off by a moderate fraction of wildly corrupted values. But upon further reading, it doesn’t appear that’s really what is addressed here.
    Historically statisticians have depended on least-squares estimation, which is notoriously sensitive to even one bad value. There were many reasons for using least squares (including tractability and the bloody central limit theorem), but with the introduction of modern robust techniques and a better understanding of the nature of real data, these reasons have fallen by the wayside.
    I don’t know how often robust statistics are used in climatology. It certainly appears to be a good candidate for it.
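The sensitivity of least squares to a single bad value is easy to see in the simplest case, where the least-squares estimate of location is the mean and a robust alternative is the median. A minimal sketch with made-up readings:

```python
import statistics

clean = [9.8, 10.1, 10.0, 9.9, 10.2]   # hypothetical readings
corrupted = clean[:-1] + [1000.0]       # one wildly wrong value replaces the last

print("mean   (clean, corrupted):", statistics.mean(clean), statistics.mean(corrupted))
print("median (clean, corrupted):", statistics.median(clean), statistics.median(corrupted))
```

One corrupted reading drags the mean from about 10 to about 208, while the median stays at 10.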

  28. Improve statistical tests 500% for significance
    As rgbatduke cites above, “p happens”. Because of the frequent lack of reproducible significance in papers, mathematician Valen Johnson calls in PNAS for 5 times more stringent statistics for results to be significant or highly significant:

    To correct this problem, evidence thresholds required for the declaration of a significant finding should be increased to 25–50:1, and to 100–200:1 for the declaration of a highly significant finding. In terms of classical hypothesis tests, these evidence standards mandate the conduct of tests at the 0.005 or 0.001 level of significance.

    PNAS vol. 110 no. 48 Valen E. Johnson, 19313–19317, doi: 10.1073/pnas.1313476110

  29. In light of recent Supreme Court responses, I have to comment on the reference in Tom’s writing to the poorly done gay opinion research. While I don’t argue with the need for surveys to assess public opinion and to do them with at least a modicum of excellence, the results of such a study, had it been done right, or the retraction of such a study, cannot be applied in a constitutional case. Such a decision hinges on the constitution, not solely on science, as the deciding factor.
    It is clear in this modern age, that all “men” must be interpreted to be “all humanity” and as such, are equal and endowed with unalienable rights specified by our Declaration of Independence as well as confirmed in our Constitution. It does not matter the opinions of one person or billions of people regarding the rights of consenting adults to pursue happiness in equal measure just like anybody else. Yes, each one of us has the right to not like it. We each have the right to not care. We each even have the right to say we don’t like it, or don’t care. But we don’t have the right to deny such pursuit to one or more persons, even to billions of people. That includes the right to buy a fricken cake in a store that declares itself open to the public. Any other decision would harken back to the days when as a woman I could be denied the right to walk into a public clubhouse, tavern, or other such business, denied work, or denied the right to solely own property, the list goes on, simply because I was considered to be unequal and subservient to men. Yes, I mean subservient. In 1901 my greatgrandmother had to resign as principal of the Lostine High School. Why? She got married and her place was in the home serving her husband. Married women were banned from a teaching career the moment they said “I do”.
    Not to put too fine a point on this, would I marry a woman? No. Not my cup of tea. But I would take up arms to protect my fellow gender’s individual freedoms specified in our founding documents.

    • Ah, well, the ‘pursuit of happiness’ is about as poetic as Kennedy’s fare when normally considered. Consider: Ted Bundy was free to pursue his means of getting his jollies off by murdering women. That is, the government did not ban his existence or incarcerate him in advance and on account of his chromosomal makeup. But being free to pursue his happiness — being free of prior restraint — doesn’t mean that we are required to let him go unpunished for what things he’s gotten himself up to.
      But if you want to get into historical points: ‘all men’ always meant ‘all humanity,’ this has always been well understood. And while unfortunate, the 14th Amendment was never intended to apply to ‘real’ biological differences. When it was ratified it was explicitly and expressly not meant to equalize law between men and women. Only to prohibit differences in law based on politically constructed designators — such as ‘black,’ ‘white,’ or ‘asian.’
      But with respect to homosexual marriage, the government does not prevent you from pursuing your happiness by shacking up and getting horizontal with whoever you like. But it does install a difference in law on the basis of whether you’re married or not. Which, it should be noted, was also expressly denoted as a violation of the 14th Amendment. The solution here isn’t for government to acknowledge more and different kinds of marriage so that more and different kinds of religiously or sensually motivated couplings can get preferential treatment under law — it is to ask the government to start obeying and enforcing the law as it has been written for around 150 years.
      Don’t get wrapped up too deeply in the ever shifting political climate.

    • Do you know all 57 genders currently available for selection; probably will be used in the next census ??
      I’m not sure if Hermaphrodite is even one of the 57.

  30. In my experience, the problem has been that too many people underestimate the amount of random variability, or don’t believe in it at all. In large numbers, those people will have overconfidence in the power of their experiments, and overconfidence in their results, no matter what disciplines of statistics they employ. The documented problems with inability to replicate research results will continue. All these issues have been publicly debated in statistics and in psychology for at least 50 years.

  31. “Journal editors attempt to judge which papers will have the greatest impact and interest and consequently those with the most surprising, controversial, or novel results,” Reinhart points out. “This is a recipe for truth inflation”
    For climate papers it must work the opposite way. You find hardly any controversial results, but instead a lot of papers confirming the same old story seen from different angles.

  32. In engineering design we use tools to perform simulations to help us design things, and to determine if the design will do what is desired. Generally, these tools work pretty well, although it is common to end up “at the bleeding edge” where results are not so consistent.
    Without regard to that, one way to help ensure the outcome is to never use parts that have a tolerance larger than 10% of the component value. Designs with this approach tend to be well behaved over time/temp/build.
    But as I’ve said before, the model output is no guarantee of the real world performance. A very good engineer will almost always require two “spins” of the Real World design to “get it right”… and then it will still require some tweaks in production.
    When I see anything where the Tolerance (possible error) is more than 10% of the Value, then I Know that the output will be shaky or outright crap.
    If we add the possibility of measurement error and noise in the measurement (noise is often not evenly distributed, therefore unpredictable) that is also on the same order as the desired measurement… then all is lost.
    Anyone who has worked in design or a production environment can tell you these things. The only truth is what you can measure. And a good production engineer can shake your confidence in your data in ways you cannot imagine.

    • If you design a zoom lens for your Canon or Nikon SLR camera, and you use 10% tolerances on variable values, you won’t even end up with a good Coke bottle.
      But then again, I have designed amplifiers with 20% tolerance components or even a range of 10:1 on some parameters (forward gain) but then desired operation is restored with just two 1% or even 0.01% tolerance components.
      As they say; it all depends.

  33. “Should Mr. Siegfreid read this, I’ll point out that many climate skeptics became climate skeptics once we started examining some of the shoddy statistical methods that were used, or outright invented, in climate science papers.”
    One of the many reasons I no longer subscribe to “Science News”. It’s not about science any more. It’s about editorials and “global warming” and sensationalism. And the writing has been dumbed down. It’s a real shame.

  34. Given the birthday paradox, the probability of two independent variables both being outside the bog-standard 95% confidence interval is 1-e^(-n(n-1)/(2*possibilities)), where possibilities is 20 and n is the number of variables. For n = 2, that’s 4.9%. With 5 variables that’s a 40% chance, and with 10 variables it’s a 90% chance that two variables are outside the confidence interval. I haven’t calculated it for “two or more” yet, but I should.
    Too bad there are two threads on this. not sure where to post this.
    If you reverse this and you want 95% confidence that no two independent variables are outside their respective 95% confidence interval, for n=5 variables you need p=0.008 for each variable and for n=10 you need p=0.001 for each variable. (found via goal-seek in Excel).
    This analysis should be distribution independent but IANNT (I Am Not Nicholas Taleb)
    Since most measurement error bars are posted at 95% confidence (2-sigma), this applies to real world measurements. If I combine those measurements into a model I’ll get an increasing likelihood (quickly!) of GIGO as I add measurements to the model. It should also apply to multiple ANOVA or any model that involves multiple variables that involve some sort of distribution of those variables.
    Feel free to smash away at my bad assumptions and math. If you really need help programming the simple equation into Excel I’ll post it to dropbox on request…
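Since comment 34 invites checking, here is a small sketch that evaluates the formula as written (with possibilities = 20) and, for comparison, the exact binomial probability that at least k of n independent variables fall outside their own 95% intervals. The binomial comparison is an addition for illustration, not something taken from the comment, and the two calculations answer somewhat different questions, so they need not agree.

```python
import math


def birthday_approx(n, possibilities=20):
    """The comment's formula: 1 - exp(-n(n-1) / (2 * possibilities))."""
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * possibilities))


def at_least_k_outside(n, k, p=0.05):
    """Exact binomial chance that k or more of n independent variables are outside."""
    return 1.0 - sum(
        math.comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k)
    )


for n in (2, 5, 10):
    print(
        f"n={n:2d}  formula={birthday_approx(n):.3f}  "
        f"at least one outside={at_least_k_outside(n, 1):.3f}  "
        f"at least two outside={at_least_k_outside(n, 2):.3f}"
    )
```

The formula column reproduces the 4.9%, 40% and 90% figures quoted above; the binomial columns show what independent 5% tail events give under the stated assumptions.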

  35. “Rather than furthering scientific knowledge, null hypothesis testing virtually guarantees frequent faulty conclusions.”
    Yup. so much for the natural variability null.
    Science is more about understanding and less about null testing than people think.
    especially in the observational sciences.
    The other odd thing is that a while back folks clamored for more statisticians in climate science

  36. Note that generalizing from papers about psychology to other fields is an unwarranted assumption

    • Except that it isn’t. Why? Because unknown variables are replete in both areas of research: psychology and climate. There is a mathematical construct you are missing Steven. A high number of variables will more likely create an actual false positive (rejects the null) than it will a false negative (erroneously accept the null), irregardless of statistical choices or machinations. Climate scientists fail to tread lightly through their fragile data and instead stomp all over it as if it is hardy and robust against error.

      • “Except that it isn’t. Why? Because unknown variables are replete in both areas of research: psychology and climate. ”
        Assumes the same effect in both fields.

      • Pamela my dear, you are indeed an exotic creature. On the one hand, you consistently post highly intelligent comments, which I almost always enjoy, and on occasion — despite myself — actually learn something new. And then, as if to toy with us, you deploy a word like “irregardless.”
        It’s the little things that bring color to an otherwise Gray world.

  37. I prefer science that has practical applications … electrical, mechanical and chemical engineers don’t need statistics … [they] don’t get paid to have something work 95% of the time …

    • never designed a chip.
      Do U know why Intel first did speed binning? what was the design speed of the first processor to be speed binned?
      or better, what is the initial defect density of a chip design?
      how does chip yield (the % that actually work) increase over the production life span

      • LOL! Even though we often differ in our opinions, your breadth of knowledge is impressive.

      • Well binning for any variable is a way to maximize the economic return.
        If a memory chip on a wafer may have a 3:1 spread in speed over the wafer for various process variable reasons, then by binning them for speed, you can sell all the functioning units to somebody for some price.
        You can get mucho bucks for the 5GHz ones, and the 1.5 GHz ones are fast enough for people to use to write Microsoft word documents.
        I once asked a semiconductor salesman what was the cheapest semiconductor device he ever took an order for.
        He said he sold a bunch of silicon diodes to a crystal set kit manufacturer for one cent each. Typical fast switching signal diodes were going for maybe $1.20 in 10,000 quantity.
        So I asked him, what was the spec.
        His reply: NO opens. NO shorts
        He could have sold the guy AB resistors ( at a substantial loss) and they wouldn’t even work.

      • ” or better what is the initial defect density of a chip design?
        how does chip yield ( the % that actually work) increase over the production life span”
        Defects are proportional to the area of a chip and, I suppose, the % of that area that is active. If your fab is consistent, the only increase you get is when you decrease the die area. Binning was just a way to sell more die: nothing was wrong with them, but process tolerance was squishy and some (or a lot) were just slow. Our tolerance was about 20% iirc; it was almost 35 years ago.
        But almost all of the dead die were due to photoresist flaws; rarely did I see a wafer that was missing a layer.

  38. No reference to the failure of reproducibility in the bogus sciences is complete without reference to the sustained generation of fake results and papers by the con-man social psychologist Diederik Stapel.
    Since his work was highly referenced and he found many imitators during his years of “success”, the discovery that his work was almost entirely faked must surely cast serious doubts upon the robustness of related work by other so-called “researchers”.
    It is especially interesting to note that people were willing to accept and report his results because he was fishing for results that supported tragically simplistic progressive assumptions.
    As with Lewandowsky, his secret was to tell the political left exactly what it wanted to hear.
    It turns out that apparently skeptical people will cease to take interest in the trustworthiness of a “scientific result” if it fits into what they wanted to believe about the world.

  39. “Robust statistics seeks to provide methods that emulate popular statistical methods, but which are not unduly affected by outliers or other small departures from model assumptions.” Oh dear, shurely some mishtake. Ignore those pesky outliers in favour of a pre-conceived closed-form presumption.
    Multi-modal assessment of reality. Whether through the nature of things (aleatory) or our lack of knowledge of things (epistemic), the evidence is that we can most reasonably describe the world as multi-modal – many possible outcomes. Not at all the familiar view of uncertainty as a variability around a mean outcome.
    Colin Powell said it:
    “Tell me what you know. Tell me what you don’t know. Then tell me what you think. Always distinguish which is which.”
    from “It Worked for Me” (http://www.amazon.co.uk/It-Worked-Me-Life-Leadership/dp/0062135139)

  40. As Lubos Motl so nicely put it in http://motls.blogspot.com/2010/03/tamino-5-sigma-and-frame-dragging.html :

    In a serious discipline, 3 sigma is just a vague hint, not real evidence.

    That’s why until I understand , ie : have the algorithms running , to calculate , ie : quantitatively explain , our 3% excess over the gray body temperature in our orbit , to the accuracy of our measures of the spectrum and distance of the sun and our spectral map as seen from outside , ie : ToA spectral map , I don’t see 5-sigma , or even the business world’s 6-sigma any constraint .
    James Hansen’s quantitative howler that Venus is explained as a runaway greenhouse effect is way outside those bounds yet not universally repudiated . So , until the field ( not just the best here ) understands the non-optionality of those calculations of radiative balance , I’m far more interested in getting the stack frames vocabulary in my 4th.CoSy solid and defining the recursive operators required to express these computations succinctly .

  41. My 2013 comment to WUWT is germane to this argument. I will add another quote from the article which should be mandatory reading for anyone delving into statistics:
    “William Feller, Higgins professor of mathematics at Princeton, is in a fighting mood over the abuse of statistics in experimental work.”
    Neil Jordan May 16, 2013 at 1:32 pm
    Re rgbatduke says: May 14, 2013 at 10:20 pm
    Abuse of statistics is also covered in this old article which is unfortunately not on line:
    “A Matter of Opinion – Are life scientists overawed by statistics?”, William Feller, Scientific Research, February 3, 1969.
    [Begin quote (upper case added for emphasis)]
    To illustrate. A biologist friend of mine was planning a series of difficult and laborious observations which would extend over a long time and many generations of flies. He was advised, in order to get “significant” results, that he should not even look at the intervening generations. He was told to adopt a rigid scheme, fixed in advance, not to be altered under any circumstances.
    This scheme would have discarded much relevant material that was likely to crop up in the course of the experiment, not to speak of possible unexpected side results or new developments. In other words, the scheme would have forced him to throw away valuable information – AN ENORMOUS PRICE TO PAY FOR THE FANCIED ADVANTAGE THAT HIS FINAL CONCLUSIONS MIGHT BE SUSTAINED BY SOME MYSTICAL STATISTICAL COURT OF APPEALS.
    [End quote]
    Correction: I was able to locate the article on line at:
    The PDF can be downloaded here:

  42. Penultimate sentence of the post “pre-screeing data” should read “pre-screening data”.

  43. Third attempt to correct my error.
    Should be final sentence not penultimate.
    [Regardless of the final sentence location, “screeing” is changed to “screening” 8<) .mod]

  44. According to IPCC AR5 industrialized mankind’s share of the increase in atmospheric CO2 between 1750 and 2011 is somewhere between 4% and 196%, i.e. IPCC hasn’t got a clue. IPCC “adjusted” the assumptions, estimates and wags until they got the desired mean.

  45. This discussion has already been running for more than thirty years, which suggests that some problems are poorly understood, among them the flaw in the Fisher procedure. Here the null hypothesis postulates zero (e.g. for the difference between two means). Zero means exactly 0, so 0.00000001 is not zero. The p-value depends on sample size: for any fixed nonzero difference, the greater the sample size, the smaller the p-value. Therefore, in a sufficiently large sample you will get a sufficiently small p-value (a highly significant result) even if the real difference between the two means is 0.00000001. The mathematics therefore says that to get a highly significant result you only need enough money for a large sample, which lets you reject a null hypothesis that was trivially false from the outset. The best alternative is the likelihood ratio test (also noted here), and especially the little-known sequential likelihood ratio test for two intervals. It is a matter of statistical education to get these used in practice.
    Nevertheless, I think that the biggest problem is the failure to check the assumptions made in the statistical procedure. What assumption is always there? Random sampling. This means that the conclusion from a statistical test is two-fold: the postulated value is not true and/or the sample is not random. But you never read the latter, which may point to a dirty secret. So the conclusion that temperatures increased ‘significantly’ over a certain period means that the amount of data is sufficiently large to earn the predicate ‘significant’ and (1) the slope of the linear regression is not exactly zero, and/or (2) the data may not be a random sample from a well defined population (i.e. the data were selected, filtered, tortured, dependent, etc.).
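The sample-size point in the comment above can be checked with a few lines of arithmetic. This is only a rough sketch under stated assumptions, not anyone's actual analysis: it uses a two-sample z-test with a known common standard deviation, pretends the observed difference equals the true one exactly (no sampling noise), and the numbers (a difference of 0.01 standard deviations, the chosen sample sizes) and function name are made up for illustration.

```python
import math

def two_sample_z_p_value(diff, sigma, n):
    """Two-sided p-value for a two-sample z-test with n observations per
    group, known common standard deviation sigma, and mean difference diff."""
    standard_error = sigma * math.sqrt(2.0 / n)
    z = abs(diff) / standard_error
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))

# Hypothetical numbers: a fixed, practically negligible difference of
# 0.01 standard deviations, evaluated at increasing sample sizes.
tiny_difference = 0.01
p_values = {
    n: two_sample_z_p_value(tiny_difference, 1.0, n)
    for n in (100, 10_000, 1_000_000, 100_000_000)
}

for n, p in p_values.items():
    print(f"n per group = {n:>11,d}   p = {p:.3g}")
```

The same negligible difference goes from nowhere near significant to overwhelmingly significant purely because n grows, which is the commenter's complaint about a null hypothesis of exactly zero.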

  46. Just in case anyone is getting a faulty “take home” message here: The problem isn’t that statistics doesn’t work, it is that it has been used wrongly.
    Let’s say you have a theory, you make just one prediction, and it pans out at the 95% confidence level. There is only a 5% chance that you are mistaken. That is a real chance, though! The world would be foolish to wreck economies, starve the poor, destroy wilderness with bird and bat killing windfarms, etc, just on your 95% chance of being right. If the experiment can be repeated independently, and again pans out, you now have a 99.75% chance of being right – and it gets better the more you repeat the experiment independently and successfully.
    But most of these “95%” results suffer from two afflictions (either one is enough to wreck the result, but most announcements I have seen suffer from both flaws):
    1) There wasn’t just one single prediction being tested. They went out hunting for a significant result. Try just 14 tests where there is no connection between the supposed cause and the effect, and you have a greater than 50% chance of finding at least one 95% significant correlation – where no correlation exists. This is the basis of the extremely funny cartoon urederra posted above.
    2) Even when just one posited connection is being tested, it is inherently irreproducible. Typically this happens in historical surveys – people who ate X also got cancer – that sort of thing. We can’t find another planet and check the correlation there as a second check – we used our data once and for all in the original survey.
    It isn’t hard to see that climate “science” (can’t stop laughing!) is riddled with these errors. Good statisticians understand them – bad scientists ignore them – bad editors and journal publishers overlook them – because personal advancement is the goal, not science, not truth, not the welfare of other people and wildlife.
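The two figures quoted above (99.75% after one independent replication, and better than even odds of a spurious hit in 14 tests) follow from simple probability, as this small sketch shows. It takes the comment's framing as given: tests are independent, and 5% is the per-test chance of a false positive when no real effect exists. The function names are made up for illustration.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests crosses
    the alpha threshold when there is no real effect in any of them."""
    return 1.0 - (1.0 - alpha) ** num_tests

def smallest_num_tests_exceeding(target, alpha=0.05):
    """Smallest number of independent null tests whose chance of producing
    at least one false positive is greater than target."""
    n = 1
    while chance_of_false_positive(n, alpha) <= target:
        n += 1
    return n

def chance_not_all_flukes(num_replications, alpha=0.05):
    """Under the comment's framing: one minus the probability that every one
    of num_replications independent successes was a 5% fluke."""
    return 1.0 - alpha ** num_replications

print(f"two independent successes: {chance_not_all_flukes(2):.4%}")
print(f"14 null tests, at least one 'significant': {chance_of_false_positive(14):.1%}")
print(f"tests needed to pass 50%: {smallest_num_tests_exceeding(0.5)}")
```

With 13 tests the chance of at least one spurious result is still just under one half, so 14 is the first count at which hunting for significance is more likely than not to find something.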

    • Ron House:
      The problem is, the models (the religion of catastrology predictions over the past 18-1/2 years) have had only an 8% success record of being right!
      After 1/6 of the prediction range in time, only 2 models in 23 have been even close to the real world!

    • So if your “prediction” has just two possibilities; either it happens, or it doesn’t happen.
      Well you may have to wait to the end of time, to confirm that your prediction didn’t happen, whereas, it might happen in the next atto-second confirming your prediction.
      So presumably to be a sensible postulate to test, you must have predicted the event within some finite time window; otherwise your conjecture is just nonsense.
      So ok, the time window comes and passes. Your conjecture is now moot, the window has elapsed.
      And either the event happened or it didn’t. No other possibilities.
      So now, in either case, what can you say about your statistics after the window has elapsed and one of those two possibilities has eventuated?
      I contend that the experiment neither proves, nor disproves the validity of your 95% confidence level. It has told you nothing.
      Statistics tells you nothing about an event that only happens (or doesn’t) once.
      Now if your conjecture is that in the next succeeding 100 time windows some event will happen in each one, with a 95% confidence level. Well of course after those 100 intervals have come and gone, maybe you got 97 hits and 3 misses.
      Now maybe it is meaningful to say that your statistical analysis was valid.
      But clearly, the prediction of the happening or non happening of just a single event is sheer balderdash.
      Well that’s my opinion.
      If you buy just one lottery ticket; either you win or you don’t. The statistics are irrelevant. It is still really just a choice between two cases. A win, and a non win.
      Buy one ticket in a million different lotteries and the vast majority of them will lose.
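The 100-window case above can be put in numbers with a binomial model. This is a hedged sketch: it assumes the windows are independent and that the claimed 95% hit rate is the true per-window probability, and the variable and function names are invented for the example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k hits in n independent trials with hit rate p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)

def prob_hits_at_least(k_min, n, p):
    """Probability of k_min or more hits in n independent trials."""
    return sum(binomial_pmf(k, n, p) for k in range(k_min, n + 1))

n_windows = 100       # number of time windows in the comment's example
claimed_rate = 0.95   # the forecaster's claimed per-window hit rate

# If the 95% claim were exactly right, how often would 97+ hits occur?
prob_97_or_more = prob_hits_at_least(97, n_windows, claimed_rate)
# And how often would a forecaster with that rate score 85 or fewer?
prob_85_or_fewer = 1.0 - prob_hits_at_least(86, n_windows, claimed_rate)

print(f"P(at least 97 hits in 100) = {prob_97_or_more:.3f}")
print(f"P(85 or fewer hits in 100) = {prob_85_or_fewer:.2e}")
```

A record of 97 hits is unremarkable under the claimed rate, while a much lower score would be strong evidence against it. With a single window there is only one hit or one miss to go on, so no comparable check is possible, which is the commenter's point.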

  47. Steven Mosher July 12, 2015 at 4:34 pm

    “Damaging the lives of billions between now and 2100, killing millions each year for 85 years.
    For nothing. To prevent no harm to no people, but to promulgate harm on the all.
    Just so you (those who support the CAGW theories) can “feel good” about your religion of Gaia and death.”

    You think that is settled science?
    Saying the Cost is high and that millions will die is just a form of economic catastrophism.
    I suppose you have an econ MODEL to back up that claim… wa.
    A proper skepticism would note that supposed damage from climate change is dependent on models
    Supposed damage from cutting c02 is based on models.

    Steven, the damage from artificially raising energy prices and from the World Bank denying loans for coal plants is already happening. We can go out and measure it. Your claim that the damage on the two sides is equally dependent on models ignores current, measurable, non-modeled real-world suffering happening right now.
    This is the amazing thing to me, Mosh—the people willing to injure the poor today in hopes of imagined future cooling of a tenth of a degree or so in fifty years either claim to have the high moral ground or, like you, they claim the two sides are equal. They are not equal. One is present harm, the other is pie-in-the-sky promises of climate valhalla. Easy choice for me—if you want to fight CO2 go ahead, but doing it on the backs of the poor by taking any action that drives up energy prices is reprehensible.

    • It is touching how much concern the sceptics have for the poor.
      Actions taken to curb CO2 emissions will undoubtedly lead to somewhat lower GDP growth. This does not mean that the measures will make us, on average, poorer than we are today. The effect is that in the coming years we will, on average, be somewhat less wealthy than we could have become had we done nothing to curb CO2.
      However, how this GDP growth is distributed is another topic. Somewhat lower GDP growth does not necessarily have to hurt the poor.

      • But it will hurt them Jan, the poor, no way around it. Bet you are a well off ba….rd and you will also have no problem justifying it in the end. Try living on less than $9000/yr and helping your kids/grandkids, some live on even less, I have for the last eight years, see if utilities matter to you then, when it means less food, worse food, no medical, no travel, less heat, less cool when needed. Be a real man, try it out yourself, see if you feel the same after a few years. Right now you are no different than what I wrote of below.

      • Wayne,
        You can use the same argument for all kinds of emission controls. For instance, when we curb the SO2 that causes acid rain, it also raises the cost of electricity. Should we let the power plants send the SO2 out into the atmosphere so that electricity could be cheaper?
        In addition, what about emission controls in cars. The catalysts used in car exhaust hurt the poor by making the cars more expensive. Should we also abandon catalysts in solidarity with the poor?
        Moreover, what about sewage cleaning? That is also expensive.
        All these emission controls make us a little less wealthy, but most of us think it is worth the money. I think there are other ways to compensate the poor so they are not hit by the expensive emission controls for CO2.

        • Jan Kjetil Andersen,
          But we don’t exhale SO2, nor is it a basic requirement of life for most life on earth.
          Plus all of the emission equipment on cars is designed to reduce gasoline to mostly water and CO2.

      • Micro says:

        But we don’t exhale SO2, nor is it a basic requirement of life for most life on earth.

        I don’t think that is a very good argument Micro.
        That a compound comes from humans and is essential for life does not mean that it cannot be harmful in elevated quantities.
        After all, sewage comes from humans and contains nutrients that are essential for plant life, and that is harmful in elevated quantities, isn’t it?

      • Jan Kjetil Andersen: It is touching how much concern the sceptics have for the poor.

        Everybody says that sarcastically; yet the fact remains that analysis of energy benefits and coal costs shows that restrictions on coal harm the wealth and health prospects of the poor. Analysis of restrictions on mercury and sulfates shows that, after the wealth benefits of burning coal have been achieved, additional health benefits can be achieved by restricting mercury and sulfates. No wealth and health benefits for restrictions on CO2 in under 50 years have been demonstrated, and longer term benefits are totally conjectural.
        So you want to mock people who point out that restrictions on CO2 harm the poor. How exactly does your mockery help the poor you claim to care about?

        • matthewrmarler
          Your claim is dead wrong.
          The statement can be made of course, but it is based on false assumptions DESIGNED SPECIFICALLY to create the false conclusion you just repeated.
          Federal hype, exaggeration and spin created (invented) to justify their new regulations BY the bureaucrats (and their supporters in the media and other bureaucracies such as yourself) who want to implement their new regulations and enrich their power and budgets and influence, regardless of costs or benefits. Not scientific values based on real world medical and economic sense.

      • Jan Kjetil Andersen:
        If the action taken to reduce CO2 results in an undetectably small mitigation of climate change are the costs still worth it?
        Is there a proper pace of CO2 reductions today that can be shown to produce (at least) offsetting benefits in the future? At what point do the benefit/cost curves cross?
        As long as we are talking about hypothetical harms that are contingent upon the precise manner and timing of implementation (for both CO2 emissions and regulations intended to reduce them), should you also consider that future technologies may produce better results with less economic loss?
        In light of the questions above, have we truly answered the question “Can we afford to wait?”

      • Opluso asks:

        … should you also consider that future technologies may produce better results with less economic loss?
        In light of the questions above, have we truly answered the question “Can we afford to wait?”

        The next question is then how long do we have to wait on those future technologies?
        The answer to this last question will probably depend on whether we recognize reduced CO2 emissions as desirable, and if so, how much we are willing to pay for it.
        If we don’t want to pay anything at all, there will be no incentive to develop these future technologies and then we may have to wait a very long time.
        On the other hand, if we adopt a binding target to reduce the emissions by some quantity in let’s say 2030, there will be incentives to develop these technologies. Therefore I think it is time to act now by adopting binding targets.

        • If we don’t want to pay anything at all, there will be no incentive to develop these future technologies and then we may have to wait a very long time.
          On the other hand, if we adopt a binding target to reduce the emissions by some quantity in let’s say 2030, there will be incentives to develop these technologies. Therefore I think it is time to act now by adopting binding targets.

          I think we have 5, 10 or 20 years to better understand whether extra CO2 is a problem. We have also built out hundreds of nuclear power plants, and we’ve been funding fusion for 50 years.
          I don’t think solar and wind will ever support a modern world, but more time will allow better wind and solar to be developed. At least for right now there’s no evidence that doing something is critical, and if it were, we should build another 500-1000 nuclear power plants.
          I think when we see the environmentalists protesting in favor of building nuclear, then they are truly worried about CO2; until then they are protesting modern society, and I’d love to see them go back to human-labor farming.

      • JKA:

        On the other hand, if we adopt a binding target to reduce the emissions by some quantity in let’s say 2030, there will be incentives to develop these technologies. Therefore I think it is time to act now by adopting binding targets.

        Incentives need not be coupled to “binding targets”. In fact, technologies that improve existing systems are likely to pay for themselves (as many already do).
        Binding targets are little more than a political fetish at this point.

    • Think that’s the best words you have ever spoken Willis. Good proper words that needed to be said to Mosher.
      Just read recently that Jacques Cousteau estimates that killing 325 million a year is needed. Can you believe that? These people make me shudder. Right… to preserve for our children as they have no problem even thinking of killing countless millions without even a blink? Insane, evil, all of them.

      • That’s around 20 times the killing rate of WWII – which is generally considered to have been a bad thing.

      • Wayne,
        What do you mean by “all of them”? Do you think all non-sceptics are equal?
        I agree that the oceanographer Jacques Cousteau had some very silly ideas about population control.
        To do him some justice, the full quote, as given in an interview with “UN Unesco” in 1991, was:

        . . . Should we eliminate suffering, diseases? The idea is beautiful, but perhaps not a benefit for the long term. We should not allow our dread of diseases to endanger the future of our species. This is a terrible thing to say. In order to stabilize world population, we must eliminate 350,000 people per day. It is a horrible thing to say, but it is just as bad not to say it.

        It seems like he thought that the only way to stop population growth is to increase mortality. Fortunately this is wrong.
        The only viable way to stop population growth is to reduce fertility, and that comes as a natural result of lifting the world’s most backward nations up to a level with more education – especially for females, less mortality and better health. Not the opposite.

      • Jacques Cousteau wants that 325 million to be all of a certain sort, no doubt. Eugenics had the same presumptuous arrogance attached to it and we witnessed the murderous end-game predicted by those vehemently opposed to that vile creed.

    • I think Ayn Rand’s best book was her first , We the Living . One of the most salient commonalities between traditional economic Marxism and the eKo-fascism we face is the sacrifice of the living for a supposed utopian future .
      But the Marxists at least claimed their centrally enforced privation was for greater future productivity and quality of life . The eKo-fascists offer no such vision . Theirs is anti-life from the molecule they demonize to the number remaining alive .

  48. Jan,
    It goes without saying that the people who spend the largest percentage of their income on energy are those who have the lowest incomes. So if you impose a system of mandatory indulgences in order to artificially increase the price of otherwise cheap energy, the most impact will be felt by the poor.
    China lifted millions upon millions of its people out of poverty at a historically unprecedented rate over the last 2 decades, due mostly to actions designed to increase CO2 emissions.
    Where do you suppose they would be today if they had adopted the opposite policy?

    • Khwarizmi,
      the development in China over the last 2-3 decades is highly welcome.
      Admittedly, they have increased the local pollution levels and the CO2 emissions, but the good they have achieved by lifting so many people out of poverty is immensely more important. All these new more well off people now contribute to lifting the economy in the rest of the world.
      I think the increased pollution and CO2 is a temporary situation. As the Chinese get richer they will not put up with breathing unhealthy smog, and the CO2 emissions there are already starting to level out.
      The best we can hope for is that India and other poor nations can achieve a similar development. That will also for a time period cause more pollution and more CO2 until they eventually are wealthy enough to give priority to the environment.
      But I think the richest countries in the world, like the US, most parts of Europe, Japan and Australia can afford to both curb pollutions and CO2 emissions now and that it should be implemented in a way that spare the poor.

      • Jan – 13/7 at 1036
        “But I think the richest countries in the world, like the US, most parts of Europe, Japan and Australia can afford to both curb pollutions and CO2 emissions now and that it should be implemented in a way that spare the poor.”
        You are mixing things up that should not be mixed. In the richer countries there has been a massive reduction in pollution – remember the projections of New York being knee deep in horse manure if the population kept on growing in Manhattan. But streetcars were invented and then motor cars – Lo and Behold, there is no horse manure. Remember London and the abolition of “smog”.
        But CO2 is not a pollutant, and there is no reason to curb its production. By all means try to produce it more efficiently – it is a good fertilizer, and it is always worth while to reduce the resource costs of creating what is perhaps the “Universal” fertilizer.
        CO2 emissions and “pollutions” should not normally be linked by “and”.

      • Dudley
        Many harmless substances are considered as pollutants when they are found out of place or in excess quantities.
        Horse manure is one of them; that is also a good fertilizer, still it is considered as a pollutant if you have too much of it.
        I think that also CO2 can be rightfully considered as a pollutant.
        It comes as a byproduct of the same processes that produce ordinary pollutants like CO, SO2, NOx and particulates, and it is deadly in very high concentrations.
        How harmful or harmless it is in lower concentrations is a big question. It is no less than what this blog and many others are about, so I don’t think we can finish a discussion about that here, but I do not think that we can conclude with certainty that it is entirely harmless.

  49. The problem with science is that there is science that proves things (the good science) and science that does not prove things (the bad science). Climate change is all about politics relying on science that does not prove things.

    • Proof?
      That is for mathematics and liquor.
      The best we can aim for in empirical science is falsification.
      As Karl Popper has said: “A theory in the empirical sciences can never be proven, but it can be falsified, meaning that it can and should be scrutinized by decisive experiments.”

      • First , it is clear that no amount of falsifying evidence can penetrate the warmist skull .
        But more to the point , the precise computations upon which our world runs , including the predictions of when dawn will occur tomorrow at any location on the globe are believed with existential certainty because they have survived centuries of potential falsification . They have been winnowed ; they have been “scrutinized by decisive experiments” and have survived .

      • Bob
        If you take a class in the philosophy of science you will hear many students protest with arguments similar to yours when they hear this for the first time. The professors have a hard time convincing the students that Popper, Hempel and other gurus of the philosophy of science were not out of their minds.
        However, given time for reflection, most of them realize that the philosophers of science are right after all. In a strict logical sense, you can never find the evidence that gives 100% absolute proof. There will always be a chance that you overlooked something and that new evidence will turn up.
        Repetition increases the confidence in the result, but it is not proof.
        If 1000 independent researchers have confirmed the result, you have high confidence, but in a strict sense it is not a proof. Nor will it be when the 2000th or 3000th researcher also confirms it.

        • Among the handful of professors from whom I learned the most , I feel Don Campbell edges out the others . He gave me my first job doing APL ( calculating statistics of discontinuities in time series ) and also funded the writing up of what would have been my PhD thesis after I had lost my tenure in grad school . It was largely thru him that I learned of his friend Popper and their similar thoughts on what Campbell coined evolutionary epistemology .
          My overall response to your pedantic point is ” so what ? ” .
          Even Popper could not argue that Newton’s equations were falsified by Einstein , simply that they were shown to be just a limiting case of a more general insight . And both Popper and Campbell would agree that Newton’s quantitative ( mathematical ) derivations of orbital motions from strikingly simple fundamental relationships are profound and precise over an astoundingly large domain and have survived centuries of potential falsification over that domain , yet have not been .
          And if you want to bet me that orbital mechanics will be falsified by the sun failing to come up tomorrow , I’ll be happy to give you very very long odds .
          Michael Mann objects to math because it falsifies his claim to infame .

  50. Nuts.
    First “results published in psychology journals ” means that this isn’t about science, it’s about world shaking theories derived from comparisons between the two halves of a sample of three.
    Second, the penultimate author here is railing against the misuse of p-values. Duh. I believe the world champion exponent on this is Dr. Briggs (see http://wmbriggs.com/ ). Any of his articles on the subject will make the same point, but do so without significant (!) reliance on social “science” examples.

  51. Oh well, Gigerenzer.
    Made a lot of money on Heuristics – what a lucky man he was.
    Now the small nation of 10 million Greeks has wrecked the world economy, according to 80 million Germans, incl. Gigerenzer.
    Heuristics – find scapegoats.
    SPIEGEL: So how do you test whether a heuristic is actually any good?
    Gigerenzer: Here is another example. One of the simplest rules of thumb is the so-called recognition heuristic. It rests on the principle that people tend to trust the names they know. We tested this in the stock market. We took passers-by in Munich and in Chicago and had them put together stock portfolios based solely on which company names they had already heard of. And lo and behold: on average, the recognition portfolio, based on nothing but name recognition by semi-ignorant people, made more money than professionally managed funds. Never in my life have I […] so much money
    Objective, neutral, science? Please correct me where I’m wrong. Hans

  52. “Damaging the lives of billions between now and 2100, killing millions each year for 85 years.
    For nothing. To prevent no harm to no people, but to promulgate harm on the all.
    Just so you (those who support the CAGW theories) can “feel good” about your religion of Gaia and death.”
    Oh come on, that’s just the old “Believe in God because if you’re wrong you’ll go to hell” crap recycled. Of course, since this is a religion we’re discussing rather than science, maybe that’s appropriate, but it is still ridiculous.
    Using that logic all the AGW-ers should go with the Ice Age scenario because if they’re wrong the results would be horrendous.
    How do you choose between the alternatives if we give up the scientific approach?

  53. End of the world doomsters (biologists like Ehrlich, economists like Malthus, climatologists saving the planet, etc) have always been wrong. This is because they leave out the enormous and unfailing role of human ingenuity at problem solving from their simplistic, linear and two dimensional view of the world (someone said ‘petri dish’ world). Biologists study the habits and ecology of animals and plants, count and analyze droppings, etc. but such study, although important and useful gives them no expertise or insight whatsoever into what the future will bring. Economists, like meteorologists have some short term success in forecasting and the world that has unfolded constantly takes them by surprise. Consensus climatologists are the worst because they have espoused a theory that anchors them and spent most of their time fending off challenges to it so that we basically have nothing new in 35 years of intensive study with a budget thousands of times larger than that of the Manhattan project. All the world’s most pressing problems would have been solved if this money had gone toward such tasks.
    We didn’t get buried in Malthus’s horse manure – hey, the poor horse suddenly all but disappeared with the discovery of petroleum and IC engines. Mass starvation, paucity of resources didn’t end civilization as we know it. Human ingenuity has all but wiped out famine, disease and delivered raw materials in abundance. We didn’t freeze to death in the dark by 2000 and we won’t burn up in 2100.
    This and countless other doom scenarios NEVER came to pass. I don’t think it bold to say it is AXIOMATIC THAT EXTREME PREDICTIONS OF HUMAN CAUSED DOOM CANNOT COME TRUE because overpowering dynamic ingenuity is absent as a force in their thinking. Unconstrained by this first order principal component, their thoughts (and heartfelt concerns) soar through the roof of reality.
    Let me add two more items to the ‘ten things’ to save science from its statistical self.
    1) pass the forecast through the filter of the Axiom above. The forecast should not be one of doom. Nature is the keeper of doom scenarios and human ingenuity will even be able to deal with some of these.
    2) explicitly invoke the Le Chatelier principle first. Le Châtelier’s principle states that if a dynamic equilibrium is disturbed by changing the conditions, the position of equilibrium shifts to counteract the change (in part). Wiki gives this definition: “Any change in status quo prompts an opposing reaction in the responding system.”
    This is pure “governor”-like effect a la Willis Eschenbach. It is even a predictor of Newton’s Laws of Motion, market behavior, and a broad range of things which include initiation of human ingenuity. To use it, take the Doom prediction, cut it in half because of human nature’s propensity for exaggeration and emotional inertia when they want to make a point. Then cut it at least in half again to account for the omitted le Chatelier effect. Finally, if anything sticks up that needs attention, human ingenuity will grind it down to a small bump.

  54. Cook et al.’s dodgy stats are certainly what drove me to “It appears that more than 97% of climate scientists use stats incorrectly.”
