The “ensemble” of models is completely meaningless, statistically

This comment comes from rgbatduke, who is Robert G. Brown of the Duke University Physics Department, on the No significant warming for 17 years 4 months thread. It has gained quite a bit of attention because it speaks clearly to truth. So that all readers can benefit, I’m elevating it to a full post.

rgbatduke says:

June 13, 2013 at 7:20 am

Saying that we need to wait for a certain interval in order to conclude that “the models are wrong” is dangerous and incorrect for two reasons. First — and this is a point that is stunningly ignored — there are a lot of different models out there, all supposedly built on top of physics, and yet no two of them give anywhere near the same results!

This is reflected in the graphs Monckton publishes above, where the AR5 trend line is the average over all of these models and, in spite of the number of contributors, the variance of the models is huge. It is also clearly evident if one publishes a “spaghetti graph” of the individual model projections (as Roy Spencer recently did in another thread) — it looks like the frayed end of a rope, not like a coherent spread around some physics-supported result.

Note the implicit swindle in this graph — by forming a mean and standard deviation over model projections and then using the mean as a “most likely” projection and the variance as representative of the range of the error, one is treating the differences between the models as if they are uncorrelated random variates causing deviation around a true mean!

Say what?

This is such a horrendous abuse of statistics that it is difficult to know how to begin to address it. One simply wishes to bitch-slap whoever it was that assembled the graph and ensure that they never work or publish in the field of science or statistics ever again. One cannot generate an ensemble of independent and identically distributed models that have different code. One might, possibly, generate a single model that generates an ensemble of predictions by using uniform deviates (random numbers) to seed “noise” (representing uncertainty) in the inputs.

What I’m trying to say is that the variance and mean of the “ensemble” of models is completely meaningless, statistically, because the inputs do not possess the most basic properties required for a meaningful interpretation. They are not independent, their differences are not based on a random distribution of errors, and there is no reason whatsoever to believe that the errors or differences are unbiased (given that the only way humans can generate unbiased anything is through the use of e.g. dice or other objectively random instruments).
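To see how hollow that sort of “error bar” is, consider a minimal sketch with entirely made-up numbers (nothing below comes from any actual model output): give a handful of toy “models” a shared systematic bias plus their own structural quirks, then compute the ensemble mean and standard deviation. The spread measures how much the models disagree with one another; it says nothing about how far the whole ensemble sits from the truth.

```python
# Toy sketch only: made-up numbers, not real model output. The point is that
# an "ensemble" mean +/- std over structurally different, commonly biased
# models is a measure of mutual disagreement, not of error about the truth.
import numpy as np

true_value = 0.10                                  # hypothetical "truth"
shared_bias = 0.15                                 # common systematic bias
structural_offsets = np.array([0.02, 0.08, -0.03, 0.12, 0.05])

model_outputs = true_value + shared_bias + structural_offsets
ens_mean = model_outputs.mean()
ens_std = model_outputs.std(ddof=1)

print(f"ensemble: {ens_mean:.3f} +/- {ens_std:.3f}")   # looks like an error bar
print(f"truth   : {true_value:.3f}")                   # sits well outside it
```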

So why buy into this nonsense by doing linear fits to a function — global temperature — that has never in its entire history been linear, although of course it has always been approximately smooth so one can always do a Taylor series expansion in some sufficiently small interval and get a linear term that — by the nature of Taylor series fits to nonlinear functions — is guaranteed to fail if extrapolated as higher order nonlinear terms kick in and ultimately dominate? Why even pay lip service to the notion that R^2 or p for a linear fit, or for a Kolmogorov-Smirnov comparison of the real temperature record and the extrapolated model prediction, has some meaning? It has none.

Let me repeat this. It has no meaning! It is indefensible within the theory and practice of statistical analysis. You might as well use a ouija board as the basis of claims about the future climate history as the ensemble average of different computational physical models that do not differ by truly random variations and are subject to all sorts of omitted variable, selected variable, implementation, and initialization bias. The board might give you the right answer, might not, but good luck justifying the answer it gives on some sort of rational basis.
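For what it is worth, the Taylor-series point takes only a few lines to demonstrate; the exponential below is just an arbitrary smooth stand-in, not a claim about temperature:

```python
# Minimal sketch: a linear fit over a short window of a smooth nonlinear
# function looks fine locally, then fails badly once it is extrapolated and
# the higher-order terms of the Taylor expansion take over.
import numpy as np

f = lambda t: np.exp(0.5 * t)                 # arbitrary smooth nonlinear function
t_fit = np.linspace(0.0, 1.0, 50)             # short fitting interval
t_far = np.linspace(0.0, 5.0, 50)             # extrapolation range

slope, intercept = np.polyfit(t_fit, f(t_fit), 1)
linear = lambda t: slope * t + intercept

print("max error inside the fit window:", np.max(np.abs(linear(t_fit) - f(t_fit))))
print("max error when extrapolated    :", np.max(np.abs(linear(t_far) - f(t_far))))
```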

Let’s invert this process and actually apply statistical analysis to the distribution of model results Re: the claim that they all correctly implement well-known physics. For example, if I attempt to do an a priori computation of the quantum structure of, say, a carbon atom, I might begin by solving a single electron model, treating the electron-electron interaction using the probability distribution from the single electron model to generate a spherically symmetric “density” of electrons around the nucleus, and then performing a self-consistent field theory iteration (resolving the single electron model for the new potential) until it converges. (This is known as the Hartree approximation.)
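The loop structure is easy to caricature in code. The sketch below is a one-dimensional toy (a harmonic “atom” with a density-dependent mean-field term), not real Hartree theory, but it shows the iterate-until-self-consistent skeleton: solve the single-particle problem, rebuild the potential from the resulting density, and repeat.

```python
# One-dimensional toy self-consistent field loop (illustrative only, not a
# real Hartree calculation): lowest eigenstate -> density -> new potential,
# repeated until the density stops changing.
import numpy as np

n, box = 200, 10.0
x = np.linspace(-box / 2, box / 2, n)
dx = x[1] - x[0]

# Kinetic energy: -(1/2) d^2/dx^2 via a finite-difference second derivative.
T = (-np.eye(n, k=1) + 2 * np.eye(n) - np.eye(n, k=-1)) / (2 * dx**2)
V_ext = 0.5 * x**2            # external confining potential (toy "nucleus")
coupling = 1.0                # strength of the density-dependent mean field

density = np.ones(n) / (n * dx)            # initial guess, integrates to 1
for iteration in range(200):
    H = T + np.diag(V_ext + coupling * density)
    _, eigvecs = np.linalg.eigh(H)
    psi = eigvecs[:, 0]                    # lowest single-particle state
    new_density = psi**2 / dx              # so that sum(new_density) * dx == 1
    change = np.max(np.abs(new_density - density))
    density = 0.5 * density + 0.5 * new_density    # simple mixing for stability
    if change < 1e-6:
        break

print(f"stopped after {iteration + 1} iterations, last density change {change:.2e}")
```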

Somebody else could say “Wait, this ignores the Pauli exclusion principle and the requirement that the electron wavefunction be fully antisymmetric.” One could then make the (still single electron) model more complicated and construct a Slater determinant to use as a fully antisymmetric representation of the electron wavefunctions, generate the density, and perform the self-consistent field computation to convergence. (This is Hartree-Fock.)

A third party could then note that this still underestimates what is called the “correlation energy” of the system, because treating the electron cloud as a continuous distribution through which the electrons move ignores the fact that individual electrons strongly repel and hence do not like to get near one another. Both of the former approaches underestimate the size of the electron hole, and hence they make the atom “too small” and “too tightly bound”. A variety of schemes have been proposed to overcome this problem — using a semi-empirical local density functional being probably the most successful.

A fourth party might then observe that the Universe is really relativistic, and that by ignoring relativity theory and doing a classical computation we introduce an error into all of the above (although it might be included in the semi-empirical LDF approach heuristically).

In the end, one might well have an “ensemble” of models, all of which are based on physics. In fact, the differences are also based on physics — the physics omitted from one try to another, or the means used to approximate and try to include physics we cannot include in a first-principles computation (note how I sneaked a semi-empirical note in with the LDF; although one can derive some density functionals from first principles (e.g. the Thomas-Fermi approximation), they usually don’t do particularly well because they aren’t valid across the full range of densities observed in actual atoms). Note well, doing the precise computation is not an option. We cannot solve the many body atomic state problem in quantum theory exactly any more than we can solve the many body problem exactly in classical theory or the set of open, nonlinear, coupled, damped, driven chaotic Navier-Stokes equations in a non-inertial reference frame that represent the climate system.

Note well that solving for the exact, fully correlated nonlinear many electron wavefunction of the humble carbon atom — or the far more complex Uranium atom — is trivially simple (in computational terms) compared to the climate problem. We can’t compute either one, but we can come a damn sight closer to consistently approximating the solution to the former compared to the latter.

So, should we take the mean of the ensemble of “physics based” models for the quantum electronic structure of atomic carbon and treat it as the best prediction of carbon’s quantum structure? Only if we are very stupid or insane or want to sell something. If you read what I said carefully (and you may not have — eyes tend to glaze over when one reviews a year or so of graduate quantum theory applied to electronic structure in a few paragraphs, even though I left out perturbation theory, Feynman diagrams, and ever so much more :-) you will note that I cheated — I snuck in a semi-empirical method.

Which of these is going to be the winner? LDF, of course. Why? Because the parameters are adjusted to give the best fit to the actual empirical spectrum of Carbon. All of the others are going to underestimate the correlation hole, and their errors will be systematically deviant from the correct spectrum. Their mean will be systematically deviant, and by weighting Hartree (the dumbest reasonable “physics based approach”) the same as LDF in the “ensemble” average, you guarantee that the error in this “mean” will be significant.

Suppose one did not know (as, at one time, we did not know) which of the models gave the best result. Suppose that nobody had actually measured the spectrum of Carbon, so its empirical quantum structure was unknown. Would the ensemble mean be reasonable then? Of course not. I presented the models in the way physics itself predicts improvement — adding back details that ought to be important that are omitted in Hartree. One cannot be certain that adding back these details will actually improve things, by the way, because it is always possible that the corrections are not monotonic (and eventually, at higher orders in perturbation theory, they most certainly are not!). Still, nobody would pretend that the average of a theory with an improved theory is “likely” to be better than the improved theory itself, because that would make no sense. Nor would anyone claim that diagrammatic perturbation theory results (for which there is a clear a priori derived justification) are necessarily going to beat semi-heuristic methods like LDF, because in fact they often do not.

What one would do in the real world is measure the spectrum of Carbon, compare it to the predictions of the models, and then hand out the ribbons to the winners! Not the other way around. And since none of the winners is going to be exact — indeed, for decades and decades of work, none of the winners was even particularly close to observed/measured spectra in spite of using supercomputers (admittedly, supercomputers that were slower than your cell phone is today) to do the computations — one would then return to the drawing board and code entry console to try to do better.

Can we apply this sort of thoughtful reasoning to the spaghetti snarl of GCMs and their highly divergent results? You bet we can! First of all, we could stop pretending that “ensemble” mean and variance have any meaning whatsoever by not computing them. Why compute a number that has no meaning? Second, we could take the actual climate record from some “epoch starting point” — one that does not matter in the long run, and we’ll have to continue the comparison for the long run because in any short run from any starting point noise of a variety of sorts will obscure systematic errors — and we can just compare reality to the models. We can then sort out the models by putting (say) all but the top five or so into a “failed” bin and stop including them in any sort of analysis or policy decisioning whatsoever unless or until they start to actually agree with reality.
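A toy sketch of that sorting step might look like the following; the model names, trends, and “observed” series are invented purely for illustration and are not drawn from any CMIP archive:

```python
# Toy sketch of "compare reality to the models and bin the losers". The model
# names, trends and the "observed" series are all invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1990, 2014)
observed = 0.010 * (years - 1990) + 0.05 * rng.standard_normal(years.size)

# Hypothetical "GCM runs": each warms at its own rate (deg C / yr) plus noise.
model_trends = {"model_A": 0.008, "model_B": 0.010, "model_C": 0.014,
                "model_D": 0.018, "model_E": 0.022, "model_F": 0.027,
                "model_G": 0.033, "model_H": 0.040}
runs = {name: r * (years - 1990) + 0.05 * rng.standard_normal(years.size)
        for name, r in model_trends.items()}

def rmse(a, b):
    """Root-mean-square difference between two series."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

ranked = sorted(runs, key=lambda name: rmse(runs[name], observed))
keep, failed = ranked[:5], ranked[5:]
print("keep for further work:", keep)
print("failed bin           :", failed)
```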

Then real scientists might contemplate sitting down with those five winners and meditate upon what makes them winners — what makes them come out the closest to reality — and see if they could figure out ways of making them work even better. For example, if they are egregiously high and diverging from the empirical data, one might consider adding previously omitted physics, semi-empirical or heuristic corrections, or adjusting input parameters to improve the fit.

Then comes the hard part. Waiting. The climate is not as simple as a Carbon atom. The latter’s spectrum never changes, it is a fixed target. The former is never the same. Either one’s dynamical model is never the same and mirrors the variation of reality or one has to conclude that the problem is unsolved and the implementation of the physics is wrong, however “well-known” that physics is. So one has to wait and see if one’s model, adjusted and improved to better fit the past up to the present, actually has any predictive value.

Worst of all, one cannot easily use statistics to determine when or if one’s predictions are failing, because damn, climate is nonlinear, non-Markovian, chaotic, and is apparently influenced in nontrivial ways by a world-sized bucket of competing, occasionally cancelling, poorly understood factors. Soot. Aerosols. GHGs. Clouds. Ice. Decadal oscillations. Defects spun off from the chaotic process that cause global, persistent changes in atmospheric circulation on a local basis (e.g. blocking highs that sit out on the Atlantic for half a year) that have a huge impact on annual or monthly temperatures and rainfall and so on. Orbital factors. Solar factors. Changes in the composition of the troposphere, the stratosphere, the thermosphere. Volcanoes. Land use changes. Algae blooms.

And somewhere, that damn butterfly. Somebody needs to squash the damn thing, because trying to ensemble average a small sample from a chaotic system is so stupid that I cannot begin to describe it. Everything works just fine as long as you average over an interval short enough that you are bound to a given attractor, oscillating away, things look predictable and then — damn, you change attractors. Everything changes! All the precious parameters you empirically tuned to balance out this and that for the old attractor suddenly require new values to work.
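The standard toy example makes the point. Integrate the Lorenz ’63 system (a famous chaotic toy, not a climate model) from two starting points that differ by a butterfly-sized nudge, and the trajectories end up in completely different parts of the attractor, so their “ensemble mean” is not a state the system ever visits:

```python
# Sketch with the Lorenz '63 toy system (not a climate model): two starting
# points differing by 1e-6 diverge completely, so averaging the small
# "ensemble" produces a point that lies on neither trajectory.
import numpy as np

def lorenz_step(state, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz '63 equations."""
    x, y, z = state
    dxdt = sigma * (y - x)
    dydt = x * (rho - z) - y
    dzdt = x * y - beta * z
    return state + dt * np.array([dxdt, dydt, dzdt])

a = np.array([1.0, 1.0, 20.0])
b = a + np.array([1e-6, 0.0, 0.0])        # the butterfly-sized perturbation
for _ in range(10000):                    # integrate ~50 time units
    a, b = lorenz_step(a), lorenz_step(b)

print("trajectory a       :", a)
print("trajectory b       :", b)
print("their ensemble mean:", (a + b) / 2)
```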

This is why it is actually wrong-headed to acquiesce in the notion that any sort of p-value or R^2 derived from an AR5 mean has any meaning. It gives up the high ground (even though one is using it for a good purpose, trying to argue that this “ensemble” fails elementary statistical tests). But statistical testing is a shaky enough theory as it is, open to data dredging and horrendous error alike, and that’s when it really is governed by underlying IID processes (see “Green Jelly Beans Cause Acne”). One cannot naively apply a criterion like rejection if p < 0.05, and all that means under the best of circumstances is that the current observations are improbable given the null hypothesis at 19 to 1. People win and lose bets at this level all the time. One time in 20, in fact. We make a lot of bets!
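The 19-to-1 arithmetic is easy to make concrete. In the sketch below (pure noise, with scipy assumed available), twenty t-tests are run on data for which the null hypothesis is true by construction; on average one of them still comes back “significant” at p < 0.05:

```python
# Sketch of the "one time in 20" point: run twenty significance tests on pure
# noise (the null hypothesis is true by construction) and count how many come
# back "significant" at p < 0.05 anyway. Assumes scipy is available.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_tests, rejections = 20, 0
for _ in range(n_tests):
    sample = rng.standard_normal(100)              # noise with true mean 0
    _, p_value = stats.ttest_1samp(sample, popmean=0.0)
    if p_value < 0.05:
        rejections += 1

print(f"{rejections} of {n_tests} null tests rejected at p < 0.05")
```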

So I would recommend — modestly — that skeptics try very hard not to buy into this and redirect all such discussions to questions such as why the models are in such terrible disagreement with each other, even when applied to identical toy problems that are far simpler than the actual Earth, and why we aren’t using empirical evidence (as it accumulates) to reject failing models and concentrate on the ones that come closest to working, while also not using the models that are obviously not working in any sort of “average” claim for future warming. Maybe they could hire themselves a Bayesian or two and get them to recompute the AR curves, I dunno.

It would take me, in my comparative ignorance, around five minutes to throw out all but the best 10% of the GCMs (which are still diverging from the empirical data, but arguably are well within the expected fluctuation range on the DATA side), sort the remainder into top-half models that should probably be kept around and possibly improved, and bottom half models whose continued use I would defund as a waste of time. That wouldn’t make them actually disappear, of course, only mothball them. If the future climate ever magically popped back up to agree with them, it is a matter of a few seconds to retrieve them from the archives and put them back into use.

Of course if one does this, the GCM predicted climate sensitivity plunges from the totally statistically fraudulent 2.5 C/century to a far more plausible and still possibly wrong ~1 C/century, which — surprise — more or less continues the post-LIA warming trend with a small possible anthropogenic contribution. This large a change would bring out pitchforks and torches as people realize just how badly they’ve been used by a small group of scientists and politicians, how much they are the victims of indefensible abuse of statistics to average in the terrible with the merely poor as if they are all equally likely to be true with randomly distributed differences.

rgb

Alan D McIntire
June 19, 2013 5:25 am

Robert G. Brown uses a quantum mechanics analogy to make his point. The vast majority of us have no knowledge of quantum mechanics nor do we have any way to make meaningful measurements in the field. In contrast, we have all spent a lifetime experiencing climate, so we all have at least a rudimentary knowledge of climate.
Believers in catastrophic global warming have often used the “doctor” analogy to argue their case. They state something like, “The consensus of climatologists says there is serious global warming. Who would you trust, a doctor (consensus) or a quack (non-consensus)?” A better analogy (given the average person’s familiarity with climate and motor vehicles as opposed to quantum mechanics and medicine) might be,
“Who would you trust, a friend or a used car salesman?”

commieBob
June 19, 2013 5:36 am

Tsk Tsk says:
June 18, 2013 at 7:01 pm
Brown raises a potentially valid point about the statistical analysis of the ensemble, but his carbon atom comparison risks venturing into strawman territory.

“To ‘attack a straw man’ is to create the illusion of having refuted a proposition by replacing it with a superficially similar yet unequivalent proposition (the ‘straw man’), and to refute it, without ever having actually refuted the original position.” http://en.wikipedia.org/wiki/Straw_man
Modelling a carbon atom is very simple compared with modelling the global climate. rgbatduke enumerates some of the problems involved with using the ‘well-known physics’ to model the simpler case and points out that the chance of success is much smaller in the more complicated case. Unless I am badly misunderstanding something, this is hardly a strawman argument.

rgbatduke – “Let’s invert this process and actually apply statistical analysis to the distribution of model results Re: the claim that they all correctly implement well-known physics.”

Yes, they do indeed claim that they correctly implement well-known physics. They’re wrong. All the models are wrong because they are all based on a faulty understanding of the well-known physics.

John Archer
June 19, 2013 5:39 am

I was going to post this on the earlier thread No significant warming for 17 years and 4 months by Lord Monckton on June 13th but maybe here is better now.
FAN FEEDBACK:
Prof. R G Brown’s contributions here are the dog’s bollocks. This is highly informed rational thinking at its best. It has given me HUGE pleasure reading them. I second Lord Monckton’s appeal to have them given greater prominence and in general for them to be more widely circulated.
The experience has been absolutely THRILLING! I really don’t know why I should associate the two, but it gave me something akin to the intense pleasure I got from watching the exquisite artistry of Cassius Clay when he was on top form!
WAIT! It’s them knockout combos! YES, that’s it!
Oh, and that lovely footwork, too! 🙂
Thank you very much indeed, Professor. Sock it to ’em!

Jimbo
June 19, 2013 5:41 am

Nick Stokes has put up a brave but foolhardy defence of failure. However you want to look at the models, they have FAILED. Garbage went in, garbage came out. Policies around the world are being formulated on the back of failed garbage. The IPCC is like the Met Office UK: they have a high success rate in temperature projection / prediction failure. Just looking at the AR5 graph is quite frankly embarrassing, even for me.

MattN
June 19, 2013 5:43 am

RGB, how often do you try to talk sense into your colleague Bill Chameides at http://www.thegreengrok.com? Or have you just given up by now?

Scott
June 19, 2013 5:52 am

Here’s an analogy to climate models. Every year multitudes of NFL fantasy football websites tweak their “models”, incorporating “hindcasts” of last year’s NFL player performances to “predict” with “high confidence” the players’ 2013 performance. For good measure, most throw in a few “extreme” predictions to make people think they are really smart and try to entice them to buy into their predictions. There is even a site that “averages” this “ensemble” of predictions so a fantasy football drafter can distill it all down into one prediction and draft with the best information available. Such a drafter hardly ever wins. Why? Because all these smart predictors had an eye on each other, trying to make sure their predictions weren’t too outlandish, because if they were 1) no one would believe them this year because all the other experts are saying something different, and 2) from a business perspective, if they ended up dead last in their predictions they might be out of business next year. They don’t want to be too far from the average, so they aren’t. It ends up being like a crowd of drunks with arms on each other’s shoulders, staggering and weaving but in the end all supporting each other right up to the start of the season. Then the predictions immediately start falling apart but really don’t matter much anymore, because the purpose of all these high confidence predictions is to sell subscriptions to the website, not necessarily to be right. In fact being the best of the worst is sometimes the definition of perfection in the game of prediction.

george h.
June 19, 2013 5:56 am

RGB, my ensemble of models predict an IRS audit in your future.

June 19, 2013 5:56 am

rgb sez:
“””We can then sort out the models by putting (say) all but the top five or so into a “failed” bin and stop including them in any sort of analysis or policy decisioning whatsoever unless or until they start to actually agree with reality.””””
Please note that he sez “the models”, not just the models selected by the IPCC to illustrate their self-serving concept of the future climate, but all the models.

McComber Boy
June 19, 2013 5:57 am

The more I read Nick Stokes’ drivel and harping on about – neener, neener, neener, he did it first – the less I expect any chance of real discourse from that corner. I’m reminded so much of the old Spinal Tap, in character, interview about their amplifiers. When sanity is introduced, the answer is always, “Ours go to eleven”.

Poor Nick! He just says, over and over, that all of the knobs on all the climate amplifier models are already set on 11! But of course none of them are actually plugged in.
pbh

jeanparisot
June 19, 2013 5:58 am

My CEU credits for the month are taken care of. Thank you.

Mike M
June 19, 2013 6:01 am

One thing I’m certain of is that any model that may have inadvertently predicted cooling would have immediately been erased by someone whose continued income depends on predicting warming. I guess I’m pointing out my certainty of the possibility of a larger uncertainty.

jeanparisot
June 19, 2013 6:05 am

Shouldn’t these models be judged on multiple criteria to avoid introducing current measurement bias (as opposed to the existing bias in the inputs), so: sea levels, precipitation, the distribution of warming, atmospheric water vapor, etc.?

Mike M
June 19, 2013 6:10 am

Scott says: “It ends up being like a crowd of drunks with arms on each others shoulders,”
Which is a lot like the stock market; we’re all going in this direction because…..

Bill_W
June 19, 2013 6:32 am

Nick Stokes,
I assume if it says multi-MODEL mean that this is exactly what it says. If it was multiple runs from the same model, they would call it something else, I would hope. Like a multi-run mean or CMP76-31 output mean. (fyi- I just made that climate model up).

June 19, 2013 6:45 am

The current issue of The Economist magazine from London includes “Tilting at Windmills”, a lengthy, revealing article about Germany’s infatuation with renewable energy. Will it likely result in a consumer revolution over skyrocketing electric power costs and an eventual economic train wreck? Seems like a high price to pay for enduring the last 17 years of no global warming.
But the Europeans have been duly brainwashed to continue fighting the war on carbon. Here’s a link to this informative article:
http://www.economist.com/news/special-report/21579149-germanys-energiewende-bodes-ill-countrys-european-leadership-tilting-windmills

Jimbo
June 19, 2013 6:46 am

Monckton of Brenchley, in reply to the arm-waving Nick Stokes on this thread, says it starkly.

…It does not matter whether one takes the upper bound or lower bound of the models’ temperature projections or anywhere in between: the models are predicting that global warming should by now be occurring at a rate that is not evident in observed reality….

Nick Stokes is attempting to bring up all kinds of defences for the climate models but does not want to deal with the elephant in the room. You could call it a dead parrot.

Monty Python
…..’E’s not pinin’! ‘E’s passed on! This parrot is no more! He has ceased to be! ‘E’s expired and gone to meet ‘is maker! ‘E’s a stiff! Bereft of life, ‘e rests in peace! If you hadn’t nailed ‘im to the perch ‘e’d be pushing up the daisies! ‘Is metabolic processes are now ‘istory! ‘E’s off the twig! ‘E’s kicked the bucket, ‘e’s shuffled off ‘is mortal coil, run down the curtain and joined the bleedin’ choir invisible!! THIS IS AN EX-PARROT!!…..

tadchem
June 19, 2013 6:48 am

It is a core principle of the scientific method that demonstrably erroneous hypotheses that lead to inaccurate and unreliable prediction must be discarded. Evidently, ACC ‘modelers’ have discarded the scientific method.

June 19, 2013 6:49 am

jeanparisot says:
June 19, 2013 at 6:05 am

Shouldn’t these models be judged on multiple criteria to avoid introducing current measurement bias (as opposed to the existing bias in the inputs), so: sea levels, precipitation, the distribution of warming, atmospheric water vapor, etc.?

They should be, but they can’t be; they’re horribly wrong. This is why they trot out a global average temperature: regional annual temps just don’t match reality, let alone regional monthly/daily temps, which are even worse.

June 19, 2013 6:51 am

Oh, I should note, even the global annual temps don’t match reality, since that’s the topic of this blog.

Latitude
June 19, 2013 6:55 am

Roy Spencer says:
June 19, 2013 at 4:10 am
We don’t even know whether the 10% of the models closest to the observations are closer by chance (they contain similar butterflies) or because their underlying physical processes are better.
===================
thank you……over and out

Lloyd Martin Hendaye
June 19, 2013 7:01 am

We could say all this in some 150 words, and have for years, yet such common statistical knowledge ever bears repeating. In any rational context, the number of egregious major fallacies infesting GCMs would make laughing-stocks of their proponents. But whoever thought that AGW Catastrophism with its Green Gang of Klimat Kooks was ever about objective fact or valid scientific inference?
Unfortunately, political/economic shyster-ism on this scale has consequences. “Kleiner Mann, was nun?” (“Little man, what now?”) as bastardized Luddite sociopaths trot out the mega-deaths.

johnmarshall
June 19, 2013 7:03 am

The frayed rope graph that Dr. Spencer produced, together with the real time data, just shows how far the models are from reality. It is the fixation on CO2 that has caused this dysfunction. The divergence just shows how far the models are from reality.
Reality is what it is about, not some religious belief that a trace gas drives climate.

eyesonu
June 19, 2013 7:05 am

This is a very good posting as written by Dr. Robert G. Brown. No summarizing needed as Brown sums it up quite well.
It’s going to take a while to read all the comments posted so far but it would seem to be a good idea to include the Spencer spaghetti graph which would be important to any readers that haven’t been following closely the past couple of weeks. Just saying.

T.C.
June 19, 2013 7:06 am

So, the basic equation here is:
Something from nothing = Nothing.

Frank K.
June 19, 2013 7:06 am

First of all, BRAVO. Excellent article by Dr. Brown, and spot on, based on my 20 years working professionally in computational fluid dynamics.
Jimbo says:
June 19, 2013 at 5:11 am
“For Warmists who say that Robert Brown doesn’t know much about computing or models see an excerpt from his about page.”
http://www.phy.duke.edu/~rgb/About/about.php
Jimbo – you should know by now that NONE of the CAGW scientists ever really want to discuss the models! Every time I bring up annoying issues like differential equations, boundary and initial conditions, stability, well-posedness, coupling, source terms, non-linearity, numerical methods etc., they go silent. It’s the strangest phenomenon I’ve ever experienced, considering you would think that they would LOVE to talk about their models.
BTW, one reason you will never see any progress towards perfecting multi-model ensemble climate forecasts is that none of the climate modelers want to say whose models are “good” and whose are “bad”…
