This comment comes from rgbatduke, who is Robert G. Brown of the Duke University Physics Department, on the No significant warming for 17 years 4 months thread. It has gained quite a bit of attention because it speaks clearly to truth. So that all readers can benefit, I’m elevating it to a full post.
Saying that we need to wait for a certain interval in order to conclude that “the models are wrong” is dangerous and incorrect for two reasons. First — and this is a point that is stunningly ignored — there are a lot of different models out there, all supposedly built on top of physics, and yet no two of them give anywhere near the same results!
This is reflected in the graphs Monckton publishes above, where the AR5 trend line is the average over all of these models and in spite of the number of contributors the variance of the models is huge. It is also clearly evident if one publishes a “spaghetti graph” of the individual model projections (as Roy Spencer recently did in another thread) — it looks like the frayed end of a rope, not like a coherent spread around some physics supported result.
Note the implicit swindle in this graph — by forming a mean and standard deviation over model projections and then using the mean as a “most likely” projection and the variance as representative of the range of the error, one is treating the differences between the models as if they are uncorrelated random variates causing deviation around a true mean!
Say what?
This is such a horrendous abuse of statistics that it is difficult to know how to begin to address it. One simply wishes to bitch-slap whoever it was that assembled the graph and ensure that they never work or publish in the field of science or statistics ever again. One cannot generate an ensemble of independent and identically distributed models that have different code. One might, possibly, generate a single model that generates an ensemble of predictions by using uniform deviates (random numbers) to seed “noise” (representing uncertainty) in the inputs.
What I’m trying to say is that the variance and mean of the “ensemble” of models are completely meaningless, statistically, because the inputs do not possess the most basic properties required for a meaningful interpretation. They are not independent, their differences are not based on a random distribution of errors, and there is no reason whatsoever to believe that the errors or differences are unbiased (given that the only way humans can generate unbiased anything is through the use of e.g. dice or other objectively random instruments).
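To make the distinction concrete, here is a minimal sketch in Python with entirely made-up numbers. It contrasts a legitimate Monte Carlo ensemble (one model, randomly perturbed inputs) with a pseudo-ensemble of structurally different models, each carrying its own systematic bias; the toy model, the “true” sensitivity, and the bias values are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Case 1: a legitimate Monte Carlo ensemble -- ONE model, with random (IID)
# perturbations of an uncertain input. Here the spread genuinely represents
# input uncertainty.
def single_model(sensitivity):
    """Toy 'model': projected warming (deg C) over a century."""
    return 1.0 * sensitivity

true_sensitivity = 1.2
runs = single_model(true_sensitivity + rng.normal(0.0, 0.1, size=1000))
print("Monte Carlo ensemble: mean %.2f, std %.2f" % (runs.mean(), runs.std()))

# Case 2: an "ensemble" of structurally different models, each with its own
# systematic (non-random, non-IID) bias. Averaging them treats the biases as
# if they were random errors around a true value, which they are not.
model_biases = np.array([0.3, 0.8, 1.1, 1.5, 2.2])   # hypothetical offsets
different_models = single_model(true_sensitivity) + model_biases
print("Multi-model 'ensemble': mean %.2f, std %.2f"
      % (different_models.mean(), different_models.std()))
# The second mean is biased high, and its 'variance' measures disagreement
# between codes, not uncertainty about the real climate.
```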
So why buy into this nonsense by doing linear fits to a function — global temperature — that has never in its entire history been linear, although of course it has always been approximately smooth so one can always do a Taylor series expansion in some sufficiently small interval and get a linear term that — by the nature of Taylor series fits to nonlinear functions — is guaranteed to fail if extrapolated as higher order nonlinear terms kick in and ultimately dominate? Why even pay lip service to the notion that R² or p for a linear fit, or for a Kolmogorov-Smirnov comparison of the real temperature record and the extrapolated model prediction, has some meaning? It has none.
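As an illustration of the Taylor-series point, here is a short sketch with a purely hypothetical smooth, nonlinear function standing in for the temperature record; a linear fit that looks excellent in-sample fails badly once it is extrapolated and the higher-order terms take over.

```python
import numpy as np

# Fit a straight line to a smooth but nonlinear function over a short interval,
# then extrapolate. The linear (first-order Taylor) term is a fine local
# description and a guaranteed-to-fail global one.
f = lambda t: np.sin(0.3 * t) + 0.02 * t ** 2       # hypothetical smooth "record"

t_fit = np.linspace(0.0, 5.0, 50)                   # short fitting interval
slope, intercept = np.polyfit(t_fit, f(t_fit), 1)   # ordinary least-squares line

t_far = 30.0
print("linear extrapolation at t = 30:", slope * t_far + intercept)
print("actual function at t = 30:     ", f(t_far))
# R^2 over the fitting window can be excellent while the extrapolation is
# wildly wrong -- in-sample goodness of fit says nothing about the trend
# continuing once the nonlinear terms dominate.
```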
Let me repeat this. It has no meaning! It is indefensible within the theory and practice of statistical analysis. You might as well use a ouija board as the basis of claims about the future climate history as the ensemble average of different computational physical models that do not differ by truly random variations and are subject to all sorts of omitted variable, selected variable, implementation, and initialization bias. The board might give you the right answer, might not, but good luck justifying the answer it gives on some sort of rational basis.
Let’s invert this process and actually apply statistical analysis to the distribution of model results Re: the claim that they all correctly implement well-known physics. For example, if I attempt to do an a priori computation of the quantum structure of, say, a carbon atom, I might begin by solving a single electron model, treating the electron-electron interaction using the probability distribution from the single electron model to generate a spherically symmetric “density” of electrons around the nucleus, and then performing a self-consistent field theory iteration (resolving the single electron model for the new potential) until it converges. (This is known as the Hartree approximation.)
Somebody else could say “Wait, this ignores the Pauli exclusion principle” and the requirement that the electron wavefunction be fully antisymmetric. One could then make the (still single electron) model more complicated and construct a Slater determinant to use as a fully antisymmetric representation of the electron wavefunctions, generate the density, and perform the self-consistent field computation to convergence. (This is Hartree-Fock.)
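For readers who have not met the self-consistent field idea, a highly schematic sketch of the iteration loop just described: guess a density, build the mean-field potential it implies, re-solve the single-particle problem in that potential, and repeat until the density stops changing. The two functions below are hypothetical stand-ins for the real quantum mechanics, not a working Hartree or Hartree-Fock code.

```python
# Schematic self-consistent field (SCF) loop. The "physics" here is a stand-in
# fixed-point map, chosen only so the loop converges to something.
def solve_in_potential(potential):
    """Hypothetical stand-in for solving the single-electron problem."""
    return 1.0 / (1.0 + potential)           # returns a toy 'density'

def build_potential(density):
    """Hypothetical stand-in for the electron-electron mean field."""
    return 1.5 * density

density = 0.2                                # initial guess
for iteration in range(200):
    new_density = solve_in_potential(build_potential(density))
    change = abs(new_density - density)
    density = new_density
    if change < 1e-10:                       # self-consistency reached
        break

print("converged toy density: %.6f after %d iterations" % (density, iteration + 1))
```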
A third party could then note that this still underestimates what is called the “correlation energy” of the system, because treating the electron cloud as a continuous distribution through which the electrons move ignores the fact that individual electrons strongly repel and hence do not like to get near one another. Both of the former approaches underestimate the size of the electron hole, and hence they make the atom “too small” and “too tightly bound”. A variety of schema are proposed to overcome this problem — using a semi-empirical local density functional being probably the most successful.
A fourth party might then observe that the Universe is really relativistic, and that by ignoring relativity theory and doing a classical computation we introduce an error into all of the above (although it might be included in the semi-empirical LDF approach heuristically).
In the end, one might well have an “ensemble” of models, all of which are based on physics. In fact, the differences are also based on physics — the physics omitted from one try to another, or the means used to approximate and try to include physics we cannot include in a first-principles computation (note how I sneaked a semi-empirical note in with the LDF; although one can derive some density functionals from first principles (e.g. the Thomas-Fermi approximation), they usually don’t do particularly well because they aren’t valid across the full range of densities observed in actual atoms). Note well, doing the precise computation is not an option. We cannot solve the many body atomic state problem in quantum theory exactly any more than we can solve the many body problem exactly in classical theory or the set of open, nonlinear, coupled, damped, driven chaotic Navier-Stokes equations in a non-inertial reference frame that represent the climate system.
Note well that solving for the exact, fully correlated nonlinear many electron wavefunction of the humble carbon atom — or the far more complex Uranium atom — is trivially simple (in computational terms) compared to the climate problem. We can’t compute either one, but we can come a damn sight closer to consistently approximating the solution to the former compared to the latter.
So, should we take the mean of the ensemble of “physics based” models for the quantum electronic structure of atomic carbon and treat it as the best prediction of carbon’s quantum structure? Only if we are very stupid or insane or want to sell something. If you read what I said carefully (and you may not have — eyes tend to glaze over when one reviews a year or so of graduate quantum theory applied to electronics in a few paragraphs, even though I left out perturbation theory, Feynman diagrams, and ever so much more :-) you will note that I cheated — I slipped in a semi-empirical method.
Which of these is going to be the winner? LDF, of course. Why? Because the parameters are adjusted to give the best fit to the actual empirical spectrum of Carbon. All of the others are going to underestimate the correlation hole, and their errors will be systematically deviant from the correct spectrum. Their mean will be systematically deviant, and by weighting Hartree (the dumbest reasonable “physics based approach”) the same as LDF in the “ensemble” average, you guarantee that the error in this “mean” will be significant.
Suppose one did not know (as, at one time, we did not know) which of the models gave the best result. Suppose that nobody had actually measured the spectrum of Carbon, so its empirical quantum structure was unknown. Would the ensemble mean be reasonable then? Of course not. I presented the models in the way physics itself predicts improvement — adding back details that ought to be important that are omitted in Hartree. One cannot be certain that adding back these details will actually improve things, by the way, because it is always possible that the corrections are not monotonic (and eventually, at higher orders in perturbation theory, they most certainly are not!). Still, nobody would pretend that the average of a theory with an improved theory is “likely” to be better than the improved theory itself, because that would make no sense. Nor would anyone claim that diagrammatic perturbation theory results (for which there is a clear a priori derived justification) are necessarily going to beat semi-heuristic methods like LDF, because in fact they often do not.
What one would do in the real world is measure the spectrum of Carbon, compare it to the predictions of the models, and then hand out the ribbons to the winners! Not the other way around. And since none of the winners is going to be exact — indeed, for decades and decades of work, none of the winners was even particularly close to observed/measured spectra in spite of using supercomputers (admittedly, supercomputers that were slower than your cell phone is today) to do the computations — one would then return to the drawing board and code entry console to try to do better.
Can we apply this sort of thoughtful reasoning to the spaghetti snarl of GCMs and their highly divergent results? You bet we can! First of all, we could stop pretending that “ensemble” mean and variance have any meaning whatsoever by not computing them. Why compute a number that has no meaning? Second, we could take the actual climate record from some “epoch starting point” — one that does not matter in the long run, and we’ll have to continue the comparison for the long run because in any short run from any starting point noise of a variety of sorts will obscure systematic errors — and we can just compare reality to the models. We can then sort out the models by putting (say) all but the top five or so into a “failed” bin and stop including them in any sort of analysis or policy decisioning whatsoever unless or until they start to actually agree with reality.
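A minimal sketch of this sorting procedure, with a hypothetical observed record and hypothetical model trends; each model is scored against reality on its own, and no ensemble mean appears anywhere.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical observed anomaly series (deg C) and a handful of hypothetical
# model trends (deg C / year) from a common epoch starting point.
years = np.arange(1979, 2013)
observed = 0.010 * (years - 1979) + rng.normal(0.0, 0.05, size=years.size)

models = {"model_A": 0.012, "model_B": 0.020, "model_C": 0.030,
          "model_D": 0.009, "model_E": 0.045}

def rmse(trend):
    """Root-mean-square error of a model's trend line against the observations."""
    simulated = trend * (years - 1979)
    return np.sqrt(np.mean((simulated - observed) ** 2))

ranked = sorted(models, key=lambda name: rmse(models[name]))
keep, failed = ranked[:2], ranked[2:]        # keep the closest few, bin the rest
print("kept:  ", keep)
print("failed:", failed)
```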
Then real scientists might contemplate sitting down with those five winners and meditate upon what makes them winners — what makes them come out the closest to reality — and see if they could figure out ways of making them work even better. For example, if they are egregiously high and diverging from the empirical data, one might consider adding previously omitted physics, semi-empirical or heuristic corrections, or adjusting input parameters to improve the fit.
Then comes the hard part. Waiting. The climate is not as simple as a Carbon atom. The latter’s spectrum never changes, it is a fixed target. The former is never the same. Either one’s dynamical model is never the same and mirrors the variation of reality or one has to conclude that the problem is unsolved and the implementation of the physics is wrong, however “well-known” that physics is. So one has to wait and see if one’s model, adjusted and improved to better fit the past up to the present, actually has any predictive value.
Worst of all, one cannot easily use statistics to determine when or if one’s predictions are failing, because damn, climate is nonlinear, non-Markovian, chaotic, and is apparently influenced in nontrivial ways by a world-sized bucket of competing, occasionally cancelling, poorly understood factors. Soot. Aerosols. GHGs. Clouds. Ice. Decadal oscillations. Defects spun off from the chaotic process that cause global, persistent changes in atmospheric circulation on a local basis (e.g. blocking highs that sit out on the Atlantic for half a year) that have a huge impact on annual or monthly temperatures and rainfall and so on. Orbital factors. Solar factors. Changes in the composition of the troposphere, the stratosphere, the thermosphere. Volcanoes. Land use changes. Algae blooms.
And somewhere, that damn butterfly. Somebody needs to squash the damn thing, because trying to ensemble average a small sample from a chaotic system is so stupid that I cannot begin to describe it. Everything works just fine as long as you average over an interval short enough that you are bound to a given attractor, oscillating away, things look predictable and then — damn, you change attractors. Everything changes! All the precious parameters you empirically tuned to balance out this and that for the old attractor suddenly require new values to work.
This is why it is actually wrong-headed to acquiesce in the notion that any sort of p-value or R² derived from an AR5 mean has any meaning. It gives up the high ground (even though one is using it for a good purpose, trying to argue that this “ensemble” fails elementary statistical tests). But statistical testing is a shaky enough theory as it is, open to data dredging and horrendous error alike, and that’s when it really is governed by underlying IID processes (see “Green Jelly Beans Cause Acne”). One cannot naively apply a criterion like rejection if p < 0.05, and all that means under the best of circumstances is that the current observations are improbable given the null hypothesis at 19 to 1. People win and lose bets at this level all the time. One time in 20, in fact. We make a lot of bets!
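The “one time in 20” point is easy to demonstrate: run many pure-noise “studies” under a true null hypothesis and roughly five percent of them clear the p < 0.05 bar anyway. A short sketch, assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Many "studies" of pure noise: the null hypothesis (mean = 0) is true in
# every one, yet about 5% of them come out "significant" at p < 0.05.
n_studies, n_per_study = 1000, 30
false_positives = 0
for _ in range(n_studies):
    sample = rng.normal(0.0, 1.0, size=n_per_study)
    _, p = stats.ttest_1samp(sample, popmean=0.0)
    if p < 0.05:
        false_positives += 1

print("fraction rejected at p < 0.05:", false_positives / n_studies)   # ~0.05
```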
So I would recommend — modestly — that skeptics try very hard not to buy into this and redirect all such discussions to questions such as why the models are in such terrible disagreement with each other, even when applied to identical toy problems that are far simpler than the actual Earth, and why we aren’t using empirical evidence (as it accumulates) to reject failing models and concentrate on the ones that come closest to working, while also not using the models that are obviously not working in any sort of “average” claim for future warming. Maybe they could hire themselves a Bayesian or two and get them to recompute the AR curves, I dunno.
It would take me, in my comparative ignorance, around five minutes to throw out all but the best 10% of the GCMs (which are still diverging from the empirical data, but arguably are well within the expected fluctuation range on the DATA side), sort the remainder into top-half models that should probably be kept around and possibly improved, and bottom half models whose continued use I would defund as a waste of time. That wouldn’t make them actually disappear, of course, only mothball them. If the future climate ever magically popped back up to agree with them, it is a matter of a few seconds to retrieve them from the archives and put them back into use.
Of course if one does this, the GCM predicted climate sensitivity plunges from the totally statistically fraudulent 2.5 C/century to a far more plausible and still possibly wrong ~1 C/century, which — surprise — more or less continues the post-LIA warming trend with a small possible anthropogenic contribution. This large a change would bring out pitchforks and torches as people realize just how badly they’ve been used by a small group of scientists and politicians, how much they are the victims of indefensible abuse of statistics to average in the terrible with the merely poor as if they are all equally likely to be true with randomly distributed differences.
rgb
O/T, but this is REALLY IMPORTANT NEWS. See http://www.dailymail.co.uk/news/article-2343966/Germany-threatens-hit-Mercedes-BMW-production-Britain-France-Italy-carbon-emission-row.html
It would appear that Germany (Europe’s most powerful player) has woken up and has now realised the adverse effect of carbon emission restrictions.
I have often commented that Germany will not let its manufacturing struggle as a consequence of such restrictions and/or high energy prices (which Germany is beginning to realise are disastrous for its small industries, which are the lifeblood of German manufacturing).
First, Germany is moving away from renewables and is building 23 coal-powered stations for cheap and reliable energy.
Second, Germany wants to rein back against too restrictive carbon emissions.
The combination of these new approaches is a game changer in European terms.
I agree with DonV, that “global average temperature” does not have a physical meaning. I believe there was a great post by Anthony about a year ago on this topic.
If I were to try and take an average temperature of my house, where do I start? Placing thermometers in the bedrooms, loft and basement, fridge, freezer, cooker and oven, inside the shower, and inside the bed, and then averaging the measurements? OK, to a man with a hammer everything looks like a nail, and to a man with lots of thermometers average temperatures can be measured anywhere.
But what meaning exactly will this number have? And how is it at all possible, pray tell me, to measure the average of the whole planet to an accuracy of a tenth of a degree, when a passing cloud can drop the temperature of a thermometer by many degrees? The error bars of any such “average” should be about 10 deg, so any decimal is utterly meaningless, and simply bad science.
Surely the ensemble average would be OK if warmists could claim they were based on non-Markovian guesses?
Surely this is the rub!
“And what’s that big black line down the middle? A multi-model mean!”
OMG Nick..
yes. it was a multi model mean.. designed to show how meaningless the multi-model mean is !!
And I credited you with some meagre intelligence… my bad !!!
Er… I think that should be Markovian. Tired and all that…
“The “ensemble” of models is completely meaningless, statistically”
Absolutely correct. What I think has happened in the world of climate modelling is that it has been assumed that one can apply the Central Limit Theorem to the results of independent and different models.
http://en.wikipedia.org/wiki/Central_limit_theorem
Models are the mathematical formulation of assumptions combined with a subset of physical laws.
The Central limit theorem relates to the multiple sampling (measuring) of reality.
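A brief sketch of the distinction, with made-up numbers: the Central Limit Theorem describes the mean of many independent measurements of the same quantity, whose spread shrinks as the sample grows; it says nothing about the mean of outputs from different models whose systematic biases never shrink, no matter how many models are added.

```python
import numpy as np

rng = np.random.default_rng(3)

# CLT territory: averages of many IID measurements of the SAME quantity
# cluster tightly (std of the mean ~ 0.5 / sqrt(50)) around the true value.
true_value = 14.0
measurements = true_value + rng.normal(0.0, 0.5, size=(10000, 50))
sample_means = measurements.mean(axis=1)
print("IID sample means: mean %.3f, std %.3f" % (sample_means.mean(), sample_means.std()))

# Not CLT territory: outputs of different models, each with a fixed systematic
# bias. Averaging does not make the biases cancel; it just averages them.
model_outputs = true_value + np.array([0.4, 0.9, 1.3, 2.0])   # hypothetical biases
print("multi-model mean bias: %.2f" % (model_outputs.mean() - true_value))
```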
“Let me repeat this. It has no meaning!”
That’s why I have been calling the warmist government scientists pseudoscientists for a number of years now.
Duke
You say that the type of statistics carried out on the collated models is nonsense or has no meaning. Can you or others point out why I’m wrong – I can’t see a problem with the approach. Perhaps I don’t quite grasp the issue.
1) The process adopted seems to be one akin to stochastic modelling in that you do several runs with different starting conditions to give you a range of outputs with a cumulative distribution function (cdf) and mean. From this one has a measure of uncertainty.
2) Climate models have inputs and assumed sensitivities (correct?). Surely these are your starting conditions, which can be changed to give a range of simulated results: cdf.
3) Each team models under different assumptions and therefore the runs are akin to stochastic models, and the widely used methodology is sound.
Personally I think the models are nonsense.
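Point (1) above, sketched with a hypothetical toy model: many runs of one model with perturbed starting conditions, summarised by an empirical CDF of the outcome. That is the legitimate version of the procedure; it is a different thing from pooling runs of structurally different models, which is what the post objects to.

```python
import numpy as np

rng = np.random.default_rng(5)

def toy_model(initial_anomaly):
    """Hypothetical model: century warming as a function of the start state."""
    return 1.0 + 0.5 * initial_anomaly

starts = rng.normal(0.0, 0.2, size=5000)                 # perturbed initial conditions
outcomes = np.sort(toy_model(starts))
cdf = np.arange(1, outcomes.size + 1) / outcomes.size    # empirical CDF values

# Read the 5% and 95% points of the single-model spread off the empirical CDF:
low = outcomes[np.searchsorted(cdf, 0.05)]
high = outcomes[np.searchsorted(cdf, 0.95)]
print("single-model 5%%-95%% range: %.2f to %.2f deg C" % (low, high))
```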
I agree with much of what Bob has said here, as well as Greg L’s expansion upon those comments. It seems that the fact that we have only one realization — one Earth — greatly limits what we can determine about causation in the climate system, and makes a lot of the statistics regarding multiple model simulations rather meaningless. We don’t even know whether the 10% of the models closest to the observations are closer by chance (they contain similar butterflies) or because their underlying physical processes are better.
The models are all junk, being based on a fundamental flaw – that increased CO2 is somehow “forcing” temperatures up. They then try to disguise those flaws, using conveniently uncertain variables, such as volcanoes and aerosols, which can be used to tweak the models accordingly. But now, after over 17 years of temperatures flatlining, even the yeoman tweaksters can’t do enough tweaking of their playstation climate models to make them coincide with reality. So, they have resorted to fantasies, such as the “missing” heat hiding in the deep oceans. Even they must know that the jig is up.
Nick Stokes says:
June 19, 2013 at 12:29 am
“I think it is likely that for various purposes AR5 has calculated model averages. AR4 and AR3 certainly did, and no-one said it was a blunder. ”
The first time I heard of the multi-model mean I thought this is propaganda, not science. Why would you even think of averaging the output of several different computer programs and hope that the AVERAGE is better than each of the instances? One broken model can completely wreck your predictive skill.
IF Warmism were a SCIENCE there would have to be a JUSTIFICATION for this but there ISN’T ONE.
“People average all sorts of things. ”
People do all sort of stupid things. Scientists are supposed to know what they are doing. They are also supposed to be honest. Nothing of this is the case in warmist science; it is a make-work scheme for con-men.
AndyG55 says: June 19, 2013 at 3:56 am
“And what’s that big black line down the middle? A multi-model mean!”
“yes. it was a multi model mean.. designed to show how meaningless the multi-model mean is !!”
No it wasn’t. The WUWT post was headed EPIC FAIL. Nothing is said against the model mean. Instead, it is the statistic used to show discrepancy between the models and radiosonde/satellite data (also averaged). If model mean is a dud statistic, it would be useless for that purpose.
REPLY: Fixed your Italics Nick. The EPIC fail has to do with the envelope of the models, not the mean, diverging from the observations, much like the AR5 graph:
http://wattsupwiththat.files.wordpress.com/2012/12/ipcc_ar5_draft_fig1-4_without.png
A mean of junk, is still junk. Neither the models envelope nor their mean have any predictive skill, hence they are junk.
This is further illustrated by the divergence of trend lines, which don’t rely on the mean nor envelopes.
http://www.drroyspencer.com/wp-content/uploads/CMIP5-19-USA-models-vs-obs-20N-20S-MT.png
I know that is hard for you to admit being ex CSIRO and all, but the climate models are junk, and that’s the reality of the situation. – Anthony
Completely agree with Brown. You don’t have to be a statistician to understand that the mean opinion of a number of deluded people is not closer to the truth than any individual opinion. What Brown said: take one model and simulate many results using random noise. The relative frequency of results with temperature slopes below the observed slope is the type II error rate. If it is less than five percent, the model should be rejected.
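A sketch of this test with every number made up: simulate one model many times with random noise, compute the trend of each run, and look at the fraction of runs whose trend falls at or below the observed one; if that fraction is under five percent, the observed record sits in the model’s far lower tail and the model is rejected.

```python
import numpy as np

rng = np.random.default_rng(4)

years = np.arange(1996, 2013)
model_trend = 0.025          # hypothetical model trend, deg C / year
noise_sd = 0.08              # hypothetical interannual noise, deg C
observed_slope = 0.003       # hypothetical observed trend over the same period

# Many noisy realisations of the ONE model, and the OLS trend of each run.
n_runs = 10000
simulated_slopes = np.empty(n_runs)
for i in range(n_runs):
    run = model_trend * (years - years[0]) + rng.normal(0.0, noise_sd, size=years.size)
    simulated_slopes[i] = np.polyfit(years - years[0], run, 1)[0]

fraction_below = np.mean(simulated_slopes <= observed_slope)
print("fraction of runs with slope <= observed: %.4f" % fraction_below)
print("reject model at the 5% level:", fraction_below < 0.05)
```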
Fixed your Italics Nick.
Thanks.
The EPIC fail has to do with the envelope of the models, not the mean, diverging from the observations,…
Yes, agreed. But if the mean is meaningless, why was it added, and so emphatically?
REPLY: Maybe just following the lead from Real Climate? – Anthony
http://www.realclimate.org/images/model122.jpg
From:
http://www.realclimate.org/index.php/archives/2013/02/2012-updates-to-model-observation-comparions/
Excellent work. Made me chuckle. “Lies, Damn Lies, and Statistics”. Muwahahahahaaaaaa
rgb, Amen and halle-fraking-lujah!
As someone who has some modeling experience (15+ years electronics simulations and models), I’ve been complaining about them for a decade, but you’ve stated it far far better than I could.
Poor Nick, comprehension seems not to be your strong point !!
For Warmists who say that Robert Brown doesn’t know much about computing or models see an excerpt from his about page.
http://www.phy.duke.edu/~rgb/About/about.php
In cricketing parlance.. twenty scores of 5, DOES NOT mean you have scored a century !!
Junk is junk.. …. is junk !!
Can’t bat, can’t bowl… you know the story !
“Nick Stokes says:
June 19, 2013 at 4:19 am
I know that is hard for you to admit being ex CSIRO and all, but the climate models are junk, and that’s the reality of the situation. – Anthony”
If this is true Anthony, it explains, to me living in Aus, a lot. Nick clearly is not a fool, but severely biased. Nick really needs to put aside that bias, read through RGB’s post with an open mind. RGB’s post, as do all his posts, makes complete sense to me.
Nick: Adding the mean to the display shows clearly where all the (hopeless) models lie. None of the models in the ensemble have any real value as predictors over even a relatively short time period since 1979. The IPCC does show displays with the mean model ensemble, as has been pointed out elsewhere on the thread.
Harping on as you are about who shows the mean of the ensemble is essentially a side-show. The bottom line is that not one of the models in the ensemble is of any use as a predictor of climate over even the relatively short period since 1979, no matter how wonderful, complex, physically plausible or anything else they may be. They may be fantastic, interesting, wonderful academic tools to help people develop new, better models in the future, but as predictors they are worse than useless: they are all clearly biased to high temperatures. And to then go further and base public policy on such nonsense is absurd and negligent.
“But if the mean is meaningless, why was it added, and so emphatically?”
To emphasise the farce.. DOH !!!!!
All properly trained statisticians understand the prerequisite conditions for the validity of statistical measures. Fundamental to all measures is the requirement for independent, identically distributed, random sampling. To publish statistics knowing these conditions have not been met is at best stooopidity, at worst it’s a sinister, blatant fraud.
Now I don’t believe these guys are stoooopid, just misguided, after all really stoooooopid people don’t get to have THAT much influence. That only leaves blatant deception and fraud, or have I missed something?
Nick Stokes says – June 19, 2013 at 3:13 am
I’m not sure the argument holds up.
It may well be that lots of people average things that don’t have any justification to be averaged together. Just because I may like the outcome, or the person who did the averaging, does not mean they were right to do it.
The practice has to be justified on its own terms, not in terms of ‘that’s my side’ or ‘that’s a handy outcome’. The original post makes a very good case that averaging models which embody different understandings of the physics tells us nothing of value and hides what value there may be in the models.
It shouldn’t be done.
It is certainly the case that averaging multiple runs of the same model is a sound practice. It tells us something about the characteristics of that model and the understanding of the physics it embodies.
But I don’t think that’s what the AR5 quote I linked to is doing.
I don’t think what AR5 does is a sound practice.
Have you noticed that over the last few years Warmists will point to the lowest model projection and say it closely matches observed temperatures? This is part of the con job at work. No mention of the other 95% of failed projections. The models failed some time back and are demonstrating their continued failure each day the standstill continues. How much longer can they carry on this charade?