# The “ensemble” of models is completely meaningless, statistically

This comment is from rgbatduke, who is Robert G. Brown of the Duke University Physics Department, on the “No significant warming for 17 years 4 months” thread. It has gained quite a bit of attention because it speaks clearly to truth. So that all readers can benefit, I’m elevating it to a full post.

rgbatduke says:

June 13, 2013 at 7:20 am

Saying that we need to wait for a certain interval in order to conclude that “the models are wrong” is dangerous and incorrect for two reasons. First — and this is a point that is stunningly ignored — there are a lot of different models out there, all supposedly built on top of physics, and yet no two of them give anywhere near the same results!

This is reflected in the graphs Monckton publishes above, where the AR5 trend line is the average over all of these models and in spite of the number of contributors the variance of the models is huge. It is also clearly evident if one publishes a “spaghetti graph” of the individual model projections (as Roy Spencer recently did in another thread) — it looks like the frayed end of a rope, not like a coherent spread around some physics supported result.

Note the implicit swindle in this graph — by forming a mean and standard deviation over model projections and then using the mean as a “most likely” projection and the variance as representative of the range of the error, one is treating the differences between the models as if they are uncorrelated random variates causing deviation around a true mean!

Say what?

This is such a horrendous abuse of statistics that it is difficult to know how to begin to address it. One simply wishes to bitch-slap whoever it was that assembled the graph and ensure that they never work or publish in the field of science or statistics ever again. One cannot generate an ensemble of independent and identically distributed models that have different code. One might, possibly, generate a single model that generates an ensemble of predictions by using uniform deviates (random numbers) to seed “noise” (representing uncertainty) in the inputs.

What I’m trying to say is that the variance and mean of the “ensemble” of models is completely meaningless, statistically, because the inputs do not possess the most basic properties required for a meaningful interpretation. They are not independent, their differences are not based on a random distribution of errors, there is no reason whatsoever to believe that the errors or differences are unbiased (given that the only way humans can generate unbiased anything is through the use of e.g. dice or other objectively random instruments).
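A minimal sketch of why a shared bias defeats averaging, assuming a toy setup in which every "model" is simply the truth plus a common offset plus Gaussian noise. The function name, the offset of 1.0 and the spread of 0.5 are illustrative assumptions, not anything taken from the GCMs or from AR5.

```python
import random
import statistics


def simulate_ensemble_mean(n_models, shared_bias, spread, truth=0.0, seed=0):
    """Mean of n_models toy outputs that all carry the same systematic bias."""
    rng = random.Random(seed)
    outputs = [truth + shared_bias + rng.gauss(0.0, spread) for _ in range(n_models)]
    return statistics.fmean(outputs)


if __name__ == "__main__":
    # The truth is 0.0, so each printed mean is also the error of the ensemble mean.
    for n in (5, 50, 5000):
        unbiased = simulate_ensemble_mean(n, shared_bias=0.0, spread=0.5)
        biased = simulate_ensemble_mean(n, shared_bias=1.0, spread=0.5)
        print(f"n={n:5d}  unbiased error={unbiased:+.3f}  biased error={biased:+.3f}")
```

In this toy case, adding more members shrinks only the random part of the error. The error of the biased ensemble mean settles near the shared offset however many members are averaged.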

So why buy into this nonsense by doing linear fits to a function — global temperature — that has never in its entire history been linear, although of course it has always been approximately smooth so one can always do a Taylor series expansion in some sufficiently small interval and get a linear term that — by the nature of Taylor series fits to nonlinear functions — is guaranteed to fail if extrapolated as higher order nonlinear terms kick in and ultimately dominate? Why even pay lip service to the notion that R² or p for a linear fit, or for a Kolmogorov-Smirnov comparison of the real temperature record and the extrapolated model prediction, has some meaning? It has none.
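The Taylor-series point can be sketched with any smooth nonlinear function. The example below uses sin(x) purely as a stand-in (an assumption for illustration, not a temperature model): the linear term is excellent near the expansion point and useless far from it.

```python
import math


def linear_taylor(f, dfdx, x0):
    """Return the first-order Taylor approximation of f around x0 as a function."""
    f0, slope = f(x0), dfdx(x0)
    return lambda x: f0 + slope * (x - x0)


# Linearize sin(x) around x0 = 0, where the linear term is simply x.
approx = linear_taylor(math.sin, math.cos, 0.0)


def extrapolation_error(x):
    """Absolute error of the linear approximation at x."""
    return abs(approx(x) - math.sin(x))


if __name__ == "__main__":
    for x in (0.1, 0.5, 1.0, 3.0):
        print(f"x={x:3.1f}  linear={approx(x):+.4f}  true={math.sin(x):+.4f}  "
              f"error={extrapolation_error(x):.4f}")
```

The error grows as the neglected higher-order terms take over, which is the failure mode described above for extrapolated linear fits.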

Let me repeat this. It has no meaning! It is indefensible within the theory and practice of statistical analysis. You might as well use a ouija board as the basis of claims about the future climate history as the ensemble average of different computational physical models that do not differ by truly random variations and are subject to all sorts of omitted variable, selected variable, implementation, and initialization bias. The board might give you the right answer, might not, but good luck justifying the answer it gives on some sort of rational basis.

Let’s invert this process and actually apply statistical analysis to the distribution of model results Re: the claim that they all correctly implement well-known physics. For example, if I attempt to do an a priori computation of the quantum structure of, say, a carbon atom, I might begin by solving a single electron model, treating the electron-electron interaction using the probability distribution from the single electron model to generate a spherically symmetric “density” of electrons around the nucleus, and then performing a self-consistent field theory iteration (resolving the single electron model for the new potential) until it converges. (This is known as the Hartree approximation.)

Somebody else could say “Wait, this ignores the Pauli exclusion principle” and the requirement that the electron wavefunction be fully antisymmetric. One could then make the (still single electron) model more complicated and construct a Slater determinant to use as a fully antisymmetric representation of the electron wavefunctions, generate the density, perform the self-consistent field computation to convergence. (This is Hartree-Fock.)

A third party could then note that this still underestimates what is called the “correlation energy” of the system, because treating the electron cloud as a continuous distribution through which electrons move ignores the fact that individual electrons strongly repel and hence do not like to get near one another. Both of the former approaches underestimate the size of the electron hole, and hence they make the atom “too small” and “too tightly bound”. A variety of schema are proposed to overcome this problem — using a semi-empirical local density functional being probably the most successful.

A fourth party might then observe that the Universe is really relativistic, and that by ignoring relativity theory and doing a classical computation we introduce an error into all of the above (although it might be included in the semi-empirical LDF approach heuristically).

In the end, one might well have an “ensemble” of models, all of which are based on physics. In fact, the differences are also based on physics — the physics omitted from one try to another, or the means used to approximate and try to include physics we cannot include in a first-principles computation (note how I sneaked a semi-empirical note in with the LDF, although one can derive some density functionals from first principles (e.g. Thomas-Fermi approximation), they usually don’t do particularly well because they aren’t valid across the full range of densities observed in actual atoms). Note well, doing the precise computation is not an option. We cannot solve the many body atomic state problem in quantum theory exactly any more than we can solve the many body problem exactly in classical theory or the set of open, nonlinear, coupled, damped, driven chaotic Navier-Stokes equations in a non-inertial reference frame that represent the climate system.

Note well that solving for the exact, fully correlated nonlinear many electron wavefunction of the humble carbon atom — or the far more complex Uranium atom — is trivially simple (in computational terms) compared to the climate problem. We can’t compute either one, but we can come a damn sight closer to consistently approximating the solution to the former compared to the latter.

So, should we take the mean of the ensemble of “physics based” models for the quantum electronic structure of atomic carbon and treat it as the best prediction of carbon’s quantum structure? Only if we are very stupid or insane or want to sell something. If you read what I said carefully (and you may not have — eyes tend to glaze over when one reviews a year or so of graduate quantum theory applied to electronics in a few paragraphs, even though I left out perturbation theory, Feynman diagrams, and ever so much more:-) you will note that I cheated — I ran in a semi-empirical method.

Which of these is going to be the winner? LDF, of course. Why? Because the parameters are adjusted to give the best fit to the actual empirical spectrum of Carbon. All of the others are going to underestimate the correlation hole, and their errors will be systematically deviant from the correct spectrum. Their mean will be systematically deviant, and by weighting Hartree (the dumbest reasonable “physics based approach”) the same as LDF in the “ensemble” average, you guarantee that the error in this “mean” will be significant.
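The arithmetic behind that claim can be sketched with made-up numbers. The signed errors below are hypothetical values in arbitrary units, invented only to show the pattern of several methods erring in the same direction; they are not computed or measured results for carbon.

```python
import statistics

# Hypothetical signed errors relative to a measured value (arbitrary units).
# Negative means the method underestimates. All numbers are invented for illustration.
errors = {
    "Hartree": -4.0,
    "Hartree-Fock": -2.5,
    "Hartree-Fock + relativistic correction": -2.0,
    "semi-empirical LDF": -0.2,
}

# Equal-weight "ensemble" error versus the single best method.
ensemble_mean_error = statistics.fmean(errors.values())
best_name = min(errors, key=lambda name: abs(errors[name]))

if __name__ == "__main__":
    print(f"ensemble mean error: {ensemble_mean_error:+.3f}")
    print(f"best single method:  {best_name} ({errors[best_name]:+.3f})")
```

Because every error has the same sign, nothing cancels in the average, and the equal-weight mean ends up far worse than the best member.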

Suppose one did not know (as, at one time, we did not know) which of the models gave the best result. Suppose that nobody had actually measured the spectrum of Carbon, so its empirical quantum structure was unknown. Would the ensemble mean be reasonable then? Of course not. I presented the models in the way physics itself predicts improvement — adding back details that ought to be important that are omitted in Hartree. One cannot be certain that adding back these details will actually improve things, by the way, because it is always possible that the corrections are not monotonic (and eventually, at higher orders in perturbation theory, they most certainly are not!) Still, nobody would pretend that the average of a theory with an improved theory is “likely” to be better than the improved theory itself, because that would make no sense. Nor would anyone claim that diagrammatic perturbation theory results (for which there is a clear a priori derived justification) are necessarily going to beat semi-heuristic methods like LDF because in fact they often do not.

What one would do in the real world is measure the spectrum of Carbon, compare it to the predictions of the models, and then hand out the ribbons to the winners! Not the other way around. And since none of the winners is going to be exact — indeed, for decades and decades of work, none of the winners was even particularly close to observed/measured spectra in spite of using supercomputers (admittedly, supercomputers that were slower than your cell phone is today) to do the computations — one would then return to the drawing board and code entry console to try to do better.

Can we apply this sort of thoughtful reasoning to the spaghetti snarl of GCMs and their highly divergent results? You bet we can! First of all, we could stop pretending that “ensemble” mean and variance have any meaning whatsoever by not computing them. Why compute a number that has no meaning? Second, we could take the actual climate record from some “epoch starting point” — one that does not matter in the long run, and we’ll have to continue the comparison for the long run because in any short run from any starting point noise of a variety of sorts will obscure systematic errors — and we can just compare reality to the models. We can then sort out the models by putting (say) all but the top five or so into a “failed” bin and stop including them in any sort of analysis or policy decisioning whatsoever unless or until they start to actually agree with reality.
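One way such a sorting could be sketched is below. Root-mean-square error is an assumed skill score (the comment does not name a metric), and the model names and series are hypothetical placeholders rather than real GCM output.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed):
        raise ValueError("series must have the same length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


def split_by_skill(model_runs, observed, keep=5):
    """Rank models by RMSE against observations; return (kept, failed) name lists."""
    ranked = sorted(model_runs, key=lambda name: rmse(model_runs[name], observed))
    return ranked[:keep], ranked[keep:]


if __name__ == "__main__":
    # Hypothetical anomaly series, invented for illustration only.
    observed = [0.0, 0.1, 0.1, 0.2]
    model_runs = {
        "model_a": [0.0, 0.1, 0.2, 0.2],
        "model_b": [0.1, 0.3, 0.5, 0.7],
        "model_c": [0.0, 0.2, 0.4, 0.6],
    }
    kept, failed = split_by_skill(model_runs, observed, keep=1)
    print("kept:", kept, "failed:", failed)
```

A real comparison would need a defensible choice of baseline period, observational data set and scoring rule, none of which this sketch settles.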

Then real scientists might contemplate sitting down with those five winners and meditate upon what makes them winners — what makes them come out the closest to reality — and see if they could figure out ways of making them work even better. For example, if they are egregiously high and diverging from the empirical data, one might consider adding previously omitted physics, semi-empirical or heuristic corrections, or adjusting input parameters to improve the fit.

Then comes the hard part. Waiting. The climate is not as simple as a Carbon atom. The latter’s spectrum never changes, it is a fixed target. The former is never the same. Either one’s dynamical model is never the same and mirrors the variation of reality or one has to conclude that the problem is unsolved and the implementation of the physics is wrong, however “well-known” that physics is. So one has to wait and see if one’s model, adjusted and improved to better fit the past up to the present, actually has any predictive value.

Worst of all, one cannot easily use statistics to determine when or if one’s predictions are failing, because damn, climate is nonlinear, non-Markovian, chaotic, and is apparently influenced in nontrivial ways by a world-sized bucket of competing, occasionally cancelling, poorly understood factors. Soot. Aerosols. GHGs. Clouds. Ice. Decadal oscillations. Defects spun off from the chaotic process that cause global, persistent changes in atmospheric circulation on a local basis (e.g. blocking highs that sit out on the Atlantic for half a year) that have a huge impact on annual or monthly temperatures and rainfall and so on. Orbital factors. Solar factors. Changes in the composition of the troposphere, the stratosphere, the thermosphere. Volcanoes. Land use changes. Algae blooms.

And somewhere, that damn butterfly. Somebody needs to squash the damn thing, because trying to ensemble average a small sample from a chaotic system is so stupid that I cannot begin to describe it. Everything works just fine as long as you average over an interval short enough that you are bound to a given attractor, oscillating away, things look predictable and then — damn, you change attractors. Everything changes! All the precious parameters you empirically tuned to balance out this and that for the old attractor suddenly require new values to work.
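The butterfly can be illustrated with a far simpler chaotic system than the climate. The sketch below uses the logistic map with r = 4 as an assumed stand-in; it shows only sensitivity to initial conditions, not the attractor switching described above.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the logistic map x -> r*x*(1-x) and return every visited state."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def separations(x0, delta, steps):
    """Absolute gap, step by step, between two runs that start delta apart."""
    a = logistic_trajectory(x0, steps)
    b = logistic_trajectory(x0 + delta, steps)
    return [abs(p - q) for p, q in zip(a, b)]


if __name__ == "__main__":
    gaps = separations(0.2, 1e-10, 100)
    for step in (0, 5, 20, 40, 60):
        print(f"step {step:3d}: gap = {gaps[step]:.3e}")
```

Two runs that begin one part in ten billion apart track each other for a while and then become unrelated, which is why a handful of runs from a chaotic system is a poor sample to average.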

This is why it is actually wrong-headed to acquiesce in the notion that any sort of p-value or R-squared derived from an AR5 mean has any meaning. It gives up the high ground (even though one is using it for a good purpose, trying to argue that this “ensemble” fails elementary statistical tests). But statistical testing is a shaky enough theory as it is, open to data dredging and horrendous error alike, and that’s when it really is governed by underlying IID processes (see “Green Jelly Beans Cause Acne”). One cannot naively apply a criterion like rejection if p < 0.05, and all that means under the best of circumstances is that the current observations are improbable given the null hypothesis at 19 to 1. People win and lose bets at this level all the time. One time in 20, in fact. We make a lot of bets!
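The "lot of bets" arithmetic can be made concrete. The sketch assumes independent tests in which the null hypothesis is true every time, the idealized IID case; the function name is illustrative.

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n independent true-null tests gives p < alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    for n in (1, 5, 20, 100):
        chance = prob_at_least_one_false_positive(n)
        print(f"{n:3d} tests -> {chance:.3f} chance of a spurious 'significant' result")
```

With twenty such tests, as in the jelly bean example, a spurious hit somewhere is more likely than not.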

So I would recommend — modestly — that skeptics try very hard not to buy into this and redirect all such discussions to questions such as why the models are in such terrible disagreement with each other, even when applied to identical toy problems that are far simpler than the actual Earth, and why we aren’t using empirical evidence (as it accumulates) to reject failing models and concentrate on the ones that come closest to working, while also not using the models that are obviously not working in any sort of “average” claim for future warming. Maybe they could hire themselves a Bayesian or two and get them to recompute the AR curves, I dunno.

It would take me, in my comparative ignorance, around five minutes to throw out all but the best 10% of the GCMs (which are still diverging from the empirical data, but arguably are well within the expected fluctuation range on the DATA side), sort the remainder into top-half models that should probably be kept around and possibly improved, and bottom half models whose continued use I would defund as a waste of time. That wouldn’t make them actually disappear, of course, only mothball them. If the future climate ever magically popped back up to agree with them, it is a matter of a few seconds to retrieve them from the archives and put them back into use.

Of course if one does this, the GCM predicted climate sensitivity plunges from the totally statistically fraudulent 2.5 C/century to a far more plausible and still possibly wrong ~1 C/century, which — surprise — more or less continues the post-LIA warming trend with a small possible anthropogenic contribution. This large a change would bring out pitchforks and torches as people realize just how badly they’ve been used by a small group of scientists and politicians, how much they are the victims of indefensible abuse of statistics to average in the terrible with the merely poor as if they are all equally likely to be true with randomly distributed differences.

rgb

KitemanSA

Might it be a valid mean of random stupidity?

Ian W

An excellent post – it would be assisted if it had Viscount Monckton’s and Roy Spencer’s graphs displayed with references.

This assertion (wrong GCM’s should be ignored not averaged) is so clearly explained and justified, I am amazed no statistician made the point earlier, like sometime in the last 10 years as the climate change hysteria became so detached from reality, as all bad weather is now blamed on climate change.

OK S.

The Bishop has something to say regarding this comment over at his place:
http://bishophill.squarespace.com/blog/2013/6/14/on-the-meaning-of-ensemble-means.html

PaulH

The ensemble average of a Messerschmidt is still a Messerschmidt. :->

“What I’m trying to say is that the variance and mean of the “ensemble” of models is completely meaningless, statistically”
Indeed. At best, the outputs of climate models are the opinions of climate modellers numerically quantified.
As such, I’d argue the variance, is direct evidence the claimed consensus is weak.

mark

damn.
just damn.

Pat Frank

rgb: “a small group of scientists and politicians”
It’s not a small group. It’s a large group.
Among US scientists, it’s the entire official institutional hierarchy, from the NAS, through the APS, the ACS to the AGU and the AMS. Among politicians, it’s virtually the entire set of Democratic electees, and probably a fair fraction of the Republican set, too.
And let’s not forget the individual scientists who have lied consistently for years. None of this would be happening without their conscious elevation of environmental ideology over scientific integrity. Further, none of this would be happening if the APS, etc., actually did due diligence on climate science claims, before endorsing them. The APS analysis, in particular, is pathetic to the point of incompetent.
And all of this has been facilitated by a press that has looked to their political prejudices to decide which group is telling the truth about climate. The press has overlooked and forgiven obvious shenanigans of climate scientists (e.g., Climategate I&II, back to 1400 CENSORED, the obvious pseudo-investigatory whitewashes, etc.) the way believers hold fast to belief despite the grotesqueries of their reverends. It’s been a large-scale failure all around; a worse abuse of science has never occurred, nor a worse failure by the press.

Let’s face it, Ensemble Means were brought to us by the same idiots who thought multi-proxy averaging was a legitimate way to reduce the uncertainty of temporally uncertain temperature proxies.

Pat Frank

By the way, my 2008 Skeptic article provides an analysis of GCM systematic error, and shows that their projections are physically meaningless.
I’ve updated that analysis to the CMIP5 models, and have written up a manuscript for publication. The CMIP5 set are no better than the AMIP1 set. They are predictively useless.

tz2026

Well put. In great detail too.

k scott denison

Brilliant, thank you. Can’t wait to see the defenders of the faith stop by to tell us, once again, “but, but, but they’re the best we have!!!” Mosher comes to mind.
That the best we have are all no good never seems to cross some people’s minds. Dr. Brown, the simplicity of your advice to ask the key questions about the models is greatly appreciated.

MaxL

I have been doing operational weather forecasting for several decades. The weather models are certainly a mainstay of our business. We generally look at several different models to gain a feel for what may occur. These include the Canadian, American and European models. They all have slightly differing physics and numerical methods. All too often the models show quite different scenarios, especially after about 48 hours. So what does one do? I have found through the years that taking the mean (i.e., the ensemble mean of different models) very seldom results in the correct forecast. It is usually the case that one of the models produces the best result. But which one is the trick. And you never know beforehand. So you choose what you think is the most reasonable model forecast, bearing in mind what could happen given the other model output. And just because one model was superior in one case does not mean it will be the best in the next case.

Eeyore Rifkin

“At best, the outputs of climate models are the opinions of climate modellers numerically quantified.”
Agreed, but I don’t believe that’s meaningless, statistically or otherwise.
“As such, I’d argue the variance, is direct evidence the claimed consensus is weak.”
I think the magnitude of the variance depends on the scale one uses. Pull back far enough and it looks like a strong “consensus” to exaggerate.

Greg L.

I have mostly stayed out of the fray, as most of the arguing over runaway anthropogenic global warming has for a good bit of time looked to me as far more religious than scientific on all sides. Having said that, and as a professional statistician (who possesses graduate degrees in both statistics and meteorology), I finally have seen a discussion worth wading into.
The post given here makes good sense, but I want to add a caution to the interpretation of it. Saying that making a judgement about an ensemble (i.e., a collection of forecasts from a set of models and their dispersion statistics) has no scientific/statistical validity does not mean that such a collection has no forecast utility. Rather, it means that one cannot make a statement about the validity of any individual model contained within the set based upon the performance of the ensemble statistics versus some reference verification. And this is exactly the point. We are a long way from the scientific method here – the idea that an experimental hypothesis can be verified/falsified/replicated through controlled experiments. We are not going to be able to do that with most integrated atmospheric phenomena as there simply is no collection of parallel earths available upon which to try different experiments. Not only that, but the most basic forms of the equations that (we think) govern atmospheric behavior are at best unsolvable, and in a number of cases unproven. Has anyone seen a proof of the full Navier-Stokes equations? Are even some of the simplest solutions of these equations solvable? (See, for example, the solution to the simplest possible convection problem in Kerry Emanuel’s Atmospheric Convection text – it is an eighth-order differential equation with a transcendental solution.) And yet we see much discussion on proving or validating GCM’s – which have at best crude approximations to many governing equations, do not include all feedbacks (and may even have the sign wrong of some that they do include), are attempting to model a system that is extremely nonlinear …
Given this, I actually don’t think the statement in this post goes far enough. Even reducing the set of models to the 10% or so that have the least error does not tell one anything. We cannot even make a statement about a model that correlates 99% with reality as we do not know if it has gotten things “right” for the right reasons. Is such a model more likely to be right? Probably. But is it? Who knows. And anyone who has ever tried to fit a complicated model to reality and watch the out-of-sample observations fail knows quickly just how bad selection bias can be. For example, the field of finance and forecasting financial markets is saturated with such failures – and such failures involve a system far less complicated than the atmosphere/ocean system.
On the flip side – this post does not invalidate using ensemble forecasts for the sake of increasing forecast utility. An ensemble forecast can improve forecast accuracy provided the following assumptions hold – namely, that the distribution of results is bounded, the errors of the members are not systematically biased, and that the forecast errors of the members are at least somewhat uncorrelated. Such requirements do not mean whatsoever that the member models use the same physical assumptions and simplifications. But once again – this is a forecast issue – not a question of validation of the individual members. And moreover, in the case of GCM’s within an ensemble, the presence of systematic bias is likely – if for no other reason than the unfortunate effects of publication bias, research funding survivorship (e.g., those who show more extreme results credibly may tend to get funding more easily), and the unconscious tendency of humans that fit models with way too many parameters to make judgement calls that cause the model results to look like what “they should be”.
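Those conditions can be written down for an idealized case. Assume, purely for illustration, that each of N member forecasts has an error with the same mean b (a shared bias), the same variance σ², and the same pairwise correlation ρ; these symbols are assumptions of this sketch and do not appear in the comment. The mean squared error of the ensemble-mean forecast is then:

```latex
\mathrm{MSE}(\bar{e})
  \;=\; b^{2} \;+\; \sigma^{2}\left[\frac{1}{N} + \left(1 - \frac{1}{N}\right)\rho\right]
  \;\xrightarrow{\;N \to \infty\;}\; b^{2} + \rho\,\sigma^{2}
```

Only the 1/N term shrinks as members are added. A shared bias b and any common correlation ρ survive averaging untouched, which is why the unbiased and uncorrelated assumptions carry the whole benefit.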

Chuck Nolan

I believe you’re correct.
I’m not smart enough to know if what you are saying is true, but I like your logic.
Posting this on WUWT tells me you are not afraid of critique.
Everyone knows nobody gets away with bad science or math here.
My guess is the bad models are kept because it’s taxpayer money and there is no need for stewardship so they just keep giving them the money.
cn

Abe

WINNER!!!!!
The vast majority of what you said went WAY over my head, but the notion of averaging models for stats as if they were actual data being totally wrong I totally agree. I think looking at it in that light says a lot about the many climate alarmists who continue to use their model outputs as if they were actual collected data and ignore or dismiss real empirical data.

All the climate models were wrong. Every one of them.
You cannot average a lot of wrong models together and get a correct answer.

Eeyore Rifkin says:
June 18, 2013 at 6:01 pm

I agree with both your points.
I was agreeing with rgb’s statements in relation to the actual climate. Whereas my points related to the psychology/sociology of climate scientists, where the model outputs can be considered data for statistical purposes. And you may well be right that those outputs are evidence of collective exaggeration, or a culture of exaggeration.

Can someone send enough money to RGB to get him to do the 10 minutes of work, and the extra work to publish a model scorecard and ranking for all to see? Like in golf or tennis. At the BH blog someone pointed out that some models are good for temperature, others for precipitation. So there could be a couple of ranking lists. But keep it simple.

Nick Stokes

“This is reflected in the graphs Monckton publishes above, where the AR5 trend line is the average over all of these models and in spite of the number of contributors the variance of the models is huge. It is also clearly evident if one publishes a “spaghetti graph” of the individual model projections (as Roy Spencer recently did in another thread) — it looks like the frayed end of a rope, not like a coherent spread around some physics supported result.
Note the implicit swindle in this graph — by forming a mean and standard deviation over model projections and then using the mean as a “most likely” projection and the variance as representative of the range of the error, one is treating the differences between the models as if they are uncorrelated random variates causing deviation around a true mean!
Say what?
This is such a horrendous abuse of statistics that it is difficult to know how to begin to address it. One simply wishes to bitch-slap whoever it was that assembled the graph and ensure that they never work or publish in the field of science or statistics ever again. One cannot generate an ensemble of independent and identically distributed models that have different code. One might, possibly, generate a single model that generates an ensemble of predictions by using uniform deviates (random numbers) to seed “noise” (representing uncertainty) in the inputs.
What I’m trying to say is that the variance and mean of the “ensemble” of models is completely meaningless, statistically, because the inputs do not possess the most basic properties required for a meaningful interpretation.”

As I said on the other thread, what is lacking here is a proper reference. Who does this? Where? “Whoever it was that assembled the graph” is actually Lord Monckton. But I don’t think even that graph has most of these sins, and certainly the AR5 graph cited with it does not.
Where in the AR5 do they make use of ‘the variance and mean of the “ensemble” of models’?

The idiocy of averaging “the terrible with the merely poor.” Nice.

edcaryl

Averaging climate models is analogous to averaging religions, with about the same validity.

Pamela Gray

That was like eating a steak. Every bite was meaty!

Mark Bofill

~applause~
Very well said, Dr. Brown!

Bill Illis

Here are the 23 models used in the IPCC AR4 report versus Hadcrut4 – 1900 to 2100 – Scenario A1B, the track we are on.
This is the average of each model although the majority will have up to 3 different runs.
In the hindcast period, 1900-2005, they are closer to each other and the actual temperature record but as we go out into the future forecast, there is a wide divergence.
Technically, only one model is lower than Hadcrut4 at the current time. The highest sensitivity model is now 0.65C higher than Hadcrut4, only 7 years after submitting their forecast.
Spaghetti par excellence.

An interesting comment…
http://www.thegwpf.org/ross-mckitrick-climate-models-fail-reality-test/
Perhaps the problem is that the models should not be averaged together, but should be examined one by one and then in every possible combination, with and without the socioeconomic data, in case some model somewhere has some explanatory power under just the right testing scenario. That is what another coauthor and I looked at in the recently completed study I mentioned above. It will be published shortly in a high-quality climatology journal, and I will be writing about our findings in more detail. There will be no surprises for those who have followed the discussion to this point.
Ross McKitrick is a professor of economics at the University of Guelph, a member of the GWPF’s Academic Advisory Council and an expert reviewer for the Intergovernmental Panel on Climate Change. Citations available at rossmckitrick.com.

I have seen other comments from Ross that echo your concern, but they may not have been published. He can speak for himself.

Nick Stokes asks what scientists are talking about ensemble means. AR5 is packed to gills with these references. Searching a few chapters for “ensemble mean” I find that chapter 10 on attribution has eleven references, chapter 11 on near-term projections has 42 references, and chapter 12 on long term projections has fifteen references.

TRBixler

So the smartest guy in the room Obama says we need to reduce our carbon footprint based on meaningless climate models. Stop the pipelines kill the coal. Let energy darkness reign over the free world. Where were the academics on these subjects? Waiting for Anthony Watts it seems.

just some guy

Oh my goodness, most of that post went way over my head. Except for the part about the spaghetti graph looking like the end of a frayed rope, and what that says about the accuracy of “climate science”. That is something even I can understand. 😀

Brilliant. Cutting through the heavy Stats, the respect warmists pay to their own output is simply another way of saying “Given that we’re right…”

Looks like Nick Stokes has never read AR5. Looks like he shoots from the hip. I wonder why he comments.

Niff

What I find interesting is that so many models (of the same thing) saying so many different things all get funded. If they are ALL so far from measured reality why is the funding continuing? It is easier to set this stuff up than to admit it’s all nonsense and courageously dismantle it. Does anyone see ANY signs of that courage anywhere?

Nick Stokes

Alec Rawls says: June 18, 2013 at 6:35 pm
“Nick Stokes asks what scientists are talking about ensemble means. AR5 is packed to gills with these references.”

But are they means of a controlled collection of runs from the same program? That’s different. I’m just asking for something really basic here. What are we talking about? Context? Where? Who? What did they say?

Tsk Tsk

Brown raises a potentially valid point about the statistical analysis of the ensemble, but his carbon atom comparison risks venturing into strawman territory. If he’s claiming that much of the variance amongst the models is driven by the actual sophistication of the physics that each incorporates, then he should provide a bit more evidence to support that conclusion. He could be right, but this is really just a he said/she said argument at this point. Granted this was just a comment and not meant to be a position paper, but understand the weakness.
“…and we can just compare reality to the models. We can then sort out the models by putting (say) all but the top five or so into a “failed” bin and stop including them in any sort of analysis or policy decisioning whatsoever unless or until they start to actually agree with reality.”
“One cannot naively apply a criterion like rejection if p < 0.05, and all that means under the best of circumstances is that the current observations are improbable given the null hypothesis at 19 to 1. People win and lose bets at this level all the time. One time in 20, in fact. We make a lot of bets!"

I presume Dr. Brown is content with the statistical methods used at CERN for the Higgs. Is it simply the case that a p-value of 0.05 is too pedestrian for us? Must it be 1e-5, or 1e-7 to be meaningful? We’re going to be waiting an awfully long time to invalidate even the worst of the models at that threshold (and we’ll have a lot fewer new medical treatments in the meantime…) So we shouldn’t use statistical tests on the validity of the models, but we should pick the “best” models from the set and continue using them. Precisely how do we determine just which of the models is the best? The outputs of the models aren’t just a single point or line. The results of each model are themselves the means of the runs. If we don’t use the modeled mean for each model, then what do we use? I agree with some of the points of the post but this one is just bizarre.
Me? I’m happy to say that the ensemble fails in a statistically significant way. There’s really not much that the CAGW crowd has to respond with other than Chicken Little it’s-in-the-pipeline campfire tales.
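The "one time in 20" point quoted above can be put into numbers. The following is only a minimal sketch: the function name is hypothetical, and it assumes the tests are independent, which comparisons among related climate models would not be.

```python
def chance_of_at_least_one_fluke(n_tests, alpha=0.05):
    """Probability that at least one of n_tests independent tests
    crosses the alpha threshold purely by chance under the null."""
    return 1.0 - (1.0 - alpha) ** n_tests


# The more "bets" we make, the more likely one of them is lost by chance.
for n in (1, 5, 20, 60):
    print(f"{n:>3} tests: {chance_of_at_least_one_fluke(n):.3f}")
```

Under that independence assumption, a single test at p < 0.05 is a 19-to-1 bet, but twenty such tests make at least one chance "rejection" more likely than not.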

OssQss

Is it really about climate, or is it really the politics and funding of such via ideology?
Some folks are figuring it out.
An example of such change for your interpretation.
Just sayin'. I was a bit taken aback when I saw this video.
Change You Can Believe In ?

More proof that models are crap, even the most “honestly” attempted ones. And I’d go with cutting out the “10 percent best” along with the rest, as Greg L says – they’re all founded on incomplete or fudged data and bad assumptions.
Since the alarmies rely entirely on their models, we need to get the focus off the models and confine it to the empirical data. Forget models – with empirical data we should be able to kick the alarmies in their sphincters, methinks.

VACornell

There is a study… just now… of twenty models… that says they are getting close. Have you seen? Are we to be lucky enough…?
Vern Cornell

george e. smith

“””””….. although of course it has always been approximately smooth so one can always do a Taylor series expansion in some sufficiently small interval and get a linear term that — by the nature of Taylor series fits to nonlinear functions — is guaranteed to fail if extrapolated as higher order nonlinear terms kick in and ultimately dominate?…..”””””
I love this message: – A linear fit to a non-linear function, will fail when higher order terms kick in.
Of course, that statement is also true in reverse. A non-linear function can always look linear over some restricted range.
In particular, the three functions y = ln (1+x) ; y = x ; and y = e^x – 1 track each other very well for small x compared to 1.
The best and longest atmospheric CO2 record, from Mauna Loa, since the IGY of 1957/58, related to the lower troposphere or global surface temperature record, cannot distinguish between those three mathematical models sufficiently to say any one of the three is better than another.
Yet some folks keep on insisting, that the first one:- y = ln(1+x) is correct.
Maybe so; but is T x or is it y ??
But a great call on the rgb elevation to the peerage, Anthony.
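The claim that ln(1+x), x, and e^x – 1 track each other for small x is easy to check numerically. This is a rough sketch using only the standard library; the helper name and the sample values of x are chosen here for illustration and are not from the comment.

```python
import math


def linear_range_gap(x):
    """Largest difference among ln(1+x), x, and exp(x)-1 at a given x."""
    values = (math.log1p(x), x, math.expm1(x))
    return max(values) - min(values)


# The three curves are nearly indistinguishable for small x and
# separate as x approaches 1.
for x in (0.01, 0.1, 0.5, 1.0):
    print(
        f"x={x:<5} ln(1+x)={math.log1p(x):.5f}  "
        f"e^x-1={math.expm1(x):.5f}  gap={linear_range_gap(x):.5f}"
    )
```

All three share the same first-order Taylor term, so the gap shrinks roughly like x squared as x goes to zero, which is why a short record over a small range cannot easily tell them apart.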

Max

VERY well said.

pottereaton

Dr. Brown wrote: “. . . it looks like the frayed end of a rope, not like a coherent spread around some physics supported result.”
Re the “frayed end of a rope:” while RGB is talking about model projections, this works even better as a general metaphor for the whole of climate science.
It could also be a metaphor for what IPCC-sanctioned scientists have done to the scientific method, which has, as a rope, been frayed at times down through history, but has never become completely unraveled.
Is climate science at the end of its rope?

thingodonta

The spaghetti graph models are just a façade to accommodate to the masses that the scientists supposedly recognise variability and uncertainty in the climate. They really would much rather just draw a straight curve upward, but they know they can’t. You could get the same set of spaghetti graphs attached to any Soviet era 5 year plan. But any model which doesn’t conform to the underlying assumption of high climate sensitivity to CO2, or the enormous benefits coming from depriving kulaks of the ownership of their land, is routinely filtered out to begin with. The output is checked to make sure it conforms to the party line. Which I guess is the same thing as saying all the above.

Here below is a copy of an email I recently sent to the Head of the Met Office which makes the same point as Robert Brown re model studies but, I think, with all due modesty, more simply for general consumption. In view of Obama’s stated intention to force the US to adopt emission control measures in the near future and follow the appalling policies of the British, the realist community urgently needs to devise some means of forcing the administration to immediately face up to the total collapse of the science behind the CAGW meme which is now taking place.
“Dear Professor Belcher
There has been no net warming since 1997 with CO2 up over 8%. The warming trend peaked in about 2003 and the earth has been cooling slightly for the last 10 years. This cooling will last for at least 20 years and perhaps for hundreds of years beyond that. The Met Office and IPCC climate models and all the impact studies depending on them are totally useless because they are incorrectly structured. The models are founded on two irrationally absurd assumptions. First, that CO2 is the main driver, when CO2 follows temperature. The cause does not follow the effect. Second, piling stupidity on irrationality, the models add the water vapour as a feedback to the CO2 in order to get a climate sensitivity of about 3 degrees. Water vapour follows temperature independently of CO2 and is the main GHG.
Furthermore, apart from the specific problems in the Met-IPCC models, models are inherently useless for predicting temperatures because of the difficulty of setting the initial parameters with sufficient precision. Why you think you can iterate more than a couple of weeks ahead is beyond my comprehension. After all, you gave up on seasonal forecasts.
For a discussion of the right way to approach forecasting see
http://climatesense-norpag.blogspot.com/2013/05/climate-forecasting-basics-for-britains.html
and several other pertinent posts also on http://climatesense-norpag.blogspot.com.
Here is a summary of the conclusions.
“It is not a great stretch of the imagination to propose that the 20th century warming peaked in about 2003 and that that peak was a peak in both the 60 year and 1000 year cycles. On that basis the conclusions of the post referred to above were as follows.
1 Significant temperature drop at about 2016-17
2 Possible unusual cold snap 2021-22
3 Built in cooling trend until at least 2024
4 Temperature Hadsst3 moving average anomaly 2035 – 0.15
5 Temperature Hadsst3 moving average anomaly 2100 – 0.5
6 General Conclusion – by 2100 all the 20th century temperature rise will have been reversed.
7 By 2650 earth could possibly be back to the depths of the little ice age.
8 The effect of increasing CO2 emissions will be minor but beneficial – they may slightly ameliorate the forecast cooling and help maintain crop yields .
9 Warning!! There are some signs in the Livingston and Penn Solar data that a sudden drop to the Maunder Minimum Little Ice Age temperatures could be imminent – with a much more rapid and economically disruptive cooling than that forecast above, which may turn out to be a best case scenario.
For a discussion of the effects of cooling on future weather patterns see the 30 year Climate Forecast 2 Year update at
http://climatesense-norpag.blogspot.com/2012/07/30-year-climate-forecast-2-year-update.html
How confident should one be in these above predictions? The pattern method doesn’t lend itself easily to statistical measures. However, statistical calculations only provide an apparent rigour for the uninitiated and in relation to the climate models are entirely misleading because they make no allowance for the structural uncertainties in the model set up. This is where scientific judgement comes in – some people are better at pattern recognition than others. A past record of successful forecasting is a useful but not infallible measure. In this case I am reasonably sure – say 65/35 for about 20 years ahead. Beyond that, inevitably, certainty drops.”
It is way past time for someone in the British scientific establishment to forthrightly say to the government that the whole CO2 scare is based on a mass delusion and try to stop Britain’s lunatic efforts to control climate by installing windmills.
As an expat Brit I watch with fascinated horror as y’all head lemming like over a cliff. I would be very happy to consult for the Met on this matter- you certainly need to hear a forthright skeptic presentation to reconnect with reality.

Jeef

That. Is. Brilliant.
Thank you.

Here is my way of saying the same thing;
Each of the models contains somewhat different physics. The basic physical laws that the models try to capture are the same, but the way the models try to incorporate those laws differs. This is because the climate system is hugely more complicated than the largest computers can capture, so some phenomena have to be parameterized. These result in the different computed climatic reactions to the “forcing” of the climate by humanity and the natural drivers that are incorporated in the forecasts. When you look at the graph it is clear that some models do quite a bit better than the bulk of the ensemble. In fact, several models cannot be distinguished from reality by the usual statistical criteria.
You know that what is done in science is to throw out the models that don’t fit the data and keep for now the ones that seem to be consistent with the observational data. In this way you can learn why it is that some models do better than others and make progress in understanding the dynamics of the system. This is research.
But this is not what is done. What you read in IPCC reports is that all the models are lumped into a statistical ensemble, as if each model were trying to “measure” the same thing, and all variations are a kind of noise, not different physics. This generates the solid black line as a mean of the ensemble and a large envelope of uncertainty. The climate sensitivity and its range of uncertainty contained in the reports are obtained in this way. This enables all of the models to keep their status. But ultimately as the trend continues, it becomes obvious that the ensemble and its envelope are emerging out of the observational “signal.” This is where we are now.
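The contrast drawn above, between averaging all models into one envelope and screening each model against observations, can be sketched with a toy calculation. Every number and name below is invented purely for illustration; none of it is real model output or a real observational record.

```python
import statistics

# Hypothetical trends in deg C per decade, invented for illustration only.
model_trends = {
    "model_a": 0.10,
    "model_b": 0.18,
    "model_c": 0.25,
    "model_d": 0.31,
    "model_e": 0.12,
}
observed_trend = 0.09        # hypothetical observation
observed_tolerance = 0.05    # hypothetical acceptance window

# "Ensemble" approach: treat structurally different models as samples of one thing.
ensemble_mean = statistics.mean(list(model_trends.values()))
ensemble_spread = statistics.stdev(list(model_trends.values()))


def consistent_models(trends, observed, tolerance):
    """Screening approach: keep only models whose trend lies within
    the tolerance of the observed trend."""
    return sorted(
        name for name, trend in trends.items()
        if abs(trend - observed) <= tolerance
    )


kept = consistent_models(model_trends, observed_trend, observed_tolerance)
print(f"ensemble mean={ensemble_mean:.3f}, spread={ensemble_spread:.3f}")
print("models consistent with observation:", kept)
```

In this toy setup the ensemble mean sits well above the observed value even though some individual models agree with it, which is the situation the comment describes; a real screening would of course need a defensible measure of agreement rather than a single fixed window.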

SAMURAI

Fundamentally, CAGW’s purpose isn’t to explain phenomena, it’s to scare taxpayers.
It’s now painfully obvious the runaway positive feedback loops baked into the climate models to create the scary Warmaggedon death spiral is bogus.
We’re now into the 18th year of no statistically significant warming trend–despite 1/3rd of all manmade CO2 emissions since 1750 being made over the last 18 years–which is a statistically significant time period to say with over 90% confidence that CAGW theory is disconfirmed.
Taxpayer-funded climatologists that advocate CAGW theory should be ecstatic that CO2’s climate sensitivity is likely to be around 1C or lower, instead they make the absurd claim, “it’s worse than we thought.” Yeah….right…
To keep the CAGW hoax alive, CAGW zealots: came up with HADCRUT4, continue to use invalidated high model projections to make the line go up faster and higher, “adjust” previous temperature databases down and current temperature databases up to maximize temperature anomalies, blame any one-off weather event (including cold events) on CAGW, push that it's warmER and conveniently forget it’s not warmING, and push CO2-induced ocean “acidification” instead of non-existent warming.
It’s the beginning of the end for this hoax. Economic hardship (partially attributable to \$trillions wasted on CAGW rules, regulation, mandates, taxes, alternative energy subsidies/projects and grants) and the growing discrepancy between projections vs empirical data will eventually lead to CAGW’s demise.
Time heals all; including stupidity. The question is whether politicians, taxpayers and scientists will learn from history or be doomed to repeat it.

Janice Moore

OssQss (what in the world does your name mean, anyway?)
THAT WAS MAGNIFICENT.
And deeply moving.
Through that 4×6 inch video window, we are looking at the future.
And it looks bright!
Thanks, so much, for sharing the big picture. In a highly technical post such as this, that was refreshing and, I daresay, needed (if for just a moment).
GO, ELBERT LEE GUILLORY!

Col A (Aus)

It would appear from above that using the averages are about as correct as a concensous?
No, I mean the correct averages are a consensous of the mean deviations?
NO, NO, I mean my mean ment to be averaged but got concensoured!!
OR was that my concensous was ment to mean my averagers that I can not apply!!!
Bugger, where is Al Gore, Tim Flim Flam or Mr Mann when you really need them???? 🙂 🙂

Jordan J. Phillips

I would have never imagined that electronic structure calculations would be discussed in a thread, but here it is!

OssQss

Janice Moore says:
June 18, 2013 at 7:48 pm
OssQss (what in the world does your name mean, anyway?)
Well, I am always honest with all that I do. Sooooooo>
That handle is a direct result of trying several email addresses in the early 90’s, and being unsuccessful (with AOL), and basically flailing the keyboard, and well, there ya have it.
No real acronym to it, but I can think of some, but not many 🙂