Statistical proof of 'the pause' – Overestimated global warming over the past 20 years

Commentary from Nature Climate Change, by John C. Fyfe, Nathan P. Gillett, & Francis W. Zwiers

Recent observed global warming is significantly less than that simulated by climate models. This difference might be explained by some combination of errors in external forcing, model response and internal climate variability.

Global mean surface temperature over the past 20 years (1993–2012) rose at a rate of 0.14 ± 0.06 °C per decade (95% confidence interval; ref. 1). This rate of warming is significantly slower than that simulated by the climate models participating in Phase 5 of the Coupled Model Intercomparison Project (CMIP5). To illustrate this, we considered trends in global mean surface temperature computed from 117 simulations of the climate by 37 CMIP5 models (see Supplementary Information).

These models generally simulate natural variability — including that associated with the El Niño–Southern Oscillation and explosive volcanic eruptions — as well as estimate the combined response of climate to changes in greenhouse gas concentrations, aerosol abundance (of sulphate, black carbon and organic carbon, for example), ozone concentrations (tropospheric and stratospheric), land use (for example, deforestation) and solar variability. By averaging simulated temperatures only at locations where corresponding observations exist, we find an average simulated rise in global mean surface temperature of 0.30 ± 0.02 °C per decade (using 95% confidence intervals on the model average). The observed rate of warming given above is less than half of this simulated rate, and only a few simulations provide warming trends within the range of observational uncertainty (Fig. 1a).


Figure 1 | Trends in global mean surface temperature. a, 1993–2012. b, 1998–2012. Histograms of observed trends (red hatching) are from 100 reconstructions of the HadCRUT4 dataset (ref. 1). Histograms of model trends (grey bars) are based on 117 simulations of the models, and black curves are smoothed versions of the model trends. The ranges of observed trends reflect observational uncertainty, whereas the ranges of model trends reflect forcing uncertainty, as well as differences in individual model responses to external forcings and uncertainty arising from internal climate variability.
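As a rough illustration of how a decadal trend and 95% confidence interval of the kind quoted above can be estimated, here is a minimal sketch using ordinary least squares on a synthetic annual anomaly series (placeholder numbers, not the HadCRUT4 data used in the commentary; a full analysis would also need to account for autocorrelation in the residuals and for the 100-member observational ensemble):

```python
# Minimal sketch: least-squares trend with a 95% confidence interval.
# The synthetic series below is a placeholder, NOT HadCRUT4.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
years = np.arange(1993, 2013) + 0.5                      # annual means, 1993-2012
anomaly = 0.014 * (years - years[0]) + rng.normal(0, 0.08, years.size)

fit = stats.linregress(years, anomaly)                   # slope in deg C per year
half_width = stats.t.ppf(0.975, years.size - 2) * fit.stderr

print(f"trend = {10 * fit.slope:.2f} +/- {10 * half_width:.2f} deg C per decade")
```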

The inconsistency between observed and simulated global warming is even more striking for temperature trends computed over the past fifteen years (1998–2012). For this period, the observed trend of 0.05 ± 0.08 °C per decade is more than four times smaller than the average simulated trend of 0.21 ± 0.03 °C per decade (Fig. 1b). It is worth noting that the observed trend over this period — not significantly different from zero — suggests a temporary ‘hiatus’ in global warming. The divergence between observed and CMIP5-simulated global warming begins in the early 1990s, as can be seen when comparing observed and simulated running trends from 1970–2012 (Fig. 2a and 2b for 20-year and 15-year running trends, respectively). The evidence, therefore, indicates that the current generation of climate models (when run as a group, with the CMIP5 prescribed forcings) do not reproduce the observed global warming over the past 20 years, or the slowdown in global warming over the past fifteen years.

This interpretation is supported by statistical tests of the null hypothesis that the observed and model mean trends are equal, assuming that either: (1) the models are exchangeable with each other (that is, the ‘truth plus error’ view); or (2) the models are exchangeable with each other and with the observations (see Supplementary Information).
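A hedged sketch of the flavour of such a test under assumption (1): treat the 117 simulated trends as a sample whose mean estimates the forced response, and ask whether the observed trend is consistent with it. The numbers below are illustrative placeholders, and the published test also folds in the observational uncertainty from the 100 HadCRUT4 realizations:

```python
# Sketch: is the observed trend consistent with the model-mean trend?
# All numbers are illustrative placeholders, not the paper's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
model_trends = rng.normal(0.30, 0.11, 117)   # deg C/decade, fake CMIP5-like spread
observed_trend = 0.14                        # deg C/decade (1993-2012 figure quoted above)

# View (1): models exchangeable with each other ("truth plus error").
t_stat = (model_trends.mean() - observed_trend) / stats.sem(model_trends)
p_value = 2 * stats.t.sf(abs(t_stat), df=model_trends.size - 1)
print(f"model mean vs observation: t = {t_stat:.1f}, two-sided p = {p_value:.2g}")

# View (2): models exchangeable with the observation as well -- simply ask
# where the observed trend falls within the spread of simulated trends.
frac_below = (model_trends < observed_trend).mean()
print(f"fraction of simulations warming more slowly than observed: {frac_below:.3f}")
```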

Brief: http://www.pacificclimate.org/sites/default/files/publications/pcic_science_brief_FGZ.pdf

Paper at NCC: http://www.nature.com/nclimate/journal/v3/n9/full/nclimate1972.html?WT.ec_id=NCLIMATE-201309

Supplementary Information (241 KB) CMIP5 Models
348 Comments
BBould
September 6, 2013 7:30 am

Richardscourtney: Thanks for taking the time to explain the paper I brought up; it’s truly appreciated. This is the reason I started looking into it, a post (not addressed to me) from RealClimate: “Read up on Quine and the issue of auxiliary hypotheses. In practice, all theories are ‘wrong’ (as they are imperfect models of reality), and all tests involve multiple hypotheses. Judging which one (or more) are falsified by a mismatch is non-trivial. I have no problem agreeing that mismatches should be addressed, but wait for the post. – gavin]”
Hopefully this will help explain my interest.

BBould
September 6, 2013 7:33 am

Pamela Gray: Thanks, you made me think of another question. Does anyone study how much energy the ocean loses at night? I know my swimming pool warms and cools much more slowly than the surrounding air, but it’s always much cooler at dawn.

rgbatduke
September 6, 2013 7:45 am

This might be good work but it is not part-and-parcel of Quine’s work. Seems to me that it just takes Quine’s logic and adds to it. To those who pursued probabilities for various hypotheses, he remarked that he had no interest in colored marbles in an urn.
Sacrilege! Polya would be turning in his grave! Taleb, on the other hand, might not (partly because “he’s not dead yet!” :-). As his character “Joe the cab driver” (IIRC) in “The Black Swan” might say of an analysis of the data above, “It’s a mug’s game”. If you flip a two-sided coin 20 times and get heads every time, only an idiot would apply naive probability theory with the assumption of an unbiased coin and claim that the probability of the next flip being heads is 0.5. A Bayesian, however firmly they might have believed the coin was unbiased initially, would systematically adjust the prior estimate of 0.5 until the maximum likelihood of the outcome coincides with the data, and at this point would be deeply suspicious that the coin actually had two heads, that the coin was a magical coin, that the coin was so amazingly weighted and carefully flipped that it had p_head -> 0.999999, or that it’s all a horrible dream and there is no real coin. A mug’s game.
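A minimal sketch of the Bayesian bookkeeping described here: start from a prior firmly centred on a fair coin, observe 20 heads in 20 flips, and watch the posterior estimate of p(head) migrate upward. This is a conjugate Beta-Binomial toy with made-up prior strengths, nothing more:

```python
# Sketch: Beta-Binomial update after 20 heads in 20 flips.
# A Beta(a, b) prior on p(head) updates to Beta(a + heads, b + tails).
from scipy import stats

a, b = 50.0, 50.0                 # a deliberately confident 'fair coin' prior
heads, tails = 20, 0

posterior = stats.beta(a + heads, b + tails)
lo, hi = posterior.interval(0.95)
print(f"posterior mean p(head) = {posterior.mean():.3f}")
print(f"95% credible interval  = ({lo:.3f}, {hi:.3f})")
# Even this confident prior is dragged toward heads; with a flat Beta(1, 1)
# prior the posterior mean would already be about 0.95.
```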
At this point, I’ll simply amplify what I said above in two ways. First, one can, with an enormous amount of effort, attempt an actual statistical analysis of an (essentially meaningless) composite hypothesis, but nothing of the sort has been attempted for CMIP5, in part because doing so would be spectacularly difficult — right up there with the difficulty of the problem that the GCMs are attempting to solve (which is already one of the most difficult computational problems humans have attempted). The difficulty arises because the theories are highly multivariate and have an abundance of assumptions. Every assumption in every model is subject to Bayes’ theorem as a Bayesian prior! That is, when one assigns a specific functional form to the radiation profile of the atmosphere at various CO_2 levels, one is making an assumption: since we cannot precisely compute this and are forced to use one of several approximations (see e.g. Petty’s book), we have to statistically weight the probability that those approximations are correct and downgrade the certainty of our results (our eventual error estimate) accordingly.
To put it more formally, the assertion is this: if all the assumptions made in constructing the computation are correct, then the model predicts thus and such. However, the assumptions are not certain, and the best estimate of the probability that the model prediction is correct is strictly decreased according to their uncertainty. This can be summarized in the aggregate assumption that “the internal, highly nonlinear, dynamically chaotic system of differential equations solved by the computer code is correct and sufficiently insensitive to the range of possible error in the Bayesian priors that the output is meaningful in all dimensions” (because the code doesn’t just predict temperature, it predicts lots of other things about the future climate as well). Analyzing this precisely for a single theory is enormously difficult, which is why most of the models resort to Monte Carlo to attempt to measure it rather than predict it theoretically. But there are further assumptions built into, e.g., the ranges explored by the Monte Carlo itself (more Bayesian priors), into the selection of input variables (hard to “Monte Carlo” the omission of a variable that in fact is important), into the granularity and geometry selected (again, difficult to Monte Carlo as the codes may not be written to be length-scale renormalizable in an unbiased way), and one cannot escape the fundamental assumption “this code will correctly predict the multidimensional future in a consistent way and within a useful precision” no matter what you do.
That is the basis for the ultimate null hypothesis, per model. In the end, the model produces an ensemble of results that supposedly span the range of model uncertainty given the priors stated and unstated that went into its construction. If reality fails to lie well within that range in any significant dimension/projection, the model should be considered suspect (weak failure) or overtly fail a hypothesis test depending on how badly reality fails to live within that range.
Imagine attempting to extend this process collectively to all the models in CMIP5! How can one rigorously assess whether approximation A or approximation B for, e.g., the radiative properties of CO_2 in the atmosphere is most probably correct, when the answer could be that both are adequate as the basis of a correct theory if everything else is done correctly, or that neither will work because a correct (predictive) theory requires an exact treatment of CO_2’s radiative properties? And then, of course, there are the rest of the greenhouse gases, the non-greenhouse atmosphere, water vapor, clouds, the ocean, aerosols, soot and other particulates, the extent and effect of the biosphere — it is difficult even to count the underlying assumptions built into each model, and not all of them are in all of the models.
So how can one frame the null hypothesis for CMIP5? “Somewhere in the collection of contributing GCMs is a model that is a reliable predictor of the actual climate”, so that we can then assess the probability of getting the current result if that is true? No, that won’t work. The implicit null hypothesis in the figures above, and used by the IPCC in the assessment reports, is that “the mean of the collection of models in CMIP5 is a reliable predictor of the actual climate, and the standard deviation of the distribution of results is a measure of the probability of the actual climate arising from the common initial condition given this correct computation of its time evolution”. Which is nonsense: unsupported by statistical theory and indeed (as I argue above) unsupportable by statistical theory. At the end of the day, all that the figure above really demonstrates is that the GCMs are very likely not independent and unbiased in their underlying assumptions, because they do produce a creditable Gaussian (with the wrong mean; but seriously, this is enormously unlikely in a single projective variable obtained from a collection of supposedly independent models that otherwise differ significantly in their predictions of, e.g., rainfall).
IMO the one conclusion that is immediately justified by the distribution of CMIP5 results is that the GCMs are enormously incestuous, sharing whole blocks of common assumptions, and that at least one of those common assumptions is badly incorrect and completely unexplored by the Monte Carlo perturbations of initial conditions in the individual models. If one performed a similar study of the projective distribution of results in other dimensions, one might even gain some insight into just which shared assumptions are most suspect, but that would require a systematic deconstruction of all of the models and code and some sort of gross partitioning of the shared and different features — an awesomely complex task.
The second amplification is a simple observation I’ll make on the process of highly multivariate predictive modeling itself (wherein I’m moderately expert). There are two basic kinds of multivariate predictive models. An a priori model assumes that the relevant theory is correctly known and attempts to actually compute the result using that theory, using tools like Monte Carlo to assess the uncertainty in the computed outcome where one can (as noted above, one cannot assess the uncertainty linked to some of the prior assumptions implicit in the implementation by Monte Carlo or any other objective way, as there is no “assumption space” to sample, and nonlinear chaotic dynamics can amplify even small errors into huge end-stage differences; see e.g. the “stiffness” of a system of coupled ODEs and chaos theory, although the amplification can easily be significant even for well-behaved models).
In order to set up the Monte Carlo, one has to assign values and uncertainty ranges to the many variable parameters that the model relies upon. This is typically done by training the model — using it to compute a known sequence of outcomes from a known initialization and tweaking things until there is a good correspondence between the actual data and the model-produced data. The tweaking process typically at least “can” provide a fair amount of information about the sensitivity of the model results to these assumptions and hence give one a reasonable knowledge of the expected range of errors predicting the future. One then applies the model to a trial set of data that (if one is wise) one has reserved from the training data to see if the model continues to work. This second stage “validates” the model within the prior assumption “the training and trial set are independent and span the set of important features in the underlying a priori assumed known dynamics”. Finally, of course, the validation process in science never ends.
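A small sketch of the train/trial discipline just described: tune parameters on one slice of data, then score the tuned model on data held back from training, and trust it only as far as that held-out check. This is a generic curve-fit on synthetic data, not a GCM:

```python
# Sketch: fit on a training slice, then validate on held-back data.
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 200)
y = np.sin(x) + rng.normal(0.0, 0.1, x.size)      # synthetic "observations"

train, trial = slice(0, 150), slice(150, 200)     # hold back the last quarter
coeffs = np.polyfit(x[train], y[train], deg=5)    # "tune" parameters on training data

def rmse(pred, obs):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

print("training RMSE:", rmse(np.polyval(coeffs, x[train]), y[train]))
print("held-out RMSE:", rmse(np.polyval(coeffs, x[trial]), y[trial]))
# The fit can look excellent in-sample and degrade badly outside the
# training interval -- the failure mode discussed in this comment.
```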
It doesn’t matter a whit if your model perfectly captures the training data and nails the trial data square on the money, if the first time you compare it to new trial data from the real world it immediately goes to hell, well outside of the probable range of outcomes you expected. If you are prospecting for oil with a predictive model, it doesn’t matter if your code can predict previously drilled wells 95% of the time if you only get one oil strike in 100 drilling attempts the first time you use it to direct oil exploration efforts. You are fired and go broke no matter how fervently you argue that your code is good and you are just unlucky. Ditto predicting the stock market, predicting pretty much anything of value. Science is even more merciless than commerce in this regard, everywhere but in climate science! Classical physics was “validated” by experiment after experiment in pretty good agreement for a century after its discovery, but then an entire class of experiments could not be explained by classical a priori models. By “could not be explained”, I mean specifically that even if one built a hundred classical models, each slightly different, to e.g. try to predict electronic spectra or the outcome of an electron diffraction experiment, the mean of all of those distinct a priori models would never converge to or in the end have meaningful statistical overlap with the actual accumulating experimental data. They in fact did not, and a lot of effort was put into trying!
The problem, of course, was that a common shared assumption in the “independent” models was incorrect. Further, it was an assumption that one could never “sample” with mere Monte Carlo or any sort of presumed spanning of a space of relevant assumptions, because it was one of the a priori assumed known aspects of the computation that was incorrect, even though (at the time) it was supported by an enormous body of evidence involving objects as large as molecules or small clumps of molecules on up. We had to throw classical physics itself under the bus in order to make progress.
You’d think that we would learn from this sort of paradigm-shifting historical result not to repeat this sort of error, and in the general physics community I think the lesson has mostly been learned, as this sort of process occurs all the time in the delicate interplay between experimental physics and theoretical physics. In some sense we expect new experiments to overturn both great and small aspects of our theoretical understanding, which is why people continue to study e.g. neutrinos and look for Higgs bosons, because even the Standard Model in physics is not now and will never be settled science, at best it will continue to be consistent with observations made so far. A single (confirmed) transluminal neutrino or new heavy particle or darkon can ruin your whole theoretical day, and then it is back to the drawing board to try, try again.
In the meantime, this example shows the incredible stupidity of claiming that the centroid of the projection of a single variable from a collection of distinct a priori models with numerous shared assumptions, many of which cannot be discretely tested or simulated, has any sort of statistically relevant connection to reality. Each model, taken one at a time, is subject to the usual process of falsification that all good science is based on. Collectively they do not have more weight; they have less. By stupidly including obviously failed models in the average, you without question pull the average away from the true expected behavior.
rgb

September 6, 2013 7:48 am

Richard Barraclough says:
September 6, 2013 at 1:15 am
Good to see a little etymological sparring in amongst the science.
Now, if only we could all distinguish between “its” and “it’s”……..

=======================================================================
I’m getting better at it. Someone here (maybe you?) gave a tip a while back to keep them straight.
If it’s the possessive, treat it like “his” or “hers”, no apostrophe.

richardscourtney
September 6, 2013 7:50 am

BBould:
Thank you for the acknowledgement in your post addressed to me at September 6, 2013 at 7:30 am.
I am grateful that you brought the paper to my attention because I was not aware of it despite its having been published so long ago (i.e. in 2001). And that lack of awareness is not surprising considering the serious flaws the paper contains and the limited – to the degree of being almost useless – conclusion it reaches. However, if RC and the like are intending to use that paper as an excuse for model failure then they really, really must be desperate!
In the light of why you say you raised the paper, I now consider the trouble I had obtaining the paper was well worth it. If anybody attempts to excuse model failure by resurrecting that paper from obscurity, then I can now refute the laughable attempt.
Thank you.
Richard

richardscourtney
September 6, 2013 8:03 am

rgbatduke:
Thank you for your brilliant post at September 6, 2013 at 7:45 am
http://wattsupwiththat.com/2013/09/05/statistical-proof-of-the-pause-overestimated-global-warming-over-the-past-20-years/#comment-1409484
I commend everyone interested in the subject of this thread to read, study and inwardly digest it.
And I ask you to please amend it into a form for submission to Anth0ny for him to have it as a WUWT article.
Richard

September 6, 2013 8:14 am

rgbatduke (Sept. 6, 2013):
To your list of shortcomings in the methodology of global warming research, you could have added that the general circulation models are insusceptible to validation because the events in the underlying statistical populations do not exist.

Pamela Gray
September 6, 2013 8:25 am

In model research, the question is: does the model adequately simulate the workings of the underlying statistical population? The null hypothesis would therefore be: there is no statistical difference between the model results and the observations. Logically, it is thus susceptible to validation.

Reply to  Pamela Gray
September 6, 2013 11:27 am

Pamela Gray:
Your understanding of the meaning of “validation” is identical to mine. The populations underlying the general circulation models do not exist; thus, these models are insusceptible to being validated.
In the paper entitled “Spinning the Climate,” the long-time IPCC expert reviewer Vincent Gray reports that he once complained to IPCC management that the models were insusceptible to being validated, yet the IPCC assessment reports were claiming they were validated. In tacit admission of Vincent’s claim, IPCC management established the policy of changing the word “validated” to the similar-sounding word “evaluated.” “Evaluation” is a process that can be conducted in lieu of the non-existent statistical population. Confused by the similarity of the sounds made by the two words, many people continued to assume the models were validated.
To dupe people into thinking that similar sounding words with differing meanings are synonyms is an oft used technique on the part of the IPCC and affiliated climatologists. When words with differing meanings are treated as synonyms, each word in the word pair is polysemic (has more than one meaning). When either word in such a word-pair is used in making an argument and this word changes meaning in the midst of this argument, the argument is an example of an “equivocation.” By logical rule, one cannot draw a proper conclusion from an equivocation. To draw an IMPROPER conclusion is the equivocation fallacy. IPCC-affiliated climatologists use the equivocation fallacy extensively in leading dupes to false or unproved conclusions ( http://wmbriggs.com/blog/?p=7923 ) .

Pamela Gray
September 6, 2013 8:30 am

Solid surfaces lose heat more rapidly. Water loses heat more slowly. However, because Earth is more of a water planet than a land planet, it is an interesting question. My hunch is that heat belched up from the oceans becomes our land temperatures, which at night send that heat up and outa here! Especially under clear-sky night conditions (strong radiative cooling).

September 6, 2013 9:23 am

richardscourtney says: September 6, 2013 at 6:35 am
Hello Richard,
To be clear, we are talking about one’s predictive track record based on modeling.
Specifically, the GCMs cited by the IPCC greatly over-estimate the sensitivity of Earth’s climate to atmospheric CO2, and under-estimate the role of natural climate variability. This was obvious a decade ago from the inability of these models to hindcast the global cooling period that occurred from ~1945 to 1975, until they fabricated false aerosol data to force their models to conform. As a result of these fatal flaws, these “IPCC GCMs” have grossly over-predicted Earth’s temperature and have demonstrated NO PREDICTIVE SKILL – this is their dismal “predictive track record”.
The IPCC wholeheartedly endorsed this global warming alarmism and so did much of the climate science establishment. Anyone who disagreed was ridiculed as a “denier”, and due to the extremist position of the global warming camp, some leading academics were dismissed from their universities, some received death threats, and some suffered actual violence. The imbecilic, dishonest and thuggish behaviour of the global warming camp was further revealed in the Climategate emails.
Our conceptual model is based on very different input assumptions from the IPCC GCMs. We assumed, based on substantial evidence that was available a decade ago, that climate sensitivity to increased atmospheric CO2 is insignificant, and that natural variability was the primary characteristic of Earth’s climate. We further assumed, based on credible evidence, that solar variability was a significant driver of natural climate variability. Therefore, we wrote in 2002 that there was no global warming crisis, and the lack of warming for the past 10-15 years demonstrates this conclusion to be plausible.
We further wrote in 2002 that global cooling would start by 2020–2030, and it remains to be seen whether this will prove correct or not – but warming has ceased for a significant time, and I suggest that global temperatures are at a plateau and are about to decline. We did not predict the severity of this global cooling trend, but if the solar-driver hypothesis holds, then cooling could be severe. This we do not know, but we do know from history that global cooling is a much greater threat to humanity than (alleged) global warming.
Regards, Allan

richardscourtney
September 6, 2013 9:39 am

Allan MacRae:
re your post at September 6, 2013 at 9:23 am.
Allan, you begin your post by saying to me, “To be clear …”.
To be clear, yes, I agree.
Richard

September 6, 2013 10:48 am

Terry Oldberg says:
September 6, 2013 at 8:14 am
rgbatduke (Sept. 6, 2013):
To your list of shortcomings in the methodology of global warming research, you could have added that the general circulation models are insusceptible to validation because the events in the underlying statistical populations do not exist.

====================================================================
Mr. layman here. To me it sounds like you just said, “The models can’t be wrong because the models say they are right.”
If that is not what you meant would you please explain in layman’s terms?
(Feel free to insult me if you wish as long as you explain.)

richardscourtney
September 6, 2013 11:01 am

Gunga Din:
re your post at September 6, 2013 at 10:48 am.
Can you see that disc of light behind you?
You have entered Alice’s rabbit hole and that disc is where you entered.
It is light from the outside. Enjoy it while you can. You may never see it again.
Richard

Aphan
Reply to  richardscourtney
September 6, 2013 12:41 pm

richardscourtney:
“Can you see that disc of light behind you?
You have entered Alice’s rabbit hole and that disc is where you entered.
It is light from the outside. Enjoy it while you can. You may never see it again.”
You’re killing me here. Smart AND clever AND humble? I feel a science crush coming on….

September 6, 2013 11:16 am

http://wattsupwiththat.com/2013/09/05/statistical-proof-of-the-pause-overestimated-global-warming-over-the-past-20-years/#comment-1409624
==============================================================
As long as it’s brighter than a “nitlamp” I think I can find my way out. 😎

September 6, 2013 11:58 am

Gunga Din (Sept 6 at 10:48):
Thank you for giving me the opportunity to clarify. I did not mean to say “The models can’t be wrong because the models say they are right.” I did mean to say that the models are insusceptible to being validated. This has the significance that the method by which models were created was not the scientific method of investigation. A consequence is that many IPCC conclusions, including the conclusion that global warming is man-made, must be discarded. The previous sentence should not be taken to mean that we know the warming is not man-made.
The widespread view that the models were created by the scientific method is a product of successful use of the deceptive argument known as the “equivocation fallacy” on the part of the IPCC and affiliated climatologists. An equivocation fallacy is a conclusion that appears to be true but that is false or unproved. For details, please see the peer-reviewed article at http://wmbriggs.com/blog/?p=7923 .

September 6, 2013 12:33 pm

Terry Oldberg
http://wattsupwiththat.com/2013/09/05/statistical-proof-of-the-pause-overestimated-global-warming-over-the-past-20-years/#comment-1409676
==================================================================
Thank you.
“Equivocation fallacy” sounds very similar to “bait and switch”.
(Guess I didn’t need the nitlamp after all.)

September 6, 2013 1:01 pm

kadaka (KD Knoebel) says:
September 5, 2013 at 2:32 am
If you have a one meter squared body of water, how much would a 1600W (1.6 kilowatt) hair dryer heat the body of water over 60 seconds, 60 minutes, from the surface?

September 6, 2013 1:32 pm

Rich – The equation is physics-based. The physics is the first law of thermodynamics, conservation of energy. This is discussed more completely starting on page 12, Anomaly Calculation (an engineering analysis), in an early paper made public 4/10/10 at http://climaterealists.com/attachments/database/2010/corroborationofnaturalclimatechange.pdf . This shows an early version of the equation which has since been refined.
The equation contains only one external forcing, the time-integral of sunspot numbers which serves as an excellent proxy for average global temperature. The mechanism has been attributed to influence of change to low altitude clouds, average cloud altitude, cloud area, and even location of cloud ‘bands’ as modulated by the jet stream. I expect it will eventually be found to be some combination of these. The high sensitivity of average global temperature to tiny changes in clouds is calculated at http://lowaltitudeclouds.blogspot.com . It is not necessary to know the mechanism to calculate the proxy factor.
Determining the value of a single proxy factor is not ‘curve fitting’.
Graphs that show (qualitatively, because proxy factors are not applied) the correlation between the time-integral of sunspot numbers and average global temperature can be seen at http://hockeyschtick.blogspot.com/2010/01/blog-post_23.html or at http://climaterealists.com/attachments/ftp/Verification%20Dan%20P.pdf (this shows an earlier version of the equation; HadCRUT4 data was not used).
The only hypothesis that was made is that average global temperature is proportional to the time-integral of sunspot numbers. The rest is arithmetic. The coefficient of determination, R2 = 0.9, demonstrates that the hypothesis was correct.
The past predictive skill of the equation is easily demonstrated. Simply determine the coefficients at any time in the past using data up to that time and then use the equation, thus calibrated, to calculate the present temperature. For example, the predicted anomaly trend value for 2012 (no CO2 change effect) using calibration through 2005 (actual sunspot numbers through 2012) is 0.3888 K. When calibrated using measurements through 2012 the calculated value is 0.3967 K; a difference of only 0.008 K.
The future predictive skill, after 2012 to 2020, depends on the accuracy of predicting the sunspot number trend for the remainder of solar cycle 24 and the assumption that the net effective ocean oscillation will continue approximately as it has since before 1900.
This is an equation that calculates average global temperature. It is not a model, especially not a climate model…or a weather model. An early version of it, made public in 2010, predicted a downtrend from about 2005.
Part of the problem in trying to predict measured temperatures is that the measurements have a random uncertainty with standard deviation of approximately ±0.1 K so only trends of measurements are meaningful for comparison with calculations.

rgbatduke
September 6, 2013 1:50 pm

To your list of shortcomings in the methodology of global warming research, you could have added that the general circulation models are insusceptible to validation because the events in the underlying statistical populations do not exist.
I could have waxed poetic considerably longer, for example pointing out the recently published comparison of four GCMs to a toy problem that is precisely specified and known and that should have a unique answer. All four got different answers. The probability that any of those answers/models is correct is correspondingly strictly less than 25% and falling fast even if we do NOT know what the correct answer is (the best one could say is that one of the four models got it right and the others got it wrong, but of course all four could have gotten it wrong as well, hence strictly less than). This example alone is almost sufficient to demonstrate a lack of “convergence” in any sort of “GC model space”, although 4 is too small a number to be convincing.
I could also have ranted a bit about the stupidity of training and validating hypothesized global warming models using data obtained from a single segment of climate measurements when the climate was monotonically warming, which may be what you are trying to say here (sometimes I have difficulty understanding you but I think that sometimes I actually agree with what you say:-). When training e.g. Neural Network binary classification models, it is often recommended that one use a training set with balanced number of hits and misses, yesses and noes, because if you have an actual population that is (say) 90% noes and train with it, the network quickly learns that it can achieve 90% accuracy (which is none too shabby, believe me) by always answering no!
Of course this makes the model useless for discriminating the actual yesses and noes in the population outside of the training/trial set, but hey, the model is damn accurate! And of course the solution is to build a good discriminator first and correct it with Bayes theorem afterwards, or use the net to create an ordinal list of probable yes-hood and empirically pursue it to an optimum payoff.
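A tiny sketch of the imbalance pitfall just described: on a population that is 90% noes, a classifier that always answers “no” scores roughly 90% accuracy while recovering none of the yesses. Synthetic labels, purely illustrative:

```python
# Sketch: accuracy is misleading on an imbalanced data set.
import numpy as np

rng = np.random.default_rng(3)
labels = rng.random(10_000) < 0.10          # True ("yes") for roughly 10% of cases

always_no = np.zeros_like(labels)           # the do-nothing classifier
accuracy = (always_no == labels).mean()
recall = always_no[labels].mean()           # fraction of true yesses recovered

print(f"accuracy of 'always no': {accuracy:.1%}")   # about 90%
print(f"recall of true yesses:   {recall:.1%}")     # 0%
```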
GCMs appear to have nearly universally made this error. Hindcasting the prior ups and downs in the climate record is not in them, to the extent that we even have accurate data to use for a hindcast validation. By making CO_2 the one significant control knob at the expense of natural variations that are clearly visible in the climate record and that are not predicted by the GCMs as far as I know, certainly not over multicentury time scales, all the models have to do is predict monotonic warming and they will capture all of the training/trial data, and there are LOTS of ways to write, tune, and initialize a model to have monotonic behavior without even trying. The other symptoms of failure — getting storms, floods and drought, SLR, ice melt, and many other things wrong — were ignored, or perhaps they expected that future tweaks would fix this while retaining the monotonic behavior that the creators of the models all expected to find and doubtless built into the models in many ways. Even variables that might have been important — for example, solar state — were nearly constant across the training/trial interval and hence held to be irrelevant and rejected from the models. Now that many of those omitted variables — ENSO, the PDO, solar state — are radically changing, and now that the physical science basis for the inclusion and functional behavior of other variables like clouds and soot is being challenged, it can hardly be surprising that the monotonic models, all trained and validated on the same monotonic interval and insensitive to all of these possibly important drivers, continue to show monotonic behavior while the real climate does not, as those drivers have changed state.
If the training set for a tunable model does not span the space of possible behaviors of the system being modeled, of course you’re going to be at serious risk of ending up with egg on your face, and with sufficiently nonlinear systems you will never have sufficient data to use as a training set. Nassim Nicholas Taleb’s book The Black Swan is a veritable polemic against the stupidity of believing otherwise and betting your life or fortune on it. Here we are just betting the lives of millions and the fortunes of the entire world on the plausibility that the GCMs built in a strictly bull market can survive the advent of the bear, or are bear-proof, or prove that bears have irrevocably evolved into bulls and will never be seen again. This bear is extinct. It is an ex-bear.
Until, of course, it sits up and bites you in the proverbial ass.
So yeah, Terry, I actually agree. One of many troubling aspects of GCMs is that they have assumptions built into them supported by NO body of data or observation or even any particularly believable theory. They have assumptions that contradict or are neutral to the existing observational data, such as “the PDO can safely be ignored”, or “the 1997-1998 warming that is almost all of the warming observed over the training interval was all due to an improbable ENSO event, not CO_2 per se”, or “solar variability is nearly irrelevant to the climate”. And every one of them is an implicit Bayesian prior, and to the extent that the assumptions are not certain, they weaken the probable reliability of the predictions generated by the models that incorporate them, even by omission.
rgb

rgbatduke
September 6, 2013 2:00 pm

If you have a one meter squared body of water, how much would a 1600W (1.6 kilowatt) hair dryer heat the body of water over 60 seconds, 60 minutes, from the surface?
Well, let’s see, that’s one metric ton (1000 kg) of water. Its specific heat is about 4 joules per gram per degree centigrade. 1000 kg is a million grams. To raise it by one degree requires 4 million joules. If you dumped ALL 1600 W into the water and prevented the water from otherwise cooling or heating (adiabatically isolated it), it would take about 42 minutes to raise it by a degree. If you tried heating it with warm blowing air from a hair drier on the TOP SURFACE, however, you would probably NEVER warm the body of water by a degree. I say “probably” because the wind from the hair drier (plus its heat) would encourage surface evaporation. Depending on the strength of the wind and how it is applied, it might COOL the water due to latent heat of evaporation, or the heat provided by the hair drier might be sufficient to replace it by a bit. However, even in the latter case, since water will cheerfully stratify, all you’d end up doing is warming the top layer of water until latent heat DID balance the hair drier’s contribution to the water (probably at most a few degrees), and it would then take a VERY long time for the heat to propagate to the bottom of the cubic meter, assuming that the bottom is adiabatically insulated. Days. Maybe longer. And it would, as noted, probably not heat without bound — it would just shift from one equilibrium temperature at the surface to another slightly warmer one.
rgb
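A quick numerical check of the adiabatic part of the estimate above, using c_p ≈ 4186 J/(kg·K) instead of the rounded 4 J/(g·°C), hence about 44 rather than 42 minutes; it deliberately ignores the evaporation and stratification that the rest of the answer is about:

```python
# Sketch: time for 1600 W to warm 1 m^3 of water by 1 K with no losses.
mass = 1000.0      # kg of water in one cubic metre
c_p = 4186.0       # J/(kg K), specific heat of liquid water
power = 1600.0     # W, the hair dryer
delta_T = 1.0      # K

energy = mass * c_p * delta_T                         # about 4.2 MJ
print(f"time = {energy / power / 60:.0f} minutes")    # roughly 44 minutes
```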

rgbatduke
September 6, 2013 2:09 pm

Your understanding of the meaning of “validation” is identical to mine. The populations underlying the general circulation models do not exist; thus, these models are insusceptible to being validated.
No scientific model can be verified in the strong sense. All scientific models can be falsified in the strong sense. So what is the point? We could have validated any given GCM in the weak sense by observing that it is “still” predicting global climate reasonably accurately (outside its training/trial where this is not surprising). No interval of observing that this is true is sufficient to verify the model in the strong sense (so that we believe that it can never be falsified, the data proves the model). But plenty of models, including GCMs, could be validated in the weak sense up to the present.
It’s just that they (mostly) aren’t. They are either strongly falsified or left in limbo, not definitively (probably) correct or incorrect, so far.
I do not understand your point about the populations underlying the GCMs, after all. You’ll have to explain in non-rabbit-hole English, with examples, if you want me to understand. Sorry.
rgb

richardscourtney
September 6, 2013 2:35 pm

Dan Pangburn:
re your post addressed to me at September 6, 2013 at 1:32 pm
http://wattsupwiththat.com/2013/09/05/statistical-proof-of-the-pause-overestimated-global-warming-over-the-past-20-years/#comment-1409791
in response to my answer to you at September 6, 2013 at 5:19 am
http://wattsupwiththat.com/2013/09/05/statistical-proof-of-the-pause-overestimated-global-warming-over-the-past-20-years/#comment-1409387
Sorry, but the model IS a curve fitting exercise. I remind that the link says

The word equation is: anomaly = ocean oscillation effect + solar effect – thermal radiation effect + CO2 effect + offset.

The link is
http://climatechange90.blogspot.com/2013/05/natural-climate-change-has-been.html
and the mathematical representation of that equation is there (if I knew how to copy it to here then I would).
If the model were not a curve fitting exercise then there would be accepted definitions of
ocean oscillation effect,
solar effect,
thermal radiation effect,
CO2 effect,
and the offset to be applied.
There are no such agreed definitions. The parameters are each adjusted to fit the curve.
As you say, they do not disagree with known physics. But other curve-fitting exercises could agree with known physics, too. And they would also ‘wiggle the elephant’s trunk’.
This is not to say the model is wrong. But there is no reason to think it is right. I explain this in my post which you have answered.
Sorry, but that is the way it is.
Richard

September 6, 2013 2:48 pm

rgbatduke says:
September 6, 2013 at 2:09 pm
On top of all that, the GCMs, largely right or largely wrong, cannot even be trained over any interval because the world’s temperature record keepers have an algorithm that keeps changing the record. Probably the GCMs in existence were “trained” on HadCRUT 2 or 3, and now we have HadCRUT 4, for example. Man, a large team has to come in and re-correct the temperature records going back to the raw data. If it’s dangerous global warming we are trying to quantify, I contend that there is little need for adjustments, even if there is some reasonable case for them, if we are to be facing runaway warming and seas rising metres. Correcting here or there by 0.2–0.4 (I call it the thumbtack method – stick the tack in at about 1945 and rotate counter-clockwise half a degree) won’t even matter if we are going to have unbearable heat rise. We haven’t even got our feet wet, and GISS was calling for the West Side Highway to be under water before now, and it’s about 10 feet above the water in Manhattan at the present time. The GCMs are easy – we can just throw them out.

September 6, 2013 5:11 pm

Rich – The constants and variables in the math equation are defined just after the math equation. I’ll connect the terms in the word equation with the math equation and try to expand on them a bit more.
ocean oscillation effect = (A,y) “There is some average surface temperature oscillation that accounts for all of the oceans considered together of which the named oscillations are participants.” Page 1, 3rd paragraph from bottom.
solar effect = B/17 * summation of sunspot numbers from 1895 to the calculation year. This accounts for the energy gained by the planet above or below break-even and expresses it as temperature change.
thermal radiation effect = B/17 * summation of 43.97*(T(i)/286.8)^4 from 1895 to the calculation year. This accounts for the energy radiated by the planet above or below break-even and expresses it as temperature change.
CO2 effect = C/17 * summation of ln(CO2 level in the calculation year / CO2 level in 1895), from 1895 to the calculation year
Offset to be applied = D (see the paper)
“…agreed definitions” These are the definitions of the terms in the equation. What matters is the results of the equation. The results match the down-up-down-up-too soon to tell of reported average global temperatures (which have sd≈±0.1 K). The whole point is that these are not for anyone else to ‘agree’ on. I don’t know of anyone else who has thought to look at the time-integral of sunspots.
I’m not sure what you mean by ‘parameters’. The coefficients are ‘tuned’ (tediously) to maximize R2 but, except for the proxy factor, they can be estimated fairly closely by a look at anomaly measurements.
“But other curve fitting exercises could, too.” I don’t think so. Here is the challenge. Fit the measured anomalies back to 1895 with R2=0.9. Approximate the accepted average global temperature trend back to 1610. Use only one external forcing.
I think the equation is right because it does all those things and also gave a good prediction of 2012 measurements based on data through 2005. I have no interest in an ‘is not’ ‘is too’ argument. The equation and graph with prediction are made public and waiting for future measurements.
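For concreteness, here is a hedged sketch of how the word equation quoted earlier in the thread (anomaly = ocean oscillation effect + solar effect – thermal radiation effect + CO2 effect + offset) could be assembled and scored with R². Every input series and every coefficient (A, B, C, D) below is a made-up placeholder, not Dan Pangburn’s data or calibrated values; the ocean-oscillation term in particular is a stand-in sinusoid:

```python
# Sketch of the word equation with placeholder series and made-up
# coefficients A, B, C, D -- NOT the calibrated values from the paper.
import numpy as np

years = np.arange(1895, 2013)
sunspots = 50 + 40 * np.sin(2 * np.pi * (years - 1895) / 11)   # fake sunspot numbers
co2 = 295.0 * 1.0015 ** (years - 1895)                         # fake CO2, ppmv
temperature = 286.8 + 0.005 * (years - 1895)                   # fake absolute T, K

A, B, C, D = 0.2, 0.003, 0.3, -0.4                             # hypothetical coefficients

ocean_oscillation = A * np.sin(2 * np.pi * (years - 1895) / 64)        # stand-in for (A, y)
solar = (B / 17) * np.cumsum(sunspots)                                 # time-integral of SSN
thermal_radiation = (B / 17) * np.cumsum(43.97 * (temperature / 286.8) ** 4)
co2_effect = (C / 17) * np.cumsum(np.log(co2 / co2[0]))

anomaly = ocean_oscillation + solar - thermal_radiation + co2_effect + D

def r_squared(pred, obs):
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Against a fake 'measured' series (the sketch plus noise), just to show
# how the R^2 figure quoted in the thread would be computed:
measured = anomaly + np.random.default_rng(4).normal(0, 0.05, years.size)
print(f"R^2 = {r_squared(anomaly, measured):.2f}")
```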

September 6, 2013 7:51 pm

http://wattsupwiththat.com/2013/09/05/statistical-proof-of-the-pause-overestimated-global-warming-over-the-past-20-years/#comment-1409063
Dan Pangburn says: September 5, 2013 at 4:30 pm
“A physics-based equation, using only one external forcing, calculates average global temperature anomalies since before 1900 with R2 = 0.9. The equation is at http://climatechange90.blogspot.com/2013/05/natural-climate-change-has-been.html
Everything not explicitly considered must find room in that unexplained 10%.”
_____________
Thank you Dan,
This was interesting. You say “About 41.8% of reported average global temperature change results from natural ocean surface temperature oscillation and 58.2% results from change in the rate that the planet radiates energy to outer space, as calculated using a proxy, which is the time-integral of sunspot numbers.” So Solar is your “one external forcing”.
You used Hadcrut4 Surface Temperature record in this analysis.
I suggest that this Surface Temperature record probably exhibits a significant warming bias – my rough estimate for Hadcrut3 was about 0.07C per decade, at least back to ~1979 and possibly much further.
How would your analysis change if you were to decrease your surface temperature record by 0.07C/decade from about 1945 to present, and particularly how would this change the inferred impact of increased atmospheric CO2 and other parameters in your equation?
If you want to email me, you can contact me through my website at http://www.OilSandsExpert.com
Thank you, Allan
