The Thousand-Year Model

Guest Post By Willis Eschenbach

Back in 2007, Jeffrey Kiehl first pointed out a curious puzzle about the climate models, viz: (emphasis mine)

[3] A review of the published literature on climate simulations of the 20th century indicates that a large number of fully coupled three dimensional climate models are able to simulate the global surface air temperature anomaly with a good degree of accuracy [Houghton et al., 2001]. For example all models simulate a global warming of 0.5 to 0.7°C over this time period to within 25% accuracy. This is viewed as a reassuring confirmation that models to first order capture the behavior of the physical climate system and lends credence to applying the models to projecting future climates.

[4] One curious aspect of this result is that it is also well known [Houghton et al., 2001] that the same models that agree in simulating the anomaly in surface air temperature differ significantly in their predicted climate sensitivity. The cited range in climate sensitivity from a wide collection of models is usually 1.5 to 4.5°C for a doubling of CO2, where most global climate models used for climate change studies vary by at least a factor of two in equilibrium sensitivity.

[5] The question is: if climate models differ by a factor of 2 to 3 in their climate sensitivity, how can they all simulate the global temperature record with a reasonable degree of accuracy? Kerr [2007] and S. E. Schwartz et al. (Quantifying climate change–too rosy a picture?, available at www., 2007) recently pointed out the importance of understanding the answer to this question.

Kiehl posed what I thought at the time was a very interesting and important question in a paper called Twentieth century climate model response and climate sensitivity, and he got partway to the answer. He saw that as the forcings went up, the sensitivity went down, as shown in his Figure 1. He thought that the critical variable was the total amount of forcing used by the model, and that the sensitivity was inversely and non-linearly proportional to that total.


Figure 1, with original caption, from Kiehl 2007.

However, my findings show that the models’ climate sensitivity can be derived directly from the model forcings and the model results. Sensitivity (transient or equilibrium) is directly proportional to the ratio of the trend of the temperature to the trend of the forcing. This makes intuitive sense, because the smaller the trend of the forcing, the greater the trend ratio, and the smaller the forcing, the more you’ll have to amplify it to match the 20th-century trend, so you need a greater sensitivity. I have added two new models to my previous results: the MIROC model from Japan, and a most curious and informative dataset, the Crowley thousand-year hindcast (paywalled here, data here). The Crowley study used what they describe as a linear upwelling/diffusion energy balance model. As Figure 2 shows, just as with my previous findings, the climate sensitivity of the MIROC model and the much simpler Crowley model is given by the same simple function of the ratio of the trends.

Figure 2. Equilibrium climate sensitivity versus trend ratio (trend of results / trend of forcings). Equilibrium climate sensitivity is calculated in all cases as 40% higher than the transient climate response, per the average of the results of Otto, which cover the last four decades of observations.
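To make the Figure 2 relationship concrete: taking the conventional 3.7 W/m² for a doubling of CO2, the sensitivity implied by a given model run can be computed from the two trends alone. Here is a minimal sketch in Python, with made-up forcing and temperature series (the series are invented for illustration; the 3.7 W/m² figure and the 40% TCR-to-ECS conversion from Otto are as described above):

```python
import numpy as np

def linear_trend(x, y):
    """Least-squares slope of y against x."""
    slope, _intercept = np.polyfit(x, y, 1)
    return slope

# Hypothetical 20th-century series (illustrative numbers only).
years   = np.arange(1900, 2001)
forcing = 0.02  * (years - 1900)   # W/m^2, rising to 2 W/m^2 by 2000
temps   = 0.007 * (years - 1900)   # deg C, rising 0.7 C by 2000

F_2XCO2 = 3.7  # W/m^2 per doubling of CO2 (conventional value)

# Sensitivity is proportional to the ratio of the trends.
trend_ratio = linear_trend(years, temps) / linear_trend(years, forcing)
tcr = F_2XCO2 * trend_ratio   # transient climate response, deg C
ecs = 1.4 * tcr               # ECS taken as ~40% above TCR, per Otto
```

With these made-up trends the ratio is 0.35, giving a TCR near 1.3°C; a model whose forcing trend is half as large would need twice the sensitivity to hit the same 0.7°C of warming.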

This conclusion about the relationship of the forcing trend to the climate sensitivity is one outcome of the discovery that there is a one-line equation with which we can replicate the global average temperature results from any climate model. Strange but true, functionally it turns out that all the climate models do to forecast the global average surface temperature is to lag and resize the forcing. That’s it. Their output is a simple lagged linear transformation of their input. This is true of individual climate models as well as of the average of “ensembles” of models. Their output can be replicated, with a correlation of 0.99 or so, by a simple, one-line equation.
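The post doesn’t reproduce the equation itself, but a standard one-box exponential lag of the general kind described, lag plus resize, can be sketched as follows (the exact functional form and the lambda and tau values here are my placeholder assumptions for illustration, not Willis’s fitted values):

```python
import numpy as np

def lag_and_resize(forcing, lam, tau):
    """Emulate a model's temperature output as the forcing, scaled by
    lam (deg C per W/m^2) and exponentially lagged with time constant
    tau (years)."""
    a = np.exp(-1.0 / tau)               # per-step decay of the lag
    temps = np.zeros(len(forcing))
    for i in range(1, len(forcing)):
        # Relax toward lam * F with e-folding time tau.
        temps[i] = a * temps[i - 1] + (1.0 - a) * lam * forcing[i]
    return temps

# Illustrative forcing: a slow ramp with a one-year volcanic pulse.
forcing = np.linspace(0.0, 2.0, 100)     # W/m^2
forcing[50] -= 3.0                       # volcanic cooling spike

emulated = lag_and_resize(forcing, lam=0.5, tau=3.0)
```

The output dips at the spike and then recovers over a few e-folding times, which is exactly the “lagged and resized” behavior described: nothing in the emulator knows anything about oceans, clouds, or circulation.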

I have shown that this is true both for individual models and for the average of a 19-model “ensemble” of models. Modelers call groups of models “ensembles”. I assume this term borrowed from music is used because each model is playing a different tune on a different instrument, but regardless of the etymology, the average of the ensemble results shows exactly the same thing as the results from individual models. They all simply lag and scale the forcing, and call it temperature.

In my last report on this subject, I mentioned that I was about to shift the platform for my investigations from Excel to the computer language “R”. I’ve done that now, with some interesting results. Here’s a confirmation that my shift to R has been successful. This shows the results from the average of the 19 models used in the Forster analysis.

Figure 3. Average forcings and average modeled temperatures of the 19 Forster models, along with my emulation of the modeled temperatures. The emulation (red line) is calculated using the one-line equation. a) Average of modeled temperatures from 19 global climate models (gray line with circles), along with the emulation given by the one-line equation (red line). b) Same information as in 3a, with the addition of the forcing data. “Lambda” is the scaling factor. If the forcings are purely radiative, as in this case, lambda is the transient climate response. “Tau” is the time constant of the lagging process, also known as the “e-folding time”. “C” is the heat capacity of the upper layer of the ocean, showing the size of the thermal reservoir; it is calculated from the given tau and lambda. “TCR” is the transient climate response to a doubling of CO2. Equilibrium climate sensitivity (ECS) is about 40% larger than the transient response.

Figure 3b shows the average inputs (blue line, 20th century “forcings” from CO2, volcanoes, the sun, aerosols, and the like) and outputs (gray line with circles, modeled temperatures for the 20th century) of 19 models used in the IPCC reports. You can see how the model outputs (global average temperatures) are merely a lagged and rescaled version of the inputs (“forcings”). Note that the correlation of the emulation (red line) and the actual model results is 0.99.
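An aside on the “C” in the caption: in a one-box energy balance, the heat capacity, time constant, and scaling factor are tied together by tau = C × lambda, so C is fixed once the other two are fitted. A quick sketch with placeholder values (the actual fitted numbers are in Figure 3, not quoted in the text):

```python
# One-box energy balance: C * dT/dt = F - T / lam, giving tau = C * lam.
# Placeholder values, not the fitted numbers from Figure 3:
lam = 0.5   # deg C per W/m^2 (scaling factor / transient response)
tau = 3.0   # years (e-folding time)

C = tau / lam   # W yr m^-2 K^-1, effective heat capacity of the mixed layer

# Expressed as a depth of seawater (volumetric heat capacity ~4.0e6 J/m^3/K):
seconds_per_year = 365.25 * 86400
depth_m = C * seconds_per_year / 4.0e6
```

With these placeholders, C works out to 6 W·yr·m⁻²·K⁻¹, equivalent to roughly 47 m of seawater, i.e., the kind of thermal reservoir an upper-ocean mixed layer provides.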

So … what are some implications of the finding that the hugely complex climate model global average temperature results are simply a lagged version of the inputs? Well, after thinking about that question for a while, I find that they are not exactly what I imagined them to be at first glance.

For me, the first implication of the finding that the models’ global temperature output is just lagged and resized forcings is that the models are all operating as designed. By that I mean, they have all faithfully and exactly reproduced the misconceptions of the programmers, without error. This is good news, as it means they are working the way the modelers wanted. It doesn’t mean that they are right—just that they are working as intended by the modelers. The claim of the modelers all along has embodied what I see as the fundamental misconception of climate science—the incorrect idea that the earth’s temperature is a linear function of the forcing, and everything else averages out. And that is exactly what the models do.

In some ways this unanimity is surprising, because there is a reasonable amount of variation in the complexity, the assumptions, and the internal workings of the individual models.

I ascribe this widespread similarity to two things. One is that the core physics is roughly correct. No surprise there. They’ve left out the most important part, the control mechanism composed of the emergent thermoregulatory phenomena like thunderstorms, so their models don’t work anything like the real climate, but the core physics is right. This makes the climate models similar in function.

The other thing driving the functional similarity is that the modelers all have one and only one way to test their models—by comparing them with the historical reality.

This, of course, means that they are all tuned to reproduce the historical temperature record. Now, people often bridle when the word “tuned” is used, so let me replace the word “tuned” with the word “tested”, and try to explain the difficulty that the modelers face, and to explain how testing turns into tuning.

Here’s the only way you can build a climate model. You put together your best effort, and you test it against the variable of interest, global temperature. The only data on that is the historical global average temperature record. If your model is abysmal and looks nothing at all like the real historical temperature record, you throw it away and start over again, until you have a model that produces some results that kinda look like the historical record.

Then you take that initial success, and you start adding in the details and improving the sub-systems. Step by step, you see if you can make it a “better” model, with “better” meaning that it is more lifelike, more realistic, more like the real world’s history. For example, you have to deal with the ocean-atmosphere exchange, which is a bitch to get right. So you mess with that, and you test it again. Good news! That removed the problems you’d been having replicating some part of the historical record. So you keep those changes that have resulted from the testing.

Or maybe the changes you made to the ocean-atmosphere interface didn’t do what you expected. Maybe when you look at the results they’re worse. So you throw out that section of code, or modify it, and you try again. People say the climate models haven’t been tested? They’ve been tested over and over by comparing them to the real vagaries of the historical temperature record, every one of them, and the models and the parts of models that didn’t work were gotten rid of, and the parts that did work were kept.

This is why I have avoided the word “tuning”, because that doesn’t really describe the process of developing a model. It is one of testing, not tuning. Be clear that I’m not saying that someone sat down and said “we’re gonna tune the ice threshold level down a little bit to better match the historical record”. That would be seen as cheating by most modelers. Instead, they do things like this, reported for the GISS climate model:

The model is tuned (using the threshold relative humidity U00 for the initiation of ice and water clouds) to be in global radiative balance (i.e., net radiation at TOA within 0.5 W m⁻² of zero) and a reasonable planetary albedo (between 29% and 31%) for the control run simulations. SOURCE

So the modelers are right when they say their model is not directly tuned to the historical record, because it’s not tuned, it’s tested. But nonetheless, the tuning to the historical record is still very real. It just wasn’t the “twist the knobs” kind of tuning—it was evolutionary in nature. Over the last decades, the modelers will tell you that they’ve gotten better and better at replicating the historical world. And they have, because of evolutionary tuning. All you have to do is what evolution does—constantly toss out the stuff that doesn’t pass the test, and replace it with stuff that does better on the test. What test? Why, replicating the historical record! That’s the only test we have.

And through that process of constant testing and refinement which is not called tuning but ends up with a tuned system, we arrive at a very curious result. Functionally, all of the various climate models end up doing nothing more than a simple lag and resizing of the forcing inputs to give the global temperature outputs. The internal details of the model don’t seem to matter. The various model parameters and settings don’t seem to matter. The way the model handles the ocean-atmosphere doesn’t seem to matter. They’ve all been smoothed out by the evolutionary process, and all that’s output by every model that I’ve tested is a simple lagging and resizing of the inputs.

The second implication is that for hindcasts or forecasts of global temperatures, climate models are useless. There is no way to judge whether GISS or CM2.1 or the average of the nineteen models is “correct”. All the models do is lag a given set of forcings, and get an answer—but a different set of forcings gives a very different answer, and we have no means to distinguish between them.

The third implication of the finding that the models just lag and resize the forcings is the most practical. It is that this highly simplified one-line version of the models should be very useful, not for figuring out what the climate is doing, but for figuring out where in both time and space the climate is NOT acting the way the modelers think it operates. Following up on this one is definitely on my list.

The fourth implication is that once the forcings are chosen, the die is cast. If you are looking to hindcast the historical temperatures, your model output must have a trend similar to the historical temperatures. But once the forcings are chosen, the trends of both the forcing and the model output are known, and thus the climate sensitivity is fixed: it’s simply some constant times the temperature trend divided by the forcing trend.

The fifth implication is that this dependence of possible outcomes on the size and shape of the forcings raises the possibility that, like the models themselves, the forcings have undergone a similar evolutionary tuning. There is no agreement on the size of several of the elements that make up the forcing datasets, or indeed on which elements are included for a given model run. If a modeler adds a set of forcings and they make his model work worse when tested against the real world, he concludes that the forcing values are likely wrong, and so he chooses an alternate dataset, or perhaps uses an alternate calculation that does better with his model.

The sixth implication is that given a sufficiently detailed set of forcings and modeled temperatures, we can use this technique to probe more deeply into the internal workings of the models themselves. And finally, with that as context, that brings us to the Crowley thousand year model runs.

The Crowley dataset is very valuable because he has included the results of a simplified model which was run on just the volcanic forcings. These volcanic forcings are a series of very short interruptions in sunlight with nothing in between, so it’s an ideal situation to see exactly how the model responds in the longer term. Crowley reports the details of the model as follows:


A linear upwelling/diffusion energy balance model (EBM) was used to calculate the mean annual temperature response to estimated forcing changes. This model calculates the temperature of a vertically averaged mixed-layer ocean/atmosphere that is a function of forcing changes and radiative damping. The mixed layer is coupled to the deep ocean with an upwelling/diffusion equation in order to allow for heat storage in the ocean interior.

The radiative damping term can be adjusted to embrace the standard range of IPCC sensitivities for a doubling of CO2. The EBM is similar to that used in many IPCC assessments and has been validated against both the Wigley-Raper EBM (40) and two different coupled ocean-atmosphere general circulation model (GCM) simulations.

All forcings for the model runs were set to an equilibrium sensitivity of 2°C for a doubling of CO2. This is on the lower end of the IPCC range of 1.5° to 4.5°C for a doubling of CO2 and is slightly less than the IPCC “best guess” sensitivity of 2.5°C [the inclusion of solar variability in model calculations can decrease the best fit sensitivity (9)]. For both the solar and volcanism runs, the calculated temperature response is based on net radiative forcing after adjusting for the 30% albedo of the Earth-atmosphere system over visible wavelengths.

So that’s the model. However, bearing in mind the question of the evolutionary tuning of the forcings as well as of the models, and the total dependence of the output on the forcings chosen for the input, I first took a look at the forcings. Figure 4 shows those results:

Figure 4. Forcings used in the Crowley 1000-year model run. (As an aside, the volcanic forcings (black downwards lines) show a natural phenomenon called the “Noah Effect”. The hydrological event called Noah’s flood was allegedly very much larger than any other flood in history. Similarly, in a natural dataset we often find that the largest occurrence is much larger than the next largest occurrence. You can see the “Noah Effect” in the eruption of 1259 … but I digress.)

Bearing in mind that what will happen is that these forcings will simply be lagged and scaled, we can see what will make the differences in the final Crowley temperature hindcast.

First, I note that the Crowley volcanic forcings are larger than those in any other model I’ve seen. The GISS model has the largest volcanic forcings of the models I’ve looked at. Here’s the comparison for the overlap period:

Figure 5. A comparison of the Crowley (blue) and GISS (red) volcanic data.

In addition to the overall difference in peak amplitude, you can see that the GISS data has many more small volcanic eruptions. Another oddity is that while some of the events happen in the same year in both datasets, others don’t.

Next, the solar variations in Figure 4 are so small that they don’t really count for much. So we’re left with volcanic (black), aerosol (red), and GHG forcings (orange).

I have to say that the Crowley aerosol forcing in Figure 4 looks totally bogus. The post-1890 correlation of aerosol forcing (once it is no longer zero) with the GHG forcing is -0.97, and I’m not buying that at all. Why should aerosol forcing be the scaled inverse of the GHG forcing? The only function of the aerosol forcing seems to be to reduce the effect of the GHG forcing.
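That −0.97 is the ordinary Pearson correlation of the two post-1890 series, and anyone with the Crowley forcing file can check it in a few lines. The sketch below uses stand-in series (a rising GHG ramp and a noisy, sign-flipped, scaled copy for the aerosols, both invented for illustration) to show how a near-perfect mirror produces exactly this signature:

```python
import numpy as np

# Stand-ins for the post-1890 series; the real check would load
# Crowley's published forcing data instead of these invented values.
rng = np.random.default_rng(0)
years   = np.arange(1890, 2001)
ghg     = 0.018 * (years - 1890)                          # W/m^2, rising
aerosol = -0.4 * ghg + rng.normal(0.0, 0.02, len(years))  # noisy mirror

# Pearson correlation between the two series.
r = np.corrcoef(ghg, aerosol)[0, 1]
```

A correlation this close to −1 means the aerosol series carries almost no independent information; it is functionally a knob that shrinks the effective GHG forcing.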

This is how you generate a model result that fits your needs, you just adjust the forcings. You pick a big size for the volcanic forcings, because that gives you the dip you need for the Little Ice Age. Then you adjust the GHG forcing by using an assumed aerosol forcing that is a smaller mirror reflection of the GHG forcing, and you have it made … and what is it that you do end up with?

This is the best part. You end up with a model that does a pretty good job of replicating the long-discredited Mann “hockey stick”, as proudly exhibited by Crowley …

Figure 6. This is Figure 4b from the Crowley paper, showing the Crowley model (“All”, blue) compared to the Mann hockey stick (“Mn”, red). The black line is the Jones instrumental data.

Well, this dang post is getting too long. I still haven’t gotten to what I started to talk about, the question of probing the internal workings of the Crowley volcano model, so I’ll put that in the next post. It’s midnight. I’m tired. My job today was picking out and chiseling loose and carrying away pounds and pounds of wood-rat droppings out of an old pump-house in the rain. Ah, well, at least spending the day cleaning up some other creature’s sh*t puts food on the table … and casts a valuable and revealing light on my usual delusions of self-importance …

More to come, as always, as time and the tides of work and family allow.

Best to you all, I’d wish you a better day than mine, but that’s a low bar. Be kind to my typos and the like, it’s late.


June 25, 2013 2:37 am

The basic problem with all GCMs is that they cannot model the chaotic system, with cyclic variations, that climate is. Standard weather forecasts are OK for two days; they then diverge from the forecast by larger and larger variations as the days progress. This is chaos working.
Models also have the problem of factoring in atmospheric CO2 content because of the GHE, which I hold is not a possible effect, but here Willis and I diverge completely.

Bloke down the pub
June 25, 2013 2:46 am

They can ‘tune’ or ‘test’ their hindcasts as much as they like, but until they can make a reasonable stab at something more substantial than just ‘plausible outcomes’ I won’t be moving to higher ground.

Robert of Ottawa
June 25, 2013 2:47 am

A very good explanation of model tuning .. er … calibration.
Your model’s output is proportional to the input, so adjust the inputs to provide the output you want. Wave hands in justification and obfuscation.

Philip Bradley
June 25, 2013 3:03 am

Willis, you excel your normal excellence.
Although, I’ll point out that the historical temperature record derived from (min+max)/2 is strongly influenced by effects that influence minimum temperature (aerosols, smoke, aerosol-seeded clouds). The models are being tuned to what is a substantially spurious historical record, in the sense that it is not reflective of Earth’s energy balance.

Mike McMillan
June 25, 2013 3:42 am

I can see the headline on SkS, “Willis validates MBH98 hockey stick.”
.gif’s for charts, please, not .jpg’s. Too many compression artifacts to blow up to presentation size.

Bill Illis
June 25, 2013 4:16 am

Basically, the models are not simulating the climate.
They are simulating their assumptions about GHGs and forcings.
That is why they are so far off right now, why they are so far off on the TMT levels shown by Spencer, and why they miss the impacts of volcanoes by such a large amount and have to include many downscaling/tuning factors.
If they are not simulating the climate, then what is the point? They could have simulated their assumptions about GHGs with just simple pen and paper.

Kelvin Vaughan
June 25, 2013 4:30 am

Thanks, Willis.
Not even a model can predict the future. It just assumes that things will go on as they have been and as they are. It’s just a what-if machine, and ‘if’ is a cliff you will never get over.

June 25, 2013 5:15 am

Another great post, Willis.
To paraphrase Dr. Frederick Frankenstein, climate models are doo-doo.

Alan Watt, Climate Denialist Level 7
June 25, 2013 5:33 am

Best to you all, I’d wish you a better day than mine, but that’s a low bar. Be kind to my typos and the like, it’s late.

WIllis, for the quality and interest of your posts, all typos are forgiven. I look forward to the continuation.
Since you can duplicate all the climate models’ output with a simple one-line equation, I assume you can cover the same time period at a vastly lower computational cost. Therefore you can run a climate projection which is 99% equivalent to the best current models out 1,000 years in the future. In other words, enough playing around with the past, producing “projections” where we already know the historical results — let’s see what our climate will be like out to the year 3,000! Will we have snowball earth, or will the surface temperature be at 1 million degrees like the core? (h/t Al Gore)
Inquiring minds want to know …
Hope your day today (yesterday’s tomorrow) has less rat sh*t.

William C Rostron
June 25, 2013 5:51 am

Good summary. I might add, as a math modeler myself (nuclear power plant simulation), that the higher the order of the system, the more likely it is to diverge from reality once the constraints of the “learning dataset” run out. Anyone that has done polynomial curve fits has seen this. Or better, everyone that has done neural network modeling has seen this. The problem is that data fits aren’t first principles; they are, as many have pointed out, merely enforced correlations. We recently rewrote a part of our turbine model that used to employ curve fits to valve position data to get the simulation to match the plant, with a system that more closely implemented the first-principle physics. In this case, the model became more complex, which was necessary to model the physics. Once we got the physics right, all of the extra curve fitting stuff that we had installed to force the model to match the plant data went away. I know that I’m being a little vague here, but the general principle is that you can always create a model that matches data. That’s not the issue. The problem is to get the physics right.
I did control systems engineering for about 20 years before delving into power plant modeling. I’ve commented before about Willis’ hypothesis of climate system control, that automatically regulates earth’s temperature. I happen to agree: that’s what the system looks like to me, too. The fact that the measured water vapor feedback (from satellite data) demonstrate negative slope confirms this (against climate model assumptions of positive slope). Changes in atmospheric conditions (pressure, contaminates, cosmic rays, etc.) are, in effect, fiddling with the thermostat. The actual physical mechanics of the regulator are complex and not easily modeled. But that doesn’t matter as much as the fact that the feedback exists and seems to have been working for thousands of years.
Climate models are never going to work until that energy balance regulator system is included. It is a critical piece of physics, necessary to explain everything else. Until that works in the model, future projections of climate by that model are worthless.

June 25, 2013 6:12 am

My understanding is that the model outputs, being the inputs lagged, are equivalent to a time-series model of the ARIMA form. More about this at the end of this comment.
This is an important result and needs to be published in a journal. Your work reveals that the models do not “demonstrate the degree of correspondence between the model and the material world” needed to form the basis of public policy as set out by Oreskes and others in their Science article Verification, Validation, and Confirmation of Numerical Models in the Earth Sciences.
“A model, like a novel, may resonate with nature, but it is not a ‘real’ thing. Like a novel, a model may be convincing; it may ‘ring true’ if it is consistent with our experience of the natural world. But just as we may wonder how much the characters in a novel are drawn from real life and how much is artifice, we might ask the same of a model: How much is based on observation and measurement of accessible phenomena, how much is based on informed judgment, and how much is convenience? Fundamentally, the reason for modeling is a lack of full access, either in time or space, to the phenomena of interest. In areas where public policy and public safety are at stake, the burden is on the modeler to demonstrate the degree of correspondence between the model and the material world it seeks to represent and to delineate the limits of that correspondence.”
Verification, Validation, and Confirmation of Numerical Models in the Earth Sciences Naomi Oreskes; Kristin Shrader-Frechette; Kenneth Belitz Science, New Series, Vol. 263, No. 5147. ( 1994)
If the GCMs are equivalent to time-series models of the ARIMA form then one condition is stationarity. This is addressed by an econometric technique called polynomial cointegration analysis.
An Israeli group carried out such an analysis of the inputs (GHG, temperature and solar irradiance data) and concluded,
“We have shown that anthropogenic forcings do not polynomially cointegrate with global temperature and solar irradiance. Therefore, data for 1880–2007 do not support the anthropogenic interpretation of global warming during this period.”
Beenstock, Reingewertz, and Paldor, Polynomial cointegration tests of anthropogenic impact on global warming, Earth Syst. Dynam. Discuss., 3, 561–596, 2012. URL:

Chris Wright
June 25, 2013 6:53 am

Willis’ concept of evolutionary tuning is fascinating. And it has the ring of truth.
But at the end of the day it’s still a form of curve fitting, the difference being that the modellers can fool themselves that they’re doing something more sophisticated than mere curve fitting.
If models are brilliant at hindcasting the global temperature but fail completely at forecasting future trends, as they have, then the explanation is pretty obvious: they simply adjusted the models to get a good fit.
I’m pretty sure that climate forecasting is futile, just as forecasting weather months ahead is futile. Even if the model were perfect it would still be impossible to know exactly the initial conditions for the whole world – and tiny errors will rapidly escalate, as Lorenz found out in the fifties. If the models do actually accurately reproduce the global temperatures in hindcasts going back decades or centuries, then it’s a sure sign that the models were ‘adjusted’ to get the right answer. But, as others have commented, the models could be adjusted to give the right answer even if their physical assumptions are hopelessly wrong.

Tom Norkunas
June 25, 2013 7:25 am

Watt says “let’s see what our climate will be like out to the year 3,000! Will we have snowball earth, or will the surface temperature be at 1 million degrees like the core? (h/t Al Gore)”
Yeah, I need to know whether to buy a coat and boots or shorts and sandals.

June 25, 2013 8:00 am

Beenstock, Reingewertz, and Paldor, Polynomial cointegration tests of anthropogenic impact on global warming, Earth Syst. Dynam. Discuss., 3, 561–596, 2012. URL:
there was a recent post of Salby’s 2nd presentation that showed CO2 varied as the integral of temperature. this is the same result as given by cointegration, using different methods. cointegration shows temperature varies as the 2nd difference of CO2.
both methods showed the effects of CO2 on temperature are not permanent. the system responds dynamically, not linearly as assumed by the climate models. perhaps for example by changing the volume of plant matter on the surface, changing albedo and atmospheric moisture, changing temperature.
thus, the climate models have the underlying physics wrong. all else does not remain equal when you add CO2, because life itself uses CO2 and life will adapt itself to changing conditions to optimize its own survival. the nonsense in climate models is that they make no provision for the effects of life on earth’s climate.

Jim G
June 25, 2013 8:02 am

Note that even models of the better-known physical processes of the internal mechanisms of stars, where mass, spin, age, and elemental makeup are in some cases fairly well known and are the most significant variables for their behavior, are constantly surprising those attempting the models with unpredicted behavior. Modeling physical processes where not all of the variables are known, and where some are highly intercorrelated within chaotic systems, is a fool’s errand. And this is under the huge and unwarranted assumption that those doing the models have no axe to grind, as it is well known that they do in the case of climate. It appears as if Willis has revealed here some of that axe grinding, along with the probable naivety of some of the grinders.

Henry Clark
June 25, 2013 8:06 am

The post-1890 correlation of aerosol forcing (once it is no longer zero) with the GHG forcing is -0.97
So aerosols are even more of a fudge factor than I already thought.

David in Texas
June 25, 2013 8:17 am

Willis, you are very kind – “Tuning” vs. “Testing”. When we worked with models (of a different kind), we called it “force fitting”. When you have many unknowns and a few equations, you can always get a fit. For example, you can use US postage prices, high school dropout rates and rabbit fertility and get a “fit” to global temperatures. The more inputs you use, the better your R-squared is. That doesn’t mean there is any value in doing it, and most certainly it doesn’t mean it has any predictive power. GCMs use things like Volcanoes, Solar, Aerosol and GHG and their weightings. They will get a fit. Give me enough variables, and I’ll get a fit, especially when I can change not only the weightings, but the input values themselves. Now “force fitting” or “tuning” is a necessary part of model building, but honesty in presentation is also a necessary part. When we presented our results, we explained that it was “force fitted”, and management should be cautious.

June 25, 2013 8:53 am

When we are living right in the middle of the experiment and still can’t measure the effects of any forcings ……..

June 25, 2013 8:55 am

Thanks Willis. This is a good article, again you top yourself!
General circulation models proven to be simple black boxes containing amplifiers and some time delay. How neat is that?

June 25, 2013 9:20 am

“Strange but true, functionally it turns out that all that the climate models do to forecast the global average surface temperature is to lag and resize the forcing. That’s it.”
Or maybe Harry (of Harry_Read_Me fame) has been mentoring them.
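The quoted claim – that the models just “lag and resize the forcing” – can be sketched in a few lines as a one-box exponential lag. The resizing factor `lam` and lag time `tau` below are made-up illustrative values, not Willis’s fitted numbers:

```python
import math

def lagged_response(forcing, lam=0.5, tau=3.0):
    """Resize the forcing by lam and lag it with e-folding time tau:
    each step the temperature relaxes part-way toward lam * F."""
    a = 1 - math.exp(-1.0 / tau)   # per-step relaxation fraction
    T, out = 0.0, []
    for F in forcing:
        T += a * (lam * F - T)
        out.append(T)
    return out

# A step in forcing: the response rises smoothly toward lam * F
# (here 0.5 * 2.0 = 1.0), which is all "lag and resize" means.
step = [0.0] * 5 + [2.0] * 45
T = lagged_response(step)
assert T[4] == 0.0
assert abs(T[-1] - 1.0) < 1e-5
```

Two tunable constants, one line of arithmetic per time step: that is the whole black box.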

Martin Audley
June 25, 2013 9:23 am

Have I missed a trick?
Doesn’t this generate an obvious 7th implication?
If the models’ sensitivity is just proportional to all the forcings, doesn’t this take away any amplification (in the context of amplification of human-generated CO2)?
So even from the point of view of the models, extra CO2 can only have its own (Arrhenius) effect of roughly 1.2 degrees per doubling? So panic over, no chance of extreme warming?
Or alternatively, is CO2 so static over the tuning/testing period that zero percent of nothing is still nothing, so the models tell us nothing about CO2 anyway? In which case they still don’t contribute anything to the panic (which is left based only on assumed amplification).

June 25, 2013 9:29 am

All of this testing/tuning relies on an accurate historical global record … which we don’t have, so this effort is less than useless; it’s fraudulent …

JM VanWinkle
June 25, 2013 9:56 am

William C Rostron (June 25, 2013 at 5:51 am), thank you for that post. Feedbacks that stabilize the climate are indeed missing from the models, especially given Willis’s observation that all the model outputs reduce to a linear scaling plus a lag. Moreover, such simple linear-plus-lag results are contrary to the chaotic nature of the real physics, as Dr. Robert Brown has observed. Further, even if the models did somehow reflect climate physics, they don’t identify the attractors in the climate chaos (we see two large ones in the geologic record, the interglacial and glacial phases; there must be more, no?). So it is no wonder that there is no real predictive ability in the models even over 20 years!

Gary Hladik
June 25, 2013 10:18 am

What a life Willis leads! Shoveling rodent droppings during the day, analyzing rodent droppings at night. Ah, the good life! 🙂

June 25, 2013 10:23 am

Climate models are mathy mumbo-jumbo sock-puppets.

Michael D Smith
June 25, 2013 11:00 am

You’re chiseling out and cleaning up rat sh*t even when you’re not in the pump house, Willis.

keith at hastings uk
June 25, 2013 11:01 am

I remind myself that planetary motions were well modelled by circles and epicycles etc., pre-Copernicus and Newton, but were fundamentally wrong. When gravity and orbits were better understood, it all became simpler and clearer. Put simply, just because something “hindcasts”, it doesn’t mean it is any good.

Neil Jordan
June 25, 2013 11:03 am

Climate models might have left out some ice, resulting in a sea-level change of zero or 82 feet (25 meters). This morning’s California Water News carries an article about ice, paleo sea level, and modeling:
[begin quote]
Ice mass the size of Greenland overlooked in climate models
By Geoffrey Mohan
June 25, 2013, 5:45 a.m.
Far more of Earth’s water was locked up as ice at the height of the last ice age than previously thought, and current climate change models may need to be adjusted to account for it, according to a new study.
Rowley’s calculations show that sea levels either did not change or shifted only about 82 feet.
“It allows for the possibility that there is significant melting of the East Antarctic Ice Sheet,” Rowley said. “Or it allows for a simple interpretation of no melting.”
[end quote]

Barry Elledge
June 25, 2013 1:19 pm

Interesting discussion of how models are functionally tweaked.
I hope you’re wearing a good quality respirator while shoveling rat feces. The dust can carry hantavirus and a few other unsavory infectious diseases which can be transmitted through nose, mouth and lung mucosa by breathing.

Theo Goodwin
June 25, 2013 1:21 pm

Another brilliant post, Willis, maybe the best of all. In a very practical way, you have shown that “testing” various parts of a model amounts to nothing more than watching the changes propagate throughout the model. If a model is somewhat like an empirical theory, it is so only as a whole. In other words, the individual parts have no integrity of their own. As you have shown, the whole can relate input to output and is something like one grand empirical hypothesis in that respect.
The bottom line is that “climate sensitivity,” as found in models, has no empirical meaning and cannot be used to make claims about the world.

June 25, 2013 1:48 pm

The model-material being discussed here is most exquisite: lovely in fibre and pattern, a quality hitherto unreachable! The fact that you don’t see it enrobing the emperor merely identifies you as one lacking the sophistication of our grand milieu!

Mario Lento
June 25, 2013 4:16 pm

Willis, I love your technical posts… and your personal stories, when I have time to chill on climate.
From what I believe, the climate models’ failure to predict the past 17 years is almost solid proof that greenhouse gases, including CO2, are not major drivers of climate. It’s largely something(s) else… like any mix of the following: ENSO, the sun, cosmic rays, and fill in the blank. A very good case can be made that the ENSO process claims much of what we’ve seen… and that the sun plays a role in this process.

Berényi Péter
June 25, 2013 4:32 pm

Well, that’s why fitting complex computational models to a single run of a unique physical entity (terrestrial climate system in this case) is not science.
They could of course test & tune models to structural details finer than global average temperature history, like lack of tropical hot spot, ENSO, AMO, Sahel drought of the 1980s, precipitation & wind pattern histories in general, etc., but that’s not happening, apparently. Regional skill of models is abominable.
One thing is clear though. Computational climate model ensemble members are inconsistent pairwise, even over global average temperature datasets. For if they do reproduce this history well with inconsistent forcings, it means just that. Which, according to logic, implies at most one of them can be correct.
In this dire situation sane people would do their best to shoot down (i.e. falsify) as many members of the ensemble as they can (based on lack of “regional skill”, for example). What they do instead is take the average of scores of demonstrably false computational models and call that “ensemble average”, as if an arcane name alone could overcome logic.
Of course such a rigorous filtering process threatens to kill all current computational models, leaving none to base forecasts (“projections(?)”) on. That may well be a state-of-the-art verdict, but it makes it difficult to rationalize the common notion that the science is “settled”, as well as to justify the untold teraflop-years spent on developing the models in the first place.
Models, especially simple (humanly comprehensible) ones, are invaluable heuristic tools. But no heuristic tool has predictive value until it is verified experimentally over multiple runs of a wide class of physical systems. BTW, that’s why the case of weather forecast models is not comparable to that of climate models. Weather models are indeed verified daily over a wide set of weather systems, while climate models do not have that luxury.

June 25, 2013 5:11 pm

Willis, truly fascinating. I enjoy the way you attack a problem. How you analyze the bases of the issue and pick good starting points where the early cruxes are, where you can bring acute logic to bear on those early decision points to render the simplest conclusions. Then proceed from there up through the feldercarb to the next one, and so on. Coming back later to maybe try some other branch just to see where that leads.
In the process noting some odd association and following your nose to see where that leads. Such as your one-line equation. And then seeing where that might lead.
Very good science Willis. It’s even better watching it play out. Thanks.

June 25, 2013 5:36 pm

My job today was picking out and chiseling loose and carrying away pounds and pounds of wood-rat droppings out of an old pump-house in the rain.

A far more useful occupation than climate modelling 🙂 I’ve been paid to do the same, only it was tonnes of cowsh!t and goatsh!t. Then when I had carried all that sh!t home I used it in the compost, and so was effectively paid twice!
Thanks too for your insights into climate modelling.
keith at hastings uk said @ June 25, 2013 at 11:01 am

I remind myself that planetary motions were well modelled by circles and epicycles etc, pre Copernicus and Newton, but were fundamentally wrong. When gravity and orbits were better understood, it all became simpler and clearer.

The planetary motions continued to be well modelled by circles and epicycles long after Copernicus. Copernicus’s model also used perfect circles and epicycles, there being absolutely no need to use any other method of emulating Ptolemy’s results. This meant there was no need to discard Ptolemy’s method, which was in fact less complicated than Copernicus’s, as the latter had many more, albeit smaller, epicycles.
It was Kepler who discovered that the planets moved on elliptical paths rather than perfect circles, an idea that appalled Galileo who refused to read Kepler’s book on the matter, even though Kepler had sent him a copy.

Paul Linsay
June 25, 2013 5:54 pm

If I understood what’s being done here, the models are fit to the global average temperature anomaly (GATA) to tune them. This means they are entirely unphysical, since GATA is an unphysical quantity. No physical, chemical, or biological process depends on it. It’s supposed to be a proxy for the energy content of the atmosphere, but it isn’t, since the energy content depends strongly on the humidity of the air, which is not well measured. What matters is the temperature map of the Earth, and as Pielke Sr. has said over and over, the models have no skill when it comes to regional temperatures. The few maps I’ve seen are off by as much as 5 to 10 C.
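The humidity point can be made concrete with rounded textbook constants (the cp and Lv values below are assumed, for illustration only): two air parcels at the same temperature can carry very different amounts of energy.

```python
# Rounded textbook values, assumed for illustration: specific heat of dry
# air ~1005 J/(kg K), latent heat of vaporization ~2.5e6 J/kg.
CP = 1005.0   # J / (kg K)
LV = 2.5e6    # J / kg

def moist_enthalpy(T_kelvin, q):
    """Approximate moist enthalpy per kg of air: h = cp*T + Lv*q,
    with q the specific humidity (kg water vapor per kg air)."""
    return CP * T_kelvin + LV * q

dry   = moist_enthalpy(288.15, 0.002)   # 15 C, fairly dry air
humid = moist_enthalpy(288.15, 0.015)   # 15 C, humid air

# Same thermometer reading, yet the humid parcel holds about
# 32.5 kJ/kg more energy than the dry one.
assert humid > dry
```

A thermometer alone cannot distinguish the two parcels, which is why an average of temperatures is a poor proxy for atmospheric energy content.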

Theo Goodwin
June 25, 2013 8:28 pm

Berényi Péter says:
June 25, 2013 at 4:32 pm
“In this dire situation sane people would do their best to shoot down (i.e. falsify) as many members of the ensemble as they can (based on lack of “regional skill”, for example). What they do instead is take the average of scores of demonstrably false computational models and call that “ensemble average”, as if an arcane name alone could overcome logic.”
Modelers dare not criticize one another. If criticism were permitted within “the consensus,” the field of modeling would be covered with nothing but rubble.

June 25, 2013 8:28 pm

Willis quotes “The model is tuned (using the threshold relative humidity U00 for the initiation of ice and water clouds) to be in global radiative balance (i.e., net radiation at TOA within 0.5 W m2 of zero) and a reasonable planetary albedo (between 29% and 31%) for the control run simulations. SOURCE ”
That would be fine if relative humidity were the only factor affecting ice and water cloud creation, but inevitably it’s not. And inevitably one of those factors will be temperature. Although the modellers and people like Mosher can’t see it, this is simply a curve fit. And it’s a curve fit because it’s the variation in that last mile (well, let’s say inch, actually) that makes all the difference to climate over many iterations.

Brian H
June 25, 2013 9:25 pm

typo: “trend ration”
[Thanks, fixed. -w.]

June 25, 2013 9:33 pm

you quote GISS;

The model is tuned (using the threshold relative humidity U00 for the initiation of ice and water clouds) to be in global radiative balance (i.e., net radiation at TOA within 0.5 W m2 of zero) and a reasonable planetary albedo (between 29% and 31%) for the control run simulations.

Control runs are those done with forcings neutral. They are employed to test the model’s ability to replicate various processes. The trend of these model runs is zero (obviously). The Crowley paper you cited has this in the abstract, for example:

Removal of the forced response from reconstructed temperature time series yields residuals that show similar variability to those of control runs of coupled models, thereby lending support to the models’ value as estimates of low-frequency variability in the climate system.

Complete Paper
Here and there in the article you point out that models are not tuned to temperature observations – which is correct. Also, they are not tuned to temperature trends, a point on which you seem to equivocate in the article (but I could be misunderstanding you).
Components of the climate system are tweaked (parametrised) when it is not possible to model them from basic physics. The observations are the guideline to match for general behaviour – not for trends, and not to match actual observations perfectly. EG, GCMs that replicate ENSO with a fair degree of skill do not purport to predict or post-dict the ups and downs, but instead mimic the amplitude and frequency of the ENSO shifts. (There are models which focus specifically on ENSO that attempt to forecast that metric, but they are not the GCMs you are discussing)
Having read your post and followed the source material, I’m not sure how you make the leap from tweaking model components to simulate in-system behaviour (periodic events occurring around a mean state, for example) to asserting that the models are fitted in any way to trends. As I understand it, trend-fitting is not what they do, and the stuff you are talking about happens during the control runs, which are trend-neutral, having no forcings applied. The trial-and-error method is done to replicate in-system behaviour – they go back to the drawing board on the control runs.
There may be exceptions to this practice, but as I understand it modellers generally do not fit the hindcasts to trends at all. The trial-and-error method is mainly to replicate in-system behaviour with no forcings. Until the physics is understood better, until monitoring systems can capture more of the detail, and until we have significantly faster processing power, parametrising components of the climate system is the best we can do.
Was unable to open the link to Kiehl 2007, so here is a link to the full paper for others who may be interested in following the sources.
Kiehl’s conclusions are different to yours, citing the different treatment of aerosols in different models, either as a forcing or not, or different loadings, as the main reason for the spread amongst the models.
You have not linked to where you discovered the equation that you state drives the model results (is it the basic equation for the energy budget, or a derivative of it?), or to where you investigate the claims you have made. You have some interesting graphs and assertions, but I do not see any calculations of your own. Have you published/blogged about this in detail somewhere?
I think your description of the trial and error method is plausible, but I can’t see how you go from testing the control runs to ‘tuning’ the forcing runs.

June 25, 2013 9:47 pm

Excellent post, Willis! As others have said, this is one of your best (and that’s saying something!)
Now that you have simplified the models down to a workable level, it seems to me that the obvious next step is to check the modeled temps against the actual measured temps. I am particularly curious how well the models do at replicating the observed ~60-year apparent PDO-driven cycle of +/- ~0.3C over the period we have good temperature data. I assume the models don’t do well prior to ~1950.
I am also curious about this statement above: “‘TCR’ is the temporary climate response to a doubling of CO2. Equilibrium climate sensitivity (ECS) is about 40% larger than the transient response.” Is the finding that ECS is about 40% larger than TCR a result of the above analysis of the models, or does it come from another source? And in the above, how long a period does a TCR cover?
I would also like to agree with Berényi Péter’s post above. Models that want to be considered should be forced to make falsifiable predictions, not “projections”. Personally, I view the whole practice of averaging different models together as a mechanism used solely to insulate the models from the dangers of falsification. As Willis shows above, if one or more models is on the right track, it would be relatively straightforward to use it as a basis for evolutionarily approaching a better view of the climate. There is no principled need to average the outputs of models together to get a better view. All you really accomplish is confusing the issue so that no one knows what exactly is supposed to be happening.
Cheers, 🙂

June 25, 2013 9:53 pm

I followed the citations to the Kiehl 2007 paper and found some discussion of the results. It seems to me you have lit upon a question asked in many climate papers regarding the spread of the models and the inverse relationship of forcing and climate sensitivity. This question has not been resolved, as far as I could discover. I think your post is a good pointer to one of the uncertainties in 20th-century modeling. Your conclusions may be a little more emphatic than is warranted, and they’ve certainly been interpreted well beyond the bounds of reasonableness in the comments beneath, but it’s good to read a fairly reasonably discussed aspect of climate modeling on this site. Thanks.

June 25, 2013 10:21 pm

Shawnhet “As Willis shows above, if one or more models is on the right track it would relatively straightforward to use it as a basis for evolutionarily approaching a better view of the climate.”
I don’t think this is a valid approach. Some of the models in the ensemble will be closer to the measured truth than others. That is the nature of classifying a bunch of things by some arbitrary criterion. The problem is that the models that have apparently been more accurate are still likely to be getting it right for the wrong reasons.
It’s actually exactly the same problem palaeoclimatologists have when selecting their proxies. They can’t take the ones that match the observed temperatures without introducing bias (and creating hockey sticks, as it turns out). It’s not valid to select on the criterion you’re trying to measure, and this is very well known.
So in fact the best thing they can do is take the lot, warts and all. I get the impression they don’t do that, though, as models have to perform well enough to get into the club, so there is a selection criterion up front, and it’s incorrectly based on the thing they’re trying to measure.

June 25, 2013 11:27 pm

If I understood what what’s being done here, the models are fit to the global average temperature anomaly (GATA) to tune them.

You have misunderstood. It is speculated (but not demonstrated) that parametrising various aspects of the climate system based on observed data (not temperature, not trends) may have incorporated portions of the 20th-century trend – not deliberately, but because data for some components of sub-processes within the atmosphere may change over time. (This is what I have gleaned from a cursory look at some of the literature, so do your own checking in case I’ve got it wrong.)
Hindcasts are definitely not tuned to observed temperature anomalies, as Willis points out. His own theory is similar to ones I’ve read in the literature, although he seems to think that it is the process of filtering out forced model runs that results in the better hindcasts, and that this may explain the question of the inverse forcing/climate-sensitivity results across models. His view seems as speculative as the others I’ve read, and as I’ve yet to see a detailed analysis for any of the hypotheses – picking apart individual climate models, comparing and contrasting – I don’t think his can be dismissed out of hand.
Hopefully someone with much better knowledge of climate models will weigh in.

June 26, 2013 12:47 am

If Willis is right, this blows the whole thing up. If he is right, he has seen something that has been in plain view, unnoticed, for decades. What has been happening, on this account, is that there are two variables, temperature and forcing. Sensitivity is the relationship between them. If you fix temperature, you have to vary forcing in order to deliver the right values of it. When you do this, you arrive at different values for sensitivity.
There are going to be a very large number of different models with very different levels of forcing. Any particular level of forcing can be accommodated as long as you are prepared to allow sensitivity to vary to the extent that it must in order to deliver the required temperature. Or, put another way, if you have fixed sensitivity, then the only way of delivering the required temperatures is to vary forcing.
People have always alleged that aerosol forcings were being used in this way. Without the right level of negative values for these forcings, it was argued, the models would show wildly excessive temperatures, and thus be invalidated.
Willis’s one-line equation seems to show that something like this is happening in the models. This is why different models with wildly differing sensitivities can output the same temperature results. They have been made to do so by varying the inputs, and when you do this, the only ones available to vary are the forcings, and the result will be varied sensitivity.
It’s quite fundamental, if it’s correct, and if it’s correct it destroys the foundations of the whole thing. I guess it would throw everyone back to two questions. One will be what exactly the forcings have been historically. The second will be whether the lagged linear equation is correct as a model. Because it turns out that is all there has ever been. The Met Office did not need their supercomputer to run it, either; it can be run on any desktop.
Blows the whole thing up if correct.
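The forcing/sensitivity trade-off described above is easy to demonstrate with a toy lagged-linear emulator; the structure and all the numbers below are invented for illustration, not taken from any actual model. Halve the forcing and double the sensitivity, and the temperature history is unchanged:

```python
import math

def emulate(forcing, sensitivity, tau=3.0):
    """Toy climate emulator: exponentially lagged, sensitivity-scaled forcing."""
    a = 1 - math.exp(-1.0 / tau)   # per-step relaxation fraction
    T, out = 0.0, []
    for F in forcing:
        T += a * (sensitivity * F - T)
        out.append(T)
    return out

forcing = [0.1 * n for n in range(50)]   # a made-up forcing ramp

# Two "models": one with low forcing and high sensitivity, one with
# high forcing and low sensitivity.
high_sens = emulate([f / 2 for f in forcing], sensitivity=3.0)
low_sens  = emulate(forcing, sensitivity=1.5)

# Their temperature histories are indistinguishable.
assert all(abs(a - b) < 1e-9 for a, b in zip(high_sens, low_sens))
```

Matching the temperature record therefore cannot, by itself, tell these two "models" apart, which is the whole point of the comment above.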

Ian Wilson
June 26, 2013 8:07 am

Here’s a thousand year model that might be worth investigating further:

June 26, 2013 8:09 am

A couple of points.
Yes, at its core a GCM can be replaced with a simple lag and scaling of the forcings – that is, if you are only interested in the lowest-dimensional output: a single time series.
Next: you should understand now why I say you are not correct when you say the forcings are vastly different.
Other than that what’s shown here is why the models are the best we have in terms of understanding the past and projecting the future.
Certainly not good enough for some policy decisions, but clearly good enough for others.

June 26, 2013 10:48 am

If Willis is right, this blows the whole thing up. If he is right, he has seen something that has been in plain view, unnoticed, for decades

It has not been unnoticed for decades.
These are papers that cite Kiehl 2007,
This is one of the papers Kiehl cites making the point you reiterate.
Kiehl did not originate the observation, nor did it end with him.
Willis has lit upon an uncertainty issue that has been discussed in the literature. The citations above are only a handful of the papers that have engaged the issue.

June 26, 2013 10:51 am

Mosher, yes. Uncertainty is not policy prescriptive, but neither does it recommend eschewing policy. It’s a risk management thing.

Neil Jordan
June 26, 2013 12:49 pm

Re barry says: June 26, 2013 at 10:48 am
“If Willis is right, this blows the whole thing up. If he is right, he has seen something that has been in plain view, unnoticed, for decades
It has not been unnoticed for decades.
These are papers that cite Kiehl 2007,
[… end quote]
See at the bottom of Page 3 of 6:
“The century-long lifetime of atmospheric CO2 and the anticipated future decline in atmospheric aerosols mean that greenhouse gases will inevitably emerge as the dominant forcing of climate change…”
Residence time of CO2 in the atmosphere is on the order of a decade, not ten decades, based on actual measurements and experience from atmospheric weapons testing. Reference “Environmental Radioactivity”, Eisenbud & Gesell, 4th ed., 1997.
For additional information, see my post on July 14, 2012 at 11:26 pm
