Guest Post by Willis Eschenbach
Now that my blood pressure has returned to normal after responding to Dr. Trenberth, I returned to thinking about my earlier, somewhat unsatisfying attempt to make a very simple emulation of the GISS Model E (hereinafter GISSE) climate model. I described that attempt here; please see that post for the sources of the datasets used in this exercise.
After some reflection and investigation, I realized that the GISSE model treats all of the forcings equally … except volcanoes. For whatever reason, the GISSE climate model only gives the volcanic forcings about 40% of the weight of the rest of the forcings.
So I took the total forcings and reduced the volcanic forcing by 60%. Then it was easy, because nothing further was required. It turns out that in the GISSE model's temperature hindcast, the temperature change in degrees C is simply 30% of the adjusted forcing change in watts per square metre (W/m2). Figure 1 shows that result:
Figure 1. GISSE climate model hindcast temperatures, compared with temperatures hindcast using the formula ∆T = 0.3 ∆Q, where T is temperature and Q is the total forcing used by the GISSE model, with the volcanic forcing reduced by 60%.
What are the implications of this curious finding?
First, a necessary detour into black boxes. For the purpose of this exercise, I have treated the GISSE model as a black box, for which I know only the inputs (forcings) and outputs (hindcast temperatures). It’s like a detective game, trying to emulate what’s happening inside the GISSE black box without being able to see inside.
The resulting emulation can’t tell us what actually is happening inside the black box. For example, the black box may take the input, divide it by four, and then multiply the result by eight and output that number.
Looking at this from the outside of the black box, what we see is that if we input the number 2, the black box outputs the number 4. We input 3 and get 6, we input 5 and we get 10, and so on. So we conclude that the black box multiplies the input by 2.
Of course, the black box is not actually multiplying the input by 2. It is dividing by 4 and multiplying by 8. But from outside the black box that doesn’t matter. It is effectively multiplying the input by 2. We cannot use the emulation to say what is actually happening inside the black box. But we can say that the black box is functionally equivalent to a black box that multiplies by two. The functional equivalence means that we can replace one black box with the other because they give the same result. It also allows us to discover and state what the first black box is effectively doing. Not what it is actually doing, but what it is effectively doing. I will return to this idea of functional equivalence shortly.
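To make that concrete, here is a trivial sketch in Python (purely illustrative, nothing from the model itself) of two boxes that differ inside but are functionally equivalent from outside:

```python
# Two black boxes with different internals but identical behaviour.
def box_a(x):
    return (x / 4) * 8   # divides by four, then multiplies by eight

def box_b(x):
    return x * 2         # simply multiplies by two

# From outside, the two boxes are interchangeable:
assert all(box_a(x) == box_b(x) for x in [2, 3, 5, 7.5])
```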
METHODS
Let me describe what I have done to get to the conclusions in Figure 1. First, I did a multiple linear regression using all the forcings, to see if the GISSE temperature hindcast could be expressed as a linear combination of the forcing inputs. It can, with an r^2 of 0.95. That’s a good fit.
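For readers who want to replicate this step, here is a minimal sketch of such a regression in Python. The array names `forcings` (years by ten columns) and `giss_temps` are placeholders for the datasets described in the earlier post, not anything supplied by GISS:

```python
import numpy as np

# forcings: shape (n_years, 10), one column per individual GISS forcing
# giss_temps: shape (n_years,), the GISSE hindcast temperature anomalies
X = np.column_stack([forcings, np.ones(len(forcings))])  # append an intercept
coeffs, *_ = np.linalg.lstsq(X, giss_temps, rcond=None)

fitted = X @ coeffs
ss_res = np.sum((giss_temps - fitted) ** 2)
ss_tot = np.sum((giss_temps - giss_temps.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot  # about 0.95 for the fit described above
```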
However, that result is almost certainly subject to “overfitting”, because there are ten individual forcings that make up the total. With so many forcings, you end up with lots of parameters, so you can match most anything. This means that the good fit doesn’t mean a lot.
I looked further, and I saw that the total forcing versus temperature match was excellent except for one forcing — the volcanoes. Experimentation showed that the GISSE climate model is underweighting the volcanic forcings by about 60% from the original value, while the rest of the forcings are given full value.
Then I used the total GISS forcing with the appropriately reduced volcanic contribution, and we have the result shown in Figure 1. Temperature change is 30% of the change in the adjusted forcing. Simple as that. It’s a really, really short methods section because what the GISSE model is effectively doing is really, really simple.
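In code, the whole emulation is a couple of lines. Again a sketch, with `total_forcing` and `volcanic_forcing` standing in for the GISS forcing series:

```python
# Give the volcanic forcing only 40% of its original weight,
# leaving the other forcings at full value.
adjusted_forcing = total_forcing - 0.6 * volcanic_forcing

# Temperature change is 30% of the adjusted forcing change (W/m2 -> deg C).
emulated_temps = 0.3 * (adjusted_forcing - adjusted_forcing[0])
```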
DISCUSSION
Now, what are (and aren’t) the implications available within this interesting finding? What does it mean that, to within an accuracy of five hundredths of a degree (0.05°C RMS error), the GISSE model black box is functionally equivalent to a black box that simply multiplies the adjusted forcing by 0.3?
My first implication would have to be that the almost unbelievable complexity of the Model E, with thousands of gridcells and dozens of atmospheric and oceanic levels simulated, and ice and land and lakes and everything else, all of that complexity masks a correspondingly almost unbelievable simplicity. The modellers really weren’t kidding when they said everything else averages out and all that’s left is radiation and temperature. I don’t think the climate works that way … but their model certainly does.
The second implication is an odd one, and quite important. Consider the fact that their temperature change hindcast (in degrees) is simply 0.3 times the forcing change (in watts per meter squared). But that is also a statement of the climate sensitivity, 0.3 degrees per W/m2. Converting this to degrees of warming for a doubling of CO2 gives us (0.3°C per W/m2) times (3.7 W/m2 per doubling of CO2), which yields a climate sensitivity of 1.1°C for a doubling of CO2. This is far below the canonical value given by the GISSE modelers, which is about 0.8°C per W/m2 or about 3°C per doubling.
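The arithmetic, spelled out as a sketch:

```python
sensitivity = 0.3        # deg C per W/m2, from the fit in Figure 1
per_doubling = 3.7       # W/m2 per doubling of CO2, the standard value

print(sensitivity * per_doubling)  # 1.11 deg C per doubling of CO2
print(0.8 * per_doubling)          # ~3 deg C, the canonical GISSE value
```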
The third implication is that there appears to be surprisingly little lag in their system. I can improve the fit of the above model slightly by adding a lag term based on the change in forcing with time, d(Q)/dt. But that only improves the r^2 to 0.95, mainly by clipping the peaks of the volcanic excursions (the temperature drops in e.g. 1885 and 1964). A more complex lag expression could probably improve that further, but with the initial expression having an r^2 of 0.92, that only leaves 0.08 of room for improvement, and some of that is surely random noise.
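A sketch of that lagged fit, reusing the placeholder arrays from the earlier snippets; `np.gradient` stands in for d(Q)/dt:

```python
import numpy as np

dQ_dt = np.gradient(adjusted_forcing)  # year-to-year change in forcing
X = np.column_stack([adjusted_forcing, dQ_dt, np.ones(len(adjusted_forcing))])
coeffs, *_ = np.linalg.lstsq(X, giss_temps, rcond=None)
# The derivative term mostly clips the volcanic spikes (1885, 1964),
# nudging r^2 from about 0.92 to about 0.95.
```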
The fourth implication is that the model slavishly follows the radiative forcings. The model results are a 5-run average, so it is not clear how far an individual model run might stray from the fold. But since the five runs’ temperatures average out so close to 0.3 times the forcings, no individual one of them can be very far from the forcings.
Anyhow, that’s what I get out of the exercise. Further inferences, questions, objections, influences and expansions welcomed, politeness roolz, and please, no speculation about motives. Motives don’t matter.
w.
Joel Shore says:
January 18, 2011 at 5:23 pm
Ah, yes, the intrusions of real life, dang ’em … my apologies, I should’a known.
I can get a better fit using a lagging term; I thought I already said that, and looking back, I see I did. But the improvement is small. I assume that this is for the reason that you cited, that most of the change takes place quickly, with a long tail.
I’ll take a further look at the question. However, I was shocked to see the excellence of the fit I obtained above. I don’t think it means we throw the current generation of models in the trash; I have other, more valid reasons for doing that …
More reporting as the time and the data become available, my condolences on grading the intro physics exams, and of course my best to you,
w.
PS – if you liked this one, you’ll love my upcoming report. Plenty of equations to pull to pieces there …
“CAN WE TRUST CLIMATE MODELS?” is an article at Yale e360 that readers may find interesting.
I quite liked this quote…
You see, two wrongs do make a right. And why not? Models are mathematics, and in mathematics, if you multiply two negatives, you get a positive.
Let’s hope there is an even number of models, or else their output will be wrong.
Jim D says:
“I say this because, and it may be obvious, the volcano forcing is high-frequency spikes, so any kind of time-averaging of the forcing will diminish their effect automatically without need for the fudge factor.”
doh …. (slaps head) …. why didn’t I think of that!
Still, I was halfway there, because I was thinking about what would happen if you added a simple exponential averaging term rather than just a time shift.
Willis
It is not necessary to lose time talking about lags.
What your interesting experiment shows is that there is NO lag in the model.
Actually, as your fit is much better in the recent decades, where the forcings are supposed to have varied much more than at the beginning, it is further proof that there is (almost) no lag in the model.
Everything else about lags seemed rather irrelevant to me.
Argument 1: There is a lag in the real Earth. Might very well be, but you don’t look at the real Earth, you look at a model. The argument is irrelevant to what you do.
Argument 2: There must be a lag in the model. Might very well be, but the output shows a completely lagless behaviour. So whether there is or is not a lag programmed in the model, it behaves as if there were none, for whatever reason. The argument is also irrelevant to what you do.
Argument 3: There is a lag in the model, but it is longer than 120 years, so that you don’t see it. Might very well be, but we have only 120 points. It appears strange anyway that the lag would be longer than the period about which we have at least some half reasonable data. Again an argument irrelevant to what you do.
Lastly, I will stress how important it is that people do the kind of experiments you do.
I must say that I find it flabbergasting (if you have not made some mistake in the experiment) that the output of a multimillion-line program over a huge time period of 120 years reduces with excellent accuracy to a lagless multiplication.
And of course from that follows that such a model has nothing to do with the real Earth even if your purpose was not to examine this part of the problem.
Joel Shore says:
January 18, 2011 at 5:03 pm
You missed the part where they used a reference period (1980 to 1999) different from the year in which the CO2 concentration is held constant (2000).
“The multi-model average warming for all radiative forcing agents held constant at year 2000 (reported earlier for several of the models by Meehl et al., 2005c), is about 0.6°C for the period 2090 to 2099 relative to the 1980 to 1999 reference”
And “about 0.1°C per decade in the first few decades” could mean anything from the IPCC – how about 0.05°C per decade for the first decade and a half, and then not much after that?
A simple chart from the IPCC shows very little lagged warming after concentrations are held constant in 2000 – certainly not 0.1°C per decade. More like between 0.1°C and 0.2°C per century.
http://www.ipcc.ch/graphics/ar4-wg1/jpg/fig-10-4.jpg
Willis Eschenbach says:
January 18, 2011 at 2:55 pm
Thanks for the response, Willis.
I have bad news and good news.
First the bad news…
I replicated your results last night and analysed the error function. Unsurprisingly, it fails every test for homoscedasticity. Even worse, the change in variance is exactly of the shape to be expected (i.e. predicted in hindsight) by your critics, and therefore your analysis is easily rebutted. But wait for the really good news later.
You have replaced a time-dependent response function (delta T), which needs to be integrated forwards, with a single first-year impulse response. One might therefore expect (the Team will argue) that the variance of the residuals should be high at the start of the emulation (because of unaccounted-for effects in your model from the forcings in the “spin-up” of the GISS model), reasonable in the middle (because you are free-fitting your “sensitivity” coefficient of 0.3), and should increase towards the end of the plot as the second-order errors accumulate from your (curtailing) approximation of the response function. Unfortunately for your analysis, these characteristics are exactly what the error statistics actually reveal. So it would be easy to dismiss your 0.3 value as a meaningless artefact of curve fitting which reflects at best only the short-term transient response to the forcing. If you publish in the present form, I am pretty certain this is the response you will get, among other less polite replies.
(And the good news…) HOWEVER, inspired by your approach, I got to thinking about overcoming this problem by actually building a “forward integration” model. I succeeded in doing so in the early hours of this morning. So, instead of replacing the temperature response function with a single year temperature response, I built a model which actually uses a temperature response function with a physics pedigree, and went through the process of numerical integration by superposition. The result is shown here:
http://img408.imageshack.us/img408/7152/imagessuperpositionmode.jpg
Yes, there really are two lines on the first plot, but you have to squint to tell the difference. Adjusted correlation coefficient is 98.5%. Residual sum of squares is 0.056. So how do you like them apples? Not only are the statistics a lot better, but this model cannot be refuted using the arguments you have heard so far concerning transient vs equilibrium effects.
The temperature response function that this uses is the solution to the single heat-capacity energy balance model referenced in Schwartz 2007.
ΔT = λ F (1 - exp(-t/τ))
Where:
– ΔT = the change in temperature over time due to the single impulse forcing F
– F is a single impulse forcing applied at time t=0
– λ is the equilibrium climate sensitivity expressed in deg C per W/m2
– τ is the time equilibration constant (years)
– t is the time (years)
As t goes to infinity this expression unequivocally goes to λ F – the EQUILIBRIUM temperature change associated with the forcing. τ controls the shape of the curve, specifically how fast or slow the equilibrium is achieved.
For any given forcing, we can trivially write the change in temperature in the (n+1)th year after its application, given by
ΔT(n+1) - ΔT(n) = λ F (exp(-n/τ) - exp(-(n+1)/τ))
The total temperature change in any given year is then the sum of all such expressions from previous forcings. Mathematically this is identical to numerical integration by superposition.
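A minimal Python sketch of this superposition (not the spreadsheet itself; the forcing array is a placeholder) looks like this:

```python
import numpy as np

def superpose(forcing, lam, tau):
    """Sum the step responses from each year's change in forcing."""
    n = len(forcing)
    # Treat each year's CHANGE in forcing as a new step held forever;
    # the series is assumed to start in radiative balance.
    steps = np.diff(forcing, prepend=forcing[0])
    temps = np.zeros(n)
    for k in range(n):
        elapsed = np.arange(n - k)                     # years since step k
        temps[k:] += lam * steps[k] * (1 - np.exp(-elapsed / tau))
    return temps
```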
After setting up the programme, I used a brute-force optimiser to select the parameters, minimising the RSS.
The answer was:
λ = 0.335
τ = 2.613
Volcano factor (applied to the forcing) = 0.724
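For anyone wanting to reproduce the fit outside a spreadsheet, here is a minimal sketch of such a brute-force optimisation, assuming scipy and the placeholder forcing arrays from the earlier snippets:

```python
import numpy as np
from scipy.optimize import minimize

# Placeholder series: volcanic forcing separated out from the rest.
nonvolcanic = total_forcing - volcanic_forcing

def rss(params):
    lam, tau, volcano_factor = params
    forcing = nonvolcanic + volcano_factor * volcanic_forcing
    predicted = superpose(forcing, lam, tau)  # from the sketch above
    return np.sum((giss_temps - predicted) ** 2)

result = minimize(rss, x0=[0.3, 2.0, 0.7], method="Nelder-Mead")
lam, tau, volcano_factor = result.x  # ~0.335, ~2.613, ~0.724 as reported
```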
So the GISS E results can be matched to an incredible accuracy with just a 3 parameter model, and as you suspected, the de facto equilibrium climate sensitivity in the GISS E model is much lower than we normally hear about and the model results can be matched with a much shorter equilibration time than we normally hear about.
I have a lot more commentary, including some thoughts on the volcano issue, but I think I should pause here after this mammoth post to see what your thoughts are!
Question to Paul_K from a lay lurker before he attempts to replicate your work:
Is your equation, i.e., ΔT = λ F (1 – exp(-t/ τ)), intended to be the response to an impulse, i.e., to a one-time slug of F watt-years/m^2 of energy at t = 0, or is it the response to a step, i.e., to a forcing component that is zero before t = 0 and thereafter is F watts/m^2 forever? To this layman, it is ΔT(t) = λ F exp(-t/ τ) that would look more like an impulse response (which–after dropping the F–would be convolved with the historical forcings to get the hindcast temperature anomaly ΔT(t)).
I think I’m questioning only the “impulse” nomenclature and not the result.
Re: Bill Illis says:
January 19, 2011 at 4:48 am
“The multi-model average warming for all radiative forcing agents held constant at year 2000 (reported earlier for several of the models by Meehl et al., 2005c), is about 0.6°C for the period 2090 to 2099 relative to the 1980 to 1999 reference”
I see what you mean. That is an extraordinary way to present these data.
For what it’s worth, in the superposition model which I discussed above, there is of course some continued warming, because the effects of forcings towards the end of the 20th century haven’t fully worked through. I tested the model by switching off all forcing changes after 2000, and this results in a further temperature gain of 0.11 deg C by 2020. After that there is negligible temperature change. This seems to be about 60% of the gain reported by the IPCC (eyeballing your graph).
One problem with interpreting the IPCC’s “constant composition” data is that we only know that the GHG forcing (change) has been halted and don’t know whether or not other internal or external forcings are frozen for this sensitivity.
Joe Born says:
January 19, 2011 at 3:00 pm
“Question to Paul_K from a lay lurker before he attempts to replicate your work:
Is your equation, i.e., ΔT = λ F (1 – exp(-t/ τ)), intended to be the response to an impulse, i.e., to a one-time slug of F watt-years/m^2 of energy at t = 0, or is it the response to a step, i.e., to a forcing component that is zero before t = 0 and thereafter is F watts/m^2 forever?”
It is the response to a step change of F W/m2 held forever. So in the calculation each year, it is necessary to take the CHANGE in forcing from the previous year, and use that as the basis for calculating the forward temperature response to be added in.
I will try to post a reference for the Schwartz paper. I know that it is available in pdf.
In terms of replication, I’ll be quite happy to make the code available (it’s just an Excel spreadsheet) and let people play with it, but need some help from Willis (or Anthony?) to do that.
For those wanting to understand the derivation of the temperature response function I used above, the Schwartz 2007 paper is to be found here:
http://www.ecd.bnl.gov/steve/pubs/HeatCapacity.pdf
Damn. Paul K. Speechless. Slick idea using Schwartz.
Here is a thought.
Go get the forcings for the various SRES. These are the future scenarios.
Emulate the projections.
http://www.ipcc-data.org/ddc_co2.html
The real test, I suppose, would be seeing how your model reacts if you take CO2 forcing to zero, as Gavin did in a recent GCM experiment.
Stumped for now. I know that doesn’t help.
Paul_K says:
January 19, 2011 at 12:56 pm
Fascinating. Thanks for the great info. I’ll need a bit of time to digest it, but it certainly a) gives a convincing fit and b) answers the questions about the time lag.
More later,
w.
Paul_K, now that’s impressive!
Now I’m going to ask you a big one: are you willing to make it about four times harder and extend your program? You fit that closely using the three parameters, but can you go back to Willis’s last article, his first stab, with the 10 forcings, and simultaneously fit all 10 forcings with minimal RSS to GISS temps? That is what I did, but just using linear terms and Excel’s Solver to do the grunt work. You just might push the r^2 to 0.99xxx, and it would give us a much better view of which of the 10 forcings are really in play and by how much. Mine even detected some forcings being inversely correlated, while others had minimal effects. A little table of the fitted forcing weights I came up with is here; your method should really tighten them up.
Good work! Impulses … and that makes perfect physical sense as a way to smear out the lags.
Paul_K:
Thanks for the response–and for your excellent work.
I would indeed welcome the spreadsheet, since my attempt at replication suggests either that I improperly disambiguated your explanation or that I was using the wrong input values. (Also, it would be helpful to us laymen if you could suggest tools we might use to obtain the lambda, tau, and volcano-fudge-factor numbers for ourselves.)
Anyway, to demonstrate how imperfectly some of us less-gifted readers may have comprehended your comment, I’ll observe that your λ [1 - exp(-t/τ)] step response (if it is indeed a step response) corresponds to an impulse response of (λ/τ) exp(-t/τ). Since (as I understand it) you chose an approach considerably more complicated than simply convolving that impulse response with the composite input F (where F = sum of nine constituent inputs + volcano fudge factor times the remaining, volcanic constituent, right?), i.e., than simply using ΔT(n+1) = λ F(n+1)/τ + ΔT(n) exp(-1/τ), I assume there’s a subtlety I failed to grasp.
So any hints you find time to give will be welcome.
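(For what it’s worth, the recursion mentioned above can be sketched in a few lines of Python. Whether it matches the superposition sum exactly depends on the year-end discretisation convention, so treat this as illustrative only:)

```python
import numpy as np

def recursive_response(forcing, lam, tau):
    """Discrete form of dT(n+1) = lam*F(n+1)/tau + dT(n)*exp(-1/tau)."""
    temps = np.zeros(len(forcing))
    decay = np.exp(-1.0 / tau)
    for n in range(1, len(forcing)):
        temps[n] = lam * forcing[n] / tau + temps[n - 1] * decay
    return temps
```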
Bill Illis says:
Bill,
I just blew up that graph and did the measurement, and it looks like the central value for the rise from 2000 to 2100 is close to 0.4°C. As you noted, this is a bit off the 0.6°C that they discussed, presumably because their baseline for that number is actually the 20-year period ending in 2000. However, I cannot see how you possibly got 0.1 to 0.2°C of warming over the century from that graph.
Paul_K says:
January 19, 2011 at 3:47 pm
Glad to do what I can. What language is the code in? I’ll take a shot at replicating your elegant method in a spreadsheet by hand, and any code in any language is always welcome by me.
Seems like very nice work, Paul. Makes me think of running it the other direction, seeing if the parameters used by GISS et al. can be found by brute force against the GISSTEMP & GISS forcing data …
The usual argument is that sure, you can replicate the climate models’ results using a few parameters … but you need the complexity of the models in order to determine those parameters.
I suspect, however, that the decades of tuning and refining the climate model to fit the past is merely a slow and cumbersome way to do a brute force determination of lambda, tau, and the volcano factor, using their assumed forcings and the GISSTEMP … I can see work ahead.
w.
Paul_K says:
January 19, 2011 at 3:47 pm
Sorry, I misunderstood. Easiest way is to post it up on the web somewhere, and then just post a link to it.
w.
Paul_K: I agree that Schwartz’s paper is a good starting point. However, there were several comments on that paper and a reply by Schwartz in which he made some changes (and got a higher estimate of the climate sensitivity).
In particular, I think that one of the criticisms of Schwartz was precisely that it diagnosed too low a climate sensitivity when it was applied to temperature data generated by model runs themselves. (In such a case, the actual climate sensitivity of the model is known, so you are testing the method’s ability to correctly reproduce a known answer.) So, I think one would have to understand these issues and how one might correct them before one could diagnose the climate sensitivity of a model (or the real world) using this method.
Willis,
Thanks for the comments. I thought you might like it.
I don’t know how to post a spreadsheet onto a URL accessible from WUWT. I am blog-ignorant. Can you point me to a guide?
Paul
REPLY: Sign up for Google Docs, then it will give you a URL to share that spreadsheet from – Anthony
Steven Mosher,
The problem, Mosh, is that I don’t know what forcings GISS has used in any of its projections. I can estimate the direct forcing from CO2, of course, but I can’t find any statement of other forcings.
I did try, as I reported above, switching off all forcings in 2000, and allowing temperature to equilibrate, but without knowing what GISS has done with aerosols and solar for example, it is difficult to match projected results.
Joe Born,
“Anyway, to demonstrate how imperfectly some of us less-gifted readers may have comprehended your comment, I’ll observe that your [1 – exp(-t/ τ)] λ step response (if it is indeed a step response) corresponds to an impulse response of (λ/τ) exp(-t/ τ).”
No, Joe. I think that the nomenclature is getting you a bit. In climate science the term “impulse forcing” is used to describe a step change in forcing which lasts forever. (I know that this is confusingly different from the typical use of the term in dynamics.) The impulse character is observed in the perturbation of net flux. An impulse forcing F applied to a system in radiative balance results in a perturbation in radiative flux which is F at t=0, and which declines to 0 as t becomes large.

For example, say that the sun suddenly starts spitting out an additional 1 W/m2 in net received flux at TOA and stays at that level. The impulse forcing is F = 1 W/m2. The initial perturbation to the flux balance is equal to F at t=0. However, pretty soon the Earth is going to start to heat up, until eventually the outgoing radiation balances the incoming. At that point, the perturbation in radiative flux is reduced to zero. I don’t like this terminology, but I didn’t invent it. The term impulse is typically used to distinguish the forcing from a continuously changing or cumulative forcing. However, any cumulative forcing can be represented as the sum of a series of impulse forcings, and that’s all I am doing here.

The temperature response function to an impulse forcing is just the temperature response function, and doesn’t need to be differentiated. It does, however, need to be partitioned in time, i.e. expressed as the sum of a series of annual temperature increments. My complicated-looking expression for ΔT(n+1) - ΔT(n) is just calculating these annual increments from the temperature response function.
Paul_K:
You can also sign up for a Dropbox account if you don’t have a Google account already. Or, if you don’t want to sign up for anything and the spreadsheet is small enough, you can e-mail it to me at troy_ca (at) live (dot) com and I’ll just host it in my public Dropbox for you and send the link.
Regardless, as I said at Lucia’s, this seems pretty amazing, and it should be fun to tinker with.
OK, Willis et al,
You should be able to pick up the spreadsheet from this link:
https://spreadsheets.google.com/ccc?key=0AtwVGjhtohA2dEN4ZW9JOG83ejZydFFlUWMwTkpHZ0E&hl=en&authkey=COu7lc8G
The critical worksheet is called “Willis”, and contains a replication of Willis’s fit up to column “R”. To the right of column R are the calculations of the temperature differences to be added year by year. The blue box shows the parameters which control the results.
Paul
Wayne:
You wrote
“…can you go back to Willis’s last article, his first stab, with the 10 forcings and simultaneously fit all 10 forcings to fit with minimal RSS to GISS temps?”
This is a bit challenging. If I separate out all of the forcings for separate characterisation, I would end up looking at a climate sensitivity and an equilibration time constant for each of them, giving me 20 coefficients to fit! I am fairly sure that we couldn’t extract meaningful results this way.
However, there is potentially another way to skin this cat and address a separate question. Do some of the forcings perhaps have a much longer equilibration time and higher sensitivity – which is lost in the grouping of all such forcings?
The simpler way to test this is to abstract each of the forcings in turn and assign it its own parameters, leaving the remainder all grouped. We can then test whether the assignment of the additional two parameters and the loss of two degrees of freedom really does improve the match. This seems a worthwhile test. I will do it and post the results if Willis doesn’t beat me to it.
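A sketch of how that test might be coded, reusing the `superpose` function and the placeholder arrays from earlier in the thread:

```python
import numpy as np
from scipy.optimize import minimize

def rss_single(params, single, rest):
    lam1, tau1, lam2, tau2 = params
    predicted = superpose(single, lam1, tau1) + superpose(rest, lam2, tau2)
    return np.sum((giss_temps - predicted) ** 2)

# Pull each of the 10 forcings out in turn, grouping the other nine.
for i in range(forcings.shape[1]):
    single = forcings[:, i]
    rest = forcings.sum(axis=1) - single
    result = minimize(rss_single, x0=[0.3, 2.5, 0.3, 2.5],
                      args=(single, rest), method="Nelder-Mead")
    print(i, result.fun)  # did the two extra parameters really cut the RSS?
```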
Joel,
You wrote:
“In particular, I think that one of the criticisms of Schwartz was precisely that it diagnosed too low a climate sensitivity when it was applied to temperature data generated by model runs themselves. (In such a case, the actual climate sensitivity of the model is known, so you are testing the method’s ability to correctly reproduce a known answer.)”
Joel, I understand that Schwartz revised his estimate of climate sensitivity in 2009 in response to criticisms of his previous underestimation of autocorrelation in the temperature time series. (His updated value came in at the lower boundary of estimates from the CMIP suite.) I would be very interested in any references to the work you are talking about, since this is central to the issue here. One serious question is this: You generate a number of runs in a climate model, and produce a number of temperature series for a given set of forcings. Now how do you calculate what the equilibrium climate sensitivity really is in the model? If someone in the literature has addressed this question, I would really like to know how it is done. Thanks.