Guest Post by Willis Eschenbach
The GISS Model E is the workhorse of NASA’s climate models. I got interested in the GISSE hindcasts of the 20th century because of an interesting post by Lucia over at the Blackboard. She built a simple model (which she calls “Lumpy”) that does a pretty good job of emulating the GISS model results using nothing but the forcings and a time lag. Stephen Mosher points out how to access the NASA data here (with a good discussion), so I went to the NASA site he indicated and got the GISSE results he points to. I plotted them against the GISS version of the global surface air temperature record in Figure 1.
Figure 1. GISSE Global Circulation Model (GCM or “global climate model”) hindcast 1880-2003, and GISS Global Temperature (GISSTemp) Data. Photo shows the new NASA 15,000-processor “Discover” supercomputer. Top speed is 160 trillion floating point operations per second (a unit known by the lovely name of “teraflops”). What it does in a day would take my desktop computer seventeen years.
Now, that all looks impressive. The model hindcast temperatures are a reasonable match to the observed temperatures, both by eyeball and mathematically (R^2 = 0.60). True, it misses the early 20th century warming (1920-1940) entirely, but overall it’s a pretty close fit. And the supercomputer does 160 teraflops. So what could go wrong?
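For those who want to check my work, here is a minimal sketch of the Figure 1 comparison. The file names are placeholders for wherever you save the two annual series; both are in the Excel worksheet linked at the end of this post.

```python
import numpy as np

# Placeholder file names -- substitute wherever you have saved the two annual series
# (both are in the Excel worksheet linked at the end of this post).
model = np.loadtxt("gisse_hindcast_annual.txt")   # GISSE hindcast anomalies, deg C
obs   = np.loadtxt("gisstemp_annual.txt")         # GISSTemp anomalies, deg C

# R^2 is just the squared correlation between the two series
r = np.corrcoef(model, obs)[0, 1]
print(f"R^2 = {r**2:.2f}")   # about 0.60 for the data shown in Figure 1
```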
To try to understand the GISSE model, I got the forcings used for the GISSE simulation. The forcings are yearly averages, so I compared the total forcing to the yearly results of the GISSE model. Figure 2 shows a comparison of the GISSE model hindcast temperatures and a linear regression of those temperatures on the total forcing.
Figure 2. A comparison of the GISSE annual model results with a linear regression of those results on the total forcing. (A “linear regression” finds the scaling of the forcing that best fits the model results.) Total forcing is the sum of all forcings used by the GISSE model, including volcanoes, solar, GHGs, aerosols, and the like. Deep drops in the forcings (and in the model results) are the result of stratospheric aerosols from volcanic eruptions.
Now, to my untutored eye, Fig. 2 has all the hallmarks of a linear model that is missing a constant trend of unknown origin. (The hallmarks are the obvious similarity in shape combined with differing trends and a low R^2.) To see if that was the case, I redid my analysis, this time including a constant trend. As is my custom, I simply included the year of each observation in the regression to capture that trend. That gave me Figure 3.
Figure 3. A comparison of the GISSE annual model results with a regression of those results on the total forcing plus a constant annual trend. Note the very large increase in R^2 compared to Fig. 2, and the near-perfect match of the two datasets.
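Here is a minimal sketch of the regressions behind Figures 2 and 3, again with placeholder file names for the annual total forcing and the annual GISSE results. The coefficients should come out near the values I quote below, but run it yourself rather than taking my word for it.

```python
import numpy as np

years   = np.arange(1880, 2004)                       # the 1880-2003 hindcast period
forcing = np.loadtxt("total_forcing_annual.txt")      # total forcing, W/m2 (placeholder file)
model   = np.loadtxt("gisse_hindcast_annual.txt")     # GISSE results, deg C (placeholder file)

def r_squared(y, fit):
    ss_res = np.sum((y - fit) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Figure 2: model results regressed on the total forcing alone
X2 = np.column_stack([np.ones_like(forcing), forcing])
b2, *_ = np.linalg.lstsq(X2, model, rcond=None)
print("Fig. 2 style fit: R^2 =", round(r_squared(model, X2 @ b2), 2))

# Figure 3: the same regression, plus a constant annual trend (the year itself)
X3 = np.column_stack([np.ones_like(forcing), forcing, years])
b3, *_ = np.linalg.lstsq(X3, model, rcond=None)
print("Fig. 3 style fit: R^2 =", round(r_squared(model, X3 @ b3), 2))
print("  slope on forcing:", round(b3[1], 2), "deg C per W/m2")       # ~0.13 in my analysis
print("  constant trend:  ", round(b3[2] * 100, 2), "deg C/century")  # ~0.25 in my analysis
```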
There are several surprising things in Figure 3, and I’m not sure I see all of the implications of those things yet. The first surprise was how close the model results are to a bozo simple linear response to the forcings plus the passage of time (R^2 = 0.91, average error less than a tenth of a degree). Foolish me, I had the idea that somehow the models were producing some kind of more sophisticated, complex, lagged, non-linear response to the forcings than that.
This almost completely linear response of the GISSE model makes it trivially easy to create IPCC-style “scenarios” of the next hundred years of the climate. We just use our magic GISSE formula, that future temperature change equals 0.13°C times the forcing change (in W/m2) plus a quarter of a degree per century, and we can forecast the temperature change corresponding to any combination of projected future forcings …
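To be clear, the coefficients below are my fitted estimates, not anything NASA publishes; but given those estimates, the entire “scenario generator” fits in a few lines:

```python
# My fitted emulation of the GISSE response (these coefficients are my estimates,
# not NASA's own numbers): 0.13 deg C per W/m2 of forcing change, plus about a
# quarter of a degree per century regardless of forcing.
def emulated_gisse(delta_forcing_wm2, years_elapsed):
    return 0.13 * delta_forcing_wm2 + 0.0025 * years_elapsed

# Example: a hypothetical scenario adding 3.7 W/m2 of forcing over 100 years
print(round(emulated_gisse(3.7, 100), 2), "deg C")   # prints 0.73
```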
Second, this analysis strongly suggests that in the absence of any change in forcing, the GISSE model still warms. This is in agreement with the results of the control runs of the GISSE and other models that I discussed at the end of my post here. The GISSE control runs also showed warming when there was no change in forcing. This is a most unsettling result, particularly since other models showed similar (and in some cases larger) warming in the control runs.
Third, the climate sensitivity shown by the analysis is only 0.13°C per W/m2 (0.5°C per doubling of CO2). This is far below the official NASA estimate of the response of the GISSE model to the forcings. They put the climate sensitivity from the GISSE model at about 0.7°C per W/m2 (2.7°C per doubling of CO2). I do not know why their official number is so different.
I thought the difference in calculated sensitivities might be because their estimate does not take account of the underlying warming trend of the model itself. However, when the analysis is done leaving out the warming trend of the model, I get a sensitivity of 0.34°C per W/m2 (1.3°C per doubling, Fig. 2). So that doesn’t solve the puzzle either. Unless I’ve made a foolish mathematical mistake (always a possibility for anyone, check my work), the sensitivity calculated from the GISSE results is half a degree of warming per doubling of CO2 …
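For reference, the conversion I am using between the two sets of units is the conventional ~3.7 W/m2 of forcing per doubling of CO2; a slightly different value here would only shift the last decimal place.

```python
F_2XCO2 = 3.7   # W/m2 per doubling of CO2 (conventional value)

for label, sens_per_wm2 in [("Fig. 3 fit (with trend)", 0.13),
                            ("Fig. 2 fit (no trend)  ", 0.34)]:
    print(f"{label}: {sens_per_wm2} deg C per W/m2 "
          f"-> {sens_per_wm2 * F_2XCO2:.1f} deg C per doubling of CO2")
```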
Troubled by that analysis, I looked further. The regression on the total forcing is close to the model results, but not exact. Since I was using the sum of the forcings, and some forcings obviously make more difference in their model than others, I decided to remove the volcanic forcing to get a better idea of what else was in the forcing mix. The volcanoes are the only forcing that makes such large changes on a short timescale (months). Removing the volcanoes allowed me to regress each of the other forcings against the model results (with the volcanic signal removed), so that I could see how each one did. Figure 4 shows that result:
Figure 4. All other forcings regressed against GISSE hindcast temperature results after volcano effect is removed. Forcing abbreviations (used in original dataset): W-M_GHGs = Well Mixed Greenhouse Gases; O3 = Ozone; StratH2O = Stratospheric Water Vapor; Solar = Energy From The Sun; LandUse = Changes in Land Use and Land Cover; SnowAlb = Albedo from Changes in Snow Cover; StratAer = Stratospheric Aerosols from volcanos; BC = Black Carbon; ReflAer = Reflective Aerosols; AIE = Aerosol Indirect Effect. Numbers in parentheses show how well the various forcings explain the remaining model results, with 1.0 being a perfect score. (The number is called R squared, usually written R^2) Photo Source
Now, this is again interesting. Once the effect of the volcanoes is removed, there is very little difference in how well the other forcings explain the remainder. With the obvious exception of solar, the R^2 values of most of the forcings are quite similar. The only two that outperform a simple straight line are stratospheric water vapor and GHGs, and that is only by 0.01.
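A sketch of that comparison follows. One simple way to remove the volcanic signal (shown here) is to regress it out of the model results; the numbers in Figure 4 may differ slightly depending on exactly how that step is done. The file names are placeholders, and the forcing columns are assumed to be in the order listed in the Figure 4 caption.

```python
import numpy as np

names    = ["W-M_GHGs", "O3", "StratH2O", "Solar", "LandUse",
            "SnowAlb", "StratAer", "BC", "ReflAer", "AIE"]
forcings = np.loadtxt("gisse_forcings_annual.txt")   # placeholder; one column per forcing, W/m2
model    = np.loadtxt("gisse_hindcast_annual.txt")   # placeholder; GISSE hindcast results, deg C

def r_squared(x, y):
    """R^2 of a one-variable linear fit of y to x (the squared correlation)."""
    return np.corrcoef(x, y)[0, 1] ** 2

# Take out the volcanic signal by regressing it out of the model results
strat_aer = forcings[:, names.index("StratAer")]
slope = np.polyfit(strat_aer, model, 1)[0]
model_novolc = model - slope * strat_aer

# How well does each remaining forcing, on its own, explain what is left?
for i, name in enumerate(names):
    if name != "StratAer":
        print(f"{name:10s} R^2 = {r_squared(forcings[:, i], model_novolc):.2f}")

# ... and, for comparison, a simple straight line in time
years = np.arange(1880, 1880 + len(model))
print(f"{'Year':10s} R^2 = {r_squared(years, model_novolc):.2f}")
```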
I wanted to look at the shape of the forcings to see if I could understand this better. Figure 5 has NASA GISS’s view of the forcings, shown at their actual sizes:
Figure 5: The radiative forcings used by the GISSE model as shown by GISS. SOURCE
Well, that didn’t tell me a lot (not GISS’s fault, just the wrong chart for my purpose), so I took the forcing data, standardized it, and looked at the forcings in a form in which their shapes could be compared. I found that the reason they all fit so well lies in the shape of the forcings. All of them grow slowly (whether in the negative or the positive direction) until 1950, and more quickly after that. To see these shapes, it is necessary to standardize the forcings so that they are all the same size. Figure 6 shows what the forcings used by the model look like after standardization:
Figure 6. Forcings for the GISSE model hindcast 1880-2003. Forcings have been “standardized” (set to a standard deviation of 1.0) and set to start at zero as in Figure 4.
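The “standardization” is nothing exotic; here is a minimal sketch (placeholder file name again):

```python
import numpy as np

forcings = np.loadtxt("gisse_forcings_annual.txt")   # placeholder; one column per forcing

# Scale each forcing to a standard deviation of 1.0 ...
standardized = forcings / forcings.std(axis=0)
# ... and shift each one so that it starts at zero in 1880
standardized = standardized - standardized[0, :]
```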
There are several oddities about their forcings. First, I had assumed that the forcings used were based at least loosely on reality. To make this true, I need to radically redefine “loosely”. You’ll note that by some strange coincidence, many of the forcings go flat from 1990 onwards … loose. Does anyone believe that all those forcings (O3, Landuse, Aerosol Indirect, Aerosol Reflective, Snow Albedo, Black Carbon) really stopped changing in 1990? (It is possible that this is a typographical or other error in the dataset. This idea is supported by the slight post-1990 divergence of the model results from the forcings as seen in Fig. 3)
Next, take a look at the curves for snow albedo and black carbon. It’s hard to see the snow albedo curve, because it is behind the black carbon curve. Why should the shapes of those two curves be nearly identical? … loose.
Next, in many cases the “curves” for the forcings are made up of a few straight lines. Whatever the forcings might or might not be, they are not straight lines.
Next, with the exception of solar and volcanoes, the shape of all of the remaining forcings is very similar. They are all highly correlated, and none of them (including CO2) is much different from a straight line.
Where did these very strange forcings come from? The answer is neatly encompassed in “Twentieth century climate model response and climate sensitivity”, Kiehl, GRL 2007 (emphasis mine):
A large number of climate modeling groups have carried out simulations of the 20th century. These simulations employed a number of forcing agents in the simulations. Although there are established data for the time evolution of well-mixed greenhouse gases [and solar and volcanos although Kiehl doesn’t mention them], there are no established standard datasets for ozone, aerosols or natural forcing factors.
Lest you think that there is at least some factual basis to the GISSE forcings, let’s look again at the black carbon and snow albedo forcings. Black carbon is known to melt snow, and this is an issue in the Arctic, so there is a plausible mechanism to connect the two. This is likely why the shapes of the two are similar in the GISSE forcings. But what about that shape, increasing over the period of analysis? Here’s one of the few actual records of black carbon in the 20th century, from “20th-Century Industrial Black Carbon Emissions Altered Arctic Climate Forcing”, Science Magazine (paywalled):
Figure 7. An ice core record from the Greenland cap showing the amount of black carbon trapped in the ice, year by year. Spikes in the summer are large forest fires.
Note that rather than increasing over the century as GISSE claims, the observed black carbon levels peaked in about 1910-1920, and have been generally decreasing since then.
So in addition to the dozens of parameters that they can tune in the climate models, the GISS folks and the other modelers got to make up some of their own forcings out of whole cloth … and then they get to tell us proudly that their model hindcasts do well at fitting the historical record.
To close, Figure 8 shows the best part, the final part of the game:
Figure 8. ORIGINAL IPCC CAPTION (emphasis mine). A climate model can be used to simulate the temperature changes that occur from both natural and anthropogenic causes. The simulations in a) were done with only natural forcings: solar variation and volcanic activity. In b) only anthropogenic forcings are included: greenhouse gases and sulfate aerosols. In c) both natural and anthropogenic forcings are included. The best match is obtained when both forcings are combined, as in c). Natural forcing alone cannot explain the global warming over the last 50 years. Source
Here is the sting in the tale. They have designed the perfect forcings, and adjusted the model parameters carefully, to match the historical observations. Having done so, the modelers then claim that the fact that their model no longer matches historical observations when you take out some of their forcings means that “natural forcing alone cannot explain” recent warming … what, what?
You mean that if you tune a model with certain inputs, then remove one or more of the inputs used in the tuning, your results are not as good as with all of the inputs included? I’m shocked, I tell you. Who would have guessed?
The IPCC actually says that because the tuned models don’t work well with part of their input removed, this shows that humans are the cause of the warming … not sure what I can say about that.
What I Learned
1. To a very close approximation (R^2 = 0.91, average error less than a tenth of a degree C) the GISS model output can be replicated by a simple linear combination of the total forcing and the elapsed time. Since the climate is known to be a non-linear, chaotic system, this does not bode well for the use of GISSE or other similar models.
2. The GISSE model illustrates that when hindcasting the 20th century, the modelers were free to design their own forcings. This explains why, despite having climate sensitivities ranging from 1.8°C to 4.2°C per doubling of CO2, the various climate models all provide hindcasts which are very close to the historical record. The models are tuned, and the forcings are chosen, to do just that.
3. The GISSE model results show a climate sensitivity of half a degree per doubling of CO2, far below the IPCC value.
4. Most of the assumed GISS forcings vary little from a straight line (except for some of them going flat in 1990).
5. The modelers truly must believe that the future evolution of the climate can be calculated using a simple linear function of the forcings. Me, I misdoubts that …
In closing, let me try to anticipate some objections that people will likely have to this analysis.
1. But that’s not what the GISSE computer is actually doing! It’s doing a whole bunch of really, really complicated mathematical stuff that represents the real climate and requires 160 teraflops to calculate, not some simple equation. This is true. However, since their model results can be replicated so exactly by this simple linear model, we can say that, considered as black boxes, the two models are certainly equivalent, and explore the implications of that equivalence.
2. That’s not a new finding, everyone already knew the models were linear. I also thought the models were linear, but I have never been able to establish this mathematically. I also did not realize how rigid the linearity was.
3. Is there really an inherent linear warming trend built into the model? I don’t know … but there is something in the model that acts just like a built-in inherent linear warming. So in practice, whether the linear warming trend is built-in, or the model just acts as though it is built-in, the outcome is the same. (As a side note, although the high R^2 of 0.91 argues against the possibility of things improving a whole lot by including a simple lagging term, Lucia’s model is worth exploring further.)
4. Is this all a result of bad faith or intentional deception on the part of the modelers? I doubt it very much. I suspect that the choice of forcings and the other parts of the model “jes’ growed”, as Topsy said. My best guess is that this is the result of hundreds of small, incremental decisions and changes made over decades in the forcings, the model code, and the parameters.
5. If what you say is true, why has no one been able to successfully model the system without including anthropogenic forcing?
Glad you asked. Since the GISS model can be represented as a simple linear model, we can use the same kind of model with only natural forcings (a code sketch of this fit appears after this list of objections). Here’s a first cut at that:
Figure 9. Model of the climate using only natural forcings (top panel). The all-forcings model from Figure 3 is included in the lower panel for comparison. Yes, the R^2 with only natural forcings is smaller, but it is still a pretty reasonable model.
6. But, but … you can’t just include a 0.42 degree warming like that! For all practical purposes, GISSE does the same thing only with different numbers, so you’ll have to take that up with them. See the US Supreme Court ruling in the case of Sauce For The Goose vs. Sauce For The Gander.
7. The model’s inherent warming trend doesn’t matter, because the final results for the IPCC scenarios show the change from model control runs, not absolute values. As a result, the warming trend cancels out, and we are left with the variation due to forcings. While this sounds eminently reasonable, consider that if you apply their recommended procedure (subtract the constant inherent warming trend of about 0.25°C per century) to their 20th century hindcast shown above, it gives an incorrect answer … so that argument doesn’t make sense.
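As promised above, here is a minimal sketch of the natural-forcings-only fit behind Figure 9. The file names are placeholders, “natural” here means solar plus volcanic stratospheric aerosols, and this is a simplified sketch of the calculation; the details behind my figure (including which series is used as the target) may differ slightly.

```python
import numpy as np

years    = np.arange(1880, 2004)
names    = ["W-M_GHGs", "O3", "StratH2O", "Solar", "LandUse",
            "SnowAlb", "StratAer", "BC", "ReflAer", "AIE"]
forcings = np.loadtxt("gisse_forcings_annual.txt")   # placeholder; one column per forcing, W/m2
target   = np.loadtxt("gisstemp_annual.txt")         # placeholder; the series being modeled, deg C

# Natural forcings only: solar plus volcanic (stratospheric aerosols)
natural = forcings[:, [names.index("Solar"), names.index("StratAer")]].sum(axis=1)

# Same form as the Figure 3 regression: forcing plus a constant annual trend
X = np.column_stack([np.ones_like(natural), natural, years])
coefs, *_ = np.linalg.lstsq(X, target, rcond=None)
fit = X @ coefs

ss_res = np.sum((target - fit) ** 2)
ss_tot = np.sum((target - target.mean()) ** 2)
print("Natural-forcings-only R^2 =", round(1 - ss_res / ss_tot, 2))
```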
To simplify access to the data, I have put the forcings, the model response, and the GISS temperature datasets online here as an Excel worksheet. The worksheet also contains the calculations used to produce Figure 3.
And as always, the scientific work of a thousand hands continues.
Regards,
w.
[UPDATE: This discussion continues at Where Did I Put That Energy.]
So if someone commits fraud to gain time/access to a supercomputer then is that also considered theft? How much is the time of a supercomputer worth?
Well who’d a thunk it… another nail in the coffin of trying to base policy only on ‘models’, which for all their teraflopiness are really rather simple and by the looks of it not very good at all…
This is a classic example of why the data, methodologies, code, etc. require rigorous scrutiny, quality control, version control, archiving AND access. This kind of analysis can only be done if the data, widely defined, are available. Does anyone really wonder why this is proving to be so difficult to obtain?
This is an insufficiently understood characteristic of complex modeling. After a certain point, the only real impact of adding complexity becomes its ability to conceal from the modelers the basic nature of what they have done.
As in so many other areas, the climate modelers seem here to have magnified a common scientific error to the point of absurd self-parody. If they now choose to go with the usual flat denial response, their absurdity will only be more apparent, and the eventual judgment of history will only be more damning.
I think that instead of saying “a linear regression of the total forcings on those temperatures” you meant to say “a linear regression of the temperatures on those total forcings”. You are predicting the temperatures using the forcings, not the other way around.
FTA:” 2. The GISSE model illustrates that when hindcasting the 20th century, the modelers were free to design their own forcings.”
Which means that, essentially, all it is, is a glorified multivariable curve fit, and the CAGWers have convinced themselves that it is somehow miraculous that the curve fit fits the data, and that this therefore confirms their worst fears.
This is the kind of (non) thinking which brought humankind voodoo dolls and leeches. How depressingly… primitive.
Now, that 0.25 degree increase per century in the absence of any change in forcing is interesting, and really makes any action on GHG’s quite irrelevant, since it implies that the oceans will start boiling before we’re even halfway through the next glacial cycle no matter what we do.
Judith Curry has resumed her thread on Climate model verification and validation: Part II at Climate Etc judithcurry.com/2010/12/18/climate-model-verification-and-validation-part-ii/ Her reason is the interest that an invited paper received at AGU last week. The title of the paper is: “Do Over or Make Do? Climate Models as a Software Development Challenge (Invited)” and is found at adsabs.harvard.edu/abs/2010AGUFMIN14B..01E I reproduce the abstract below. Please delete if there are any IP issues. As several of my friends in the legal profession say, res ipsa loquitur.
“We present the results of a comparative study of the software engineering culture and practices at four different earth system modeling centers: the UK Met Office Hadley Centre, the National Center for Atmospheric Research (NCAR), The Max-Planck-Institut für Meteorologie (MPI-M), and the Institut Pierre Simon Laplace (IPSL). The study investigated the software tools and techniques used at each center to assess their effectiveness. We also investigated how differences in the organizational structures, collaborative relationships, and technical infrastructures constrain the software development and affect software quality. Specific questions for the study included 1) Verification and Validation – What techniques are used to ensure that the code matches the scientists’ understanding of what it should do? How effective are these are at eliminating errors of correctness and errors of understanding? 2) Coordination – How are the contributions from across the modeling community coordinated? For coupled models, how are the differences in the priorities of different, overlapping communities of users addressed? 3) Division of responsibility – How are the responsibilities for coding, verification, and coordination distributed between different roles (scientific, engineering, support) in the organization? 4) Planning and release processes – How do modelers decide on priorities for model development, how do they decide which changes to tackle in a particular release of the model? 5) Debugging – How do scientists debug the models, what types of bugs do they find in their code, and how they find them? The results show that each center has evolved a set of model development practices that are tailored to their needs and organizational constraints. These practices emphasize scientific validity, but tend to neglect other software qualities, and all the centers struggle frequently with software problems. The testing processes are effective at removing software errors prior to release, but the code is hard to understand and hard to change. Software errors and model configuration problems are common during model development, and appear to have a serious impact on scientific productivity. These problems have grown dramatically in recent years with the growth in size and complexity of earth system models. Much of the success in obtaining valid simulations from the models depends on the scientists developing their own code, experimenting with alternatives, running frequent full system tests, and exploring patterns in the results. Blind application of generic software engineering processes is unlikely to work well. Instead, each center needs to lean how to balance the need for better coordination through a more disciplined approach with the freedom to explore, and the value of having scientists work directly with the code. This suggests that each center can learn a lot from comparing their practices with others, but that each might need to develop a different set of best practices.”
Where is the planetary mechanics influence on solar climate which solely dictates our climate?
The Landscheidt Grand Solar Minimum that started in 1990 is grinding to its peak in 2030. And CO2 can do nothing to stop the increasing brutal cold and crop-killing famine that will result. Nothing.
Brilliant paper. Needs to be published far and wide, starting in Journal of Climate or similar, except it’s over their heads!
GIGO….
RayG,
Yes, as I have written about extensively. The state of GISSE, or any of the other models I have inspected, would not come close to passing muster anywhere I have worked (except Gov.). I have been designing and developing software for something closing in on 30 years now; even in the early days we maintained higher levels of controls and scrutiny. Today it is mandatory, or you don’t eat.
Graphs with background photos disrupt and disturb comprehension of the data while adding nothing but prettification. The first photo, with the many colors and sharp contrast, is particularly distracting and is an excellent example of chartjunk:
http://en.wikipedia.org/wiki/Chartjunk
The temperature impact from GHG forcing in Model E follows 4.053 ln(CO2) - 23.0, so it is not quite linear (using CO2 as a proxy for all the GHGs). We are just in a particular part of the formula which is close to linear right now.
They are not playing around with the GHG forcings, it is all the other forcings like Aerosols and the unrealistically high Volcano forcings that are being used for the plugs to match the historical record.
http://img183.imageshack.us/img183/6131/modeleghgvsotherbc9.png
The TempC response per watt/m2 has always bothered me. One needs to assume all the feedbacks will occur to get to the higher numbers often quoted (1 W/m2 of GHG forcing results in an additional 2 W/m2 of water vapour and Albedo feedbacks). Hansen also assumes there is lag as the oceans absorb some of the forcing and then some of the feedbacks like Albedo are more long-term. The response could start out at 0.5C/W/m2 and rise to 0.81C/W/m2 after the lags kick in.
But GISS Model E net forcing was +1.9 W/m2 in 2003 and that would only produce 0.34C/W/m2 of response (including all the feedbacks). After 2003, the oceans stopped absorbing some of the forcing so it might even be falling from this low number. It is probably the actual response that the Earth’s climate gives because I have seen this same number in all the historical climate reconstructions I have done.
GHG doubling: +3.7 W/m2 × 0.34°C/W/m2 = +1.26°C
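(A quick numerical check of that near-linearity and the doubling arithmetic, taking the quoted formula at face value; the coefficients are as quoted above, not independently verified:)

```python
import math

def ghg_temp(co2_ppm):
    # Formula as quoted above; coefficients taken at face value
    return 4.053 * math.log(co2_ppm) - 23.0

# Over the modern CO2 range the logarithm is close to a straight line:
for ppm in (300, 340, 380, 420):
    print(ppm, "ppm ->", round(ghg_temp(ppm), 2), "C")

# The doubling arithmetic from the comment:
print(round(3.7 * 0.34, 2), "C per doubling")   # +3.7 W/m2 x 0.34 C/W/m2
```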
REPLY: Bill I sent you an email a few days ago, but got no response. Check your spam folder – Anthony
An interesting analysis Willis. Somehow it all looks suspiciously like the process of using a simple mechanical model to fit a known data set. This is the technique used to come up with race horse tipping programs based on historic race results. Unscrupulous scammers continue to sell these race tipping programs to gullible punters.
Doesn’t this seem to have a familiar ring to it?
Wow – nice job Willis!
The models are apparently able to reproduce their inputs. Not exactly predictive.
Thanks Willis! Will need to look at the spreadsheet.
Everyone knows that the GISS temperatures are not correct and have been exaggerated by selection of sites with UHI effects and the spreading of temperatures from these sites to areas where there is no measurement.
If I have your comments correct, then it is possible to take out the supposed GHG effect altogether and still be able to model the actual temperatures. This would add to the findings in ice cores and past experimental data (such as compiled by Beck) that CO2 lags temperature and so has no effect on climate (or weather).
Madman2001,
I would have to agree. I personally don’t care for the background images on the charts. I could live without them.
Willis says:
“4. Is this all a result of bad faith or intentional deception on the part of the modelers? I doubt it very much. I suspect that the choice of forcings and the other parts of the model “jes’ growed”, as Topsy said. My best guess is that this is the result of hundreds of small, incremental decisions and changes made over decades in the forcings, the model code, and the parameters.”
IMHO, Hansen and Schneider and the other team leaders wanted to show warming, and their programmers had the task to deliver that warming while doing a good hindcast. The motivation of everybody in the system was to make this happen, by parameters or by inventing the past history of the forcings. Everybody turned a blind eye to it. It would have been the job of QA to find this. There was no QA. Where there is no QA, anything can happen. Oh, we have peer review, but that was rigged.
That being said – the job of the modelers is even easier than i thought. They are all natural born slackers. We’ve all been taken for a ride.
I am wondering what they do with all of those extra teraflops; sounds to me like I could do the same processing on my Wii at 2.5 MIPS (million instructions per second) with equal results (and a little bit cheaper). Maybe those extra teraflops are contributing to catastrophic warming? Perhaps someone should design another model and look into that…
160 teraflops? I could do that with a pencil and some graph paper. Oh and a pencil sharpener. Better make that two pencils. Geez, the expenses just keep piling up!
The use and reference to a 160 teraflops capable machine to add heft to the credibility of models/simulations, calls to mind the tale of a mega-rancher in Texas. Wanting to know why his black cattle ate more grain than his white cattle, he hired a team of experts and leased a couple of Cray computers. After a year of effort, the report concluded that he had more black cattle.
I am not a scientist, I do read quite a bit, and perhaps understand some of it. A computer model can only attempt to simulate reality (however defined), and then, as I understand it, must be verified by actually measuring the reality that was simulated. The KISS principle seems to tell me that if you must make up fudge factors to get the model to work, then the model didn’t simulate that reality at all. We may learn quite a bit about the modelers’ intentions by studying their efforts, but nothing at all about the reality we are studying.
Thanx Mr E, I love a free education.
This validates what Richard S Courtney has been saying all along, I’m sure he’ll be along soon to verify.
Now we await either Gavin (no chance) or Lacis to show up (nil chance)
Cementafriend says:
December 19, 2010 at 3:38 pm
Only in the simplest sense. The problem is that both my model (and apparently the GISS model) contain a built-in trend. This makes their predictive value something like zero.