New peer-reviewed paper finds the same global forecast model produces different results when run on different computers
Did you ever wonder how spaghetti like this is produced and why there is broad disagreement in the output that increases with time?
Graph above by Dr. Roy Spencer
Increasing mathematical uncertainty from the initial starting conditions is the main reason. But some of it might be due to the fact that while some of the models share common code, they don’t produce the same results with that code, owing to differences in the way CPUs, operating systems, and compilers work. Now, with this paper, we can add software uncertainty to the list of known unknowns about climate and climate modeling.
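To get a feel for how quickly tiny differences can blow up, here is a toy sketch in Python (my own illustration, not code from any forecast model) using the logistic map, a textbook chaotic system. Perturbing the starting value by one part in 10^15, roughly the size of a double-precision rounding error, is typically enough to destroy all agreement within a hundred steps:

# Toy illustration: two runs of a chaotic map whose starting points
# differ by roughly one double-precision rounding error (~1e-15).
r = 3.9  # logistic-map parameter in the chaotic regime

def run(x, steps):
    path = [x]
    for _ in range(steps):
        x = r * x * (1.0 - x)
        path.append(x)
    return path

a = run(0.5, 80)
b = run(0.5 + 1e-15, 80)  # same "model", initial condition off by ~1 ulp

for step in (0, 20, 40, 60, 80):
    print(f"step {step:2d}: |difference| = {abs(a[step] - b[step]):.3e}")

The printed difference typically climbs from about 1e-15 to order one by the final steps; that exponential fan-out is the mechanism behind the spaghetti graph above.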
I got access to the paper yesterday, and its findings were quite eye-opening.
The paper was published on 7/26/13 in Monthly Weather Review, a publication of the American Meteorological Society. It finds that the same global forecast model (evaluated here via the 500-hPa geopotential height field) produces different results when run on different computer hardware and operating systems, with no other changes.
They say that the differences are…
“primarily due to the treatment of rounding errors by the different software systems”
…and that these errors propagate over time, meaning they accumulate.
According to the authors:
“We address the tolerance question using the 500-hPa geopotential height spread for medium range forecasts and the machine ensemble spread for seasonal climate simulations.”
…
“The [hardware & software] system dependency, which is the standard deviation of the 500-hPa geopotential height [areas of high & low pressure] averaged over the globe, increases with time.”
The authors find:
“…the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.”
Many papers have already shown that differences in the initial conditions of climate models produce significantly different projections of climate.
It makes you wonder if some of the catastrophic future projections are simply due to a rounding error.
Here is how they conducted the tests on hardware/software:
Table 1 shows the 20 computing environments including Fortran compilers, parallel communication libraries, and optimization levels of the compilers. The Yonsei University (YSU) Linux cluster is equipped with 12 Intel Xeon CPUs (model name: X5650) per node and supports the PGI and Intel Fortran compilers. The Korea Institute of Science and Technology Information (KISTI; http://www.kisti.re.kr) provides a computing environment with high-performance IBM and SUN platforms. Each platform is equipped with different CPU: Intel Xeon X5570 for KISTI-SUN2 platform, Power5+ processor of Power 595 server for KISTI-IBM1 platform, and Power6 dual-core processor of p5 595 server for KISTI-IBM2 platform. Each machine has a different architecture and approximately five hundred to twenty thousand CPUs.
And here are the results:

While the differences might appear small to some, bear in mind that these differences in standard deviation cover only 10 days’ worth of modeling with a short-term global forecast model, not a decades-out global climate model. Since the software effects observed in this study are cumulative, imagine what the differences might be after years of simulated time, as in GCMs.
Clearly, a long-term evaluation of this effect is needed for the many GCMs used to project future climate, to determine whether they are affected as well and, if so, how much of their output is real and how much is simply accumulated rounding error.
Here is the paper:
An Evaluation of the Software System Dependency of a Global Atmospheric Model
Abstract
This study presents the dependency of the simulation results from a global atmospheric numerical model on machines with different hardware and software systems. The global model program (GMP) of the Global/Regional Integrated Model system (GRIMs) is tested on 10 different computer systems having different central processing unit (CPU) architectures or compilers. There exist differences in the results for different compilers, parallel libraries, and optimization levels, primarily due to the treatment of rounding errors by the different software systems. The system dependency, which is the standard deviation of the 500-hPa geopotential height averaged over the globe, increases with time. However, its fractional tendency, which is the change of the standard deviation relative to the value itself, remains nearly zero with time. In a seasonal prediction framework, the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.
h/t to The Hockey Schtick
Great report and read Anthony.
Tree rings do the same thing, in a way. The roots are grounded, but soil conditions under each tree can vary from tree to tree, giving different results even though the trees are the same species and grow right next to each other or in the same area.
Like computer models run on different software.
Didn’t some warmist researcher make a comment about not trusting data when it conflicted with models?
I don’t know that I thought much about computing errors before, but everyone who wants to be taken seriously in the climate debate should know that even the approximations to the actual equations of science being modeled in the climate models cannot be kept from diverging for very long. This is why I cringe whenever I see the modelers trying to claim that their models use the actual scientific equations underlying Earth’s climate. They don’t, and they can’t.
Rounding errors. Digits at the extreme that are too small to deal with, but that, over time, end up affecting the outcome. Minuscule physical properties that we are unable to measure. The fundamental uncertainty involved in trying to forecast climate (made up of weather, made up of air particles, made up of individual molecules, made up of elementary particles, whose precise location and trajectory cannot, even in principle, both be measured).
Is it just the case that we need better computers, more accurate code, more decimals, more money, more measurements?
The doubt that keeps gnawing at me is whether it is possible — even in principle — to accurately model something like climate over any reasonable length of time.
[snipped to prevent your usual threadjacking. You are wrong- read it and try again – Anthony]
Wild ass guess plus or minus rounding error is still just a WAG. Wake me up when they properly model oceans and other natural climate factors.
Now, did I lambaste people here for years with the mathematical definition of chaos as used by chaos theory, which states that a system is chaotic if and only if its simulation on a finite-resolution iterative model develops an error that grows beyond any constant bound over time? And has this argument been ignored ever since by all warmists (to be fair, our resident warmists surely had their eyes glaze over after the word “mathematical”)?
Yes and yes.
Floating-point numbers are not precise, and computers use floating-point numbers for very large or very small values. This is not a secret, and while everybody who ever took a programming course probably learned it, most of us forget about it unless reminded.
Somebody who works as a programmer in the scientific or engineering fields where floating point numbers are routinely used should be aware of this issue. However, it appears that much of the climate model code is written by anybody but professionally trained computer programmers or software engineers.
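For anyone who wants the reminder, the whole issue fits in a few lines of Python (whose floats are the same IEEE 754 double precision that typical Fortran model code uses):

# Reminder 1: most decimal fractions have no exact binary representation.
print(0.1 + 0.2 == 0.3)       # False
print(f"{0.1 + 0.2:.17f}")    # 0.30000000000000004

# Reminder 2: floating-point addition is not associative, so the ORDER
# of operations matters. A different optimization level or parallel
# library may legally regroup a sum and change the result.
vals = [1.0, 1e100, 1.0, -1e100]
print(sum(vals))                                  # 0.0: each 1.0 is absorbed by 1e100
print((vals[1] + vals[3]) + (vals[0] + vals[2]))  # 2.0: the big terms cancel first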
Honest questions; I’m no longer sure that I fully understand these things:
1. The beginning of each model run is really hindcasting, to tune the models? If so, they missed that too.
2. Each model run is really an average of dozens/hundreds/thousands of runs, depending on how involved each model is?
3. If they hindcast/tuned the models to past temps, then they tuned them to past temps that have been jiggled, and even if the models work, they will never be right?
4. The models have never been right? Not one prediction has come true?
I’m having an old-age moment and doubting what I thought I knew, mainly because it looks to me like people are still debating garbage.
Can’t wait to see what RGB has to say about this!
Butterfly errors producing storms of chaos in the models.
Look, that’s just sad. I haven’t had to fool with it for 20 years, since school, but there are methods for controlling this sort of thing; it’s not as if numerical approximation and analysis are unknown frontiers, for goodness’ sake.
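Quite so. One classic textbook remedy is Kahan (compensated) summation, which tracks the rounding error of each addition and feeds it back in. A minimal Python sketch, purely illustrative:

# Kahan compensated summation: carry the low-order bits that each
# addition would otherwise discard in a separate correction term.
def kahan_sum(values):
    total = 0.0
    comp = 0.0                   # running compensation for lost low bits
    for v in values:
        y = v - comp             # apply the correction from the last step
        t = total + y            # big + small: low bits of y may be lost
        comp = (t - total) - y   # recover exactly what was just lost
        total = t
    return total

vals = [1.0] + [1e-16] * 1_000_000
print(sum(vals))        # 1.0 -- every 1e-16 is rounded away
print(kahan_sum(vals))  # ~1.0000000001 -- the small terms survive

Whether the big model codes use compensated accumulation in their long sums is, of course, a separate question; this only shows that such techniques exist.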
[snip – I nipped his initial comment, because it was wrong and gave him a chance to correct it, you can comment again too – Anthony]
more soylent green says:
July 27, 2013 at 11:15 am
“Somebody who works as a programmer in the scientific or engineering fields where floating point numbers are routinely used should be aware of this issue. However, it appears that much of the climate model code is written by anybody but professionally trained computer programmers or software engineers.”
They just don’t care. Anything goes as long as the funding comes in.
Nick Stokes says:
July 27, 2013 at 11:13 am
[snip – I nipped his initial comment, because it was wrong and gave him a chance to correct it, you can comment again too – Anthony]
[snip – try again Nick, be honest this time
Note this:
– Anthony]
@Nick OK I see where you picked up “climate model” from. It was the initial text I got from “The Hockey Schtick” in the subtitle, which I made a point of saying later was:
So I fixed that reference from THS to “climate model” to say “global forecast model”. Now that you can get past that, we can move on to a relevant discussion.
Thanks for pointing it out, though the way you did it was a bit irritating, so my semi-apologies for the snip. I think maybe you commented without reading what was said later.
Rounding errors? How many decimal places are they working to? 1? 0?
hmm, I imagine this could become the primary selection criterion when purchasing new supercomputers for running climate models. The supercomputer manufacturers will start to highlight this “capability” in their proposals to climate scientists — “The Cray SPX-14 implements a proprietary rounding algorithm that produces results that are 17% more alarming than our nearest competitor’s”. Maybe the benchmarking guys can also get a piece of the action — “The Linpack-ALRM is the industry’s most trusted benchmark for measuring total delivered alarminess”.
Sorry, gotta go. I’ve got an idea for a new company I need to start.
Nick Stokes says:
July 27, 2013 at 11:13 am
Nick, read the UK Met Office’s proud announcement that their climate model is also used for the weather forecast and in so doing validates their model.
I find it difficult to believe that this is due purely to the handling of rounding errors. Have the authors established what it takes to produce identical results? I would start with fixed hardware, OS, compiler, and optimization level. Do two separate runs produce almost exactly matching results? If not, then the variation across different hardware/compilers is irrelevant. I am inclined to believe it is more likely something in the parallel communication library, in determining whether communication between nodes is synchronous. A modern CPU has a cycle time of about 0.3 ns (the inverse of its frequency), while communication between nodes is on the order of 1 microsecond. So there may be an option in the parallel library to accept “safe” or insensitive assumptions about the communication between nodes?
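The ordering effect is easy to demonstrate with a toy sketch in Python (my own construction, not the paper’s code): hold the data fixed and vary only the “domain decomposition”, i.e., how the global sum is grouped across nodes:

import random

# Toy "parallel reduction": each of N nodes sums its own sub-domain,
# then the partial sums are combined. Floating-point addition is not
# associative, so different decompositions typically disagree in the
# last digits even with identical data and hardware.
random.seed(42)
field = [random.uniform(-1e8, 1e8) for _ in range(2**16)]

def parallel_sum(values, nodes):
    chunk = len(values) // nodes
    partials = [sum(values[i*chunk:(i+1)*chunk]) for i in range(nodes)]
    return sum(partials)  # the final "reduce" step

for nodes in (1, 4, 16, 64):
    print(f"{nodes:3d} nodes: {parallel_sum(field, nodes)!r}")

In a real model such last-bit differences feed back into the next timestep, which is where the growth over time comes from.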
Billy Liar says: July 27, 2013 at 11:21 am
“So climate models are immune to rounding errors?”
No. Nothing is. As some have observed above, an atmosphere model is chaotic. It has very sensitive dependence on initial conditions. That’s why forecasts are only good for a few days. All sorts of errors are amplified over time, including, as this paper notes, rounding errors.
That has long been recognised. Climate models, as Latitude notes, have a long run-up period. They no longer attempt to predict from the initial conditions. They follow the patterns of synthetically generated weather, and the amplification of deviation from an initial state is no longer an issue.
Nick – Climate models may not have specific initial conditions (e.g., today’s actual weather conditions used to predict next week’s weather), but they must have initial conditions; that is, starting numerical values for the states of the system. If different software implementations of the same model diverge for the same initial conditions, that is indeed a potential problem.
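A toy Python sketch of that point (my construction, nothing from the paper): start two copies of a chaotic map from the identical initial condition, but write the update in two algebraically equivalent forms, the sort of regrouping a different compiler or optimization level may produce. The one-bit rounding differences are typically amplified just as initial-condition differences are:

# Same initial condition, two mathematically identical update formulas.
# Their roundings differ in the last bit at some step, and the chaotic
# map amplifies that difference just like an initial-condition error.
r = 3.9
x = y = 0.6  # identical starting values

for step in range(1, 81):
    x = r * x * (1.0 - x)   # one evaluation order
    y = r * y - r * y * y   # algebraically the same, rounded differently
    if step % 20 == 0:
        print(f"step {step:2d}: |x - y| = {abs(x - y):.3e}")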
Anthony, you say I’m wrong. What’s your basis for saying these are climate models?
@Nick See comment upthread.
Remarkable!
I expected the server to tell me it was paywalled, but got:
It worked on the second try – and I was told the paper was paywalled.
This is rather interesting. I’m not a floating-point expert, though I got a lesson in all that from one of my first “for the heck of it” programs, which simulated orbital motion on a Univac 1108.
I would think that IEEE floating point should lead to nearly identical results, but I bet the issues lie outside of that. Different runtime libraries use very different algorithms to compute transcendental functions, e.g. trig, exponentials, square roots, etc. Minor differences there will make major changes in the simulation (weather). They might even produce changes in the long-term average of the output (climate).
CPUs stopped getting faster several years ago, coincidentally around the time the climate stopped warming. Supercomputers keep getting faster by using more and more CPUs and spreading the compute load across them. If each CPU simulates its little corner of the mesh and shares its results with its neighbors, the order in which that happens can lead to round-off differences. Also, as the number of CPUs increases, it becomes feasible to use a smaller mesh, with a different range of round-off errors.
The mesh size may not be automatically set by the model, so the latter may not apply. Another sort of problem occurs when using smaller time increments: you compute a small change in temperature and lose accuracy when adding it to a much larger absolute temperature. (Something like that was part of my problem in the orbital simulation, though IIRC things also got worse when I tried to deal with the behavior of tan() near quadrant boundaries. Hey, it was 1968. I haven’t gotten back to it yet.)
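That large-plus-small trap is easy to reproduce (illustrative numbers only, not from any model): add a tiny per-step temperature tendency directly to a large absolute temperature and the tendency can vanish entirely, whereas accumulating it separately keeps it:

# Loss of significance: an increment below half an ulp of the running
# total is rounded away on every single step.
T0 = 300.0       # large absolute temperature (kelvin)
dT = 1e-14       # tiny per-timestep change; ulp of 300.0 is ~5.7e-14
steps = 1_000_000

T = T0
for _ in range(steps):
    T += dT      # 300.0 + 1e-14 rounds straight back to 300.0

print(f"naive total change:    {T - T0:.6e}")      # 0.000000e+00
print(f"separate accumulation: {dT * steps:.6e}")  # 1.000000e-08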
Wow. So we could keep the initial conditions fixed, but vary the decomposition. Edward Lorenz would be impressed.