New peer reviewed paper finds the same global forecast model produces different results when run on different computers
Did you ever wonder how spaghetti like this is produced and why there is broad disagreement in the output that increases with time?
Graph above by Dr. Roy Spencer
Increasing mathematical uncertainty from the initial starting conditions is the main reason. But some of it might be due to the fact that, while some of the models share common code, they don’t produce the same results with that code owing to differences in the way CPUs, operating systems, and compilers work. Now, with this paper, we can add software uncertainty to the list of uncertainties that are already known unknowns about climate and climate modeling.
I got access to the paper yesterday, and its findings were quite eye-opening.
The paper was published 7/26/13 in Monthly Weather Review, a publication of the American Meteorological Society. It finds that the same global forecast model (with output compared via the 500-hPa geopotential height) run on different computer hardware and operating systems produces different results, with no other changes.
They say that the differences are…
“primarily due to the treatment of rounding errors by the different software systems”
…and that these errors propagate over time, meaning they accumulate.
According to the authors:
“We address the tolerance question using the 500-hPa geopotential height spread for medium range forecasts and the machine ensemble spread for seasonal climate simulations.”
…
“The [hardware & software] system dependency, which is the standard deviation of the 500-hPa geopotential height [areas of high & low pressure] averaged over the globe, increases with time.”
The authors find:
“…the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.”
Many papers have already shown that small differences in the initial conditions of climate models produce significantly different projections of climate.
It makes you wonder if some of the catastrophic future projections are simply due to a rounding error.
Here is how they conducted the tests on hardware/software:
Table 1 shows the 20 computing environments including Fortran compilers, parallel communication libraries, and optimization levels of the compilers. The Yonsei University (YSU) Linux cluster is equipped with 12 Intel Xeon CPUs (model name: X5650) per node and supports the PGI and Intel Fortran compilers. The Korea Institute of Science and Technology Information (KISTI; http://www.kisti.re.kr) provides a computing environment with high-performance IBM and SUN platforms. Each platform is equipped with different CPU: Intel Xeon X5570 for KISTI-SUN2 platform, Power5+ processor of Power 595 server for KISTI-IBM1 platform, and Power6 dual-core processor of p5 595 server for KISTI-IBM2 platform. Each machine has a different architecture and approximately five hundred to twenty thousand CPUs.
And here are the results:

While the differences might appear small to some, bear in mind that these differences in standard deviation are only for 10 days’ worth of modeling on a short-term global forecast model, not a decades-out global climate model. Since the software effects observed in this study are cumulative, imagine what the differences might be after years of calculation into the future, as we see in GCMs.
Clearly, an evaluation of this effect is needed over the long term for many of the GCMs used to project future climate, to determine whether it also affects those models and, if so, how much of their output is real and how much is simply accumulated rounding error.
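To get a feel for how quickly a difference in rounding alone can grow, here is a minimal sketch in C (not code from the paper, and certainly not from any GCM) that iterates the same simple nonlinear recurrence twice: once with single-precision rounding and once with double-precision rounding, standing in for two hardware/compiler combinations that round differently. The recurrence and starting value are made up purely for illustration.

```c
/* Sketch of the paper's effect in miniature: the same recurrence and the
   same starting value, but two different rounding regimes (float vs.
   double) standing in for two different hardware/compiler combinations.
   The per-step rounding differences are of order 1e-7, yet the two
   "runs" disagree completely within about thirty steps. */
#include <stdio.h>

int main(void)
{
    float  xf = 0.3f;   /* run A: single-precision rounding */
    double xd = 0.3;    /* run B: double-precision rounding */

    for (int n = 0; n <= 50; n++) {
        if (n % 10 == 0)
            printf("step %2d   run A = %.7f   run B = %.7f\n", n, xf, xd);
        xf = 4.0f * xf * (1.0f - xf);   /* logistic map, chaotic regime */
        xd = 4.0  * xd * (1.0 - xd);
    }
    return 0;
}
```

The two “runs” agree to about seven digits at the start and disagree completely within roughly thirty steps; a GCM iterates far more elaborate nonlinear calculations millions of times.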
Here is the paper:
An Evaluation of the Software System Dependency of a Global Atmospheric Model
Abstract
This study presents the dependency of the simulation results from a global atmospheric numerical model on machines with different hardware and software systems. The global model program (GMP) of the Global/Regional Integrated Model system (GRIMs) is tested on 10 different computer systems having different central processing unit (CPU) architectures or compilers. There exist differences in the results for different compilers, parallel libraries, and optimization levels, primarily due to the treatment of rounding errors by the different software systems. The system dependency, which is the standard deviation of the 500-hPa geopotential height averaged over the globe, increases with time. However, its fractional tendency, which is the change of the standard deviation relative to the value itself, remains nearly zero with time. In a seasonal prediction framework, the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.
h/t to The Hockey Schtick

If you are not acquainted with how complex chaotic behavior can be for even the simplest system, look at the section on chaos in http://en.wikipedia.org/wiki/Logistic_map and at http://mathworld.wolfram.com/LogisticMap.html for an introduction. Assuming your mathematical skills exceed Phil Jones’, it’s easy to set up an Excel spreadsheet to play with the logistic map to get a feel for deterministic chaos. The fluid equations used in climate models are guaranteed to be way more complicated than this.
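For readers without a spreadsheet handy, here is a minimal sketch in C of the same experiment (the parameter and starting values are just the usual textbook choices): two starting points that differ by one part in 10^12 become completely different within a few dozen iterations.

```c
/* The logistic map x -> r*x*(1-x) with r = 4 (chaotic regime):
   two starting values that differ by 1e-12 track each other for a
   while and then diverge completely. */
#include <stdio.h>

int main(void)
{
    const double r = 4.0;
    double a = 0.3;            /* first initial condition    */
    double b = 0.3 + 1e-12;    /* second, perturbed by 1e-12 */

    for (int n = 0; n <= 60; n++) {
        if (n % 10 == 0)
            printf("n = %2d   a = %.15f   b = %.15f   diff = %+.3e\n",
                   n, a, b, a - b);
        a = r * a * (1.0 - a);
        b = r * b * (1.0 - b);
    }
    return 0;
}
```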
Alan Watt, Climate Denialist Level 7 says:
July 27, 2013 at 12:09 pm
“All” is a strong word. Software engineers rarely use absolutes.
1.0 can be represented exactly by an IEEE floating point number, so can 2.0, 3.0, and so on up to the precision of the significand.
0.5 and other negative powers of 2 can be represented exactly (up to the range of the exponent), as can their integral multiples – my first example covered multiples of 2^0, i.e. multiples of 1.
On the other hand, 1/10 cannot be represented exactly, nor can 1/3 or 1/7 – we can’t represent the latter two with a decimal floating point number, the best we can do is create a notation for a repeating decimal.
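A short C sketch of the point, for anyone who wants to see it directly (the particular values are arbitrary examples):

```c
/* Exact vs. inexact binary floating-point representation:
   0.5 (a negative power of 2) is stored exactly; 0.1 is not. */
#include <stdio.h>

int main(void)
{
    double half  = 0.5;
    double tenth = 0.1;

    printf("0.5 stored as %.20f\n", half);   /* 0.50000000000000000000    */
    printf("0.1 stored as %.20f\n", tenth);  /* 0.10000000000000000555... */

    /* Consequently, summing ten copies of 0.1 does not give exactly 1.0. */
    double sum = 0.0;
    for (int i = 0; i < 10; i++)
        sum += tenth;
    printf("ten tenths == 1.0 ? %s  (sum = %.20f)\n",
           sum == 1.0 ? "yes" : "no", sum);
    return 0;
}
```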
In practice, this is a moot point – values used in modeling generally don’t have exact representations in any form.
I disagree. Amateur mathematicians have made many contributions to number theory. That statement implies that only research scientists can contribute to climate science. I disagree with that too.
I will agree that numerical programming is rife with surprises and pitfalls.
My first COBOL professor had a saying that I’ve kept in mind ever since.
Programming languages allow humans to communicate with hardware. Only, hardware doesn’t understand languages. Hardware communicates via on-or-off bits accumulated into bytes, bytes accumulated into words, doublewords, and so on.
Systems analysts or system engineers often work in a higher language but must resort to Assembler or even bits and bytes when determining what is happening at hardware level.
Most high level languages are aggregations of lower level language codings that accomplish specific tasks. Compiling a higher level language used to be essential before a computer could actually perform the code’s instructions. Compiling was/is rendering the higher level language into a lower level language that the computer accepts.
Nowadays many higher-level languages are compiled as they are run, so most programmers no longer need to learn assembly or hardware protocols. For many years, assembly language was itself translated into the machine’s own protocol in a similar fashion.
Many modules, function calls, and database calls of the higher-level languages are assemblages of code from a wide and diverse group of individuals. Yes, they were tested, but within the parameters of ‘common usage’. Does anyone consider climate or even atmospheric physics ‘common usage’?
What that translates to, for the uninitiated, is that the higher (or perhaps better phrased, easier) levels of coding are assemblages of pieces of code from many other programmers, each with their own ideas of what was or is needed for their contribution.
Which brings us back to my COBOL professor Dr. Gomez: “Never state implicitly what you can state explicitly!” Meaning: unless the code is well documented, or until it is tested and the results are explicitly known, most code modules and function calls are implicit. Make them explicit.
One more comment: huge linear or multilayered databases are treated as dimensional arrays in a program. The more complex the system, the more dimensions added to the overall design and the more arrays to be processed.
Unsurprising. For a chaotic system this behavior is completely expected. There is a mathematical theorem stating that for each run of a numerical approximation (with roundoff) there is a set of real-world conditions, close to the original ones, that would yield exactly the same real-world result. So having such sensitivity to roundoff doesn’t add a new type of error to the models. It can be dealt with by looking at the sensitivity to initial conditions.
The real world is itself chaotic. Even if we had an absolutely perfect model of the climate on a godlike computer with no roundoff errors, runs with slightly different initial conditions would still give you a pile of spaghetti. The pile of spaghetti is not a sign that the models are defective. The only way to get rid of the spaghetti is to find that damned butterfly and kill it.
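For the curious, here is a minimal sketch of that spaghetti in C, using the textbook Lorenz (1963) convection system and a crude fixed-step integrator; nothing here comes from the paper or from any GCM, and the parameters are the standard textbook ones.

```c
/* Two runs of the Lorenz (1963) convection system whose initial
   conditions differ by 1e-10: indistinguishable at first, then
   completely different -- the "pile of spaghetti" in miniature. */
#include <stdio.h>
#include <math.h>

static void lorenz_step(double *x, double *y, double *z, double dt)
{
    const double sigma = 10.0, rho = 28.0, beta = 8.0 / 3.0;
    double dx = sigma * (*y - *x);          /* derivatives at the current point */
    double dy = *x * (rho - *z) - *y;
    double dz = *x * *y - beta * *z;
    *x += dt * dx;                          /* crude forward-Euler update */
    *y += dt * dy;
    *z += dt * dz;
}

int main(void)
{
    double x1 = 1.0, y1 = 1.0, z1 = 1.0;
    double x2 = 1.0 + 1e-10, y2 = 1.0, z2 = 1.0;   /* tiny perturbation */
    const double dt = 0.001;

    for (int i = 0; i <= 40000; i++) {             /* 40 model time units */
        if (i % 10000 == 0)
            printf("t = %4.1f   x1 = %9.4f   x2 = %9.4f   |dx| = %.3e\n",
                   i * dt, x1, x2, fabs(x1 - x2));
        lorenz_step(&x1, &y1, &z1, dt);
        lorenz_step(&x2, &y2, &z2, dt);
    }
    return 0;
}
```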
ikh says:
July 27, 2013 at 4:06 pm
Scientists (even computer scientists) tend to write code that’s hard to follow and is ill-commented. Don’t get me started about David Parnas and the directions he explored for removing descriptive information from subroutines in a module. He hid so much information about a stack implementation that Alan Perlis couldn’t figure out what he had described. Oops, I got started, sorry! “There are two ways to write error-free programs; only the third one works.” – Alan Perlis
Early Unix code and user level utilities were pretty awful. Engineers who don’t comment their code should be (and often are) stuck with supporting it because no one else understands it.
I was involved in the mini-supercomputer field for a while and was quite surprised at what Fortran had become. Fortran 2003 includes object-oriented constructs. Just because a lot of people are writing code as though they have a Fortran IV compiler should not be cause to denigrate modern Fortran. Note that C does not have OOP elements; you can’t even multiply a pair of matrices together like you can in Fortran with “A = matmul(B, C)”. While C is one of the main languages I use, C is an archaic language. More so than Fortran.
I bet you’ll find a lot of Fortran code in places like Wall Street, it’s not just for R&D any more!
Janice Moore says:
July 27, 2013 at 3:50 pm
“Jonathan Abbott (at 1:11PM)! So, you’re the boss in “Dilbert”!!! At least, you are bright and want to learn, unlike that guy. (and I’m sure you don’t style your hair like he does, either, lol)”
I’ve seen balding men who have their hair styled at the Dairy Queen, but Dilbert’s boss has a double scoop!
Yes, perhaps and no. Even within the same programming language there are different versions and different modules. Extensive testing is needed to determine modules needing change.
If the hardware is different, which it almost certainly is, there are different system implementations, access procedures, storage procedures, times available, and so on ad infinitum.
Just upgrading within a software system to a new version can cause clumps of hair to appear around one’s desk. (Yes, I still have a full head of hair; but can you think of some program operators who are a little shy of hair?)
Your short answer is yes, but this we only assume from his letter, and there may be more, much more, to Professor Salby’s planned move and rebuilding. One rarely plans a new venture intending only to rebuild one’s old venture.
ikh says:
July 27, 2013 at 4:51 pm
As you say, re-ordering means non-dependency. Assuming no CPU erratum, this does not affect the outcome. Determinism is maintained.
That’s why I mentioned race conditions as a possible source of non-deterministic behaviour.
Ric Werme says:
July 27, 2013 at 5:16 pm
“The trajectory is analogous to weather – it has data that can be described as discrete points with numerical values. If some of the coefficients that describe the system change, then the overall appearance will change and that’s analogous to climate.”
First, it’s not clear that the concept of an attractor has any meaning for a high dimensional system like the climate.
Second, even for chaotic systems with, say, three or four dimensions, it’s common for multiple attractors to coexist, parallel worlds in effect. Each attractor has distinct statistical properties and samples different parts of phase space. Worse, the boundaries between the attractors’ “basins of attraction” are fractal. This means that tiny variations in initial conditions lead you unpredictably to one of the attractors, and without infinite precision you can’t tell ahead of time where you’ll wind up.
Are we to believe that a complex fluid system like the coupled atmosphere-ocean system has simpler behavior?
Ah. My mind drifts back to 1978 and the “State Of Computing.”
A graduate-level class (I was an undergraduate at the time) in numerical analysis where we learned about “heat”, i.e. heat buildup in the CPU and memory, and about the growth of round-off error, I/O misreads and miswrites, and byte misrepresentation in memory from one cycle to the next.
And I thought this was going to be a class on ‘Mathematics.’
And it WAS. 🙂
Then came “Endianness”.
From Wikipedia:
“Endianness is important as a low-level attribute of a particular data format. Failure to account for varying endianness across architectures when writing code for mixed platforms can lead to failures and bugs. The term big-endian originally comes from Jonathan Swift’s satirical novel Gulliver’s Travels by way of Danny Cohen in 1980.[1]
[1] Danny Cohen (1980-04-01). On Holy Wars and a Plea for Peace. IEN 137. “…which bit should travel first, the bit from the little end of the word, or the bit from the big end of the word? The followers of the former approach are called the Little-Endians, and the followers of the latter are called the Big-Endians.” Also published in IEEE Computer, October 1981 issue.
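To see what that means at the byte level, here is a small C sketch (the value 0x12345678 is an arbitrary example):

```c
/* Inspect the in-memory byte order of a 32-bit value.
   A little-endian machine (e.g. x86) prints: 78 56 34 12
   A big-endian machine prints:               12 34 56 78  */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t word = 0x12345678;
    const unsigned char *bytes = (const unsigned char *)&word;

    printf("bytes in memory:");
    for (size_t i = 0; i < sizeof word; i++)
        printf(" %02x", bytes[i]);
    printf("\n");

    printf("this machine is %s-endian\n", bytes[0] == 0x78 ? "little" : "big");
    return 0;
}
```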
🙂
Thank you Anthony for reminding me!
Bugs the heck is right. What I want to know is how one can forecast the climate, which depends on weather and whose final result is weather? Climate predictions simplified to input and response sound far too stripped down to work in the real world.
Using your own fractal-modeling analogy: the fractal model is a fractal model; changing the input is not new code nor a new model. Encompassing a fractal, or even a simplified convection model, is not what I consider a climate-encompassing world.
“As you say, re-ordering means non-dependency. Assuming no CPU erratum this does not affect the outcome. Determinism is maintained.”
Indeed. For out-of-order execution to work effectively, particularly when running old code built for in-order CPUs, it has to produce the same results as though the program ran in-order. We realised twenty years ago that relying on the programmer or compiler to deal with the consequences of internal CPU design was a disaster; e.g. some of the early RISC chips where the results of an instruction weren’t available until a couple of instructions later, but there were no interlocks preventing you from trying to read the result of that instruction early and getting garbage instead. One compiler or programmer screwup, and suddenly your program starts doing ‘impossible’ things.
A bad day for the Infinite Gods of the UN IPCC … indeed.
At the bar:
Mix geographers, computing machines, data in a small room with bad ventilation and what do you get ….
Catastrophic Global Anthropogenic Clathrate Gun Bomb Climate Runaway Heating Warming Over Tripping Point exercise in Pong.
The ‘Heating’ only exists in the CPUs of the INTEL little-endian computers and the groins of the hapless geographers (like Micky Mick Mick Mann and Jimmery Jim Jim Hansen) and NOWHERE else.
Hardy har har.
My Name IS Loki!
Say My Name! Say My Name! Say My Name!
As a mathematician by training, in the 60s we were all taught Numerical Analysis, the art of avoiding rounding error by hand and with those newfangled computer things! But MUCH more serious is
CHAOS theory (1996), which showed that the integration of differentiable manifolds is unstable with respect to initial conditions. This is a proven mathematical theorem, which means that:
Chaos: When the present determines the future, but the approximate present does not approximately determine the future.
Models RIP, MFG, omb
ikh says:
July 27, 2013 at 4:06 pm
davidmhoffer says:
If the programmer didn’t take errata into account, the most likely result is that they are ALL wrong.
No. Unless you are programming in Assembler language. It is generally the job of the compiler to deal with CPU bugs.
>>>>>>>>>>>>>>>
Yes it is. Only this kind of code isn’t general at all. Plus, compilers have considerable ability to be tuned to the specific instruction set of a specific CPU. That’s why you can get completely different benchmark results using the same code on the same hardware with the same compiler, just by setting various compiler flags in different ways. If you don’t understand how the compiler deals with instructions or conditions that turn out to have errata associated with them, you are asking for trouble. Your confidence in C as a replacement for Fortran is also misplaced in this type of application. You said it yourself in your own comment: the code is spaghetti code, often in different languages, most likely written by different people at different points in time for some specific purpose at that time, and there’s no documentation to give anyone any clue as to what each section of code is supposed to do. Compile that mess, written over decades, on modern hardware with completely different CPUs and expect a consistent result? Good luck.
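To illustrate the flag-sensitivity point: floating-point addition is not associative, so any optimization setting that licenses re-ordering of a sum (for example -ffast-math on GCC or Clang) can change the bits of the answer. The sketch below fakes such a re-ordering by hand, with made-up numbers chosen to make the effect obvious.

```c
/* Floating-point addition is not associative: the same numbers summed
   in two different orders give two different doubles. Aggressive
   optimization (e.g. -ffast-math) permits exactly this kind of
   re-ordering behind the programmer's back. */
#include <stdio.h>

int main(void)
{
    enum { N = 1000000 };
    double forward = 1.0e16, backward = 0.0;

    for (int i = 0; i < N; i++)
        forward += 1.0;          /* small terms added after the big one:
                                    each addition rounds away to nothing */

    for (int i = 0; i < N; i++)
        backward += 1.0;         /* small terms accumulated first ...    */
    backward += 1.0e16;          /* ... then the big one                 */

    printf("forward    = %.1f\n", forward);
    printf("backward   = %.1f\n", backward);
    printf("difference = %.1f\n", backward - forward);
    return 0;
}
```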
Someone mentioned compilers and hardware platforms as a dangerous beast for FP calculations. About 1000 years ago I was coding numerical algorithms and, when moving code from one computer to another, I got nasty surprises in FP results due to the ‘endianness’ problem (little endian vs. big endian, etc.). Then I left the field, but I guess some of the code embedded in the models is pretty old and has never been reviewed or adapted. I remember a routine on splines I wrote which produced absurd results depending on the hardware platform used. However, the astronomers on the team were using it to smooth data – despite my warning not to trust it 100% – as if my routine were the word of the Lord. Sigh! So long ago!
Subject: Lucky fortune time
From: Ric Werme
Date: Sat, 27 Jul 2013 21:00:01 -0400 (EDT)
To: <werme…>
—— Fortune
One person’s error is another person’s data.
We are trying to model an open system. The nature and effects of the largest single known influence, the sun, are poorly understood. There may be other significant influences, not yet apprehended. The models are basically curve-fitting exercises based on precedent, faith and luck. There is no reason to suppose that any model that depends on our present state of knowledge will have reliable predictive power. It is reasonable to conclude that these horrendously expensive ongoing exercises are simply a waste of resources.
No doubt, the modellers will disagree. However, I seem to recall another serious, hotly contested debate about how many angels could dance on the head of a pin. A century ago respectable intellectuals happily agreed on the natural inferiority of the Negro. How about the briefly fashionable ‘runaway global warming’? (We’ll end up like Venus!) There may be one or two people here who would prefer to forget their involvement in that one.
It seems to be an unfortunate fact that in intellectual discourse (as in most other things) fashion, rather than rigour, is the prime determinant of an idea’s ‘correctness’.
As Edward Lorenz showed in the early 1960s, Earth’s climate is sensitively dependent on initial conditions, the so-called “butterfly effect”. This of course entails not merely chaos theory with its paradigmatic “strange attractors” but Benoit Mandelbrot’s fractal geometry, wherein all complex dynamic systems tend to self-similarity on every scale.
If Global Climate Models (GCMs) indeed are initiating their simulations to some arbitrary Nth decimal, their results will devolve to modified Markov Chains, non-random but indeterminate, solely dependent on the decimal-resolution of an initial “seed.” To say that this invalidates over-elaborate computer runs not just in practice but in math/statistical principle severely understates the case: Over time, absolutely no aspect of such series can have any empirical bearing whatsoever.
Kudos to this paper’s authors. Their results seem obvious, once pointed out, but it takes a certain inquiring –might one say “skeptical”– mindset to even ask such questions.
Worrying about the precision of calculations is a waste of time. Sure, the calculations will be more accurate, but the results, however deviant, are still dependent upon a description of the initial state in which each parameter has in the range of 6 to 10 bits of “resolution” – before homogenisation, which tends to “dampen” the information content, reducing the effective bits of state information.
Internally, the models tend to be a swill of ill-conceived code, apparently written by undergraduates with little appreciation of software engineering, let alone the physical systems that they’re “modelling”. Physical “constants” that are, e.g., only constant in ideal gases are hard-coded to between 10 and 16 bits of precision. Real constants such as π are not infrequently coded as a decimal (float) value with as few as 5 digits. (I knew that was very wrong when I was an undergraduate.) Other formula constants are often lazily hard-coded as integer values. Typically, multiplication or division of DOUBLE (or better) PRECISION variables can give different results depending on the order in which the equations are written and subsequently compiled.
Superficially, that doesn’t make much difference. But as the GCMs are iterative models, it’s the changes in parameters that determine the behaviour of the simulated system, i.e. lots of nearly-equal values are subtracted and the result is used to determine what to do next.
Subtraction of “nearly-equal” values is also a numerical technique for generating random numbers.
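Both of those failure modes are easy to demonstrate in a few lines of C; the constants and values below are made-up examples, not anything taken from a GCM.

```c
/* (1) A hard-coded 5-digit pi throws away ~11 of the ~16 decimal digits
       a double can carry.
   (2) Subtracting two nearly-equal values leaves almost no correct
       digits (catastrophic cancellation). */
#include <stdio.h>

int main(void)
{
    /* (1) Low-precision hard-coded constant vs. full precision. */
    double pi_sloppy = 3.1416;                  /* 5 digits, as lamented above */
    double pi_full   = 3.14159265358979323846;  /* what a double can hold      */
    printf("error from the 5-digit pi: %.2e\n", pi_sloppy - pi_full);

    /* (2) a and b agree in their first 15 digits, so their computed
       difference is dominated by the rounding made when a was stored:
       it prints as ~1.11e-15 even though the decimal values written
       here differ by exactly 1e-15. */
    double a = 1.000000000000001;
    double b = 1.000000000000000;
    printf("a - b = %.17g\n", a - b);
    return 0;
}
```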
I’ve previously commented on the credibility of ensembles of models and model runs.
I repeat what I have said before.
If they hindcast to HadCrud or GISS, the result will ALWAYS be a large overestimate of future temperatures.
None of them are close to the observations, so why should we care if they are close to each other? They are all wrong.
There are a number of additional points that need to be made about modeling and computation:
1. Fortran is used for two reasons: (a) the numerical libraries, e.g. NAG, are best tested there, and (b) the semantics of the language allow compilers to produce better code with more aggressive optimization. For these reasons it is still extensively used in finance and oil exploration … for new work. Different ABIs make it slow to mix Fortran with C and C++.
2. Interpreted languages, which can give you arbitrary precision (e.g. Perl), are 3-5 times too slow.
3. Do not listen to the OO club; once you get into that you bring in huge, unnecessary overhead. That usually doesn’t matter – people are more expensive than machines – but simplicity and speed are key to the linear extrapolation that has to be done by many cores to make these computations hum. Hand-coded assembly can often pay off.
4. The limits are set by chaos and by continual rounding, which moves the initial conditions for each cycle.
MFG, omb
The models work just fine. They produce the propaganda they are intended to produce.
From a computer- and mathematically-challenged old timer: this has been a fascinating and very illuminating post on the fallibilities of the software engineering and computer coding currently being expressed and promoted in the climate models.
It is also these same incompetently coded climate models that are being used for an almost endless array, a whole bandwagon, of well-endowed, taxpayer-funded alarmist research claims in nearly every scientific field.
The software engineers and computer-savvy folk here have very effectively shredded the whole of climate science’s modeling claims, and in doing so have all but demolished the modelers’ claims that they are able to predict the future directions of the global climate through the use of dedicated climate models.
And I may be wrong but through this whole post and particularly the very illuminating comments section nary a climate scientist or climate modeler was to be seen or heard from.
Which says a great deal about the coding skills (or relative lack of any real competency and high-level coding capability) of the various climate-science modelers and the assorted economists, astronomers, meteorologists, geologists, dendrochronologists, etc., who have all been very up front in getting themselves re-badged, and their bios nicely padded out, as so-called Climate Scientists.
All this incompetency in climate science’s modeling has probably cost three quarters of a trillion dollars’ worth of global wealth, and God knows how much suffering and how many lives. Add to that the still only partly known destruction of living standards for so many, both in the wealthy western countries and even more so in the undeveloped world, where development and rising living standards have been held back and even crippled by the near-fanatical belief of the ruling political and watermelon cabals in climate science’s incompetently modeled and increasingly disparaged predictions for the future of the global climate.
We are all paying, and will likely continue to pay for some time, a truly horrific price for the incompetency and ever more apparent ignorance of the self-professed climate scientists and their failed, incompetently coded climate models, on which the entire global warming / climate change / climate extreme meme is based.