New peer reviewed paper finds the same global forecast model produces different results when run on different computers
Did you ever wonder how spaghetti like this is produced and why there is broad disagreement in the output that increases with time?
Graph above by Dr. Roy Spencer
Increasing mathematical uncertainty from initial starting conditions is the main reason. But some of it may also be due to the fact that while some of the models share common code, they don’t produce the same results with that code, owing to differences in the way CPUs, operating systems, and compilers handle the arithmetic. Now, with this paper, we can add software uncertainty to the list of known unknowns about climate and climate modeling.
I got access to the paper yesterday, and its findings were quite eye-opening.
The paper was published July 26, 2013, in Monthly Weather Review, a publication of the American Meteorological Society. It finds that the same global forecast model (evaluated here via the 500-hPa geopotential height) run on different computer hardware and operating systems produces different results at the output, with no other changes.
They say that the differences are…
“primarily due to the treatment of rounding errors by the different software systems”
…and that these errors propagate over time, meaning they accumulate.
According to the authors:
“We address the tolerance question using the 500-hPa geopotential height spread for medium range forecasts and the machine ensemble spread for seasonal climate simulations.”
…
“The [hardware & software] system dependency, which is the standard deviation of the 500-hPa geopotential height [areas of high & low pressure] averaged over the globe, increases with time.”
The authors find:
“…the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.”
Many papers have already shown that differences in the initial conditions of climate models produce significantly different projections of climate.
It makes you wonder if some of the catastrophic future projections are simply due to a rounding error.
Here is how they conducted the tests on hardware/software:
Table 1 shows the 20 computing environments including Fortran compilers, parallel communication libraries, and optimization levels of the compilers. The Yonsei University (YSU) Linux cluster is equipped with 12 Intel Xeon CPUs (model name: X5650) per node and supports the PGI and Intel Fortran compilers. The Korea Institute of Science and Technology Information (KISTI; http://www.kisti.re.kr) provides a computing environment with high-performance IBM and SUN platforms. Each platform is equipped with different CPU: Intel Xeon X5570 for KISTI-SUN2 platform, Power5+ processor of Power 595 server for KISTI-IBM1 platform, and Power6 dual-core processor of p5 595 server for KISTI-IBM2 platform. Each machine has a different architecture and approximately five hundred to twenty thousand CPUs.
And here are the results:

While the differences might appear small to some, bear in mind that these differences in standard deviation are for only 10 days’ worth of modeling with a short-term global forecast model, not a decades-out global climate model. Since the software effects they observed in this study are cumulative, imagine what the differences might be after years of calculation into the future, as we see in GCMs.
Clearly, an evaluation of this effect is needed over the long term for many of the GCMs used to project future climate, to determine whether it affects those models as well, and if so, how much of their output is real and how much is simply accumulated rounding error.
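To get a feel for how rounding-level differences can accumulate, here is a minimal sketch of my own (a toy Lorenz-63 system, not any GCM or the paper’s model): two runs that differ by about one part in 10^15, roughly the size of a double-precision rounding difference, eventually diverge completely.

```python
import numpy as np

# A toy sketch (mine, not the paper's model): two runs of the Lorenz-63 system
# that differ only by a perturbation of about one part in 10^15, roughly the
# size of a double-precision rounding difference. In a chaotic system the two
# runs eventually decorrelate completely.

def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz-63 equations (crude, but enough for a demo)."""
    x, y, z = state
    return state + dt * np.array([sigma * (y - x),
                                  x * (rho - z) - y,
                                  x * y - beta * z])

a = np.array([1.0, 1.0, 20.0])
b = a.copy()
b[0] += 1e-15                        # a rounding-error-sized difference

for step in range(1, 6001):
    a, b = lorenz_step(a), lorenz_step(b)
    if step % 1000 == 0:
        print(f"step {step:5d}  |a - b| = {np.linalg.norm(a - b):.3e}")
```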
Here is the paper:
An Evaluation of the Software System Dependency of a Global Atmospheric Model
Abstract
This study presents the dependency of the simulation results from a global atmospheric numerical model on machines with different hardware and software systems. The global model program (GMP) of the Global/Regional Integrated Model system (GRIMs) is tested on 10 different computer systems having different central processing unit (CPU) architectures or compilers. There exist differences in the results for different compilers, parallel libraries, and optimization levels, primarily due to the treatment of rounding errors by the different software systems. The system dependency, which is the standard deviation of the 500-hPa geopotential height averaged over the globe, increases with time. However, its fractional tendency, which is the change of the standard deviation relative to the value itself, remains nearly zero with time. In a seasonal prediction framework, the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.
h/t to The Hockey Schtick

Tom says:
July 27, 2013 at 11:53 am
Well said. And many a decision is made on the basis of overall cost.
Frank K. says: July 27, 2013 at 12:16 pm
“Unfortunately, Nick, climate (as formulated in most GCMs) is an initial value problem. You need initial conditions and the solution will depend greatly on them (particularly given the highly coupled, non-linear system of differential equations being solved).
“They follow patterns of synthetic weather”??
REALLY? Could you expand on that?? I have NEVER heard that one before…”
Here is something that is familiar, and is just a small part of what AOGCMs do: time-varying ocean currents, shown with SST. You see all the well-known effects – Gulf Stream, ENSO, Agulhas. It’s time-varying, with eddies.
If you run that on another computer, the time features won’t match. You’ll see eddies, but not synchronous. That’s because of the accumulation of error. There is no prediction of exactly what the temperature will be at any point in time. But the main patterns will be the same. The real physics being shown is unaffected.
Well, whenever I read about these computer “glitches”, a whole flood of thoughts comes over me, Dirk H’s point on chaotic systems being just one of those concerns. I think about these problems often when driving.
It occurs to me (all the time) that all those traffic lights are programmed by the same sort of people who gave us Micro$oft Windows, the largest of all computer viruses. Well, they keep sending out new viruses every few days too.
So traffic lights are programmed to answer the question: “Which car(s) should I let go (if any) ?”
This results in most traffic lights being mostly red, most of the time, and most cars standing still burning gas very inefficiently.
If they changed the algorithm, to answer the question: “Which car(s) should I stop (if any) ?”
Then most traffic lights, would be mostly green most of the time, and most of the cars would be moving (safely), and conserving gasoline.
So a lot depends not on the programmer, but on the designer of the algorithms.
Let me give you an example from Optical Ray Tracing, to demonstrate the problem.
In lens design, you are dealing with optical surfaces, that most commonly (but not always) are portions of spheres of radius (R). Now sometimes the value of R can be large, even very large.
In a typical camera lens, the difference between a surface with a radius of 100 cm and one with a radius of 1,000 cm is not that much in practice. You even have surfaces of infinite radius, sometimes called “flats” or planes.
Well now we have a problem, because my keyboard doesn’t have an infinity key on it. Well not to worry, we can write a separate routine, to deal with plane surfaces. That doesn’t deal with the 100-1,000 problem.
Well instead, we recognize that it is the CURVATURE of the surface that determines the optical power; not the radius. Moreover, most keyboards, DO have a zero key, to designate the curvature of a plane.
Does it ever occur to anyone that, no matter how small a segment of a circle (or sphere) you take, whether a micron of arc length or a light year of arc length, the curvature is still the same? It NEVER becomes zero.
Well the issue is still not solved. Suppose, I have a portion of a spherical surface of radius of curvature (R), and that zonal portion has an aperture radius of (r).
We can calculate the sag of the surface (from flat) with a simple Pythagorean calculation, which will give us:-
s = R – sqrt (R^2 – r^2) How easy is that ? ………….(1)
Well now we have a real computer problem, because if R is 1,000 mm, and r is 10 mm , then we find that sqrt is 999.95 with an error of about 1.25 E-6.
So our sag is the small difference in two large numbers; a dangerous rounding error opportunity.
So we never use equation (1), quite apart from the infinity problem.
I can multiply (and then divide) by (R + sqrt (R^2 – r^2)) to get :
(R^2 – (R^2 – r^2)) / (R + sqrt (R^2 – r^2)) = r^2 / (R + sqrt (R^2 – r^2))
So now I divide through by (R); well, why not multiply by (C) (= 1/R)?
This gives me s = C r^2 / (1 + sqrt(1 – C^2 r^2)) ……………..(2)
So the small difference problem has vanished, replaced by the sum of two nearly equal numbers, and suddenly the plane surface is no different from any other sphere.
Most geometers might not immediately recognize equation (2) as the equation of a sphere; but to a lens designer; well we live and breathe it.
This is just one example of writing computer algorithms, that are mathematically smart, rather than the red light traffic light codes.
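To see the point above in actual numbers, here is a small sketch (mine, not the commenter’s code) comparing the naive sag formula (1) with the curvature form (2) in single precision, where the cancellation is easy to see:

```python
import numpy as np

# A minimal sketch (not the commenter's code): sag of a spherical surface with
# radius of curvature R = 1000 mm and aperture radius r = 10 mm, computed two ways.
# Form (1) subtracts two nearly equal numbers; form (2) avoids the cancellation.

def sag_naive(R, r, dtype=np.float32):
    R, r = dtype(R), dtype(r)
    return R - np.sqrt(R * R - r * r)              # equation (1): difference of two large numbers

def sag_curvature(R, r, dtype=np.float32):
    C, r, one = dtype(1.0) / dtype(R), dtype(r), dtype(1.0)
    return C * r * r / (one + np.sqrt(one - C * C * r * r))   # equation (2): no cancellation

R, r = 1000.0, 10.0
print("naive (1), float32    :", float(sag_naive(R, r)))
print("curvature (2), float32:", float(sag_curvature(R, r)))
print("reference, float64    :", float(sag_curvature(R, r, dtype=np.float64)))
```

In double precision the discrepancy shrinks by many orders of magnitude, which is exactly why this kind of problem can hide until the code is moved to a platform that treats intermediate precision differently.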
Gary Pearse says:
July 27, 2013 at 12:26 pm
“2) Aren’t the rounding errors distributed normally? Can’t we simply take the mean path through the spaghetti arising from these errors?”
In an iterative model any error amplifies over time. Assuming the Law Of Large Numbers held (which is only true for distributions that depend on only one variable; so the assumption is overly generous, but anyway) we could dampen the error by averaging. Average N^2 models and you dampen the error by a factor of N.
As the error amplifies over time, N must become ever larger to keep the error mean at the end of the simulation under the predefined desired bound.
This gives us a forecasting horizon behind which computational power is insufficient to keep the error under the desired bound.
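A toy check of that scaling, under the stated (and generous) assumption of independent, identically distributed errors:

```python
import numpy as np

# Toy check of the scaling above: with independent, identically distributed
# errors (a generous assumption), the average of N^2 runs has its standard
# error reduced by a factor of N.

rng = np.random.default_rng(0)
sigma = 1.0                                        # spread of a single run's error
for N in (2, 4, 8, 16):
    averages = rng.normal(0.0, sigma, size=(20000, N * N)).mean(axis=1)
    print(f"N = {N:2d}: std of the {N*N:3d}-run average = {averages.std():.4f} "
          f"(theory: {sigma / N:.4f})")
```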
No such examinations or considerations by warmist climate scientists are known to me. All of this has been ignored by climate science. Their motto is
just wing it.
Gary Pearse@12:26.
“1) How can one model any complex phenomenon confidently given such confounding factors? Is there a fix?”
The short answer is no. In a low-dimensional nonlinear system where there are only a few variables, one can model the “attractor” and learn a great deal about how the system behaves. This still doesn’t allow you to make long-term predictions, though short-term predictions are possible with errors that grow exponentially with time. The climate is a very high dimensional system with severe uncertainty about the actual dynamical mechanisms, and mediocre data that covers the atmosphere and oceans very sparsely.
I confess to becoming cross-eyed when I try to understand computers, however I like the idea of using a “rounding error” to escape blame for those times I utterly and totally screw up.
I’d like those of you who are wise to explain the concept further, in terms a layman can understand, so that I might employ it in possible future scenarios involving my wife and the IRS.
Nick Stokes says:
July 27, 2013 at 12:40 pm
“If you run that on another computer, the time features won’t match. You’ll see eddies, but not synchronous. That’s because of the accumulation of error. There is no prediction of exactly what the temperature will be at any point in time. But the main patterns will be the same. The real physics being shown is unaffected.”
We get whirly patterns that look like real whirly patterns, ergo we’re right? Really? That’s it? Now in that case I have a surprise for you – see the first graph in the headpost – the model temperatures don’t look anything like real ones. Ergo your “looks about right” argument can be used by skeptics to say – hmm, yeah, ok, according to the arguments of the warmists, the GCMs are junk because they don’t look right. Well, thank you.
Algorithm based upon integration and random number seeding will produce divergent results.
The random number algorithm will be slightly different on different operating systems and compilers; the seed generation will be different (if it is to be truly random) and the different results each iteration are amplified with the integration algorithms.
Actually, that is why they produce multiple runs in Monte Carlo batches. However, I don’t think it should give us much confidence in the “models”.
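A minimal illustration of the seed sensitivity (my own toy example, a simple Monte Carlo integral rather than a GCM):

```python
import numpy as np

# Toy illustration (mine): the same Monte Carlo integral of f(x) = x^2 on [0, 1]
# (exact value 1/3) estimated with different random seeds. Every seed gives a
# slightly different answer; only the statistics of a batch of runs agree.

def mc_integral(seed, n=100_000):
    rng = np.random.default_rng(seed)
    x = rng.random(n)                # n uniform samples on [0, 1)
    return (x ** 2).mean()           # Monte Carlo estimate of the integral

estimates = [mc_integral(seed) for seed in range(10)]
print("per-seed estimates:", [f"{e:.5f}" for e in estimates])
print(f"batch mean = {np.mean(estimates):.6f}   exact = {1/3:.6f}")
```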
It’s disturbing to hear them blame this on rounding errors. As Ric noted, all of the hardware should conform to the same IEEE standard, which means it will give the same precision regardless. Now, the libraries and compilers are a different beast altogether, and I could see them causing divergent results. Different numerical integration methods could also be at play here. Whatever the ultimate cause, this is a creative definition of the word “robust.”
Nick,
Seriously, you’ve just supplied an excellent example of why I don’t trust some warmists. The real physics being shown is unaffected. Did you go study the code really quickly there to determine what the possible sources of the discrepancies being discussed are? Did you check anything?
I doubt it. As Dirk says, the whirly patterns look pretty good, so … must be right.
At my job, in the areas we really care about and absolutely must not be wrong about, we try pretty hard to assume that everything is wrong until it’s verified as being right. Obviously there is only so far you can take this, but I’ll say this – if climate scientists treated the models as seriously and with as much rigor as engineers writing a fire detection / suppression system for a modern airliner treat their code (for example), you wouldn’t see issues like this.
Important question: Is this why Prof Murry Salby needed resources to rebuild his studies when he moved to Macquarie University?
In this post: http://wattsupwiththat.com/2013/07/08/professor-critical-of-agw-theory-being-disenfranchised-exiled-from-academia-in-australia/
Prof Murry Salby wrote,
.
In the following online discussion this was thought peculiar. It seemed strange to me too but not being an IT guy I made no comment… at least no comment worth remembering.
But now it seems to make sense to me.
Am I right?
Very interesting. I manage a team of software engineers but am new enough to coding to not fully understand some of the discussion here. It appears to be suggested that unless the coding team really know their stuff, differences between models when run on different machines are to be expected. Could someone expand on this a bit or tell me where to start reading?
Don’t you just love those rounding errors? I remember when a major airplane manufacturer upgraded their mainframes back in the ’60s and some parts they designed on the new system no longer fit properly. It turns out both the old system and the new one had errors in their floating point routines that no one had apparently found; the errors between the systems were just different enough. The engineers had apparently learned to work with the old ones, probably without even realizing what or why. When they moved to the new system, their adaptations no longer worked, so the parts no longer fit. Needless to say, that shook things up for a bit.
“primarily due to the treatment of rounding errors by the different software systems”
Why in the bloody hell are they just figuring this out? Those of us who are engineering physicists, engineers, or even straight code programmers are taught this in class and we even learn to write programs to determine the magnitude of these errors. That these people are just now studying this and figuring it out is the height of incompetence!
Geez!
Back in my 286/287 ASM days, I was playing (er, working) with the then-novel Mandelbrot set, which has incredible detail down to a gazillion decimal places. Double precision with the 287 ran only about 17 places, though. As long as I was working in the zero-point-something region of the set, I could get down to the sub-atomic details, but once I got above one point zero, all that fine stuff went away and I was limited to larger patterns. That’s a characteristic of floating point math, but with the models, I don’t think it makes a nickel’s worth of difference. The models are not real climates, the initial values are not real weather values, only approximations, so computing to 40 or 50 decimal places just wastes irreplaceable electrons. Sort of a reverse of the ‘measure with a micrometer, cut with a chainsaw’ idea.
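The loss of fine detail above 1.0 described here is just the absolute spacing of floating-point numbers growing with magnitude; a quick way to see it (a sketch using NumPy’s spacing helper):

```python
import numpy as np

# Quick sketch of the effect described above: the absolute gap between adjacent
# double-precision numbers (one "unit in the last place") grows with magnitude,
# so the same format resolves much finer detail below 1.0 than above it.

for value in (0.3, 0.9, 1.5, 2.0, 4.0):
    print(f"gap between adjacent doubles near {value:>3}: {np.spacing(value):.3e}")
```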
Dan Margulis, the Photoshop guru, did some work to see whether there was an advantage to image processing in 12 bits per primary as opposed to the usual 8 bits, that is, 1024 shades of red, green, or blue as opposed to 256 shades apiece. There’s a similarity to climate in that you have a great many little pieces adding to an overall picture. His conclusion was that, examining at the pixel level, you could see subtle differences, but the overall picture looked the same. A waste of time.
The unreality of the climate models is not due to any imprecision in the calculations or to any failure to translate any particular model into verified code. It’s due to the assumptions behind the models being wrong.
Could someone expand on this a bit or tell me where to start reading?
ALL computer math processing systems use approximations in order to achieve a mathematical result. I just read that in C they are trying to get rid of the term “granularity”, which has to do with rounding errors. Here is what they say:
https://www.securecoding.cert.org/confluence/display/seccode/VOID+Take+granularity+into+account+when+comparing+floating+point+values
P.J. Plauger objected to this rule during the WG14 review of the guidelines, and the committee agreed with his argument. He stated that those who know what they are doing in floating point don’t do equality comparisons except against a known exact value, such as 0.0 or 1.0. Performing a fuzzy comparison would break their code. He said that if a fuzzy comparison would be necessary, then it is because someone has chosen the wrong algorithm and they need to go back and rethink it.
ALL Floating point calculating systems have this problem.
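A short sketch of the point quoted above (my own toy example): binary floating point cannot represent most decimal fractions exactly, so equality tests only make sense against exactly representable values such as 0.0, 0.5, or 1.0.

```python
# Binary floating point cannot represent most decimal fractions exactly, so two
# routes to the "same" value can differ in the last bits. Equality tests are
# only safe against values that are exactly representable, such as 0.0 or 1.0.

a = 0.1 + 0.2
print(a == 0.3)                      # False: each side carries its own rounding error
print(f"{a:.17g}  vs  {0.3:.17g}")   # the difference lives in the 17th digit

b = 0.25 + 0.75
print(b == 1.0)                      # True: 0.25, 0.75, and 1.0 are exact in binary
```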
A few years ago (2008), I tried to get GISS GCM Model-E, one of the climate models used by James Hansen and Gavin Schmidt at NASA, to run under Windows. Since it is written in a combination of Fortran, Perl, and Unix shell scripts, I needed to make a few fairly simple modifications. After simulating about 28 days of weather, it would always crash because the radiation from some source was less than zero. Since inspection revealed that the code was full of tests for other parameters being less than zero, I assumed that these were added as necessary.
Besides the fact that it wouldn’t run, there were a number of other issues.
* Years were exactly 365 days long – no leap years
* Some of the physical constants were different than their current values
* The orbital computation to determine the distance between the Earth and the Sun was wrong
It was when I discovered a specific design error in using Kepler’s equation to compute the orbital position that I quit playing with the code. I wrote a short paper explaining the error, but it was rejected for publication because “no one would be interested”.
Ric Werme says:
July 27, 2013 at 11:35 am
Yes .. except that single precision floating point numbers may use a different number of bits, depending on the compiler and the compile flags.
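Whatever the exact bit width in play, the practical upshot is that the working precision changes accumulated results. A small sketch (illustrative only, not tied to any particular compiler or model):

```python
import numpy as np

# Illustrative sketch: the same accumulation carried out in single and in double
# precision drifts apart, because each 32-bit addition rounds to roughly 7
# decimal digits.

values = np.full(1_000_000, 0.1)          # one million copies of 0.1 (exact sum: 100000)

total32 = np.float32(0.0)
for v in values.astype(np.float32):       # naive running sum in float32
    total32 += v

total64 = values.sum()                    # the same sum in float64
print(f"float32 running sum: {float(total32):.4f}")
print(f"float64 sum:         {total64:.4f}")
```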
Nick Stokes: There is no prediction of exactly what the temperature will be at any point in time. But the main patterns will be the same.
That is what is hoped for: that the distributions of the predicted quantities (e.g., the July 2017, 2018, 2019, 2020 means and standard deviations of temperature and rainfall, etc.) will be the same. Has it been shown to be true? Over a 30-year simulation, there is no reason to expect these small system variations to cancel out. What you’d expect would be a greater and greater divergence of the model from that which is intended to be modeled.
Jonathan Abbott says:
July 27, 2013 at 1:11 pm
“Very interesting. I manage a team of software engineers but am new enough to coding to not fully understand some of the discussion here. It appears to be suggested that unless the coding team really know their stuff, differences between models when run on different machines are to be expected. Could someone expand on this a bit or tell me where to start reading?”
As to the precision question, George E. Smith put it best with his example. Yes, you must know what you’re doing when working with number formats. The first Ariane 5 rocket was lost because they copied an algorithm for the stabilization from a smaller Ariane and didn’t have the budget to re-test it. It turned out that the bigger mass of the new rocket led to an overflow. The algorithm was fine for the small rocket but not for the big one. All that would have been necessary was going from, I think, a 16-bit integer to a 32-bit integer or something like that; I think it wasn’t even a floating-point number.
Didn’t test; lost the rocket.
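The failure mode described here is easy to reproduce in miniature (the numbers below are made up, not Ariane’s actual values):

```python
import numpy as np

# Miniature sketch of the failure mode described above (the numbers are made up,
# not Ariane's actual values): a quantity that fits comfortably in a signed
# 16-bit integer for a smaller, slower vehicle wraps around for a faster one.

velocity = np.array([25_000], dtype=np.int16)            # fits: int16 holds -32768 .. 32767
print("small rocket, int16:", velocity[0])
print("big rocket,   int16:", (velocity * 2)[0])         # 50,000 silently wraps to a negative value
print("big rocket,   int32:", (velocity.astype(np.int32) * 2)[0])   # a wider type has no problem
```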
Jonathan Abbott says:
July 27, 2013 at 1:11 pm
” where to start reading?”
Look up the professor who designed the IEEE 754 floating point standard (William Kahan). He has written very good articles and books about it.
Dennis Ray Wingo: That these people are just now studying this and figuring it out is the height of incompetence!
It does strike me as rather late in the game for this. Imagine, for the sake of argument, that a patent application or medical device application depended on the validation of this code.
Edward Lorenz came to pretty much the same conclusion, regarding almost exactly the same computational problem, almost 50 years ago; this, as the Warmistas would say, is “settled science”.
Climate is a stochastic process, driven by the laws of random probability. To emulate this, a GCM will need a random number generator. There must be thousands of random number generator algorithms in use in computer systems, but none of them are anywhere near perfect. To get around this, you set constraints for the upper and lower limits of the number returned by the random number generator, and if the return value doesn’t pass the test, you ask the random number generator to try again. So to debug an unstable GCM, the first thing I’d look at is the constraints.
Some variation between identical programs running on different hardware and software platforms is to be expected. The usual way to seed a random number generator is from the system clock. Small differences in system speed can yield large departures.
Then there’s CROE, or cumulative round-off error. This was a big problem back in the 1950s, but it has since been solved, so that CROE will only occur once in maybe 10 to the 20th calculations. However, a typical GCM will run on a superfast computer for 20, 30, 40 or more days and perform gazillions of calculations. Inevitably CROE is going to happen several times. So the second place I’d look at in an unstable GCM is the waypoints where numbers are examined for reasonableness.
Normally, as I said, the odds of a computational error are so small that it’s not worth wasting time checking the results, but here the law of large numbers comes into play.
Jonathan Abbott says:
July 27, 2013 at 1:11 pm
————-
I found this discussion linked from Stack Overflow I think:
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
Paul Linsay says:
July 27, 2013 at 12:50 pm
[…] The climate is a very high dimensional system with severe uncertainty about the actual dynamical mechanisms, and mediocre data that covers the atmosphere and oceans very sparsely.
———————————————————————————————————
Very well stated but, also, very irrelevant.
Whether or not the models have, or are even capable of developing, skill in forecasting in no way impacts their utility, as long as 9.7 out of 10 Climate Scientists (whose owners expressed a preference) accept their output.