Another uncertainty for climate models – different results on different computers using the same code

New peer reviewed paper finds the same global forecast model produces different results when run on different computers

Have you ever wondered how spaghetti graphs like this are produced, and why there is broad disagreement in the output that increases with time?

[Image: CMIP5 models (73 runs) vs. observations, 20N-20S mid-troposphere, 5-year means. Graph by Dr. Roy Spencer]

Increasing mathematical uncertainty from initial starting conditions is the main reason. But some of it might be due to the fact that while some of the models share common code, they don't produce the same results with that code, owing to differences in the way CPUs, operating systems, and compilers work. Now, with this paper, we can add software uncertainty to the list of uncertainties that are already known unknowns about climate and climate modeling.
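To see why a rounding-error-sized difference matters at all, here is a toy illustration (a minimal sketch, not the paper's model): iterate the chaotic logistic map from two starting values that differ by one part in 10^15, roughly the size of a single double-precision rounding error, and watch the difference grow until it is as large as the signal itself.

```python
# Toy chaotic system (not the paper's model): the logistic map at r = 4.
def logistic(x, r=4.0):
    return r * x * (1.0 - x)

a = 0.4           # reference trajectory
b = 0.4 + 1e-15   # same start, perturbed by ~1 rounding error
max_diff = 0.0
for step in range(200):
    a, b = logistic(a), logistic(b)
    max_diff = max(max_diff, abs(a - b))

# The perturbation roughly doubles each step; within ~50 iterations the
# two trajectories are completely decorrelated.
print(max_diff)
```

The same mechanism is why a compiler- or hardware-dependent difference in the last bit can eventually produce a visibly different forecast.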

I got access to the paper yesterday, and its findings were quite eye-opening.

The paper was published 7/26/13 in Monthly Weather Review, a publication of the American Meteorological Society. It finds that the same global forecast model, run on different computer hardware and operating systems with no other changes, produces different results (measured here by the 500-hPa geopotential height).

They say that the differences are…

“primarily due to the treatment of rounding errors by the different software systems”

…and that these errors propagate over time, meaning they accumulate.

According to the authors:

“We address the tolerance question using the 500-hPa geopotential height spread for medium range forecasts and the machine ensemble spread for seasonal climate simulations.”

“The [hardware & software] system dependency, which is the standard deviation of the 500-hPa geopotential height [areas of high & low pressure] averaged over the globe, increases with time.”

The authors find:

“…the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.”

The initial conditions of climate models have already been shown by many papers to produce significantly different projections of climate.

It makes you wonder if some of the catastrophic future projections are simply due to a rounding error.

Here is how they conducted the tests on hardware/software:

Table 1 shows the 20 computing environments, including Fortran compilers, parallel communication libraries, and compiler optimization levels. The Yonsei University (YSU) Linux cluster is equipped with 12 Intel Xeon CPUs (model name: X5650) per node and supports the PGI and Intel Fortran compilers. The Korea Institute of Science and Technology Information (KISTI; http://www.kisti.re.kr) provides a computing environment with high-performance IBM and SUN platforms. Each platform is equipped with a different CPU: an Intel Xeon X5570 for the KISTI-SUN2 platform, a Power5+ processor of the Power 595 server for the KISTI-IBM1 platform, and a Power6 dual-core processor of the p5 595 server for the KISTI-IBM2 platform. Each machine has a different architecture and between approximately five hundred and twenty thousand CPUs.

[Image: Table 1, computing environments]

And here are the results:

[Image: Table 2]
Table 2. Globally averaged standard deviation of the 500-hPa geopotential height eddy (m) from the 10-member ensemble with different initial conditions for a given software system (i.e., initial-condition ensemble), and the corresponding standard deviation from the 10-member ensemble with different software systems for a given initial condition (i.e., software-system ensemble).

While the differences might appear small to some, bear in mind that these differences in standard deviation are for only 10 days' worth of modeling on a short-term global forecast model, not a decades-out global climate model. Since the software effects observed in this study are cumulative, imagine what the differences might be after years of calculation into the future, as we see in GCMs.

Clearly, an evaluation of this effect is needed over the long term for many of the GCMs used to project future climate, to determine if this also affects those models, and if so, how much of their output is real and how much of it is simply accumulated rounding error.

Here is the paper:

An Evaluation of the Software System Dependency of a Global Atmospheric Model

Song-You Hong, Myung-Seo Koo, Jihyeon Jang, Jung-Eun Esther Kim, Hoon Park, Min-Su Joh, Ji-Hoon Kang, and Tae-Jin Oh. Monthly Weather Review (2013), e-View. doi: http://dx.doi.org/10.1175/MWR-D-12-00352.1

Abstract

This study presents the dependency of the simulation results from a global atmospheric numerical model on machines with different hardware and software systems. The global model program (GMP) of the Global/Regional Integrated Model system (GRIMs) is tested on 10 different computer systems having different central processing unit (CPU) architectures or compilers. There exist differences in the results for different compilers, parallel libraries, and optimization levels, primarily due to the treatment of rounding errors by the different software systems. The system dependency, which is the standard deviation of the 500-hPa geopotential height averaged over the globe, increases with time. However, its fractional tendency, which is the change of the standard deviation relative to the value itself, remains nearly zero with time. In a seasonal prediction framework, the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.

h/t to The Hockey Schtick

Mike McMillan
July 27, 2013 9:01 pm

DirkH says: July 27, 2013 at 2:33 pm
… The Mandelbrot equation is not very complex yet chaotic.

The Mandelbrot equation terms are complex numbers, which should qualify it as complex.

Mike McMillan
July 27, 2013 9:49 pm

wsbriggs says: July 27, 2013 at 4:14 pm
For those of sufficient curiosity, get the old Mandelbrot set code, and set it up to use the maximum resolution of your machine. Now take a Julia set and drill down; keep going until you get to the pixels. This is the limit of resolution for your machine. If you're lucky, your version of the Mandelbrot algorithm lets you select double precision floating point numbers, which are subsequently truncated to integers for display, but still give you some billions of possible colors.
DirkH says: July 27, 2013 at 4:22 pm
… What he [wsbriggs] means is, zoom into it until it becomes blocky. The blocks you see are artifacts because your computer has run out of precision. They shouldn't be there if your computer did “real” maths with real numbers. Floating point numbers are a subset of real numbers.

The colors in a Mandelbrot image have nothing to do with the values computed by the Mandelbrot algorithm.
Here’s a rundown on what’s going on.
The Mandelbrot set is a set of points in the complex plane. The points’ x coordinates are real numbers, and their y coordinates are imaginary numbers (numbers with i, the square root of minus 1, attached). Points whose distance from the origin is 2 or greater are not in the set.
The algorithm starts from zero, squares the current value, and adds the coords (a complex number) of the point being tested. Due to the quirky nature of complex number math, we get another complex number, the coords of another point somewhere else. We check to see if that next point is more than 2 away from the origin. If not, we repeat the operation until we get a point 2 or greater away, or until we’ve repeated a selected limit number of times. When that happens, we throw away all the results of our calculations and write down only the number of iterations performed, which becomes the value for the original point.
The color pixel we place on that point is one we’ve already chosen to represent that number of iterations and placed in a lookup table. We can pick as pretty a bunch of colors as we wish, and as many of them as we decide to limit the algorithm iterations to. Since points actually in the set will never exceed 2 regardless of the number of iterations, they’re all one color and not interesting, as are original points greater than 2. All the action is in the border regions, which in truth are not in the Mandelbrot set.
PDF file of the original Scientific American article that got the whole thing started.
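The escape-time scheme described above fits in a few lines of Python (a minimal sketch; the standard iteration is z → z² + c, starting from z = 0):

```python
def escape_count(c, max_iter=100):
    """Iterate z -> z*z + c from z = 0 and return the number of steps
    before |z| exceeds 2, or max_iter if it never escapes (c is then
    taken to be in the Mandelbrot set)."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n          # this count indexes the colour lookup table
        z = z * z + c
    return max_iter

print(escape_count(0 + 0j))   # origin is in the set: returns max_iter (100)
print(escape_count(1 + 1j))   # escapes after 2 iterations
```

The colour painted at each pixel is chosen from a palette indexed by this count; the computed z values themselves are discarded, exactly as the comment describes.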

Surfer Dave
July 27, 2013 10:18 pm

One should read Donald Knuth’s “The Art of Computer Programming”, Section 4.2.2 (“Accuracy of Floating Point Arithmetic”) in Volume 2, to get a feel for how bad it can be. It turns out to be worse when there are addition and subtraction operations on floating point than for multiplication or division, and in fact the normal axioms of arithmetic do not hold. For example, the associative law fails: (a+b)+c is not always the same as a+(b+c), depending on the actual values of a, b, and c.
Knuth also writes about random numbers, and they are an exceptionally difficult thing to produce.
So, when I did look at some GCM source code a few years ago, I could see that there was no attempt to track errors from the underlying floating point system, and that random numbers are used widely (which by itself is an indication that the “models” are suspect). I found instances where repetitive iterations used floating point additions, and it was clear to me that these models would quickly degenerate into nonsense results.
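The non-associativity Knuth describes is easy to demonstrate with the classic decimal-fraction example (a minimal sketch):

```python
# 0.1, 0.2 and 0.3 are not exactly representable in binary floating
# point, so the grouping of the additions changes how the rounding falls.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # rounds to 0.6000000000000001
right = a + (b + c)   # rounds to 0.6
print(left == right)  # False: same operands, different grouping
```

A compiler that reorders these three additions for speed has silently changed the answer in the last bit, which is precisely the seed the paper says grows over a model run.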

Chris Jesshope
July 27, 2013 11:07 pm

Surfer Dave says: (a+b)+c is not always the same as a+(b+c). This is absolutely correct. Deterministic results from a program rely on applying the operations in the same order. Different compilers may generate machine code with different schedules to achieve the same results. This is one source of non-determinism in the results of the same algorithm. Unfortunately there is another, which means that the same machine code may not generate the same results twice on the same machine: the hardware is able to reschedule operations that are not dependent on each other, i.e. the machine instructions may not necessarily execute in the same order between two runs on the same computer.
With a stable system this is not a problem, but where you have amplification of results which lose precision (i.e. differencing two almost equal numbers), the results can be off by orders of magnitude.
I did my PhD modelling semi-conductor equations, very similar field equations to those used in weather and climate modelling. Because of this I have always distrusted the certainty expressed in the results. I was able to fit simulated to measured results just by making small adjustments to certain parameters, where those adjustments were not much larger than the rounding error.

jorgekafkazar
July 28, 2013 12:00 am

“Climate model” is an oxymoron. There is not and never will be a valid model of the Earth’s climate. What climatologists create are climate emulators, conglomerations of algorithms that produce an output with short-run climate-like properties. The result is not Earth climate, but meaningless academic exercises whose primary use is to deceive people, including themselves.

DirkH
July 28, 2013 12:38 am

Mike McMillan says:
July 27, 2013 at 9:49 pm

“DirkH says: July 27, 2013 at 4:22 pm

“… What he [wsbriggs] means is, zoom into it until it becomes blocky. The blocks you see are artifacts because your computer has run out of precision. They shouldn’t be there if your computer did “real” maths with real numbers. Floating point numbers are a subset of real numbers.”

The colors in a Mandelbrot image have nothing to do with the values computed by the Mandelbrot algorithm. […]”

If the values computed are not used, then why do you go on to say

“We check to see if that next point [the value computed in the last step – Dirk] is more than 2 away from the origin. If not, […]”

Your description of the algorithm is correct; but your first sentence is an obvious absurdity.

DirkH
July 28, 2013 12:40 am

Mike McMillan says:
July 27, 2013 at 9:01 pm

“DirkH says: July 27, 2013 at 2:33 pm
… The Mandelbrot equation is not very complex yet chaotic.
The Mandelbrot equation terms are complex numbers, which should qualify it as complex.”

….rimshot.

sophocles
July 28, 2013 12:52 am

Woo hoo! So 0.65 deg C or 0.7 deg C or whatever is just a cumulative rounding error! Can all those PhDs who created this nonsense called CAGW and “Climate Change” please hand in their diplomas for immediate incineration? C’mon, Mikey, this means YOU too!

Louis
July 28, 2013 2:03 am

No problem. Now they have an excuse to make adjustments to the model output data like they already do to temperature data. If they know how to adjust for things like urban heat effects, time-of-day problems, thermometer upgrades, and various errors, it must be a simple matter to adjust model output for rounding errors, right? After yearly adjustments, model predictions will magically match observations quite nicely. It was those pesky rounding errors that caused the models to be so far off in the first place. /sarc

Carbon500
July 28, 2013 2:31 am

The temperatures on the graph’s vertical axis are presumably anomalies – if so, from what period are the deviations please? I don’t want to go to the expense of buying the paper just for this scrap of information! Thank you in anticipation.

PaulM
July 28, 2013 2:57 am

This error wouldn’t be possible outside of academia.
In the real world it is important that the results are correct, so we write lots of unit tests. These are sets of tests for each of the important subroutines in the program, where we pass in a wide range of parameters and compare the result to the expected result. So if you moved your program to a different computer that produces slightly differing results, your unit tests would fail.
Unit tests also ensure that when you make changes to the program that you haven’t broken anything. A lot of software professionals that I know wouldn’t trust the results of any program that didn’t have a comprehensive suite of unit tests.

Chris Jesshope
Reply to  PaulM
July 28, 2013 4:00 am

As floating point operations are not associative, if you change the order of operations you will get different results (and you cannot avoid this; you will get different orderings with different compilers/hardware). Unit tests on floating point have to look at value ranges. Anything outside that range is not necessarily an error, but may indicate an unstable computation.
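In practice that means comparing floating-point results to a tolerance rather than bit-for-bit (a minimal sketch; `assert_close` is a hypothetical helper, though `math.isclose` is the standard library function):

```python
import math

def assert_close(actual, expected, rel_tol=1e-9):
    """Tolerance-based check that survives compiler/hardware reordering
    of floating point operations, unlike an exact == comparison."""
    assert math.isclose(actual, expected, rel_tol=rel_tol), \
        f"{actual} not within {rel_tol} of {expected}"

total = sum([0.1] * 10)
# Exact equality fails: total is 0.9999999999999999, not 1.0 ...
print(total == 1.0)
# ... but a tolerance-based unit test passes.
assert_close(total, 1.0)
```

A range check like this catches genuine bugs while tolerating the last-bit differences that different compilers and machines legitimately produce.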

johnmarshall
July 28, 2013 3:01 am

It only goes to show that formulating government energy policy on model output is a stupid thing to do.

DirkH
July 28, 2013 3:34 am

Carbon500 says:
July 28, 2013 at 2:31 am
“The temperatures on the graph’s vertical axis are presumably anomalies – if so, from what period are the deviations please? I don’t want to go to the expense of buying the paper just for this scrap of information! Thank you in anticipation.”
The graph says it. All trend lines normalized to zero in 1979. No reference period necessary. 1979 is the reference.
I think the graph comes from Dr. Roy Spencer, you might be able to find out more, if necessary, on his blog.
http://www.drroyspencer.com

DirkH
July 28, 2013 3:36 am

Carbon500, the graph is not from the paper. And Anthony does not say it does. He put it there to explain the spaghetti graph concept only.

Nick Stokes
July 28, 2013 3:42 am

PaulM says: July 28, 2013 at 2:57 am
“This error wouldn’t be possible outside of academia.”

It isn’t an error. It is very well known that atmosphere modelling is chaotic (as is reality). Numerical discrepancies of all kinds grow, so that the forecast is no good beyond about 10 days. Rounding errors grow like everything else.
The alternative is no numerical forecast at all.

Mark
July 28, 2013 4:48 am

Dennis Ray Wingo says:
Why in the bloody hell are they just figuring this out? Those of us who are engineering physicists, engineers, or even straight code programmers are taught this in class and we even learn to write programs to determine the magnitude of these errors. That these people are just now studying this and figuring it out is the height of incompetence!
Maybe it has something to do with the mentality of “only a climate scientist is qualified to even comment about climate science”.
Thus an engineer, physicist, computer scientist, etc. who points out such issues is simply called a “denier” and ignored.

Mark
July 28, 2013 5:02 am

Robert Clemenzi says:
Besides the fact that it wouldn’t run, there were a number of other issues.
* Years were exactly 365 days long – no leap years
* Some of the physical constants were different than their current values
* The orbital computation to determine the distance between the Earth and the Sun was wrong

At which point any “rounding errors” resulting from calculations by the machine don’t really matter, since the basic “physics” of the model is fiction.
It was when I discovered a specific design error in using Kepler’s equation to compute the orbital position that I quit playing with the code. I wrote a short paper explaining the error, but it was rejected for publication because “no one would be interested”.
More likely it would cause too much “loss of face”…

ikh
July 28, 2013 5:06 am

Nick Stokes says
“It isn’t an error.”
No Nick, it’s not an error and you are not a Troll. It is a humongous novice error that is easily avoided by using fixed point in integral data types.
Btw, nice to see you admitting that the GCMs are no use beyond 10 days. That means we can throw away all those pesky projections to 2100.
/ikh

ikh
July 28, 2013 5:17 am

DirkH says:
July 27, 2013 at 6:13 pm
“As you say, re-ordering means non-dependency. Assuming no CPU erratum this does not affect the outcome. Determinism is maintained.”
Nope. You are forgetting that re-ordering non-dependent floating point operations can give you different rounding errors. That is not deterministic.
The same is also true in multi-threaded code without race conditions. But if climate scientists don’t even understand the basics of numerical programming, they are not likely to do well in the more complex world of multi-threading.
/ikh
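The reduction-order point can be demonstrated without any threads at all (a minimal sketch): summing the same ten numbers left-to-right and in the pairwise (tree) order a parallel reduce would typically use gives two different answers.

```python
def pairwise_sum(xs):
    # Tree-shaped reduction, the order a parallel reduce typically uses.
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])

xs = [0.1] * 10
seq = sum(xs)           # left-to-right accumulation
par = pairwise_sum(xs)  # tree-order accumulation
print(seq == par)       # False: same numbers, same machine, different order
print(seq, par)
```

This is why a multi-threaded reduction can return a different result on every run even with no race conditions: the summation order, and hence the rounding, changes.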

ikh
July 28, 2013 5:26 am

davidmhoffer: There never has been and never will be a programming language that stops people from writing rubbish code. The advantage of C over Fortran is that it has block structure and strongly typed function signatures. Also, there are a lot of open source tools to help untangle the spaghetti 🙂
/ikh

RC Saumarez
July 28, 2013 5:35 am

Do models work? Are they verifiable?
A modest suggestion for an experiment: Take a very large building and cause a flux of energy through it. Include a large amount of water, some of which is cooled until it is frozen. Instrument the building for temperature, pressure, flows etc.
Invite, say, 5 groups of climate modellers to simulate the behaviour of this experiment. Will they all get the same results? I doubt it.

July 28, 2013 5:46 am

The point about GCMs is that they’re unverifiable, which means they neither prove nor predict anything to a rational person, unless they’re into belief rather than science.
http://thepointman.wordpress.com/2011/01/21/the-seductiveness-of-models/
Pointman

DirkH
July 28, 2013 5:55 am

ikh says:
July 28, 2013 at 5:17 am
“DirkH says:
July 27, 2013 at 6:13 pm
“As you say, re-ordering means non-dependency. Assuming no CPU erratum this does not affect the outcome. Determinism is maintained.”
Nope. You are forgetting that re-ordering non-dependent floating point operations can give you different rounding errors. That is not deterministic.”
You’re right! Thanks, I didn’t think of that!

DirkH
July 28, 2013 6:03 am

Robert Clemenzi says:
July 27, 2013 at 1:20 pm
“Besides the fact that it wouldn’t run, there were a number of other issues.
* Years were exactly 365 days long – no leap years
* Some of the physical constants were different than their current values
* The orbital computation to determine the distance between the Earth and the Sun was wrong
It was when I discovered a specific design error in using Kepler’s equation to compute the orbital position that I quit playing with the code. I wrote a short paper explaining the error, but it was rejected for publication because “no one would be interested”.”
It looks like we have entrusted the future of our energy infrastructure and economy to a bunch of amateur enthusiasts who somehow slipped into research institutes where they were mistaken for scientists.
Reminds me of the guy who sold the Eiffel tower… twice. (wikipedia).
Today he would become a climate scientist.

Mark Negovan
July 28, 2013 6:03 am

This is one of the most important posts here at WUWT on the subject of using GCM crystal balls and is one of the first ones that I have seen here that shows the true computational insanity of all of the climate models. Although many other articles have shown GCMs to be inconsistent with each other and in comparison to actual data, the actual problem with using GCMs to compute the future is the underlying fact that from a computer science perspective, any errors or invalid initial conditions are amplified and propagated through any model run. Also, GCMs fail to address error propagation, model uncertainty, or any other chaotic influence.
If you are designing an airplane wing using CFD, you absolutely have to determine the computational uncertainty to properly validate the engineering model. Error uncertainty in this case is bounded, and the model can be adjusted to test the error extremes and hence validate that the wing stresses are within design parameters. You cannot do that in a time series extrapolation of the future state of a chaotic system. The problem is that if you just test the extremes of the error uncertainty after a time step in a GCM, you are just computing a different state of the system. Thus errors and uncertainty at any time step are unbounded, and the error bound will increase to the extremes of where the climate has been in the past in a very short time.
The arguments that the scientists use in their claims that GCMs are robust (such as the one Nick Stokes uses in comments here) are not scientifically demonstrable. You cannot change the rules of science and claim that error propagation does not matter in GCMs because of some magical property of the model. THIS IS THE ACHILLES HEEL OF GCMs.
There is no scientific basis that can support the prestidigitation of GCM prognostication.
