Another uncertainty for climate models – different results on different computers using the same code

New peer-reviewed paper finds the same global forecast model produces different results when run on different computers

Did you ever wonder how spaghetti like this is produced and why there is broad disagreement in the output that increases with time?

Figure: 73 CMIP5 models vs. observations, 20N–20S mid-troposphere, 5-year means. Graph by Dr. Roy Spencer.

Increasing mathematical uncertainty arising from the initial conditions is the main reason. But some of it might be due to the fact that, even where models share common code, they don’t produce the same results with that code, owing to differences in the way CPUs, operating systems, and compilers handle the arithmetic. With this paper, we can now add software uncertainty to the list of known unknowns about climate and climate modeling.
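As a rough illustration (not code from the paper), floating-point addition is not associative, so the grouping a compiler, optimizer, or parallel library chooses can change a result even when the source code is identical. A minimal C sketch:

#include <stdio.h>

int main(void) {
    /* The same three numbers, summed in two mathematically equivalent orders.
       In IEEE double precision the results differ, because adding 1.0 to 1e16
       falls below the rounding resolution at that magnitude. */
    double a = 1e16, b = -1e16, c = 1.0;

    printf("(a + b) + c = %.1f\n", (a + b) + c);   /* prints 1.0 */
    printf("a + (b + c) = %.1f\n", a + (b + c));   /* prints 0.0 */
    return 0;
}

Different compilers and optimization levels are free to pick either grouping, which is the kind of last-bit disagreement the paper attributes to “the treatment of rounding errors by the different software systems.”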

I got access to the paper yesterday, and its findings were quite eye-opening.

The paper was published on 7/26/13 in Monthly Weather Review, a publication of the American Meteorological Society. It finds that the same global forecast model, run on different computer hardware and software systems with no other changes, produces different results at the output (measured here by the 500-hPa geopotential height).

They say that the differences are…

“primarily due to the treatment of rounding errors by the different software systems”

…and that these errors propagate over time, meaning they accumulate.
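To see how quickly such tiny discrepancies pile up, here is a toy C example (illustrative only) that carries the same running sum once in single and once in double precision; the gap between the two grows steadily with the number of steps:

#include <stdio.h>
#include <math.h>

int main(void) {
    /* The same nominal accumulation, once in float and once in double.
       Each step contributes a tiny representation/rounding error in float,
       and the drift between the two grows as the steps accumulate. */
    float  xf = 0.0f;
    double xd = 0.0;

    for (long step = 1; step <= 10000000; step++) {
        xf += 0.001f;
        xd += 0.001;
        if (step % 2000000 == 0)
            printf("after %8ld steps, drift = %g\n", step, fabs((double)xf - xd));
    }
    return 0;
}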

According to the authors:

“We address the tolerance question using the 500-hPa geopotential height spread for medium range forecasts and the machine ensemble spread for seasonal climate simulations.”

“The [hardware & software] system dependency, which is the standard deviation of the 500-hPa geopotential height [areas of high & low pressure] averaged over the globe, increases with time.”

The authors find:

“…the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.”

Many papers have already shown that differences in the initial conditions of climate models produce significantly different climate projections.

It makes you wonder if some of the catastrophic future projections are simply due to a rounding error.

Here is how they conducted the tests on hardware/software:

Table 1 shows the 20 computing environments including Fortran compilers, parallel communication libraries, and optimization levels of the compilers. The Yonsei University (YSU) Linux cluster is equipped with 12 Intel Xeon CPUs (model name: X5650) per node and supports the PGI and Intel Fortran compilers. The Korea Institute of Science and Technology Information (KISTI; http://www.kisti.re.kr) provides a computing environment with high-performance IBM and SUN platforms. Each platform is equipped with different CPU: Intel Xeon X5570 for KISTI-SUN2 platform, Power5+ processor of Power 595 server for KISTI-IBM1 platform, and Power6 dual-core processor of p5 595 server for KISTI-IBM2 platform. Each machine has a different architecture and approximately five hundred to twenty thousand CPUs.

Table 1: the 20 computing environments (Fortran compiler, parallel communication library, and compiler optimization level for each).
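Part of what changes between these environments is simply the order in which grid-point values get added up: different parallel communication libraries and process counts reduce their partial sums in different sequences. A small C sketch (using an assumed stand-in field, not the model’s actual data) shows how the grouping alone can shift a global sum in its last digits:

#include <stdio.h>

#define N 1000000

/* Sum the same "field" as if it were distributed over nproc ranks and then
   reduced: each rank sums its slice, then the partial sums are combined.
   Only the grouping changes, yet the totals can differ in their last digits. */
static double chunked_sum(const double *field, int n, int nproc) {
    double total = 0.0;
    int chunk = n / nproc;                    /* N is chosen divisible by nproc */
    for (int r = 0; r < nproc; r++) {
        double partial = 0.0;
        for (int i = r * chunk; i < (r + 1) * chunk; i++)
            partial += field[i];
        total += partial;
    }
    return total;
}

int main(void) {
    static double field[N];
    for (int i = 0; i < N; i++)
        field[i] = 1.0 / (1.0 + i);           /* arbitrary stand-in for grid data */

    printf("1 rank    : %.17g\n", chunked_sum(field, N, 1));
    printf("8 ranks   : %.17g\n", chunked_sum(field, N, 8));
    printf("125 ranks : %.17g\n", chunked_sum(field, N, 125));
    return 0;
}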

And here are the results:

Table 2: Globally-averaged standard deviation of the 500-hPa geopotential height eddy (m) from the 10-member ensemble with different initial conditions for a given software system (i.e., the initial-condition ensemble), and the corresponding standard deviation from the 10-member ensemble with different software systems for a given initial condition (i.e., the software-system ensemble).

While the differences might appear small to some, bear in mind that these differences in standard deviation come from only 10 days’ worth of modeling with a short-term global forecast model, not a decades-out global climate model. Since the software effects observed in this study are cumulative, imagine what the differences might be after years of simulated time, as in GCMs.
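A crude way to see why cumulative rounding matters in a nonlinear model: iterate a toy nonlinear recurrence (the logistic map, used here purely as an illustration) from two starting values that differ by about one part in 10^15, roughly the size of a double-precision rounding error. The trajectories track each other for a while, then separate by many orders of magnitude:

#include <stdio.h>
#include <math.h>

int main(void) {
    /* Two copies of the same nonlinear iteration, started a rounding-error apart. */
    double x = 0.4;
    double y = 0.4 * (1.0 + 1e-15);

    for (int step = 1; step <= 80; step++) {
        x = 3.9 * x * (1.0 - x);
        y = 3.9 * y * (1.0 - y);
        if (step % 10 == 0)
            printf("step %2d   |x - y| = %.3e\n", step, fabs(x - y));
    }
    return 0;
}

Whether GCMs behave this badly over decades of simulated time is exactly the open question; the point is only that nonlinearity can turn last-bit differences into visible spread.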

Clearly, a long-term evaluation of this effect is needed for many of the GCMs used to project future climate, to determine whether it affects those models as well and, if so, how much of their output is real and how much is simply accumulated rounding error.

Here is the paper:

An Evaluation of the Software System Dependency of a Global Atmospheric Model

Song-You Hong, Myung-Seo Koo, Jihyeon Jang, Jung-Eun Esther Kim, Hoon Park, Min-Su Joh, Ji-Hoon Kang, and Tae-Jin Oh, Monthly Weather Review, 2013, Early Online Release (e-View). doi: http://dx.doi.org/10.1175/MWR-D-12-00352.1

Abstract

This study presents the dependency of the simulation results from a global atmospheric numerical model on machines with different hardware and software systems. The global model program (GMP) of the Global/Regional Integrated Model system (GRIMs) is tested on 10 different computer systems having different central processing unit (CPU) architectures or compilers. There exist differences in the results for different compilers, parallel libraries, and optimization levels, primarily due to the treatment of rounding errors by the different software systems. The system dependency, which is the standard deviation of the 500-hPa geopotential height averaged over the globe, increases with time. However, its fractional tendency, which is the change of the standard deviation relative to the value itself, remains nearly zero with time. In a seasonal prediction framework, the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.

h/t to The Hockey Schtick



281 Comments
Joe
July 27, 2013 1:42 pm

Need a shamefaced smiley and preview / edit function! That should, of course, read “their outputs”

DirkH
July 27, 2013 1:49 pm

Matthew R Marler says:
July 27, 2013 at 1:30 pm
“Dennis Ray Wingo: That these people are just now studying this and figuring it out is the height of incompetence!
It does strike me as rather late in the game for this. Imagine, for the sake of argument, that a patent application or medical device application depended on the validation of this code.”
When you write code for medical or transportation or power plant applications, you know from the start about the safety integrity level (SIL 0..4) the critical core of the application must fulfill, and the level of validation necessary to get it certified – including code reviews, test specifications, documentation that proves the tests have been fulfilled by the software, signatures of the responsible persons, etc. etc. etc. (In the case of the Ariane 5 loss I mentioned above – no human life was endangered, so no biggy.)
As climate science does not directly affect human lives, it has no such burden and doesn’t need code validation; it is all fun and games. The life-affecting decisions only come about through the ensuing policies; and what would happen if we had to validate political decisions for their impacts on human life… probably the state would not be able to get ANY decision through the validation process.

NZ Willy
July 27, 2013 1:53 pm

My reading is that the authors are comparing different software, and that the hardware is incidental. The operating systems may matter though, depending on native precision handling — I presume no “endian” effects. I disagree with Nick that chaotic system amplify discrepancies in initial conditions, because chaotic systems randomize and so the initial conditions get lost — the point of the “butterfly effect” is that there *isn’t* one. Maybe climatological software does have a butterfly effect, in which case, “their bad”.

dp
July 27, 2013 1:54 pm

My recollection from first classes in chaotic systems is that the starting point matters in the extreme and there are no insignificant digits. The underlying precision of the computing hardware, the floating point/integer math libraries, and math co-processors all contribute error in different ways at very small and very large numbers (regardless of sign).

DirkH
July 27, 2013 1:56 pm

Mike Mellor says:
July 27, 2013 at 1:32 pm
“Climate is a stochastic process, driven by the laws of random probability. To emulate this, a GCM will need a random number generator. ”
No. A chaotic system amplifies low order state bits so that they are over time left-shifted in the state word; this leads to small perturbations becoming larger differences over time. Chaos has nothing to do with an external source of randomness.
The left-shifting of the state bits means that any simulation with a limited number of state bits runs out of state bits over time; while the real system has a near infinitely larger resolution.
Meaning, in short, a perfectly deterministic system can be a chaotic system.

ikh
July 27, 2013 1:57 pm

I am absolutely flabbergasted!!! This is a novice programming error. Not only that, but they did not even test their software for this very well-known problem.
Software engineers avoid floating-point numbers like the plague. Wherever possible we prefer to use scaled integers of the appropriate size. IEEE floating-point implementations can differ within limits. Not all numbers can be exactly represented in a floating-point implementation, so it is not just rounding errors that can accumulate but representational errors as well.
When you do floating-point ops, you always need to scale the precision of the calculation and then round to a lower precision that meets your error-bar requirements.
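For example (illustrative only, nothing to do with the model’s actual code): 0.1 has no exact binary representation, so repeatedly adding it in a double drifts away from the exact answer, while the equivalent scaled-integer bookkeeping stays exact.

#include <stdio.h>

int main(void) {
    /* Representation error: 0.1 cannot be stored exactly as a binary double,
       so the error exists before any arithmetic happens, and it accumulates. */
    double sum = 0.0;
    for (int i = 0; i < 1000000; i++)
        sum += 0.1;

    /* Scaled integers (counting in tenths) stay exact for the same task. */
    long long tenths = 0;
    for (int i = 0; i < 1000000; i++)
        tenths += 1;

    printf("double     : %.10f\n", sum);                          /* not exactly 100000 */
    printf("scaled int : %lld.%lld\n", tenths / 10, tenths % 10); /* exactly 100000.0   */
    return 0;
}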
The problem is not the libraries they have used, or the differing compilers, or the hardware. It is the stupid code that makes the system non-portable.
If they cannot even manage these basic fundamentals, what the hell are they doing using multi-threading and multi-processing! These are some of the most difficult Comp Sci techniques to master correctly. So I would be less than surprised if they also had race conditions throwing out their results.
Incredible!
/ikh

DirkH
July 27, 2013 1:58 pm

NZ Willy says:
July 27, 2013 at 1:53 pm
“I disagree with Nick that chaotic system amplify discrepancies in initial conditions, because chaotic systems randomize and so the initial conditions get lost”
No.

Gary Pearse
July 27, 2013 2:03 pm

DirkH says:
July 27, 2013 at 12:49 pm
Paul Linsay says:
July 27, 2013 at 12:50 pm
Thank you both for answering my questions and enlightening me on the ineluctable nature of these errors.
On climate forecasting, I’ve wondered out loud on some other threads that we know over the very long term that temperature varies only 8-10C between the depths of an ice age and the peaks of an interglacial, and that life on the planet seems to be an unbroken chain back >1 B yrs, so there weren’t comparatively short hidden periods of such severity in climate that ‘life’ was terminated (individual species like the dinosaurs and countless others, of course, didn’t survive – some climate, some asteroid impacts).
This would mean that the things we argue about here would be small ripples on the much larger trend – warming, ~stationary and cooling squarish waves. I would say this main mega trend is what we should be trying to nail down first. Hopefully it is at least a horizontal trend with oscillations of +/-4 to 5C. Despite the chaotic nature of climate, these mega trends are not so chaotic and some plausible explanations have been explored. The idea that we will drop like fishflies by 2C either way is a total crock and -2C is a heck of a lot more worrying than +2C . Try living in Winnipeg for a few years or Kano, Nigeria (I’ve done both). Cheap energy will fix any problems that arise on this scale, although when I was in Nigeria there was no airconditioning and I got acclimatized just fine.

Man Bearpig
July 27, 2013 2:03 pm

Ric Werme says:
July 27, 2013 at 11:46 am
Man Bearpig says:
July 27, 2013 at 11:26 am
> Rounding errors ? How many decimal places are they working to ? 1? 0?
“Digital” computers don’t use decimal numbers frequently, the most recent systems I know that do are meant for financial calculations.
The IEEE double precision format has 53 bits of significance, about 16 decimal places. Please don’t offer stupid answers.
===========================
Yes, and isn’t that wonderful? However, to what level can we actually measure the values that are entered as a starting point into the models? To calculate them to 16 decimal places is not a representation of the real world.
What is the point of running a model to that precision when the error in real world data is nowhere near it? Particularly since the models do not represent reality. Most of the model outputs I have seen have shown a predicted rise in temperature, so these models are wrong to 16 decimal places.

July 27, 2013 2:11 pm

Just the tip of a rather large iceberg.
I know how much time and effort goes into identifying and fixing errors in commercial software of similar size to the climate models – with the crucial difference that in commercial software you can always say categorically whether a result is correct or not, which, of course, you can’t with the climate model outputs. Even so, commercial software will get released with hundreds of errors, which come to light over time.
Then the iterative nature of climate models will compound any error, no matter how minor.
As I’ve said before, basing climate models on iterative weather models was a fundamentally wrong decision from the beginning.

July 27, 2013 2:12 pm

Climate is chaotic and nonlinear. As such any equations are nonlinear and cannot be solved unless they are approximated with piece-wise linear equations. Such results are good for only a short time and must be reinitialized. The whole mess depends on accurate assumptions which are continually changing, making accurate predictions impossible.
The only thing we have is an observable past, to varying degrees of accuracy. The further back we go, the less resolution we have, both in measurements and detectable events. I think the best guess is to say the future will be similar to the past but not exactly like it. The biggest variable is the Sun and how it behaves. We see somewhat consistent solar cycles, but never exactly the same. There is so much about the Sun’s internal dynamics we don’t know – in fact, we don’t know what we don’t know. That makes future predictions of climate variability merely a WAG, since they are based on assumptions that are a WAG.

MarkG
July 27, 2013 2:15 pm

“Yes .. except that single precision floating point numbers may use a different number of bits, depending on the compiler and the compile flags.”
I believe it’s even more complex than that?
It’s a long time since I did any detailed floating point work on x86 CPUs, but from what I remember, the x87 FPU would actually perform the math using 80-bit registers, but when you copied the values out of the FPU to RAM, they were shrunk to 64-bit for storage in eight bytes. So, depending on the code, you could end up performing the entire calculation in 80-bit floating point, then getting a 64-bit end result in RAM, or you could be repeatedly reading the value back to RAM and pushing it back into the FPU, with multiple conversions between 64-bit and 80-bit along the way. That could obviously produce very different results depending on the code.
I would presume that a modern compiler would be using SSE instead of the old FPU, but I don’t know for sure.

July 27, 2013 2:19 pm

Non-linear complex systems such as climate are by their very nature chaotic, in the narrow mathematical meaning of that word. A tiny change in a constant value, such as numeric precision, leads to different results. But since we don’t have the math to handle such systems, which means they’re programmed in a linear fashion, this one goes into the usual GCM cockup bin.
Pointman

Don K
July 27, 2013 2:25 pm

more soylent green says:
July 27, 2013 at 11:15 am
Floating point numbers are not precise and computer use floating point numbers for very large or very small numbers. This is not a secret and while everybody who ever took a programming course probably learned it, most of us forget about it unless reminded.
True enough
Somebody who works as a programmer in the scientific or engineering fields where floating point numbers are routinely used should be aware of this issue. However, it appears that much of the climate model code is written by anybody but professionally trained computer programmers or software engineers.
In my experience except when doing fixed point arithmetic in an embedded system even trained programmers, scientists and engineers depend on the guys that designed the FPU and wrote the math libraries to handle the details of managing truncation and rounding. At very best, they might rearrange an operation to avoid subtracting a big number from another big number. And even that doesn’t happen very often.
Usually, that works pretty well. I doubt that anyone who wasn’t doing scientific programming 50 years ago has ever encountered 2.0+2.0=3.9999999….
However, it appears that we might have a situation that requires thought and analysis. Maybe a little – it might just be a library bug in some systems, or a hardware flaw in some CPU/FPU similar to the infamous Pentium FDIV bug. Or it may be something much more fundamental.
How about we wait until we have sufficient facts before we rush to a judgement?

Tom Bakewell
July 27, 2013 2:26 pm

I remember the fine buzzwords ” intermediate product swell” from a review of one of the data fitting software packages on offer in the early 90’s. However a Google search shows nothing. So I guess we’re doomed to repeat the past (again)

Heather Brown (aka Dartmoor resident)
July 27, 2013 2:27 pm

The problems of floating-point calculation (mainly rounding and losing significance by mixing very large and very small numbers, as others have commented) are well known to any competent computer scientist. For models of this complexity it is essential to have someone who specialises in numerical analysis write/check the code to avoid problems. When I worked in a university computer science department we frequently despaired about the “results” published by physicists and other scientists who were good at their subject but thought any fool could write a Fortran program full of floating-point operations and get exact results.
After following many of the articles (at Climate Audit and elsewhere) about the lack of statistical knowledge displayed by many climate scientists, I am not surprised that they are apparently displaying an equal lack of knowledge about the pitfalls of numerical calculations.

DirkH
July 27, 2013 2:31 pm

Man Bearpig says:
July 27, 2013 at 2:03 pm
“The IEEE double precision format has 53 bits of significance, about 16 decimal places. Please don’t offer stupid answers.
===========================
Yes, and isn’t that wonderful? However, to what level can we actually measure the values that are entered as a starting point into the models? To calculate them to 16 decimal places is not a representation of the real world.”
The recommended way of interfacing to the real world is:
- enter data in single-float format (32-bit precision)
- during subsequent internal computations, use as high a precision as you can – to reduce error propagation
- on output, write single precision (32-bit floats) again – because of the precision argument you stated.
It is legit to use a higher precision during the internal workings. It is not legit to assign significance to those low-order digits when interpreting the output data.
In this regard, the GCMs cannot be faulted.
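A minimal sketch of that pattern (the function name and numbers are made up for illustration, not taken from any GCM):

#include <stdio.h>

/* Accept single-precision input, do the internal work in double precision,
   and hand back single precision again, so no false significance is implied. */
static float step_field(float input_value) {
    double work = (double)input_value;        /* widen once on entry      */
    for (int i = 0; i < 1000; i++)            /* internal iteration       */
        work += 1.0e-7 * work;                /* stand-in for the physics */
    return (float)work;                       /* narrow again on output   */
}

int main(void) {
    float t0 = 288.15f;                       /* e.g. a temperature in K  */
    printf("in: %.2f   out: %.2f\n", t0, step_field(t0));
    return 0;
}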

DirkH
July 27, 2013 2:33 pm

Pointman says:
July 27, 2013 at 2:19 pm
“Non-linear complex systems such as climate are by their very nature chaotic,”
No. Only when they amplify low order state bits. Complexity alone is not necessary and not sufficient. The Mandelbrot equation is not very complex yet chaotic.

July 27, 2013 2:32 pm

I think this is brilliant.
The computer modelers get to blame the disparity between their results and actual observations on their computers being unable to support enough significant digits, and they justify the need for new computers with more significant digits all in one fell swoop.
I’m not active in the CPU wars anymore. Anyone know if one of the semi-conductor companies is close to releasing 128 bit CPU’s? (and did they fund this study /snark)

DirkH
July 27, 2013 2:39 pm

MarkG says:
July 27, 2013 at 2:15 pm
“It’s a long time since I did any detailed floating point work on x86 CPUs, but from what I remember, the x87 FPU would actually perform the math using 80-bit registers, but when you copied the values out of the FPU to RAM, they were shrunk to 64-bit for storage in eight bytes. So, depending on the code, you could end up performing the entire calculation in 80-bit floating point, then getting a 64-bit end result in RAM, or you could be repeatedly reading the value back to RAM and pushing it back into the FPU, with multiple conversions between 64-bit and 80-bit along the way. That could obviously produce very different results depending on the code.”
Yes. OTOH you are allowed to store the full 80 bit in RAM; the “extended” data type supported by various compilers.
“I would presume that a modern compiler would be using SSE instead of the old FPU, but I don’t know for sure.”
SSE is a newer instruction set and requires either hand-coding, a suitable library (maybe BOOST numerical does that, I’m not sure), or a very smart compiler that recognizes opportunities for vectorizing computations; when you’re lazy and don’t have top performance needs, your throwaway computations like
double a=…;
double b=…;
b *= a;
will still create instructions for the old FPU core.
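For what it’s worth, on a toolchain where long double maps to the x87 80-bit extended format (e.g. GCC on x86 Linux – an assumption about the platform, not a portable guarantee), the extra mantissa bits are easy to see:

#include <stdio.h>

int main(void) {
    /* At 1e16 the spacing between adjacent doubles is 2, so adding 1.0 is lost;
       the x87 80-bit extended format (64-bit mantissa) can still resolve it. */
    double      d  = 1e16;
    long double ld = 1e16L;

    printf("double      : (1e16 + 1) - 1e16 = %.1f\n",  (d  + 1.0)  - d);   /* 0.0 */
    printf("long double : (1e16 + 1) - 1e16 = %.1Lf\n", (ld + 1.0L) - ld);  /* 1.0 on x87 extended */
    return 0;
}

Whether intermediates stay in the 80-bit registers or get spilled to 64-bit memory is exactly the compiler- and flag-dependent behaviour MarkG describes.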

michael hart
July 27, 2013 2:42 pm

I’m not surprised. Not surprised at all.

Berényi Péter
July 27, 2013 2:43 pm

There is a general theory of non-equilibrium stationary states in reproducible systems (a system is reproducible if for any pair of macrostates (A;B) A either always evolves to B or never)
Roderick Dewar (2003), “Information theory explanation of the fluctuation theorem, maximum entropy production and self-organized criticality in non-equilibrium stationary states”, J. Phys. A: Math. Gen. 36 (3), 631. doi:10.1088/0305-4470/36/3/303
Now, the climate system is obviously not reproducible. Earth is not black as seen from outer space, although in systems radiatively coupled to their environment maximum entropy production occurs when all incoming short wave radiation is thermalized, none reflected.
Unfortunately we do not have any theory at the moment, rooted in statistical mechanics, about non-reproducible (chaotic) systems.
Therefore the guys are trying to do computational modelling based on no adequate physical theory whatsoever. That’s a sure sign of coming disaster, according to my dream book.

DirkH
July 27, 2013 2:46 pm

Gary Pearse says:
July 27, 2013 at 2:03 pm
“This would mean that the things we argue about here would be small ripples on the much larger trend – warming, ~stationary and cooling squarish waves. I would say this main mega trend is what we should be trying to nail down first. ”
Yes. All my arguments about chaotic systems do not exclude the possibility of a coupling of the chaotic system to an external independent influence; meaning that a low frequency non-chaotic signal could be present; while the chaotic subsystem does its dance on top of that.
In fact I strongly believe in a Solar influence – due to recordings of the Nile level, the periodicity of Rhine freezings, etc. etc. Svensmark will be vindicated.

DirkH
July 27, 2013 2:50 pm

Berényi Péter says:
July 27, 2013 at 2:43 pm
“Unfortunately we do not have any theory at the moment, rooted in statistical mechanics, about non-reproducible (chaotic) systems.”
Again: Reproducibility (or Determinism) and Chaos do describe different aspects.
A real-life chaotic system of course has, for all practical purposes, infinite resolution of its state word; so the exact same starting condition cannot be maintained between two trials, giving the impression of “randomness”. Randomness might be present, but it is not necessary for chaos.

July 27, 2013 2:58 pm

I see a number of questions upthread from people trying to understand what this is all about. To provide a WAY over-simplified answer: CPUs are just fine with basic math. No need to worry about them giving you a wrong answer when you are trying to balance your checkbook. But for very long numbers that must be very precise, there are conditions where the CPU itself will return a wrong answer. These conditions are called errata, and the semiconductor companies actually publish the ones they know about. If there is a way to get around the problem, they publish that too. Here’s an example of same for the Intel i7; scroll down to page 17 to see what I mean:
http://www.intel.com/content/www/us/en/processors/core/core-i7-lga-2011-specification-update.html?wapkw=i7+errata
Since the errata for different CPUs are different, even between two iterations of a CPU from the same manufacturer, the programmer must be aware of these things and make certain that the way they’ve written the code takes the errata into account.
In essence, the problem is worse than what the paper suggests. If the programmers failed to take the errata into account, it is little wonder that they get different results on different computers. The obvious question, however, was never asked. Given a variety of results, which one is the “right one”? The answer to that is this:
If the programmer didn’t take errata into account, the most likely results is that they are ALL wrong.
Start the gravy train anew. They’ll need grants to modify their code, grants for new computers, grants for hiring students to write grant proposals…
