Another uncertainty for climate models – different results on different computers using the same code

New peer reviewed paper finds the same global forecast model produces different results when run on different computers

Did you ever wonder how spaghetti like this is produced and why there is broad disagreement in the output that increases with time?

[Graph: CMIP5 73 models vs. observations, 20N-20S mid-troposphere, 5-year means] Graph above by Dr. Roy Spencer

Increasing mathematical uncertainty from the initial starting conditions is the main reason. But some of it might be due to the fact that while some of the models share common code, they don't produce the same results with that code, owing to differences in the way CPUs, operating systems, and compilers work. Now, with this paper, we can add software uncertainty to the list of known unknowns about climate and climate modeling.

I got access to the paper yesterday, and its findings were quite eye-opening.

The paper was published 7/26/13 in the Monthly Weather Review, a publication of the American Meteorological Society. It finds that the same global forecast model (evaluated via the 500-hPa geopotential height field) produces different results when run on different computer hardware and operating systems, with no other changes.

They say that the differences are…

“primarily due to the treatment of rounding errors by the different software systems”

…and that these errors propagate over time, meaning they accumulate.
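To see how a one-bit rounding difference can accumulate like this, here is a minimal Python sketch (my own illustration, not the paper's model or code): two algebraically identical update formulas, of the kind a compiler can produce at different optimization levels, iterated on a chaotic logistic map.

```python
# Two algebraically identical updates for the chaotic logistic map.
# A compiler that reassociates arithmetic at a different optimization
# level produces exactly this kind of last-bit difference each step.
r = 3.9  # parameter in the chaotic regime

def step_a(x):
    return r * x * (1.0 - x)

def step_b(x):
    return r * x - r * x * x  # same algebra, different rounding order

xa = xb = 0.6
diffs = []
for _ in range(200):
    xa, xb = step_a(xa), step_b(xb)
    diffs.append(abs(xa - xb))
# diffs starts at (or near) the last bit of a double and grows to order one
```

The per-step difference starts near machine epsilon (~1e-16) and is amplified by the dynamics until the two trajectories bear no resemblance to each other, which is the kind of growth the paper reports.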

According to the authors:

“We address the tolerance question using the 500-hPa geopotential height spread for medium range forecasts and the machine ensemble spread for seasonal climate simulations.”

“The [hardware & software] system dependency, which is the standard deviation of the 500-hPa geopotential height [areas of high & low pressure] averaged over the globe, increases with time.”

The authors find:

“…the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.”

The initial conditions of climate models have already been shown by many papers to produce significantly different projections of climate.

It makes you wonder if some of the catastrophic future projections are simply due to a rounding error.

Here is how they conducted the tests on hardware/software:

Table 1 shows the 20 computing environments including Fortran compilers, parallel communication libraries, and optimization levels of the compilers. The Yonsei University (YSU) Linux cluster is equipped with 12 Intel Xeon CPUs (model name: X5650) per node and supports the PGI and Intel Fortran compilers. The Korea Institute of Science and Technology Information (KISTI; http://www.kisti.re.kr) provides a computing environment with high-performance IBM and SUN platforms. Each platform is equipped with different CPU: Intel Xeon X5570 for KISTI-SUN2 platform, Power5+ processor of Power 595 server for KISTI-IBM1 platform, and Power6 dual-core processor of p5 595 server for KISTI-IBM2 platform. Each machine has a different architecture and approximately five hundred to twenty thousand CPUs.

[Table 1 image: the computing environments tested]

And here are the results:

[Table 2 image: results]

Table 2. Globally averaged standard deviation of the 500-hPa geopotential height eddy (m) from the 10-member ensemble with different initial conditions for a given software system (i.e., initial condition ensemble), and the corresponding standard deviation from the 10-member ensemble with different software systems for a given initial condition (i.e., software system ensemble).

While the differences might appear small to some, bear in mind that these differences in standard deviation are for only 10 days' worth of modeling on a short-term global forecast model, not a decades-out global climate model. Since the software effects observed in this study are cumulative, imagine what the differences might be after years of calculation into the future, as we see in GCMs.

Clearly, an evaluation of this effect is needed over the long term for many of the GCMs used to project future climate, to determine whether this also affects those models and, if so, how much of their output is real and how much is simply accumulated rounding error.

Here is the paper:

An Evaluation of the Software System Dependency of a Global Atmospheric Model

Song-You Hong, Myung-Seo Koo, Jihyeon Jang, Jung-Eun Esther Kim, Hoon Park, Min-Su Joh, Ji-Hoon Kang, and Tae-Jin Oh. Monthly Weather Review, 2013, early online release. doi: http://dx.doi.org/10.1175/MWR-D-12-00352.1

Abstract

This study presents the dependency of the simulation results from a global atmospheric numerical model on machines with different hardware and software systems. The global model program (GMP) of the Global/Regional Integrated Model system (GRIMs) is tested on 10 different computer systems having different central processing unit (CPU) architectures or compilers. There exist differences in the results for different compilers, parallel libraries, and optimization levels, primarily due to the treatment of rounding errors by the different software systems. The system dependency, which is the standard deviation of the 500-hPa geopotential height averaged over the globe, increases with time. However, its fractional tendency, which is the change of the standard deviation relative to the value itself, remains nearly zero with time. In a seasonal prediction framework, the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.

h/t to The Hockey Schtick

Chris @NJSnowFan

Great report and read Anthony.
Tree rings do the same thing, in a way. The roots are grounded, but soil conditions under each tree can vary from tree to tree, giving different results even though the trees are the same species and grow right next to each other or in the same area.
Much like computer models run on different software.

Rhoda R

Didn’t some warmist researcher make a comment about not trusting data when it conflicted with models?

Dave Dardinger

I don’t know that I thought much about computing errors, but it should be known by everyone who wants to be taken seriously in the climate debate that even the approximations to the actual equations of science being modeled in the climate models cannot be kept from diverging for very long. This is why I cringe whenever I see the modelers trying to claim that their models are using the actual scientific equations underlying Earth's climate. They aren't and they can't.

Eric Anderson

Rounding errors. Digits at the extreme that are too small to deal with, but that, over time, end up affecting the outcome. Minuscule physical properties that we are unable to measure. The general uncertainty principle involved in trying to forecast climate (made up of weather, made up of air particles, made up of individual molecules, made up of individual particles, whose precise location and trajectory cannot, in principle, even be measured).
Is it just the case that we need better computers, more accurate code, more decimals, more money, more measurements?
The doubt that keeps gnawing at me is whether it is possible — even in principle — to accurately model something like climate over any reasonable length of time.

Nick Stokes

[snipped to prevent your usual threadjacking. You are wrong- read it and try again – Anthony]

Richard M

Wild ass guess plus or minus rounding error is still just a WAG. Wake me up when they properly model oceans and other natural climate factors.

DirkH

Now did I lambast people for years here now with the mathematical definition of chaos as used by chaos theory, which states that a system is chaotic IFF its simulation on a finite resolution iterative model develops an error that grows beyond any constant bound over time? And was this argument ignored ever since by all warmists (to be fair; our resident warmists surely had their eyes glaze over after the word mathematical)?
Yes and yes.

more soylent green

Floating point numbers are not precise, and computers use floating point numbers for very large or very small numbers. This is not a secret, and while everybody who ever took a programming course probably learned it, most of us forget about it unless reminded.
Somebody who works as a programmer in the scientific or engineering fields where floating point numbers are routinely used should be aware of this issue. However, it appears that much of the climate model code is written by anybody but professionally trained computer programmers or software engineers.
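A quick Python illustration of the point (my example, standard behavior of any IEEE 754 system, nothing from the paper): decimal fractions like 0.1 have no exact binary representation, so even trivial arithmetic carries a rounding error in the last bit.

```python
# 0.1 and 0.2 are stored as the nearest binary doubles, so their sum
# misses the stored value of 0.3 by one unit in the last place.
a = 0.1 + 0.2
print(a == 0.3)      # False
print(repr(a))       # 0.30000000000000004
print(abs(a - 0.3))  # about 5.5e-17
```

Harmless in a single addition; the question the paper raises is what happens after millions of them feed back into each other.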

Latitude

honest questions, I’m no longer sure that I fully understand these things:
1. the beginning of each model run is really hindcasting to tune the models??…if so, they missed that too
2. each model run is really an average of dozens/hundreds/thousands of runs, depending on how involved each model is?
3. if they hindcast/tuned the models to past temps, then they tuned them to past temps that have been jiggled and even if the models work, they will never be right?
4. the models have never been right…not one prediction has come true?
..I’m having an old age moment…and doubting what I thought I knew…mainly because it looks to me like people are still debating garbage

steven

Can’t wait to see what RGB has to say about this!

Butterfly errors producing storms of chaos in the models.

Mark Bofill

Look, that's just sad. I haven't had to fool with it for 20 years since school, but there are methods for controlling this sort of thing; it's not like numerical approximation and analysis are unknown frontiers, for goodness' sake.

DirkH

[snip – I nipped his initial comment, because it was wrong and gave him a chance to correct it, you can comment again too – Anthony]

DirkH

more soylent green says:
July 27, 2013 at 11:15 am
“Somebody who works as a programmer in the scientific or engineering fields where floating point numbers are routinely used should be aware of this issue. However, it appears that much of the climate model code is written by anybody but professionally trained computer programmers or software engineers.”
They just don’t care. Anything goes as long as the funding comes in.

Billy Liar

Nick Stokes says:
July 27, 2013 at 11:13 am
[snip – I nipped his initial comment, because it was wrong and gave him a chance to correct it, you can comment again too – Anthony]

[snip – try again Nick, be honest this time
Note this:

While the differences might appear as small to some, bear in mind that these differences in standard deviation are only for 10 days worth of modeling on a short term global forecast model, not a decades out global climate model.

– Anthony]

Man Bearpig

Rounding errors? How many decimal places are they working to? 1? 0?

mpaul

hmm, I imagine this could become the primary selection criteria when purchasing new supercomputers for running climate models. The supercomputer manufacturers will start to highlight this “capability” in their proposals to climate scientists — “The Cray SPX -14 implements a proprietary rounding algorithm that produces results that are 17% more alarming than our nearest competitor”. Maybe the benchmarking guys can also get a piece of the action — “The Linpack-ALRM is the industry’s most trusted benchmark for measuring total delivered alarminess”.
Sorry, gotta go. I’ve got an idea for a new company I need to start.

Stephen Richards

Nick Stokes says:
July 27, 2013 at 11:13 am
Nick, read the UK Met Office's proud announcement that their climate model is also used for the weather forecast and in so doing validates their model.

I find it difficult to believe that this is due purely to the handling of rounding errors. Have the authors established what it takes to produce identical results? I would start with fixed hardware, OS, compiler, and optimization. Do two separate runs produce almost exactly matching results? If not, then the variation over different hardware/compilers is irrelevant. I am inclined to believe it is more likely something in the parallel communication library, in determining whether communication between nodes is synchronous. A modern CPU has a cycle time of 0.3 ns (the inverse of its frequency). Communication time between nodes is on the order of 1 microsecond. So there may be an option in the parallel library to accept "safe" or insensitive assumptions about the communication between nodes?

Billy Liar says: July 27, 2013 at 11:21 am
“So climate models are immune to rounding errors?”

No. Nothing is. As some have observed above, an atmosphere model is chaotic. It has very sensitive dependence on initial conditions. That’s why forecasts are only good for a few days. All sorts of errors are amplified over time, including, as this paper notes, rounding errors.
That has long been recognised. Climate models, as Latitude notes, have a long runup period. They no longer attempt to predict from the initial conditions. They follow the patterns of synthetically generated weather, and the amplification of deviation from an initial state is no longer an issue.

Curt

Nick – Climate models may not have specific initial conditions (e.g. today’s actual weather conditions to predict next week’s weather), but they must have initial conditions — that is, starting numerical values for the states of the system. If different software implementations of the same model diverge for the same initial conditions, that is indeed a potential problem.

Anthony, you say I’m wrong. What’s your basis for saying these are climate models?

Remarkable!

I expected the server to tell me it was pay walled, but got:

The server is experiencing an unusually high volume of requests and is temporarily unable to process your request.
Please try again in a moment or two.

It worked on the second try, and I was told the paper was paywalled.
This is rather interesting. I’m not a floating point expert, though I got a lesson in all that on one of my first “for the heck of it” programs that simulated orbital motion on Univac 1108.
I would think that IEEE floating point should lead to near-identical results, but I bet the issues lie outside of that. The different runtime libraries use very different algorithms to produce math functions, e.g. trig, exponentials, square roots, etc. Minor differences will have major effects on the simulation (weather). They might even produce changes in the long-term average of the output (climate).
CPUs stopped getting faster several years ago, coincidentally around the time the climate stopped warming. Supercomputers keep getting faster by using more and more CPUs and spreading the compute load across the available CPUs. If one CPU simulates its little corner of the mesh and shares its results with its neighbors, the order of doing that can lead to round-off errors. Also, as the number of CPUs increases, it becomes feasible to use a smaller mesh, and a different range of round-off errors.
The mesh size may not be automatically set by the model, so the latter may not apply. Another sort of problem occurs when using smaller time increments. That can lead to computing small changes in temperature and losing accuracy when adding that to a much larger absolute temperature. (Something like that was part of my problem in the orbital simulation, though IIRC things also got worse when I tried to deal with behavior of tan() near quadrant boundaries. Hey, it was 1968. I haven’t gotten back to it yet.)
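A tiny sketch of the effect Ric describes (my example, not model code): the same three numbers summed in a different order, as can happen when CPUs share partial results in a different sequence, or when a small increment is added to a much larger absolute value.

```python
# Floating point addition is not associative: a small term is absorbed
# or preserved depending on the order in which the sums are taken.
terms = [1e16, 1.0, -1e16]

left_to_right = (terms[0] + terms[1]) + terms[2]  # the 1.0 is absorbed by 1e16
reordered     = (terms[0] + terms[2]) + terms[1]  # the 1.0 survives

print(left_to_right)  # 0.0
print(reordered)      # 1.0
```

Same numbers, same algebra, two different answers: exactly the kind of thing a change of parallel decomposition or compiler optimization level can trigger.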

“…the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.”

Wow. So we could keep the initial conditions fixed, but vary the decomposition. Edward Lorenz would be impressed.

The climate models will forever be useless because the initial state of the climate can't be put into them properly. In addition, they will never have all of the existing data that influences the climatic system of the earth, nor be able to account for all the factors that might influence it, so they cannot give any sort of accurate climate forecast.
In other words they are USELESS, and one can see that not only in their temperature forecasts but in their basic atmospheric circulation and temperature profile forecasts, which have been 100% wrong.
As this decade proceeds the temperature trend will be down in response to the prolonged solar minimum, and they will be obsolete.

For a non-tech-savvy person like me, and others too, this is undoubtedly an eye opener and a vital piece of information.

Laurie Bowen

Richard M . . . . “when they properly model oceans and other natural climate factors” . . . . “they” will HAVE TO take into account all the causes of “natural climate variation factors” “weather” (whether) long term, short term and/or temporary term causes . . . to build that model around . . . . at this point “they” only look at the data which are the effects of the causes. Even I can confidently forecast that! Some of the “they” have been doing it bass ackwards for a long time. In my humble observation . . . and opinion.
Google: “methods and procedures for forecasting long-term climate in specific locals” see what you get!

TerryS

Let me repeat a comment I made over a year ago:

I’ve often thought it would be interesting to run one of these models with 128 bit floating point numbers and then repeat the exercise with 64 bit and 32 bit numbers and then compare the outputs. Any differences would highlight the futility in attempting to use models to predict a chaotic system.

Man Bearpig says:
July 27, 2013 at 11:26 am
> Rounding errors? How many decimal places are they working to? 1? 0?
“Digital” computers rarely use decimal numbers; the most recent systems I know of that do are meant for financial calculations.
The IEEE double precision format has a 53-bit significand, about 16 decimal digits. Please don’t offer stupid answers.
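For the curious, here is a rough sketch of how precision interacts with iteration (my own illustration, not the paper's model): one copy of a chaotic iteration is re-rounded to IEEE single precision each step while the other stays in double, and the ~1e-7 single-precision rounding error is amplified until the trajectories decorrelate.

```python
import struct

def to_single(x):
    # round a Python double to the nearest IEEE 754 single (32-bit) value
    return struct.unpack('f', struct.pack('f', x))[0]

def step(x):
    return 3.9 * x * (1.0 - x)  # chaotic logistic map as a stand-in model

x64 = 0.25
x32 = 0.25
diffs = []
for _ in range(60):
    x64 = step(x64)
    x32 = to_single(step(x32))  # lose the low 29 bits every step
    diffs.append(abs(x64 - x32))
# the gap starts around 1e-8 and grows until the two runs disagree completely
```

This is the spirit of TerryS's 128/64/32-bit experiment done on the cheap: the precision change only sets the size of the seed error, not whether it grows.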

Mark Bofill

The pitfalls may not just be in rounding. When accuracy is important it's useful to understand how computers represent numbers: mantissa and exponent for floating point numbers, for example. The thing is, you don't get good intermediate results when you perform operations in an order that mixes very large numbers carrying important significant digits with very small numbers carrying important significant digits; if the programmer isn't thinking this through, it's easy to lose significant digits along the way. One has to bear in mind that an intermediate floating point result always carries a limited number of significant digits at one scale, or exponent.
But these are well known, well studied problems in comp sci. There’s no reason for anybody who cares about numerical accuracy to be stung this way.
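One of the standard remedies Mark alludes to is Kahan compensated summation, sketched below (an illustration of the textbook technique, not a claim about what any climate model does):

```python
def kahan_sum(values):
    """Sum a sequence while carrying the low-order bits lost at each add."""
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for v in values:
        y = v - comp
        t = total + y
        comp = (t - total) - y  # recovers what the addition just dropped
        total = t
    return total

# 10,000 tiny increments that a naive sum silently discards against 1.0
values = [1.0] + [1e-16] * 10000

naive = sum(values)          # every 1e-16 vanishes: result is exactly 1.0
careful = kahan_sum(values)  # recovers the missing 1e-12
```

The naive loop loses all ten thousand increments; the compensated one gets the sum right to the last bit, which is why techniques like this are standard in serious numerical code.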

Terry, it would cost a lot of money to run at 128 bits: millions of dollars of coding, and then the program would run 20-100x slower. At 32 bits you get muddy sludge in minutes with this kind of code.
The authors of the paper have done the right thing: it's much cheaper to change compilers and optimization settings while staying at 64 bits.
Essentially this is the butterfly effect in numerical computing.

“Nick in his initial comment (that I snipped because he and I had a misunderstanding) said that climate models don’t have initial starting conditions.”
I didn’t say that they don’t have starting conditions – I’m well aware that all time dependent differential equations have to. I said that they don’t forecast from initial conditions, as I expanded on here.

So it's really important to split hairs about the difference between 'global forecast models' and 'climate models'? So, until a 'scientist' produces a peer-reviewed paper showing that this divergence also occurs in 'climate models', we can safely assume they are unaffected? Ha!

positive lyapunov exponents

I have been a programmer since 1968 and I am still working. I have programmed in many different areas, including forecasting. If I have understood this correctly, this type of forecasting is architected so that the forecast for day N is built on the results obtained for day N - 1. If that is the case I would say that it's meaningless. It's hard enough to predict from a set of external inputs. If you include results from yesterday it will go wrong. Period!

DirkH

Anthony Watts says:
July 27, 2013 at 11:46 am
“Nick in his initial comment (that I snipped because he and I had a misunderstanding) said that climate models don’t have initial starting conditions.
They have to. They have to pick a year to start with, and levels of CO2, forcing, temperature, etc must be in place for that start year. The model can’t project from a series of arbitrary uninitialized variables.”
Initial starting conditions include the initial state of each cell of the models: energy, moisture, pressure (if they do pressure), and so on. The state space is obviously gigantic (number of cells times variables per cell times the resolution of the variables in bits; in other words, if you can hold this state in one megabyte, the state space is 2 ^ (8 * 1024 * 1024), assuming every bit in the megabyte is actually used), and the deviation of the simulated system from the real system is best expressed as the vector distance between the state of the simulation, expressed as a vector of all its state variables, and the corresponding vector representation of the real system. It is this deviation (the length of the vector difference) that grows beyond all bounds when the system being simulated is chaotic.

DirkH

Ingvar Engelbrecht says:
July 27, 2013 at 11:59 am
“I have been a programmer since 1968 and I am still working. I have programmed in many different areas, including forecasting. If I have understood this correctly, this type of forecasting is architected so that the forecast for day N is built on the results obtained for day N - 1.”
Yes, Ingvar, weather forecasting models as well as climate models are iterative models (basically very large finite state machines; where the program describes the transition table from one step to the next).

Alan Watt, Climate Denialist Level 7

Numerical computation is an entire distinct subfield of computer science. There are many traps for the unwary, some language-specific and others deriving from hardware differences. It used to be worse, with different manufacturers using incompatible formats and algorithms, and most had serious accuracy problems on over/underflow events.
A major advance was the 1985 adoption of the IEEE 754 standard for floating point. One principal goal was to avoid abrupt loss of accuracy as quantities neared representation limits in calculation. But even with much better underlying representation and algorithms, there are plenty of opportunities for unseasoned programmers to get results significantly off what they should be. All floating point quantities are approximate, unlike integers which are always exact. It sounds like a simple distinction but it has profound implications for programming.
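The "gradual" part Alan mentions can be seen directly (a Python sketch of standard IEEE 754 behavior, not anything model-specific): below the smallest normal double, the format falls back to subnormal numbers rather than snapping straight to zero.

```python
import sys

# Gradual underflow: precision degrades bit by bit near the bottom of the
# representable range instead of collapsing abruptly to zero.
smallest_normal = sys.float_info.min     # about 2.2e-308
print(smallest_normal / 2 > 0.0)         # True: a subnormal number survives
print(smallest_normal / 2**53 == 0.0)    # True: below the last subnormal it finally flushes
```

Pre-1985 formats that lacked this behavior could lose all accuracy at once near the representation limits, which is exactly the "abrupt loss" the standard was designed to avoid.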
One assumes all the environments used were validated using a floating-point accuracy benchmark before they ran these models, but there is no benchmark to catch sloppy programming.
Intense numerical programming is not for amateurs.

DirkH

Nick Stokes says:
July 27, 2013 at 11:30 am
“They no longer attempt to predict from the initial conditions. They follow the patterns of synthetically generated weather, and the amplification of deviation from an initial state is no longer an issue.”
That’s great to hear. As every single model run will develop an individual ever-growing discrepancy in its state from the state of the real system ( in the vector space, this can be imagined as every single model run accelerating into a different direction; picture an explosion of particles here), how is the mean of a thousand or a million model runs meaningful?
TIA

Frank K.

Unfortunately, Nick, climate (as formulated in most GCMs) is an initial value problem. You need initial conditions, and the solution will depend greatly on them (particularly given the highly coupled, non-linear system of differential equations being solved).
“They follow patterns of synthetic weather”??
REALLY? Could you expand on that?? I have NEVER heard that one before…

Greytide. Middle England sceptic

mpaul says:
July 27, 2013 at 11:26 am
Excellent! When are you selling shares? 1 penny from everyone’s bank account will make me a millionaire in no time, 0.001 of a degree each month will prove the warmists are right!

DirkH

DirkH says:
July 27, 2013 at 11:15 am
Partial restoration of my comment above that got snipped:
Reminder: mathematical definition of chaos as used by chaos theory is that a system is chaotic IFF its simulation on a finite resolution iterative model develops a deviation from the real system that grows beyond any constant bound over time.

Theo Goodwin

Excellent article about an excellent article. If you “looked under the hood” of a high level model and talked to the very high level programmers who manage it, you would learn that they use various heuristics in coding the model and that the effects of those heuristics cannot be separated from the model results. Run a model on different computers, especially supercomputers, and you are undoubtedly using different programmers, heuristics, and code optimization techniques, none of which can be isolated for effects on the final outcome. Problems with rounding errors are peanuts compared to problems with heuristics and code optimization techniques.
But the bottom line is that no VP of Finance, whose minions run models and do time series analyses all the time, believes that his minions are practicing science. He/she knows that these are tools of analysis only.

TerryS says: July 27, 2013 at 11:43 am
“128 bit floating point numbers”

Extra precision won’t help. The system is chaotic, and amplifies small differences. It amplifies the uncertainty in the initial state, and amplifies rounding errors. The uncertainty about what the initial numbers should be far exceeds the uncertainty of how the computer represents them. Grid errors too are much greater. The only reason why numerical error attracts attention here is that it can be measured by this sort of machine comparison. Again, this applies to forecasting from initial conditions.
Ric Werme says: July 27, 2013 at 11:35 am
“The mesh size may not be automatically set by the model”

It's constrained, basically by the speed of sound. You have to resolve acoustics, so a horizontal mesh width can't be (much) less than the distance sound travels in a timestep (the Courant condition). For 10-day forecasting you can refine the mesh, but you have to reduce the timestep in proportion.
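A back-of-envelope version of that constraint (my sketch; the 340 m/s sound speed is a rough assumed value, not a number from the paper):

```python
# Courant (CFL) limit for an explicit compressible scheme: a sound wave
# must not cross more than one grid cell per timestep, i.e. dt <= dx / c.
SOUND_SPEED = 340.0  # m/s, a rough value for the lower atmosphere (assumed)

def max_stable_timestep(mesh_width_m):
    """Largest stable timestep (seconds) for the given mesh width."""
    return mesh_width_m / SOUND_SPEED

# A 25 km mesh allows roughly 73 s per step; halving the mesh to 12.5 km
# halves the allowed timestep, so refinement costs more than it looks.
```

This is why refining the mesh for a forecast forces a proportionally smaller timestep, as Nick notes.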

Gary Pearse

Out of my depth here, but that’s how one gleans an eclectic education these days (usually someone good at this sort of thing puts it all into a brief “for dummies” essay). Meanwhile, on faith at this point, I’m overwhelmed. A few thoughts come to mind for some kind reader to explain:
1) How can one model any complex phenomenon confidently given such confounding factors? Is there a fix?
2) Aren’t the rounding errors distributed normally? Can’t we simply take the mean path through the spaghetti arising from these errors?
3) Surely, since we can't hope for perfection in modelling future climate, i.e. the errors in components (and from missing components) are necessarily sizable, rounding errors would seem to be the smaller of the issues. If we were predicting a temperature increase of 2C by 2100, what would the standard deviation from the rounding errors be, as an example?

Green Sand

[Removed as requested, see Green Sand’s comment below]

Theo Goodwin

Frank K. says:
July 27, 2013 at 12:16 pm
“Unfortunately, Nick, climate (as formulated in most GCMs) is an initial value problem. You need initial conditions and the solution will depend greatly on them (particularly given the highly coupled, non-linear system of differential equations being solved).
“They follow patterns of synthetic weather”??
REALLY? Could you expand on that?? I have NEVER heard that one before…”
Synthetic weather is what we have in Virginia. Nature could not produce the rain that we have “enjoyed” this year. The synthetic “pattern” is endless dreariness.

Green Sand

Mods, sorry for previous OT comment, posted in error, please delete if not too inconvenient.