New peer-reviewed paper finds the same global forecast model produces different results when run on different computers
Did you ever wonder how spaghetti like this is produced and why there is broad disagreement in the output that increases with time?
Graph above by Dr. Roy Spencer
Increasing mathematical uncertainty from the initial starting conditions is the main reason. But some of it may also be due to the fact that, even when models share common code, they don’t produce the same results with that code, owing to differences in how CPUs, operating systems, and compilers handle the arithmetic. Now, with this paper, we can add software uncertainty to the list of uncertainties that are already known unknowns about climate and climate modeling.
I got access to the paper yesterday, and its findings were quite eye-opening.
The paper was published 7/26/13 in Monthly Weather Review, a publication of the American Meteorological Society. It finds that the same global atmospheric forecast model, with no other changes, produces different output when run on different computer hardware and software systems, with the spread measured by the 500-hPa geopotential height.
They say that the differences are…
“primarily due to the treatment of rounding errors by the different software systems”
…and that these errors propagate over time, meaning they accumulate.
According to the authors:
“We address the tolerance question using the 500-hPa geopotential height spread for medium range forecasts and the machine ensemble spread for seasonal climate simulations.”
…
“The [hardware & software] system dependency, which is the standard deviation of the 500-hPa geopotential height [areas of high & low pressure] averaged over the globe, increases with time.”
The authors find:
“…the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.”
The initial conditions of climate models have already been shown by many papers to produce significantly different projections of climate.
It makes you wonder if some of the catastrophic future projections are simply due to a rounding error.
Here is how they conducted the tests on hardware/software:
Table 1 shows the 20 computing environments including Fortran compilers, parallel communication libraries, and optimization levels of the compilers. The Yonsei University (YSU) Linux cluster is equipped with 12 Intel Xeon CPUs (model name: X5650) per node and supports the PGI and Intel Fortran compilers. The Korea Institute of Science and Technology Information (KISTI; http://www.kisti.re.kr) provides a computing environment with high-performance IBM and SUN platforms. Each platform is equipped with different CPU: Intel Xeon X5570 for KISTI-SUN2 platform, Power5+ processor of Power 595 server for KISTI-IBM1 platform, and Power6 dual-core processor of p5 595 server for KISTI-IBM2 platform. Each machine has a different architecture and approximately five hundred to twenty thousand CPUs.
And here are the results:

While the differences might appear small to some, bear in mind that these differences in standard deviation are only for 10 days’ worth of modeling with a short-term global forecast model, not a decades-out global climate model. Since the software effects they observed in this study are cumulative, imagine what the differences might be after years of calculation into the future, as we see in GCMs.
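To make the compiler/optimization point concrete, here is a minimal, purely illustrative Python sketch (not from the paper, whose model is written in Fortran). Floating-point addition is not associative, so the mere reordering of a sum, the kind of change a different compiler or optimization level can make, yields a result that differs in the last bits; it is exactly this sort of tiny discrepancy that can then be carried forward step after step.

```python
# Illustrative sketch only (the paper's model is Fortran; this is plain Python).
# Floating-point addition is not associative, so a compiler that reorders a
# sum at a higher optimization level can change the result in the last bits.
a, b, c = 1.0e16, -1.0e16, 1.0

print((a + b) + c)   # 1.0 -- the large terms cancel first, so the 1.0 survives
print(a + (b + c))   # 0.0 -- the 1.0 is rounded away when added to -1.0e16 first
```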
Clearly, an evaluation of this effect is needed over the long term for many of the GCMs used to project future climate, to determine whether it also affects those models and, if so, how much of their output is real and how much is simply accumulated rounding error.
Here is the paper:
An Evaluation of the Software System Dependency of a Global Atmospheric Model
Abstract
This study presents the dependency of the simulation results from a global atmospheric numerical model on machines with different hardware and software systems. The global model program (GMP) of the Global/Regional Integrated Model system (GRIMs) is tested on 10 different computer systems having different central processing unit (CPU) architectures or compilers. There exist differences in the results for different compilers, parallel libraries, and optimization levels, primarily due to the treatment of rounding errors by the different software systems. The system dependency, which is the standard deviation of the 500-hPa geopotential height averaged over the globe, increases with time. However, its fractional tendency, which is the change of the standard deviation relative to the value itself, remains nearly zero with time. In a seasonal prediction framework, the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.
h/t to The Hockey Schtick

The climate models will forever be useless because the initial state of the climate can’t be put into them properly. In addition, they will never have complete data on everything that influences the Earth’s climatic system, nor can they account for every factor that might influence it, so they cannot give any sort of accurate climate forecast.
In other words, they are USELESS, and one can see that not only in their temperature forecasts but also in their basic atmospheric circulation and temperature profile forecasts, which have been 100% wrong.
As this decade proceeds, the temperature trend will be down in response to the prolonged solar minimum, and they will be obsolete.
For non-tech-savvy people like me and others, this is undoubtedly an eye-opener and a vital piece of information.
Richard M . . . . “when they properly model oceans and other natural climate factors” . . . . “they” will HAVE TO take into account all the causes of “natural climate variation factors” “weather” (whether) long term, short term and/or temporary term causes . . . to build that model around . . . . at this point “they” only look at the data which are the effects of the causes. Even I can confidently forecast that! Some of the “they” have been doing it bass ackwards for a long time. In my humble observation . . . and opinion.
Google: “methods and procedures for forecasting long-term climate in specific locals” see what you get!
Let me repeat a comment I made over a year ago:
Man Bearpig says:
July 27, 2013 at 11:26 am
> Rounding errors ? How many decimal places are they working to ? 1? 0?
“Digital” computers rarely use decimal arithmetic; the most recent systems I know of that do are aimed at financial calculations.
The IEEE double-precision format has a 53-bit significand, good for about 16 decimal digits. Please don’t offer stupid answers.
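To put numbers on that, here is a small Python check, offered purely as an illustration (the model code itself is Fortran), of the 53-bit significand and the roughly 16 decimal digits it buys, which is exactly where the rounding the paper discusses happens.

```python
import sys

print(sys.float_info.mant_dig)    # 53 -- bits in the double-precision significand
print(sys.float_info.epsilon)     # ~2.22e-16 -- gap between 1.0 and the next float up
print(1.0 + 1e-17 == 1.0)         # True -- a change in the 17th decimal place is rounded away
```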
Nick in his initial comment (that I snipped because he and I had a misunderstanding) said that climate models don’t have initial starting conditions.
They have to. They have to pick a year to start with, and levels of CO2, forcing, temperature, etc must be in place for that start year. The model can’t project from a series of arbitrary uninitialized variables.
The pitfalls may not just be in rounding. When accuracy is important, it’s useful to understand how computers represent numbers (mantissa and exponent for floating-point values, for example). The thing to bear in mind is that intermediate results suffer when operations mix very large numbers carrying important significant digits with very small numbers carrying important significant digits; if the programmer isn’t thinking this through, it’s easy to lose significant digits along the way, because every intermediate floating-point result is limited to a fixed number of significant digits at a single scale, or exponent.
But these are well known, well studied problems in comp sci. There’s no reason for anybody who cares about numerical accuracy to be stung this way.
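As a hedged illustration of the large-plus-small pitfall described above (a toy example, nothing from any model code): adding a million small terms one at a time to a much larger running total can lose them entirely, while a compensated summation such as Python’s math.fsum keeps them.

```python
import math

big = 1.0e16               # one large term
tiny = [1.0] * 1_000_000   # a million small terms

naive = big
for t in tiny:
    naive += t             # each 1.0 is rounded away against the huge running total

print(naive - big)                    # 0.0 -- every small term was lost en route
print(math.fsum([big] + tiny) - big)  # 1000000.0 -- compensated summation keeps them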
Terry, it would cost a lot of money to run at 128-bit: one would need millions of dollars of recoding, and then the program would run 20 to 100 times slower. At 32-bit you get muddy sludge in minutes with this kind of code.
The authors of the paper have done the right thing: it’s much cheaper to change compilers and optimization settings while staying at 64-bit.
Essentially this is the butterfly effect in numerical computing.
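A toy demonstration of that butterfly effect, using the logistic map as a stand-in for a chaotic system (an illustrative choice only, with nothing to do with the GRIMs code): two runs whose starting values differ by one part in 10^15 drift apart by many orders of magnitude until they bear no resemblance to each other.

```python
# Two runs of the same chaotic recurrence, starting values differing by 1e-15.
r = 3.9
x, y = 0.4, 0.4 + 1e-15

for n in range(1, 91):
    x = r * x * (1.0 - x)
    y = r * y * (1.0 - y)
    if n % 30 == 0:
        print(f"step {n:2d}: x = {x:.6f}  y = {y:.6f}  |x - y| = {abs(x - y):.2e}")
```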
“Nick in his initial comment (that I snipped because he and I had a misunderstanding) said that climate models don’t have initial starting conditions.”
I didn’t say that they don’t have starting conditions; I’m well aware that all time-dependent differential equations have to. I said that they don’t forecast from initial conditions, as I expanded on here.
So it’s really important to split hairs about the difference between ‘global forecast models’ and ‘climate models’? So, until a ‘scientist’ produces a peer-reviewed paper showing that this divergence also occurs in ‘climate models’, we can safely assume they are unaffected? Ha!
Positive Lyapunov exponents.
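For readers wondering what that terse comment means: a positive Lyapunov exponent says nearby trajectories separate exponentially fast. Below is a hedged sketch, the standard textbook recipe applied to the logistic map purely as a stand-in (not to any atmospheric model), that estimates one numerically.

```python
import math

# Estimate the Lyapunov exponent of the logistic map x -> r*x*(1-x) by
# averaging log|f'(x)| along a long trajectory.  A positive average means
# nearby trajectories separate exponentially -- the hallmark of chaos.
r = 3.9
x = 0.4

for _ in range(100):                       # discard a short transient
    x = r * x * (1.0 - x)

total = 0.0
n_steps = 100_000
for _ in range(n_steps):
    deriv = abs(r * (1.0 - 2.0 * x))       # |d/dx of r*x*(1-x)|
    total += math.log(max(deriv, 1e-300))  # guard against log(0) at the map's midpoint
    x = r * x * (1.0 - x)

print(f"estimated Lyapunov exponent: {total / n_steps:+.3f}")  # positive for r = 3.9
```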
I have been a programmer since 1968 and I am still working. I have programmed in many different areas, including forecasting. If I have understood this correctly, this type of forecasting is architected so that the forecast for day N is built on results obtained for day N - 1. If that is the case, I would say that it’s meaningless. It’s hard enough to predict from a set of external inputs. If you include results from yesterday, it will go wrong. Period!
Anthony Watts says:
July 27, 2013 at 11:46 am
“Nick in his initial comment (that I snipped because he and I had a misunderstanding) said that climate models don’t have initial starting conditions.
They have to. They have to pick a year to start with, and levels of CO2, forcing, temperature, etc must be in place for that start year. The model can’t project from a series of arbitrary uninitialized variables.”
Initial starting conditions include the initial state of each cell of the model: energy, moisture, pressure (if they do pressure), and so on. The state space is obviously gigantic (number of cells times variables per cell times the resolution of the variables in bits; in other words, if you can hold this state in one megabyte, the state space is 2 ^ (8 * 1024 * 1024) states, assuming every bit in the megabyte is actually used). The deviation of the simulated system from the real system is best expressed as the vector distance between the state of the simulation, expressed as a vector of all its state variables, and the corresponding vector representation of the real system. It is this deviation (the length of the vector difference) that grows beyond all bounds when the system being simulated is chaotic.
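A minimal sketch of the vector-distance idea described above, with completely made-up toy numbers standing in for the per-cell variables:

```python
import math

def state_distance(sim_state, real_state):
    # Euclidean length of the difference between the two state vectors.
    return math.sqrt(sum((s - r) ** 2 for s, r in zip(sim_state, real_state)))

# Toy states: three cells, two variables each (pressure in hPa, temperature in K).
# The values are invented purely to show the calculation.
sim  = [1013.2, 288.1, 1009.8, 287.6, 1021.4, 285.0]
real = [1013.6, 288.3, 1009.1, 287.9, 1020.8, 284.2]

print(f"state-space distance: {state_distance(sim, real):.3f}")
```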
Ingvar Engelbrecht says:
July 27, 2013 at 11:59 am
“I have been a programmer since 1968 and I am still working. I have programmed in many different areas, including forecasting. If I have understood this correctly, this type of forecasting is architected so that the forecast for day N is built on results obtained for day N - 1.”
Yes, Ingvar, weather forecasting models as well as climate models are iterative models (basically very large finite state machines, where the program describes the transition table from one step to the next).
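A hedged sketch of that iterative structure (placeholder arithmetic only; the real transition function is the entire model program): the state for step N is produced solely from the state for step N-1, so whatever error is in one step’s state is carried into every later step.

```python
# Placeholder "physics": the real transition rule is the whole model program.
def transition(state):
    return {name: value * 0.99 + 0.5 for name, value in state.items()}

state = {"temperature_K": 288.0, "pressure_hPa": 1013.0}   # made-up day-0 state
for day in range(1, 11):
    state = transition(state)          # day N depends only on day N-1

print("day 10 state:", state)
```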
Numerical computation is an entire distinct subfield of computer science. There are many traps for the unwary, some language-specific and others deriving from hardware differences. It used to be worse, with different manufacturers using incompatible formats and algorithms, and most had serious accuracy problems on over/underflow events.
A major advance was the 1985 adoption of the IEEE 754 standard for floating point. One principal goal was to avoid abrupt loss of accuracy as quantities neared representation limits in calculation. But even with much better underlying representation and algorithms, there are plenty of opportunities for unseasoned programmers to get results significantly off what they should be. All floating-point quantities are approximate, unlike integers, which are always exact. It sounds like a simple distinction, but it has profound implications for programming.
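To make the “floats are approximate, integers are exact” distinction concrete, here is a three-line Python illustration, nothing more:

```python
print(0.1 + 0.2 == 0.3)     # False: none of 0.1, 0.2 or 0.3 is exactly representable in binary
print(0.1 + 0.2)            # 0.30000000000000004
print(10**30 + 1 > 10**30)  # True: Python integers are exact at any size
```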
One assumes all the environments used were validated using a floating-point accuracy benchmark before they ran these models, but there is no benchmark to catch sloppy programming.
Intense numerical programming is not for amateurs.
Nick Stokes says:
July 27, 2013 at 11:30 am
“They no longer attempt to predict from the initial conditions. They follow the patterns of synthetically generated weather, and the amplification of deviation from an initial state is no longer an issue.”
That’s great to hear. As every single model run will develop an individual, ever-growing discrepancy in its state from the state of the real system (in the vector space, this can be imagined as every single model run accelerating in a different direction; picture an explosion of particles), how is the mean of a thousand or a million model runs meaningful?
TIA
Unfortunately, Nick, climate (as formulated in most GCMs) is an initial value problem. You need initial conditions and the solution will depend greatly on them (particularly given the highly coupled, non-linear system of differential equations being solved).
“They follow patterns of synthetic weather”??
REALLY? Could you expand on that?? I have NEVER heard that one before…
mpaul says:
July 27, 2013 at 11:26 am
Excellent! When are you selling shares? A penny from everyone’s bank account will make me a millionaire in no time; 0.001 of a degree each month will prove the warmists are right!
DirkH says:
July 27, 2013 at 11:15 am
Partial restoration of my comment above that got snipped:
Reminder: the mathematical definition of chaos, as used by chaos theory, is that a system is chaotic IFF its simulation on a finite-resolution iterative model develops a deviation from the real system that grows beyond any constant bound over time.
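As a hedged illustration of that definition, using NumPy’s single- and double-precision types as two different “machines” and the logistic map as a stand-in for the model (this assumes NumPy is installed and is not drawn from the paper): the only difference between the two runs below is how they round at each step, yet their states drift apart as the iteration proceeds.

```python
import numpy as np

# The same recurrence carried out in float32 and in float64; the two differ
# only in rounding, yet the trajectories separate over time.
r32, x32 = np.float32(3.9), np.float32(0.4)
r64, x64 = np.float64(3.9), np.float64(0.4)

for n in range(1, 41):
    x32 = r32 * x32 * (np.float32(1.0) - x32)
    x64 = r64 * x64 * (np.float64(1.0) - x64)
    if n % 10 == 0:
        print(f"step {n:2d}: float32 = {float(x32):.6f}   float64 = {float(x64):.6f}")
```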
Excellent article about an excellent article. If you “looked under the hood” of a high level model and talked to the very high level programmers who manage it, you would learn that they use various heuristics in coding the model and that the effects of those heuristics cannot be separated from the model results. Run a model on different computers, especially supercomputers, and you are undoubtedly using different programmers, heuristics, and code optimization techniques, none of which can be isolated for effects on the final outcome. Problems with rounding errors are peanuts compared to problems with heuristics and code optimization techniques.
But the bottom line is that no VP of Finance, whose minions run models and do time series analyses all the time, believes that his minions are practicing science. He/she knows that these are tools of analysis only.
TerryS says: July 27, 2013 at 11:43 am
“128 bit floating point numbers”
Extra precision won’t help. The system is chaotic, and amplifies small differences. It amplifies the uncertainty in the initial state, and amplifies rounding errors. The uncertainty about what the initial numbers should be far exceeds the uncertainty of how the computer represents them. Grid errors too are much greater. The only reason why numerical error attracts attention here is that it can be measured by this sort of machine comparison. Again, this applies to forecasting from initial conditions.
Ric Werme says: July 27, 2013 at 11:35 am
“The mesh size may not be automatically set by the model”
It’s constrained, basically by the speed of sound. You have to resolve acoustics, so a horizontal mesh width can’t be (much) less than the distance sound travels in a timestep (the Courant condition). For 10-day forecasting you can refine the mesh, but you have to reduce the timestep in proportion.
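A back-of-the-envelope version of that constraint (toy numbers only; real models use more sophisticated criteria): the CFL-style condition says sound should not cross more than about one grid cell per timestep, so halving the mesh width roughly halves the allowable timestep.

```python
SOUND_SPEED = 340.0   # m/s, a rough sea-level value assumed for illustration

def max_timestep(dx_metres, courant_number=1.0):
    # Timestep such that sound crosses at most `courant_number` cells per step.
    return courant_number * dx_metres / SOUND_SPEED

for dx_km in (100, 50, 25, 10):
    dt = max_timestep(dx_km * 1000.0)
    print(f"grid spacing {dx_km:3d} km -> max timestep ~ {dt:5.0f} s")
```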
Out of my depth here, but that’s how one gleans an eclectic education these days (usually someone good at this sort of thing puts it all into a brief “for dummies” essay). Meanwhile, taking it on faith at this point, I’m overwhelmed. A few thoughts come to mind for some kind reader to explain:
1) How can one model any complex phenomenon confidently given such confounding factors? Is there a fix?
2) Aren’t the rounding errors distributed normally? Can’t we simply take the mean path through the spaghetti arising from these errors?
3) Surely, since we can’t hope for perfection in modelling future climate, i.e. the errors in components (and from missing components) are necessarily sizable, rounding errors would seem to be the smaller of the issues. If we were predicting a temperature increase of 2 °C by 2100, what would the standard deviation from the rounding errors be, as an example?
[Removed as requested, see Green Sand’s comment below]
Frank K. says:
July 27, 2013 at 12:16 pm
“Unfortunately, Nick, climate (as formulated in most GCMs) is an initial value problem. You need initial conditions and the solution will depend greatly on them (particularly given the highly coupled, non-linear system of differential equations being solved).
“They follow patterns of synthetic weather”??
REALLY? Could you expand on that?? I have NEVER heard that one before…”
Synthetic weather is what we have in Virginia. Nature could not produce the rain that we have “enjoyed” this year. The synthetic “pattern” is endless dreariness.
Mods, sorry for previous OT comment, posted in error, please delete if not too inconvenient.