New peer-reviewed paper finds the same global forecast model produces different results when run on different computers
Did you ever wonder how spaghetti like this is produced and why there is broad disagreement in the output that increases with time?
Graph above by Dr. Roy Spencer
Increasing mathematical uncertainty from the initial starting conditions is the main reason. But some of it might be due to the fact that while some of the models share common code, they don’t produce the same results with that code, owing to differences in the way CPUs, operating systems, and compilers work. Now, with this paper, we can add software uncertainty to the list of uncertainties that are already known unknowns about climate and climate modeling.
I got access to the paper yesterday, and its findings were quite eye-opening.
The paper was published on 7/26/13 in Monthly Weather Review, a publication of the American Meteorological Society. It finds that the same global forecast model (one for geopotential height) run on different computer hardware and operating systems produces different results, with no other changes.
They say that the differences are…
“primarily due to the treatment of rounding errors by the different software systems”
…and that these errors propagate over time, meaning they accumulate.
According to the authors:
“We address the tolerance question using the 500-hPa geopotential height spread for medium range forecasts and the machine ensemble spread for seasonal climate simulations.”
…
“The [hardware & software] system dependency, which is the standard deviation of the 500-hPa geopotential height [areas of high & low pressure] averaged over the globe, increases with time.”
The authors find:
“…the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.”
The initial conditions of climate models have already been shown by many papers to produce significantly different projections of climate.
It makes you wonder if some of the catastrophic future projections are simply due to a rounding error.
Here is how they conducted the tests on hardware/software:
Table 1 shows the 20 computing environments including Fortran compilers, parallel communication libraries, and optimization levels of the compilers. The Yonsei University (YSU) Linux cluster is equipped with 12 Intel Xeon CPUs (model name: X5650) per node and supports the PGI and Intel Fortran compilers. The Korea Institute of Science and Technology Information (KISTI; http://www.kisti.re.kr) provides a computing environment with high-performance IBM and SUN platforms. Each platform is equipped with different CPU: Intel Xeon X5570 for KISTI-SUN2 platform, Power5+ processor of Power 595 server for KISTI-IBM1 platform, and Power6 dual-core processor of p5 595 server for KISTI-IBM2 platform. Each machine has a different architecture and approximately five hundred to twenty thousand CPUs.
And here are the results:

While the differences might appear small to some, bear in mind that these differences in standard deviation are for only 10 days’ worth of modeling with a short-term global forecast model, not a decades-out global climate model. Since the software effects observed in this study are cumulative, imagine what the differences might be after years of calculation into the future, as we see in GCMs.
Clearly, a long-term evaluation of this effect is needed for the GCMs used to project future climate, to determine whether it affects those models too and, if so, how much of their output is real and how much is simply accumulated rounding error.
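To see the mechanism in miniature, here is a toy sketch in C (my own illustration, and nothing like the actual GRIMs/GMP code): the same chaotic recurrence is stepped once in single and once in double precision, so the only thing separating the two runs is how the constants and each arithmetic step are rounded.

```c
#include <stdio.h>

/* Toy illustration only (not the GRIMs/GMP model): iterate the chaotic
 * logistic map x -> r*x*(1-x) once in single and once in double precision.
 * Both runs start from the same exactly representable value 0.5; the only
 * difference is how the constant r and every arithmetic step are rounded,
 * yet the two trajectories drift apart as the iterations accumulate. */
int main(void) {
    float  xf = 0.5f;            /* single-precision state */
    double xd = 0.5;             /* double-precision state  */
    const float  rf = 3.9f;      /* chaotic regime of the map */
    const double rd = 3.9;

    for (int n = 1; n <= 60; ++n) {
        xf = rf * xf * (1.0f - xf);
        xd = rd * xd * (1.0 - xd);
        if (n % 10 == 0)
            printf("step %2d  single=%.7f  double=%.7f  diff=% .2e\n",
                   n, (double)xf, xd, (double)xf - xd);
    }
    return 0;
}
```

After a few dozen steps the two trajectories bear no resemblance to each other, which is the same basic mechanism the paper reports for differing compilers and optimization levels, only on a much smaller scale.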
Here is the paper:
An Evaluation of the Software System Dependency of a Global Atmospheric Model
Abstract
This study presents the dependency of the simulation results from a global atmospheric numerical model on machines with different hardware and software systems. The global model program (GMP) of the Global/Regional Integrated Model system (GRIMs) is tested on 10 different computer systems having different central processing unit (CPU) architectures or compilers. There exist differences in the results for different compilers, parallel libraries, and optimization levels, primarily due to the treatment of rounding errors by the different software systems. The system dependency, which is the standard deviation of the 500-hPa geopotential height averaged over the globe, increases with time. However, its fractional tendency, which is the change of the standard deviation relative to the value itself, remains nearly zero with time. In a seasonal prediction framework, the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.
h/t to The Hockey Schtick
Parallel processors, such as GPUs (e.g., Nvidia’s CUDA architecture), also execute instructions across threads/warps in a non-deterministic order.
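A minimal sketch of why that ordering matters (contrived numbers, not anything from a real GPU kernel): IEEE floating-point addition is not associative, so a reduction that happens to group the same terms differently can return a different sum.

```c
#include <stdio.h>

/* Contrived values chosen to make the effect obvious: the same three terms,
 * summed with two different groupings, give two different IEEE doubles. A
 * parallel reduction whose thread/warp order varies from run to run (or from
 * machine to machine) can therefore produce run-to-run differences. */
int main(void) {
    double big = 1.0e16, small = 1.0;

    double left  = (big + small) + small;  /* each 1.0 is rounded away   */
    double right = big + (small + small);  /* the pair of 1.0s survives  */

    printf("(big + 1) + 1 = %.1f\n", left);   /* 10000000000000000.0 */
    printf("big + (1 + 1) = %.1f\n", right);  /* 10000000000000002.0 */
    return 0;
}
```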
Mark
For a chaotic system even a perfect model would fail your proposed test because there is a degree of inherent variability involved. You simply can’t expect accurate predictions of what will happen in this case.
Varying the initial conditions is reasonable as it allows the model to exhibit its internal variability so that you can find the shape of its attractors. The initial conditions should match measurement though, and the variations should be small – within measurement error.
However climate scientists seldom seek to find the shape of the attractor. They are focussed on temperature to the exclusion of all else and throw away most of their data to generate graphs of global average temperature. They then average over several runs and use the variability to compute error bounds for the prediction. The models can be falsified if reality strays out of those error bounds, but it is a very weak test and requires you to wait many years for data to possibly falsify it.
I suspect their models would fail a strong test, which would look at whether they exhibit the identified variability of the real climate. Do the models have el nino and la nina states? Do they exhibit the various ocean oscillations that have been identified? If they don’t do this they are wrong. You don’t have to wait many years to reach that conclusion.
ferd berple says:
July 28, 2013 at 2:58 pm
“Suddenly the future is not deterministic, it is probabilistic. The future doesn’t exist as a point in time, it exists as a probability function, with some futures more likely than others, but none written in stone… The very best you can hope for is to calculate the odds of something happening, and while the odds may favor a turn to the right, they don’t prohibit a turn to the left. the same for temperatures.”
Think of a cloud of gas. Every molecule in it is drifting and bouncing in random directions. Even if you could make an exact copy of the cloud with all its molecules in the exact place and speed at an instant T0, soon the positions of molecules in the two clouds would start to diverge because of quantum fluctuations and the most microscopic noise introduced from the environment.
Does that mean that you can’t simulate the behaviour of that gas cloud? Not at all. Your model very quickly loses the ability to predict where a single molecule will be, but it can simulate the shape of the cloud very well on a much longer time scale, because its general behaviour has nothing to do with the particular trajectories of its molecules.
So it’s a matter of spatial and temporal scale: the current weather models can’t tell me if a small breeze will pass through my windows in the next minute, but they can predict the general weather on the city in two days; they will fail in telling which weather you’ll have in one year but (as climate models) they can show some skill in predicting if the average temperature for the whole planet will go up or down in ten years. If the climate is a chaotic system (letting aside catastrophic events of all kinds) it is possible that the same models will eventually fail completely on some very long time scale – tens of thousands of years, for example. But the rate of divergence from the modelled system is completely different from that of weather models as the level of detail at which they have to exhibit a skill is completely different.
ATheoK says: “…I’ll also mention D-Base which introduced thousands to databases, which I also hated extensively and deleted off of every computer under my management….”
I’m rather proud of having retained my luddite-like total ignorance of DBase. I never learned it, getting by with less profound (and less expensive) programs. This saved me hundreds of hours of learning curve, which I’ve doubtlessly squandered in other places, such as Lotus Agenda…
Dan Hughes says: July 28, 2013 at 3:03 pm
“Nick, if the method requires that the Courant sound-speed criterion be met, it can’t be “stretched a bit”. The grid is not 100 x 100 x 100 either. Your basic factor of 4 also isn’t “a bit”. Finally, accuracy generally requires a step size much smaller than the stability limit be used.”
The Courant “limit” gives a guide to a typical space dimension. It can be extended by higher-order interpolation, for example. It gives an instability mode for the worst case – a wave at the Nyquist limit, which is only half a wavelength per grid interval, so there’s a factor of 2 already. If that mode is damped, by accident or design, you can extend further. And so on.
The vertical dimension does not have a momentum equation; hydrostatic pressure is assumed. No wave propagation required.
The Courant condition is a stability condition, not an accuracy one. It controls the emergence of unstable modes. You need to resolve acoustic waves to prevent them going haywire, but they aren’t otherwise an important part of the solution.
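To make the stability-versus-accuracy distinction concrete, here is a minimal sketch (a one-dimensional toy, not anything resembling a GCM dynamical core): first-order upwind advection on a periodic grid stays bounded when the Courant number is below 1 and blows up when it is pushed above 1.

```c
#include <stdio.h>
#include <math.h>

/* Toy advection test, not a GCM: advect a square bump around a periodic
 * 1-D grid with the first-order upwind scheme. For Courant number
 * C = u*dt/dx <= 1 the solution stays bounded; for C > 1 the shortest
 * resolvable waves are amplified every step, which is the instability the
 * CFL condition guards against. */
#define N 100

static double max_abs(const double *q) {
    double m = 0.0;
    for (int i = 0; i < N; ++i)
        if (fabs(q[i]) > m) m = fabs(q[i]);
    return m;
}

static void run(double courant, int steps) {
    double q[N], qn[N];
    for (int i = 0; i < N; ++i)
        q[i] = (i > 40 && i < 60) ? 1.0 : 0.0;     /* initial square bump */

    for (int n = 0; n < steps; ++n) {
        for (int i = 0; i < N; ++i) {
            int im1 = (i + N - 1) % N;             /* periodic upwind point */
            qn[i] = q[i] - courant * (q[i] - q[im1]);
        }
        for (int i = 0; i < N; ++i)
            q[i] = qn[i];
    }
    printf("Courant = %.2f  max|q| after %d steps = %g\n",
           courant, steps, max_abs(q));
}

int main(void) {
    run(0.9, 200);   /* stable: the bump is advected (and smeared)  */
    run(1.1, 200);   /* unstable: the amplitude grows explosively   */
    return 0;
}
```

Below the limit the scheme is merely diffusive; above it an unstable mode takes over, which is the sense in which it is a stability condition rather than an accuracy one.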
I have just been looking at the latest snapshot of the NASA GISS ModelE source code and it is a real eye-opener.
In the main model directory there are over 333,000 lines of code. That is huge.
I did a search using ‘grep’ and found over 6,000 uses of the keyword real, which is used to declare floating-point variables. Each line that declares real typically declares multiple variables, usually 3 to 6.
There are a few uses in the declaration of functions and some constants, but at a rough guess they are less than 10 percent. My rough guess is that there are 20 to 30 thousand floating-point variables in the GISS ModelE GCM, and some of those are arrays or matrices.
There is no way on earth that anyone can manage this level of complexity with regard to rounding errors. I would suggest that nothing that is output from GISS ModelE has any value.
/ikh
ikh: “In the main model directory there are over 333,000 lines of code. That is huge.”
I am not a climate scientist but work in another area of physical science which requires large scale computing. 333000 lines of code is not huge. I know of and have used several codes which contain millions (about 10 million in the largest case) of lines of mixed Fortran and C++. They are full of floating point arithmetic – it cannot be avoided. The interesting thing is that these codes, written by different groups at different times, can, when applied to a given problem, be capable of giving the same results to virtually full machine precision i.e. about 15 significant figures. You do not have to be afraid of FP arithmetic and rounding errors. When these codes do give different answers, it is usually not the code, or the CPU, but the compiler – compilers make far more mistakes than people realise and quite often ‘mis-optimise’ a code in such a way that they change the results. We always check for problems of this nature – I hope the climate people do as well.
What is funny is that we are getting a three-group separation, and it is based on a lack of understanding.
We have a group who for some reason believe that chaos means you can’t track and predict it, because of extremely sensitive reactions – the so-called butterfly effect.
We have a group who want to try to plaster over and ignore the problem and seem to accept that the models will always be in error; this seems to include the climate scientists, Nick Stokes and the climate modelers.
Then you have a third group, which comes from different backgrounds and in which I include myself, which knows that both of those arguments are garbage, because we routinely do this in science.
Consider a Patriot missile: when it launches, it only approximately knows where the target is. Worse, as it closes on the target, that target will most likely go into countermeasures, including moving as randomly as it can.
You will note that when loss of prediction occurs in a Patriot missile, we call it what it is: a software bug.
http://sydney.edu.au/engineering/it/~alum/patriot_bug.html
The fact that the climate models and climate model creators are not even sure whether they have a lock on the thing they are trying to model tells you they have not a clue what they are doing.
In effect, the climate models are trying to launch a Patriot whose targeting is fixed at launch. The military used that approach against planes in earlier world wars; we call it “flak”, and it seems climate science hasn’t progressed far from there.
The problem to me seems to stem from the fact that most of the climate scientists don’t want to put error feedback and analysis of the errors into the models; they accept that the models continue to deviate from reality and, like Nick Stokes, argue that it is unavoidable.
The stability of the Earth’s climate over 4 billion years tells you that the chaos effect in climate is not unmanageable, in the same way that a missile’s random movements have limits and thus a Patriot can track and destroy it; the problem seems to be the climate scientists’ unwillingness to learn from hard science.
“Intel chips have an instruction pipeline that they try to keep full and not allow it to stall.”
Out-of-order execution produces the same results as in-order execution, because it has to wait for the previous operation on the data to complete before it can continue. If the compiler tells an Intel CPU to calculate a*b, then c*b, then add them, it does that. It may choose to calculate c*b before it calculates a*b, but that’s irrelevant, because the results will be the same either way.
It won’t convert those instructions into calculating a+c and then multiplying that sum by b, which would change the result. But the compiler may well do so.
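For what it’s worth, the difference such a rewrite can make is easy to show with contrived numbers (nothing model-specific): the two algebraically identical expressions below round to different doubles, which is why C compilers normally reserve this kind of reassociation for “fast math” style options (Fortran gives the compiler somewhat more freedom).

```c
#include <stdio.h>

/* Contrived values showing why rewriting a*b + c*b as (a+c)*b is not
 * value-preserving in IEEE double precision, even though the two forms
 * are algebraically identical. */
int main(void) {
    double a = 1.0e16, b = 3.0, c = 1.0;

    double as_written   = a * b + c * b;   /* rounds to 30000000000000004 */
    double reassociated = (a + c) * b;     /* rounds to 30000000000000000 */

    printf("a*b + c*b = %.1f\n", as_written);
    printf("(a+c)*b   = %.1f\n", reassociated);
    return 0;
}
```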
My experience with compilers is very limited. They would only “catch” syntax errors where the code was written incorrectly, but didn’t care about errors in what the code actually said. (i.e. if I left the “greater than” sign off a “blockquote” here, a compiler might catch it, but it wouldn’t care if within the blockquote I had said 2+2=5.)
Do the compilers spoken of here basically work the same way?
To a first approximation, numerical modeling *is* control of forward error propagation. Very often (and this is a rough, non-rigorous description) a finite difference model of a differential equation will have two solutions mixed in, due to the finiteness with which variables in the system can be represented using finite-precision hardware. One will be, say, a decaying exponential, A exp(-n t) and the other will be a rising exponential B exp(n t). No matter how small B is made compared to A, over time it will come to dominate and wreck the convergence of the system. Mitigating this is half the battle.
Lessee now.
My Radio Shack TRS80 says we are probably doomed.
My Sinclair ZX80 says we are only slightly doomed.
But my Oric Atmos says we are very doomed indeed.
Which should I believe?
Paul Jackson says:
July 27, 2013 at 1:32 pm
Edward Lorenz pretty much came to almost exactly the same conclusion, in regard to almost exactly the same computational problem, almost 50 years ago; this, as the Warmistas would say, is “settled science”.
Thank you. I was going to point that out myself.
LdB says:
July 28, 2013 at 7:20 pm
….
The stability of the Earth’s climate over 4 billion years tells you that the chaos effect in climate is not unmanageable, in the same way that a missile’s random movements have limits and thus a Patriot can track and destroy it; the problem seems to be the climate scientists’ unwillingness to learn from hard science.
I like most of your argument. However, “stability” is a matter of perception. First, over that 4 by period, the atmosphere has evolved from something that would kill an unprotected human in very short order to the air we breathe today. Next, there have been during the last 600 my a number of wildly violent swings in the size of the planetary biomass, with accompanying swings in the number of species of plant, animal and even bacteria. Entire orders have been pruned. At present, atmospheric CO2 is in fact hovering near the lowest it has ever been within that same 600 my span, and if it drops by about half, the entire planet could once again see a massive extinction event due to failed primary green-plant productivity. The only “stability” per se over the last 4 by is the presence of life forms of one form or another throughout.
This thread has been a most illuminating one. I hope non-programmers begin to understand that ‘heavy duty programming’ is not without problems!
But, in the engineering world, computer programs are built up from modules or units, each module or unit is tested before being incorporated into a larger program.
In engineering, each module/function is tested to make sure it behaves as expected.
We apply inputs, run it, and check the outputs are correct.
In climate science programming, I would expect the same, develop a small program/module/unit/function to process the effect of, say, particulates of a certain size, or specific gases of a specified concentration.
Where are the results of the thousands of experiments needed to prove that each small function of a GCM has been tested and verified before being incorporated into the complete model?
Having just had a quick look at the GISS ModelE code linked to earlier, I could see no obvious unit testing of each and every function.
I could see patches available to the WHOLE MODEL. How can you patch a COMPLETE MODEL without regression testing relevant units?
It beggars belief that these people are paid public money to ‘play with software’.
When programmers at banks cause a failure and an online bank goes offline for a few hours due to a bad software update, people are sacked and careers ruined.
In climate science you can proudly boast of your untested model and gain awards??????
LdB says (July 28, 2013 at 7:20 pm): “We have a group who want to try to plaster over and ignore the problem and seem to accept that the models will always be in error; this seems to include the climate scientists, Nick Stokes and the climate modelers.”
You need to read Nick above where he says: “I bet you can’t get two different computers to keep a vortex shedding sequence in exactly the same phase for 100 cycles. But they will still give a proper vortex street, with frequency and spacing.”
Nick probably meant to say “statistically valid”, but with climate we do not know what is statistically valid when the climate changes either due to increased CO2 or exogenous (e.g. solar) influences.
steverichards1984 (July 29, 2013 at 1:53 am) “In climate science programming, I would expect the same, develop a small program/module/unit/function to process the effect of, say, particulates of a certain size, or specific gases of a specified concentration.”
They have had that for decades, but it doesn’t help. The simplest explanation is that the computations for individual molecules cannot be extended to the model of the planet as a whole because there are too many molecules to simulate. So they use various parameterizations instead, which cannot be validated.
DirkH says:
July 27, 2013 at 2:31 pm
Man Bearpig says:
July 27, 2013 at 2:03 pm
“The IEEE double precision format has 53 bits of significance, about 16 decimal places. Please don’t offer stupid answers.
===========================
Yes, and isn’t that wonderful? However, to what level can we actually measure the values that are entered as a starting point into the models? To calculate them to 16 decimal places is not a representation of the real world. ”
The recommended way of interfacing to the real world is:
- enter data in single-float format (32-bit precision);
- during subsequent internal computations, use as high a precision as you can, to reduce error propagation;
- during output, write single precision (32-bit floats) again, because of the precision argument you stated.
It is legit to use a higher precision during the internal workings. It is not legit to assign significance to those low order digits when interpreting the output data.
In this regard, the GCMs cannot be faulted.
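A quick way to see the point about significance (the value below is hypothetical, not from any observation file): a quantity stored as a 32-bit float carries only 6-7 significant decimal digits, so digits beyond that in the internal double-precision arithmetic say nothing about the measured quantity.

```c
#include <stdio.h>
#include <float.h>

/* Hypothetical observed value, stored the way DirkH suggests (32-bit float)
 * and then promoted to double for internal computation. FLT_DIG/DBL_DIG give
 * the guaranteed decimal digits of each format; printing the promoted value
 * to 17 digits shows the rounding already baked in at input time. */
int main(void) {
    float  observed = 15.7f;      /* hypothetical station temperature, deg C */
    double internal = observed;   /* promoted for the model's internal work  */

    printf("float keeps ~%d digits, double ~%d digits\n", FLT_DIG, DBL_DIG);
    printf("15.7 stored as float, seen as double: %.17g\n", internal);
    return 0;
}
```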
Dirk, you appear to have missed Man Bearpig’s point. The real-world value you have just entered ‘in single float format’ was 12.5C; the actual value was 12.3C, but the observer never thought about your problems with initial start parameters. The pressure was similarly slightly off, as were the dew point, etc. There are huge roundings and errors in the observations that provide the start point, even from automatic sensors like ARGO and GOES. It is impossible to set the start parameters and inputs to the level of precision required to prevent the chaotic dispersion of results. People are living in a dreamworld if they think they can.
Ric Werme says:
July 27, 2013 at 5:16 pm
…….If two climate models are working by simulating the weather, then it really doesn’t matter if the instantaneous weather drifts widely apart – if the average conditions (and this includes tropical storm formation, ENSO/PDO/AMO/NAO/MJO and all the other oscillations) vary within similar limits, then the climate models have produced matching results. (If they’re really good, they’ll even be right.)
This bugs the heck out of me, so let me say it again – forecasting climate does not require accurately forecasting the weather along the way.
Another way of looking at it is to consider Edward Lorenz’s attractor; see http://paulbourke.net/fractals/lorenz/ and http://en.wikipedia.org/wiki/Lorenz_system. While modeling the attractor with slightly different starting points will lead to very different trajectories, you can define a small volume that will enclose nearly all of the trajectory.
The trajectory is analogous to weather – it has data that can be described as discrete points with numerical values. If some of the coefficients that describe the system change, then the overall appearance will change and that’s analogous to climate.
The trick to forecasting weather is to get the data points right. The trick to forecasting climate is to get the changing input and the response to changing input right.
Ric, I understand what you are saying but disagree that it is the way to build a model of the real world. I have bolded my concern. What you appear to be describing is what we see. Climate modelers impose their own guess of what the shape and extent of the Poincaré section should be, and bound their software to meet that guesstimate of future reality. These climate models allow tweaking and parameterization (tuning) so that the output is what the writers want, not what will happen in the real world or what would happen if they took their thumbs off the scales. If these modelers were NOT bounding their programs in this way, then the chaotic systems would show far more chaotic dispersal.
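The attractor behaviour Ric describes is easy to reproduce in a few lines (a toy integration with Lorenz’s classic parameters, not a claim about any GCM): two runs that differ in the tenth decimal place of one starting value separate completely, yet both stay inside the same bounded region of state space.

```c
#include <stdio.h>
#include <math.h>

/* Toy Lorenz-system integration: two trajectories that differ only by a
 * 1e-10 perturbation in x(0). Their separation grows until it saturates
 * (the "weather" diverges), but both runs remain inside the same bounded
 * attractor (the "climate" stays put). Plain Euler stepping, small dt. */
typedef struct { double x, y, z; } state;

static state step(state s, double dt) {
    const double sigma = 10.0, rho = 28.0, beta = 8.0 / 3.0;
    state d = { sigma * (s.y - s.x),
                s.x * (rho - s.z) - s.y,
                s.x * s.y - beta * s.z };
    s.x += dt * d.x;  s.y += dt * d.y;  s.z += dt * d.z;
    return s;
}

int main(void) {
    state a = { 1.0,         1.0, 1.0 };
    state b = { 1.0 + 1e-10, 1.0, 1.0 };   /* tiny perturbation */
    const double dt = 0.001;

    for (int n = 1; n <= 50000; ++n) {     /* integrate to t = 50 */
        a = step(a, dt);
        b = step(b, dt);
        if (n % 10000 == 0)
            printf("t=%4.0f  x_a=% 8.3f  x_b=% 8.3f  separation=%.3e\n",
                   n * dt, a.x, b.x,
                   sqrt((a.x-b.x)*(a.x-b.x) + (a.y-b.y)*(a.y-b.y) +
                        (a.z-b.z)*(a.z-b.z)));
    }
    return 0;
}
```

Whether GCM output corresponds to the bounded envelope or to an individual trajectory is exactly the disagreement running through this thread.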
Blarney says:
July 28, 2013 at 4:12 pm
“So it’s a matter of spatial and temporal scale: the current weather models can’t tell me if a small breeze will pass through my windows in the next minute, but they can predict the general weather on the city in two days; they will fail in telling which weather you’ll have in one year but (as climate models) they can show some skill in predicting if the average temperature for the whole planet will go up or down in ten years. If the climate is a chaotic system (letting aside catastrophic events of all kinds) it is possible that the same models will eventually fail completely on some very long time scale – tens of thousands of years, for example. But the rate of divergence from the modelled system is completely different from that of weather models as the level of detail at which they have to exhibit a skill is completely different.”
You are describing the process of parameterizing a statistical description of a 50 km × 50 km cell in a GCM.
Note that this statistical approach only works when a lot of instances of the described process happen in the grid box.
This approach breaks down when one process instance is larger than a grid box, for instance a convective front.
You have now entered the area where the faulty physics of the models must be described; and we leave the area where we discuss rounding errors and the implications of chaos.
David Gillies says:
One will be, say, a decaying exponential, A exp(-n t) and the other will be a rising exponential B exp(n t). No matter how small B is made compared to A, over time it will come to dominate and wreck the convergence of the system.
Not if the step size is chosen so as to satisfy the stability requirements for the solution of the discrete approximations to the continuous differential equations. Off-the-shelf ODE solver software will cut through this like a hot knife through warm butter. Roll-your-own finite difference methods can be easily analyzed to determine the stability requirements.
This problem is a classic that is used to illustrate various aspects of finite-precision arithmetic and numerical solutions of ODEs.
Consistent discrete approximations coupled with stable numerical solution methods always lead to convergence of the solutions of the discrete approximations to the solutions of the continuous equations. Always.
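Both points can be seen in a toy problem (dy/dt = -y, nothing to do with any particular climate code): the leapfrog/midpoint scheme carries a second, growing root of exactly the kind David describes, and it eventually swamps the true decaying solution however small the step, while a scheme that is stable for this equation (backward Euler here) converges cleanly, as Dan says.

```c
#include <stdio.h>
#include <math.h>

/* Toy problem: dy/dt = -y, y(0) = 1, exact solution exp(-t).
 * The leapfrog (midpoint) scheme y_{n+1} = y_{n-1} - 2*h*y_n admits a
 * second, growing root (the "parasitic" solution), which eventually
 * dominates for any step size h > 0. Backward Euler, which is stable for
 * this equation, tracks the decaying solution. */
int main(void) {
    const double h = 0.01;          /* step size                          */
    const int steps = 4000;         /* integrate out to t = 40            */

    double ym1 = 1.0;               /* leapfrog value at t = 0            */
    double y0  = exp(-h);           /* leapfrog started with exact y(h)   */
    double lf  = y0;
    double be  = 1.0 / (1.0 + h);   /* backward Euler value at t = h      */

    for (int n = 2; n <= steps; ++n) {
        lf  = ym1 - 2.0 * h * y0;   /* leapfrog: y_{n+1} = y_{n-1} - 2h*y_n */
        ym1 = y0;
        y0  = lf;
        be  = be / (1.0 + h);       /* backward Euler: y_{n+1} = y_n/(1+h)  */
        if (n % 1000 == 0)
            printf("t=%5.1f  exact=%.3e  leapfrog=% .3e  backward-Euler=%.3e\n",
                   n * h, exp(-n * h), lf, be);
    }
    return 0;
}
```

Shrinking the step does not remove the leapfrog mode on this problem (its growth root has magnitude greater than one for any h > 0), which is why practical leapfrog codes, including atmospheric models, add a weak time filter to damp it.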
@Robert Clemenzi and @ikh
Thank you for providing your comments on Model E. They echo my analysis of that code (i.e. it’s a real mess, with totally inadequate code comments and documentation). Robert – I’d like to hear more about your experiences with Model E. Perhaps you could write up a short essay that Anthony could publish.
For those who would like to see for themselves, the Model E source code can be downloaded from here:
http://www.giss.nasa.gov/tools/modelE/
For laughs, click on their link for the “general documentation” paper. Count how many equations you see in the entire paper…
@Dan_Hughes
Good to see you commenting here 🙂
@Nick Stokes
Thanks for the link to the ocean circulation animation (I’ve seen it before). While that’s not an example of “synthetic weather”, I think your point is that climate models are something like Large Eddy Simulation (LES) in CFD, wherein the Navier-Stokes equations are solved for the scales of turbulence resolvable on a given (usually very fine) mesh. In such simulations, the actual paths of individual eddies don’t matter, only their impact on the time-averaged solution, which is what is of interest. While this is a tempting analogy, the problem is that no one has demonstrated that this behavior should emerge from the differential equations which climate models are solving (which are NOT the Navier-Stokes equations, despite what some ill-informed climate scientists may assert). If you’d like to point to such an analysis, I’d be very interested to read more about this topic.
Duster says
July 29, 2013 at 12:18 am
The only “stability” per se over the last 4 by is the presence of life forms of one form or another throughout.
You forget that the argument is about surface temperature, because that is what the climate models are supposed to be predicting.
Consider the possible range of -157 degrees C to 121 degrees C, which is the temperature range possible both in theory and as recorded on the ISS (http://science.nasa.gov/science-news/science-at-nasa/2001/ast21mar_1/).
We have never seen anything like those numbers within human occupancy and, as best we can tell, ever; therefore there is some defining behavior. This is the exact situation of a plane or a rocket trying to do random turns: even by design, the randomness they can achieve has limits, because they have inertia.
You see the same inertia with temperature: when the sun goes down the surface temperature drops, but it doesn’t go to -157 degrees instantly, does it? Why?
Lorenz was talking about a particle, something that has no real inertia. You can’t get an instant butterfly effect on a plane, a rocket or the temperature of the planet; they simply have too much inertia.
It’s the inertia of the system that makes them solvable, and that is what makes some of the arguments by people just very silly.
Technically, what happens with inertia is that it creates integrability over the chaos, which is technical speak for saying that the chaos cannot take just any value; it can only change at some maximal rate. In layman’s terms, a plane or rocket trying to go chaotic can only bank and turn at a maximal rate, which you would already know from common sense.
So butterfly effects and other chaotic behavior, which exist and are unsolvable in mathematics and some fields, hold little relevance for much real-world physics, where the chaos mixes with inertial components.
So when people are talking about chaos and effects that appear in mathematics and books, please be very careful, because they don’t tend to be that pure in the real world.
If you look at quantum mechanics as a theory, everything is popping into and out of existence completely randomly; it doesn’t get much more random and chaotic than that, yet you view the world as solid for exactly the same reason: the inertia of your observations makes it appear solid.
If you are still not convinced, then I can do nothing but show you an example, as many have tried to explain:
“Hardware Implementation of Lorenz Circuit Systems for Secure Chaotic Communication Applications”
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3649406/
=> The experimental results verify that the methods are correct and practical.
So please do not tell us you cannot get a model to lock onto a chaotic system; you may not be able to, but many of us can.
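For anyone curious what “locking onto” a chaotic system looks like in practice, here is a minimal sketch of Pecora-Carroll style drive-response synchronization on the Lorenz system (a toy, and not an endorsement of either side of the argument above): the response subsystem is fed only the drive’s x signal, starts from different values, and still converges onto the drive’s y and z.

```c
#include <stdio.h>
#include <math.h>

/* Drive-response (Pecora-Carroll style) synchronization sketch: a full
 * Lorenz "drive" system is integrated, and a "response" (y,z) subsystem is
 * forced with the drive's x only. Despite different initial values, the
 * response converges onto the drive's y and z. Plain Euler stepping. */
int main(void) {
    const double sigma = 10.0, rho = 28.0, beta = 8.0 / 3.0, dt = 0.001;

    double xd = 1.0, yd = 1.0, zd = 1.0;   /* drive state    */
    double yr = -5.0, zr = 30.0;           /* response state */

    for (int n = 1; n <= 20000; ++n) {     /* integrate to t = 20 */
        /* drive: full Lorenz system */
        double dxd = sigma * (yd - xd);
        double dyd = xd * (rho - zd) - yd;
        double dzd = xd * yd - beta * zd;

        /* response: same (y,z) equations, but driven by the drive's x */
        double dyr = xd * (rho - zr) - yr;
        double dzr = xd * yr - beta * zr;

        xd += dt * dxd;  yd += dt * dyd;  zd += dt * dzd;
        yr += dt * dyr;  zr += dt * dzr;

        if (n % 4000 == 0)
            printf("t=%4.0f  |y_err|=%.3e  |z_err|=%.3e\n",
                   n * dt, fabs(yd - yr), fabs(zd - zr));
    }
    return 0;
}
```

This is the mechanism behind the chaotic-communication paper linked above; whether anything analogous is achievable for a climate model is, of course, the contested question in this thread.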
http://old.zaynar.co.uk/misc2/fp.html
I think this might be relevant for people who are discussing floating point error.
streflop helps mitigate this, although it is rarely used.
LdB says:
July 28, 2013 at 7:20 pm
“The problem to me seems to stem from the fact that most of the climate scientists don’t want to put error feedback and analysis of the errors into the models; they accept that the models continue to deviate from reality and, like Nick Stokes, argue that it is unavoidable.”
How does this differ from adjusting models to hindcast? Try applying it to tossing a coin, or to predicting the percentage of time the jet stream will pass south of the UK, north of the UK, or over the UK in 100 years’ time, in 50 years’ time, in 10 years’ time, and so define a main driver of the UK’s climate.
“Technically, what happens with inertia is that it creates integrability over the chaos, which is technical speak for saying that the chaos cannot take just any value; it can only change at some maximal rate.”
I’m not sure that matters, since the differences between various predictions or projections are all well within the maximal rates of change of all the values involved. You appear to be talking about trimming an enormous space of possible solutions, which still leaves behind a fairly large space.
The question is whether the “models” (i.e. emulations or simulations) can be validated. To use a simple example, convection from warming easily results in turbulent flow. The turbulence can be ignored using some bounds similar to temperature inertia. Then one could simplify parameters and formulas using average wind speeds.
One could also analyze structure in the chaotic system calculations, like the self-organizing structures described here: http://www.schuelers.com/chaos/chaos1.htm. Although weather is a lot simpler than that, it still has effects like atmospheric mixing that are ultimately controlled by a chaotic process. Without accurate calculations of the mixing efficiency, predictions are impossible. This is even more important for oceans over the long run, to model ocean cycles and even determine something as simple as thermal inertia, which is well-bounded as you state.