Another uncertainty for climate models – different results on different computers using the same code

New peer reviewed paper finds the same global forecast model produces different results when run on different computers

Did you ever wonder how spaghetti like this is produced and why there is broad disagreement in the output that increases with time?

Graph above by Dr. Roy Spencer

Increasing mathematical uncertainty from initial starting conditions is the main reason. But, some of it might be due to the fact that while some of the models share common code, they don’t produce the same results with that code owing to differences in the way CPU’s, operating systems, and compilers work. Now with this paper, we can add software uncertainty to the list of uncertainties that are already known unknowns about climate and climate modeling.

I got access to the paper yesterday, and its findings were quite eye opening.

The paper was published 7/26/13 in the Monthly Weather Review which is a publication of the American Meteorological Society. It finds that the same global forecast model (one for geopotential height) run on different computer hardware and operating systems produces different results at the output with no other changes.

They say that the differences are…

“primarily due to the treatment of rounding errors by the different software systems”

…and that these errors propagate over time, meaning they accumulate.

According to the authors:

“We address the tolerance question using the 500-hPa geopotential height spread for medium range forecasts and the machine ensemble spread for seasonal climate simulations.”

…

“The [hardware & software] system dependency, which is the standard deviation of the 500-hPa geopotential height [areas of high & low pressure] averaged over the globe, increases with time.”

The authors find:

“…the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.”

The initial conditions of climate models have already been shown by many papers to produce significantly different projections of climate.

It makes you wonder if some of the catastrophic future projections are simply due to a rounding error.

Here is how they conducted the tests on hardware/software:

Table 1 shows the 20 computing environments including Fortran compilers, parallel communication libraries, and optimization levels of the compilers. The Yonsei University (YSU) Linux cluster is equipped with 12 Intel Xeon CPUs (model name: X5650) per node and supports the PGI and Intel Fortran compilers. The Korea Institute of Science and Technology Information (KISTI; http://www.kisti.re.kr) provides a computing environment with high-performance IBM and SUN platforms. Each platform is equipped with different CPU: Intel Xeon X5570 for KISTI-SUN2 platform, Power5+ processor of Power 595 server for KISTI-IBM1 platform, and Power6 dual-core processor of p5 595 server for KISTI-IBM2 platform. Each machine has a different architecture and approximately five hundred to twenty thousand CPUs.

And here are the results:

model_CPUs_table2 — Table 2. Globally-averaged standard deviation of the 500-hPa geopotential height eddy (m) from the 10-member ensemble with different initial conditions for a given software system 383 (i.e., initial condition ensemble), and the corresponding standard deviation from the 10-member ensemble with different software systems for a given initial condition (i.e., software system ensemble).

While the differences might appear as small to some, bear in mind that these differences in standard deviation are only for 10 days worth of modeling on a short term global forecast model, not a decades out global climate model. Since the software effects they observed in this study are cumulative, imagine what the differences might be after years of calculation into the future as we see in GCM’s.

Clearly, an evaluation of this effect is needed over the long term for many of the GCM’s used to project future climate to determine if this also affects those models, and if so, how much of their output is real, and how much of it is simply accumulated rounding error.

Here is the paper:

An Evaluation of the Software System Dependency of a Global Atmospheric Model

Song-You Hong, Myung-Seo Koo,Jihyeon Jang, Jung-Eun Esther Kim, Hoon Park, Min-Su Joh, Ji-Hoon Kang, and Tae-Jin Oh Monthly Weather Review 2013 ; e-Viewdoi: http://dx.doi.org/10.1175/MWR-D-12-00352.1

Abstract

This study presents the dependency of the simulation results from a global atmospheric numerical model on machines with different hardware and software systems. The global model program (GMP) of the Global/Regional Integrated Model system (GRIMs) is tested on 10 different computer systems having different central processing unit (CPU) architectures or compilers. There exist differences in the results for different compilers, parallel libraries, and optimization levels, primarily due to the treatment of rounding errors by the different software systems. The system dependency, which is the standard deviation of the 500-hPa geopotential height averaged over the globe, increases with time. However, its fractional tendency, which is the change of the standard deviation relative to the value itself, remains nearly zero with time. In a seasonal prediction framework, the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.

h/t to The Hockey Schtick

0 0 votes

Article Rating

281 Comments

Inline Feedbacks

View all comments

Pamela Gray

August 4, 2013 10:21 am

ding ding ding ding. Brain just bumped up against limit***warning warning***shut down is immanent***warning warning***blip…………………………..

steverichards1984

August 4, 2013 11:05 am

As @Compu Gator was alluding too, it seems a badge of pride amongst ‘clever’ C programmers to ‘optimise’ each line of code.
So much so, that it becomes almost impossible for someone who needs to maintain that code to understand it 2 years later.
This is why we use subsets of C and C++.
It goes against the grain for ‘clever’ programmers to accept programming constraints ie do not do this, do that instead, that a whole set of tools and standards have grown up to ‘control’ the worst excess of clever programmers. See MISRA etc for a subset standard ‘proven’ to work in safety critical applications.
By use of ‘proven’ subsets one can be assured that the same code will produce the same results on any hardware that obeys the standard.
Which is what we want isn’t it?