But it Worked in the Simulation!

Perils of computer simulation of complex systems

Story submitted by Ricky Seltzer

John F. McGowan, Ph.D., writing on math-blog, describes the various ways in which breakthrough science can be misunderstood and miscalculated even by top-flight computer simulation. (One example of breakthrough science, of course, would be climate modeling.)

Another important aspect is common-mode errors, where different teams rely on the same erroneous data or repeat the same mistake. Different teams have been observed to make the same or equivalent errors in constructing software. Some experiments in the previous century assigned the same task to different, competing teams of software developers, and found that certain errors appeared in the work of many or all of the teams.

Here is one reference, from Lawrence C. Paulson:

“Redundancy in software means having several different teams code the same functions. This has been shown to improve reliability, but the improvement is much less than would be expected if the failure behaviours were independent. This suggests that different teams make similar coding errors or fail to consider similar unlikely cases.”
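
To make the quoted point concrete, here is a toy sketch of multi-version redundancy with a majority vote; the three “independent” versions and their shared blind spot (none handles an empty input) are invented for illustration, not taken from Paulson’s work:

```python
# Toy N-version setup: three independently written averaging routines,
# cross-checked by a majority vote.
from collections import Counter

def version_a(xs):
    return sum(xs) / len(xs)                 # crashes on []

def version_b(xs):
    return sum(xs) * (1.0 / len(xs))         # crashes on []

def version_c(xs):
    total = 0.0
    for x in xs:
        total += x
    return total / len(xs)                   # crashes on []

def vote(xs):
    results = []
    for impl in (version_a, version_b, version_c):
        try:
            results.append(round(impl(xs), 9))
        except ZeroDivisionError:
            results.append("error")
    winner, count = Counter(results).most_common(1)[0]
    return winner if count >= 2 else "no majority"

print(vote([1.0, 2.0, 3.0]))   # 2.0 -- the vote works when failures differ
print(vote([]))                # "error" wins 3-0: a common-mode failure
                               # the voter cannot even detect
```

If the teams’ failures were independent, the voter would almost always have two good answers to outvote one bad one; when all the teams miss the same unlikely case, the voting buys nothing.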

Short summary: Software bugs, physics errors, typos, and so on can all combine to lead good people astray. Read the whole thing.

21 Comments
Editor
June 6, 2011 10:02 pm

I remember diagnosing my first Unix bug, well before I had even looked at the code. When my boss and I found I was right, he was quite impressed and asked me how I figured it out. In addition to clues as to when it (hung output) triggered and how we could clear it, I commented, “It seemed like the sort of mistake I would make.”

rbateman
June 6, 2011 10:27 pm

I would say the Manhattan Project went along the lines of Oppenheimer himself:
“The secret was not how it was built, but that it worked.”
Which tells me that what he was really saying was that there were many ways to produce an A-bomb with the yield in mind. That doesn’t mean an A-bomb can be built any old way. N. Korea tested one or two that were reported fizzles, yielding little in the way of pop.
So, if that is true, and the Manhattan Project succeeded because the bomb could be built in many different ways, then Science is merely suffering from the Law of Diminishing Returns.
Subsequent breakthroughs are going to get harder: the low-hanging fruit is depleted.
That doesn’t bode well for Fusion; the results so far indicate a slam-dunk is not in the cards.
Think about it. Fusion is nothing less than a miniature Star, complete with currents, flares, spots, etc., and we don’t yet fully understand our own Sun.

jorgekafkazar
June 6, 2011 11:50 pm

Fascinating article. Mark Twain said “A man who carries a cat by the tail learns something he can learn in no other way.” Simili modo, the pitfalls of modeling (or any sort of complex programming) can be learned only by participating in molar-gritting, mind-numbing, face-flushing, sphincter-smoking, toe-curling failure of the most abysmal sort. An ounce of the resultant humility is worth a year of college programming instruction. Unfortunately, humility is rarely found in academia these days.

Eric Worrall
June 7, 2011 1:05 am

Software models often get things right for the wrong reasons.
A fellow software developer once told me the story of an attempt to build an automatic satellite recon system which could spot tanks.
Worked fine in development, then failed totally on its first full scale test.
On investigation, it was discovered that the majority of aerial “training” photos shown to the system which contained a tank were taken on cloudy days, while the training photos which didn’t contain a tank were taken on sunny days.
So what they had actually built was a satellite system capable of spotting whether the day was sunny.
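
A rough sketch of that failure mode, with synthetic data and a deliberately naive “learner”; all numbers and features below are invented for illustration, not from the actual project:

```python
# Synthetic version of the biased training set: every "tank" photo is cloudy
# (dark) and every "no tank" photo is sunny (bright), so overall brightness
# predicts the label perfectly while the real tank cue is weak and noisy.
import random
random.seed(1)

def photo(tank, sunny):
    """Fake image summarised by two features: mean brightness and a weak tank cue."""
    brightness = random.gauss(0.8 if sunny else 0.3, 0.05)
    tank_cue = random.gauss(0.6 if tank else 0.4, 0.2)
    return (brightness, tank_cue, tank)

train = [photo(tank=True, sunny=False) for _ in range(50)] + \
        [photo(tank=False, sunny=True) for _ in range(50)]

def best_threshold(data, feature):
    """Crude learner: best single-feature threshold, in either direction."""
    best_acc = 0.0
    for t in sorted(set(round(d[feature], 3) for d in data)):
        for sign in (1, -1):
            acc = sum((sign * (d[feature] - t) > 0) == d[2] for d in data) / len(data)
            best_acc = max(best_acc, acc)
    return best_acc

for name, feature in (("brightness (cloudy vs sunny)", 0), ("actual tank cue", 1)):
    print(f"{name}: training accuracy {best_threshold(train, feature):.2f}")

# Brightness wins outright, so the "tank detector" is really a weather
# detector: show it a sunny photo WITH a tank and it will answer "no tank".
```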

Alan the Brit
June 7, 2011 2:21 am

Puters? Be wery wery wary!!!! Excuse the poor Elmer Fudd impression. I still come across consulting engineers who employ graduates who tell me every now & then, when I merely question the magnitude of a bending moment or shear force, that that is what it said on the print-out. Well, it may well do, but it’s meaningless. If you get the principle wrong, no amount of computing power will make it right; you just get the wrong answer faster!

John Marshall
June 7, 2011 2:48 am

Errors have certainly sneaked into the climate models.

June 7, 2011 4:51 am

It is not so strange, I think. Errors propagate, and soon enough people forget about assumptions made at the beginning and start to think of those as correct.
Coders are no different from other people. Like-minded people think alike, as the saying goes. Now take several teams of like-minded people solving a problem within a limited context, with a limited set of words and logical rules such as a programming language: is it any wonder that people with the same level of knowledge make the same right or wrong choices in solving the problem?
Also, all the teams involved in creating a program have to trust that each of the other teams did its job right. The ones coding the model engine have to trust that the data gathered by that other “team” is correct. The ones doing the abstract coding on top of the engine have to trust that the data is sound and that the engine is done right, and so on. In essence the engine might work flawlessly and the user-supplied code might be sound, but if the data is corrupt, GIGO: garbage in, garbage out. And since all code has bugs, in the engine, certainly in what the users input, and everywhere in between, you get freaking frankensteins that pop out.
So the data can’t rest on Bob’s uncle’s mum’s word; it has to have actual integrity.

wsbriggs
June 7, 2011 4:55 am

The best lesson in computer programming I was ever taught was, “You always work with the guy who puts bugs in your code. The better you get at programming, the better he gets at putting bugs into it.”

June 7, 2011 5:33 am

Read the whole article. It is worth it. Click the link.

Murray
June 7, 2011 7:15 am

Zero-defect or Six Sigma techniques can be applied to a lot of complex issues to greatly reduce error rates, but they seem to be little known in the scientific community. In making complex microchips, with hundreds of process steps, each subject to some degree of variation, thousands of chips per wafer and low millions of transistors or gates per chip, we learned how to ship end products at defect rates below 3 parts per million. It was impossible to test in this level of outgoing quality; it had to be designed and built in, with every step in design and process involved. When we applied the same techniques to accounting we lowered journal-entry defects by four orders of magnitude in less than one year. I don’t know how well these methods would apply to writing software, but I have to believe they could be very helpful.
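
To put rough numbers on that point (mine, not Murray’s; the step count is an assumption standing in for “hundreds of process steps”):

```python
# If a chip passes through N process steps and each step independently
# introduces a defect with probability p, the chip-level defect rate is
# 1 - (1 - p)^N. Shipping at ~3 ppm overall forces each step to be far
# better than 3 ppm itself.
N = 300          # assumed number of process steps
target = 3e-6    # 3 ppm shipped defect rate

# per-step defect probability needed if defects were independent
p_step = 1 - (1 - target) ** (1 / N)
print(f"per-step defect rate needed: {p_step:.2e}  (~{p_step * 1e9:.0f} per billion)")

# conversely, what a seemingly good per-step rate does to the whole flow
for p in (1e-4, 1e-5, 1e-6):
    overall = 1 - (1 - p) ** N
    print(f"p_step = {p:.0e} -> chip defect rate ~ {overall * 1e6:.0f} ppm")
```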

PB-in-AL
June 7, 2011 7:39 am

My experience as both a computer programmer and sysadmin is that no matter how idiot-proof one makes their system, there is always a better idiot that comes along.

Atomic Hairdryer
June 7, 2011 7:53 am

Re Eric Worrall

A fellow software developer once told me the story of an attempt to build an automatic satellite recon system which could spot tanks.

It must be something about tanks. I heard a similar story about an anti-tank missile that wanted to attack telegraph poles. Allegedly that was due to its definition of a tank being an object with a barrel or pole sticking out of it, and its pole-detecting routines being rather too sensitive.

LarryD
June 7, 2011 8:17 am

Open Source means that the code is visible not just to the development team, but to any interested party, who may be an expert in a relevant field the development team didn’t have on board.
I think it was Eric Raymond who put it, as “Linus’s Law”: “given enough eyeballs, all bugs are shallow”.
Re: fusion. Rbateman, that is “the box”: thinking that to get fusion we must replicate stellar conditions. Tokamak is an expensive bust, but Polywell and Focus Fusion show promise, and the research is relatively cheap.

Doug Proctor
June 7, 2011 8:33 am

From my experience in the oil and gas business, simulations and calculations of oil and gas field performance are consistently in error because the complexity of the world is greater than understood. But in spite of that, we use, in our simulations/calculations, hard numbers such as “permeability”, “porosity”, “water saturation” and reservoir pressures. But we also say that, for example, in the Cardium Formation, a flow-cutoff of 0.5 mD exists, i.e. there is no oil production below 0.5 mD, while in the Glauconite, gas – a much easier substance to flow – needs the rock to be in excess of 10.0 mD for decent performance. So 0.5=10.0.
The process of understanding fluid flow in rock is completely and utterly observational. We have basic “conceptual” equations, but each has a number of “fudge” factors specific to the case. When you point out to engineers – as I have – that their entire analysis is fudging the numbers to get the answer they believe in, i.e. the one that seems reasonable, they are furious: they have years of schooling and powerful computers at their fingertips. Of course they are not recreating “belief”! But they are: someone earlier back-calculated (reverse-engineered) what the a, m, n, shrinkage, recovery factor (this, especially) and so on should be, based on what was got out of the ground compared to what was calculated to be in the ground. Even the calculations of the oil or gas in the ground are contaminated by fudge factors: we assume a homogeneity that history tells us does not exist.
There is something I call Limits to Knowledge. As it applies to my discipline, there is only a certain amount of knowledge you can get out of study before you must test. I worked on the Hibernia oilfields off the coast of Newfoundland, Canada. In the Gulf offices there was a room full of simulation studies. Three different core-style studies had been done. There was about 20 years of work in there. Yet all there was to work with were 9 wells drilled, a limited amount of core taken, a swath of older 2D seismic data from surface boats, and two 3D seismic programs. All 20 years of study were studies churning the same ground. And the result was that estimations of the reservoir rock were based on measuring the time between two reflections on seismic, using an AVERAGE speed of the seismic waves between them to get an interval thickness, and then multiplying by 0.40 to get the reservoir thickness.
A career’s worth of time for multiple people came down to a guesstimate of 0.40. Of course the first well drilled showed that all the simulations were off. A reserve potential of 150 million barrels went up to 600 million barrels. Costs also went up as more wells were needed than expected.
And this from an industry said to be the most computer intensive on the planet.
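
For anyone who wants to see how little it takes, here is a back-of-the-envelope sketch of the kind of calculation Doug describes; every number in it is an illustrative assumption, not Hibernia data:

```python
# Pay thickness = (two-way travel time) x (assumed average velocity) / 2
#                 x (the 0.40-style net-to-gross guesstimate),
# and oil in place scales directly with that thickness.

def net_pay_m(twt_s, avg_velocity_ms, net_to_gross):
    interval_m = twt_s * avg_velocity_ms / 2.0   # one-way thickness from two-way time
    return interval_m * net_to_gross

area_m2    = 20e6     # assumed drainage area
porosity   = 0.15     # assumed
oil_sat    = 0.70     # assumed
bbl_per_m3 = 6.29

for ntg in (0.30, 0.40, 0.50):
    pay = net_pay_m(twt_s=0.050, avg_velocity_ms=4000.0, net_to_gross=ntg)
    oip = area_m2 * pay * porosity * oil_sat * bbl_per_m3
    print(f"net-to-gross {ntg:.2f}: pay {pay:4.0f} m, oil in place ~{oip / 1e6:4.0f} million bbl")

# Nudging the single guesstimate from 0.40 to 0.50 moves the answer by 25%;
# everything downstream inherits that uncertainty.
```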
People love to over-analyse on the basis that they will learn more by thinking more. There is a limit, very quickly reached, at which more work does not get better results. Climate science is clearly past it. Diverse thinking gets diverse results, largely because though the system is vast and interconnected, we can only work on pieces, presuming that other factors are not significant. In climate science you can “prove” that the world is warming at an accelerating rate while others can “prove” that it hasn’t warmed since 1996. The benefit of more computer modelling was reached a long time ago; more and different data is what is needed now.
But not, apparently, by those who say the science is “settled”. That is like my engineer associates who confidently predicted the recovery of 85% of a large Leduc oilfield in central Alberta, the result, they said, of their brilliant management of the field. Then the 85% became 92%, and cause for great excitement. Then 95%. By 103% of recovery they went quiet, as it was now clear that they had underestimated the volume of rock that was being drained.
It is not just the public that misunderstands practical science. It is the practitioners. There is a stronger belief in the certainty of its results than can be demonstrated. If it comes from a computer and you give it two decimal places, it seems more real than if you work it out with pencil and paper and round it off. But it is no more certain. Yet the former gets the budget allocation and the investment. (I know: I play by these rules myself.)
The world abounds in uncertainty, but people dislike uncertainty and cling to anything and anyone who promises them true knowledge. Think of Harold Camping, who said the world would end on 21 May 2011. At 6 pm. Not at 5:35, note. Now that 21 May has passed uneventfully, he says he made a mistake and it will end on 21 October. His remaining followers stand firm because they want to have certainty in their lives, even if it is in their after-lives.
The retailers who start selling Halloween stuff by 6 September again this year will be the wisest of the bunch.

Ben of Houston
June 7, 2011 8:46 am

Murray, NASA has a similar system for coding. It involves 2-6 hours of debugging for every hour of coding. Sometimes bugs still slip through, but at a rate that amazes people.
The problem is that writing code isn’t a rote procedure that is repeated like manufacturing or accounting. Each program is a unique piece of code (otherwise you either copy-paste or write a module and include it). Each programmer has their own style of thinking and their own process. This goes doubly so for science programmers, who are generally either a chemist/mathematician with some knowledge of Fortran or a programmer who hasn’t had anything beyond freshman-level hard sciences. People who are formally educated in both are rare, and it shows. Go through the Climategate data and you’ll see horrific spaghetti code. Look at others and you’ll see nice, tight code that doesn’t understand thermodynamics or proper methods of interpolation.
One thing I like about my engineering college is that they told us flat out not to trust our models, and they gave us process-control assignments in the lab where we saw how far our models were off on something as simple as controlling the water level in a 1-foot-tall surge tank. We couldn’t get better than a sinusoid with a 1-inch swing. Reality has a nice way of knocking you down to size. Too bad people judge the “purity” of science by how separated from reality it is.
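
As a rough illustration of how such a rig ends up hunting, here is a minimal sketch of a level loop with a plain proportional controller and a couple of seconds of measurement lag; every parameter is an assumption for illustration, not the actual lab assignment:

```python
# Integrating surge tank + P controller + dead time: high enough gain with
# any lag turns the loop into a sustained oscillation rather than a steady level.
A = 0.8           # tank cross-section, ft^2 (assumed)
dt = 0.05         # simulation step, s
dead_time = 2.0   # measurement/valve lag, s (assumed)
Kp = 1.5          # proportional gain, high enough that the lag destabilises it
setpoint = 0.5    # ft
q_in = 0.05       # constant inflow, ft^3/s (assumed)

delay = [setpoint] * int(dead_time / dt)   # buffer of delayed level readings
h, levels = 0.45, []
for _ in range(6000):                      # 300 s of simulated time
    h_meas = delay.pop(0)
    delay.append(h)
    # P controller sets the outflow valve, clamped to its physical range
    q_out = min(max(q_in + Kp * (h_meas - setpoint), 0.0), 0.2)
    h += dt * (q_in - q_out) / A
    h = min(max(h, 0.0), 1.0)              # 1-foot tank walls
    levels.append(h)

tail = levels[len(levels) // 2:]           # ignore the initial transient
print(f"level swings between {min(tail):.2f} and {max(tail):.2f} ft "
      f"around a {setpoint:.2f} ft setpoint")
```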

tadchem
June 7, 2011 11:34 am

As a scientist who worked with engineers routinely, I had to constantly re-educate them about assumptions, approximations, and extrapolations WRT computations.
Even when I made them express their assumptions explicitly, recognize the approximations, and limit the extrapolations to a fraction of the range of the fitted data, they still stubbornly clung to insignificant digits and disregarded uncertainties.
With even proofreaders becoming extinct, there is little hope for quality communications coming from the research establishments.

Auto
June 7, 2011 1:27 pm

My background is nautical. I’ve driven oil tankers – over a thousand feet long, nearly two hundred feet wide, and drawing upwards of sixty-five feet of water [L = 330 m; B = 55 m; D = 21 m for the metric folk]. A ship like that had about fifteen tanks, and we had to calculate the weight of oil in the beast; tank tables give you volume for a given ullage or depth. Four or five temperatures – accurate to a degree Fahrenheit – give you temperature, hence the density of what you actually had rather than the density at the 15 degrees Celsius reference temperature. And the company forms had weight to 3 decimal places of a tonne – to, effectively, the nearest kilo. Given the measurement uncertainties [did the ship change shape when loaded compared to her calculated empty state? how much was she rolling? how (in)accurate was the temperature measurement/guess? etc.] I reckoned I could be confident of the ‘thousands’ figure, and the hundreds figure was close. After that – guesswork, at best. Assumptions, approximations and estimations. It’s better now – more temperatures are taken, but the ullages are still a guess to the nearest centimetre or so – and that’s several barrels in a big tank . . . . .
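
A rough sketch of that false precision for one tank, with made-up numbers (not Auto’s actual figures): weight = volume × density, and a first-order propagation of the measurement uncertainties swamps the kilogram-level entries on the form.

```python
import math

volume_m3  = 12_500.0   # from the ullage/tank tables (assumed)
vol_unc_m3 = 60.0       # ullage read to ~1 cm, plus trim/list/hull flexing (assumed)
density    = 0.860      # tonnes per m^3 at the observed temperature (assumed)
dens_unc   = 0.0007     # from a +/-1 degF temperature uncertainty (assumed)

weight_t = volume_m3 * density
# root-sum-square propagation of the independent relative uncertainties
rel_unc = math.sqrt((vol_unc_m3 / volume_m3) ** 2 + (dens_unc / density) ** 2)
print(f"weight      = {weight_t:,.3f} t   (the form wants it to the kilo)")
print(f"uncertainty = +/-{weight_t * rel_unc:,.0f} t  "
      f"-> only the thousands figure is really solid; the hundreds are close at best")
```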

Paul Penrose
June 7, 2011 5:21 pm

Professional software engineers have known this for a long time. And as Murray pointed out, there are processes in any engineering discipline that can reduce errors significantly, if followed. Unfortunately none of these were followed by the people who wrote the GCMs. Maybe they just didn’t know how, or maybe they never thought the models would be relied on to justify, in part, totally restructuring our global energy supply. But it is almost a given that they all contain serious errors, and most likely common ones. So no solace can be taken in the fact that they all produce a similar result.

Greg Cavanagh
June 7, 2011 6:40 pm

I used to believe that logic was easily conclusive and absolute.
Until I wrote a small-scale urban stormwater backwater program. Two significant lessons were learned.
One: the original calculation methodology for any given aspect of the process has a huge number of assumptions within it just to get a formula into print. It will work reliably and predictably within a limited range of conditions. It may or may not reflect reality.
Two: no matter how careful you are within a routine’s logic, you MUST hand-check every calculation and trace every process. Do not believe it will work, do not believe it produces the correct answer, no matter how expected the output from a routine is. If you don’t trace it and check it, you’re foolin’ yourself.
Once the software is released and in the hands of the user, they will believe it’s bug-free and completely reliable. Another failure of logic.

Steve C
June 8, 2011 3:30 am

An interesting article, and one which should perhaps be recited each morning by anyone who uses a computer for any more than just accessing their emails and the net. Most of us have experienced unexpectedly eccentric results output by even quite simple programs, yet on the global level we are supposed to believe that anything which comes out of a computer is God’s own truth – despite the evidence of a collapsed international monetary system and climate “predictions” bordering on the insane, to name but two.
The efficacy of a computer model – even assuming the complete absence of error – also depends crucially on the understanding of the underlying physical processes built into its routines. I blush to admit it, but I still waste far more of my spare time than I ought playing with Microsoft’s ‘Pinball Arcade’ (1995), which works remarkably well precisely because we have a pretty good understanding of the mechanics of motion at the scale of pintables. A professional friend has also let me play with electronics simulation software which is far beyond my price range, and it knocked me sideways that the thing could simulate stupid mistakes, like a transistor deep inside the circuit heating up and failing … but then, the transistors it simulates are made to defined specifications by us, so again we ought to be able to define their characteristics very tightly. Yet even excellent simulations like these can fail without warning: one of the pinball games, for instance, suffers from a problem I call “quantum flipper”, where the ball unexpectedly “tunnels” through the flipper and escapes at moments of high excitement, a feature I have never seen in a real-world table. Implementing it would be an interesting challenge for Gottlieb’s design team!
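
For what it’s worth, here is a minimal sketch of why a “quantum flipper” can appear in any fixed-timestep simulation (my own guess at the mechanism, not the game’s actual code): if collisions are only checked once per frame, a fast ball can move further than the flipper is thick between two checks and never register a hit.

```python
# Naive per-frame collision test against a 2 cm thick flipper slab.
def overlaps_flipper(y, flipper_top=0.00, flipper_bottom=-0.02):
    """Is the ball's centre inside the flipper this frame?"""
    return flipper_bottom <= y <= flipper_top

def drop_ball(speed_m_s, dt=1.0 / 60.0, y0=0.55):
    y = y0
    for _ in range(200):
        y -= speed_m_s * dt          # move a full frame's worth at once
        if overlaps_flipper(y):
            return f"{speed_m_s:4.1f} m/s: bounce detected at y={y:+.3f}"
    return f"{speed_m_s:4.1f} m/s: ball tunnelled straight through"

for speed in (0.5, 2.0, 8.0):
    print(drop_ball(speed))

# Slow balls always land inside the 2 cm slab on some frame; at 8 m/s the
# ball moves ~13 cm per frame and can step over the flipper entirely.
# Real engines avoid this with swept (continuous) collision detection.
```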
How apparently intelligent “climate scientists” can claim such conviction that their models accurately model climate, though, given (a) humanity’s very partial understanding of a small subset of the processes which drive and affect climate, (b) our very partial records of only the most recent years and (c) the undeniably chaotic nature of the actual system, I really don’t know. To what extent it is just normal human hubris and to what extent political machination is debatable, but the glaringly obvious nature of the non-fit between models and reality strongly suggests to me that politics is far more to blame; we face, as a world, very great challenges in bringing the vaulting ambition of the unelected, self-styled “élite” to heel. British readers, incidentally, may have seen the latest TV series from Adam Curtis, “All Watched Over by Machines of Loving Grace”, but I’d advise everyone to look out for it on the interweb, watch and learn – Curtis’s work, as usual, serves to illustrate well the frightening disconnection of the great majority of modern humanity from reality. (3 x 1-hour programmes.)
Thanks to Dr McGowan, and to Ricky Seltzer for finding the article and putting it here. I’ve kept a copy, and will be passing it on to others with an interest in humanity’s current state. We may live in interesting times, but those same times are, at least, driving some of us to thoughtful analyses of humanity’s (many) follies.
PS – My no-longer-young eyesight struggles a bit with the new smaller font in the Comment box. (Just saying, hint, hint 🙂

Ricky Seltzer
June 10, 2011 5:48 am

I have tracked down a reference to the phenomenon of common-mode errors in Multi-Version programming:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.29.363
Some experiments were done with having multiple teams code the exact same problem, in the hopes of improving reliability. But they ran into significant limits that were hard to overcome: people made similar errors, so comparing the output of one program to the output of another could fail to find errors.
—-
The story of the tank-finders that spotted sunny days was, as far as I can recall, from a story in Scientific American about using neural networks to guide anti-tank missiles. One of you guys can track it down.
—-
The idea that scientists should “Open Source” the code that leads to their conclusions is behind some of the FOI requests such as those by Steve McIntyre of Climate Audit. Climategate uncovered some of that code and the fudge-factors therein.
—-
All the examples of high-reliability code utilize multiple rounds of fix-then-test. It takes 30 or 40 years to test climate models. Maybe longer. We might be done testing by the year 2500.