Quote Of The Week #13


From Gary Strand, software engineer at the National Center for Atmospheric Research (NCAR), commenting on Climate Audit:

As a software engineer, I know that climate model software doesn’t meet the best standards available. We’ve made quite a lot of progress, but we’ve still quite a ways to go.

I’ll say. NASA GISS Model E, written in some of the worst FORTRAN coding ever seen, is a challenge to even get running. NASA GISTEMP is even worse. Yet our government has legislation under consideration that is significantly based on the kind of model output Jim Hansen pioneered. His 1988 testimony to Congress was based entirely on model scenarios.

Do we really want Congress to make trillion-dollar tax decisions today based on “software [that] doesn’t meet the best standards available”?

There’s more. Steve McIntyre comments:

Re: Gary Strand (#56),

Gary, if this is what you think, then this should have been reported in IPCC AR4 so that politicians could advise themselves accordingly. I do not recall seeing any such comment in AR4 – nor for that matter in any review comments.

…and to the second part of the comment:

Re: Gary Strand (#56),

If we can convince funding agencies to better-fund software development, and continued training, then we’ll be on our way. It’s a little harsh, IMHO, to assign blame to software engineers when they’re underpaid and overworked.

Boo-hoo. Hundreds of millions of dollars, if not billions, are being spent. Perhaps the money should be budgeted differently, but IMO there’s an ample amount of overall funding to have adequate software engineers. Maybe there should be some consolidation in the climate model industry, as in the auto industry. If none of the models have adequate software engineering, then how about voluntarily shutting down one of the models and suggesting that the resources be redeployed so that the better models are enhanced?

I’m not making this QOTW to pick on Gary Strand, though I’m sure he’ll see it that way. It is a frank and honest admission by him. I’m making it QOTW because Gary highlights a real problem that we see when we look at code coming from NASA GISS.

But don’t take my word for it, download it yourself and have a look. Take it to a software engineer at your own company and ask them what they think.


GISS Model E global climate model source here

GISTEMP (surface temperature analysis) source here

Sure, this is just one of many climate modeling programs out there, but it happens to be the most influential, since GISS model output and GISTEMP are the most widely cited in the popular media.

U.S. industry seems to do a better job of software development than government programs do, because in business, if something doesn’t work, or doesn’t work well, contracts get lost and/or people get fired. There are consequences to shoddy work.

In academia, the solution is usually to ask for more grant money.



164 Comments
July 7, 2009 7:29 am

Fortran or WATFIV?

July 7, 2009 8:01 am

>>10 Rem Global Climate Model
>>20 Print “The End Is Nigh”
>>30 Goto 20
>>40 End (of planet)
This really would be the End of the World – you’ve got a loop there somewhere…. 😉

Kojiro Vance
July 7, 2009 9:25 am

How would one know the difference between true model predictions and artifacts of bad coding?
Somewhere in the code I saw that it assumes 360 days in a year, or 12 equal months of 30 days. Not a bad assumption if you are forecasting over a short range of time. But when you are trying to predict global temperatures 100 years into the future – that is a very different story.
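
To put a number on the 360-day point, here is a minimal sketch (plain Python, illustrative only, not code from Model E or GISTEMP) of how far an idealized 360-day model calendar falls out of step with the real year over a century:

```python
# Sketch: mismatch between an idealized 360-day model calendar and the real
# calendar. Illustrative only; not code from Model E or GISTEMP.

TROPICAL_YEAR_DAYS = 365.2422   # mean length of the real year, in days
MODEL_YEAR_DAYS = 360.0         # 12 equal months of 30 days

def calendar_drift(years):
    """Days by which the 360-day calendar falls behind the real one."""
    return years * (TROPICAL_YEAR_DAYS - MODEL_YEAR_DAYS)

for years in (1, 10, 100):
    print(f"After {years:3d} model years: {calendar_drift(years):7.1f} days of mismatch")

# After 100 years the 360-day calendar is roughly 524 days (about 1.4 years)
# out of step with the real calendar, unless the model explicitly rescales
# its seasonal cycle to fit the shorter year.
```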

Bobn
July 7, 2009 11:34 am

You need multiple independently coded models to alleviate the problem of coding errors.
However, programs like modelE are implementations of some deep domain knowledge, and the trickiest thing is the application of that domain knowledge rather than the coding itself. I.e., problems are more likely to arise from a physics error than from a coding error.
I have been programming all my life (C++, C#, COBOL too), so this FORTRAN stuff doesn’t look too alien to me. However, I can’t make head or tail of the modelE code, not because I can’t follow the flow of the code, but because I don’t understand the domain knowledge (climate physics) behind it.
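
A minimal sketch of the cross-checking that independently coded implementations make possible. The quantity and the two approximations below (saturation vapour pressure over water) are just convenient, commonly published examples and have nothing to do with modelE; the point is the comparison, not the physics:

```python
import math

# Two independently written versions of the same quantity: saturation vapour
# pressure over water (hPa), via two common published approximations.

def svp_magnus(t_celsius):
    """Magnus-type approximation, in hPa."""
    return 6.1094 * math.exp(17.625 * t_celsius / (t_celsius + 243.04))

def svp_bolton(t_celsius):
    """Bolton (1980)-style approximation, in hPa."""
    return 6.112 * math.exp(17.67 * t_celsius / (t_celsius + 243.5))

# Cross-check: independent implementations should agree within a stated
# tolerance over the range of interest; a large disagreement flags a coding
# (or physics) error in at least one of them.
for t in range(-40, 41, 10):
    a, b = svp_magnus(t), svp_bolton(t)
    assert abs(a - b) / b < 0.01, f"disagreement at {t} C: {a} vs {b}"

print("independent implementations agree to within 1% from -40 C to +40 C")
```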

July 7, 2009 11:46 am

I’d thrown together a post a while back poking a little fun at some of the base assumptions behind catastrophic AGW, and I had a couple of paragraphs dedicated to why the models are pretty useless. Of course, I was just looking at the design side, not even getting into the issues of poor coding. Here’s the relevant portion:

…the sheer magnitude and complexity of the system still defies attempts to fully understand all of the interactions, leading to small errors having an ever-growing cascade effect as one tries to predict further into the future.
Put more simply, imagine working on a very complex algebra problem. The issue is that you copied one little part of the problem wrong. In your first pass, simplifying, substituting, and performing other operations, that wrong bit interacts with another element of the problem, the result of which is corrupt. Each subsequent pass through the problem causes the erroneous segment to corrupt more of the total problem as it is combined with other elements, and, by the end, your final answer is utterly wrong. This is very much what inhibits long-term forecasting on even a local scale and short time spans. Taking this up to the global scale allows vastly more chances for error, and projection years into the future gives any errors much more time to cascade to the point of rendering the entire model useless. The calculations may be perfect, but a faulty starting point ruins the entire process.

Unfortunately, this isn’t the greatest difficulty. Back to the hypothetical problem, imagine that you don’t even get copies of any of the formulae you need, nor even the base problem. Instead, all you have are the base values to use and thousands, millions of answers, each coming from different base values (you don’t get told what any of them are), from which to figure out the starting equation. It gets worse. Some of those answers are wrong, and you don’t know which ones. It gets worse still. The answers are divided up into hundreds of groups, each group containing answers to certain combinations of formulae and base values, and you aren’t told what formulae or values are used or how they are combined. Nope, not done getting worse, yet. In fact, you are told that none of the groups contain answers to the actual problem you need to solve. Instead, example answers for the overall problem are supposed to be composites, somewhat like the mean, of all of one answer (the tenth, for example) from each group, but you also have to figure out how those subordinate answers get combined into the big one.

In the end, you have mountains of data, an unknown amount of which is faulty, from which to determine all of the formulae, how the formulae are used to produce the subordinate answers, and how those are combined to get the main answer before you can even start working on the problem with the given starting values. Guess what? This isn’t even accounting for time lapse yet.
Someone may read that last paragraph and think, “Models on the global scale don’t have to be that precise, though; they can be more general and still get the broad picture.” Well, let’s run with the picture metaphor, shall we? I would roughly equate simplifying components of a model to reducing the number of pixels in an image. Sure, you can do it, but you lose definition. Do it with a complex, intricate picture, and you will very quickly end up with an image people misinterpret, or one from which it is impossible to discern anything at all. Sure, you can dumb down the models by making them less complex with estimations of values instead of solid formulae, but you have to admit that doing so destroys the accuracy of that broad picture.

A more direct example comes from graphing a curve to fit plotted points. Reducing complexity could be seen as using fewer data points from which to extrapolate a curve. The problem with this is that, as the data points are made fewer (thereby spreading out on the graph), you may end up missing movement in the actual curve occurring between points and graphing a curve that is completely wrong. When it comes to scientific (quasi-scientific in this case, really, but that’s a topic for another day) analysis of data, generalization is bad, and detail is good.
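
The error cascade described in the excerpt above is easy to demonstrate numerically. Here is a minimal sketch using the logistic map as a stand-in (a toy nonlinear iteration, chosen only for illustration, not anything from an actual climate model): two runs whose starting values differ by one part in ten billion end up bearing no resemblance to each other after a few dozen steps.

```python
# Sketch of the error cascade: the same simple nonlinear iteration run twice,
# with starting values that differ by only 1e-10. Illustrative only; the
# logistic map is a toy, not a climate model.

def logistic(x, r=3.9):
    """One step of the logistic map, a standard example of chaotic behavior."""
    return r * x * (1.0 - x)

x_a = 0.5          # the "true" starting value
x_b = 0.5 + 1e-10  # the slightly mis-copied starting value

for step in range(1, 61):
    x_a, x_b = logistic(x_a), logistic(x_b)
    if step % 10 == 0:
        print(f"step {step:2d}: run A = {x_a:.6f}  run B = {x_b:.6f}  "
              f"difference = {abs(x_a - x_b):.2e}")

# By roughly step 40 to 50 the two runs have diverged completely: an error of
# one part in ten billion at the start has grown to the size of the signal.
```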
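
And a minimal sketch of the “fewer data points” point from the same excerpt (the signal here is made up purely for illustration): sample a strongly cyclic quantity too coarsely and the cycle vanishes from the samples, so any curve fitted through them is simply wrong.

```python
import math

# Sketch of the undersampling problem: the same hypothetical cyclic signal
# sampled densely and coarsely. Illustrative only.

def signal(t):
    """A made-up quantity with a strong cycle of period 2 (arbitrary units)."""
    return math.sin(math.pi * t)

dense = [signal(i / 10.0) for i in range(81)]   # sampled every 0.1 units
coarse = [signal(float(i)) for i in range(9)]   # sampled every 1.0 units: too coarse

print(f"dense sampling:  min {min(dense):+.2f}, max {max(dense):+.2f}")
print(f"coarse sampling: min {min(coarse):+.2f}, max {max(coarse):+.2f}")

# The dense series swings between -1 and +1; the coarse series is essentially
# zero everywhere (the samples land on the zero crossings), so a curve fitted
# through the coarse points would conclude, wrongly, that nothing is happening.
```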

Kojiro Vance
July 7, 2009 1:59 pm

That leads me to a more fundamental question. Are there any logical checks on climate modeling?
Drawing on my experience as a chemical engineer: because the chemical systems I modeled were non-nuclear, I was bound by the laws of conservation of mass and energy. The models always checked to see if the number of pounds, kilograms, or tons of products equalled the same measure of reactants. If the mass balance did not close, the program printed a big warning.
Do climate models bother to calculate an energy balance? It would seem to me that if you got a bogus answer of 1 x 10^20 for the temperature in March 2016, then the answer is “No”.
If they don’t check an energy balance, then how can you have ANY confidence in the results?
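
For readers unfamiliar with the kind of check described above, here is a minimal sketch of a conservation (“budget closure”) test of the sort used in engineering process models, applied after a model step. All names and numbers are hypothetical; this is not code from any climate model.

```python
# Sketch of a conservation check: after each step, energy in minus energy out
# must equal the change in stored energy, to within a stated tolerance.
# All names and numbers here are hypothetical.

TOLERANCE = 1e-6  # acceptable relative imbalance

def check_energy_balance(energy_in, energy_out, storage_change):
    """Raise an error if the energy budget does not close to within TOLERANCE."""
    imbalance = energy_in - energy_out - storage_change
    scale = max(abs(energy_in), abs(energy_out), 1.0)
    if abs(imbalance) / scale > TOLERANCE:
        raise ValueError(
            f"energy budget does not close: imbalance = {imbalance:.3e} "
            f"(relative {abs(imbalance) / scale:.2e})"
        )

# Example step, in arbitrary energy units:
check_energy_balance(energy_in=340.0, energy_out=339.2, storage_change=0.8)
print("energy budget closes for this step")

# If a coding or physics error leaks energy, the check fails loudly instead of
# letting the run drift silently toward a nonsense temperature.
```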

R S Bridges
July 7, 2009 7:48 pm

It appears I have hit a sore spot re: “software engineers.” First of all, I don’t mean to demean the value of the work that they do. They fulfill a very necessary niche in the business world. But at the same time, unless they are engineers in the traditional sense, they must take direction from engineers or scientists to build the models. There are a lot of chemical engineers who develop software for our business (for those in my business, think ASPEN, Simsci, or other chemical engineering software modelling companies). And we would be a hell of a lot less productive without them. I shudder to think of designing a multi-component distillation column by the trial-and-error tray-by-tray method. I remember seeing a thick binder of calculations done by hand. That said, it is still the physical science that is the underlying basis for the models, not the code.
So, Squidly, I salute you, and all of your brethren! Go forth, and prosper!

July 7, 2009 7:59 pm

E.M.Smith (01:50:56) :
George Tobin (09:01:16) : I was unaware that any blame had been directed at GISS software engineers. My understanding was that they were faithfully implementing the bogus sensitivity assumptions and all the rest of that which comprises current climate modeling ideology.
George, you are laboring under the assumptions that the code was written by a software engineer and that there was a division between the designer and the coder. The code looks to me like it is largely written by folks who learned programming as a sideline or “had a class in it” not by professional programmers. (Yes, anyone can learn to program just like anyone can learn to write poetry… that does not make you a professional programmer nor a good poet…)
I’ve read GIStemp front to rear. It looks like it was written by folks who had a FORTRAN class once, years ago (I would guess some of it was written by Hansen himself decades ago…) and do not make production product for a living. It also has “layers” that look like kludge added on top of kludge over time. Some blocks are ALL CAPS and look like F77 style (and syntax). Other newer chunks are mixed case and F90 style. In short, “it just growed” is how it looks. There are exceptions.
==
OK. No disagreements with the internals of the code. I will grant your (stated) expertise in the various languages, and the ability of a (good) programmer to write (good) code in virtually any language. No problems.
But… what about the need for configuration control? Is there ANY change control, change testing, testing or “debug” notes, or corrections?
Is there any “lost” code that could/would/might produce arbitrary trash if a single thing goes wrong? For example, if we put in a real value of CO2 and a real value of temperature, does it produce a valid “radiation in” as a check? Who made the changes? Does it process “real data” if the year or earlier CO2 results and temperatures change? (In other words, if you put in values for 400 ppm CO2 in 1980, does it produce a valid temperature, as it would if the year were 2050 and the CO2 were 400? If CO2 were changed to 200, does it produce real-world temperatures for, say, the year 1800?)
Who tested each module, if the whole thing has not been tested? How do we know the actual calculations in the repeating module are correct? Who verified its input, what are its input variables, and what are the “coded in” variables?
When a change or update was made, who made the change? Who authorized the change? Who tested the result after the change? What was the previous coding, and what was the changed coding? Has it been “hacked”, or do the results have any chance of being “hacked”? Were the results as printed actually the results from a test run, or were they merely typed in afterwards?
If this ran an ATM, would you sell it to your bank? Would your congressman put his money into this program if it were tax-processing software?
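
A minimal sketch of the kind of change-control regression test being asked about here: fixed inputs, archived reference outputs, and a hard failure if a code change shifts the answers. The module under test and the reference values below are hypothetical stand-ins, not anything from GIStemp or modelE.

```python
import math

# Sketch of a regression ("change control") test: known inputs, archived
# reference outputs, and a hard failure if a code change shifts the results.
# The function under test and the reference values are hypothetical.

def toy_forcing(co2_ppm, co2_baseline_ppm=280.0):
    """Hypothetical stand-in for a model module: simplified CO2 forcing, W/m^2."""
    return 5.35 * math.log(co2_ppm / co2_baseline_ppm)

# Reference cases archived the last time the module was verified.
REFERENCE_CASES = [
    # (co2_ppm, expected_forcing_w_m2)
    (280.0, 0.0),
    (400.0, 1.9082),
    (560.0, 3.7083),
]

def run_regression(tolerance=1e-3):
    for co2, expected in REFERENCE_CASES:
        got = toy_forcing(co2)
        if abs(got - expected) > tolerance:
            raise AssertionError(
                f"regression: co2={co2} ppm gave {got:.4f}, expected {expected:.4f}"
            )
    print("all archived reference cases still reproduce")

run_regression()
```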

Squidly
July 7, 2009 10:57 pm

Ted Clayton (00:24:40) :
Ah ha! .. Very thoughtful comments! Nice!
I completely agree with you concerning the language constructs and organization of FORTRAN and what that brings to numerical formula processing. Your comments are certainly very valid. I would contend, however, especially when considering the enormous budgets employed, that one could utilize a more modern language (preferably a fully compiled language that produces binary code for performance optimization, i.e., this would exclude Java) and provide even more robust scripting mechanisms that would allow heavy mathematics to be written as robust expressions, which could then in turn be pre-compiled into the foundation language, optimized, and compiled to binary code native to the host machine. Such an approach would allow for the proper architectural abstractions, allow the software engineers to employ flexible and robust design concepts and patterns, allow software developers to write compartmentalized coding segments abstracted from adjacent logical modules, and, probably most importantly, allow the “scientists” to develop robust mathematical representations of physical processes and complex interactions using the language, verbiage, and structure that they are used to.
In this scenario everybody wins, and the result could be something that may actually stand a snowball’s chance in hell of working. And even if it doesn’t work, you have developed something that can adapt as your learning and understanding accumulate and expand.
IMHO…
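
One possible reading of the two-layer design described above, as a minimal sketch only: Python with SymPy stands in for the “scripting” layer, and the generated numeric function stands in for the pre-compiled core. The toy energy-balance formula and the parameter values are purely illustrative, not taken from any model.

```python
# Sketch of the two-layer idea: the scientist states the physics symbolically,
# in notation close to the paper version; the expression is then "compiled"
# into an ordinary numeric function that a (hypothetical) model core calls.
# Requires SymPy; the formula and numbers are illustrative only.

import sympy as sp

# Layer 1: symbolic statement of a toy zero-dimensional energy balance,
# dT/dt = (absorbed solar - emitted infrared) / heat capacity.
T, S, alpha, eps, sigma, C_heat = sp.symbols("T S alpha epsilon sigma C_heat",
                                             positive=True)
dTdt_expr = (S * (1 - alpha) / 4 - eps * sigma * T**4) / C_heat

# Layer 2: generate a plain numeric function from the symbolic expression.
dTdt = sp.lambdify((T, S, alpha, eps, sigma, C_heat), dTdt_expr, "math")

# The model core then uses the generated function like hand-written code
# (illustrative parameter values only).
rate = dTdt(288.0, 1361.0, 0.3, 0.61, 5.67e-8, 4.0e8)
print(f"temperature tendency: {rate:.3e} K per second")
```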

Squidly
July 7, 2009 11:25 pm

R S Bridges (19:48:32) :

So, Squidly, I salute you, and all of your brethren! Go forth, and prosper!

In large part I am in agreement with what you are saying. Conversely, the same could be said going from software engineer to scientist. This, after all, is a cooperative effort. There should be no dictation flowing in either direction. I find there is a clear and present contrast between a “programmer” and a “software engineer”; that is the crux of what I have been discussing. Those who have said “a software engineer is not an engineer” or that “there is no such thing as a software engineer” are quite naive about the development of software in general. Ultimately, the process of software development is simply the translation of human process to machine process. There are many facets involved, from the engineering aspects to the coding aspects and many branches in between. The trick to developing successful software is proper organization and translation; this is usually not simply a task for a “coder”, as “coders” don’t usually possess the skill set necessary to adequately architect the application structure itself with all components considered.

“That said, it is still the physical science that is the underlying basis for the models, not the code.”

I disagree with this. Again, it is a cooperative effort. One relies on the success of the other. All the physical science in the world is simply garbage if not properly constructed and coded. Conversely, the best software design and coding is worthless without proper physical science instruction. So, again, to be successful, one cannot live without the other. They need to be harmonious. These are not easy or trivial tasks, but properly set forth they can be accomplished.

pkatt
July 8, 2009 12:54 am

🙂 hehe webcam for the win Anthony..

July 9, 2009 8:31 pm

@Squidly,
“All the physical science in the world is simply garbage if not properly constructed and coded.”
Not so, sir.
My engineering career spanned the period before computers were widely used, up to today. I can assure you that sound physics and engineering were performed by competent, real engineers, (in the fields of civil, mechanical, electrical, chemical, and others) without the use of coding into software and running on computers. If you doubt this, then I invite you to have a look at any building, bridge, dam, power plant, chemical plant or refinery with a cornerstone plaque or startup dated earlier than 1970.
Many times a computer simply adds more time to the task, and useless digits in insignificant places. We had (and still have) perfectly good methods for calculating and designing almost everything — and it was certainly not garbage, and certainly no coding was necessary.
The legal reality is that almost every large new software system is a failure, and results in lawsuits. While there are many reasons for this, the inability of software *engineers,* systems architects and analysts, code writers and debuggers, to deliver a working product on schedule is a major factor.
The knowledge that the software *engineers* are coming to deliver a new system causes business owners and others who contract for such systems to cringe. They know the system will fail, and lawsuits will result.
Just one (of thousands I could mention) is the computerized payroll system for the Los Angeles Unified School District in California, circa 2007. If they (the programmers) cannot do something as simple as add up hours worked, multiply by dollars per hour, perform some equally simple deductions, and print a paycheck, all that for a few thousand employees, how can they be expected to do something a bit more complicated like a climate model?
