CRUTEM3 "…code did not adhere to standards one might find in professional software engineering"

Those of us who have looked at GISS and CRU code have been saying this for months. Now John Graham-Cumming has posted a statement with the UK Parliament about the quality and veracity of CRU code that has been posted, saying “they have not released everything”.

http://popfile.sourceforge.net/jgrahamc.gif

I found this line most interesting:

“I have never been a climate change skeptic and until the release of emails from UEA/CRU I had paid little attention to the science surrounding it.”

Here is his statement as can be seen at:

http://www.publications.parliament.uk/pa/cm200910/cmselect/cmsctech/memo/climatedata/uc5502.htm

=================================

Memorandum submitted by John Graham-Cumming (CRU 55)

I am writing at this late juncture regarding this matter because I have now seen that two separate pieces of written evidence to your committee mention me (without using my name) and I feel it is appropriate to provide you with some further information. I am a professional computer programmer who started programming almost 30 years ago. I have a BA in Mathematics and Computation from Oxford University and a DPhil in Computer Security also from Oxford. My entire career has been spent in computer software in the UK, US and France.

I am also a frequent blogger on science topics (my blog was recently named by The Times as one of its top 30 science blogs). Shortly after the release of emails from UEA/CRU I looked at them out of curiosity and found that there was a large amount of software along with the messages. Looking at the software itself I was surprised to see that it was of poor quality. This resulted in my appearance on BBC Newsnight criticizing the quality of the UEA/CRU code in early December 2009 (see http://news.bbc.co.uk/1/hi/programmes/newsnight/8395514.stm).

That appearance and subsequent errors I have found in both the data provided by the Met Office and the code used to process that data are referenced in two submissions. I had not previously planned to submit anything to your committee, as I felt that I had nothing relevant to say, but the two submissions which reference me warrant some clarification directly from me, the source.

I have never been a climate change skeptic and until the release of emails from UEA/CRU I had paid little attention to the science surrounding it.

In the written submission by Professor Hans von Storch and Dr. Myles R. Allen there are three paragraphs that concern me:

“3.1 An allegation aired on BBC’s “Newsnight” that software used in the production of this dataset was unreliable. It emerged on investigation that the neither of the two pieces of software produced in support of this allegation was anything to do with the HadCRUT instrumental temperature record. Newsnight have declined to answer the question of whether they were aware of this at the time their allegations were made.

3.2 A problem identified by an amateur computer analyst with estimates of average climate (not climate trends) affecting less than 1% of the HadCRUT data, mostly in Australasia, and some station identifiers being incorrect. These, it appears, were genuine issues with some of the input data (not analysis software) of HadCRUT which have been acknowledged by the Met Office and corrected. They do not affect trends estimated from the data, and hence have no bearing on conclusions regarding the detection and attribution of external influence on climate.

4. It is possible, of course, that further scrutiny will reveal more serious problems, but given the intensity of the scrutiny to date, we do not think this is particularly likely. The close correspondence between the HadCRUT data and the other two internationally recognised surface temperature datasets suggests that key conclusions, such as the unequivocal warming over the past century, are not sensitive to the analysis procedure.”

I am the ‘computer analyst’ mentioned in 3.2 who found the errors mentioned. I am also the person mentioned in 3.1 who looked at the code on Newsnight.

In paragraph 4 the authors write “It is possible, of course, that further scrutiny will reveal more serious problems, but given the intensity of the scrutiny to date, we do not think this is particularly likely.” This has turned out to be incorrect. On February 7, 2010 I emailed the Met Office to tell them that I believed that I had found a wide ranging problem in the data (and by extension the code used to generate the data) concerning error estimates surrounding the global warming trend. On February 24, 2010 the Met Office confirmed via their press office to Newsnight that I had found a genuine problem with the generation of ‘station errors’ (part of the global warming error estimate).

In the written submission by Sir Edward Acton there are two paragraphs that concern the things I have looked at:

“3.4.7 CRU has been accused of the effective, if not deliberate, falsification of findings through deployment of “substandard” computer programs and documentation. But the criticized computer programs were not used to produce CRUTEM3 data, nor were they written for third-party users. They were written for/by researchers who understand their limitations and who inspect intermediate results to identify and solve errors.

3.4.8 The different computer program used to produce the CRUTEM3 dataset has now been released by the MOHC with the support of CRU.”

My points:

1. Although the code I criticized on Newsnight was not the CRUTEM3 code the fact that the other code written at CRU was of low standard is relevant. My point on Newsnight was that it appeared that the organization writing the code did not adhere to standards one might find in professional software engineering. The code had easily identified bugs, no visible test mechanism, was not apparently under version control and was poorly documented. It would not be surprising to find that other code written at the same organization was of similar quality. And given that I subsequently found a bug in the actual CRUTEM3 code only reinforces my opinion.

2. I would urge the committee to look into whether statement 3.4.8 is accurate. The Met Office has released code for calculating CRUTEM3 but they have not released everything (for example, they have not released the code for ‘station errors’ in which I identified a wide-ranging bug, or the code for generating the error range based on the station coverage), and when they released the code they did not indicate that it was the program normally used for CRUTEM3 (as implied by 3.4.8) but stated “[the code] takes the station data files and makes gridded fields in the same way as used in CRUTEM3.” Whether 3.4.8 is accurate or not probably rests on the interpretation of “in the same way as”. My reading is that this implies that the released code is not the actual code used for CRUTEM3. It would be worrying to discover that 3.4.8 is inaccurate, but I believe it should be clarified.

I rest at your disposition for further information, or to appear personally if necessary.

John Graham-Cumming

March 2010

162 thoughts on “CRUTEM3 "…code did not adhere to standards one might find in professional software engineering"”

  1. The computer code is the most important aspect of climate science and the least available. Why am I not surprised?

  2. “I rest at your disposition for further information, or to appear personally if necessary.”
    Careful there. They still have those river level chambers in the bottom of the Tower of London.

  3. This is very important:
    it appeared that the organization writing the code did not adhere to standards one might find in professional software engineering
    What a big surprise!

  4. All this computer code stuff baffles these ‘high level reviewers’. Why would anyone ever use version control when we all know these experts get it right on the first try? A better question is why they do not have some software experts on their review board. Having no version control and poor documentation within the code is criminal, especially with trillions of dollars hanging on every amateur bug in the code. Why has no one asked about version control for the data? Code and data go hand in hand. Amateurs one and all. Yes, I have looked at the code, and yes, I have 45 years as a professional programmer.

  5. So Dr. Graham-Cumming has a BA in Mathematics and Computation and a doctorate in Computer Security, has been writing software for 30 years but still only manages, in von Storch and Allen’s opinions, to qualify as an “amateur computer analyst”.
    Are they “amateur” climate scientists?

  6. JonesII (08:09:18) :
    “This is very important:
    it appeared that the organization writing the code did not adhere to standards one might find in professional software engineering
    What a big surprise!”
    I have to tell you in all candour that it isn’t that straightforward to get professional software engineering outfits to “adhere to standards” …..:-)

  7. Is there a UK based software developers professional association, or similar group?
    I would think that if so they should be interested in commenting on the quality of the code and limitations of the software. If silent they are by implication endorsing the shoddy coding practices (no version control, poor documentation etc.) in code which governments depend on for major policy decisions.
    Same question applies for other national, and international software developer organizations and related specialties.
    The professional associations of librarians and archivists, should also be commenting on lack of proper archiving of materials and versions. If someone wanted to write a forensic history of climate model coding practices used in the late 20th century and early 21st century they would be trying to pull information out of a black hole.
    Larry

  8. As AJ Strata pointed out, you never let the PhDs write the code if the project is important. People get hurt. If it’s important, you get people who actually have a clue what they are doing to write the code.

  9. “I am a professional computer programmer who started programming almost 30 years ago. I have a BA in Mathematics and Computation from Oxford University and a DPhil in Computer Security also from Oxford. My entire career has been spent in computer software in the UK, US and France.” Amateur, a true professional would work for the Government or a University. /sarc

  10. This is why climategate is so important. Clever people like this guy are taking an interest and ripping the warmists to shreds. Wonderful!

  11. So… to be a “climatologist” one requires knowledge of:
    1. Computer programming
    2. Chemistry
    3. Physics
    4. Statistics
    However, failure to have knowledge of or skills in some or any of these disciplines is easily overlooked if you have a “green” attitude and mindset, and are capable of writing suitably alarming predictions. Also the ability to procure grants and scare children is a genuine asset.

  12. Why not bring in professional software auditors to do a thorough review of the CRU software methodology and conformance to that methodology? If the CRU software is not even under version control, that is pathetic beyond belief.

  13. To the extent that the code was written by grad students, I would not expect professional quality code. You should see some of the code written by Professors sometime too. My wife works processing Voyager data at MIT among other projects. If you want a strange and obscure code written over a couple of decades, then go look at that set of programming. But the work is open and available in the scientific tradition.
    But the code must be available to see if it is valid. It can be written in a way that is not up to “current standards”, but implements the correct stuff.
    I give CRU a little slack here for bad programming style. But none for their failure to release it so others can see what they did.

  14. Surprise.
    I read the code and also saw problems. What shall we do? Jones really doesn’t want this out either.
    “It is standard for scientists to manipulate and hide their code”

  15. Now I have to wonder if there is software at USHCN or GHCN that stomps over real data and punches artificial holes in it where no holes should be.
    Or maybe it is scrambling the data, or it could be the overuse of tape media that left corruption in the stored data.
    I’d like to get with you, E.M.Smith, if you are about. I have some very interesting anomalies in the data that you’d be good at figuring out the pattern to.

  16. Professor Acton seems to be quite economical with verifiable and accurate statements of fact! He is quite adept at digging deeper holes, though.

  17. I suspect John will get a very polite reply, something along the lines of: “While the subject matter at hand is of the utmost importance to the very survival of the human race, we find that your submission was postmarked after the final date for submissions and therefore cannot be considered.”

  18. “The computer code is the most important aspect of climate science and the least available.”
    It could be worse. What if the code that we have already is all that is available?

  19. I invented teh Internets and teh computer programs mathematical models as well.
    Fiddle the figures
    In tune with Al Gore
    Now you got
    The Al Gore Rhythm

  20. The denigration, minimization and name-calling continue with “…amateur computer analyst…” and “…they do not affect the trends…” Quelle surprise!
    “3.2 A problem identified by an amateur computer analyst with estimates of average climate (not climate trends) affecting less than 1% of the HadCRUT data, mostly in Australasia, and some station identifiers being incorrect. These, it appears, were genuine issues with some of the input data (not analysis software) of HadCRUT which have been acknowledged by the Met Office and corrected. They do not affect trends estimated from the data, and hence have no bearing on conclusions regarding the detection and attribution of external influence on climate.”

  21. Great contribution, but wake me up from this nightmare, please:
    All this hype, destroying economies, scaring kids and even killing people by fear, called global warming is based on
    – poor weather stations readings, cherry-picked
    – enhanced by irreproducible software, shown here
    – compared with unreliable proxies, cherry-picked again
    – blaming a trace gas, as only factor, as Schellnhuber’s “linear correlation”
    – wrongly mixed with inappropriate statistical methods, as MM have shown
    – stonewalled against any checking, as Jones thinks is normal
    – arrogantly called this the true and only science by ‘realclimate scientists’
    – paid for with a Nobel Peace Prize and billions of taxpayers’ money
    – brought up as top priority by whatever kind of politicians
    You couldn’t invent it.

  22. I am a professional computer programmer who started programming almost 30 years ago. John Graham-Cumming
    3.2 A problem identified by an amateur computer analyst
    by Professor Hans von Storch and Dr. Myles R. Allen
    WUWT?

  23. Since Climate Science (one of the youngest branches of science) is all about projecting the future trends for the Earth’s Climate- computer modeling is their primary tool.
    It is quite worrying that there is no standardization for the computer code. Nor is there any review or testing process of the code and data sets used for the results.
    Computer science (which is also a young science) has extensive requirements for code and data reliability. Lack of following even the basic standards is negligence, bordering on criminal intent to defraud.

  24. The code is substandard? File this under no [fooling] Sherlock!
    Anybody who works in software development knows there are open-source tools available for source control and testing. Some very good databases are also open-source. Surely CRU can afford to use open-source software? In many cases, it’s free!

  25. The climatescience process really is from normalscience, isn’t it!
    Conclusions
    Results
    Method
    Apparatus
    Thanks to J C-R and others, hopefully soon we will be able to get to the apparatus.
    And, I like the idea of Acton being asked by Parliament to verify his statement that the actual CRUTEM3 production code has been released. When you think about it, IF the answer proves to be negative then the consequence would be enormous. In other words, this could turn out to be ‘match point’
    So, can anybody comment on the status of the CRUTEM3 code?

  26. I wonder what it takes to be more than “an amateur computer analyst”? If a doctorate and career in the field aren’t enough to qualify one as more than “an amateur computer analyst” then what is required?
    While it is a minor point, it does not reflect well on the competence of the authors.

  27. I worked as a programmer for a physics research group as an undergraduate and in industry before going to grad school in the 1980’s. It never occurred to the physicists to publish or make original code or even the data available. If someone wanted to replicate their work they would do the experiment themselves and write their own code. The papers described the mathematical methods used so that anyone could write a program to do the number crunching. In industry I often had to deal with very poorly documented code. The standards in industry have likely improved, but a lot of valid work was done in the bad old days.
    Since climatology is such a hot area, I too would like to see greater openness. This will cost more money. Granting agencies will need to provide budget lines for professional programmers (rather than cheap students) and web servers for data archives. But, there is no basis for demonizing researchers who have been following the standards in their field or ignoring past work. If you want to redo someone’s chemistry experiment should you be able to demand the use of their test tubes? Maybe some day chemists will be required to archive all their old test tubes. But, that won’t invalidate all the chemistry that was done before.

  28. Bernie’s Iron Rules of Survival in Modern Organizations
    1. Do not lie to auditors.
    2. Do not lie about programmers.
    3. Never put in an email that which can be said with a wink.

  29. I am the ‘computer analyst’ mentioned in 3.2 . . .

    Dr. Graham-Cumming diplomatically leaves out the word ‘amateur’ with which Professor Hans von Storch and Dr. Myles R. Allen denigrated his analysis. ‘Nuff said.
    /Mr Lynn

  30. Finding errors is a good thing because errors can be corrected. But implying errors affect conclusions, when they don’t, is not a good thing.

  31. John in L du B (08:30:44) :
    So Dr. Graham-Cumming has a BA in Mathematics and Computation and a doctorate in Computer Security, has been writing software for 30 years but still only manages, in von Storch and Allen’s opinions, to qualify as an “amateur computer analyst”.
    Are they “amateur” climate scientists?
    ****************************************************
    You know the drill. If you don’t have the facts on your side, shoot the messenger. The only problem is that as the bodies accumulate, so do those who look for the facts to verify the claims. That’s the stage we’re in now. For years, the McIntyres and Cummings would get lumped with the crackpot oil-bought “denier’ crowd. There are too many McIntyres and Cummings now for that method to work. As far as I can tell, there is no Plan B, and this whole affair called AGW is working its way toward an ugly end.

  32. “John Galt (09:06:48) :
    Anybody who works in software development knows there are open-source tools available for source control and testing. Some very good databases are also open-source. Surely CRU can afford to use open-source software? In many cases, it’s free!”
    I am quite sure that CRU do not grasp the concept of open-source.

  33. “Professor Slingo: … We test the code twice a day every day. We also share our code with the academic sector, so the model that we use for our climate prediction work and our weather forecasts, the unified model, is given out to academic institutions around the UK, and increasingly we licence it to several international met services: Australia, South Africa, South Korea and India. So these codes are being tested day in, day out, by a wide variety of users and I consider that to be an extremely important job that we do because that is how we find errors in our codes, and actually it is how we advance the science that goes into our codes as well. So of course, a code that is hundreds of thousands of lines long undoubtedly has a coding error in it somewhere, and we hope that through this process we will discover it. Most of the major testing is very robust.”
    This statement indicates that Slingo knows almost nothing about software. The more salient questions are: how much test automation coverage do you have, how do you maintain your test cases, do you do unit testing upon check-in, what’s your procedure for regression testing, do you perform independent code reviews etc.
    “We test the code twice a day every day” — this doesn’t make any sense. It sounds like something she just made up on the spur of the moment.

  34. I’d like to recommend this version control software from Perforce Software Inc
    http://uk.news.yahoo.com/22/20100303/ttc-uk-china-google-fe50bdd.html

    The hackers targeted a small number of employees who controlled source code management systems, which handle the myriad changes that developers make as they write software,

    He said the common link in several of the cases that McAfee reviewed is that the hackers used source code management software from privately held Perforce Software Inc, whose customers include Google and many other large corporations.

  35. “But, there is no basis for demonizing researchers who have been following the standards in their field or ignoring past work.”
    Mike, that’s true, but by the same token, if the standards of computer coding (out of which the models are born) are rather low in this field, should we take their work as good enough to be the basis of policies which could well cause considerable lifestyle changes, not to mention rather more taxes?

  36. Perhaps we should set up a project on SourceForge called “CRUTEMP” and help them along a bit! Although funny, I think we should seriously consider it.
    We could start with the 160Mb of code and data that we already have (courtesy of our “insider”). We could even put a wiki together – to help the undergrad’s get up to speed.
    Although I think we should only allow amateur admin’s. No Dr’s or post grad’s should be allowed access to the delete button. It’s a trust thing.

  37. max (09:19:50) :
    “I wonder what it takes to be more than “an amateur computer analyst”? If a doctorate and career in the field aren’t enough to qualify one as more than “an amateur computer analyst” then what is required?”
    What is required? That’s very simple: agree with them. That’s all. Agree with AGW and you’re a professional. Disagree and you’re an amateur, denialist, climate criminal, contrarian, etc.

  38. I bet if we asked any of these CRU people what “cyclomatic complexity testing” was done – they would reply that “we don’t test for cyclones in our software”.

  39. “Mr Lynn (09:31:39) :
    I am the ‘computer analyst’ mentioned in 3.2 . . .
    Dr. Graham-Cumming diplomatically leaves out the word ‘amateur’ with which Professor Hans von Storch and Dr. Myles R. Allen denigrated his analysis. ‘Nuff said.”
    There’s a world of difference between diligent analysis carried out by an amateur and the amateurish practice of state-funded professionals we’ve seen at CRU.

  40. This isn’t “post-normal” science – it is abnormal science. I never liked that “post-normal” BS anyway.

  41. Seems there was no QA, no source control, no independent auditing; all of which one would expect in any commercial environment of any size. I spent 25 years in IT, in a company which grew from small to huge. When I started, we programmed by the seat of our pants, documentation was scarce, commenting scarce, any testing done, we did, and maybe the customer when they went live!
    By the time I left, we had full and comprehensive inter-office (global) source code control, QA controlled documentation, a requirement for ISO9001 accreditation, and regular visits from an independent auditing company.
    Frankly, what went on at UEA would be laughable were it not so appalling.

  42. Mike (09:27:15) :
    I worked as a programmer for a physics research group as an undergraduate and in industry before going to grad school in the 1980’s. It never occurred to the physicists to publish or make original code or even the data available. If someone wanted to replicate their work they would do the experiment themselves and write their own code. The papers described the mathematical methods used so that anyone could write a program to do the number crunching. In industry I often had to deal with very poorly documented code. The standards in industry have likely improved, but a lot of valid work was done in the bad old days.
    Since climatology is such a hot area, I too would like to see greater openness. This will cost more money. Granting agencies will need to provide budget lines for professional programmers (rather than cheap students) and web servers for data archives. But, there is no basis for demonizing researchers who have been following the standards in their field or ignoring past work. If you want to redo someone’s chemistry experiment should you be able to demand the use of their test tubes? Maybe some day chemists will be required to archive all their old test tubes. But, that won’t invalidate all the chemistry that was done before.
    ========
    I would prefer people go over my work before I present it as finished. If I show everything as I progress, it would give people the opportunity to be helpful. But I doubt transparency would put an end to FOI requests.

  43. I wonder whether Professor Hans von Storch and Dr. Myles R. Allen bothered to try to find out anything about the person they called an “amateur computer analyst”. And whether they are always so pedantic in their work.

  44. Mike (09:27:15) :
    This is different.
    To rerun the experiment you need the data and the code.
    In a physical experiment, the experiment is described, the methods described, and then others may perform the same steps. If lots of others perform the same experiment and get different results, then the original result may not be valid for some reason not described in the process.
    I recently read a book about 20th century astronomy. It described the process by which astronomers shared their plates (photos). The astronomers replicated each other’s work by inspecting the exposed plates. They shared the exact data that was used to produce the results.
    In another post, someone suggested that if people didn’t agree with a certain finding, they could just go to Antarctica and drill their own cores. That’s all fine and good if you do the research on your own nickel, but if the US or British government is funding the research, then not just the results, but the data, the code and the results belong to all of us. We should be able to get as much mileage as possible out of every research project we fund, including results by “amateurs”.
    Lastly, in this day and age it’s cheap and easy to publish all the data and code. Some of the blogs referenced from WUWT do that every day.

  45. Mike (09:27:15) :
    (…)
    “If you want to redo someone’s chemistry experiment should you be able to demand the use of their test tubes? Maybe some day chemists will be required to archive all their old test tubes. But, that won’t invalidate all the chemistry that was done before.”
    ———
    Reply:
    We don’t need their computers (aka “test tubes”); others are available. What we want is their UNadjusted raw data and their final numbers. Then we’ll see how they tortured the data. If the new results are different than “climate science” results, we’ll certainly show them (and the courts) our algorithms!
    But will “climate scientists”, barring some subpoena, ever show us their algorithms?
    No worries… subpoenas are on the way.

  46. In light of all that has been revealed about the fallacy of AGW and experience with alternative energy sources in the EU, and Obama is still preaching wind turbines and solar cells!! I’d pull my hair out if I had any!!

  47. Graham-Cumming is such an ‘amateur’, with those degrees and real world experience. I agree (having architected, designed and managed large software systems for NASA and the DoD) the code we have seen is garbage. And it is clear in testimony the code used to generate CRUs products HAS NOT been released. When it is, with the data, it MUST produce the same numbers as before.
    It won’t of course, because there is no way that buggy code could replicate an error message.

  48. I note that the Met Office has stated that this year will be the 5th warmest on record. This is interesting considering the Northern Hemisphere’s winter.
    However they qualify their claim by adding in small print, “the warmest taking into account:
    El Nino
    La Nina
    Increase greenhouse gas concentrations
    the cooling influence of aerosol particles
    Solar effects
    Volcanic cooling ( if Known)
    and natural variation in the oceans.
    And in conjunction with the University of East Anglia.
    Well there we are, all facts piled into a questionable computer model which generates the headline.
    This is all highly amusing and very sweet in a strange way, but I suspect in reality the behaviour of my cat is about as valid a predictor as the above.

  49. Mike @9:27,
    depends on what physics (or other) you are doing. When it is practical to reproduce the experiment it is expected that people wishing to reproduce the work will do so and it serves as a good check against experimental error, but information about how the experiment was conducted must be provided so that the experiment can be replicated. When it is impractical to reproduce the experiment (a need to book the CERN supercollider to run the experiment, for example) it is expected that the raw data of the experiment it is based upon will be made available to those wishing to attempt to reproduce your results. When the “experiment” starts with large and multiple data sets and consists of drawing conclusions from the results of manipulating some of those data sets (climate science), it is incumbent upon the “experimenter” to identify the data sets used and the manipulations to them so that the “experiment” can be replicated.

  50. wind turbines and solar cells are costly and impractical for most countries.
    the code was ‘..unreliable’, hmm now why does that not surprise me?

  51. hmm
    having written code myself in an academic environment (a well respected research university that shall remain unnamed) I am not a bit surprised that the quality of software engineering is poor… nor should anybody else be.
    I was in an applied mathematics department modeling very complex but much better understood (than climate science) physics problems. Basically the complex and long term code is modified and maintained by a bunch of undergrads and first yr grad students who were studying applied mathematics and NOT software engineering and who spend a semester or a year on it and then move on. Poor quality code ensued.
    “Climate science” being even much further from engineering than applied mathematics it is only to be expected that there would be significant issues with the code at UEA.
    This is not a scandal, it is just the way it is with code in academia. BUT it is another good reason to be very skeptical of anybody who sells the results of their academic computer models as being definitive.

  52. Here’s the logical problem with the Dr Jones testimony where he claimed it was not ‘standard practice’ to release data and computer models so other scientists could check and challenge research.
    There are three pieces to the work of Dr Jones.
    First, we know and he argues that almost all the weather data is posted all over the world on the Internet so that anyone can get at it. Second, he argues that his papers contain all of the methodology that demonstrates that his number crunching code is valid. Sandwiched in the middle is the third fact that the code itself has not been made publicly available. But we all know how easy it is to put that code up and make it available just like the other data and papers are.
    So there is an established precedent that exists for two items, to wit, the publishing of data and the papers. And, since we all know that they seem to be able to afford to put that information on the Internet, there is simply NO valid reason to hide the code (by not publishing it) unless Dr Jones is hiding something. Enough with the copyright and proprietary ownership red herrings.
    Put the three data sets out there and let the chips fall where they may, Dr Jones.

  53. Mike (09:27:15) :
    I worked as a programmer for a physics research group as an undergraduate and in industry before going to grad school in the 1980’s. It never occurred to the physicists to publish or make original code or even the data available. If someone wanted to replicate their work they would do the experiment themselves and write their own code. The papers described the mathematical methods used so that anyone could write a program to do the number crunching. In industry I often had to deal with very poorly documented code. The standards in industry have likely improved, but a lot of valid work was done in the bad old days.
    Since climatology is such a hot area, I too would like to see greater openness. This will cost more money. Granting agencies will need to provide budget lines for professional programmers (rather than cheap students) and web servers for data archives. But, there is no basis for demonizing researchers who have been following the standards in their field or ignoring past work. If you want to redo someone’s chemistry experiment should you be able to demand the use of their test tubes? Maybe some day chemists will be required to archive all their old test tubes. But, that won’t invalidate all the chemistry that was done before.
    Mike, would you say that if your life depended on it?

  54. Dr Graham-Cumming has opened a can of worms here. I’m not referring specifically to the bugs in the code, but the complete lack of even the most basic code quality procedures.
    Having worked in commercial applications program development for 30 years, I can only reiterate the disbelief that others have mentioned. In the institutions I have worked, (and they are many), all program code must go through a rigorous process of unit testing and an audit log kept of all bugs and fixes, and the test plans filed away for future reference. The same goes for system testing. This way, if a bug is found, you can check the test plan to see if a particular logic pathway has actually been tested, or tested with all relevant conditions. However, once the code “goes live” the process is tightened up even more.
    While pre-live, some corners may be cut in documenting fixes, post-live, every change is logged in stone. The source code is usually kept under some kind of development management system so that it can’t be checked out by more than one programmer at the same time. Any fixes must have an audit trail that ties it to a documented problem. Another test plan is created to test the fix, which is documented in the code and cross referenced to the original problem document. Once the fixed program is re-released live, it becomes the new version. This is the basic version control which Dr Graham-Cumming says was missing. There would normally be an archive for each version of the program code. This is the only way you can deal with unexpected problems caused by adding the fixes themselves and allows you to go back to earlier versions to see what effect these fixes have had on the code.
    The situation here seems to be closer to what one would expect from a home coder with no controls or quality management at all.
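
The process described in this comment can be sketched in a few lines. This is only an illustration, not the tooling any institution or CRU actually uses: the names `Fix`, `ProgramHistory`, `problem_id` and `test_plan` are hypothetical, and a real shop would rely on a proper version control system rather than an in-memory list.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Fix:
    """One post-live change, tied to the problem report that motivated it."""
    problem_id: str   # hypothetical reference to the documented problem
    test_plan: str    # hypothetical reference to the test plan for the fix
    description: str


@dataclass
class ProgramHistory:
    """Keeps every released version so earlier ones can still be inspected."""
    versions: list = field(default_factory=list)   # source text of each release
    audit_log: list = field(default_factory=list)  # the Fix behind each later release

    def release(self, source: str, fix: Optional[Fix] = None) -> int:
        # The first release needs no fix; every later release must cite
        # a documented problem and a test plan, or it is refused.
        if self.versions:
            if fix is None or not fix.problem_id or not fix.test_plan:
                raise ValueError("post-live change needs a problem id and a test plan")
            self.audit_log.append(fix)
        self.versions.append(source)
        return len(self.versions)  # version numbers start at 1

    def source_at(self, version: int) -> str:
        # Old versions are never overwritten, so any of them can be retrieved.
        return self.versions[version - 1]
```

The point of the sketch is the refusal: a change with no audit trail never becomes a new version, and every earlier version stays available for comparison.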

  55. “In light of all that has been revealed about the fallacy of AGW and experience with alternative energy sources in the EU, and Obama is still preaching wind turbines and solar cells!!”
    FAIL

  56. John Graham-Cumming,
    I’ve been a professional programmer in the industrial control sector for almost 30 years. I don’t know if the terminology is different between the US and UK, but over here a piece of squirrely code that nets you a couple billion in government dollars is a feature, not a bug. 😉
    Of course, all this code should just boil down to Tglobal = f(CO2), where
    f(CO2) is a fairly simple formula or a look-up table. Even a bad programmer should be able to pound it out in an hour, but instead everyone wants to milk the problem for a couple of decades.

  57. Great job by Dr. Graham-Cumming.
    As someone who spent much of a 30-year career doing software engineering on an in-house data system project (>100,000 lines of 4GL code), what most jumped out at me from thread start was this:
    “The code had easily identified bugs, no visible test mechanism, was not apparently under version control and was poorly documented.”
    Easy ID bugs, NO test system, poor documentation, and NO VERSION CONTROL ?!?!
    For a software system of any size and complexity that undergoes many revisions by multiple programmers over time:
    Even if all you know about it are these 4 deficiencies, you can pretty much conclude that a lot of it is junk spaghetti-code; and that the results you get from it are largely trash.
    Yup: Starting to feel like “ClimateGate: The Sequel”. . . .

    I will be very curious where this branch of climate science in the area of software leads. John Graham-Cumming says he has almost 30 years of experience in software, I have written software since 1976. Anyone who has spent that amount of time in the intricacies of proper software generally knows what they are speaking about, degree or not. However, there are multiple branches in software science and design that immediately bring in certain flavors of expertise; you can no longer know it all.
    For many times, the most readable and beautiful code is not necessarily the best software. It usually hinges in the area of efficiency or speed. If you don’t care how slow the software runs, object oriented design will usually produce the most readable and logical code but I have seen it cost a 10, 100, or even 1000 times slowdown. However, if huge amounts of data are involved, especially when the data’s influence spreads across many sub-areas, structured and logical design tend to no longer apply, you get into the area of spatial integrity and locality. This mainly has to do with the internal layout of memory within the computer and jumps directly into multiple-layer cache layouts and pipeline lengths and stalls.
    Inside purely efficient code, usually even the base equations are broken into multiple pieces, especially when integrals are concerned, when performing numeric analysis. A long equation will deal with one mathematical operation across the arrays at a time. This can cause the software to look rather bizarre but it doesn’t necessarily mean the software is sub-par quality or blatantly bad. I hope software specialists such as John Graham-Cumming are truly trained in this area of expertise, for those used to software in business and financial are going to get a rude shock when jumping across to scientific software design, they are two different worlds.
    Still there are times when bad design is just bad design and bad coding is just bad coding and anytime logic flaws are embedded into the code, it is bad. Error flow is usually one of the first areas to look for purely bad designs.
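
To make the "one mathematical operation across the arrays at a time" point concrete, here is a small sketch of the same trapezoid-rule integral written both ways. The function names are made up for illustration, and plain Python lists will not show any speed difference; the sketch only shows why the pass-by-pass form looks odd while computing the same thing.

```python
def trapezoid_readable(x, y):
    # One loop that mirrors the textbook formula term by term.
    total = 0.0
    for i in range(len(x) - 1):
        total += 0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
    return total


def trapezoid_in_passes(x, y):
    # The same formula, broken into one operation per pass over whole arrays,
    # the shape that array-oriented numerical code often takes.
    dx = [b - a for a, b in zip(x, x[1:])]            # pass 1: interval widths
    ysum = [a + b for a, b in zip(y, y[1:])]          # pass 2: sums of neighbouring samples
    areas = [0.5 * s * d for s, d in zip(ysum, dx)]   # pass 3: area of each trapezoid
    return sum(areas)                                 # pass 4: reduction
```

Both functions return the same integral up to floating-point rounding, so the second being harder to read against the formula is a matter of layout, not of correctness.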

  59. TerryBixler (08:22:24) : Why has no one asked about version control for the data? Code and data go hand in hand.
    I’ve wondered the same thing, most of their data is stored in flat-files, ascii format files, it is like they designed it to use version control then didn’t do it. The metadata during check-ins would have been so insightful, almost makes one think they purposely obfuscated their methods.

  60. As a programmer that is also what surprised me the most. Not just the quality, but the techniques, and the faith in the accuracy of them.
    I’m still not sure I believe they didn’t use version control. In the back of my mind I think they have to say that in order not to release the code (and more, so that no one would see the versions — it would probably be easy to show manipulation with that). If that is true though, then it shows they know almost nothing about software, and only slightly more about computers.

  61. @ Rob uk (10:47:09)
    There is a fundamental difference between the examples you cite and the climate science case.
    In chemistry, experiments are well documented. To reproduce another’s work you don’t need their test tubes but you do need to know the precise series of steps they took.
    In general climate science has failed to provide lab notes. In the case of a published temperature reconstruction they are not providing sufficient detail of how the reconstruction was made for it to be duplicated. Basic things are unreported:
    What measuring stations were used
    How is temp assigned to areas with no stations
    How is actual temp measurement data adjusted and why
    These are the basic inputs required to duplicate a study, these are lab notes not test tubes.

  62. To: Larry Geiger 10:14:22
    I strongly support your position. Indeed, the funding source is very important, and the funding source is tax money. I am surprised that the scientists involved in Climategate don’t acknowledge that and behave in a nasty and arrogant way. In my own practice I have never been asked for code or experimental data, and neither do I know of any colleague who has. Generally, the public is either unaware that it could ask or isn’t interested. This is despite the fact that billions are spent on the LHC, modeling of comet impacts with Jupiter’s satellites and Earth, string theories, and other science that I would call “entertainment” science, suitable mostly for the Discovery or History channels. And, arguably and only to my taste, relatively little is spent to support research that is of practical value to the public. However, in the Climategate case, where international policy making was attempted on the basis of publicly funded science, and these policies will obviously increase taxes and unemployment and decrease the standard of living, the science must be scrutinized and it must satisfy the standards of the majority of people. In the meantime, a group of climatologists has separated us, the people, into “real scientists”, “real science supporters” and “ignorant trolls”, and continues belittling the last group despite the fact that it seems to be a growing majority contributing the most to their well funded (by the public) research.

    What Eric (10:32:52) : and Mike (09:27:15) : said. It looks like academia code. If your business was dependent on the results or sales of such, you would have code standards in place, version control, business modeling, and test environments. They sell the results; reproducibility and accuracy were not their goal. The code just proves what we have always assumed.
    I would not be surprised if this were found somewhere in the unreleased code.
    x=0;
    x=0; //just in case

  64. ANTHONY, very important
    John Graham-Cumming (08:20:24) :
    Your headline is slightly inaccurate because I don’t say that the code for CRUTEM3 is ‘below standards’, I was referring to other code at CRU.

  65. The fact of the matter is that if you are developing software for, say, a medical environment, or flight control systems, or engineering design tools, or industrial process control, or any of a thousand other disciplines (even automotive powertrain control) a bug in the code (or any of the code libraries and objects underlying that code) can kill. In thousands of other applications, bugs can result in mechanical failure, excessive operating cost, legal compliance problems, Mars landers missing the planet, etc.
    So at an industrial level, there’s a very substantial incentive to get it right.
    These scientists are amateurs. For the most part, they plink around trying to produce results, and when they get something that looks like what they want to see they call it finished. There’s no formal test plan, no test data sets to evaluate a full range of possible inputs to the model, no process to ensure that the evaluation is done by properly skilled individuals. Peer review is worthless in this environment because most of the peers are no more structured in their approach to software than the original authors.

    I don’t know where these FORTRAN DINOSAURS are coming from.
    Heck, I was raised with crappy old FORTRAN too.
    I’ve now processed datafiles with EXCEL which have 10,000,000 elements in them.
    Totally TRANSPARENT on the math formulae.
    But, I suppose if you are living in 1975…FORTRAN sounds pretty good.
    Max

  67. AJStrata (10:20:02) :
    “… having architected, designed and managed large software systems…”
    OT, but isn’t “architected” and “designed” in the same clause redundant, and in fact isn’t “architected” a barbarism? 😉

    Code, shmode! With all due respect for John Graham-Cumming, (and I’m sure he will agree with this) the fallacy of the whole premise is NOT in the code, good, bad or in-between. It’s in the logic! And NOT the logic behind the code, but the logic (or the illogic if you prefer) supporting the entire SYSTEM! You have AVERAGED, partial (weather!) data, haphazardly recorded, GENERALLY ONCE A DAY, at a minuscule number of points that varied over time within an ANALOG, CUBIC system (global ocean/atmosphere), with no controlled standardization of either equipment or placement, with said equipment never having been designed for the intended purpose and virtually no understanding of the system as a whole to start with, sub-systems whose effect % we do not understand (not to mention the still to be identified sub-sub-systems or the partial/complete ignoring thereof), no accurate history (data wise) of one time events or again the effect % OR the “effect time frame” of those events…… etc.?!! Even satellite temperature data is at best marginal and so infinitesimally short in length that it’ll be 250+ years before it might have any meaning.
    And we’re arguing about code?? How unbelievably arrogant and disingenuous for any “scientist” (or anybody else for that matter) to claim their model/system/code produces any kind of meaningful results, much less accurate projection. Analyze the code till you’re blue in the face (because of the 40km of ice over your head?) but it’ll mean nothing until you can comprehensively systematize the WHOLE system. YOU CAN NOT GAIN UNDERSTANDING OF THE SYSTEM IN THE COMPUTER LAB! The most elegantly written and documented code is meaningless if it doesn’t accurately model the system OR if the DATA is GARBAGE! And I’m not even going to say it….
    Now, go back to work. And if you come into my office again with an incomplete system, based on such illogical analysis, I’ll either fire you or bust you back down to Jr. programmer or maybe even computer operator! Oh, not you John. You’re doing fine work.

  69. mpaul (09:48:33) :
    “This statement indicates that Slingo knows almost nothing about software.”
    As someone who was for fifteen years a professional systems analyst and programmer (albeit in the commercial sector), this is one of the few things here I can pass an even half-informed comment on.
    Running software however often you like isn’t what tests it per se. You analyse, design, code, test and maybe pilot a system before you let it go live, and subsequently, it gets tested by your users, who will be over you like a rash as soon as they pick up on a problem. You damn well better fix it PDQ, especially if it’s mission critical.
    All of us here in the UK are the Met’s end users. We’ve been testing their warmed-over soup for a while now, and found it severely wanting. But have they fixed it? Nah. They have this disconnect between modeller’s fantasy land and reality.
    I’m sure if they keep on predicting warm summers and mild winters, they’ll get it at least half right the odd year, but I’m not holding my breath. Their shorter-term predictions aren’t too hot (or maybe they are!) either. No one expects them to be perfect, but if even the BBC is contemplating kicking them into touch, things must be really bad.
    Slingo’s brass neck and spin is breathtaking.

  70. First, we know and he argues that almost all the weather data is posted all over the world on the Internet so that anyone can get at it.>>
    There is a world of difference between “the data I used is in the public domain” and “this is the data I used”. Stating that the data he used is on the internet is meaningless, because it provides no guarantee that what he used was the same, or a subset or a superset or so on. Similarly, publishing the methodology is also meaningless, because a review of the code is required to determine if the methodology is implemented as described. All three are required or the end result is just an opinion.
    As for code quality and revision control… There is no doubt that much of the code written for research is buggy and poorly documented. The same is true for a remarkable amount of code written within an enterprise for specific purposes. Anyone who dealt with a Y2K mitigation project knows what a nightmare the legacy code was in terms of documentation and revision control and that was for both private enterprise and public enterprise applications. However, code for commercial purposes is an entirely different matter. Companies like Sun, Oracle, Intel, SAP and on and on have rigorous version control, in depth regression testing, and can reproduce the exact code version and the exact dataset tested against at the drop of a dime. The reason is there are substantial financial consequences to them if they cannot do this. The financial consequences of the decisions to be derived from this particular combination of code and data demand a standard even higher.
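
One inexpensive way to turn "the data is on the internet" into "this is the data I used" is to publish a checksum manifest alongside a paper. The sketch below is an illustration only: the function names and file names are hypothetical, and the files are passed in as in-memory bytes so that the example stays self-contained.

```python
import hashlib


def manifest(files):
    """Map each file name to the SHA-256 digest of its exact bytes."""
    return {name: hashlib.sha256(data).hexdigest()
            for name, data in sorted(files.items())}


def changed_files(published, candidate):
    """Names whose digests differ from, or are absent in, the published manifest."""
    names = set(published) | set(candidate)
    return sorted(n for n in names if published.get(n) != candidate.get(n))


# Hypothetical inputs: the exact data and code behind one published result.
used = {"stations.txt": b"A 1.0\n", "process.f90": b"x = 0\n"}
published = manifest(used)
```

Anyone holding what they believe to be the same data and code can call `changed_files(published, manifest(their_files))`; an empty list means they are checking the same bytes the author used.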

  71. re: wayne (11:00:11)
    I am going to add one other aspect to my previous comment.
    However, it would be good to have the software used by climate science re-written in a very common and forgiving language such as C-sharp or Java, freely available, in purely OO design and ignoring any efficiencies, so all programmers can understand exactly what is happening, no matter their expertise. Efficiency could be added by parallel implementations when speed plays in. That would add hugely to the insight and clarity into climate science itself, and in fact, seems the best way to move this science along at its highest speed of development. A common, plug-in type design, albeit slow at run-time, would be enough to test the basic core logic used to compute these temperatures.
    Yes, that would be good!

  72. Mike (09:27:15)
    “I worked as a programmer for a physics research group as an undergraduate and in industry before going to grad school in the 1980’s. It never occurred to the physicists to publish or make original code or even the data available. If someone wanted to replicate their work they would do the experiment themselves and write their own code. The papers described the mathematical methods used so that anyone could write a program to do the number crunching. In industry I often had to deal with very poorly documented code. The standards in industry have likely improved, but a lot of valid work was done in the bad old days.”
    If you want the original data for Principia Mathematica it’s available, same for Copernicus, Galileo, Faraday, Maxwell, Einstein, and all the other physicists. Nobody expected to be taken seriously if they couldn’t produce the data and methods by which they came to their results. It explains why the IOP submission was so damning, it’s plainly not science to produce a result that can’t be challenged and tested.

  73. …and in addition to strong version control and regression testing and documentation, may I bring up one more point that IT professionals should be harping on.
    Any IT shop should be able to reproduce the exact state of the data on any given day. In this context, “data” means EVERYTHING. Datasets, code bases, e-mails, everything that existed on THAT day. Standard best practices require a weekly full backup and daily incrementals written to tape and stored off site. If an email, or data file, or version of the code existed on a specific day 3 years ago, it should be possible to restore it from backup. The only thing that should escape these best practices is a file etc that was created and destroyed on the same day. Everything else should be there.
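
The weekly-full-plus-daily-incremental scheme can be sketched as follows. The assumptions here are mine, not the commenter's: the full backup runs on Sunday, each incremental holds only the changes since the previous backup, and `restore_chain` is a made-up name.

```python
from datetime import date, timedelta


def restore_chain(target: date, full_weekday: int = 6):
    """Return the backups needed to rebuild the state as of `target`.

    Assumes a full backup on `full_weekday` (Monday=0 ... Sunday=6) and an
    incremental backup on every other day.
    """
    # Number of days since the most recent full backup (0 if target is that day).
    days_back = (target.weekday() - full_weekday) % 7
    full = target - timedelta(days=days_back)
    chain = [("full", full)]
    # Replay each daily incremental, in order, up to and including the target day.
    for offset in range(1, days_back + 1):
        chain.append(("incremental", full + timedelta(days=offset)))
    return chain
```

For example, `restore_chain(date(2010, 3, 4))` starts from the full backup of Sunday 28 February 2010 and replays the four incrementals from 1 to 4 March.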

  74. John Graham-Cumming (08:20:24) :
    Sir: Most of us reading and posting here lean toward being skeptics and not just of the catastrophic effects of climate change. We have learned to treat Anthony’s headlines as we do most everything posted here. And some of us have programming experience – I started with FORTRAN II-D in 1965. Headlines and press releases seem to be a problem and we have learned, with prodding from Anthony, to ask to see the data, the code, the actual article and so on. We do not trust climate scientists with peer review and we would rather a person’s work speaks for itself than trust the letters after the name.
    While being skeptical, nevertheless I was quite in agreement with your point #1, namely . . .
    Although the code I criticized on Newsnight was not the CRUTEM3 code the fact that the other code written at CRU was of low standard is relevant.

    Thanks for being public with this. As you now know, not everyone is willing to step forward.
    John

  75. Mike — If you want to redo someone’s chemistry experiment should you be able to demand the use of their test tubes?
    If this is being used by politicians as the basis to extract an extra 10% of my earnings or make me pay an extra $3 a gallon for fuel, then YES.
    Do you not have a concept of what’s at stake here, or why extra diligence might be required for trillion dollar decisions?

  76. Mr Lynn (09:31:39) :
    I am the ‘computer analyst’ mentioned in 3.2 . . .
    Dr. Graham-Cumming diplomatically leaves out the word ‘amateur’ with which Professor Hans von Storch and Dr. Myles R. Allen denigrated his analysis. ‘Nuff said.

    Maybe the word is returning to its original meaning, which would rather be a compliment than denigration.
    Amateur: through French, from the Latin ‘amator’: lover, devoted friend, devotee, enthusiastic pursuer of an objective.
    I seem to recall Anthony was called ‘amateur’ several times, too.

  77. Mr Lynn (09:31:39) :
    I am the ‘computer analyst’ mentioned in 3.2 . . .
    Dr. Graham-Cumming diplomatically leaves out the word ‘amateur’ with which Professor Hans von Storch and Dr. Myles R. Allen denigrated his analysis. ‘Nuff said.

    He wasn’t just being diplomatic, he was playing high-level gamesmanship. (“My fast to your slow and vice versa.”) Well played.

  78. “for example, they have not released the code for ’station errors’ in which I identified a wide-ranging bug, or the code for generating the error range based on the station coverage”

    This bug, does it have a name? I don’t fancy myself an amateur computer analyst, but I do recognize it from many other areas of scientific study, where computer models are replacing lab work. The “bug” works like this:

    “‘to program a computer with the same assumptions used to interpret observations and to generate features similar to the observations'”**

    So is it the Jack Horner bug, by chance?
    **”Quasars: Massive or Charged?”
    thunderbolts.info

  79. In evolutionary biology, a single organ may serve multiple functions. Conversely, a single function may derive from different organs.
    In science, one of the most valuable verification techniques is to address a given hypothesis from different angles: Different data, different experiments, different analytical techniques. Replicating original results by such means inherently renders a hypothesis “robust” in the sense that varying accounts converge.
    Generally speaking, “there is no One Best Way.” But there most certainly are invalid, misleading, even self-deceptive ways of doing “science”: Blondlot’s N-rays, transparent junk blithely asserted by such as J.B. Rhine, Immanuel Velikovsky, Trofim Lysenko, come to mind. Beware Thomas Kuhn’s “The Structure of Scientific Revolutions”, for Kuhn fundamentally misreads the enterprise– Einstein did not render classical mechanics obsolete, but stood on Newton’s “giant shoulders” as ’twere.
    “Climate science” is a non-empirical discipline, a classification scheme akin to botany that deals in hindsight only. By mathematical and physical principle one can neither extrapolate complex dynamic systems nor assert a “runaway Greenhouse Effect” in terms of a global atmosphere engaging thermodynamic principles of entropy. (Earth’s minimal CO2 effect undergoes cyclical variations, necessarily stable over time.) In sum, though poor coding is not in and of itself a killer, dismissing lazy and malfeasant programming as immaterial invalidates empirical conclusions. If Climate Cultists do not know this, they simply are not scientists at all.

  80. John Graham-Cumming has a BA and DPhil in Maths and Computing from Oxford. Professor Jones read Geography at Lancaster.
    Here’s a working hypothesis. Most scientists who advocate AGW didn’t go to the best universities. They’re not all that clever. They got their PhDs by slogging. They are easily swayed by strong personalities. They have second rate minds and do second rate science. It shows.

  81. Ralph Woods (09:04:11) : “Since Climate Science (one of the youngest branches of science) is all about projecting the future trends for the Earth’s Climate- computer modeling is their primary tool.”
    And all this time I thought it was Al Gore.

  82. CodeTech (08:42:51) :
    “So… to be a “climatologist” one requires knowledge of:
    1. Computer programming
    2. Chemistry
    3. Physics
    4. Statistics
    However, failure to have knowledge of or skills in some or any of these disciplines is easily overlooked if you have a “green” attitude and mindset, and are capable of writing suitably alarming predictions. Also the ability to procure grants and scare children is a genuine asset.”
    Sounds like they need to have an inter-faith meeting.
    I think your point proves quite well that the possibility of one becoming a Climatologist in any real world sense is slim to none. Climatology should be stricken and fall under the umbrella of theoretical physics. An open and interdisciplinary approach makes the most sense. With a widened pool of participants it may be possible to achieve greater transparency in the process so long as it is not stacked with eco-ideologues.

  83. The climate scientists just don’t get why they should follow standard software development practices. Does anyone remember the “On Replication” post at RC from about a year ago where Gavin made this telling statement at comment 89:
    “My working directories are always a mess – full of dead ends, things that turned out to be irrelevant or that never made it into the paper, or are part of further ongoing projects. Some elements (such a one line unix processing) aren’t written down anywhere. Extracting exactly the part that corresponds to a single paper and documenting it so that it is clear what your conventions are (often unstated) is non-trivial. – gavin]”
    http://www.realclimate.org/index.php/archives/2009/02/on-replication/
    So how is anyone else (or even Gavin) supposed to replicate the work? We’re supposed to “trust” whatever it is their paper said and then maybe somewhere on their computer is the code and data used to produce the result?
    I was already leaning to the skeptic side, but this discussion at RC is what pushed me off the fence.
    What’s even more disturbing is that using documentation, source control, etc. would actually make their lives easier in the long run.

  84. Fundamentally, if you’re going to do it right, you should start with a set of goals. So, for instance, if we collectively (since most of the commenters on this board are better able to handle this than, say, the Phil Joneses of the world) were going to set out the design requirements for a system for storing temperature data, we’d need to work through reliability and availability, security and auditability, relationships between datasets and individual data points and their annotations and ‘confidence factor’ ranges, etc. Changes are always made to new instances, the relationships to the previous instance and to any outside sources are logged, all changes are annotated with the adjustment formulae and algorithms, any such ‘adjusted’ instances must be released for public scrutiny before any other references can be made to the numbers.
    It’s not a difficult problem but it requires a different mindset from the ‘this is my sandbox, go away’ climate-science attitude.
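
A minimal sketch of the "changes are always made to new instances" requirement might look like this. It is an illustration under stated assumptions, not a design anyone has proposed for CRU: `SeriesStore`, `Instance` and their fields are hypothetical names, and security, availability and confidence ranges are left out entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instance:
    """One immutable version of a temperature series."""
    values: tuple
    parent: Optional[int]  # index of the instance this one was derived from
    note: str              # annotation: the source, or the adjustment applied


class SeriesStore:
    def __init__(self):
        self._instances = []

    def add_raw(self, values, source: str) -> int:
        self._instances.append(Instance(tuple(values), None, source))
        return len(self._instances) - 1

    def adjust(self, parent: int, func, note: str) -> int:
        # Never overwrite: the adjusted values become a new instance
        # that records which instance it came from and why it changed.
        old = self._instances[parent]
        adjusted = tuple(func(v) for v in old.values)
        self._instances.append(Instance(adjusted, parent, note))
        return len(self._instances) - 1

    def get(self, index: int) -> Instance:
        return self._instances[index]

    def lineage(self, index: int):
        """Annotations from the raw instance down to `index`, oldest first."""
        notes = []
        current = index
        while current is not None:
            inst = self._instances[current]
            notes.append(inst.note)
            current = inst.parent
        return list(reversed(notes))
```

Because instances are frozen and only ever appended, the raw readings survive every adjustment, and `lineage` gives the annotated trail an auditor would ask for.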

  85. George Ellis:
    I would write that as:
    x = 0x05555;
    x = 0x0aaaa;
    x = 0;
    … bonus points if you know why… 😉

  86. Milwaukee Bob (11:24:34) :
    I’ll agree that the focus of our attention should be on the data, not the code that mangled it. My take from HARRY_READ_ME is that he found the programs were stomping data, overwriting instead of making new versions, not what it was documented to be doing.
    The buggy mis-documented code can go straight to the trash can for all that it’s worth. What we should be concerned with is accounting for the missing station data, and I firmly believe there is much to be had.

    I wrote the software for analyzing data obtained from the BATSE experiment (on the Compton Gamma Ray Observatory) in the late 80’s.
    Basically I wrote a program which would allow scientists to run their own models within it (for you techie-types, this was done under VMS using Fortran).
    The scientists working on the project had no problem sharing data and models with each other. However, they got miffed at a certain University (which I’m leaving unnamed), because some of their programmers hacked into our computers and stole some of the data – which allowed their researchers to publish first.
    The scientists I was working with were unhappy, because they felt, since they had legitimately been involved in the project from the beginning, they should have been given 6 months exclusive use of the data.
    They felt that would give them sufficient time to publish their hypotheses, at which time, they would release the data too.
    Of course, these were mere astrophysicists and theoretical physicists. What would they know about standard scientific practice?

  88. Re: max (Mar 4 10:30),
    Mike @9:27,
    depends on what physics (or other) you are doing. When it is practical to reproduce the experiment it is expected that people wishing to reproduce the work will do so and it serves as a good check against experimental error, but information about how the experiment was conducted must be provided so that the experiment can be replicated.
    True.
    When it is impractical to reproduce the experiment (a need to book the CERN supercollider to run the experiment, for example) it is expected that the raw data of the experiment it is based upon will be made available to those wishing to attempt to reproduce your results.
    No. The raw data are not shared with all and sundry in the accelerator experiments. The groups have rights of publication. Once the data is archived, it is open for sharing, after the experiment is closed, and still there are caveats.
    Replication is done by having more than one experiment at a time. In the LHC ATLAS and CMS are competing experiments studying the same physics independently.
    One reason is proprietary. It takes ten years of preparation by hundreds of people to set up the experiment and take the data. You would not find people willing to do that if the first theorist who came with a FOI request got his/her hands on the data before publication by the group.
    The second is the complexity. Each experiment develops its own computer codes (not well documented), corrections, etc. that an outsider would have to spend years to do all over again, given the raw data. That is why at least two experiments are necessary.

    When the “experiment” starts with large and multiple data sets and consists of drawing conclusions from the results of manipulating some of those data sets (climate science), it is incumbent upon the “experimenter” to identify the data sets used and the manipulations to them so that the “experiment” can be replicated.

    In disciplines where the data is unique, yes. One cannot go back and remeasure the temperatures. On the other hand one could argue that all one needs is the temperature data and metadata to create completely independent code to study whatever. In principle, with the different groups handling the climate data, one should have been safe, except there was too much inbreeding and no independent replication. They were being orchestrated.

  89. JG-C: “For a somewhat technical post on this including the answer to my question given by Professor Jones and Professor Slingo see: http://www.jgc.org/blog/2010/02/something-bit-confusing-from-ueacru.html”
    Thanks for the link. As we post here, Harry II through Harry XII, somewhere deep in the bowels of UEA, are frantically trying to create back-dated code that will crank out pseudoCRUTEM3 results from the abysmal database mentioned by Harry I. This is like trying to spin offal into gold. In Fortran, no less. Imagine the suspense, the hideous shrieks from the dungeon, the cabin fever…. Life must be Hell at UEA.
    “A miracle has happened.”

  90. Mike (09:27:15) :

    Since climatology is such a hot area, I too would like to see greater openness. This will cost more money. Granting agencies will need to provide budget lines for professional programmers (rather than cheap students) and web servers for data archives. But, there is no basis for demonizing researchers who have been following the standards in their field or ignoring past work. If you want to redo someone’s chemistry experiment should you be able to demand the use of their test tubes? Maybe some day chemists will be required to archive all their old test tubes. But, that won’t invalidate all the chemistry that was done before.

    In some cases yes. In some chemical processes you need to use special glassware that does not contain boron for example.
    The direct analogy in this case would be if the researcher says they used a well known commercial software package version xyz release #2, build abc345, then you could obtain the exact same build and release and implement his computation algorithm and see if you get comparable results.
    If he says, I just slapped together a statistical processing routine that does such and such — well then I would want to see the exact code he used if I got different results when I used well known statistical routine to do the same stated processing.
    If asked they should at least be able to point the duplicating researcher to a source for the source code, like “implemented a binary search as described in ref 3 page 26, code block 85”.
    If you then use the same binary search function he references and it does not work, you at least have a starting point to figure out what is wrong. Either the code referenced is wrong and you both properly implemented the defective code, or the referenced code is good but one or both of you improperly implemented it.
    It is entirely likely that one of you has a working block of code with a typo in it that compiles successfully and works fine on trivial problems, but at some extreme condition it blows up.
    When you get to that point, there is no other option than to do a line by line verification of the actual code used, as no other method will point out where the difference in behavior occurs that makes the original researcher’s results mismatch with the duplicating researcher’s.
    In the case of the chemistry example you used, in many cases common processes are so trivial that any commercial glassware will serve to perform the test, but in some cases a particular brand of glassware might contaminate the process. Likewise something as obscure as how the glassware was cleaned and dried might influence the outcome. In those cases you get down to verifying very trivial details of the process until you find the clinker that breaks the process.
    With undocumented ad-hoc code there is literally no way to know if the code does what the researcher says it does in all cases. With well documented professional class code, on the other hand, you at least have a reasonable expectation that, like the glassware, your experimental setup is sufficiently similar to the researcher’s experimental setup that it is highly unlikely that the small differences between them will adversely affect the outcome of the experiment.
    Larry

  91. Re: CO2 Realist (Mar 4 12:39),
    So how is anyone else (or even Gavin) supposed to replicate the work? We’re supposed to “trust” whatever it is their paper said and that maybe somewhere on their computer is the code and data used to produce the result?
    Think of it as an exam. The professor gives a problem. Each student solves it by his/her method using the information provided with the problem. The results have to agree, not the method of solving the problem.
    In this sense, given the data and metadata, another researcher should be able to say if the temperature is increasing or not. It is the data that is important. The method becomes important when there is disagreement in the results. In this case a third (or fourth, etc.) independent analysis would clarify the issue as well as finding the error in the method of one of the two originals who disagreed, and would be less of a hassle. It is the “independent” that is important. By subverting the peer review process, independent analysis was lost.

  92. My point on Newsnight was that it appeared that the organization writing the code did not adhere to standards one might find in professional software engineering. The code had easily identified bugs, no visible test mechanism, was not apparently under version control and was poorly documented. It would not be surprising to find that other code written at the same organization was of similar quality. And given that I subsequently found a bug in the actual CRUTEM3 code only reinforces my opinion.
    If you swapped CRUTEM3 out and GIStemp in, this is exactly how I would evaluate GIStemp (or, rather, have evaluated it). I found one bug in the USHCN F to C conversion that is a compiler dependent order of execution that can warm 1/10 of the records by 1/10 C. Don’t know if it was fixed in the most recent release. There is no SCCS used ( source code sits in the same directory where scratch files are written…) and there is no visible test mechanism.
    It seems to be a systemic style failure among climate ‘researchers’…

  93. Thank you John for shining a bit more light on the truth behind the cargo-cult science practised by the CRU. Please keep at it and keep us in the loop here at WUWT as things progress.

  94. This is not unexpected. I think it was about 18 months ago that I wrote on CA that, although I had not seen all the code, the information leaking out to SteveMc at that time indicated a complete lack of industry level software engineering principles in the climate science arena and in particular at NOAA/NASA.
    Why did I say it? I was a software engineering trainer in a large, internationally recognised, research company and a senior project manager of system, data, people projects. Nothing I read or saw indicated a formal approach to software development in climate science.

  95. Milwaukee Bob (11:24:34) : … You have AVERAGED, partial (weather!) data, haphazardly recorded, GENERALLY ONCE A DAY, at a minuscule number of points that varied over time within an ANALOG, CUBIC system (global ocean/atmosphere), with no controlled standardization of either equipment or placement, with said equipment never having been designed for the intended purpose and virtually no understanding of the system as whole to start with,
    BINGO! You got it. It’s a pile of Mulligan Stew and we’re supposed to swallow…
    davidmhoffer (12:00:00) : Any IT shop should be able to reproduce the exact state of the data on any given day. In this context, “data” means EVERYTHING. Datasets, code bases, e-mails, everything that existed on THAT day. … The only thing that should escape these best practices is a file etc that was created and destroyed on the same day. Everything else should be there.
    With modern journalling file systems, you can even capture moment to moment changes. The Network Appliance has a kind of version control built in to it. This is a 10 minute kind of thing to set up.
    FWIW, the standard when I was making commercial software for release was a “golden master” day. We went through “soft freeze” then “hard freeze” where it was almost impossible to put more changes in (only lethal bugs fixed) and finally, after QA suite passed it, we made the “Golden Master”. Those were archived forever and they were the only thing used for production.
    Compare that with GIStemp where each time it is run, it recompiles the FORTRAN (using whatever compiler is in the given user’s environment variable…) and then deletes the binaries at the end. It leaves the source code in the scratch file directory and it encourages you to hand edit things if you want different behaviours. Further, the data changes moment to moment over the month and there is no “release date” on the file. You just hop onto the NCDC web site and download an ‘image du jour’. And it is not just adding new records, it is changing old records from the past up to several months, and perhaps years ago.
    http://chiefio.wordpress.com/2010/02/15/thermometer-zombie-walk/
    This instability was one of the first things I fixed in the version of GIStemp I got running. I added a source code directory and trivial “make” file.
    http://chiefio.wordpress.com/2009/07/29/gistemp-a-cleaner-approach/
    So it is literally the case that each time any user runs GIStemp, they are running a slightly different product and may be doing it on somewhat different data…
    That is just sooo broken a design…
    The whole thing is just so, so, “ersatz”.
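
    As a rough illustration of the "archive exactly what was run" idea described above, here is a minimal Python sketch. It is not GIStemp's or anyone's actual tooling; the function names `build_manifest` and `changed_files` are made up for this example. It records a SHA-256 checksum for every file under a directory, so two runs can be compared to see whether the source or input data changed in between.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def changed_files(old, new):
    """Return sorted names that were added, removed, or modified between two manifests."""
    names = old.keys() | new.keys()
    return sorted(name for name in names if old.get(name) != new.get(name))
```

    Saving such a manifest alongside each published result would at least show whether a later run used the same code and data, which is the minimum a "golden master" is meant to guarantee.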

  96. Dr Graham-Cumming
    Your footnote in history may be small, but is somewhat akin to Fleming noticing that his hugely old bacterial plates, which in a properly run lab would not still exist, show ‘interesting features’……..

  97. mpaul (09:48:33) :
    “Professor Slingo: … We test the code twice a day every day. We also share our code with the academic sector, so the model that we use for our climate prediction work and our weather forecasts, the unified model, is given out to academic institutions around the UK, and increasingly we licence it to several international met services”
    Nail on head comes to mind. She has absolutely no idea what software testing means. She thinks it means running the program and looking at the result(s) and if it looks alright then it passes the test.
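
    For readers unsure what the distinction is: an automated test states the expected answer in advance and fails loudly when the code disagrees, rather than relying on someone eyeballing the output. A minimal Python sketch follows. The function `f_to_c` and the checks are hypothetical, chosen only because Fahrenheit to Celsius conversion comes up earlier in this thread; this is not Met Office or CRU code.

```python
def f_to_c(temp_f):
    """Convert Fahrenheit to Celsius, rounded to one decimal place."""
    return round((temp_f - 32.0) * 5.0 / 9.0, 1)


def test_known_fixed_points():
    # Values with exact, independently known answers.
    assert f_to_c(32.0) == 0.0
    assert f_to_c(212.0) == 100.0
    assert f_to_c(-40.0) == -40.0


def test_conversion_is_monotonic():
    # A warmer reading in F must never become a cooler reading in C.
    readings = [x / 10.0 for x in range(-400, 1300)]
    converted = [f_to_c(r) for r in readings]
    assert converted == sorted(converted)
```

    Functions named `test_...` like these can be collected and run by a test runner such as pytest on every change, so a regression shows up as a failure instead of as a slightly different graph.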

  98. It is important that everyone go back to square one and checks the stations that are used one by one.
    Warwick Hughes has the books.
    http://www.warwickhughes.com/blog/?p=510
    “These books are witness to the processes operating at the birth of what we now know as IPCC AGW. Information contained in TR022 and TR027 will assist people who are curious to uncover what Jones et al have done with temperature data from their village, town, city, region, state or nation.”
    I have checked just three stations in Australia so far and there appears to be heaps of “value adding” and little consistency from Jones 1990 to Jones 1999.
    Here is Halls Creek for Jones 1990
    http://members.westnet.com.au/rippersc/hcjones1990.jpg
    I notice he picked the year 1899 only out of the old station record, which as I have said before is 12 km and 63 metres downhill from the current one.
    Here is the 1999 version
    http://members.westnet.com.au/rippersc/hcjones1999.jpg
    There has been some adjustment in the 1950’s & 1960’s.
    I did a rough calculation and conservatively I reckon that the previous Halls Creek temps get extrapolated over more than 1M sq kilometers or around 14% of Australia’s land area.
    Here is Kalgoorlie
    http://members.westnet.com.au/rippersc/kaljones1990.jpg
    Note that the Jones interpretation is a combination of two stations after 1941.
    He also used Southern Cross (~200 km away) for 1895-1899, leaving a 42 year gap.
    It would have been simpler to just use the entire Southern Cross record IMHO.
    http://members.westnet.com.au/rippersc/southernX.jpg
    Here is the Jones 1999 version for Kalgoorlie
    http://members.westnet.com.au/rippersc/kaljones1999sg.jpg
    http://members.westnet.com.au/rippersc/kaljones1999line.jpg
    Huge difference to the 1990 version.
    Here are the Jones figures for Southern Cross
    http://members.westnet.com.au/rippersc/scjones1999.jpg
    I notice that global warming from 1990 to 1998 has caused Southern Cross to get 1.4 degrees C colder in the 1890’s years that were used in 1990.

  99. Re: ShrNfr (08:44:02) :
    Oh my, such a serious misspelling! Surely you meant to type “clusterfrag” instead.
    Moderators! Yoo-hoo! Can you help this nice person out and make that change for them? Thanks in advance!

  100. (First time posting here). Does anyone look at ClimatePrediction.net as a basis for how the models work? I had a quick look and it appears they run thousands of simulations with different parameters to ‘recreate the past’. And then pick the most accurate to simulate the future. Therefore all future scenarios that you might expect to happen, will happen in the simulations.
    My question is, how do UEA/NASA tackle the problem and is it much different?

  101. “amateur computer analyst”
    Nobody uses that term anymore. I’m surprised they didn’t say ‘in the data processing centre’ or some other such 1980’s term.
    I guess if he was a professional he would be in academia and writing papers instead of, you know, actually delivering real solutions to companies who pay for them.

  102. davidmhoffer (11:31:14) :
    Anyone who dealt with a Y2K mitigation project knows what a nightmare the legacy code was in terms of documentation and revision control and that was for both private enterprise and public enterprise applications.

    Funny you mention Y2K. I’m on a project that found a Y2K bug a couple of weeks ago. Fortunately the buggy code wasn’t invoked in that routine until we tried it, but still... Only I would be hit with Y2K a decade later!
    I work for a health insurance company, read code all the time (primarily COBOL and APS) and I promise that if OUR code looked like CRU’s, we’d have been closed down years ago. And our code doesn’t attempt to justify the slavery of the entire free world.

  103. rbateman (12:57:35) :
    Yes, but don’t forget if you do not know or can’t see the big picture (forest for trees and all that) AND you do not know how it all works together, you probably don’t have enough of the RIGHT data to begin with.
    For any kind of analysis model, the design of the model to be correct has to first be “complete”. Yes, complete is a relative term, but in global atmosphere modeling there is no amount of relativism you could possibly apply to claim we’re close enough.
    THEN, there is that damn data….. ☺

  104. mpaul (09:48:33)
    “Tested twice a day”
    Is that when they run it to feed us the output?

  105. bill (09:51:52) :
    “Mike, that’s true but by the same token if the standards (of computer coding out of which are born the models) are rather low in this field, should we take their work as good enough to be the basis of policies which could well cause considerable lifestyle changes not to mention rather more taxes?”
    Fair question. (1) These low standards did get us to the moon, etc., etc.
    (2) Many climate research groups have arrived at similar results.
    (3) Most of the code for the data analysis of temps is now available. Small errors that have been found have not substantially changed the results.
    The last two points illustrate the robustness of the climate results.
    Small note: the programs at CRU are for data analysis, not climate modeling. The hockey stick is a data set of past temps.
    Here is another way to look at it. Suppose we went back to the early medical work showing the link between tobacco and cancer. It probably would not meet these new standards. Did they save all the data? I doubt it. Are all the statistics programs that were used available? I can’t imagine it. Yet, it would be foolish to run out and start smoking.
    Remember that the tobacco companies put up a fight to keep people smoking. Millions died.
    If we enact C&T schemes now with high caps for now, at least we will have a system in place. I figure it will take a few years to get the “bugs” out of C&T. If the temps go down, and I’d venture there is a 5% chance of that, then we keep the caps high. Not much harm done. If we do nothing, and the temps go up and we keep piling up the CO2, we are going to mess things up big time. No, I don’t think it will be the end of civilization, but major hardships will be imposed.
    We do need to weigh the risks of doing nothing. Some people will go to doctor after doctor until they hear what they want. If the first nine doctors tell you to lose weight, eat better and get more exercise but the tenth one says not to worry, it is tempting to go with the tenth doctor, but this is not wise.
    If the models are off, it only means the warming will come a few years or decades later. You can’t get around the physics that more CO2 will eventually cause big problems.

  106. Milwaukee Bob (14:09:09)
    THEN, there is that damn data….. ☺

    That damn data status can be improved upon. The (raw) data sets out there were by no means derived from an exhaustive and thorough effort.
    I would estimate that the completeness of the records could be improved upon greatly by closely examining the archives.
    I am getting this very unnerving feeling that few have bothered to look into the differences between what’s in archives and what’s being put out there as “that’s it, nothing more to see”.
    How many out there have looked?

  107. Moderators: Re: kadaka (13:58:35)
    Ah, that change works as well. Thank you for your prompt attention.
    Feel free to delete this and my previous comment at your discretion, I won’t mind. If you want to leave them, perhaps as a sort of change note, that’s fine too.

  108. E. M. Smith
    With modern journalling file systems, you can even capture moment to moment changes. The Network Appliance has a kind of version control built in to it. This is a 10 minute kind of thing to set up>>
    Well now we’re talking storage management, not backup 🙂
    All storage arrays can do a snapshot of what is stored on them and keep it as a “point in time” copy. Not all snapshot techniques are the same, so some arrays can only support a few snapshots while others (like Netapp) can support hundreds. This is slightly different than version control. Also, it is valid for flat files, but databases require that they be quiesced before snapshot, otherwise there is a possibility of the snapshot losing an in flight transaction and compromising the integrity of the snapshot. Many vendors have tools that can do this automatically for common databases like Oracle, SQL, Exchange, etc. As for journaling, yes that can be done at the file system level, and most database and transaction processing systems can do it at the application level. In brief, if you are willing to spend the money, you can capture the state of the whole system down to the second if you want. But as a rule of thumb, the minimum any IT shop would have in place would be weekly fulls and daily incrementals. A typical Netapp shop (or Equallogic, or Sun7000, IBM nSeries which support “re-direct on write” snapshots) would supplement the tape based backup system with hourly snapshots.

  109. Mike (14:26:24),
    *sigh*
    Instead of sending you to logic re-education camp, start reading the archives. “What if” scenarios can mean anything.
    And by saying: “You can’t get around the physics that more CO2 will eventualy cause big problems,” you’re telling us that you are right and planet Earth is wrong. CO2 has been just a mite higher in the past, without causing your imaginary “big problems.”

  110. Anna:
    “No. The raw data are not shared with all and sundry in the accelerator experiments. The groups have rights of publication. Once the data is archived, it is open for sharing, after the experiment is closed, and still there are caveats.
    Replication is done by having more than one experiment at a time. In the LHC ATLAS and CMS are competing experiments studying the same physics independently.
    One reason is proprietary. It takes ten years of preparation by hundreds of people to set up the experiment and take the data. You would not find people willing to do that if the first theorist who came with a FOI request got his/her hands on the data before publication by the group.
    The second is the complexity. Each experiment develops its own computer codes ( not well documented) corrections etc that an outsider would have to spend years to do all over again, given the raw data. That is why at least two experiments are necessary.”
    I am a little puzzled by this. Just how do people try to reproduce the results of experiments before those results have been published? Yes, to various extents there is exchange of information prior to publication, but prior to publication the results are subject to revision. This is not about pre-publication embargoes of data so that the experimenters can publish, this is about the post-publication blackout on the data so that the results cannot be tested once they become known to the world at large. “Once everyone is done publishing, it is expected that the raw data will become available so that the results can be checked” is a perfectly acceptable formulation, but contextually I didn’t (and still don’t) see the need to add a “post publication” qualifier.
    A side issue: the problem with climate science is that there are so many possible data sets and means of manipulating them that you can produce a wide variety of results. These are not all equally valuable in determining global trends, in fact many are almost useless. Without information about what data sets are used and how they are manipulated it is impossible to determine (among other things) to what extent the results are an artifact of the manipulation, to what extent the data sets are representative of a global trend and what degree of significance to attach to the results.

  111. Somewhat o/t
    Another view on amateur
    I once worked for a man who volunteered for army service very early in WWII. Their medical officer was, in civvies, a VD specialist. His basic message on that subject was
    “It’s not the professionals you have to worry about, it’s the bloody enthusiastic amateurs”
    And I suspect that he would have allowed that, while there are gifted amateurs, these are vastly exceeded by amateurs who think that they are gifted.
    And applicable to most subjects.

  112. Simple question for the committee.
    ISO 9001. Is CRU accredited? If not, stop wasting time and money – close the enquiry.
    If it is accredited, then when was the latest audit and what were the results? If negative, stop wasting time and money – close the enquiry.
    Then sanitise the organisation.

  113. Visceral Rebellion;
    Funny you mention Y2K. I’m on a project that found a Y2K bug a couple of weeks ago. Fortunately the buggy code wasn’t invoked in that routine until we tried it but still. . ..Only I would be hit with Y2K a decade later.>>
    Don’t worry, there’s another round coming. A lot of the fixes were temporary. They took a range like 0 to 18 or 0 to 34 and wrote a little routine to convert JUST that date range to 2000+ instead of 1900+ on the assumption that a) there were no computer records prior to 1960 or so to conflict with, and b) the software would be replaced with new software before the new hard coded fix ran out of runway. Then everyone forgot about the new deadline they created for themselves and went back to day to day emergencies.
    Grace Hopper would chuckle.
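
    The temporary fix described above is usually called date windowing. A minimal Python sketch of the idea, with a hypothetical function name and a pivot of 35 chosen to match the "0 to 34" range mentioned (real systems picked their own pivots):

```python
def expand_two_digit_year(yy, pivot=35):
    """Expand a two-digit year using a fixed window.

    Values below the pivot are treated as 20xx, everything else as 19xx.
    The pivot is the hard coded deadline: once real dates reach it,
    they are silently mapped a century too early.
    """
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year between 0 and 99")
    return 2000 + yy if yy < pivot else 1900 + yy


print(expand_two_digit_year(10))  # 2010
print(expand_two_digit_year(34))  # 2034
print(expand_two_digit_year(35))  # 1935: a record from 2035 lands in the wrong century
```

    Nothing fails at the boundary. The routine keeps returning plausible looking years, which is why the second deadline is so easy to forget.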

  114. davidmhoffer (14:39:10) said:

    A typical Netapp shop (or Equallogic, or Sun7000, IBM nSeries which support “re-direct on write” snapshots) would supplement the tape based backup system with hourly snapshots.

    Would you be referring to “copy on write?” (The actual implementation is unlikely to copy the old data, rather it would simply allocate a new block for the new data and change some pointers in the metadata [block lookup table/b+tree/whatever].)

  115. Richard Sharpe (15:29:14) :
    davidmhoffer (14:39:10) said:
    Would you be referring to “copy on write?” (The actual implementation is unlikely to copy the old data, rather it would simply allocate a new block for the new data and change some pointers in the metadata [block lookup table/b+tree/whatever].)
    Yes but no. Early storage arrays like EMC, HP EVA, LSI, etc etc used a snapshot called “copy on write”. When the file system wants to change a data block, the snapshot tool interrupts the write, copies the original block and writes it to snapshot reserve to be retrieved later if the snapshot needs to be invoked, then allows the original write to change the original block. This works, but uses a lot of I/O, so a limited number of snapshots can be supported before performance of the array is impacted.
    Netapp, Equallogic, others use a different technique called “re-direct on write”. In their file systems, when a block needs to be changed, the file system writes a net new block, and leaves the original in place. The file system itself is changed to point to the new block (re-direct) instead of the old one. The snapshot tool in this system copies the file system at a point in time to preserve the pointers (you will hear the term “pointer based snapshot” as well, same thing).
    There are pros and cons to both strategies.
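
    The difference between the two snapshot techniques can be sketched with a toy model in Python. This is only an illustration of the description above, not how any vendor's array is actually implemented; the class names and the `io_ops` counter (a crude stand-in for disk operations) are assumptions made for this example.

```python
class CopyOnWriteVolume:
    """Blocks are overwritten in place; the old block is saved before the first overwrite."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []  # each snapshot: {block index: preserved original content}
        self.io_ops = 0

    def snapshot(self):
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        for saved in self.snapshots:
            if index not in saved:
                saved[index] = self.blocks[index]  # read old block, write it to reserve
                self.io_ops += 2
        self.blocks[index] = data  # then let the original write go ahead
        self.io_ops += 1

    def read(self, index):
        return self.blocks[index]

    def read_snapshot(self, snap_id, index):
        # Unchanged blocks are still served from the live volume.
        return self.snapshots[snap_id].get(index, self.blocks[index])


class RedirectOnWriteVolume:
    """New data goes to a new block; a snapshot is just a copy of the pointer table."""

    def __init__(self, blocks):
        self.store = list(blocks)                  # blocks are never overwritten
        self.pointers = list(range(len(blocks)))   # live block map
        self.snapshots = []
        self.io_ops = 0

    def snapshot(self):
        self.snapshots.append(list(self.pointers))
        return len(self.snapshots) - 1

    def write(self, index, data):
        self.store.append(data)                    # write a net new block
        self.pointers[index] = len(self.store) - 1 # re-direct the pointer
        self.io_ops += 1

    def read(self, index):
        return self.store[self.pointers[index]]

    def read_snapshot(self, snap_id, index):
        return self.store[self.snapshots[snap_id][index]]
```

    In this toy model, one overwrite after a snapshot costs three operations on the copy-on-write volume and one on the redirect-on-write volume, which is the I/O penalty described above. The model leaves out the cons of redirect-on-write, such as space that is never reclaimed and fragmentation of the live data.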

  116. rbateman (14:37:09) :
    Absolutely! Collect data on and of every sub-system we know is involved. Have a thorough search for historical data. I am not disagreeing with you.
    But if you take a hard, cold look at what data we have now relating to the entire atmospheric system…. It’s zilch.
    And there are sub-systems that we know have effects on the total system that we have virtually no data on, and others that we think have effects but we’re not even collecting data on them for a number of reasons.
    Here’s a sentence I deleted from my previous post before I submitted it – “The real shame in all of this is the billions of $$ wasted by so called “scientists” on partial, low quality data when the equipment and systems used to get the data are not only inept for that purpose, but are crumbling around their feet! Not to mention they know the “model” they’re plugging the data into is full of holes.”
    And what do we have for it? Al Gore…. Cap & Trade….. IPCC…… A Draconian EPA….. Thanks a lot, “scientists”. Next time I’ll skip the dance.

  117. An old term a friend told me comes to mind: “garbage in, garbage out”. I just started, 3 weeks ago, looking into all the hype about AGW. I have not believed in AGW since it became an issue 20-some-odd years ago. I will not pretend to know the science behind it, as I am an average person in the US, an auto mechanic by trade. I was taught early on in life that no matter what kind of education you have, COMMON SENSE trumps all, most of the time. Good intentions are all well and good, but if you do not inject good common sense, all you have done, as in this instance, is yell fire in a crowded room. It’s all well and good that they say the world is warming, but please BE VERY SURE!!!! before you tell the world that something needs to be done, that it in fact needs to be done and you know how it should be done. Anything else is just pi$$ing in the wind. This is my first time posting, I hope I did alright. I have many thoughts on this subject and I hope to put them all down little by little. This site is amazing, Anthony, great job!!!! Hopefully, I would like to actually sit with some of you and talk all about this and other subject matter. I find it easier to talk to people than sit here and type, something like Skype, if we can?

  118. Much of the data coming in is such unstructured and incomplete crap that it lends itself to manipulation using QAD code just to try and stitch the data together into something that can even be processed…
    It’s like trying to build a car using a collection of bits from different manufacturers – the end result may look like a car but it won’t work like a car…
    These whole projects attempting to produce global/regional gridded temperature data need taking right back to square-1 and starting over with consistent methodology using only complete quality assured data-sets. Until then it’s just GIGO…
    Maybe they should learn to walk before they can run?

  119. John Galt (09:06:48)
    “Some very good databases are also open-source”
    ===
    Yes, but as we learned via the Climategate emails (confirmed by none other than Phil Jones during his recent testimony), “open-source” is anathema to “climate scientists”!

  120. davidmhoffer (15:28:04) :
    Don’t worry, there’s another round coming. A lot of the fixes were temporary. They took a range like 0 to 18 or 0 to 34 and wrote a little routine to convert JUST that date range to 2000+ instead of 1900+ on the assumption that a) there were no computer records prior to 1960 or so to conflict with, and b) the software would be replaced with new software before the new hard coded fix ran out of runway. Then everyone forgot about the new deadline they created for themselves and went back to day to day emergencies.
    Grace Hopper would chuckle

    Nah, we went whole hog with the long way ’round. If there were ANY chance I’d have to deal with THAT project I’d be looking for another job!
    Of course, what they’re doing to us right now out of DC is just as bad, and may be worse, than Y2K ever dreamed of. If you think CRU is dumb and evil, check out CMS.

  121. Milwaukee Bob (15:55:08) :
    I’d do it for regional purposes. That’s where it’s going to be of most use.

  122. In partial defense of not using professional programmers, and not using professional software techniques: I am guilty of having done that, as it was standard practice for many years in my engineering profession, and many others as well.
    Disclosure: as a chemical engineer beginning in the 1970s, I wrote reams of amateur computer code. It worked – eventually; was tested to my satisfaction for the purpose at hand, and was sometimes left for the next poor fellow to deal with, perhaps years later. This was the norm in many organizations who built and ran the chemical plants, refineries, and many other manufacturing operations.
    The reasons we did this were financial and time constraints, as there were deadlines to get things done and very little staff or budget for it. When we published in our technical journals, we did not publish code, rather we published the mathematics that went into the computer code. This was standard practice for many years. (for examples of publications in the U.S., see Hydrocarbon Processing, Oil & Gas Journal, Chemical Engineering Progress, also Chemical Engineering, all available in most university libraries). As was mentioned in a comment above, it was assumed that those reading the publication would have the skills to do the programming – that was considered a trivial task.
    Then, after sufficient computer code was created internally, a choice arose when a new task was at hand: use somebody’s old code (probably undocumented and poorly written), make it work for your own purposes, or, start from scratch and write your own code.
    Management wanted engineers to go with choice number one, but that created problems for management, as they could not understand why old code did not work the first time and produce valid results for the current problem. Eventually, in some organizations, we resorted to having professional software engineers and managers deal with the legacy computer code. And thus was born the technical group within the IT department. The IT department had the same issues, how to reprogram old code and standardize it. They also brought in version control and some of the other good programming techniques mentioned in comments above.
    We also found it more economically attractive to lease or buy professionally written and maintained software, and an entirely new industry arose: software providers for the chemical engineers. A couple of such companies were Simulations Sciences of Brea, California, also Aspen Technologies, but there were usually a half-dozen or so. Some of our legacy code was run by the commercial software as a plug-in subroutine. That in itself created more than a few problems, though.
    We did not have the fate of the world riding on our software results, but we did have multi-billion dollar processes that could be harmed (or explode) if our code was wrong, and some smaller processes in the hundred million dollar range.
    It would appear that the climate scientists are today somewhere in the state that the chemical engineers were in a couple of decades ago: they could use an IT department and quit doing the programming themselves. This ClimateGate fiasco could also create a competitive commercial software industry, where the climate scientists shove their data in the front, and professionally written and maintained software crunches the data to produce the output.
    Lamentably, this probably will not happen. A major drawback to these types of commercial software is the lack of flexibility, and stifling of creativity in writing one’s own code to produce results.
    My preference is, at the very least, to make sure the software is examined by professionals and brought up to some reassuring standards so that the code is robust and bug-free. This is the minimum for making policies that have the implications proposed by the climate science community. We were able to make do in the early days with our engineer-written, use-it-once code where the worst outcome was we installed a pump or heat exchanger that did not work. The world’s economies did not suffer much, although the engineer who did this might have been out of a job for sloppy work. The stakes for climate models are, of course, far higher. Those who make policy based on the climate science should demand that the data and the computer codes be as up-to-date and modern as possible. No expense should be spared.

  123. Mike (14:26:24) :

    Mike, OUCH, man. You’re way off here.

    Fair question. (1) These low standards did get us to the moon, etc., etc.

    No, they most CERTAINLY did not. NASA had the highest standards available at the time, and if the original computer programming is crappy or not archived, it’s because a large number of PEOPLE were involved, each checking the others’ work. Failure was not an option, and the computer was used as it was supposed to be: as a tool, not as the answer.

    (2) Many climate research groups have arrived at similar results.

    … by using the same contaminated data …

    (3) Most of the code for the data analysis of temps is now available. Small errors that have been found have not substantially changed the results.

    Most, hey? You’re not keeping up.

    The last two points illustrate the robustness of the climate results.

    Well, they would if they were accurate.

    Remember that the tobacco companies put up a fight to keep people smoking. Millions died.

    No, that’s wrong. The tobacco companies put up a fight to remain in business while powerful special interests fought to destroy their LAWFUL business activities. Smoking was, and remains, the decision of the smoker. The anti-smoking lobby long ago crossed the line from idealistically attempting to wean us off of a harmful habit, and are now just outright lying. The parallels are striking, as powerful special interests are fighting to destroy LAWFUL businesses and destroy peoples’ livelihoods on the basis of a disproved hypothesis.

    If we enact C&T schemes now with high caps for now, at least we will have a system in place. I figure it will take a few years to get the “bugs” out of C&T. If the temps go down, and I’d venture there is a 5% chance of that, then we keep the caps high. Not much harm done. If we do nothing, and the temps go up and we keep piling up the CO2, we are going to mess things up big time. No, I don’t think it will be the end of civilization, but major hardships will be imposed.

    Here’s the OUCH. YOU figure about a 5% chance of nothing bad happening? Well, I figure that’s completely baseless, and delusional.
    You figure “not much harm done”? Well, the evidence in Europe and even from this recession say otherwise… that we would literally shut down our first world economies. Not just a slowdown, and not some magical, fairyland of “green jobs and energy”. Not even close. Economic suicide is an understatement. Unlike the 1929 depression, a HUGE percentage of the first world now directly have their savings and investments in the markets, markets that will crash, fail, tumble, tank, and end. It won’t just be stock brokers jumping out of windows.

    We do need to weigh the risks of doing nothing. Some people will go to doctor after doctor until they hear what they want. If the first nine doctors tell you to lose weight, eat better and get more exercise but the tenth one says not to worry, it is tempting to go with the tenth doctor, but this is not wise.

    Um… again, we need the courage to do nothing. This IS a non-issue. There is NOTHING WE CAN DO that will make a lick of difference. I realize you believe otherwise, but your belief is based on fabrications, lies, political spin, and maybe even a bit of hero-worship. Whichever, your belief is wrong.

    If the models are off, it only means the warming will come a few years or decades later. You can’t get around the physics that more CO2 will eventually cause big problems.

    Yes, actually you can. But it’s nice that you used the word “physics” because it made you sound more authoritative.
    When it comes down to it, your entire post has no basis in reality, only a belief system. And that is most of the problem we’re fighting against. Well meaning people have been deliberately misled and used as tools by a few ambitious and unscrupulous people. It’s shameful, really.

  124. Re: Mike (Mar 4 14:26),
    If we enact C&T schemes now with high caps for now, at least we will have a system in place. I figure it will take a few years to get the “bugs” out of C&T. If the temps go down, and I’d venture there is a 5% chance of that,
    then we keep the caps high. Not much harm done. If we do nothing, and the temps go up and we keep piling up the CO2, we are going to mess things up big time. No, I don’t think it will be the end of civilization, but major hardships will be imposed.
    Bold mine.
    This is a very naive statement. You are saying : take civilization back to 19th century levels , and not much harm will be done!!, if the pyramid scheme worked of course in reducing the alleged culprit, CO2.
    Already millions starved to death in the Third World with the ethanol fiasco, because the price of corn went artificially up. In Haiti they were eating mud pies before the quake because of this. And you have the hubris to say not much harm will be done. I guess as you and yours survive the pyramid.
    There is the law of unexpected consequences that the naive do not know and the sharks of this world know and expect.

  125. Re: Roger Sowell (Mar 4 19:27),
    It seems that the conflict is between professional programming versus creative programming by researchers.
    Researchers use programming as a tool. Research grants are usually limited and most of the job is done by graduate students who are fired by the enthusiasm of the subject they have chosen to research. Creativity is nurtured, and creativity is opposed to regimentation.
    From your post I see that it was the same in the first years of using computers in industrial situations. Maybe it was a bleed-through from the academic methods.
    In particle physics, professional programming is done in products that are used for the programs for research. Monte Carlo programs, thousands of mathematical functions, there is a CERN program library and it is expected to have professional programming standards.
    The programming done in research situations is the problem solving type: these are the data, I need to analyze them using computer programs and statistics as tools, to display the trends and further to see if a current hypothesis is correct.
    This is ad hoc, and it is tested by other graduate students working in other experiments and either agreeing or disputing the conclusions.
    No decisions of world wide nature hang on these studies.
    The problem with climate “science” is that it follows the research pattern while claiming industrial level outputs for political decision making. And that decision may be one that leads to the destruction of the western world as we know it and the death of billions in the third world.

  126. Mike:
    If the temps go down [in a few years], and I’d venture there is a 5% chance of that, ….

    There are bets you can make on how warm future years from 2011 up until 2019 will be (based on GISStemp’s online figures), at the well-known, Dublin-based event prediction site https://www.intrade.com (Click on Markets → Climate & Weather → Global Temperature).
    For instance, you can bet on whether 2019 will be warmer than 2009. There’s an offer there currently to “sell” 100 lots at 90, which says that there’s a 10% chance of No Warming. So you can get odds that are twice as attractive as your 5% estimate of No Warming.
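
    The arithmetic behind those odds can be sketched in a few lines of Python. This assumes a contract that settles at 100 points if the event happens and 0 otherwise, and it ignores fees and the time value of money; the function names are hypothetical:

```python
def implied_probability(price: float) -> float:
    """Probability implied by a price on a contract that pays 0 or 100."""
    return price / 100.0


def expected_profit_per_contract(believed_probability: float, price: float) -> float:
    """Expected gain, in points, from buying one contract at `price`."""
    return believed_probability * 100.0 - price


price = 90.0  # the "sell at 90" offer on "2019 warmer than 2009"
no_warming = 1.0 - implied_probability(price)
print(f"Implied chance of no warming: {no_warming:.0%}")

# Someone who puts the chance of no warming at 5% believes warming is 95% likely.
edge = expected_profit_per_contract(0.95, price)
print(f"Expected profit per contract: {edge:.1f} points")
```

    On these assumptions, anyone who really held the 5% estimate would expect to gain about 5 points on every contract bought at 90.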

  127. PS: If you (Mike) similarly think that there’s only a 5% chance that 2014 won’t be among the five warmest years ever — and surely a cap-and-trade warmist would agree that the odds of that happening are remote, given the long-term uptrend and the recent hiatus in that trend that ought to be “made up” — then another bargain is available, because 10% odds are being offered on that proposition as well. (I.e., there’s a Sell offer at 90.)

  128. I am reminded of that old saw..
    “People who live in glass houses should not throw stones”
    Seems that Professor Hans von Storch, Dr. Myles R. Allen et al still have a lot to learn. Humility would be a good start.

    I run a very small property services company. There are only a few tens of thousands of dollars at stake with each transaction, yet there is no way my public liability insurer would stand for me writing code myself to handle management of transactions, nor me doing my own property valuations and not using an independent licensed valuer, despite the fact that I have 20 years’ experience in processing these transactions and assessing valuation documents. I would be sued to bankruptcy. Yet with trillions of public and private money at stake these so-called UEA academics and their supporters think it entirely appropriate they should bodge up their own code and statistical methods, and that criticism from professional programming engineers and statisticians is out of order. These guys are truly out of their minds.

  130. anna v (21:28:50) Well-said.
    Mike: re your conviction that CO2 is causing global warming, there are at least two places you can put your money to make a fortune on Global Warming. These are stock index funds GWO, and PBW. You might be interested to know, however, that the price for each has taken a serious tumble in the past two years – far more than the S&P 500 index. It would appear that the profit-seeking investors of the world have missed a big opportunity, leaving it wide open for you. Good luck with that.
    http://sowellslawblog.blogspot.com/2010/02/is-global-warming-good-investment.html

  131. I’ve examined the code for a bunch of NOAA models. It was uniformly shabby and contained a number of obvious deficiencies peculiar to handling floating point math with digital computers. No methodical rounding control to contain generational error creep, no provision for detection/warning of loss of precision, etc. It appeared to have been written by someone with very limited knowledge of computational math techniques.
    Doing loosey goosey floating point math when huge error margins of 20-50% are intentionally designed in (ex. structural engineering) and small errors don’t matter is one thing. Doing it when tiny fractions are divined to mean something significant is quite another.
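
    A small illustration of the error creep being described, in Python. It is a generic sketch and does not reproduce any NOAA code; `naive_sum` is a hypothetical helper, while `math.fsum` is the standard library's error-compensated summation:

```python
import math


def naive_sum(values):
    """Add values left to right, letting rounding error accumulate."""
    total = 0.0
    for v in values:
        total += v
    return total


readings = [0.1] * 10  # ten values that should add up to exactly 1.0

print(naive_sum(readings))         # 0.9999999999999999
print(math.fsum(readings))         # 1.0
print(naive_sum(readings) == 1.0)  # False
```

    The discrepancy is tiny here, but it grows with the number of operations, which matters when the signal being sought is itself a small fraction.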

  132. CodeTech (20:30:44) wrote: “You [Mike] figure “not much harm done”? [By C&T] Well, the evidence in Europe and even from this recession say otherwise… that we would literally shut down our first world economies. Not just a slowdown, and not some magical, fairyland of “green jobs and energy”. Not even close. Economic suicide is an understatement. Unlike the 1929 depression, a HUGE percentage of the first world now directly have their savings and investments in the markets, markets that will crash, fail, tumble, tank, and end. It won’t just be stock brokers jumping out of windows.”
    Now who is being alarmist? C&T or a carbon tax that started low (or a high cap), which is what I’ve seen advocated, would do little to the economy. If then the majority of climatologists are right, we can raise the tax or lower the cap to limit the damage caused by climate change. In time C&T or a CT would cause some economic drag but nothing like what you are saying. Read the CBO report.
    http://www.eenews.net/public/25/11455/features/documents/2009/06/22/document_daily_01.pdf
    http://www.cbo.gov/ftpdocs/91xx/doc9134/04-24-Cap_Trade_Testimony.1.1.shtml
    While there is no free lunch your alarmism has no basis in known economic principles.
    anna v (21:02:35) wrote: “This [My statement regarding C&T] is a very naive statement. You are saying : take civilization back to 19th century levels , and not much harm will be done!!, if the pyramid scheme worked of course in reducing the alleged culprit, CO2. ”
    More economic alarmism. I can’t even make sense out of your sentence.
    anna v (21:02:35) wrote: “Already millions starved to death in the Third World with the ethanol fiasco, because the price of corn went artificially up. In Haiti they were eating mud pies before the quake because of this. And you have the hubris to say not much harm will be done. I guess as you and yours survive the pyramid.”
    Millions did not starve. There were food shortages that some attributed to the shift toward ethanol production. And I agree, pushing ethanol is not backed by the science but by corporate and political interests. Further research may change this. But, we do need to watch where the money is coming from. There is influence peddling all around. But, the big bucks are from the oil and coal companies that have funded a lot of the conservative think tanks that push denial and do nothingism. The “green” corporate interests are far weaker.
    Environmental alarmists do exist in the media, but not so much among the actual climate scientists. But be just as wary of the economic alarmists.

    As a climate modeler I am not surprised that the code did not adhere to professional coding standards and didn’t have “error checks” or “was poorly documented”. The problem of course is that the CRU data were processed by climate scientists who know how to code but are not professional programmers who have to adhere to strict coding standards. Oftentimes modelers write brute-force code in order to process data. It is not surprising that this programmer did not like what he saw in a climate scientist’s code.
    I am not defending poor practice but just saying this isn’t as uncommon in the climate community as you think.

  134. rbateman (08:45:48) : Now I have to wonder if there is software at USHCN or GHCN that stomps over real data and punches artificial holes in it where no holes should be. I’d like to get with you, E.M.Smith, if you are about. I have some very interesting anomalies in the data that you’d be good at figuring out the pattern to.
    Well, I’m “about”. Though frankly, with the volume of postings here lately, I can’t follow all the comments in all the threads. I’m “skipping around” more than usual. So it would be better to talk over on chiefio.wordpress.com or in my email (encoded in words in the “about” tab over there…)
    And yes, I’ve noticed a few holes that didn’t make sense too. The most blatant is Bolivia where they have had CLIMAT reports at least since 2007 and I’ve found other Met Depts with the Bolivia records, yet GHCN does not have them…

  135. Gregg E: Ref Ariane 5 failure report is here: http://sunnyday.mit.edu/accidents/Ariane5accidentreport.html
    DRA/DERA Malvern were involved and were part of the UK MoD’s software and safety assessment group. I was part of that organisation until it split to QinetiQ and DSTL and which still supplies software system assessment, audit etc. of very complex systems. The ISA roles. So the UK Gov (OGD) has and always had access to independent assessment on most any technical topic... trouble is OGD in particular hardly ever used it. Hence the mass of IT/Software system calamities we have seen over the years relating to Gov Procurement. It looks as if it’s all happening again, only this time at the Planet level! I have always considered AGW a likely lame duck simply because of the lack of published project control information... never mind the data.

  136. @ Mike (14:26:24) :
    Others have pointed out most of the many flaws, but I think you demand a proper fisking

    Fair question. (1) These low standards did get us to the moon, etc., etc.
    (2) Many climate research groups have arrived at similar results.
    (3) Most of the code for the data analysis of temps is now available. Small errors that have been found have not substantially changed the results.
    The last two points illustrate the robustness of the climate results.

    1. Not true. And a bit of a straw man. The entire basis of the AGW demand for world shaping policy changes is statistical analysis of adjusted data using computer software. If the people doing this cannot show their work, it is a major problem. This is not like a reproducible chemistry experiment. And while I would not use actually getting to the moon as the primary test case for NASA software, it does sort of validate. Which is why some have said wait 30 years (or another 15 years or whatever) to have a reasonable period for identifying the validity of the models.
    2. Using the same compromised ADJUSTED data, with the same selection bias and most importantly the same adjustments which are what actually introduce the warming.
    3. The data analysis is based on adjusting actual historical temperatures, interpolating multiple stations in specific ways to establish grids (which may or may not be defensible depending on the details).
    The last two points illustrate nothing about robustness. They illustrate a real problem with basing different analyses on the same adjusted data and a potential problem with

    Small note: the programs at CRU are for data analysis, not climate modeling. The hockey stick is a data set of past temps.

    The hockey stick is most certainly NOT “a data set of past temps”. It is an amalgamation of multiple different unrelated proxies (ice cores, tree rings etc.) stitched together in sequence, plus some of the same adjusted temperature records referenced above, minus some inconvenient proxy data (tree rings from 1960 to 1990) that does not agree with their pre-conceptions or their models, and ought to undermine their “calibration” of the proxies (rather than “hide the decline”).

    Here is another way to look at it. Suppose we went back to the early medical work showing the link between tobacco and cancer. It probably would not meet these new standards. Did they save all the data? I doubt it. Are all the statistics programs that were used available? I can’t imagine it. Yet, it would be foolish to run out and start smoking.

    This is a flawed analogy. The equivalent would be if those scientists had thrown out the original data and not documented their adjustments but demanded that we take it on faith. Or if they re-ran an analysis of some early medical work showing the link between tobacco and cancer and adjusted the data to show that fewer people actually died, or that people who died had died of other causes, or that some of them were not actually dead (just sleeping very soundly), or were not actually people, and then concluded that tobacco had no link to cancer deaths.
    And then demanded legislation that everyone had to subsidize smokers, or that all workplaces had to allow chainsmoking at your desk.
    That would be foolish indeed

    Remember that the tobacco companies put up a fight to keep people smoking. Millions died.

    This would only be irrelevant, except it also carries the implication of this bizarre shibboleth of the AGW crowd that the only reason one would question their pronouncements is being funded by the deep pockets of the big commercial interests. Just stupid.

    If we enact C&T schemes now with high caps for now, at least we will have a system in place. I figure it will take a few years to get the “bugs” out of C&T. If the temps go down, and I’d venture there is a 5% chance of that,
    then we keep the caps high. Not much harm done. If we do nothing, and the temps go up and we keep piling up the CO2, we are going to mess things up big time. No, I don’t think it will be the end of civilization, but major hardships will be imposed.

    The cap and trade system is what we should be concerned about, as it is what the big business interests actually want. From Enron to Goldman Sachs, these derivative markets have been promoted by the same scum who have done more to subtract value, transfer wealth to the rich and game the entire system.

    We do need to weigh the risks of doing nothing. Some people will go to doctor after doctor until they hear what they want. If the first nine doctors tell you to lose weight, eat better and get more exercise but the tenth one says not to worry, it is tempting to go with the tenth doctor, but this is not wise.

    This could be a valid scenario. Or it could be completely off point. Or if in the early 1990s someone with chronic ulcers went to nine doctors in a row who all trotted out the conventional wisdom that was accepted but not questioned within the profession and so they all prescribed the same dietary and pharmacological prophylaxis or treatment. Which did not work. Then he went to a tenth doctor who said “We don’t really have foundation for these protocols but they clearly have not worked for you. However some Australians think that a plurality, perhaps a majority of chronic ulcers are caused by H pylori infection, so let’s try some antibiotics which while unproven for this use case have well known and limited potential negatives”

    If the models are off, it only means the warming will come a few years or decades later. You can’t get around the physics that more CO2 will eventually cause big problems.

    Removing the adjustments from the actual data and/or questioning the validity or robustness of some of the proxies used makes the warming go away. The models could be directly opposite of reality.
    And a simplistic reading of the physics or focus on CO2 mitigation as the sole policy amelioration underestimates the potential effect of other greenhouse gases.
    It is entirely possible, maybe even likely that human activities have caused/will cause climate changes. Some of these changes might be warming. Some of this might be from CO2. But for the love of God, can we acknowledge that these are not logical dominoes that lead to one conclusion.
    Undocumented assertions from liars who appear to be lacking expertise in statistics, physics, geology, computer science, meteorology or any discipline necessary to actually execute on the promise of this pretend discipline of “climate science” is no reason to introduce panicky changes that may not do anything useful, but will cause economic hardship while transferring wealth to the same derivative traders who made such a great accomplishment of credit default swaps, securitized junk mortgage pools and then took trillions in public funds as dessert.
    So forgive anyone who asks, “Ummmm… can you show us your work?”
