Those of us who have looked at GISS and CRU code have been saying this for months. Now John Graham-Cumming has submitted a statement to the UK Parliament about the quality and veracity of the CRU code that has been released, saying “they have not released everything”.

I found this line most interesting:
“I have never been a climate change skeptic and until the release of emails from UEA/CRU I had paid little attention to the science surrounding it.”
Here is his statement as can be seen at:
http://www.publications.parliament.uk/pa/cm200910/cmselect/cmsctech/memo/climatedata/uc5502.htm
=================================
Memorandum submitted by John Graham-Cumming (CRU 55)
I am writing at this late juncture regarding this matter because I have now seen that two separate pieces of written evidence to your committee mention me (without using my name) and I feel it is appropriate to provide you with some further information. I am a professional computer programmer who started programming almost 30 years ago. I have a BA in Mathematics and Computation from Oxford University and a DPhil in Computer Security also from Oxford. My entire career has been spent in computer software in the UK, US and France.
I am also a frequent blogger on science topics (my blog was recently named by The Times as one of its top 30 science blogs). Shortly after the release of emails from UEA/CRU I looked at them out of curiosity and found that there was a large amount of software along with the messages. Looking at the software itself I was surprised to see that it was of poor quality. This resulted in my appearance on BBC Newsnight criticizing the quality of the UEA/CRU code in early December 2009 (see http://news.bbc.co.uk/1/hi/programmes/newsnight/8395514.stm).
That appearance and subsequent errors I have found in both the data provided by the Met Office and the code used to process that data are referenced in two submissions. I had not previously planned to submit anything to your committee, as I felt that I had nothing relevant to say, but the two submissions which reference me warrant some clarification directly from me, the source.
I have never been a climate change skeptic and until the release of emails from UEA/CRU I had paid little attention to the science surrounding it.
In the written submission by Professor Hans von Storch and Dr. Myles R. Allen there are three paragraphs that concern me:
“3.1 An allegation aired on BBC’s “Newsnight” that software used in the production of this dataset was unreliable. It emerged on investigation that neither of the two pieces of software produced in support of this allegation was anything to do with the HadCRUT instrumental temperature record. Newsnight have declined to answer the question of whether they were aware of this at the time their allegations were made.
3.2 A problem identified by an amateur computer analyst with estimates of average climate (not climate trends) affecting less than 1% of the HadCRUT data, mostly in Australasia, and some station identifiers being incorrect. These, it appears, were genuine issues with some of the input data (not analysis software) of HadCRUT which have been acknowledged by the Met Office and corrected. They do not affect trends estimated from the data, and hence have no bearing on conclusions regarding the detection and attribution of external influence on climate.
4. It is possible, of course, that further scrutiny will reveal more serious problems, but given the intensity of the scrutiny to date, we do not think this is particularly likely. The close correspondence between the HadCRUT data and the other two internationally recognised surface temperature datasets suggests that key conclusions, such as the unequivocal warming over the past century, are not sensitive to the analysis procedure.”
I am the ‘computer analyst’ mentioned in 3.2 who found the errors mentioned. I am also the person mentioned in 3.1 who looked at the code on Newsnight.
In paragraph 4 the authors write “It is possible, of course, that further scrutiny will reveal more serious problems, but given the intensity of the scrutiny to date, we do not think this is particularly likely.” This has turned out to be incorrect. On February 7, 2010 I emailed the Met Office to tell them that I believed that I had found a wide ranging problem in the data (and by extension the code used to generate the data) concerning error estimates surrounding the global warming trend. On February 24, 2010 the Met Office confirmed via their press office to Newsnight that I had found a genuine problem with the generation of ‘station errors’ (part of the global warming error estimate).
In the written submission by Sir Edward Acton there are two paragraphs that concern the things I have looked at:
“3.4.7 CRU has been accused of the effective, if not deliberate, falsification of findings through deployment of “substandard” computer programs and documentation. But the criticized computer programs were not used to produce CRUTEM3 data, nor were they written for third-party users. They were written for/by researchers who understand their limitations and who inspect intermediate results to identify and solve errors.
3.4.8 The different computer program used to produce the CRUTEM3 dataset has now been released by the MOHC with the support of CRU.”
My points:
1. Although the code I criticized on Newsnight was not the CRUTEM3 code the fact that the other code written at CRU was of low standard is relevant. My point on Newsnight was that it appeared that the organization writing the code did not adhere to standards one might find in professional software engineering. The code had easily identified bugs, no visible test mechanism, was not apparently under version control and was poorly documented. It would not be surprising to find that other code written at the same organization was of similar quality. And given that I subsequently found a bug in the actual CRUTEM3 code only reinforces my opinion.
2. I would urge the committee to look into whether statement 3.4.8 is accurate. The Met Office has released code for calculating CRUTEM3 but they have not released everything (for example, they have not released the code for ‘station errors’ in which I identified a wide-ranging bug, or the code for generating the error range based on the station coverage), and when they released the code they did not indicate that it was the program normally used for CRUTEM3 (as implied by 3.4.8) but stated “[the code] takes the station data files and makes gridded fields in the same way as used in CRUTEM3.” Whether 3.4.8 is accurate or not probably rests on the interpretation of “in the same way as”. My reading is that this implies that the released code is not the actual code used for CRUTEM3. It would be worrying to discover that 3.4.8 is inaccurate, but I believe it should be clarified.
I rest at your disposition for further information, or to appear personally if necessary.
John Graham-Cumming
March 2010
Ralph Woods (09:04:11) : “Since Climate Science (one of the youngest branches of science) is all about projecting the future trends for the Earth’s Climate- computer modeling is their primary tool.”
And all this time I thought it was Al Gore.
CodeTech (08:42:51) :
“So… to be a “climatologist” one requires knowledge of:
1. Computer programming
2. Chemistry
3. Physics
4. Statistics
However, failure to have knowledge of or skills in some or any of these disciplines is easily overlooked if you have a “green” attitude and mindset, and are capable of writing suitably alarming predictions. Also the ability to procure grants and scare children is a genuine asset.”
Sounds like they need to have an inter-faith meeting.
I think your point proves quite well that the possibility of one becoming a Climatologist in any real-world sense is slim to none. Climatology should be stricken and fall under the umbrella of theoretical physics. An open and interdisciplinary approach makes the most sense. With a widened pool of participants it may be possible to achieve greater transparency in the process, so long as it is not stacked with eco-ideologues.
PS: He has scored a British version of “the perfect squelch.”
The climate scientists just don’t get why they should follow standard software development practices. Does anyone remember the “On Replication” post at RC from about a year ago where Gavin made this telling statement at comment 89:
“My working directories are always a mess – full of dead ends, things that turned out to be irrelevant or that never made it into the paper, or are part of further ongoing projects. Some elements (such a one line unix processing) aren’t written down anywhere. Extracting exactly the part that corresponds to a single paper and documenting it so that it is clear what your conventions are (often unstated) is non-trivial. – gavin]”
http://www.realclimate.org/index.php/archives/2009/02/on-replication/
So how is anyone else (or even Gavin) supposed to replicate the work? We’re supposed to “trust” whatever it is their paper said, and that maybe somewhere on their computer is the code and data used to produce the result?
I was already leaning to the skeptic side, but this discussion at RC is what pushed me off the fence.
What’s even more disturbing is that using documentation, source control, etc. would actually make their lives easier in the long run.
Fundamentally, if you’re going to do it right, you should start with a set of goals. So, for instance, if we collectively (since most of the commenters on this board are better able to handle this than, say, the Phil Joneses of the world) were going to set out the design requirements for a system for storing temperature data, we’d need to work through reliability and availability, security and auditability, relationships between datasets and individual data points and their annotations and ‘confidence factor’ ranges, etc. Changes are always made to new instances; the relationships to the previous instance and to any outside sources are logged; all changes are annotated with the adjustment formulae and algorithms; and any such ‘adjusted’ instances must be released for public scrutiny before any other references can be made to the numbers.
It’s not a difficult problem but it requires a different mindset from the ‘this is my sandbox, go away’ climate-science attitude.
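To make that concrete, here is a minimal Python sketch of the sort of append-only, annotated store described above. Every name and number is hypothetical; it is an illustration of the mindset, not any agency’s actual system.

import hashlib, json, datetime

class TemperatureStore:
    """Toy append-only store: every adjustment creates a new version,
    linked to its parent, with the adjustment formula recorded."""
    def __init__(self):
        self.versions = {}   # version_id -> record

    def add_version(self, data, parent_id=None, note="", formula=None):
        payload = json.dumps(data, sort_keys=True).encode()
        version_id = hashlib.sha256(payload).hexdigest()[:12]
        self.versions[version_id] = {
            "data": data,          # e.g. {station_id: [(date, temp_c), ...]}
            "parent": parent_id,   # previous instance; never overwritten
            "note": note,          # why the change was made
            "formula": formula,    # the adjustment algorithm, in plain text
            "created": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        }
        return version_id

store = TemperatureStore()
raw_id = store.add_version({"012345": [("1961-01-01", 12.3)]}, note="raw station file")
adj_id = store.add_version({"012345": [("1961-01-01", 12.1)]},
                           parent_id=raw_id,
                           note="site move correction (made up)",
                           formula="subtract 0.2 C before 1962 (example only)")

The point is simply that nothing is ever overwritten: the raw instance stays put, every adjustment points back to its parent, and the formula travels with the change.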
George Ellis:
I would write that as:
x = 0x05555;
x = 0x0aaaa;
x = 0;
… bonus points if you know why… 😉
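My guess at the why, and it is only a guess: 0x5555 and 0xAAAA are the complementary alternating-bit patterns (0101… and 1010…), so writing one and then the other drives every bit both high and low, the classic quick check for stuck bits before zeroing. In Python terms:

# 0x5555 and 0xAAAA viewed as 16-bit patterns
print(format(0x5555, "016b"))        # 0101010101010101
print(format(0xAAAA, "016b"))        # 1010101010101010
assert 0x5555 ^ 0xAAAA == 0xFFFF     # between them, every bit gets exercised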
Milwaukee Bob (11:24:34) :
I’ll agree that the focus of our attention should be on the data, not the code that mangled it. My take from HARRY_READ_ME is that he found the programs were stomping data, overwriting files instead of making new versions, and not doing what they were documented to do.
The buggy, mis-documented code can go straight to the trash can for all that it’s worth. What we should be concerned with is accounting for the missing station data, and I firmly believe there is much to be had.
I wrote the software for analyzing data obtained from the BATSE experiment (on the Compton Gamma Ray Observatory) in the late ’80s.
Basically I wrote a program which would allow scientists to run their own models within it (for you techie-types, this was done under VMS using Fortran).
The scientists working on the project had no problem sharing data and models with each other. However, they got miffed at a certain University (which I’m leaving unnamed), because some of their programmers hacked into our computers and stole some of the data – which allowed their researchers to publish first.
The scientists I was working with were unhappy, because they felt, since they had legitimately been involved in the project from the beginning, they should have been given 6 months exclusive use of the data.
They felt that would give them sufficient time to publish their hypotheses, at which time, they would release the data too.
Of course, these were mere astrophysicists and theoretical physicists. What would they know about standard scientific practice?
Re: max (Mar 4 10:30),
Mike @9:27,
depends on what physics (or other) you are doing. When it is practical to reproduce the experiment it is expected that people wishing to reproduce the work will do so, and it serves as a good check against experimental error, but information about how the experiment was conducted must be provided so that the experiment can be replicated.
True.
When it is impractical to reproduce the experiment (a need to book the CERN supercollider to run the experiment, for example) it is expected that the raw data of the experiment it is based upon will be made available to those wishing to attempt to reproduce your results.
No. The raw data are not shared with all and sundry in the accelerator experiments. The groups have rights of publication. Once the data are archived and the experiment is closed, they are open for sharing, and even then there are caveats.
Replication is done by having more than one experiment at a time. At the LHC, ATLAS and CMS are competing experiments studying the same physics independently.
One reason is proprietary: it takes ten years of preparation by hundreds of people to set up the experiment and take the data. You would not find people willing to do that if the first theorist who came along with an FOI request got his/her hands on the data before publication by the group.
The second is the complexity. Each experiment develops its own computer codes (not well documented), corrections, etc., that an outsider would have to spend years to do all over again, given the raw data. That is why at least two experiments are necessary.
When the “experiment” starts with large and multiple data sets and consists of drawing conclusions from the results of manipulating some of those data sets (climate science), it is incumbent upon the “experimenter” to identify the data sets used and the manipulations to them so that the “experiment” can be replicated.
In disciplines where the data is unique, yes. One cannot go back and remeasure the temperatures. On the other hand, one could argue that all one needs is the temperature data and metadata to create completely independent code to study whatever. In principle, with the different groups handling the climate data, one should have been safe, except there was too much inbreeding and no independent replication. They were being orchestrated.
JG-C: “For a somewhat technical post on this including the answer to my question given by Professor Jones and Professor Slingo see: http://www.jgc.org/blog/2010/02/something-bit-confusing-from-ueacru.html”
Thanks for the link. As we post here, Harry II through Harry XII, somewhere deep in the bowels of UEA, are frantically trying to create back-dated code that will crank out pseudoCRUTEM3 results from the abysmal database mentioned by Harry I. This is like trying to spin offal into gold. In Fortran, no less. Imagine the suspense, the hideous shrieks from the dungeon, the cabin fever…. Life must be Hell at UEA.
“A miracle has happened.”
In some cases yes. In some chemical processes you need to use special glassware that does not contain boron for example.
The direct analogy in this case would be if the researcher says they used a well known commercial software package version xyz release #2, build abc345, then you could obtain the exact same build and release and implement his computation algorithm and see if you get comparable results.
If he says, I just slapped together a statistical processing routine that does such and such — well then I would want to see the exact code he used if I got different results when I used a well-known statistical routine to do the same stated processing.
If asked they should at least be able to point the duplicating researcher to a source for the source code, like “implemented a binary search as described in ref 3 page 26, code block 85”.
If you then use the same binary search function he references and it does not work, you at least have a starting point to figure out what is wrong. Either the referenced code is wrong and you both properly implemented the defective code, or the referenced code is good but one or both of you improperly implemented it.
It is entirely likely that one of you has a working block of code with a typo in it that compiles successfully and works fine on trivial problems, but blows up at some extreme condition.
When you get to that point, there is no other option than to do a line-by-line verification of the actual code used, as no other method will point out where the difference in behavior occurs that makes the original researcher’s results mismatch the duplicating researcher’s.
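To illustrate the sort of thing I mean (this is not anyone’s actual code, just a minimal Python sketch): a binary search with a one-character slip that runs fine on a casual test, yet quietly fails at an edge.

def find(sorted_vals, target):
    lo, hi = 0, len(sorted_vals) - 1
    while lo < hi:                       # the 'typo': should be lo <= hi
        mid = (lo + hi) // 2
        if sorted_vals[mid] == target:
            return mid
        elif sorted_vals[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

print(find([1, 3, 5, 7], 5))   # prints 2: a casual test looks fine
print(find([1, 3, 5, 7], 7))   # prints -1: silently misses the last element

Only a line-by-line comparison against a correct implementation would show where the behavior diverges.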
In the case of the chemistry example you used, in many cases common processes are so trivial that any commercial glassware will serve to perform the test, but in some cases a particular brand of glassware might contaminate the process. Likewise something as obscure as how the glassware was cleaned and dried might influence the outcome. In those cases you get down to verifying very trivial details of the process until you find the clinker that breaks the process.
With undocumented ad-hoc code there is literally no way to know if the code does what the researcher says it does in all cases. Whereas with well-documented, professional-class code, you at least have a reasonable expectation that, like the glassware, your experimental setup is sufficiently similar to the researcher’s experimental setup that it is highly unlikely the small differences between them will adversely affect the outcome of the experiment.
Larry
Re: CO2 Realist (Mar 4 12:39),
So how is anyone else (or even Gavin) supposed to replicate the work? We’re supposed to “trust” whatever it is their paper said, and that maybe somewhere on their computer is the code and data used to produce the result?
Think of it as an exam. The professor gives a problem. Each student solves it by his/her method using the information provided with the problem. The results have to agree, not the method of solving the problem.
In this sense, given the data and metadata, another researcher should be able to say whether the temperature is increasing or not. It is the data that is important. The method becomes important when there is disagreement in the results. In this case a third (or fourth, etc.) independent analysis would clarify the issue, as well as finding the error in the method of one of the two originals who disagreed, and would be less of a hassle. It is the “independent” that is important. By subverting the peer review process, independent analysis was lost.
My point on Newsnight was that it appeared that the organization writing the code did not adhere to standards one might find in professional software engineering. The code had easily identified bugs, no visible test mechanism, was not apparently under version control and was poorly documented. It would not be surprising to find that other code written at the same organization was of similar quality. And given that I subsequently found a bug in the actual CRUTEM3 code only reinforces my opinion.
If you swapped CRUTEM3 out and GIStemp in, this is exactly how I would evaluate GIStemp (or, rather, have evaluated it). I found one bug in the USHCN F-to-C conversion, a compiler-dependent order-of-execution issue that can warm 1/10 of the records by 1/10 C. Don’t know if it was fixed in the most recent release. There is no SCCS used (source code sits in the same directory where scratch files are written…) and there is no visible test mechanism.
It seems to be a systemic style failure among climate ‘researchers’…
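Not the GIStemp code itself and not the actual bug, but a minimal Python sketch of the general class of problem being described: with temperatures held as integer tenths of a degree and a truncating divide, the order in which the multiply and the divide are applied can shift a converted record by a tenth of a degree (in Fortran or C the analogous issue can hinge on how the compiler evaluates a mixed-type expression).

def f_to_c_tenths_a(f_tenths):
    # multiply before the truncating divide
    return (f_tenths - 320) * 5 // 9

def f_to_c_tenths_b(f_tenths):
    # truncating divide before the multiply
    return (f_tenths - 320) // 9 * 5

print(f_to_c_tenths_a(322))   # 1, i.e. 0.1 C
print(f_to_c_tenths_b(322))   # 0, i.e. 0.0 C, a tenth of a degree cooler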
Thank you, John, for shining a bit more light on the truth behind the cargo-cult science practised by the CRU. Please keep at it and keep us in the loop here at WUWT as things progress.
This is not unexpected. I think it was about 18 months ago that I wrote on CA that, although I had not seen all the code, the information leaking out to SteveMc at that time indicated a complete lack of industry-level software engineering principles in the climate science arena, and in particular at NOAA/NASA.
Why did I say it? I was a software engineering trainer in a large, internationally recognised research company and a senior project manager of system, data, and people projects. Nothing I read or saw indicated a formal approach to software development in climate science.
Milwaukee Bob (11:24:34) : … You have AVERAGED, partial (weather!) data, haphazardly recorded, GENERALLY ONCE A DAY, at a minuscule number of points that varied over time within an ANALOG, CUBIC system (global ocean/atmosphere), with no controlled standardization of either equipment or placement, with said equipment never having been designed for the intended purpose and virtually no understanding of the system as a whole to start with,
BINGO! You got it. It’s a pile of Mulligan Stew and we’re supposed to swallow…
davidmhoffer (12:00:00) : Any IT shop should be able to reproduce the exact state of the data on any given day. In this context, “data” means EVERYTHING. Datasets, code bases, e-mails, everything that existed on THAT day. … The only thing that should escape these best practices is a file etc that was created and destroyed on the same day. Everything else should be there.
With modern journalling file systems, you can even capture moment to moment changes. The Network Appliance has a kind of version control built in to it. This is a 10 minute kind of thing to set up.
FWIW, the standard when I was making commercial software for release was a “golden master” day. We went through “soft freeze” then “hard freeze” where it was almost impossible to put more changes in (only lethal bugs fixed) and finally, after QA suite passed it, we made the “Golden Master”. Those were archived forever and they were the only thing used for production.
Compare that with GIStemp where each time it is run, it recompiles the FORTRAN (using whatever compiler is in the given user’s environment variable…) and then deletes the binaries at the end. It leaves the source code in the scratch file directory and it encourages you to hand edit things if you want different behaviours. Further, the data changes moment to moment over the month and there is no “release date” on the file. You just hop onto the NCDC web site and download an ‘image du jour’. And it is not just adding new records; it is changing old records from the past, up to several months and perhaps years ago.
http://chiefio.wordpress.com/2010/02/15/thermometer-zombie-walk/
This instability was one of the first things I fixed in the version of GIStemp I got running. I added a source code directory and trivial “make” file.
http://chiefio.wordpress.com/2009/07/29/gistemp-a-cleaner-approach/
So it is literally the case that each time any user runs GIStemp, they are running a slightly different product and may be doing it on somewhat different data…
That is just sooo broken a design…
The whole thing is just so, so, “ersatz”.
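One small thing a user can do in the meantime, since the downloaded file carries no release date: record a hash and a timestamp for whichever ‘image du jour’ a given run actually consumed. A minimal Python sketch (file and log names are hypothetical):

import hashlib, datetime

def snapshot(path):
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
    # Append to a log so every run records exactly which input it saw
    with open("input_manifest.log", "a") as log:
        log.write(f"{stamp}  {digest}  {path}\n")
    return digest

snapshot("ghcn_v2.mean")   # hypothetical downloaded input file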
Dr Graham-Cumming
Your footnote in history may be small, but it is somewhat akin to Fleming noticing that his hugely old bacterial plates, which in a properly run lab would not still exist, showed ‘interesting features’…
mpaul (09:48:33) :
“Professor Slingo: … We test the code twice a day every day. We also share our code with the academic sector, so the model that we use for our climate prediction work and our weather forecasts, the unified model, is given out to academic institutions around the UK, and increasingly we licence it to several international met services”
Nail on head comes to mind. She has absolutely no idea what software testing means. She thinks it means running the program and looking at the result(s) and if it looks alright then it passes the test.
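For contrast, a minimal sketch of what testing means in the software-engineering sense: a fixed input, a pre-computed expected answer, and an automated pass/fail check that nobody has to eyeball. The routine and numbers below are hypothetical and purely illustrative.

import unittest

def monthly_anomaly(temps_c, baseline_c):
    # Hypothetical example routine: anomaly of a monthly mean vs. a baseline
    return sum(temps_c) / len(temps_c) - baseline_c

class TestMonthlyAnomaly(unittest.TestCase):
    def test_known_answer(self):
        # Fixed input with a hand-computed expected result
        self.assertAlmostEqual(monthly_anomaly([14.0, 15.0, 16.0], 14.5), 0.5)

    def test_empty_input_rejected(self):
        with self.assertRaises(ZeroDivisionError):
            monthly_anomaly([], 14.5)

if __name__ == "__main__":
    unittest.main()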
It is important that everyone go back to square one and check the stations that are used, one by one.
Warwick Hughes has the books.
http://www.warwickhughes.com/blog/?p=510
“These books are witness to the processes operating at the birth of what we now know as IPCC AGW. Information contained in TR022 and TR027 will assist people who are curious to uncover what Jones et al have done with temperature data from their village, town, city, region, state or nation.”
I have checked just three stations in Australia so far and there appear to be heaps of “value adding” and little consistency from Jones 1990 to Jones 1999.
Here is Halls Creek for Jones 1990
http://members.westnet.com.au/rippersc/hcjones1990.jpg
I notice he picked only the year 1899 out of the old station record, which as I have said before is 12 km and 63 metres downhill from the current one.
Here is the 1999 version
http://members.westnet.com.au/rippersc/hcjones1999.jpg
There has been some adjustment in the 1950’s & 1960’s.
I did a rough calculation and conservatively I reckon that the previous Halls Creek temps get extrapolated over more than 1 million square kilometres, or around 14% of Australia’s land area.
Here is Kalgoorlie
http://members.westnet.com.au/rippersc/kaljones1990.jpg
Note that the Jones interpretation is a combination of two stations after 1941.
He also used Southern Cross (~200 km away) for 1895-1899, leaving a 42-year gap.
It would have been simpler to just use the entire Southern Cross record IMHO.
http://members.westnet.com.au/rippersc/southernX.jpg
Here is the Jones 1999 version for Kalgoorlie
http://members.westnet.com.au/rippersc/kaljones1999sg.jpg
http://members.westnet.com.au/rippersc/kaljones1999line.jpg
Huge difference from the 1990 version.
Here are the Jones figures for Southern Cross
http://members.westnet.com.au/rippersc/scjones1999.jpg
I notice that global warming from 1990 to 1998 has caused Southern Cross to get 1.4 degrees C colder in the 1890’s years that were used in 1990.
Re: ShrNfr (08:44:02) :
Oh my, such a serious misspelling! Surely you meant to type “clusterfrag” instead.
Moderators! Yoo-hoo! Can you help this nice person out and make that change for them? Thanks in advance!
(First time posting here). Does anyone look at ClimatePrediction.net as a basis for how the models work? I had a quick look and it appears they run thousands of simulations with different parameters to ‘recreate the past’, and then pick the most accurate to simulate the future. Therefore all the future scenarios that you might expect to happen will happen in the simulations.
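For what it’s worth, a toy Python sketch of that general recipe (my guess at the shape of it, not ClimatePrediction.net’s actual code): run the model many times with different parameter choices, score each run against the observed past, and keep the best scorers for projecting forward.

import random

def toy_model(params, years):
    # A stand-in for a climate model: a linear trend plus noise (purely illustrative)
    return [params["trend"] * y + random.gauss(0, params["noise"]) for y in range(years)]

def hindcast_error(run, observed):
    # Sum of squared differences against the observed record
    return sum((a - b) ** 2 for a, b in zip(run, observed))

observed = [0.01 * y for y in range(50)]    # pretend 50-year observed record

ensemble = []
for _ in range(1000):
    params = {"trend": random.uniform(0.0, 0.03), "noise": random.uniform(0.01, 0.2)}
    ensemble.append((hindcast_error(toy_model(params, 50), observed), params))

# Keep the parameter sets that best 'recreate the past', then reuse them for projections
best = sorted(ensemble, key=lambda item: item[0])[:10]
for err, params in best:
    print(round(err, 3), params)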
My question is: how do UEA/NASA tackle the problem, and is it much different?
“amateur computer analyst”
Nobody uses that term anymore. I’m surprised they didn’t say ‘in the data processing centre’ or some other such 1980’s term.
I guess if he was a professional he would be in academia and writing papers instead of, you know, actually delivering real solutions to companies who pay for them.
Funny you mention Y2K. I’m on a project that found a Y2K bug a couple of weeks ago. Fortunately the buggy code wasn’t invoked in that routine until we tried it, but still… Only I would be hit with Y2K a decade later!
I work for a health insurance company, read code all the time (primarily COBOL and APS) and I promise that if OUR code looked like CRU’s, we’d have been closed down years ago. And our code doesn’t attempt to justify the slavery of the entire free world.
rbateman (12:57:35) :
Yes, but don’t forget if you do not know or can’t see the big picture (forest for trees and all that) AND you do not know how it all works together, you probably don’t have enough of the RIGHT data to begin with.
For any kind of analysis model, the design of the model has to be “complete” before it can be correct. Yes, complete is a relative term, but in global atmosphere modeling there is no amount of relativism you could possibly apply to claim we’re close enough.
THEN, there is that damn data….. ☺
mpaul (09:48:33)
“Tested twice a day”
Is that when they run it to feed us the output?