NOTE: Part 2 of this story has been posted: see The Smoking Code, part 2
The Proof Behind the CRU Climategate Debacle: Because Computers Do Lie When Humans Tell Them To
From Cube Antics, by Robert Greiner
I’m coming to you today as a scientist and engineer with an agnostic stance on global warming.
If you don’t know anything about “Climategate” (does anyone else hate that name?), go ahead and read up on it before you check out this post. I’ll wait.
Back? Let’s get started.
First, let’s get this out of the way: emails prove nothing. Sure, you can look like an unethical asshole who may have committed a felony using government-funded money, but all email amounts to is talk, and talk is cheap.
Now, here is some actual proof that the CRU was deliberately tampering with their data. Unfortunately for readability, this code is written in Interactive Data Language (IDL) and is a pain to go through.
NOTE: This is an actual snippet of code from the CRU contained in the source file: briffa_Sep98_d.pro
[sourcecode language="text"]
;
; Apply a VERY ARTIFICAL correction for decline!!
;
yrloc=[1400,findgen(19)*5.+1904]
valadj=[0.,0.,0.,0.,0.,-0.1,-0.25,-0.3,0.,-0.1,0.3,0.8,1.2,1.7,2.5,2.6,2.6,2.6,2.6,2.6]*0.75 ; fudge factor
if n_elements(yrloc) ne n_elements(valadj) then message,'Oooops!'
yearlyadj=interpol(valadj,yrloc,timey)
[/sourcecode]
What does this mean? A line-by-line review of the code
Starting off Easy
Lines 1-3 are comments
Line 4
yrloc is a 20-element array containing:
1400, followed by 19 years running from 1904 to 1994 in increments of 5 years:
yrloc = [1400, 1904, 1909, 1914, 1919, 1924, 1929, … , 1964, 1969, 1974, 1979, 1984, 1989, 1994]
findgen() creates a floating-point array of the specified dimension. Each element of the array is set to the value of its one-dimensional subscript:
F = findgen(6) ; F[0] is 0.0, F[1] is 1.0, … , F[5] is 5.0
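To see this concretely, here is a minimal sketch (illustrative code of mine, not from the CRU files) that rebuilds yrloc and prints it:
[sourcecode language="text"]
; Editor's sketch, not CRU code: rebuild yrloc and verify its contents
yrloc = [1400, findgen(19)*5. + 1904]
print, n_elements(yrloc)   ; 20
print, yrloc               ; 1400, 1904, 1909, ..., 1989, 1994
[/sourcecode]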
Pretty straightforward, right?
Line 5
valadj, or the “fudge factor” array as some arrogant programmer likes to call it, is the foundation for the manipulated temperature readings. It contains twenty seemingly random values. We’ll get back to this later.
Line 6
Just a check to make sure that yrloc and valadj have the same number of elements. This is important for line 7.
Line 7
This is where the magic happens. Remember that array of years (yrloc) from line 4? And that array of seemingly random numbers (valadj) from line 5? Well, on line 7, interpol() combines them with timey, the yearly time axis of the data series.
The interpol() function linearly interpolates the twenty valadj values, which are anchored at the yrloc years, onto every year of timey, producing a smooth per-year adjustment curve called yearlyadj. Interpolation is a routine technique when dealing with natural data points, just not quite in this manner.
The main thing to realize here is that wherever yearlyadj is later added to a data series, it skews that series toward the shape of the valadj values.
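To make that concrete, here is a minimal sketch of what the call does. The timey variable is defined elsewhere in the file; I am assuming, purely for illustration, that it is a yearly time axis running from 1400 to 1994.
[sourcecode language="text"]
; Editor's sketch, not CRU code.
; ASSUMPTION: timey is a yearly axis covering 1400-1994.
yrloc  = [1400, findgen(19)*5. + 1904]
valadj = [0.,0.,0.,0.,0.,-0.1,-0.25,-0.3,0.,-0.1,0.3,0.8,$
          1.2,1.7,2.5,2.6,2.6,2.6,2.6,2.6]*0.75
timey  = findgen(595) + 1400.            ; 1400, 1401, ..., 1994 (hypothetical)
; Linearly interpolate the 20 valadj values (anchored at the yrloc years)
; onto the yearly axis: one adjustment value per year.
yearlyadj = interpol(valadj, yrloc, timey)
print, yearlyadj[where(timey eq 1990.)]  ; 1.95, i.e. 2.6 * 0.75
[/sourcecode]
How, or whether, yearlyadj is then applied to a data series is determined by code elsewhere in the file.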
What the heck does all of this mean?
Well, I’m glad you asked. First, let’s plot the values in the valadj array.
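If you want to reproduce the plot yourself, a minimal sketch (again mine, not the CRU’s) is:
[sourcecode language="text"]
; Editor's sketch: plot the fudge-factor values against their years
yrloc  = [1400, findgen(19)*5. + 1904]
valadj = [0.,0.,0.,0.,0.,-0.1,-0.25,-0.3,0.,-0.1,0.3,0.8,$
          1.2,1.7,2.5,2.6,2.6,2.6,2.6,2.6]*0.75
plot, yrloc, valadj, psym=-4, xtitle='Year', ytitle='Adjustment'
[/sourcecode]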
[Figure: plot of the values in the valadj array]
Look familiar? This closely resembles the infamous hockey stick graph that Michael Mann came up with about a decade ago. By the way, did I mention Michael Mann is one of the “scientists” (and I use that word loosely) caught up in this scandal?
Here is Mann’s graph from 1999:
[Figure: Mann’s 1999 Northern Hemisphere temperature reconstruction, the “hockey stick” graph]
As you can see, (potentially) valid temperature station readings were taken and skewed to fabricate the results the “scientists” at the CRU wanted to believe, not what actually occurred.
Where do we go from here?
It’s not as cut-and-dried as one might think. First and foremost, this doesn’t necessarily prove anything about global warming as science. It just shows that the data underpinning much of the environmental legislation created over the last decade was a farce.
This means that all of those billions of dollars we spent as a global community to combat global warming may have been for nothing.
If news anchors and politicians were trained as engineers, they would be able to find real proof instead of just speculating about the meaning of emails that merely made it appear as if something illegal had happened.
Conclusion
I tried to write this post in a manner that transcends politics. I really haven’t taken much of an interest in the whole global warming debate and don’t have a strong opinion on the matter. However, being part of the science community (I have a degree in physics) and having done scientific research myself, I get very worried when arrogant jerks who call themselves “scientists” work outside of ethics and ignore the truth to fit their preconceived notions of the world. That is not science, that is religion with math equations.
What do you think?
Now that you have the facts, you can come to your own conclusion!
Be sure to leave me a comment; it gets lonely in here sometimes.
Hat tip to WUWT commenter “Disquisitive”.
========================
NOTE: While there are some interesting points raised here, it is important to note a couple of caveats. First, the adjustment shown above is applied to the tree ring proxy data (proxy for temperature) not the actual instrumental temperature data. Second, we don’t know the use context of this code. It may be a test procedure of some sort, it may be something that was tried and then discarded, or it may be part of final production output. We simply don’t know. This is why a complete disclosure and open accounting is needed, so that the process can be fully traced and debugged. Hopefully, one of the official investigations will bring the complete collection of code out so that this can be fully examined in the complete context. – Anthony
The emails have gotten most of the attention… for now. They are a lot smaller than the data sets, and most people can go through the emails and put together a timeline. We still haven’t heard from the techies yet, who undoubtedly have started breaking down the data sets and code. It will be interesting to see if they can replicate CRU’s outputs and then be able to explain what CRU did. But even if they can, that is only half of the story; the other half is what the output should have looked like if it hadn’t been tampered with. That will take some time.
I am not jumping on board as vehemently as some on this post. IDL is not a broadly used language. On the original site for this post, someone claiming to be an IDL programmer raised some very interesting points that may lessen the accuracy of the charges made regarding this piece of code. I have been coming to this site for a year and am a solid skeptic of AGW, but the rush to judgment on smoking guns can diminish the credibility of those questioning the CRU.
In the past 20 years, how many BILLIONS of dollars have been wasted on this fraud, and how many MILLIONS of lives could have been saved if the money had been better spent?
Just what do you think you’re doing, HARRY_READ_ME?
Please check some of these comments above:
THE CODE WAS COMMENTED OUT
IT WAS NOT USED at the point this image was taken.
Check out this comment from the original post location
http://cubeantics.com/2009/12/the-proof-behind-the-cru-climategate-debacle-because-computers-do-lie-when-humans-tell-them-to/comment-page-1/#comment-664
TV TV TV. Anthony, we need you and other experts on TV TV TV….
nice idea
If you think that this commented section just means that they could have used it to fudge the data, then you would also have to take into account all the code that Briffa deleted, and even the code that doesn’t exist.
This is beyond ludicrous!
Has this code been used?
Or is it commented out (preceded by a semicolon) in the lines that follow the code cited here?
Maybe this is off topic, but check out Real Climate. Here is what Gavin had to say in his latest piece:
*************************************
Unusually, I’m in complete agreement with a recent headline on the Wall Street Journal op-ed page:
“The Climate Science Isn’t Settled”
*************************************
Gee… I could have sworn that Gavin said it was…
«That is not science, that is religion with math equations.»
This is, for me, the quote of the day! So true! 😀
While I agree with most of this analysis, it is NOT the temp data being adjusted; it is the proxy data being merged with the temp data.
title='Northern Hemisphere temperatures, MXD and corrected MXD'
title='Northern Hemisphere temperatures and MXD reconstruction'
This code is from a Briffa/Osborn reconstruction of temperature, in a process I call hockeystickization. It’s a common practice in the black art of proxy temperatures. While it is absolutely disingenuous and IMO fraudulent, it is not evidence of HadCRU temp data being manipulated. Michael Mann claimed no knowledge that any scientist had ever done such a thing as merge proxies directly with temp data.
Keith Briffa referred to a different method of hockeystickization as Mike’s trick to hide the decline.
Hey PWL, just follow the money and you’ll comprehend just fine… it’s called Natural Law…
PWL said: “I can’t comprehend their justification for this obvious blatant fraud. What was Mann thinking when he manNipulated this data in this manner?”
But what about the code or data that was supposedly deleted? If it is in fact destroyed, that would be just as much a smoking gun in my eyes.
I’ll have to agree with Bill… the plot apparently does not use the “fudge factor.” But the IPCC reports stop using the Briffa plot around 1960 (somewhere around point 12 in the above graph). See:
http://wattsupwiththat.com/2009/11/30/playing-hide-and-seek-behind-the-trees/
This code does highlight the fact that the Briffa tree-ring data don’t match measured temperatures. Either the actual temperatures are garbage, or the tree rings don’t make for a good temperature reconstruction.
All the discussion about whether this is “actual proof that the CRU was deliberately tampering with their data” is a distraction. Yes, the “fudge-factored” data may not have been used to produce any graphs. Yes, that array may not have even been used within this program. That’s all beside the point.
The Hockey Team could put this away for good by merely releasing their data and analysis, as they should have done in the first place. All they have to do is explain themselves. The fact that they won’t, and that we have to guess, is incriminating in its own right, and sufficient reason to reject their result. Mann et al. and his successors and colleagues have attempted to turn paleo-climate research on its head with their research, purporting to disprove the Roman optimum, the Medieval warm period, etc. THE BURDEN OF PROOF IS ON THEM, NOT US. This isn’t a criminal prosecution; there isn’t any presumption that the defendant is correct.
The fact that the Hockey Team can’t or won’t disprove the allegations is all we need to reject their results. Whether they committed any actual crimes is a question for another venue, with a different standard of evidence.
Isn’t this the “correction for decline” described in the Osborn, Briffa, Schweingruber, Jones (2004) paper cited in the file (Annually resolved patterns of summer temperature over the Northern Hemisphere since AD 1400) as one step in calibration of the proxy/temperature relationship?
“To overcome these problems, the decline is artificially removed from the calibrated tree-ring density series, for the purpose of making a final calibration. The removal is only temporary, because the final calibration is then applied to the unadjusted data set (i.e., without the decline artificially removed). Though this is rather an ad hoc approach, it does allow us to test the sensitivity of the calibration to time scale, and it also yields a reconstruction whose mean level is much less sensitive to the choice of calibration period.”
If so, it may be evidence of bad science, but not of outright fraud.
The reason you have a fudge factor like 0.75 is to allow you to scale the curve arbitrarily. You only have to change the fudge factor, rather than every element of the array.
By the way, use of hard-coded “magic numbers” is considered very bad practice in software development. One of the first code reviews I ever had made this very criticism of one of the modules I’d written :/.
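A toy illustration of that scaling point, in the same language as the snippet above (illustrative values only, not from the CRU files): because every element is multiplied by one constant, changing that single number rescales the entire curve.
[sourcecode language="text"]
; Illustrative sketch only, not CRU code.
; Changing the one named constant rescales the whole adjustment curve.
scale  = 0.50                               ; was 0.75; one edit rescales everything
valadj = [0., -0.1, 0.3, 1.2, 2.6] * scale  ; shortened array for illustration
print, valadj                               ; 0.00 -0.05 0.15 0.60 1.30
[/sourcecode]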
If the CRU gave us the raw data and the algorithms they used to plot their graphs we could have a proper argument about this situation. They won’t because they can’t – they’ve “lost” the raw data.
They are as convincing as Joseph Smith saying he lost the original of the book of Mormon.
It has been said that this line was commented out, and speculation has been offered that the code could be explained as a legitimate attempt to debug or test other parts of the program.
Speaking as a coder (granted, I am not familiar with this particular language, though most languages are largely similar and IDL is by no means cryptic), that does NOT look like a debugging comment. The number fudging requires several other lines of code, as can be plainly seen.
The “Apply a VERY ARTIFICIAL…” comment reveals that the programmer’s aim is not to draw conclusions out of data, but to insert preconceptions into it.
This is NOT Mike’s Nature Trick, which made use of data being added to the 1960s and afterwards. From Jones’ email regarding that trick, it seems that program code itself was not altered to hide the decline; instead, improper data was used. There may in fact be little to no evidence of fraud in the functional code itself (nor can the commented line be used to exonerate anyone in this scandal), but the comments further corroborate our growing common-sense suspicions: that a corrupt process was used to hide a decline in temperatures. There are at least two methods of hiding the decline documented between Jones’ Nature-trick email and the code posted above.
Bill, you’re fighting a losing battle. Make the decision now to reanalyze your position and you will feel a lot better.
The following is probably how they eliminated the medieval warm period:
see: documents/osborn-tree6/summer_modes/pl_decline.pro
;
; Plots density ‘decline’ as a time series of the difference between
; temperature and density averaged over the region north of 50N,
; and an associated pattern in the difference field.
; The difference data set is computed using only boxes and years with
; both temperature and density in them – i.e., the grid changes in time.
; The pattern is computed by correlating and regressing the *filtered*
; time series against the unfiltered (or filtered) difference data set.
;
;*** MUST ALTER FUNCT_DECLINE.PRO TO MATCH THE COORDINATES OF THE
; START OF THE DECLINE *** ALTER THIS EVERY TIME YOU CHANGE ANYTHING ***
;
…
;
; Now apply a completely artificial adjustment for the decline
; (only where coefficient is positive!)
;
which is accompanied by a similar array, etc.
as well as:
;
; Now fit a 2nd degree polynomial to the decline series, and then extend
; it at a constant level back to 1400. In fact we compute its mean over
; 1856-1930 and use this as the constant level from 1400 to 1930. The
; polynomial is fitted over 1930-1994, forced to have the constant value
; in 1930.
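For readers who don’t speak IDL, a rough sketch of the procedure those comments describe might look like the following. The variable names are mine, the requirement that the polynomial pass through the constant value at 1930 is not enforced here, and I cannot verify that this matches the actual pl_decline.pro code:
[sourcecode language="text"]
; Illustrative sketch of the procedure described in the comments above.
; Variable names (year, decline) are hypothetical; the 1930 constraint is omitted.
pre    = where(year ge 1856 and year le 1930)
level  = mean(decline[pre])                     ; constant level used from 1400 to 1930
post   = where(year ge 1930)
coef   = poly_fit(year[post], decline[post], 2) ; 2nd-degree polynomial over 1930-1994
fitted = fltarr(n_elements(year))
fitted[where(year lt 1930)] = level
fitted[post] = poly(year[post], coef)
[/sourcecode]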
It would be nice to add a graph which shows the valadj values along the Y axis and the years along the X axis. That image would make more obvious the scale and the period affected. I comprehend what is being done, but most climate graphs show the years, so showing the years for this correction would help people compare the adjustment with the other graphs they’ve seen. Yes, I know what starting at 1400 will do to the graph, and the programmer also knew what that would do.
How can anyone claim that this is a smoking gun without knowing definitively whether this code was used to produce the charts found in the literature and/or the IPCC reports?
I believe Gavin has already claimed that it was simply test code.
More sleuthing needs to be done.
I tend toward the skeptic side myself, but as a programmer I just want to reiterate what some other people have already said here, namely that this is definitely NOT a smoking gun.
1) I can’t tell you how many snippets of code I have lying around in various directories on various computers where I work. Programmers write all kinds of things, sometimes just for their own curiosity or self-edification. There is no way of saying whether this is ‘production’ code or just some little one-off that somebody was working on for some unknown reason. To me, it has the feel of the latter, but I couldn’t say for sure. The comments seem kind of suspicious, but again, without knowing how it was used, it is by itself pretty meaningless.
2) The author seems confused. In the original post he wrote: “Remember that array we have of valid temperature readings?” Actually, no. All we’ve been shown is an array of years. Later he wrote “…the valid temperature readings (yrloc)…”. Uh, no; yrloc does not hold temperature readings, it holds year numbers, as he had just gotten done explaining a few paragraphs earlier.
3) It’s not clear to me what the interpol() function does. What is the ‘timey’ parameter that is passed in? Is THIS the temperature data in a yearly time series, maybe? So maybe interpol() fills in those year gaps (1400-1904, 1904-1909, 1909-1914, etc.) using the fudge-factor numbers as some sort of weight? I don’t know, but the author’s explanation isn’t very clear at all.
Sorry, but to me this analysis looks out of context and ridiculous. I don’t think it’s possible to conclude anything based on that code. (I’m a computer programmer myself and I have some training in physics and math, and have done modeling in the past.)
I am a “climate skeptic” and a long-time reader of WUWT and CA. Unfortunately, the quality of “climategate” discussions is starting to deteriorate and approach that of AGW propaganda at an alarming pace. If this trend continues, then I’m afraid I’ll have to switch camps.
Still I’d like to thank Anthony for his hard work and for publishing quality materials.