NOTE: Part 2 of this story has been posted: see The Smoking Code, part 2
The Proof Behind the CRU Climategate Debacle: Because Computers Do Lie When Humans Tell Them To
From Cube Antics, by Robert Greiner
I’m coming to you today as a scientist and engineer with an agnostic stance on global warming.
If you don’t know anything about “Climategate” (does anyone else hate that name?), go ahead and read up on it before you check out this post. I’ll wait.
Back? Let’s get started.
First, let’s get this out of the way: emails prove nothing. Sure, you can look like an unethical asshole who may have committed a felony using government-funded money, but in the end emails are just talk, and talk is cheap.
Now, here is some actual evidence that the CRU was deliberately tampering with its data. Unfortunately, this code is written in Interactive Data Language (IDL), which makes it a pain to go through.
NOTE: This is an actual snippet of code from the CRU contained in the source file: briffa_Sep98_d.pro
[sourcecode language="text"]
;
; Apply a VERY ARTIFICAL correction for decline!!
;
yrloc=[1400,findgen(19)*5.+1904]
valadj=[0.,0.,0.,0.,0.,-0.1,-0.25,-0.3,0.,-0.1,0.3,0.8,1.2,1.7,2.5,2.6,2.6,2.6,2.6,2.6]*0.75 ; fudge factor
if n_elements(yrloc) ne n_elements(valadj) then message,'Oooops!'
yearlyadj=interpol(valadj,yrloc,timey)
[/sourcecode]
What does this Mean? A review of the code line-by-line
Starting off Easy
Lines 1-3 are comments
Line 4
yrloc is a 20 element array containing:
the value 1400, followed by 19 years from 1904 to 1994 in increments of 5 years:
yrloc = [1400, 1904, 1909, 1914, 1919, 1924, 1929, … , 1964, 1969, 1974, 1979, 1984, 1989, 1994]
findgen() creates a floating-point array of the specified dimension. Each element of the array is set to the value of its one-dimensional subscript.
F = findgen(6) ;F[0] is 0.0, F[1] is 1.0, ..., F[5] is 5.0
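If you don’t have IDL handy, here is a minimal sketch of the yrloc construction in Python, using NumPy’s arange() as a stand-in for findgen() (an assumption for illustration; the original runs in IDL):

```python
import numpy as np

# IDL: yrloc = [1400, findgen(19)*5. + 1904]
# findgen(19) yields [0.0, 1.0, ..., 18.0]; np.arange is the NumPy analog
yrloc = np.concatenate(([1400.0], np.arange(19, dtype=float) * 5.0 + 1904.0))

print(len(yrloc))                     # 20
print(yrloc[0], yrloc[1], yrloc[-1])  # 1400.0 1904.0 1994.0
```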
Pretty straightforward, right?
Line 5
valadj, or the “fudge factor” array as the programmer labels it, is the foundation for the manipulated values. It contains twenty seemingly arbitrary numbers. We’ll get back to this later.
Line 6
Just a check to make sure that yrloc and valadj have the same number of elements. This is important for line 8.
Line 8
This is where the magic happens. Remember the yrloc array of years from line 4, and the “fudge factor” array valadj from line 5? In line 8, those two arrays are passed to interpol().
The interpol() function performs a linear interpolation: treating the (yrloc, valadj) pairs as points on a graph, it estimates an adjustment value for every year in the timey array, producing the yearlyadj series. Interpolation is a perfectly normal technique when dealing with natural data points, just not quite in this manner.
The main thing to realize here is that yearlyadj, once added to the underlying data series, skews that series toward the shape of the valadj values.
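For readers without IDL, the whole snippet can be sketched in Python. Note two assumptions, since they are not in the original snippet: NumPy’s np.interp stands in for IDL’s interpol (the argument order differs), and timey is taken to be a simple yearly grid (its real definition lives elsewhere in the CRU code):

```python
import numpy as np

# IDL: yrloc = [1400, findgen(19)*5.+1904]
yrloc = np.concatenate(([1400.0], np.arange(19, dtype=float) * 5.0 + 1904.0))

# IDL: valadj = [...] * 0.75   ; the "fudge factor" array
valadj = np.array([0., 0., 0., 0., 0., -0.1, -0.25, -0.3, 0., -0.1,
                   0.3, 0.8, 1.2, 1.7, 2.5, 2.6, 2.6, 2.6, 2.6, 2.6]) * 0.75

# IDL: if n_elements(yrloc) ne n_elements(valadj) then message,'Oooops!'
assert len(yrloc) == len(valadj), "Oooops!"

# timey is not shown in the snippet; a yearly grid is assumed here
timey = np.arange(1400, 1995, dtype=float)

# IDL interpol(V, X, XOUT) corresponds to np.interp(XOUT, X, V)
yearlyadj = np.interp(timey, yrloc, valadj)
```

Plotting yearlyadj against timey reproduces the flat-then-rising shape discussed below: zero until the early twentieth century, a dip mid-century, then a steep climb to 2.6 * 0.75 = 1.95 by the 1970s.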
What the heck does all of this mean?
Well, I’m glad you asked. First, let’s plot the values in the valadj array.

Look familiar? This closely resembles the infamous hockey stick graph that Michael Mann came up with about a decade ago. By the way, did I mention Michael Mann is one of the “scientists” (and I use that word loosely) caught up in this scandal?
Here is Mann’s graph from 1999
As you can see, (potentially) valid temperature station readings were taken and skewed to fabricate the results the “scientists” at the CRU wanted to believe, not what actually occurred.
Where do we go from here?
It’s not as cut-and-dried as one might think. First and foremost, this doesn’t necessarily prove anything about global warming as science. It does, however, cast doubt on the data that underpinned much of the environmental legislation created over the last decade.
This means that all of those billions of dollars we spent as a global community to combat global warming may have been for nothing.
If news anchors and politicians were trained as engineers, they would be able to look for real proof instead of just speculating about the meaning of emails that only made it appear as if something illegal happened.
Conclusion
I tried to write this post in a manner that transcends politics. I really haven’t taken much of an interest in the whole global warming debate and don’t have a strong opinion on the matter. However, being part of the science community (I have a degree in Physics) and having done scientific research myself, I get very worried when arrogant jerks who call themselves “scientists” work outside of ethics and ignore the truth to fit their preconceived notions of the world. That is not science; that is religion with math equations.
What do you think?
Now that you have the facts, you can come to your own conclusion!
Be sure to leave me a comment, it gets lonely in here sometimes.
hat tip to WUWT commenter “Disquisitive”
========================
NOTE: While there are some interesting points raised here, it is important to note a couple of caveats. First, the adjustment shown above is applied to the tree ring proxy data (proxy for temperature) not the actual instrumental temperature data. Second, we don’t know the use context of this code. It may be a test procedure of some sort, it may be something that was tried and then discarded, or it may be part of final production output. We simply don’t know. This is why a complete disclosure and open accounting is needed, so that the process can be fully traced and debugged. Hopefully, one of the official investigations will bring the complete collection of code out so that this can be fully examined in the complete context. – Anthony


As someone who has programmed for a living for the past 15 years, may I just point out that you write code to a spec. Find who specced the application; there is your clue as to who is guilty of fudging the figures.
anyone have a link to the page that has all the emails available to read? thanks so much!
So far the code reviewers have demonstrated that proxy values have been blended with instrumental temperatures contrary to what Michael Mann has publicly stated is NEVER done by real scientists.
Beyond the frustration exhibited in the Harry Read Me file, we don’t as yet have any smoking guns with respect to similar manipulation of the temperature record.
It is quite clear from comparison of previous versions of the global temperature record that CRU, NASA, and others have been reducing past temperature and raising modern ones to accentuate the warming, but so far we have not seen this code or the methods used. This, if it exists, will be the goods that can shut down these clowns.
My advice to the code reviewers is to keep digging.
Jeff Id (07:17:23) :
“Kieth briffa referred…”
I had a college roommate named Keith who got exasperated with me every time I couldn’t remember whether his name had an “ie” or “ei”. I still need to stop and think about it.
This code fragment was mentioned on page 8 of Monckton’s fire and brimstone paper that you linked to earlier in the week.
>> Billy (07:36:27) :
3) It’s not clear to me what the interpol() function does. What is the ‘timey’ parameter passed? Is THIS the temperature data in a yearly time series maybe? So maybe interpol() fills in those year gaps (1400-1904, 1904-1909, 1909-1914, etc.) using the fudge factor numbers as some sort of weight? I don’t know, but the author’s explanation isn’t very clear at all. <<
It seems obvious to me that interpol is an interpolation subroutine, probably just a linear interpolation. It takes the 'yrloc' array and the 'valadj' array and creates a value adjustment for a given year (timey) that fits between the 'valadj' values for the 'yrloc' boundary years.
Anthony – please pay attention to the comments above. Leaving this post up with no caveat damages the credibility of your blog. Many will be passing by here without the time/interest to read the comments in detail. Unless this can be shown to produce production output to call it “smoking code” is disingenuous.
But, but, but . . . commenting out lines of code is just for simplicity’s sake. On when you need/want it, off when you don’t. Commented out or not, what is it doing in there at all?
The gun is smoking alright.
This is one of the most important finds.
In some ways, isn’t this all moot? Jones has already admitted that the original data is gone. Therefore this whole database is meaningless, which makes the HadCRUT temperature history corrupt and meaningless. Am I right in saying that? Is Jones’s goose pretty much cooked already, no matter what?
@Bill, I responded to you on my site.
As I said before, this proves that the CRU data can’t be trusted. Let’s get the results re-run and go from there.
@Anthony,
Thanks for adding the caveat. I didn’t intend to take the point of disproving anything about global warming. I just wanted to show that there is enough proof in the CRU source code to warrant an investigation. Not some off-the-wall email that was sent 5 years ago.
g hall (08:57:10) :
J. Bob (08:56:09) :
John Galt (08:43:01)
JJ (08:39:45) :
david (08:14:07) etc. etc.
The smoking gun is commented out (see the “;” = comment). Why can’t you see this? It is obvious.
The code was written by scientists for one-off use. Why would they clean it up and make it presentable? They are not going to sell it to others.
I’m a Cambridge compsci, and I think there are a couple of bits missing from this explanation. The thrust of the analysis is roughly right, but I would advise being cautious unless we can show whether and where this code was used.
So, here is the complete analysis, from me 🙂
Firstly yrloc does not contain “temperature readings”… it is literally just a list of years. You are quite right though, the years are:
[1400, 1904, 1909, …. 1984, 1989, 1994]
These are the “x-values” of a function. Note that they form an irregular grid, with the first value being in the late medieval era, and all others in the C20th.
Then, they create the “y-values” of this function, one for each “x-value”:
[0.,0.,0.,0.,0.,-0.1,-0.25,-0.3,0.,-0.1,0.3,0.8,1.2,1.7,2.5,2.6,2.6,2.6,2.6,2.6]*0.75
Note, the *0.75 on the end just multiplies every value in the list (it’s like this so they can all be adjusted in a single step). So -0.1 above becomes -0.075 etc. Then, the final mapping of inputs to outputs looks like this:
year y-value
1400 0
1904 0
1909 0
1914 0
1919 0
1924 -0.075
1929 -0.1875
1934 -0.225
1939 0
1944 -0.075
1949 0.225
1954 0.6
1959 0.9
1964 1.275
1969 1.875
1974 1.95
1979 1.95
1984 1.95
1989 1.95
1994 1.95
Now then. This is fine, but it’s irregular. There is no value for 1452, for example, and no value for 1970. The IDL interpol function is documented here (thanks NASA 😉):
http://idlastro.gsfc.nasa.gov/idl_html_help/INTERPOL.html
It takes another list of years, called “timey”, which we don’t have here, and linearly interpolates the equivalent y-value for each of them. So if the year in “timey” was 1960, the resulting y-value would be interpolated between the given values for 1959 and 1964.
In all likelihood, “timey” is a list of every year from 1400 to present, and so this is just a way to expand the “valadj” array to cover all the years in the range.
It definitely looks suspicious, in as much as that someone at somepoint has played with arbitrary adjustments to the C20th. However, it is *not* a smoking gun unless we can show (a) that they used it, (b) where they used it, and (c) how they used it.
Hope that helps! Send me more yummy code to digest 😉
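[Ed: the 1960 example above is easy to check. The sketch below uses NumPy’s np.interp as a stand-in for IDL’s interpol — an assumed substitution; note the argument order differs, interpol(V, X, XOUT) versus np.interp(XOUT, X, V).]

```python
import numpy as np

# x-values: 1400, plus every fifth year from 1904 to 1994
years = np.concatenate(([1400.0], np.arange(1904.0, 1995.0, 5.0)))
# y-values: the valadj list, scaled by 0.75
vals = np.array([0., 0., 0., 0., 0., -0.1, -0.25, -0.3, 0., -0.1,
                 0.3, 0.8, 1.2, 1.7, 2.5, 2.6, 2.6, 2.6, 2.6, 2.6]) * 0.75

# 1960 sits one fifth of the way from 1959 (y = 0.9) to 1964 (y = 1.275)
adj_1960 = np.interp(1960.0, years, vals)
print(adj_1960)  # approximately 0.975
```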
“Am I right in saying that? Is Jones’s goose pretty much cooked already, no matter what?”
No, unfortunately. I fail to see an easy way out of this very tangled mess for anyone involved, and they’ll try to salvage every last bit of their pride they can. This is, I believe, perhaps the biggest conspiracy in history, and everyone’s going to do whatever they can to detach their name from it. There is simply too much pride involved for people and institutions and governments to admit openly and frankly that they were wrong.
Array should have been:
[0.,0.,0.,0.,0.,-0.1,-0.25,-0.3,0.,-0.1,0.3,0.8,1.2,1.7,2.5,2.6,2.6,2.6,2.6,2.6]*0.75
Here’s an interesting file – jones-foiathoughts.doc – entire text below. Take note of the last line. How professional!
———————–
Options appear to be:
1. Send them the data
2. Send them a subset removing station data from some of the countries who made us pay in the normals papers of Hulme et al. (1990s) and also any number that David can remember. This should also omit some other countries like (Australia, NZ, Canada, Antarctica). Also could extract some of the sources that Anders added in (31-38 source codes in J&M 2003). Also should remove many of the early stations that we coded up in the 1980s.
3. Send them the raw data as is, by reconstructing it from GHCN. How could this be done? Replace all stations where the WMO ID agrees with what is in GHCN. This would be the raw data, but it would annoy them.
Getting trimmed for some reason.
[0.,0.,0.,0.,0.,-0.1,-0.25,-0.3,0.,-0.1,0.3,
0.8,1.2,1.7,2.5,2.6,2.6,2.6,2.6,2.6]*0.75
As a person very experienced in IDL programming, I would like to clarify a little, and summarize.
1) Think of the data in the variables yrloc and valadj as a series of (x,y) points on a graph. In this case, the 20 points are (1400, 0.0), (1904, 0.0), … (1994, 2.6*0.75). What the “interpol” function does is to draw straight lines between these points on a graph, and then pick off values for a different series of x values (the years in the variable “timey”). This is done because the dataset you want to add this function to has a different set of x values than the “fudge” function has. The whole point is to get the “fudge” values for the same set of years as some other data set, so you can add the “fudge” to that other data (called “yyy” here). Saying that this processing is making a “guess” might mislead some readers – better to call it an “estimate”, an “interpolation”, or a “resampling”.
The author of this article really should have included an “x vs y” plot, instead of what he has, to better show what’s really going on. It actually is even more of a “hockey stick”. Anybody with Excel could make such a chart pretty easily by typing the yrloc values in one column, the valadj in another (remember to include the “*0.75”), and then making an x vs y plot from those two columns.
BTW, the “fudge” is more appropriately called a “term” than a “factor” because it is added in, not multiplied.
2) As far as the ethical considerations, this can only be called “suspicious”. From the comments, and the variable name, it’s clear that the author of the code does not like being ordered to add this “fudge” in there AT ALL. Also, the “*0.75” (which multiplies all the numbers within the square brackets by 0.75) suggests some trial-and-error tinkering even with the “fudge” – as do the several leading zeros at the beginning of valadj (which have no effect). Without knowing where the “fudge” numbers come from, one can’t say for sure how legit it is or not, but IF the author of the code really understood what was going on, their commentary makes it likely that something unsavory was going on.
We also don’t know how this code was used. If the results never saw the light of day, then it’s no big deal – just somebody being unsuccessful in trying to pull something (maybe for internal use).
3) The insider explanation offered by David Schure (06:30:57) is such an obvious exercise in blowing smoke You-Know-Where that it is really very insulting. It would make me even more suspicious, but I know that guys like that spew BS as a reflex, whether there’s something to hide or not.
4) As to the point about the code that uses the result being commented out, that’s a total red herring. The semi-colons can be added or deleted in a few seconds (I do this all the time). The fact that the code was written in the first place tells you that somebody used it at one time. The fact that it was commented out only tells you that the extra “fudged” curve on the plot wasn’t included the LAST time the code was used. The fact that the commented-out lines were not simply deleted tells you that the author thinks they might need to include them back in again later, or that the author wanted to be able to refer back to see how a prior result had been obtained. People who say “it means nothing” because it’s commented out are “conveniently” ignoring the obvious question as to why it would have been written in the first place.
5) In summary, we know that fudged data was included on a plot over the strong objection of the programmer, but that it was subsequently (if only temporarily) removed. How bad this is depends on where the “fudge” numbers came from, and how the output might have been used. It’s very suspicious, but we need to know more to really prove something.
I don’t see that this code snippet or the rest of the code from CRU proves malpractice.
It reinforces the view that they have been up to no good for years.
It suggests that their code development and data management practices are a mess and well below what is required for such important work.
It gives reason to doubt the validity of their output.
If it proves anything, it’s that there’s an absolute need for their data and methods to be brought into the open for review and for them to explain themselves. If they can’t or won’t, they can’t complain if the world judges them on the basis of the leaked material. The clock is ticking.
I go along with the view that the title, “The Smoking Code” and the article presented with no qualifying statement, are somewhat precipitate; not as measured as WUWT usually is.
Brent (09:17:09) :
But, but, but . . . commenting out lines of code is just for simplicities sake. On when you need/want it, off when you don’t. Commented out or not, what is it doing in there at all?
The gun is smoking alright.
I hope you’re kidding!
Why not castigate Briffa for all the fudge factors that he or someone else MAY have written but didn’t!
As someone else said – why bother with the code why not just draw the line you want?
>> Tom_R (09:15:52) :
“It seems obvious to me that interpol is an interpolation subroutine, probably just a linear interpolation. It takes the ‘yrloc’ array and the ‘valadj’ array and creates a value adjustment for a given year (timey) that fits between the ‘valadj’ values for the ‘yrloc’ boundary years.”
Agreed. Over at the author’s blog he posted a link to the interpol() function documentation. After reading that and thinking about it some more I posted over there with essentially the same explanation you give above. I’m guessing timey is just an array containing the values [1400, 1401, 1402, … 1992, 1993, 1994].
Stay calm people. In my opinion this code doesn’t quite rise to the level of a “smoking gun”. I would liken it to gun smoke in the air. It certainly raises a lot of interesting questions.
Why was this code written?
Where and how was it used?
Please, people, hang on to your skepticism. Don’t rush to judgment and declare this code “evidence of fraud.” That would be too much like declaring “the science is settled.” Let’s not make that mistake.
It really would help if folks would read the Hide the decline posts that McIntyre has written ( or Jean S or UC ).
As usual the warmists are trying to underplay the “chartsmanship” they engaged in and the sceptics are overplaying it.
The smoking gun:
Everyone is looking at all the smoking guns,
and totally missing what I think is the most obvious one.
If you believe CRU, “most of the raw data was destroyed when we moved in 1980”
Then no matter what the current head of it says (“I destroyed it,” “I will destroy it,” etc.), he (CRU director Phil Jones) was not there. Jones did not work for CRU in 1980.
If you believe CRU, what this really says is that all of Jones’ work, every computer climate program/model….
….everything was built on their “adjusted” data.