The Smoking Code, part 2

Climategate Code Analysis Part 2

There are four common concerns, raised in response to my previous post, that I would like to officially address regarding the CRU's source code.

If you only get one thing from this post, please get this: I am only making a statement about the research methods of the CRU and trying to show proof that they had the means and intent to falsify data. And, until the CRU's research results can be verified by a third party, they cannot be trusted.

Here are the four most frequent concerns dealing with the CRU’s source code:

  1. The source code that actually printed the graph was commented out and, therefore, is not valid proof.
  2. No proof exists that shows this code was used in publishing results.
  3. Interpolation is a normal part of dealing with large data sets; this is no different.
  4. You need the raw climate data to prove that foul play occurred.

If anyone can think of something I missed, please let me know.

The source code that actually printed the graph was commented out and, therefore, is not valid proof.

Had I done a better job with my source analysis, I would have found that a later revision of the briffa_sep98_d.pro source file (linked to in my previous post), contained in a different working tree, shows the fudge-factor array playing a direct role in the (uncommented) plotting of the data.

Snippet from harris-tree/briffa_sep98_e.pro (see the end of the post for the full source listing):

[sourcecode language="text"]
;
; APPLY ARTIFICIAL CORRECTION
;
yearlyadj=interpol(valadj,yrloc,x)
densall=densall+yearlyadj
;
; Now plot them
;
filter_cru,20,tsin=densall,tslow=tslow,/nan
cpl_barts,x,densall,title='Age-banded MXD from all sites',$
  xrange=[1399.5,1994.5],xtitle='Year',/xstyle,$
  zeroline=tslow,yrange=[-7,3]
oplot,x,tslow,thick=3
oplot,!x.crange,[0.,0.],linestyle=1
;
[/sourcecode]

Now, we can finally put this concern to rest.

Interpolation is a normal part of dealing with large data sets, this is no different.

This is partially true; the issue doesn't lie in the fact that the CRU researchers used interpolation. The issue is the weight of the valadj array with respect to the raw data: valadj simply introduces too large an influence on the original data to do anything productive with it.

Here is the graph I plotted of the valadj array. When we're talking about trying to interpret temperature data that changes on the scale of tenths of a degree over a period of time, "fudging" a value by 2.5 is going to have a significant impact on the data set.
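To see just how heavy that weight is, you can reproduce the adjustment curve on its own, straight from the values in the source listing. Here is a minimal IDL sketch (it should also run under the open-source GDL interpreter); everything in it comes directly from briffa_sep98_e.pro except the yearly grid and the plot labels, which I have added for illustration:

[sourcecode language="text"]
; Rebuild the adjustment curve alone, on a yearly grid, using the
; exact yrloc/valadj values from briffa_sep98_e.pro.
yrloc=[1400,findgen(19)*5.+1904]
valadj=[0.,0.,0.,0.,0.,-0.1,-0.25,-0.3,0.,-0.1,0.3,0.8,1.2,1.7,2.5,2.6,2.6,$
2.6,2.6,2.6]*0.75
x=findgen(593)+1400.               ; years 1400-1992, the span the script uses
yearlyadj=interpol(valadj,yrloc,x) ; the same interpolation call the CRU code makes
print,max(yearlyadj)               ; prints 1.95000 (2.6 * 0.75)
plot,x,yearlyadj,xtitle='Year',ytitle='Adjustment'
[/sourcecode]

Against a signal measured in tenths of a degree, an additive correction that ramps up to nearly two full units in the late 20th century swamps whatever the raw data had to say.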

No proof exists that shows this code was used in publishing results.

Correct! That's why I take (and always have taken) the following stand: enough proof exists that the CRU had both the means and intent to falsify data. This means that none of their research results can be trusted until they are verified. Period.

The fact that the "fudge-factor" source code exists in the first place is reason enough for alarm. Hopefully, fudged values never made it into the CRU's published results, but the truth is, we just don't know.

You need the raw climate data to prove that foul play occurred.

This assumes the raw data are valid, which I maintain they probably are. Several people question the validity of the data-gathering methods used by the various climate research institutions, but I am not enough of a climate expert to have an opinion one way or the other. Furthermore, it simply doesn't matter whether the raw climate data are correct: the extreme bias the valadj array forces on them can be demonstrated either way.

So the raw data could be actual temperature readings or corporate sales figures; the result is the same: a severe manipulation of the data.
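That point is easy to demonstrate. Feed the correction a series with no signal in it at all, and the late-20th-century upturn appears anyway. A minimal sketch, reusing yrloc, valadj, and x from the sketch earlier in this post:

[sourcecode language="text"]
; Start from a perfectly flat input series -- zero everywhere.
flat=fltarr(n_elements(x))
; Apply the same correction the script applies to densall.
adjusted=flat+interpol(valadj,yrloc,x)
; The flat input now ends in a sharp modern-era upturn.
plot,x,adjusted,xtitle='Year',ytitle='Adjusted flat series'
[/sourcecode]

Whatever shape the input series had, the output inherits the upturn.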

Full Source Listing

As promised, here is the entire source listing for harris-tree/briffa_sep98_e.pro:

[sourcecode language="text"]
1. ;
2. ; PLOTS 'ALL' REGION MXD timeseries from age banded and from hugershoff
3. ; standardised datasets.
4. ; Reads Harry's regional timeseries and outputs the 1600-1992 portion
5. ; with missing values set appropriately. Uses mxd, and just the
6. ; "all band" timeseries
7. ;****** APPLIES A VERY ARTIFICIAL CORRECTION FOR DECLINE*********
8. ;
9. yrloc=[1400,findgen(19)*5.+1904]
10. valadj=[0.,0.,0.,0.,0.,-0.1,-0.25,-0.3,0.,-0.1,0.3,0.8,1.2,1.7,2.5,2.6,2.6,$
11. 2.6,2.6,2.6]*0.75 ; fudge factor
12. if n_elements(yrloc) ne n_elements(valadj) then message,'Oooops!'
13. ;
14. loadct,39
15. def_1color,20,color='red'
16. plot,[0,1]
17. multi_plot,nrow=4,layout='large'
18. if !d.name eq 'X' then begin
19. window, ysize=800
20. !p.font=-1
21. endif else begin
22. !p.font=0
23. device,/helvetica,/bold,font_size=18
24. endelse
25. ;
26. ; Get regional tree lists and rbar
27. ;
28. restore,filename='reglists.idlsave'
29. harryfn=['nwcan','wnam','cecan','nweur','sweur','nsib','csib','tib',$
30. 'esib','allsites']
31. ;
32. rawdat=fltarr(4,2000)
33. for i = nreg-1 , nreg-1 do begin
34. fn='mxd.'+harryfn(i)+'.pa.mean.dat'
35. print,fn
36. openr,1,fn
37. readf,1,rawdat
38. close,1
39. ;
40. densadj=reform(rawdat(2:3,*))
41. ml=where(densadj eq -99.999,nmiss)
42. densadj(ml)=!values.f_nan
43. ;
44. x=reform(rawdat(0,*))
45. kl=where((x ge 1400) and (x le 1992))
46. x=x(kl)
47. densall=densadj(1,kl) ; all bands
48. densadj=densadj(0,kl) ; 2-6 bands
49. ;
50. ; Now normalise w.r.t. 1881-1960
51. ;
52. mknormal,densadj,x,refperiod=[1881,1960],refmean=refmean,refsd=refsd
53. mknormal,densall,x,refperiod=[1881,1960],refmean=refmean,refsd=refsd
54. ;
55. ; APPLY ARTIFICIAL CORRECTION
56. ;
57. yearlyadj=interpol(valadj,yrloc,x)
58. densall=densall+yearlyadj
59. ;
60. ; Now plot them
61. ;
62. filter_cru,20,tsin=densall,tslow=tslow,/nan
63. cpl_barts,x,densall,title='Age-banded MXD from all sites',$
64. xrange=[1399.5,1994.5],xtitle='Year',/xstyle,$
65. zeroline=tslow,yrange=[-7,3]
66. oplot,x,tslow,thick=3
67. oplot,!x.crange,[0.,0.],linestyle=1
68. ;
69. endfor
70. ;
71. ; Restore the Hugershoff NHD1 (see Nature paper 2)
72. ;
73. xband=x
74. restore,filename='../tree5/densadj_MEAN.idlsave'
75. ; gets: x,densadj,n,neff
76. ;
77. ; Extract the post 1600 part
78. ;
79. kl=where(x ge 1400)
80. x=x(kl)
81. densadj=densadj(kl)
82. ;
83. ; APPLY ARTIFICIAL CORRECTION
84. ;
85. yearlyadj=interpol(valadj,yrloc,x)
86. densadj=densadj+yearlyadj
87. ;
88. ; Now plot it too
89. ;
90. filter_cru,20,tsin=densadj,tslow=tshug,/nan
91. cpl_barts,x,densadj,title='Hugershoff-standardised MXD from all sites',$
92. xrange=[1399.5,1994.5],xtitle='Year',/xstyle,$
93. zeroline=tshug,yrange=[-7,3],bar_color=20
94. oplot,x,tshug,thick=3,color=20
95. oplot,!x.crange,[0.,0.],linestyle=1
96. ;
97. ; Now overplot their bidecadal components
98. ;
99. plot,xband,tslow,$
100. xrange=[1399.5,1994.5],xtitle='Year',/xstyle,$
101. yrange=[-6,2],thick=3,title='Low-pass (20-yr) filtered comparison'
102. oplot,x,tshug,thick=3,color=20
103. oplot,!x.crange,[0.,0.],linestyle=1
104. ;
105. ; Now overplot their 50-yr components
106. ;
107. filter_cru,50,tsin=densadj,tslow=tshug,/nan
108. filter_cru,50,tsin=densall,tslow=tslow,/nan
109. plot,xband,tslow,$
110. xrange=[1399.5,1994.5],xtitle='Year',/xstyle,$
111. yrange=[-6,2],thick=3,title='Low-pass (50-yr) filtered comparison'
112. oplot,x,tshug,thick=3,color=20
113. oplot,!x.crange,[0.,0.],linestyle=1
114. ;
115. ; Now compute the full, high and low pass correlations between the two
116. ; series
117. ;
118. perst=1400.
119. peren=1992.
120. ;
121. openw,1,'corr_age2hug.out'
122. thalf=[10.,30.,50.,100.]
123. ntry=n_elements(thalf)
124. printf,1,'Correlations between timeseries'
125. printf,1,'Age-banded vs. Hugershoff-standardised'
126. printf,1,' Region Full <10 >10 >30 >50 >100'
127. ;
128. kla=where((xband ge perst) and (xband le peren))
129. klh=where((x ge perst) and (x le peren))
130. ts1=densadj(klh)
131. ts2=densall(kla)
132. ;
133. r1=correlate(ts1,ts2)
134. rall=fltarr(ntry)
135. for i = 0 , ntry-1 do begin
136. filter_cru,thalf(i),tsin=ts1,tslow=tslow1,tshigh=tshi1,/nan
137. filter_cru,thalf(i),tsin=ts2,tslow=tslow2,tshigh=tshi2,/nan
138. if i eq 0 then r2=correlate(tshi1,tshi2)
139. rall(i)=correlate(tslow1,tslow2)
140. endfor
141. ;
142. printf,1,'ALL SITES',r1,r2,rall,$
143. format='(A11,2X,6F6.2)'
144. ;
145. printf,1,' '
146. printf,1,'Correlations carried out over the period ',perst,peren
147. ;
148. close,1
149. ;
150. end
[/sourcecode]

Comments
Invariant
December 6, 2009 3:31 am

From previous post:
Carsten Arnholm, Norway (02:57:28) : It really does not matter whether the people writing this code were “smart” or not. What matters is that the result is of very poor quality.
No. That's complete nonsense! We need to COMPILE the code, EXECUTE the code, and REVERSE ENGINEER how it is supposed to work together with the RAW data files in the dump in order to CONCLUDE exactly what it is doing.
Complaining that the source code is of poor quality does not help us in any way; that's certainly a dead end, a RED HERRING that draws attention away from the central issue, which is whether they ADAPTED the code to the AGW hypothesis. Imagine that we manage to find accurate digital proof that the code reveals CONVENIENT ADJUSTMENTS – that would really be something.

Stacey
December 6, 2009 3:34 am

Mamma mia, here I go again; my my, why can't I resist it?
On the subject of Al Capone
Dear Robert
Maybe at present there is insufficient evidence for murder, but maybe there is sufficient evidence for tax evasion (my view is they are banged to rights).
I believe the raw data is available for the Central England Temperature series from the mid-1600s to 1973, compiled by Professor Manley. HadCRUT have turned that benign series into a hockey stick graph.
With all the enquiries around, something plain and simple is needed again, like the NZ fiddling.
It may seem parochial to concentrate on the UK, but that is where two of the enquiries are to be held.

December 6, 2009 3:38 am

Robert (OP), the point you haven't dealt with, which several people pointed out in the previous posting, is that this code was used either as a "thought experiment" test of the calibration procedure for the Briffa tree ring data (the filename indicates this) or as a way of bootstrapping a correlation process – both of which are perfectly reasonable things to do.
Here’s Gavin of RC on the subject (which was quoted by “Norman” in comments on your previous posting):
“It was an artificial correction to check some calibration statistics to see whether they would vary if the divergence was an artifact of some extra anthropogenic impact. It has never been used in a published paper (though something similar was explained in detail in this draft paper by Osborn). It has nothing to do with any reconstruction used in the IPCC reports.”
And indeed, in the same set of comments, “Morgan” pointed out that the Osborn et al. paper explicitly describes this step:
“To overcome these problems, the decline is artificially removed from the calibrated tree-ring density series, for the purpose of making a final calibration. The removal is only temporary, because the final calibration is then applied to the unadjusted data set (i.e., without the decline artificially removed). Though this is rather an ad hoc approach, it does allow us to test the sensitivity of the calibration to time scale, and it also yields a reconstruction whose mean level is much less sensitive to the choice of calibration period.”
I'm not sure which of these your particular code snippet is doing, but either seems a perfectly reasonable explanation to me – and both require the code to be added and then removed again. The lazy programmer's way of doing this is by commenting and uncommenting.
Let me give you an analogy: My professional sphere is in real-time video streaming software. In one of my bits of server code there is a mode (switchable by configuration, not comments, but never mind…) where it can corrupt every Nth byte of every Mth packet; we use this to test the resilience of the client device to network errors.
Now let’s imagine some ‘hacker’ breaks in and steals this code, and for some reason the probity of our software was internationally contentious (not likely, but still…), and people start picking over the code looking for “juicy bits” without understanding the full context. Maybe they find something like this:
// Corrupt every 'spacing' bytes
void Buffer::corrupt(unsigned int spacing)
{
  for (unsigned int i = 0; i < length; i += spacing)
    *((unsigned char *)start + i) ^= 0xA5;
}
Imagine the furore! Deliberate corruption/falsification of video streams! Programmer uses hacky C-style casts in C++! What is the significance of this mysterious A5 value?!
Let me be clear: I think there are issues about scientific openness here, especially given the importance of the output, but really, this isn't one of them.
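For comparison with the comment/uncomment toggle described above, the same switch could have been written in the post's own IDL with an explicit flag. This is only a sketch – the apply_correction flag is invented for illustration and appears nowhere in the CRU code:

[sourcecode language="text"]
; Hypothetical alternative: guard the correction with a flag instead of
; commenting it in and out.
apply_correction=0               ; set to 1 only for the sensitivity test
if apply_correction then begin
  yearlyadj=interpol(valadj,yrloc,x)
  densall=densall+yearlyadj
endif
[/sourcecode]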

December 6, 2009 3:43 am

Maybe OT. In the previous thread, bill (18:55:00) pointed out:
“If it’s a moving average then there should only be 10years centred on the date averaged. This allows you to use up to 4years 11months from ends”
I have now extended the graph to show up to 2005, which of course shows a noticeable drop for CET.
http://www.vukcevic.co.uk/CET.gif

Gumby
December 6, 2009 3:48 am

My question would be… what kind of configuration management system or scheme was the code stored in at UEA, and can the history of changes to the code be tracked from release to release?
Certainly, if trillions of dollars are at stake, along with the birth of a new industry (to dwarf all other industries), they'd at least have tried to proceed with a bare-minimum, entry-level level of code configuration and documentation to cover their hineys, wouldn't they?
If this is not the case, is it acceptable practice in the scientific community to not have past code versions stored/documented, which produced specific charts and data?
All our code is under CM where I work, but I've always chuckled about the fact that nobody has ever had to use it to quantify something from the past – we've never had to rely on our CM system, and with so little at stake it seemed like a waste. This particular exercise, however, makes it quite clear to me that, despite my arrogance and my laughing about its usefulness, I now completely get it; I understand why, at a minimum, we attempt to practice this "bare minimum" of due diligence. This is my newly self-imposed personal professional biaach slap, and it is a good one. My face is red, but it is smiling 🙂
So, in summary: under what degree of CM did UEA keep this seemingly important code? I mean, it must have one hefty history, eh?

Peter Whale
December 6, 2009 3:48 am

As a layman who is wary of all government agencies and quangos, I would have thought that the route to finding out the truth about science-related matters is for all major science to be open-sourced through each country's accepted science bodies, like The Royal Society – whereby any scientist can check their area of expertise, any party can access and assess the quality of the raw code, and anyone can view and comment on the results given. Maybe science does not want real science any longer!

Stacey
December 6, 2009 3:49 am

OK, so after all this they are not in fact climate scientists, but very, very bad statisticians?

Rhys Jaggar
December 6, 2009 3:51 am

Where is Deep Throat and who is Bernstein??!!

IDL_chap
December 6, 2009 3:55 am

I've been programming in IDL to produce plots for scientific papers for about 10 years, and this pro clearly isn't producing a plot for publication. The pro just plots to the screen – if it were for publication, it would write the graph out as a PostScript file to submit to the publisher. To me, this looks like someone experimenting with the data, which doesn't really mean anything.
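For readers who don't write IDL: the distinction being drawn here is that a for-publication run would normally direct output to a PostScript file rather than the X-window screen the script tests for (!d.name eq 'X'). Roughly, and only as a sketch – the output filename here is hypothetical:

[sourcecode language="text"]
; What a publication run would typically add: render to PostScript, not the screen.
set_plot,'PS'                    ; switch the plotting device to PostScript
device,filename='mxd_figure.ps'  ; hypothetical output filename
plot,x,densall,xtitle='Year'     ; same plot calls as before
device,/close                    ; finish writing the file
set_plot,'X'                     ; restore screen plotting
[/sourcecode]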

Andrew
December 6, 2009 3:56 am

Gumby, as I work for a software company, I had similar thoughts. One would think, with so much money at stake and this being a project taken on at the highest levels of academia, that they would have some sort of control over versions of their code – but alas, having never worked in an academic setting, my guess is they have no concept of this, and it probably never even crossed their minds that they would need it. This is the benefit of working in a make-believe world as opposed to the real one the rest of us inhabit – there is no accountability, so it never really even crosses their minds.

December 6, 2009 3:57 am

Suzanne,
Since when is permission needed to quote someone?
Quoting is everyone’s right.

December 6, 2009 4:08 am

AlanG (01:01:07) :
harryfn (Harry filename?) is an array of data filenames, found here:
29. harryfn=['nwcan','wnam','cecan','nweur','sweur','nsib','csib','tib',$
30. 'esib','allsites']
My guess is they correspond as follows:
nwcan = North West Canada
wnam = Western North America
cecan = Central Canada
nweur = North West Europe
sweur = South West Europe
cs and es could be ISO country codes but ns and ti are not.

nsib = Northern Siberia?
csib = Central Siberia?
tib = ??
esib = Eastern Siberia?

Julian in Wales
December 6, 2009 4:14 am

Most of the media reports dwell on "the leaked/stolen emails" when in fact they should speak, more accurately, of "the leaked/stolen emails and code". (I suppose code is just too difficult and makes many in their audiences switch off.)
"The emails tell us about the mindset; the code was the weapon that was used to corrupt the data." The message about the emails is out, but not the message about the smoking code. This led me to wonder whether journalists are unwilling to deal with the Climategate information because it does not arrive on their desks prepackaged as a press release from a reliable source.
Would it be possible for Watts Up With That, Climate Audit and Bishop Hill to put together an official press release from the environmental-sceptic movement, written in journalistic language, using phrases that encapsulate the ideas in everyday terms – phrases such as "Climategate: the leaked emails and code", "the smoking code" and "magic numbers"? Something that a non-scientist hack could convert into an article without making a fool of themselves.
What the sceptics need is a feed into the mainstream media that will put this important information across to the very large number of people who do not go on the web for their news and do not go near websites with a lot of technical language.
Who is it that organises the sceptic conferences in New York? This might be a good role for that organisation.

December 6, 2009 4:18 am

Invariant (03:31:01) :
From previous post:
Carsten Arnholm, Norway (02:57:28) : It really does not matter whether the people writing this code were “smart” or not. What matters is that the result is of very poor quality.
No. That’s complete nonsense! We need to COMPILE the code, EXECUTE the code, REVERSE ENGINEER how it is supposed to work together with the RAW data files in the dump in order to CONCLUDE exactly what it is doing.

No need to shout. I was commenting on your assertion that “smart” people wrote poor code, which indeed is wrong and a distraction.
Of course we need to figure out what the code is doing; nobody suggested anything else.

JMANON
December 6, 2009 4:18 am

G
If, as Fox News says, the EPA declares carbon dioxide a public danger, then where does that leave us?
At a guess: in a long hot summer drinking flat beer or flat Coke, with Champagne banned from sports winners' rostrums.
Since we all breathe out CO2 I guess that makes us all public dangers and we will have to wear breathing equipment to filter out the CO2 from our exhaled breath returning only the inert gases to the atmosphere – each day we will take our cartridges of liquefied CO2 for disposal at $20 a pop ($5 to Al Gore).
Oh, we may all have to wear warning signs and have audible alerts that sound to signal our presence to blind people who won’t be able to hear us above their own audible alerts and through wearing their own breathing sets.
Pets will no longer be allowed except maybe kangaroos (because they don’t emit methane http://news.bbc.co.uk/1/hi/uk/7551125.stm), but then pets are on the way out anyway because they have a huge carbon footprint (http://www.telegraph.co.uk/earth/environment/climatechange/6416683/Pet-dogs-as-bad-for-planet-as-driving-4x4s-book-claims.html) and we can no longer afford the land to produce their food because so much is given over to biodiesel crops that there is barely enough left over for human food, mainly rice.

durox
December 6, 2009 4:20 am

[Completely off topic. ~dbs, mod.]

anna v
December 6, 2009 4:20 am

woodfortrees (Paul Clark) (03:38:54) :
After the horse escapes from the barn, a lot of thought can be given to how it had been secured and whether the door was safe.
As a particle physicist I have written a lot of code – Fortran, that is, because that was the language of my time. We used to put a C in the first column of a comment line. If either of the two scenarios you give were true, there should have been a comment about it, of the form:
; temporary fudge to test the bla bla.
or
; temporary fudge to test extraneous forcings
and for good measure
; to be removed from normal runs.
The reason is that a lot of graduate students get hold of code, and it is not the job of the programmer to set riddles, but to leave a clear path for the next person wading in, checking and using the code.
Since these caveats are not there, the simplest explanation is that the code was used, or would be used, as-is for data processing – certainly by an unsuspecting graduate student.
I believe in KISS ( keep it simple stupid).

Jim
December 6, 2009 4:20 am

It seems (paranoia, perhaps) that all the major search engines are trying to play down the skeptic sites, news, etc. Has anybody else noticed? Delaying news stories, what comes up when typing "climategate", and so on?

Peter
December 6, 2009 4:23 am

The BIG question is why the ‘fudge’ code was there in the first place.
The excuses I’ve heard are:
a) "To see what effect it would have." But why would you need a computer program to tell you what effect multiplying a number by 2.5 would have?
b) "To facilitate calibration of the data." Hang on there. If you don't have reliable data for a period, you leave that period out of the calibration – you don't substitute made-up data.
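What he describes – dropping the suspect period rather than patching it – is simple in the script's own terms. A hypothetical sketch, reusing x and densall and the listing's own mknormal routine as a stand-in for the calibration step, and assuming (purely for illustration) that the divergence begins after 1960:

[sourcecode language="text"]
; Exclude a suspect period from the calibration instead of
; substituting adjusted values for it.
kl=where(x le 1960)              ; keep only the pre-divergence years
xcal=x(kl)
denscal=densall(kl)
mknormal,denscal,xcal,refperiod=[1881,1960]  ; calibrate on trusted years only
[/sourcecode]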

December 6, 2009 4:24 am

Marc Sheppard explains THE TRICK
http://www.americanthinker.com/2009/12/understanding_climategates_hid.html
This is where the focus has to be, along with the FOIA violations.

December 6, 2009 4:26 am

Sheppard:
"the decline Jones so urgently sought to hide was not one of measured temperatures at all, but rather figures infinitely more important to climate alarmists – those determined by proxy reconstructions."

Peter
December 6, 2009 4:29 am

IDL_chap:

To me, this looks like someone experimenting with the data, which doesn’t really mean anything.

It speaks volumes about the mindset of those involved.

December 6, 2009 4:32 am

Peter (04:23:00) :
The BIG question is why the 'fudge' code was there in the first place.
Absolutely spot on. And the word 'fudge' speaks volumes all by itself.

December 6, 2009 4:34 am

Anna V: Does not
;****** APPLIES A VERY ARTIFICIAL CORRECTION FOR DECLINE*********
amount to much the same caveat? Certainly, if I wanted to actually fudge something without anyone knowing, a 15-star comment wouldn't be my first thought.