Climategate: hide the decline – codified

WUWT blogging ally Ecotretas writes in to say that he has made a compendium of programming code segments that show comments by the programmer that suggest places where data may be corrected, modified, adjusted, or busted. Some of the HARRY_READ_ME comments are quite revealing. For those who don’t understand computer programming, don’t fret: the comments by the programmer tell the story quite well even if the code itself makes no sense to you.

To say that the CRU code might be “buggy” would be… well, I’ll just let CRU’s programmer tell you in his own words.


FOIA\documents\osborn-tree6\mann\oldprog\maps12.pro
FOIA\documents\osborn-tree6\mann\oldprog\maps15.pro

FOIA\documents\osborn-tree6\mann\oldprog\maps24.pro

; Plots 24 yearly maps of calibrated (PCR-infilled or not) MXD reconstructions
; of growing season temperatures. Uses "corrected" MXD - but shouldn't usually
; plot past 1960 because these will be artificially adjusted to look closer to
; the real temperatures.
FOIA\documents\harris-tree\recon_esper.pro
; Computes regressions on full, high and low pass Esper et al. (2002) series,
; anomalies against full NH temperatures and other series.
; CALIBRATES IT AGAINST THE LAND-ONLY TEMPERATURES NORTH OF 20 N
;
; Specify period over which to compute the regressions (stop in 1960 to avoid
; the decline
FOIA\documents\harris-tree\calibrate_nhrecon.pro
;
; Specify period over which to compute the regressions (stop in 1960 to avoid
; the decline that affects tree-ring density records)
;
FOIA\documents\harris-tree\recon1.pro
FOIA\documents\harris-tree\recon2.pro
FOIA\documents\harris-tree\recon_jones.pro

;
; Specify period over which to compute the regressions (stop in 1940 to avoid
; the decline
;
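The recon_*.pro comments above restrict the calibration regression to years before the decline. The following minimal ordinary-least-squares sketch in Python illustrates what that restriction does. The `calibrate` function and the synthetic series are hypothetical and are not taken from the CRU code.

```python
def calibrate(years, proxy, temp, start, end):
    """Fit temp = a + b * proxy by ordinary least squares, using only years in [start, end]."""
    pairs = [(p, t) for y, p, t in zip(years, proxy, temp) if start <= y <= end]
    n = len(pairs)
    mean_p = sum(p for p, _ in pairs) / n
    mean_t = sum(t for _, t in pairs) / n
    sxy = sum((p - mean_p) * (t - mean_t) for p, t in pairs)
    sxx = sum((p - mean_p) * (p - mean_p) for p, _ in pairs)
    b = sxy / sxx
    a = mean_t - b * mean_p
    return a, b


# Hypothetical data: the proxy tracks temperature exactly until 1960,
# then drifts below it (a stand-in for the post-1960 "decline").
years = list(range(1881, 1991))
temp = [0.01 * (y - 1881) for y in years]
proxy = [t if y <= 1960 else t - 0.02 * (y - 1960) for y, t in zip(years, temp)]

a_cut, b_cut = calibrate(years, proxy, temp, 1881, 1960)    # stop in 1960
a_full, b_full = calibrate(years, proxy, temp, 1881, 1990)  # use every year
print(f"1881-1960: a={a_cut:.3f}, b={b_cut:.3f}")
print(f"1881-1990: a={a_full:.3f}, b={b_full:.3f}")
```

With the window ending in 1960, the fit sees only the years where proxy and temperature agree. Including the later years changes the slope.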
FOIA\documents\HARRY_READ_ME.txt
17. Inserted debug statements into anomdtb.f90, discovered that
a sum-of-squared variable is becoming very, very negative! Key
output from the debug statements:
(..)
forrtl: error (75): floating point exception
IOT trap (core dumped)
..so the data value is unbfeasibly large, but why does the
sum-of-squares parameter OpTotSq go negative?!!
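A negative OpTotSq is consistent with a square computed in 32-bit signed INTEGER arithmetic that wraps around before it is added to the REAL total. The worked example below makes two assumptions. It uses a hypothetical stored value of 50000, and it assumes two's-complement wraparound. The Fortran standard itself does not define what happens on integer overflow.

```latex
\begin{aligned}
50000^2 &= 2\,500\,000\,000 > 2^{31} - 1 = 2\,147\,483\,647 \\
2\,500\,000\,000 - 2^{32} &= 2\,500\,000\,000 - 4\,294\,967\,296 = -1\,794\,967\,296
\end{aligned}
```

Adding that wrapped square to the running total pulls OpTotSq far below zero.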
FOIA\documents\HARRY_READ_ME.txt
22. Right, time to stop pussyfooting around the niceties of Tim's labyrinthine software
suites - let's have a go at producing CRU TS 3.0! since failing to do that will be the
definitive failure of the entire project..
FOIA\documents\HARRY_READ_ME.txt
getting seriously fed up with the state of the Australian data. so many new stations have been
introduced, so many false references.. so many changes that aren't documented. Every time a cloud forms I'm presented with a bewildering selection of similar-sounding sites, some with references, some with WMO codes, and some with both. And if I look up the station metadata with one of the local references, chances are the WMO code will be wrong (another station will have
it) and the lat/lon will be wrong too.
FOIA\documents\HARRY_READ_ME.txt
I am very sorry to report that the rest of the databases seem to be in nearly as poor a state as
Australia was. There are hundreds if not thousands of pairs of dummy stations, one with no WMO and one with, usually overlapping and with the same station name and very similar coordinates. I know it could be old and new stations, but why such large overlaps if that's the case? Aarrggghhh!
There truly is no end in sight.
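Harry describes paired dummy stations: same name, very similar coordinates, one with a WMO code and one without. That pattern suggests a simple screening pass. The minimal Python sketch below makes these assumptions:

- a hypothetical record layout (`id`, `name`, `lat`, `lon`, `wmo`);
- an arbitrary 0.1-degree coordinate tolerance;
- made-up sample stations.

It only flags candidates for a human to check. As Harry notes, some pairs may be genuine old/new stations.

```python
def candidate_duplicates(stations, tol_deg=0.1):
    """Return id pairs with the same name, near-identical coordinates,
    and exactly one member lacking a WMO code."""
    pairs = []
    for i, a in enumerate(stations):
        for b in stations[i + 1:]:
            if a["name"].strip().upper() != b["name"].strip().upper():
                continue
            # Simple lat/lon box test; ignores wraparound at +/-180 longitude.
            if abs(a["lat"] - b["lat"]) > tol_deg or abs(a["lon"] - b["lon"]) > tol_deg:
                continue
            if (a["wmo"] is None) != (b["wmo"] is None):
                pairs.append((a["id"], b["id"]))
    return pairs


# Hypothetical station records.
stations = [
    {"id": 1, "name": "SAMPLE CREEK", "lat": -23.80, "lon": 133.88, "wmo": 12345},
    {"id": 2, "name": "Sample Creek ", "lat": -23.82, "lon": 133.90, "wmo": None},
    {"id": 3, "name": "OTHER TOWN", "lat": -12.42, "lon": 130.89, "wmo": None},
]
print(candidate_duplicates(stations))  # [(1, 2)]
```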
FOIA\documents\HARRY_READ_ME.txt
28. With huge reluctance, I have dived into 'anomdtb' - and already I have
that familiar Twilight Zone sensation.
FOIA\documents\HARRY_READ_ME.txt
Wrote 'makedtr.for' to tackle the thorny problem of the tmin and tmax databases not
being kept in step. Sounds familiar, if worrying. am I the first person to attempt
to get the CRU databases in working order?!!
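The tmin/tmax problem is about keeping two databases aligned before deriving the diurnal temperature range (DTR = tmax - tmin). This is a hedged sketch of the general idea only; the actual logic of makedtr.for is not shown here. It keys both series on a hypothetical (station, year, month) tuple, computes DTR only where both values exist, and reports the records that are out of step.

```python
def make_dtr(tmin, tmax):
    """tmin, tmax: dicts keyed by (station, year, month).
    Returns (dtr, unmatched): DTR where both exist, and keys present in only one."""
    common = tmin.keys() & tmax.keys()
    dtr = {k: tmax[k] - tmin[k] for k in sorted(common)}
    unmatched = sorted(tmin.keys() ^ tmax.keys())
    return dtr, unmatched


# Hypothetical records for one station.
tmin = {("A", 1990, 1): 2.0, ("A", 1990, 2): 3.0}
tmax = {("A", 1990, 1): 11.5, ("A", 1990, 3): 15.0}
dtr, unmatched = make_dtr(tmin, tmax)
print(dtr)        # {('A', 1990, 1): 9.5}
print(unmatched)  # [('A', 1990, 2), ('A', 1990, 3)]
```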
FOIA\documents\HARRY_READ_ME.txt
Well, dtr2cld is not the world's most complicated program. Wheras cloudreg is, and I
immediately found a mistake! Scanning forward to 1951 was done with a loop that, for completely unfathomable reasons, didn't include months! So we read 50 grids instead of 600!!! That may have had something to do with it. I also noticed, as I was correcting THAT, that I reopened the DTR and CLD data files when I should have been opening the
bloody station files!!
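The cloudreg bug is easy to reproduce in miniature. This sketch assumes monthly grids stored one after another from a hypothetical start year of 1901, so that 50 years precede 1951, matching Harry's 50 versus 600. A skip loop over years alone stops 550 grids short and leaves the reader misaligned:

```python
def scan_forward(grids, start_year, target_year, include_months=True):
    """Skip sequentially stored monthly grids until target_year.
    Returns the number of grids skipped and the label of the next grid read."""
    it = iter(grids)
    skipped = 0
    for _year in range(start_year, target_year):
        # The buggy version advances once per year instead of once per month.
        for _month in range(12 if include_months else 1):
            next(it)
            skipped += 1
    return skipped, next(it)


# Hypothetical file contents: one (year, month) label per grid, 1901-2000.
grids = [(y, m) for y in range(1901, 2001) for m in range(1, 13)]
print(scan_forward(grids, 1901, 1951))                        # (600, (1951, 1))
print(scan_forward(grids, 1901, 1951, include_months=False))  # (50, (1905, 3))
```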
FOIA\documents\HARRY_READ_ME.txt
Back to the gridding. I am seriously worried that our flagship gridded data product is produced by
Delaunay triangulation - apparently linear as well. As far as I can see, this renders the station counts totally meaningless. It also means that we cannot say exactly how the gridded data is arrived at from a statistical perspective - since we're using an off-the-shelf product that isn't documented sufficiently to say that. Why this wasn't coded up in Fortran I don't know - time pressures perhaps? Was too much effort expended on homogenisation, that there wasn't enough time to write a gridding
procedure? Of course, it's too late for me to fix it too. Meh.
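With linear interpolation on a Delaunay triangulation, each grid point is a barycentric-weighted average of just the three stations at the corners of its enclosing triangle. The minimal sketch below covers only that last step. It assumes the triangulation has already been built, and `barycentric_interpolate` is a hypothetical name.

```python
def barycentric_interpolate(p, a, b, c, va, vb, vc):
    """Linearly interpolate a value at point p inside triangle (a, b, c).
    p, a, b, c are (x, y) tuples; va, vb, vc are the station values at a, b, c."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    wa = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    wb = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    wc = 1.0 - wa - wb
    return wa * va + wb * vb + wc * vc


# Hypothetical stations at three corners with values 0, 1 and 2.
tri = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
print(barycentric_interpolate((0.5, 0.0), *tri, 0.0, 1.0, 2.0))  # 0.5
print(barycentric_interpolate((0.0, 0.5), *tri, 0.0, 1.0, 2.0))  # 1.0
```

Stations outside the enclosing triangle contribute nothing to the point, however many of them lie in the surrounding grid cell. This is one way to read Harry's remark about station counts.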
FOIA\documents\HARRY_READ_ME.txt
Here, the expected 1990-2003 period is MISSING - so the correlations aren't so hot! Yet
the WMO codes and station names /locations are identical (or close). What the hell is
supposed to happen here? Oh yeah - there is no 'supposed', I can make it up. So I have :-)
FOIA\documents\HARRY_READ_ME.txt
Well, it's been a real day of revelations, never mind the week. This morning I
discovered that proper angular weighted interpolation was coded into the IDL routine, but that its use was discouraged because it was slow! Aaarrrgghh. There is even an option to tri-grid at 0.1 degree resolution and then 'rebin' to 720x360 - also deprecated! And now, just before midnight (so it counts!), having gone back to the tmin/tmax work, I've found that most if not all of the Australian bulletin stations have been unceremoniously dumped into the files
without the briefest check for existing stations.
FOIA\documents\HARRY_READ_ME.txt
As we can see, even I'm cocking it up! Though recoverably. DTR, TMN and TMX need to be written as (i7.7).
FOIA\documents\HARRY_READ_ME.txt
OH FUCK THIS. It's Sunday evening, I've worked all weekend, and just when I thought it was done I'm
hitting yet another problem that's based on the hopeless state of our databases. There is no uniform
data integrity, it's just a catalogue of issues that continues to grow as they're found.
FOIA\documents\osborn-tree6\mann\mxdgrid2ascii.pro
printf,1,'Osborn et al. (2004) gridded reconstruction of warm-season'
printf,1,'(April-September) temperature anomalies (from the 1961-1990 mean).'
printf,1,'Reconstruction is based on tree-ring density records.'
printf,1
printf,1,'NOTE: recent decline in tree-ring density has been ARTIFICIALLY'
printf,1,'REMOVED to facilitate calibration. THEREFORE, post-1960 values'
printf,1,'will be much closer to observed temperatures then they should be,'
printf,1,'which will incorrectly imply the reconstruction is more skilful'
printf,1,'than it actually is. See Osborn et al. (2004).'
FOIA\documents\osborn-tree6\summer_modes\data4sweden.pro

printf,1,'IMPORTANT NOTE:'
printf,1,'The data after 1960 should not be used. The tree-ring density'
printf,1,'records tend to show a decline after 1960 relative to the summer'
printf,1,'temperature in many high-latitude locations. In this data set'
printf,1,'this "decline" has been artificially removed in an ad-hoc way, and'
printf,1,'this means that data after 1960 no longer represent tree-ring'
printf,1,'density variations, but have been modified to look more like the'
printf,1,'observed temperatures.'
FOIA\documents\osborn-tree6\combined_wavelet_col.pro
;
; Remove missing data from start & end (end in 1960 due to decline)
;
kl=where((yrmxd ge 1402) and (yrmxd le 1960),n)
sst=prednh(kl)
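In IDL, `where` returns the indices that satisfy the condition and `n` receives their count. The two lines above therefore keep only 1402-1960 and drop everything after 1960. An equivalent Python sketch follows, with hypothetical placeholder data standing in for yrmxd and prednh:

```python
def select_period(years, values, first=1402, last=1960):
    """Python analogue of the IDL lines
         kl=where((yrmxd ge 1402) and (yrmxd le 1960),n)
         sst=prednh(kl)
    Returns the kept years, the kept values and the count n."""
    kl = [i for i, y in enumerate(years) if first <= y <= last]
    return [years[i] for i in kl], [values[i] for i in kl], len(kl)


yrmxd = list(range(1400, 1995))
prednh = [0.0] * len(yrmxd)  # placeholder values
kept_years, sst, n = select_period(yrmxd, prednh)
print(n, kept_years[0], kept_years[-1])  # 559 1402 1960
```

One difference: IDL's `where` returns -1 when nothing matches, whereas this sketch returns empty lists.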
FOIA\documents\osborn-tree6\mann\mxd_pcr_localtemp.pro
; Tries to reconstruct Apr-Sep temperatures, on a box-by-box basis, from the
; EOFs of the MXD data set. This is PCR, although PCs are used as predictors
; but not as predictands. This PCR-infilling must be done for a number of
; periods, with different EOFs for each period (due to different spatial
; coverage). *BUT* don't do special PCR for the modern period (post-1976),
; since they won't be used due to the decline/correction problem.
; Certain boxes that appear to reconstruct well are "manually" removed because
; they are isolated and away from any trees.
FOIA\documents\osborn-tree6\briffa_sep98_d.pro
;mknormal,yyy,timey,refperiod=[1881,1940]
;
; Apply a VERY ARTIFICAL correction for decline!!
;
yrloc=[1400,findgen(19)*5.+1904]
valadj=[0.,0.,0.,0.,0.,-0.1,-0.25,-0.3,0.,-0.1,0.3,0.8,1.2,1.7,2.5,2.6,2.6,$
  2.6,2.6,2.6]*0.75         ; fudge factor
(...)
;
; APPLY ARTIFICIAL CORRECTION
;
yearlyadj=interpol(valadj,yrloc,x)
densall=densall+yearlyadj
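The adjustment above is a piecewise-linear curve. `yrloc` holds 20 knot years (1400, then 1904 to 1994 in 5-year steps), and `valadj` holds 20 offsets scaled by 0.75. `interpol` evaluates that curve at each year in `x` before it is added to `densall`.

The Python sketch below reproduces that arithmetic with a hand-written linear interpolator rather than IDL itself. `x` and `densall` are hypothetical stand-ins. The clamping to the end segments for out-of-range years is an assumption about behaviour the excerpt does not exercise.

```python
import bisect

# Knots and offsets copied from briffa_sep98_d.pro.
YRLOC = [1400.0] + [i * 5.0 + 1904.0 for i in range(19)]
VALADJ = [v * 0.75 for v in [0., 0., 0., 0., 0., -0.1, -0.25, -0.3, 0., -0.1,
                             0.3, 0.8, 1.2, 1.7, 2.5, 2.6, 2.6, 2.6, 2.6, 2.6]]


def interpol(v, x, u):
    """Piecewise-linear interpolation of v(x) at each point in u (x ascending)."""
    out = []
    for ui in u:
        # Index of the segment [x[j], x[j+1]] containing ui, clamped to the end segments.
        j = bisect.bisect_right(x, ui) - 1
        j = min(max(j, 0), len(x) - 2)
        t = (ui - x[j]) / (x[j + 1] - x[j])
        out.append(v[j] + t * (v[j + 1] - v[j]))
    return out


def apply_correction(x, densall):
    """densall = densall + interpol(valadj, yrloc, x)"""
    yearlyadj = interpol(VALADJ, YRLOC, x)
    return [d + a for d, a in zip(densall, yearlyadj)], yearlyadj


x = [1900.0, 1929.0, 1961.5, 1974.0]
corrected, adj = apply_correction(x, [0.0] * len(x))
print([round(a, 4) for a in adj])  # [0.0, -0.1875, 1.0875, 1.95]
```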
FOIA\documents\osborn-tree6\summer_modes\pl_decline.pro
;
; Plots density 'decline' as a time series of the difference between
; temperature and density averaged over the region north of 50N,
; and an associated pattern in the difference field.
; The difference data set is computed using only boxes and years with
; both temperature and density in them - i.e., the grid changes in time.
; The pattern is computed by correlating and regressing the *filtered*
; time series against the unfiltered (or filtered) difference data set.
;
;*** MUST ALTER FUNCT_DECLINE.PRO TO MATCH THE COORDINATES OF THE
; START OF THE DECLINE *** ALTER THIS EVERY TIME YOU CHANGE ANYTHING ***
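The difference series described in the pl_decline.pro header can be sketched as a per-year mean of temperature minus density. The mean is taken only over grid boxes where both values exist, so the set of contributing boxes changes from year to year. The sketch makes three hypothetical choices:

- a dict-of-lists layout;
- `None` for missing boxes;
- boxes already restricted to north of 50N.

```python
def decline_series(temp, dens):
    """temp, dens: dict year -> list of per-box values (None = missing), same box order.
    Returns dict year -> mean(temp - dens) over boxes that have both values."""
    out = {}
    for year in sorted(temp.keys() & dens.keys()):
        diffs = [t - d for t, d in zip(temp[year], dens[year])
                 if t is not None and d is not None]
        if diffs:  # years with no overlapping boxes are left out
            out[year] = sum(diffs) / len(diffs)
    return out


# Hypothetical values for three boxes.
temp = {1960: [1.0, 2.0, None], 1970: [1.5, 2.5, 0.5], 1980: [None, 1.0, None]}
dens = {1960: [0.5, None, 3.0], 1970: [1.0, 1.5, 0.0], 1980: [2.0, None, 1.0]}
print(decline_series(temp, dens))  # {1960: 0.5, 1970: 0.6666666666666666}
```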
FOIA\documents\osborn-tree6\mann\oldprog\calibrate_correctmxd.pro
; We have previously (calibrate_mxd.pro) calibrated the high-pass filtered
; MXD over 1911-1990, applied the calibration to unfiltered MXD data (which
; gives a zero mean over 1881-1960) after extending the calibration to boxes
; without temperature data (pl_calibmxd1.pro). We have identified and
; artificially removed (i.e. corrected) the decline in this calibrated
; data set. We now recalibrate this corrected calibrated dataset against
; the unfiltered 1911-1990 temperature data, and apply the same calibration
; to the corrected and uncorrected calibrated MXD data.
FOIA\documents\osborn-tree6\summer_modes\calibrate_correctmxd.pro
; No need to verify the correct and uncorrected versions, since these
; should be identical prior to 1920 or 1930 or whenever the decline
; was corrected onwards from.
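The comment assumes the two versions agree before "1920 or 1930 or whenever". The year they start to differ can instead be located directly. This small sketch assumes both series are aligned year by year; the tolerance and sample values are hypothetical.

```python
def first_divergence(years, corrected, uncorrected, tol=1e-9):
    """Return the first year where the two series differ by more than tol, or None."""
    for y, c, u in zip(years, corrected, uncorrected):
        if abs(c - u) > tol:
            return y
    return None


# Hypothetical series that split in 1924.
years = list(range(1915, 1930))
uncorrected = [0.0] * len(years)
corrected = [0.0 if y < 1924 else -0.075 for y in years]
print(first_divergence(years, corrected, uncorrected))  # 1924
```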
FOIA\documents\osborn-tree5\densplus188119602netcdf.pro
; we know the file starts at yr 440, but we want nothing till 1400, so we
; can skill lines (1400-440)/10 + 1 header line
; we now want all lines (10 yr per line) from 1400 to 1980, which is
; (1980-1400)/10 + 1 lines
(...)
; we know the file starts at yr 1070, but we want nothing till 1400, so we
; can skill lines (1400-1070)/10 + 1 header line
; we now want all lines (10 yr per line) from 1400 to 1991, which is
; (1990-1400)/10 + 1 lines (since 1991 is on line beginning 1990)
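The line counts in these comments follow from the file layout they describe: ten years per line plus one header line. The small Python check of that arithmetic below assumes data lines begin on decade years aligned with 1400. The function names are hypothetical.

```python
def lines_to_skip(file_start_year, first_wanted_year, years_per_line=10, header_lines=1):
    """Header line(s) plus the data lines that precede first_wanted_year."""
    return (first_wanted_year - file_start_year) // years_per_line + header_lines


def lines_to_read(first_wanted_year, last_wanted_year, years_per_line=10):
    """Lines from the one starting at first_wanted_year up to the one holding last_wanted_year."""
    return (last_wanted_year - first_wanted_year) // years_per_line + 1


print(lines_to_skip(440, 1400), lines_to_read(1400, 1980))   # 97 59
print(lines_to_skip(1070, 1400), lines_to_read(1400, 1991))  # 34 60
```

Integer division reproduces the comment's "(since 1991 is on line beginning 1990)": 1991 and 1990 both give 60 lines.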

445 Comments

E.M.Smith

Editor

November 27, 2009 5:04 pm

politicoassassin (14:10:51) : 1) I don’t understand the technical issues referred to in the above notes …

Anyone who thinks they can draw a conclusion from it is just seeing what they want to see.
No. Some of us, like me, do understand the technical issues. We can very easily conclude:
1) Their code management stinks.
2) They have ‘goal seeking behaviour’ in their “science”.
3) They have no QA and the code is buggy.
4) The results are worse than worthless, they are a deception by design.

N Bhashyam

November 27, 2009 5:19 pm

It is very sad and disappointing to find highly distinguished professionals indulging in such unethical activities. Society expects better conduct from these seekers of truth. Falsehood and manipulation are expected to be foreign to these scientists, the cream of society.

E.M.Smith

Editor

November 27, 2009 5:23 pm

Bruckner8 (15:23:17) :
I’ve made a good living as a computer programmer, and I do stuff like that all the time. If you saw my code, you’d see comments like “The customer insists that I do this even though I know this makes the outcome skewed in a [positive/negative] direction.”
Oddly, the early parchment copies of things done by very dedicated monks sometimes have comments in the margin like: “I think this is sacrilege but I must copy the document faithfully”.
Seems that scribes of all ages have had that in common… “The customer wants it this way, despite my protests, so I’m doing it, but don’t blame me…”
If only the patrons were so bound to honesty and fidelity …

Yan

November 27, 2009 5:31 pm

Let's be clear here. Climate change does take place. It always has and always will. A proper look into the past clearly shows this. However, CO2 has so very little to do with climate change. Nature has to do with climate change. Ask any of these people how much cap and trade will reduce changes in the climate and listen closely to the answers they try to give you… if any.
Climate change is natural; we must accept that, and then we can move past that idea to see through this cap and trade nonsense for what it is: an attempt to control, and to make ridiculous profits for some.
Create a problem that does not truly exist and then manufacture a way to make money from it. This can also be observed throughout history.
Al Gore stands to make billions, and that doesn’t account for the money used so far to fund this shady science premise.
Still no MSM coverage save one little CNN video and FOX’s coverage.
The biggest global money making scheme ever and based on what?
In the long run this will make the bailouts look like chump change, and freedom will continue to erode.
I wish someone could stop these globalist idiots.

E.M.Smith

Editor

November 27, 2009 6:02 pm

DaveE (15:41:04) : E.M.Smith is better qualified to answer this & I believe GISTemp does the same thing with station Temps. You note that Months with 9999 are ignored but what isn’t obvious from that is that there may only be one day of readings missing. The obvious thing to do would be to take the average of the adjacent days to salvage the month but they don’t, they just write the month off.
GIStemp gets the data from NOAA (in what it turns out is the NCDC product of GHCN) already rolled up into a single “monthly” datum. One has to swim upstream to NCDC / NOAA / GHCN to find out how they decided to put a “missing data flag” in a month (-9999 or sometimes 9999 in some data sets and some steps of GIStemp).
So to the assertion that a single missing day might cause a missing data flag, I cannot speak (yet…). Heck, NCDC could choose to simply fill in any month with a ‘missing data flag’ if it fails their “QA Tests”. (They do something along those lines already.) But by the time it gets to GHCN and thus to GIStemp there is no daily detail left.
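DaveE's point is about how a monthly value is built from daily readings. The hedged sketch below contrasts the two policies he describes: writing the month off, or filling an isolated missing day from its neighbours. The -9999 flag value comes from the discussion above. The function and its fill rule are hypothetical and are not NCDC's documented procedure.

```python
MISSING = -9999


def monthly_mean(daily, fill_gaps=False):
    """daily: list of daily temperatures with None for missing days.
    Returns the monthly mean, or MISSING if any day is still missing."""
    days = list(daily)
    if fill_gaps:
        for i in range(1, len(days) - 1):
            # Fill only an isolated gap whose two neighbours are both present.
            if days[i] is None and days[i - 1] is not None and days[i + 1] is not None:
                days[i] = (days[i - 1] + days[i + 1]) / 2
    if any(d is None for d in days):
        return MISSING
    return sum(days) / len(days)


month = [1.0, 2.0, None, 4.0]
print(monthly_mean(month))                  # -9999
print(monthly_mean(month, fill_gaps=True))  # 2.5
```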
To the issue of “data creation”: It is RAMPANT throughout the entire GISS and HadCRUt process. There are so many holes in the data by time and by space that they have no choice but to pick one:
1) Admit they have no hope of creating a “global temperature” for any significant length of time.
2) Make up ‘temperature values’ for the 80% or so of time and space that are missing. (The southern hemisphere is substantially empty for the first half of the temperature record and is still remarkably blank. Everything within about 20 degrees of the North Pole is fabricated. Etc.)
The links in: http://chiefio.wordpress.com/2009/11/09/gistemp-a-human-view/
cover it pretty well in detail, while being readable at the top level by anyone.
Especially see the graph here:
http://chiefio.wordpress.com/2009/02/24/so_many_thermometers_so_little_time/
and the coverage charts here:
http://chiefio.wordpress.com/2009/11/03/ghcn-the-global-analysis/
Talking of E.M. Smith. I believe he’s fixed the -ve sum of squares problem, though I’ve not been over there to find out.
Yes, I have. A couple of different ways 🙂
It’s a ‘square of integers’ (which can have overflow) problem. There is a commenter “Steve” who asserts it was just a single bad data item and that “Harry Readme” removing it is all it takes to “fix it”. Totally insufficient. There was one bad data item big enough to cause an integer overflow, but there could just as easily be others that did not cause a crash like the one that led “Harry Readme” to pluck that bad datum from the set (i.e. there could still be bogus values not yet found).
There are 3 levels of fix:
1) Range check in the program (i.e. catch broken large data before it causes an overflow).
2) The “square of INTEGERs” gets stuck into a floating point number (so an implicit “cast to float” is done). Just change the INTEGERs into FLOATs before the squaring (“cast to float” first) and you eliminate the overflow (IEEE-compliant floating point math does not wrap around to a negative value; at worst it saturates at infinity), though you might still have wrong, too-large data in your input. Not strictly needed if range checking is perfect, but a nice bit of robustness anyway. Belt and Suspenders, don’t you know…
3) Write a “preening” program to check for insane data in the input file prior to running. This can be more detailed than the basic range checks in the program itself. (I.e. the program might just check temps between -90 and +60 C, while the ‘preening’ step might assume even 0 C was too warm and wrong at the South Pole while -89 C was possible, yet at the equator might accept nothing below 10 C unless at altitude for hot countries.)
This lets you run the ‘preening’ as a distinct step for debug, data quality report and assessment, efficiency, etc.
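Levels 1 and 3 above can be sketched as two separate checks:

- a coarse range test that the main program could apply to every value;
- a stricter, location-aware preening pass, run as its own step, that reports why each record was flagged.

The -90 to +60 C range comes from the comment. The record layout, the sample readings, and the South Pole and equatorial thresholds are illustrative assumptions, not rules used by CRU, NCDC or GIStemp.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    station: str
    lat: float     # degrees north
    elev_m: float  # station elevation in metres
    temp_c: float


def in_global_range(temp_c, lo=-90.0, hi=60.0):
    """Level 1: coarse sanity check suitable for the main program."""
    return lo <= temp_c <= hi


def preen(readings):
    """Level 3: stricter, location-aware checks run as a separate pass.
    Returns (station, reasons) for every reading that fails a check."""
    flagged = []
    for r in readings:
        reasons = []
        if not in_global_range(r.temp_c):
            reasons.append("outside -90..+60 C")
        if r.lat <= -85.0 and r.temp_c >= 0.0:
            reasons.append("too warm for the South Pole")
        if abs(r.lat) <= 10.0 and r.elev_m < 1000.0 and r.temp_c < 10.0:
            reasons.append("too cold for a lowland equatorial site")
        if reasons:
            flagged.append((r.station, reasons))
    return flagged


readings = [
    Reading("A", -90.0, 2800.0, 1.0),   # flagged: warm South Pole
    Reading("B", 0.5, 10.0, 5.0),       # flagged: cold lowland equator
    Reading("C", 0.5, 3000.0, 5.0),     # accepted: high-altitude site
    Reading("D", 45.0, 100.0, 5000.0),  # flagged: absurdly large value
]
for station, reasons in preen(readings):
    print(station, reasons)
```

Keeping `preen` separate from the main computation means it can be run on its own to produce a data-quality report, as the comment suggests.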

E.M.Smith

Editor

November 27, 2009 6:51 pm

Jeff C. (20:21:17) : Thanks for your other comments regarding the code. Anyone know if there is a thread somewhere dedicated to the CRUt code? It would be nice to pool information without having to wade through 300+ comments.
I’ve got a little one going. Started with the one program in the URL, but there is a link to an online archive of the file structure with all the code populated and the “guidance” is to put the link for a particular program you want to comment about in your comment and then add your observations.
If anyone knows of a ‘bigger discussion group’ doing code review, feel free to add a pointer to it in a comment.
http://chiefio.wordpress.com/2009/11/25/crut-fromexcel-f90-program-listing/

E.M.Smith

Editor

November 27, 2009 7:06 pm

Raredog (00:15:14) : The biggest problem though is that raw temperature data is increasingly hard to access. NASA GISS Temp had raw figures from around the world presented numerically and as graphs, which made this an easy exercise but they took those pages down some time ago (they could be up again).
GIStemp does not produce nor take in “raw” data. It takes in GHCN (aka NCDC already adjusted and massaged data) and produces more massaged anomaly maps. You can download the GHCN dataset (aka NCDC data) from their FTP site, but they have deleted close to 90% of the cold thermometer records (for recent years only, leaving them in the older baseline periods…).
See:
http://chiefio.wordpress.com/2009/11/03/ghcn-the-global-analysis/
that includes a link to the NCDC / NOAA / GHCN ftp download.
That’s as close to “raw” as I’ve been able to find so far. They do have some other data sets on the NCDC site that may include more (there are some daily values for example) but I’ve not had time to wander through it all.
See:
http://lwf.ncdc.noaa.gov/oa/climate/surfaceinventories.html
Happy Wading …

E.M.Smith

Editor

November 27, 2009 7:22 pm

Rabe (02:54:21) : Floating point numbers don’t “overflow” this way. They just stay at (+)INF in case of about 10^300 even in fortran (real*8).
They didn’t square a float, they square an INT then stuff it into a float. The INT overflows to a negative on the squaring…
from down in the comments of:
http://chiefio.wordpress.com/2009/11/21/hadley-hack-and-cru-crud/

Further, as to your assertion that changing a bad data item “fixed it” and that “the code now works”: it does not. It is just as broken as it ever was. From the code:

integer, pointer, dimension (:,:,:) :: Data,DataA,DataB,DataC
...
real :: OpVal,OpTot,OpEn,OpTotSq,OpStdev,OpMean,OpDiff, ...
...
OpTotSq=OpTotSq+(DataA(XAYear,XMonth,XAStn)**2)

The “square an INT” and stuff it into a REAL running total is still there.
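The sketch below contrasts squaring the INTEGER first with converting to floating point first. Python has no fixed-width integers, so each product is wrapped to 32 bits by hand. The sketch assumes:

- 32-bit two's-complement wraparound, which is typical for compiled Fortran but not guaranteed by the standard;
- a double-precision accumulator, where the Fortran `real` is usually single precision;
- hypothetical data values.

```python
def to_int32(x):
    """Wrap a Python int to a signed 32-bit value, as two's-complement hardware would."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def sum_sq_square_then_cast(data):
    """Mirrors OpTotSq=OpTotSq+(DataA(...)**2): INTEGER square, then add to REAL."""
    total = 0.0
    for v in data:
        total += float(to_int32(v * v))
    return total


def sum_sq_cast_then_square(data):
    """The 'cast to float first' fix: convert, then square."""
    total = 0.0
    for v in data:
        fv = float(v)
        total += fv * fv
    return total


data_a = [100, 200, 50000]
print(sum_sq_square_then_cast(data_a))  # -1794917296.0
print(sum_sq_cast_then_square(data_a))  # 2500050000.0
```

Casting first removes the wraparound. It does not remove the bad 50000 itself, which is why the range check and preening steps are still needed.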

matthew

November 27, 2009 11:52 pm

My apologies, Glen, I meant to say you’re a liar by your own standards AND everything that happens to you is somebody else’s fault, poor baby.
“Harry” is commentary on the same source code. It is no different to the comments directly in the code that have been discussed ad infinitum above.
Eric was perfectly within justifiable bounds to suggest the comments might just be indications of opinions on the work, not implications about its purpose, hidden or otherwise, even if that isn’t the general opinion of the board.

Glenn

November 28, 2009 11:42 am

matthew (23:52:22) :
“My apologies, Glen, I meant to say you’re a liar by your own standards AND everything that happens to you is somebody else’s fault, poor baby.
“Harry” is commentary on the same source code. It is no different to the comments directly in the code that have been discussed ad infinitum above.
Eric was perfectly within justifiable bounds to suggest the comments might just be indications of opinions on the work, not implications about its purpose, hidden or otherwise, even if that isn’t the general opinion of the board.”
*********************
You’re not in a position to determine what is justifiable here, after breaking etiquette rules, making rude unfounded accusations followed by intentional misrepresentation, further obfuscation and misrepresentation… you’ve got it all, warmer. Sounds like you’d fit right in with those in the email files.
Eric made incorrect, misleading and preposterous statements. One was that Mann had “grafted” the instrumental record on his graph, defending what he tried to characterize as a valid technique.
Eric said posters were referencing comments IN the code that “indicate frustration at the large data set”. There are no such comments in the actual code.
Eric claimed the comments IN the code “clearly don’t indicate a fraud”, but clearly the comments in the actual code DO indicate fraud.
This was clearly an attempt to “hide the fraud” seen in the code, by use of the strawman that they “indicate frustration with the database”. The tactic is not unlike the one used in Eric’s attempt to legitimize Mann’s hockey stick.
Now to you.
You claim that the comments in Harry are “no different” than comments “directly” in the code. But general comments in a programmer log are not the same as specific comments in source code. They ARE “different”.
You claim Eric is justified in “suggesting the comments” do not indicate fraud. But they do. The comment about databases does not, but then again that comment is not a source code comment but a “Harry” comment. You tried to characterize it as “source text”.
See the pattern between your argument and Eric’s, matthew?

John Q. Public

November 28, 2009 11:58 am

Climategate
Let’s see:
a) subverting the peer review process
b) stacking the UN IPCC
c) obstruction of the Freedom of Information Act
d) breach of university and state ethics codes
… and we haven’t even talked about the data yet.
Climate Science – the new Ponzi scheme!
p.s. – Is this what Science is all about? Meet the new boss (science), same as the old boss (religion). When are they issuing funny hats to scientists?
p.p.s. – Who needs Wall Street when you have Science?

Bill P

November 28, 2009 5:28 pm

Looking at the Harry Read Me files online,
http://di2.nu/foia/HARRY_READ_ME-0.html
it appears they incorporate other (likely huge) data files. I can imagine that they had so many chances to corrupt their data, if that were their intent. A scientist with a warming agenda could presumably:
core a strip bark tree where an obvious, recent bulge on one side would yield wider rings.
select (from many cores) the ones with visibly higher density in modern periods.
“analyze” the ring widths with a prejudice to fudging the numbers.
pay off the data entry person to accidentally show more weight in the 20th century.
hire a data programmer to write the codes, presumably one who is not squeamish about such things, or disgruntled about his job…
I don’t know if that characterizes this fellow or not. Did he want someone to read this stuff?
Has this been analyzed by an expert data programmer to fully explicate what was done? (Some sort of comparison of the graphs that were created here with what would have been created without fudging?)
And (I suppose) the naive question of the year: is there any uncorrupted data still available?

treeringer

November 28, 2009 8:24 pm

OK people – you have to understand the problems tree ring researchers are faced with to understand what this program does. This is called understanding the ‘problem domain’.
MXD = maximum latewood density
It is a known fact that tree ring width is a function of temperature. However, it is also a known fact that as a tree ages, the density of tree rings begins to DECLINE after the tree reaches a certain age. This presents a problem if one is to use tree rings to determine temperatures from hundreds of years ago.
A way around this problem is to take tree ring data from trees that have grown recently (ie. last hundred years) and compare it to actual, observed and measured temperatures, where the tree was alive during the years where temperatures were measured.
You can see where the decline in tree ring density starts and compare it to actual temperature series.
Armed with this new data, you now have a way to ‘calibrate’ your analysis of older tree rings, perhaps hundreds of years old.
It would appear from some of the programmer’s comments that the decline in tree ring density in the samples he has corresponds to the years 1960, 1940, etc. So he has to ‘hard code’ his calibration technique for tree rings after that date, so they reflect actual observed and measured temperatures.
This gives you the ability to look at the tree density of rings from hundreds of years ago, and know that it represents a certain temperature, because you’ve observed the same tree ring density that corresponds to actual measured temperatures.
It’s valid science, and it’s logical and reasonable.
Everyone here is taking this WAY out of context. Without seeing the program in its entirety, it’s impossible to put in the proper context and determine exactly what it does.
Again, he’s not hiding a decline in temperatures, but a decline in tree ring densities!!!
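The calibration idea treeringer describes, fitting proxy values against measured temperatures only over a chosen window (the code comments quoted in the post say to “stop in 1960 to avoid the decline”), can be sketched as an ordinary least-squares fit. This is a generic illustration, not CRU’s code: the function names, the single-predictor linear model, and the toy numbers are all assumptions.

```python
def fit_calibration(years, mxd, temps, start, end):
    """Least-squares fit of temp = a + b * mxd over start <= year <= end.

    Hypothetical helper: a single-predictor linear model is assumed; the
    actual CRU routines may differ.
    """
    pairs = [(x, t) for y, x, t in zip(years, mxd, temps) if start <= y <= end]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two years in the calibration window")
    mean_x = sum(x for x, _ in pairs) / n
    mean_t = sum(t for _, t in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in pairs)
    if sxx == 0:
        raise ValueError("proxy values are constant over the window")
    b = sxt / sxx
    a = mean_t - b * mean_x
    return a, b


def reconstruct(mxd, a, b):
    """Apply the fitted relation to proxy values from any period."""
    return [a + b * x for x in mxd]


if __name__ == "__main__":
    # Toy data: density tracks temperature through 1960, then diverges.
    years = [1957, 1958, 1959, 1960, 1961, 1962]
    mxd = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0]
    temps = [5.0, 8.0, 11.0, 14.0, 17.0, 20.0]

    print(fit_calibration(years, mxd, temps, 1957, 1960))  # (2.0, 3.0)
    print(fit_calibration(years, mxd, temps, 1957, 1962))  # slope below 3
```

With these toy numbers, extending the window to 1962 lowers the fitted slope to about 2.45, because the post-1960 densities no longer track temperature. Whether truncating the window is justified is the question the commenters here are debating.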

John M

November 29, 2009 7:30 am

treeringer (20:24:35) :

Without seeing the program in its entirety, it’s impossible to put in the proper context and determine exactly what it does.

Really?

MattR

November 29, 2009 9:49 am

I used to work in parallel scientific computing, and I wish I could still be there, but one of the things that really bothered me was how poor the code quality was. What I saw was nothing compared to the Harry description. I don’t believe that Harry was trying to cook the books. (The “hide the decline” stuff is another issue.) Harry was trying to do the right thing, which was to generate an accurate model, but he was set up to fail. He had a pile of numbers and a vague description of what they represented. He was making hundreds of assumptions about the data, and they aren’t all correct. So the resulting model is questionable. It might show things as worse than they really are, or better. We have no idea.
Before we change the world’s economy it would be prudent to do a rather large data and code review.

John A. Jauregui

November 29, 2009 3:15 pm

Speaking of tree ring proxies, the ONE thing Prof. M. Mann’s bristlecone-proxied “hockey stick” study proved was that nothing has done more to GREEN (verb) the planet over the past few decades than elevated levels of atmospheric CO2 in the presence of moderate sun-driven warming.

Bart

November 29, 2009 7:12 pm

treeringer (20:24:35) :
“It would appear from some of the programmers comments… It’s valid science, and it’s logical and reasonable.”
So, subjective and arbitrary revisions, for which the only documentation is what one may glean from what “appears” to be in computer code comments, based on some low-level computer jockey’s eyeballing of data, are “logical and reasonable” “valid science”?
Not where I come from, Bub.

Dennis

November 30, 2009 5:16 am

And to think Anthony and others were laughed at when they suggested an ISO 9000 overview.

jm

November 30, 2009 8:49 am

Nearly all the comments here are making the unwarranted assumption that if the data were 100% complete and needed no interpolations or adjustments for factors such as the well-known tree ring issue (explained above by treeringer at 20:24:35), then it would show no warming.
Except for paranoia, there is no particular reason to believe that.
It is equally possible that if the data were complete — and so complete that there would be no need to use indirect proxies for temperature such as tree-ring thickness (which inherently require calibration) — we would see even more clear evidence of global warming. There are numerous indications of global warming completely independent of these temperature records.
Because this is such an important issue, the proper response is not to deny global warming, but to demand a substantial increase in expenditures to collect better data and more thoroughly and carefully analyze what we have.
The real scandal revealed by these code comments is that analyses of extreme importance to the world are ridiculously underfunded.

TJA

November 30, 2009 10:52 am

“he’s not hiding a decline in temperatures, but a decline in tree ring densities!!!”
Sure, treeringer, whatever. I guess there are formulas, documented and widely used within the specialty, for adjusting the density numbers? Why then would the programmer use the word “artificial”?

TJA

November 30, 2009 10:54 am

Adjusting the density number for the age of the trees, I mean. This seems like a phenomenon that would be known and measured and calculable.

John M

November 30, 2009 4:12 pm

jm (08:49:56) :

The real scandal revealed by these code comments is that analyses of extreme importance to the world are ridiculously underfunded.

You’re kidding right?
But let’s say for the sake of your argument that it was underfunded.
Isn’t that the usual state of affairs for a “settled” science?

Michael Santomauro

November 30, 2009 8:57 pm

Lies, damned lies, and statistics.
The falsification of data and the conspiracy to commit same, etc., constitute serious criminal activity. Further, the granting of public funds for research warrants a federal investigation. I’m hoping the perpetrators, including possibly Professor Michael Mann, director of Pennsylvania State University’s Earth System Science Centre and a regular contributor to the popular climate science blog Real Climate, and their facilitators will be tracked down and prosecuted to the fullest extent the law allows. — Michael Santomauro, Publisher of “Debating The Holocaust: A New Look At Both Sides” by Thomas Dalton

Anne Sheehan

November 30, 2009 11:25 pm

Harry’s a hero. He may not have intended to, but he’s done the world a huge favour – without publicly verifiable data, the climate change debate is meaningless.
Enough crap computer modelling, let’s concentrate on getting decent data.
And yes, some of us are verbose programmers, especially after a long frustrating weekend trying to clean up some other idiot’s mess, while they’re at the beach.

ZeeZee

December 1, 2009 7:33 am

After reading through Harry’s notes, I had a lot of sympathy for him. It’s not easy puzzling through a mess like this. It’s worse when you conclude your own organization’s data is unusable.
It’s easy to conclude his predecessor, apparently somebody named Tim, was responsible, but I couldn’t help but wonder if Tim was in a similar position as Harry.