Distribution analysis suggests GISS final temperature data is hand edited – or not

UPDATE: As I originally mentioned at the end of this post, I thought we should “give the benefit of the doubt” to GISS, as there may be a perfectly rational explanation. Steve McIntyre indicates that he has also done an analysis and doubts the other findings:

I disagree with both Luboš and David and don’t see anything remarkable in the distribution of digits.

I tend to trust Steve’s intuition and analysis skills, as his track record has been excellent. So at this point we don’t know what the root cause is, or even whether there is any human touch to the data. But as Lubos said on CA, “there’s still an unexplained effect in the game”.

I’m sure it will get much attention as the results shake out.

UPDATE2: David Stockwell writes in comments here:

Hi,

I am gratified by the interest in this very preliminary analysis. There are a few points from the comments above.

1. False positives are possible, for a number of reasons.

2. Even though data are subjected to arithmetic operations, distortions in digit frequency at an earlier stage can still be observed.

3. The web site is still in development.

4. One of the deviant periods in GISS seems to be around 1940, the same as the ‘warmest year in the century’ and the ‘SST bucket collection’ issues.

5. Even if in the worst case there was manipulation, it wouldn’t affect AGW science much. The effect would be small. Its about something else. Take the Madoff fund. Even though investors knew the results were managed, they still invested because the payouts were real (for a while).

6. To my knowledge, no one has succeeded in exactly replicating the GISS data.

7. I picked that file as it is the most used – global land and ocean. I haven’t done an extensive search of files as I am still testing the site.

8. Lubos replicated this study more carefully, using only the monthly series, and got the same result.

9. Benford’s law (on the first digit) has a logarithmic distribution, and really only applies to data spanning many orders of magnitude. Measurement data that often has a constant first digit doesn’t work, although the second digit seems to. I don’t see why the last digit wouldn’t work; it should approach a uniform distribution according to Benford’s postulate.

That’s all for the moment. Thanks again.
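Stockwell’s point 9 is easy to illustrate with a short simulation. Here is a minimal sketch in Python (not Stockwell’s or WikiChecks’ code) that generates multiplicative data spanning several orders of magnitude, compares first-digit frequencies to Benford’s logarithmic law, and shows the final digits coming out roughly uniform:

```python
# Minimal illustration of Benford's first-digit law vs. uniform final digits.
# This is a toy simulation, not any of the analyses discussed in this post.
import math
import random
from collections import Counter

random.seed(0)
# Multiplicative (log-uniform) data spanning several orders of magnitude, 1 to 100,000.
values = [10 ** random.uniform(0, 5) for _ in range(100000)]
n = len(values)

first = Counter(str(v)[0] for v in values)          # leading digit
last = Counter(("%.2f" % v)[-1] for v in values)    # final digit after rounding to 2 places

print("digit  Benford P(d)  first-digit freq  last-digit freq")
for d in range(1, 10):
    benford = math.log10(1 + 1 / d)                 # Benford probability for digit d
    print("%5d  %12.3f  %16.3f  %15.3f"
          % (d, benford, first[str(d)] / n, last[str(d)] / n))
```

The first-digit column tracks the logarithmic Benford probabilities (about 30% for 1, under 5% for 9), while the last-digit column hovers near 10% for every digit, which is the uniform expectation the WikiChecks test relies on.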


This morning I received an email outlining some work David Stockwell has done checking the GISS global Land-Ocean temperature dataset:

Detecting ‘massaging’ of data by human hands is an area of statistical analysis I have been working on for some time, and devoted one chapter of my book, Niche Modeling, to its application to environmental data sets.

The WikiChecks web site now incorporates a script for doing a Benford’s analysis of digit frequency, sometimes used in numerical analysis of tax and other financial data.

The WikiChecks site says:

‘Managing’ or ‘massaging’ financial or other results can be a very serious deception. It ranges from rounding numbers up or down, to total fabrication. This system will detect the non-random frequency of digits associated with human intervention in natural number frequency.

Stockwell runs a test on GISS and writes:

One of the main sources of global warming information, the GISS data set from NASA, showed significant management, particularly an excess of zeros and ones. Interestingly, the moving window mode of the algorithm identified two years, 1940 and 1968 (see here).

You can run this test yourself: visit the WikiChecks web site and paste the URL for the GISS dataset

http://data.giss.nasa.gov/gistemp/tabledata/GLB.Ts+dSST.txt

into it and press submit. Here is what you get as output from WikiChecks:

GISS

Frequency of each final digit: observed vs. expected

Digit:         0     1     2     3     4     5     6     7     8     9   Totals
Observed:    298   292   276   266   239   265   257   228   249   239     2609
Expected:    260   260   260   260   260   260   260   260   260   260     2609
Variance:   5.13  3.59  0.82  0.08  1.76  0.05  0.04  4.02  0.50  1.76    17.75
Significant: digit 0 (*), digit 1 (.), digit 7 (*)

Statistic     DF   Obtained   Prob    Critical
Chi Square     9      17.75   <0.05      16.92
RESULT: Significant management detected. Significant variation in digit 0: (Pr<0.05) indicates rounding up or down. Significant variation in digit 1: (Pr<0.1) indicates management. Significant variation in digit 7: (Pr<0.05) indicates management.
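For anyone who would rather reproduce the digit counting locally instead of pasting the URL into WikiChecks, here is a rough Python sketch. It is not the WikiChecks code, and the parsing (a leading four-digit year followed by numeric anomaly values, with non-numeric tokens skipped) is an assumption about the file layout, so the counts will not exactly match the table above:

```python
# Rough sketch of a last-digit frequency test on the GISS file.
# The file-layout assumptions here are guesses, not the WikiChecks method.
import re
import urllib.request
from collections import Counter
from scipy.stats import chisquare

URL = "http://data.giss.nasa.gov/gistemp/tabledata/GLB.Ts+dSST.txt"
text = urllib.request.urlopen(URL).read().decode("utf-8", errors="replace")

digits = []
for line in text.splitlines():
    tokens = line.split()
    # Keep only data rows, assumed to start with a four-digit year.
    if not tokens or not re.fullmatch(r"\d{4}", tokens[0]):
        continue
    if len(tokens) > 1 and tokens[-1] == tokens[0]:
        tokens = tokens[:-1]                      # some rows repeat the year at the end
    for tok in tokens[1:]:                        # skip the year column itself
        if re.fullmatch(r"-?\d+(\.\d+)?", tok):   # numeric anomalies; "****" markers skipped
            digits.append(tok[-1])                # final digit as a character

counts = Counter(digits)
observed = [counts.get(str(d), 0) for d in range(10)]
expected = [sum(observed) / 10.0] * 10            # uniform expectation for final digits

stat, p = chisquare(observed, f_exp=expected)
print("observed counts:", observed)
print("chi-square = %.2f on 9 df, p = %.4f" % (stat, p))
```

The chi-square statistic and p-value will differ somewhat from the WikiChecks output, since that tool parses the raw page text with its own rules, but the same kind of digit imbalance should show up.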

Stockwell writes of the results:

The chi-square test is prone to produce false positives for small samples. Also, there are a number of innocent reasons that digit frequency may diverge from expected. However, the tests are very sensitive. Even if arithmetic operations are performed on data after the manipulations, the ‘fingerprint’ of human intervention can remain.

I also ran it on the UAH data and RSS data and it flagged similar issues, though with different deviation scores. Stockwell did the same and writes:

The results, from lowest deviation to highest, are listed below.

RSS – Pr<1

GISS – Pr<0.05

CRU – Pr<0.01

UAH – Pr<0.001

Numbers such as missing values in the UAH data (-99.990) may have caused its high deviation. I don’t know about the others.
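The sentinel effect is easy to demonstrate. Below is a tiny, purely hypothetical example showing how a block of repeated -99.990 missing-value markers inflates the count of one final digit and drives up the chi-square statistic:

```python
# Hypothetical demonstration: repeated missing-value sentinels skew final digits.
from collections import Counter
from scipy.stats import chisquare

clean = [str(d) for d in range(10)] * 50        # perfectly uniform last digits (500 values)
with_sentinels = clean + ["-99.990"] * 40       # 40 sentinel entries, all ending in '0'

for label, tokens in (("clean", clean), ("with sentinels", with_sentinels)):
    counts = Counter(t[-1] for t in tokens)
    observed = [counts.get(str(d), 0) for d in range(10)]
    stat, p = chisquare(observed)               # default expectation is uniform
    print("%-15s chi-square = %6.2f  p = %.4f" % (label, stat, p))
```

The clean series passes easily; adding the sentinels pushes the zero count far above expectation and the test flags the series, even though nothing was “managed”.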

Not being familiar with this mathematical technique, I could do little to confirm or refute the findings, so I let it pass until I could get word of replication from some other source.

It didn’t take long. About two hours later, Lubos Motl of The Reference Frame posted results he obtained independently via another method when he ran some checks of his own:

David Stockwell has analyzed the frequency of the final digits in the temperature data by NASA’s GISS led by James Hansen, and he claims that the unequal distribution of the individual digits strongly suggests that the data have been modified by a human hand.

With Mathematica 7, such hypotheses take a few minutes to be tested. And remarkably enough, I must confirm Stockwell’s bold assertion.

But that’s not all, Lubos goes on to say:

Using the IPCC terminology for probabilities, it is virtually certain (more than 99.5%) that Hansen’s data have been tempered with.

To be fair, Lubos runs his test on UAH data as well:

It might be a good idea to audit our friends at UAH MSU where Stockwell seems to see an even stronger signal.

In plain English, I don’t see any evidence of man-made interventions into the climate in the UAH MSU data. Unlike Hansen, Christy and Spencer don’t seem to cheat, at least not in a visible way, while the GISS data, at least their final digits, seem to be of anthropogenic origin.

Steve McIntyre offered an explanation based on the way rounding occurs when converting from Fahrenheit to Celsius, but Lubos can’t seem to reproduce the exact pattern he sees in the GISS data from rounding alone:

Steve McIntyre has immediately offered an alternative explanation of the non-uniformity of the GISS final digits: rounding of figures calculated from other units of temperature. Indeed, I confirmed that this is an issue that can also generate a non-uniformity, up to 2:1 in the frequency of various digits, and you may have already downloaded an updated GISS notebook that discusses this issue.

I can’t get 4 and 7 underrepresented, but there may exist a combination of two roundings that generates this effect. If this explanation is correct, it is the result of a much less unethical approach by GISS than the explanation above. Nevertheless, it is still evidence of improper rounding.
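McIntyre’s rounding mechanism is easy to simulate. The toy example below (not GISS’s actual processing chain) takes values recorded to the nearest 0.1 °F, converts them to °C, rounds to hundredths, and tallies the final digits; the result is clearly non-uniform, with digit 5 essentially never appearing in this setup:

```python
# Toy simulation of unit-conversion rounding producing non-uniform final digits.
# This is an illustration of the general mechanism, not the GISS pipeline.
from collections import Counter

counts = Counter()
for tenths in range(-500, 501):             # anomalies from -50.0 F to +50.0 F in 0.1 F steps
    fahrenheit = tenths / 10.0
    celsius = fahrenheit * 5.0 / 9.0        # anomaly conversion (no 32 F offset needed)
    counts[("%.2f" % celsius)[-1]] += 1     # final digit after rounding to 0.01 C

for d in "0123456789":
    print(d, counts.get(d, 0))
```

Because the 0.1 °F grid maps onto a 0.0556 °C grid, some final digits are hit far more often than others after rounding, which is exactly the kind of innocent non-uniformity McIntyre describes.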

Pretty strong stuff, but given the divergence of the GISS signal from other datasets, unsurprising. I wonder if it isn’t some artifact of the GISS homogenization process for surface temperature data, which I view as flawed in its application.

But let’s give the benefit of the doubt here. I want to see what GISS has to say about it; there may be a perfectly rational explanation that will demonstrate that these statistical accusations are without merit. I’m sure they will post something on RC soon.

Stay tuned.

161 Comments
RBerteig
January 14, 2009 1:15 pm

I’ve been wondering when some of the forensic accounting anti-fraud techniques would get applied, but only knew enough about them myself to know that they exist.
First, Steve and Ross applied stats ideas from econometrics and found problems.
Then Anthony applied good old amateur science and found problems.
Now Stockwell has applied stats ideas from auditing and found problems.
I predict that Stockwell will be treated as well by the establishment as Anthony, Steve and Ross have been… but then that is an easy prediction to make.

RK
January 14, 2009 1:22 pm

The checking and double checking of scientific data is how basic science is done. Even if the source of the statistical anomaly turns out to be methodological rather than human malice, it would help to correct and eradicate the error. Nice work, everyone.

AKD
January 14, 2009 1:23 pm

“Using the IPCC terminology for probabilities, it is virtually certain (more than 99.5%) that Hansen’s data have been tempered with.”
Tempered or tampered? I suppose either could work in this case.

January 14, 2009 1:25 pm

Great stuff!
BTW, it’s suspicious that Hansen doesn’t sue people who assert that he changes numbers to support his crumbling theory. Me included. The only reason NOT to sue? Discovery.

stan
January 14, 2009 1:26 pm

Hasn’t someone done something similar with the direction of adjustments made to the temperature record and concluded that the adjustments demonstrate a clear pattern that isn’t random?

Pieter F
January 14, 2009 1:26 pm

Stockwell’s work is impressive. Understanding the essence of the process is beyond my pay grade, but the results grab my attention. Assuming the underlying analysis is tight, perhaps a formal peer-reviewed paper is warranted on this.

Les Johnson
January 14, 2009 1:26 pm

wow….I can see certain sectors of the blogosphere going psychotic over this….

Joe Black
January 14, 2009 1:26 pm

Management?
I’d suggest using the word manipulation.
(Rounding up? who’d have guessed?)

Noblesse Oblige
January 14, 2009 1:30 pm

Not to worry. This datum will be ‘corrected’ in the future so as to produce or increase a warming trend, as has been the NASA SOP.

crosspatch
January 14, 2009 1:31 pm

“I want to see what GISS has to say about it”
Judging from Hansen’s past responses to things, it might be a long wait. He doesn’t seem to want to respond to “court jesters” (his terminology, not mine) who question his work.

January 14, 2009 1:33 pm

This is a very interesting way of detecting forgery of data. I’ve tried it on my data in the past. I then tried to edit some data, introducing “humanized” random data. I was shocked by the results!
Ecotretas

Alan S. Blue
January 14, 2009 1:35 pm

What happens if you start with the “raw” data? That is, the completely unadjusted USHCN etc.?

Editor
January 14, 2009 1:36 pm

A few decades ago there was a flurry of activity when people looked at books of logarithm tables and noticed there was more wear on the first pages (e.g. logs of 1-2) than on the final pages (e.g. logs of 9-10). That quickly expanded into the observation that many more numbers start with 1 than with 9. While the studies were fascinating, the results were quite believable. Any number that had some multiplication behind it tended to fall into that sort of pattern. The logarithms of various number sets were more likely to be evenly distributed.
I bring that up because from what I recall, the number distributions were quite nice and even without untoward spikes. (Of course if the numbers were the final digit of prices in stores, 8, 9, and 0 are overrepresented, and of course, that has an anthropogenic basis!)
The final digit distribution above is quite worrisome. If it were not anomaly data, I’d suggest that perhaps there were transcription errors where ‘7’ was copied as ‘1’, but that’s not the case here.
The two outlier years are also worrisome, but I’d expect that there should be some sequences that manage to trip the alarm – there should be a similar number of bad years in the other data sets for other years. If there are not, that would be a serious sign that something is wrong with GISS data. Of course, the satellite data sets are too short, so HadCrut may be the best comparison.
Okay, okay, it was four decades ago….

StuartR
January 14, 2009 1:37 pm

I guess as said, rounding errors, the relatively small size of the sampled set and other things in the pipeline may emerge as a reasonable explanation.
I think the question needs to be put to bed, because it does seem to some (including me) that a clear confirmation bias runs through some of the less rigorous reporting that goes on and gets promoted; I mean just spin in the media.
However a suspicion that this could extend to the data itself could be very harmful, it needs to be cleared up. We need a clear delineation of the process from sampling, to weighting, to reporting. It will help cut through a lot of wasteful doubt.
Congrats to Mr Watts for being part of the process for doing just that ( and winning best Blog? 🙂 )

Adam Sullivan
January 14, 2009 1:38 pm

Hard for James Hansen to avoid this one. Any rational person would demand an explanation and be right to do so.

Mick
January 14, 2009 1:39 pm

This is huge. Truly the best science blog.

johnvanvliet
January 14, 2009 1:40 pm

As Steve McIntyre pointed out to Lubos when he wrote about this, an unequal distribution of final digits is expected when converting between Fahrenheit and Celsius.

Tarpon
January 14, 2009 1:44 pm

What are you going to do when you lose the argument? Lie and cheat.
What’s wrong with this statement: pay more in taxes to government so government scientists can pretend to control the weather. Isn’t that an obvious conflict of interest to everyone?
I never understood what is so complicated about running a summation program on tables of numbers that the process had to be guarded as if it were a top secret nuclear weapon or some such. Wouldn’t it produce much better science if the process were completely open?
So give me one good reason why this entire analysis could not all be done under the public’s gaze. Does the secrecy not just reek of the phony and faked?

dahduh
January 14, 2009 1:50 pm

Here’s _one_ rational explanation: examine enough data sets and by pure chance some will appear to have been tampered with. Lubos says his significance is 99.5%; I’m not sure where he gets that. According to the WikiChecks data presented above, the Chi square probability is around 0.05, meaning one in 20 data sets will throw up something like this. So how many data sets produced by Hansen have been examined in this way, and how many of these appear ‘suspect’?
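dahduh’s point about false positives can be checked by simulation. The sketch below (simulated digits, not real station data) applies the same uniform chi-square test to thousands of purely random series of the same length as the GISS table and counts how many are flagged at the 0.05 level; the answer comes out near the expected 5%:

```python
# Simulated false-positive rate for the uniform last-digit chi-square test.
import random
from collections import Counter
from scipy.stats import chisquare

random.seed(1)
trials = 2000
flagged = 0
for _ in range(trials):
    digits = [random.randrange(10) for _ in range(2609)]   # same sample size as the GISS table
    counts = Counter(digits)
    observed = [counts.get(d, 0) for d in range(10)]
    stat, p = chisquare(observed)                          # uniform expectation by default
    if p < 0.05:
        flagged += 1

print("flagged %d of %d random series (%.1f%%)"
      % (flagged, trials, 100.0 * flagged / trials))
```

Roughly one random series in twenty trips the alarm with no tampering at all, which is why a single borderline p-value should be weighed against how many data sets were examined.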

old construction worker
January 14, 2009 1:56 pm

What the………..?
I hope there is a sound explanation; if not, Hansen and team must go.

insurgent
January 14, 2009 2:01 pm

Lubos says that 40 out of 10,000 had these properties.

Mike Bryant
January 14, 2009 2:02 pm

Who is going to run this on the Mauna Loa CO2 record?

Gripegut
January 14, 2009 2:04 pm

Tarpon said:
“So give me one good reason why this entire analysis could not all be done under the public’s gaze. Does the secrecy not just reek of the phony and faked?”
I completely agree. The whole point of doing scientific experimentation is to record data that can be corroborated by anyone using the same experimental techniques. Any process of science that has to be hidden is by definition fraud, and this analysis further proves it.

REPLY:
It is not proven to be fraud; let’s not use the word. Give GISS a chance to respond and see what other issues may arise. – Anthony

Will
January 14, 2009 2:08 pm

I recreated your results on WikiChecks. Then I stripped all the example text and years from the file and resubmitted. After all, we are pretty sure nobody has manipulated the digits in the years. When I resubmitted the stripped-down version of the data, the probability of manhandling was reduced.
I’m no expert on this, so I could be doing something wrong without realizing.

Paddy
January 14, 2009 2:09 pm

Watts, McIntyre, McKitrick, Motl and Stockwell, watch your backs. [snip – none of that talk here]
