Distribution analysis suggests GISS final temperature data is hand edited – or not

UPDATE: As I originally mentioned at the end of this post, I thought we should “give the benefit of the doubt” to GISS, as there may be a perfectly rational explanation. Steve McIntyre indicates that he has also done an analysis and doubts the other analyses:

I disagree with both Luboš and David and don’t see anything remarkable in the distribution of digits.

I tend to trust Steve's intuition and analysis skills, as his track record has been excellent. So at this point we don't know what the root cause is, or even whether there is any human touch to the data. But as Lubos said on CA, “there's still an unexplained effect in the game”.

I’m sure it will get much attention as the results shake out.

UPDATE2: David Stockwell writes in comments here:

Hi,

I am gratified with the interest in this very preliminary analysis. There are a few points from the comments above.

1. False positives are possible, for a number of reasons.

2. Even though data are subjected to arithmetic operations, distortions in digit frequency introduced at an earlier stage can still be observed.

3. The web site is still in development.

4. One of the deviant periods in GISS seems to be around 1940, the same as the ‘warmest year in the century’ and the ‘SST bucket collection’ issues.

5. Even if, in the worst case, there was manipulation, it wouldn't affect AGW science much. The effect would be small. It's about something else. Take the Madoff fund: even though investors knew the results were managed, they still invested because the payouts were real (for a while).

6. To my knowledge, no one has succeeded in exactly replicating the GISS data.

7. I picked that file as it is the most used – global land and ocean. I haven’t done an extensive search of files as I am still testing the site.

8. Lubos replicated this study more carefully, using only the monthly series, and got the same result.

9. Benford's law (on the first digit) has a logarithmic distribution and really only applies to data spanning many orders of magnitude. Measurement data that often has a constant first digit doesn't work, although the second digit seems to. I don't see why the last digit wouldn't work; it should approach a uniform distribution according to Benford's postulate.

That’s all for the moment. Thanks again.
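(On Stockwell's point 9 above, here is a quick Python sketch, mine rather than his, of what Benford's law predicts for first digits versus the flat distribution expected of final digits. It is purely illustrative.)

# Benford's law for the FIRST digit versus the uniform expectation for the
# LAST digit. Illustrative only; this is not the WikiChecks code.
import math

benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}   # P(d) = log10(1 + 1/d)
print({d: round(p, 3) for d, p in benford.items()})
# -> {1: 0.301, 2: 0.176, ..., 9: 0.046}: heavily skewed toward 1

# Final digits of values that all share the same order of magnitude are
# expected to be roughly uniform, each with probability 0.1; that flat
# distribution is the null hypothesis a last-digit test works against.
print({d: 0.1 for d in range(10)})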


This morning I received an email outlining some work that David Stockwell has done checking the GISS global Land-Ocean temperature dataset:

Detecting ‘massaging’ of data by human hands is an area of statistical analysis I have been working on for some time, and devoted one chapter of my book, Niche Modeling, to its application to environmental data sets.

The WikiChecks web site now incorporates a script for doing a Benford’s analysis of digit frequency, sometimes used in numerical analysis of tax and other financial data.

The WikiChecks site says:

‘Managing’ or ‘massaging’ financial or other results can be a very serious deception. It ranges from rounding numbers up or down, to total fabrication. This system will detect the non-random frequency of digits associated with human intervention in natural number frequency.

Stockwell runs a test on GISS and writes:

One of the main sources of global warming information, the GISS data set from NASA, showed significant management, particularly an excess of zeros and ones. Interestingly, the moving-window mode of the algorithm identified two years, 1940 and 1968 (see here).

You can run this test yourself: visit the WikiChecks web site and paste the URL for the GISS dataset

http://data.giss.nasa.gov/gistemp/tabledata/GLB.Ts+dSST.txt

into it, then press Submit. Here is what you get as output from WikiChecks:

GISS

Frequency of each final digit: observed vs. expected

Digit           0    1    2    3    4    5    6    7    8    9  Totals
Observed      298  292  276  266  239  265  257  228  249  239    2609
Expected      260  260  260  260  260  260  260  260  260  260    2609
Variance     5.13 3.59 0.82 0.08 1.76 0.05 0.04 4.02 0.50 1.76   17.75
Significant     *    .                             *

Statistic    DF  Obtained   Prob  Critical
Chi Square    9     17.75  <0.05     16.92
RESULT: Significant management detected. Significant variation in digit 0: (Pr<0.05) indicates rounding up or down. Significant variation in digit 1: (Pr<0.1) indicates management. Significant variation in digit 7: (Pr<0.05) indicates management.
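For readers who want to check the arithmetic themselves, here is a minimal Python sketch of the same kind of last-digit chi-square test. It is not the WikiChecks code, and the parsing of GLB.Ts+dSST.txt into a list of integer anomalies is assumed to have been done separately.

# Minimal last-digit chi-square test, similar in spirit to (but not the same
# code as) the WikiChecks script. `values` is assumed to be a list of integer
# anomalies, e.g. hundredths of a degree parsed from GLB.Ts+dSST.txt.
from collections import Counter
from scipy.stats import chisquare

def last_digit_test(values):
    digits = [abs(v) % 10 for v in values]                 # final digit of each value
    observed = [Counter(digits).get(d, 0) for d in range(10)]
    expected = [len(digits) / 10.0] * 10                   # uniform under the null
    stat, p = chisquare(observed, f_exp=expected)          # chi-square, 9 degrees of freedom
    return observed, stat, p

# Toy call with made-up values, just to show the interface:
obs, stat, p = last_digit_test([14, -3, 27, 41, 58, 60, 9, 72, 35, 88, 101, 96])
print(obs, round(stat, 2), round(p, 3))

The 17.75 statistic against the 16.92 critical value at nine degrees of freedom in the table above is exactly this comparison.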

Stockwell writes of the results:

The chi-square test is prone to produce false positives for small samples. Also, there are a number of innocent reasons that digit frequency may diverge from expected. However, the tests are very sensitive. Even if arithmetic operations are performed on data after the manipulations, the ‘fingerprint’ of human intervention can remain.

I also ran it on the UAH data and RSS data and it flagged similar issues, though with different deviation scores. Stockwell did the same and writes:

The results, from lowest deviation to highest, are listed below.

RSS – Pr<1

GISS – Pr<0.05

CRU – Pr<0.01

UAH – Pr<0.001

Missing-value markers in the UAH data (-99.990) may have caused its high deviation. I don't know about the others.

Not being familiar with this mathematical technique, I could do little to confirm or refute the findings, so I let it pass until I could get word of replication from another source.

It didn't take long. About two hours later, Lubos Motl of The Reference Frame posted results he obtained independently via another method when he ran some checks of his own:

David Stockwell has analyzed the frequency of the final digits in the temperature data by NASA’s GISS led by James Hansen, and he claims that the unequal distribution of the individual digits strongly suggests that the data have been modified by a human hand.

With Mathematica 7, such hypotheses take a few minutes to be tested. And remarkably enough, I must confirm Stockwell’s bold assertion.

But that's not all; Lubos goes on to say:

Using the IPCC terminology for probabilities, it is virtually certain (more than 99.5%) that Hansen's data have been tampered with.

To be fair, Lubos runs his test on UAH data as well:

It might be a good idea to audit our friends at UAH MSU where Stockwell seems to see an even stronger signal.

In plain English, I don’t see any evidence of man-made interventions into the climate in the UAH MSU data. Unlike Hansen, Christy and Spencer don’t seem to cheat, at least not in a visible way, while the GISS data, at least their final digits, seem to be of anthropogenic origin.

Steve McIntyre offered an explanation based on the way rounding occurs when converting from Fahrenheit to Celsius, but Lubos can't fully reproduce the pattern he sees in the GISS data from that mechanism alone:

Steve McIntyre has immediately offered an alternative explanation of the non-uniformity of the GISS final digits: rounding of figures calculated from other units of temperature. Indeed, I confirmed that this is an issue that can also generate a non-uniformity, up to 2:1 in the frequency of various digits, and you may have already downloaded an updated GISS notebook that discusses this issue.

I can't get 4 and 7 underrepresented, but there may exist a combination of two roundings that generates this effect. If this explanation is correct, it is the result of a much less unethical approach by GISS than the explanation above. Nevertheless, it is still evidence of improper rounding.
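To see how a units conversion plus rounding can do this, here is a small Python sketch. The processing chain is entirely hypothetical, but it reproduces the "up to 2:1" digit imbalance Lubos mentions.

# Hypothetical illustration only, not GISS's actual pipeline: convert anomalies
# recorded in hundredths of a degree F to hundredths of a degree C and round to
# an integer. The double discretisation leaves two final digits appearing about
# half as often as the rest.
from collections import Counter
import random

random.seed(0)
f_hundredths = [random.randint(-300, 300) for _ in range(20000)]   # made-up data
c_hundredths = [round(f * 5 / 9) for f in f_hundredths]            # F -> C, rounded again

counts = Counter(abs(c) % 10 for c in c_hundredths)
print({d: counts[d] for d in range(10)})   # roughly a 2:1 imbalance for two digits

Which two digits come up short depends on offsets and on the exact conversion path, which may be why the specific deficit in 4s and 7s is hard to reproduce from a single rounding step.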

Pretty strong stuff, but given the divergence of the GISS signal from the other datasets, unsurprising. I wonder if it isn't some artifact of the GISS homogenization process for surface temperature data, which I view as flawed in its application.

But let's give the benefit of the doubt here. I want to see what GISS has to say about it; there may be a perfectly rational explanation that demonstrates these statistical accusations are without merit. I'm sure they will post something on RC soon.

Stay tuned.

161 Comments
just me
January 14, 2009 2:10 pm

:(… Actually, I do not understand the “skeptics”. You have the raw data, you have the GISS code, you can check it. Do it.
I do not know… there is one temperature series that had (has?) really big problems. The trends were completely wrong. You still remember that, don't you? It is the UAH temperature series. The UAH data was totally flawed. And you are not skeptical about their data today? You do not have their source code.

REPLY:
I'd point out that the FORTRAN code provided by GISS is so antiquated and so environment-specific that, to my knowledge, nobody who has tried has been able to get it to run. – Anthony

Adam Sullivan
January 14, 2009 2:14 pm

Here is something that might be worth placing on a bumper sticker:
“The Debate is Over: Audit Hansen”

Editor
January 14, 2009 2:15 pm

http://icecap.us/images/uploads/Stationdropout.jpg
The ICECAP blog has an interesting story documenting how two-thirds of the ground stations, most of them rural, dropped out or were removed from the system around 1990, at the same time that globally measured temperatures started averaging higher…

Michael
January 14, 2009 2:19 pm

SM on Climate Audit posts that he sees no problem with the data distribution.

dave
January 14, 2009 2:19 pm

Doesn't GISS rely heavily on manually entered data from volunteer reporting stations? Could inaccurate data entry there create an anomaly that would persist through the aggregation process?
I think someone needs to go back to the raw weather station data and see if there is evidence of data manipulation there.

just me
January 14, 2009 2:26 pm

@Anthony
Oh yeah, evil Fortran. What do you think the researchers at GISS do? Rewrite the code in a fancy new programming language every year? Of course, there are new, better languages now. Furthermore, the code scientists write is often not the best. It has to work. Nothing more.
Is the clearclimatecode.org project still running? They started to re-implement the code in Python. IMHO, that is a good project that cooperated with GISS and tried to help everybody. I hope it is still running.

lulo
January 14, 2009 2:33 pm

I suspect that the problem is down to rounding errors. With the urban heat island growth, the rural station removal and a solar Grand Maximum on their side, they probably would have had no reason to fudge the record. I think it’s great that you’re holding them to a high standard, however.

lulo
January 14, 2009 2:33 pm

(Above, I meant the rounding ‘process,’ not ‘errors.’)

Raven
January 14, 2009 2:34 pm

Anthony,
An interesting post, but I think a reference to Steve Mc's conversion hypothesis near the top of the article would be appropriate, because these kinds of stories will get blown out of proportion in the blogosphere if the caveats are not made extremely clear.

crosspatch
January 14, 2009 2:40 pm

Mike Lorrey:
Yes, that station dropout corresponding to warmer temperatures is widely known. Steve McIntyre has brought it up before, and it comes up from time to time on most climate-related blogs. The station data are still there; you can get the monthly data over the Internet. NOAA has never responded, as far as I know, as to why they were dropped from the record.
I have wondered why nobody has attempted to "recover" the station data to see if it changes those temperatures. It seems like it could be put to rest easily enough, but nobody ever has.

Flanagan
January 14, 2009 2:47 pm

From what I remember, an equal distribution of digits in a numerical signal is the sign of a homogeneously distributed random process, isn't it? For example: if a signal is, say, periodic with an amplitude oscillating between 3.5 and 4.2, what is the chance of getting a figure ending in .3?
As was mentioned, there are in addition the round-off errors associated with switching from F to C.
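REPLY: Roughly speaking, yes. A quick sketch of that example, with made-up numbers: a signal confined to 3.5 to 4.2 and reported to one decimal place can never end in .3 or .4, so its final digits are far from uniform even though nothing has been "managed".

# Final digits of a bounded periodic signal reported to one decimal place.
# Purely illustrative numbers; nothing to do with any temperature dataset.
import math
from collections import Counter

signal = [3.85 + 0.35 * math.sin(0.1 * t) for t in range(2000)]  # oscillates between 3.5 and 4.2
tenths = [round(10 * x) for x in signal]                          # the values as reported, in tenths

counts = Counter(v % 10 for v in tenths)
print({d: counts[d] for d in range(10)})  # digits 3 and 4 never occur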

DAV
January 14, 2009 2:48 pm

I know this is going to come across as negative, but that test needs very special care. I just plugged in the beaten lengths from some 1999 horse races for the first 300 horses listed and got that the data were managed. The values were limited to '9', meaning that 9 is really 9+. The rounding part was correct: the values were rounded to integers. Outside of that, no manipulation was done. So, apparently, any data obtained from an instrument that steps and limits the output, like say a digital voltmeter, or even something like trunc(tanh(x)*10+0.5), is going to come across as 'managed'.
By 'special care' I mean it should only be applied to data that are simply tabulated, as they are in financial records. ANY processing of the data beyond this is going to be flagged as 'managed', which, in a sense, is correct, but hardly 'faked'.

Significant variation in digit 0: (Pr<0.01) indicates rounding up or down.
Significant variation in digit 1: (Pr<0.1) indicates management.
Significant variation in digit 2: (Pr<0.1) indicates management.
Significant variation in digit 4: (Pr<0.1) indicates management.
Significant variation in digit 5: (Pr<0.05) indicates rounding to half.
Significant variation in digit 6: (Pr<0.01) indicates management.
Significant variation in digit 9: (Pr<0.001) indicates management.

data

8 6 4 3 1 0 4 2 7 0 7 7 1 8 3 7 0 9 8 9 8 4 9 5 8 0 8 0 2 5 5 5 9 4 6 4 4
0 9 2 9 6 2 2 9 6 9 0 9 0 7 3 7 7 7 3 5 5 0 7 9 8 9 9 4 9 9 3 7 4 0 9 9 9
9 2 0 9 2 9 9 9 9 9 9 2 2 0 9 9 9 4 1 0 1 1 9 1 0 6 0 0 9 9 3 8 5 9 0 2 9
9 1 7 2 0 7 0 3 2 5 0 3 5 9 6 5 4 0 4 3 7 4 8 5 9 4 0 8 2 9 9 8 7 8 9 7 0
6 9 4 9 9 9 9 9 9 3 0 1 3 9 0 1 1 9 9 5 9 8 0 6 6 4 9 9 2 3 3 9 9 0 2 1 4
0 3 2 8 7 9 9 6 8 9 0 3 9 9 8 9 1 9 5 5 8 0 9 0 0 5 9 0 9 3 9 9 8 5 9 1 2
9 0 5 9 8 2 6 9 0 9 9 4 9 9 9 5 0 6 7 1 0 3 9 8 0 9 0 9 6 1 9 9 3 9 9 8 1
0 3 3 0 9 4 3 9 0 9 8 9 6 9 1 0 9 1 4 7 1 0 1 9 2 7 8 3 9 7 8 3 9 8 9 0 9
0 0 9 7

When I plug in the values (still rounded but not limited to 9) I get:

Significant variation in digit 0: (Pr<0.001) indicates rounding up or down.
Significant variation in digit 9: (Pr<0.05) indicates management.

When I plug in the finish positions (which don't have many zeroes, and are not managed at all) I get:

RESULT: Extremely Significant management detected.
Significant variation in digit 0: (Pr<0.001) indicates rounding up or down.
Significant variation in digit 1: (Pr<0.1) indicates management.
Significant variation in digit 2: (Pr<0.1) indicates management.
Significant variation in digit 5: (Pr<0.1) indicates rounding to half.
Significant variation in digit 8: (Pr<0.01) indicates management.

RyanO
January 14, 2009 2:50 pm

I have no great love of Hansen or GISS, but rather than accusing them of deliberate falsification, I’ll wait for proof. Showing the data has likely been manipulated is not the same thing as showing that it was for deceptive purposes.
.
By the way . . . congrats on the victory, Anthony!

David Stockwell
January 14, 2009 2:54 pm

Hi,
I am gratified with the interest in this very preliminary analysis. There are a few points from the comments above.
1. False positives are possible, for a number of reasons.
2. Even though data are subjected to arithmetic operations, distortions in digit frequency introduced at an earlier stage can still be observed.
3. The web site is still in development.
4. One of the deviant periods in GISS seems to be around 1940, the same as the 'warmest year in the century' and the 'SST bucket collection' issues.
5. Even if, in the worst case, there was manipulation, it wouldn't affect AGW science much. The effect would be small. It's about something else. Take the Madoff fund: even though investors knew the results were managed, they still invested because the payouts were real (for a while).
6. To my knowledge, no one has succeeded in exactly replicating the GISS data.
7. I picked that file as it is the most used – global land and ocean. I haven't done an extensive search of files as I am still testing the site.
8. Lubos replicated this study more carefully, using only the monthly series, and got the same result.
9. Benford's law (on the first digit) has a logarithmic distribution and really only applies to data spanning many orders of magnitude. Measurement data that often has a constant first digit doesn't work, although the second digit seems to. I don't see why the last digit wouldn't work; it should approach a uniform distribution according to Benford's postulate.
That's all for the moment. Thanks again.

Frank Perdicaro
January 14, 2009 3:18 pm

There may be something bad at the root of this, but perhaps not. I have a few patents in process on related data conversion items.
First, I agree with the general concept of digit analysis to see if there is a Gaussian distribution. If there is not a Gaussian, well-developed techniques can be used to point out the improbability of number sets. But this ONLY works if the underlying data has a true Gaussian distribution! Is that the case here? We do not know.
Here are two common (in this realm of science) counter-examples.
1) Imagine the actual temperature measurement was done in a synthetic unit. In this example, the unit is Franks, and a Frank is equal to 4 normal Fahrenheit degrees. No matter how you convert from Franks to Fahrenheit, you will not get a Gaussian distribution of digits on the Fahrenheit scale. The distribution you do get will be convolved with a product of the prime factor decomposition of the relationship of the scales. In this example, the only relative prime is 2. The pure mathematics in this area are "Lattice Analysis" and "Spanning Spaces". I could (and have) gone on for hundreds of pages, but I will stop here.
2) Instead of a formulaic conversion from Franks to Fahrenheit, a look-up table is used. The look-up table is constructed so as to have a minimized maximum error across the whole table. But the whole table is not used! Only part of the table is used, and that part does not have minimized error. In the limit, where only one entry in the LUT is used, large errors are introduced, and the distribution is discrete, like a Dirac delta function.
This problem has a whiff of fraud, but also smells strongly of incorrect application of differential mathematics where the problem is discrete in nature, and thus discrete calculus must be used. Application of differential techniques to a discrete, discontinuous data set _always_ gives the wrong result.
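REPLY: A quick sketch of counter-example 1, with made-up numbers: whole-number readings in a hypothetical unit equal to 4 Fahrenheit degrees, converted by multiplying by 4, can only ever end in an even digit, so a digit-frequency test will flag them even though nothing was fabricated.

# Illustration of the "synthetic unit" counter-example above. Whole-number
# readings in a made-up unit equal to 4 F are converted to F by multiplying
# by 4, so the converted values can only end in 0, 2, 4, 6 or 8.
from collections import Counter
import random

random.seed(1)
franks = [random.randint(-50, 50) for _ in range(5000)]   # made-up whole-Frank readings
fahrenheit = [4 * x for x in franks]                       # the conversion, no rounding involved

counts = Counter(abs(v) % 10 for v in fahrenheit)
print({d: counts[d] for d in range(10)})  # odd final digits never appear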

crosspatch
January 14, 2009 3:20 pm

Another thing that might be a factor: in some cases a data value is missing in the raw data stream and a value is calculated to "fill" the missing value. There might be something in that calculation of fill values that tends to favor certain outcomes.

January 14, 2009 3:29 pm

Congratulations on the ‘Best Science Blog’, Anthony. An award well earned.
This article about the adjusting of temperature data, if verified, means that the suspicions a lot of people have voiced without more than anecdotal evidence may well be true. We have been lied to, consistently, by the pro-AGW camp.
Now if someone with the right mathematical and accounting know-how could confirm the audit trail of ‘carbon trading’ funds back to the guilty parties lining their pockets on the back of ‘Anthropogenic Climate Change’…

DAV
January 14, 2009 3:49 pm

Perdicaro (15:18:07)
Well put. I'd venture to say, again, that almost ANY manipulation (say multiplication by 2, which yields only even values) will likely be flagged. The fact that the others are also flagged should be a clue. I'm willing to bet that the output of every D/A converter will also get flagged, particularly if the output is in the non-linear range.
Even if it were absolute proof of faking at GISS, it wouldn't count for much in public opinion because of its technical quality. Think about it: if people ignore increasing ice fields and stalled temperature rises and all that, why would the distribution of digits in reported values suddenly convince them? It even comes across as a desperate cry.

TerryS
January 14, 2009 3:54 pm

Doesn't GISS "fill in the blanks", i.e. calculate values for missing months based on other years/seasons in the dataset?
Since these filled-in values would have a relationship with other months, they might skew the results of the test.

George E. Smith
January 14, 2009 3:55 pm

“” Ric Werme (13:36:21) :
A few decades ago there was a flurry of activity when people looked at books of logarithm tables and noticed there was more wear on the first pages (e.g. logs of 1-2) than on the final pages (e.g. logs of 9-10). That quickly expanded into the observation that many more numbers start with 1 than 9. While the studies were fascinating, the results were quite believable. Any number that had some multiplication behind it tended to fall into that sort of pattern. The logarithms of various number sets were more likely to be evenly distributed. “”
Well I wouldn’t be so quick to jump to conclusions of malfeasance.
Log(2) = 0.301, while log(9) is 0.954, or 1 - 0.046.
So the gap between 1 and 2 is about 6.6 times the gap between 9 and 10 in the log tables, so I would expect about 6.6 times as much usage of the pages between 1 and 2 compared to between 9 and 10.
No conspiracy at all.

David Ermer
January 14, 2009 3:56 pm

If the GISS temperature adjustments were simply done 1) in public, and/or 2) with enough information supplied so that they could be replicated, none of this ancillary statistical analysis (which may or may not be significant or relevant) would need to be done.
It isn't incumbent on anyone to prove there is a problem with the data. It is up to the "scientist" who generates the data to do it in such a way that the question of a problem never comes up.

George Patch
January 14, 2009 3:58 pm

Of course this same analysis could be done at the individual station level on the raw data.
We already know that data from Dew Line stations were fabricated:
http://wattsupwiththat.com/2008/07/17/fabricating-temperatures-on-the-dew-line/
Watt about other stations?
How about Mohonk Lake:
http://wattsupwiththat.com/2008/09/17/calling-all-climate-sleuths/

January 14, 2009 4:06 pm

The following is not intended to be, nor should it be taken, as legal advice. No attorney-client relationship is established by these comments. Any person desiring legal advice should consult an attorney.
Fraud: It is well to be very careful in using that word to describe another. A false accusation of fraud could be grounds for a defamation lawsuit. I say *could* because there are a lot of factors involved.
Fraud generally also requires an intent to deceive, which is very difficult to prove. Different jurisdictions have slightly different wordings and requirements.
Please, as Anthony stated, let us wait for more analysis and confirmed results.
Roger E. Sowell, Esq.
Marina del Rey, California

John Philip
January 14, 2009 4:07 pm

I confidently predict that this will turn out to be a non-story. As our host is a meteorologist and interested in matters climatic, and on the day that WUWT was awarded Best Science Blog, can we expect some comment on the news that the American Meteorological Society has honoured the custodian of the GISS dataset with its highest commendation, the Carl-Gustaf Rossby Research Medal (http://www.nasa.gov/centers/goddard/news/topstory/2009/hansen_ams.html)?
Newsworthy, surely?

John Philip
January 14, 2009 4:13 pm

I confidently predict that this will turn out to be a non-story. As our host is a meteorologist and interested in matters climatic, and on the day that WUWT was awarded Best Science Blog, can we expect some comment on the news that the American Meteorological Society has honoured the custodian of the GISS dataset with its highest commendation, the Carl-Gustaf Rossby Research Medal.
Or not?