Distribution analysis suggests GISS final temperature data is hand edited – or not

UPDATE: As I originally mentioned at the end of this post, I thought we should “give the benefit of the doubt” to GISS as there may be a perfectly rational explanation. Steve McIntyre indicates that he has done an analysis also and doubts the other analyses:

I disagree with both Luboš and David and don’t see anything remarkable in the distribution of digits.

I tend to trust Steve’s intuition and analysis skills, as his track record has been excellent. So at this point we don’t know what the root cause is, or even whether there is any human touch to the data. But as Lubos said on CA, “there’s still an unexplained effect in the game”.

I’m sure it will get much attention as the results shake out.

UPDATE2: David Stockwell writes in comments here:

Hi,

I am gratified by the interest in this very preliminary analysis. There are a few points from the comments above.

1. False positives are possible, for a number of reasons.

2. Even though data are subjected to arithmetic operations, distortions in digit frequency at an earlier stage can still be observed.

3. The web site is still in development.

4. One of the deviant periods in GISS seems to be around 1940, the same as the ‘warmest year in the century’ and the ‘SST bucket collection’ issues.

5. Even if in the worst case there was manipulation, it wouldn’t affect AGW science much. The effect would be small. It’s about something else. Take the Madoff fund. Even though investors knew the results were managed, they still invested because the payouts were real (for a while).

6. To my knowledge, no one has succeeded in exactly replicating the GISS data.

7. I picked that file as it is the most used – global land and ocean. I haven’t done an extensive search of files as I am still testing the site.

8. Lubos replicated this study more carefully, using only the monthly series, and got the same result.

9. Benford’s law (on the first digit) has a logarithmic distribution, and really only applies to data across many orders of magnitude. Measurement data that often has a constant first digit doesn’t work, although the second digit seems to. I don’t see why the last digit wouldn’t work; it should approach a uniform distribution according to Benford’s postulate.

That’s all for the moment. Thanks again.
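For readers unfamiliar with the digit distributions mentioned in point 9, here is a minimal Python sketch of the standard Benford formulas for the first and second digit. It is only an illustration of the textbook formulas; it is not the WikiChecks script, and the variable names are my own.

```python
import math

# First-digit Benford probabilities: P(d) = log10(1 + 1/d) for d = 1..9.
first_digit = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Second-digit probabilities: sum the two-digit Benford probability
# log10(1 + 1/(10*d1 + d2)) over every possible first digit d1.
second_digit = {
    d2: sum(math.log10(1 + 1 / (10 * d1 + d2)) for d1 in range(1, 10))
    for d2 in range(10)
}

for d in range(1, 10):
    print(f"first digit {d}: {first_digit[d]:.3f}")
for d in range(10):
    print(f"second digit {d}: {second_digit[d]:.3f}")
```

The first digit is strongly skewed toward 1 (about 30%), the second digit is much flatter, and later digits get closer and closer to the uniform 10% each that a final-digit test assumes.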


This morning I received an email outlining some work that David Stockwell has done in checking the GISS global Land-Ocean temperature dataset:

Detecting ‘massaging’ of data by human hands is an area of statistical analysis I have been working on for some time, and devoted one chapter of my book, Niche Modeling, to its application to environmental data sets.

The WikiChecks web site now incorporates a script for doing a Benford’s analysis of digit frequency, sometimes used in numerical analysis of tax and other financial data.

The WikiChecks Site Says:

‘Managing’ or ‘massaging’ financial or other results can be a very serious deception. It ranges from rounding numbers up or down, to total fabrication. This system will detect the non-random frequency of digits associated with human intervention in natural number frequency.

Stockwell runs a test on GISS and writes:

One of the main sources of global warming information, the GISS data set from NASA showed significant management, particularly a deficiency of zeros and ones. Interestingly the moving window mode of the algorithm identified two years, 1940 and 1968 (see here).

You can actually run this test yourself: visit the WikiChecks web site and paste the URL for the GISS dataset

http://data.giss.nasa.gov/gistemp/tabledata/GLB.Ts+dSST.txt

into it and press submit. Here is what you get as output from WikiChecks:

GISS

Frequency of each final digit: observed vs. expected

Digit            0      1      2      3      4      5      6      7      8      9   Totals
Observed       298    292    276    266    239    265    257    228    249    239     2609
Expected       260    260    260    260    260    260    260    260    260    260     2609
Variance      5.13   3.59   0.82   0.08   1.76   0.05   0.04   4.02   0.50   1.76    17.75
Significant      *      .                                         *

Statistic    DF   Obtained   Prob    Critical
Chi Square    9      17.75   <0.05      16.92
RESULT: Significant management detected. Significant variation in digit 0: (Pr<0.05) indicates rounding up or down. Significant variation in digit 1: (Pr<0.1) indicates management. Significant variation in digit 7: (Pr<0.05) indicates management.
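For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the chi-square statistic from the observed counts in the output above. One assumption is mine, not taken from WikiChecks: the per-digit "Variance" values appear to include a 0.5 continuity correction, since that is what reproduces the printed numbers. The critical value is simply copied from the output rather than computed.

```python
# Observed final-digit counts for digits 0..9, copied from the WikiChecks output.
observed = [298, 292, 276, 266, 239, 265, 257, 228, 249, 239]

total = sum(observed)
expected = total / 10  # uniform final digits: 260.9 per digit

# Assumption: a 0.5 continuity correction per cell. This is inferred from the
# printed numbers, not taken from any WikiChecks documentation.
contributions = [(abs(o - expected) - 0.5) ** 2 / expected for o in observed]
chi_square = sum(contributions)

# Critical value as quoted in the output (9 degrees of freedom, 5% level).
CRITICAL_9DF_5PCT = 16.92

for digit, (o, c) in enumerate(zip(observed, contributions)):
    print(f"digit {digit}: observed {o:4d}, contribution {c:5.2f}")
print(f"chi-square = {chi_square:.2f}, "
      f"exceeds critical value: {chi_square > CRITICAL_9DF_5PCT}")
```

Note how close the statistic is to the threshold: 17.75 against 16.92, which is worth keeping in mind when reading the word "significant".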

Stockwell writes of the results:

The chi-square test is prone to produce false positives for small samples. Also, there are a number of innocent reasons that digit frequency may diverge from expected. However, the tests are very sensitive. Even if arithmetic operations are performed on data after the manipulations, the ‘fingerprint’ of human intervention can remain.

I also ran it on the UAH data and RSS data and it flagged similar issues, though with different deviation scores. Stockwell did the same and writes:

The results are listed below, from lowest deviation to highest.

RSS – Pr<1

GISS – Pr<0.05

CRU – Pr<0.01

UAH – Pr<0.001

Numbers such as missing values in the UAH data (-99.990) may have caused its high deviation. I don’t know about the others.

Not being familiar with this mathematical technique, there was little I could do to confirm or refute the findings, so I let it pass until I could get word of replication from some other source.

It didn’t take long. About two hours later, Lubos Motl of The Reference Frame posted results he obtained independently, by another method, when he ran some checks of his own:

David Stockwell has analyzed the frequency of the final digits in the temperature data by NASA’s GISS led by James Hansen, and he claims that the unequal distribution of the individual digits strongly suggests that the data have been modified by a human hand.

With Mathematica 7, such hypotheses take a few minutes to be tested. And remarkably enough, I must confirm Stockwell’s bold assertion.

But that’s not all, Lubos goes on to say:

Using the IPCC terminology for probabilities, it is virtually certain (more than 99.5%) that Hansen’s data have been tampered with.

To be fair, Lubos runs his test on UAH data as well:

It might be a good idea to audit our friends at UAH MSU where Stockwell seems to see an even stronger signal.

In plain English, I don’t see any evidence of man-made interventions into the climate in the UAH MSU data. Unlike Hansen, Christy and Spencer don’t seem to cheat, at least not in a visible way, while the GISS data, at least their final digits, seem to be of anthropogenic origin.

Steve McIntyre offered an explanation in the way rounding occurs when converting from Fahrenheit to Centigrade, but Lubos can’t seem to replicate the same results he gets from the GISS data:

Steve McIntyre has immediately offered an alternative explanation of the non-uniformity of the GISS final digits: rounding of figures calculated from other units of temperature. Indeed, I confirmed that this is an issue that can also generate a non-uniformity, up to 2:1 in the frequency of various digits, and you may have already downloaded an updated GISS notebook that discusses this issue.

I can’t get 4 and 7 underrepresented, but there may exist a combination of two roundings that generates this effect. If this explanation is correct, it is the result of a much less unethical approach by GISS than the explanation above. Nevertheless, it is still evidence of improper rounding.
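To see how unit conversion alone can skew final digits, here is a small Python sketch. It is a toy case of my own choosing, not Lubos’s Mathematica notebook and not the actual GISS processing: it assumes readings recorded to 0.1 °F are converted to Celsius and rounded to 0.1 °C. The function name and the number of readings are hypothetical.

```python
from collections import Counter

def final_digit_counts(n_readings=1800):
    """Count final digits of 0.1 degC values converted from 0.1 degF steps."""
    counts = Counter()
    for k in range(n_readings):  # k = tenths of a degree F above 32 degF
        # Celsius in tenths is 5k/9; (10k + 9) // 18 rounds that to the
        # nearest integer using exact integer arithmetic (no ties can occur).
        tenths_c = (10 * k + 9) // 18
        counts[tenths_c % 10] += 1
    return counts

counts = final_digit_counts()
for digit in range(10):
    print(digit, counts[digit])
```

In this toy case the digits 0 and 5 turn up half as often as the others, a 2:1 non-uniformity produced by nothing more than rounding. It does not reproduce the particular pattern in the GISS output, so it only shows that the mechanism is plausible.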

Pretty strong stuff, but given the divergence of the GISS signal from other datasets, unsurprising. I wonder if it isn’t some artifact of the GISS homogenization process for surface temperature data, which I view as flawed in its application.

But let’s give the benefit of the doubt here. I want to see what GISS has to say about it; there may be a perfectly rational explanation that will demonstrate that these statistical accusations are without merit. I’m sure they will post something on RC soon.

Stay tuned.

Steven Talbot
January 16, 2009 10:21 am

Andrew,
I agree that original records should not be destroyed or overwritten, but they have not been. GISS (and others) present an analysis of those records that remain existing.
If your financial institution discovered an error from the past, would they not correct it in their analysis (my experience suggests that if I have been overpaid I don’t stay overpaid once the error is discovered!)?
Biased temperature readings are not the history of temperature but the history of recording error. It seems to me that, by your argument, UAH should still be reporting their faulty pre-2005 data, even though it’s known to be wrong.
In all fields there will be records from the past that are either unintentionally or intentionally biased. If we want to get closer to the truth then our analysis must take account of any known biases, in my view at least. 🙂

Smokey
January 16, 2009 10:23 am

Steven Talbot:

What I do not share with many commentators here is the ready presumption that inaccuracies and adjustments are evidence of human bias in favour of showing a warming climate.

Niccolo Machiavelli wrote to a naive prince: “Men are evil, unless compelled to be good.”
Who can compel James Hansen to be honest? Hansen has accepted at least three quarters of a million dollars [that we know about] from entities with a strong global warming/AGW agenda — while the Best Science site is run on a pretty much voluntary basis.
Who are you gonna believe?
There isn’t much difference between believing what someone on the green payroll says, and believing what tobacco company front groups say. They are equally credible.
Finally, I would like to hear an explanation of why it is A-OK for someone in Hansen’s taxpayer-paid job [a job which allows him to arbitrarily “adjust,” “homogenize” and otherwise alter past climate data], to accept huge amounts of cash from groups that want him to push their agenda.
James Hansen is bought and paid for. He has endorsed lawbreaking to achieve his ends, therefore he is unethical; QED. So the presumption that he has deliberately corrupted GISS is warranted, IMHO.

Andrew
January 16, 2009 11:02 am

Stephen,
You can do whatever analysis you want and change whatever numbers you want in your analysis. As long as everyone knows that what you have is not the history but the analysis. Is that how it is perceived by everyone? Do the purveyors of the analysis disclose that and make sure everyone knows it? Or are you trying to say that analysis and history are the same thing? You may have a different analysis tomorrow or next week in light of new information. Do you see what I’m saying?
Andrew ♫

Andrew
January 16, 2009 11:47 am

Sorry, I spelled Steven wrong, my apologies.
Analogy: I got paid $20 instead of the $25 a week I’m supposed to get, from 1970-1990. Payroll discovered the mistake yesterday and I got retro paid and am getting the correct weekly salary *now*.
Still doesn’t change the fact that I got paid $20 instead of $25 all that time. If someone said I got paid $25 a week from 1970-1990 they would be wrong. They could say I was *supposed* to get $25 from 1970-1990.
Andrew ♫

Steven Talbot
January 17, 2009 5:24 am

Andrew,
Yes, I do see your point, and agree that a casual observer is unlikely to realise that the temperature analysis has been subject to processing. Of course, this is true of all the analyses, not just GISS. The satellites don’t even directly measure temperature in the first place, and the fact that RSS, UAH and others come up with different figures from the same input data shows us that there is no ‘absolute truth’ to be had. The satellite records are themselves subject to ongoing corrections, of course. Why are many people so concerned with USHCN/GISS corrections and not with those others, I wonder? The sum total of corrections to the global analysis does not seem to me to be of great consequence in terms of how it affects our judgment as to what to do, if anything (that’s JMV, of course, and I realise that others will disagree). Anthony’s link above is, of course, to the US analysis. Either way, it’s hardly evidence of deception, since the adjustments are openly explained in Hansen’s papers. I think we may have to agree to differ as to whether or not it offers us a more realistic view of the true history of actual mean temperatures, as opposed to the history of recorded observations. 🙂
Smokey,
Hansen has accepted at least three quarters of a million dollars [that we know about] from entities with a strong global warming/AGW agenda
Do you mean in terms of personal remuneration? Do you have links in respect of that? I certainly didn’t know of it.
while the Best Science site is run on a pretty much voluntary basis.
Who are you gonna believe?

Well, I certainly don’t believe something because it’s voluntarily funded. There’s all sorts of stuff on the internet of that kind that I don’t believe! Actually, I know nothing of how Anthony Watts makes a living and think that’s none of my business. I try to assess and check out what I read regardless of such knowledge.
There isn’t much difference between believing what someone on the green payroll says, and believing what tobacco company front groups say. They are equally credible.
So therefore you don’t think that Fred Singer, for example, is credible? Personally I think it’s best to look at the science rather than making presumptions of that kind (I actually don’t think Singer is credible, but that’s because of what he has said rather than because of his associations with tobacco).
James Hansen is bought and paid for. He has endorsed lawbreaking to achieve his ends, therefore he is unethical; QED.
If you mean his testimony in respect of the Kingsnorth Six then no, he has not endorsed lawbreaking. They were found not guilty, and therefore did not break the law. You may think they should have been found guilty, but a UK jury decided otherwise. To suggest that justifies “the presumption that he has deliberately corrupted GISS” only explains your personal attitude.

Smokey
January 17, 2009 7:11 am

Steven Talbot:
Hansen’s payola has been widely reported. Google “David foundation, Hansen” to start. Or “Hansen, Gore”. Or check this out:
http://www.canadafreepress.com/index.php/article/3671
It’s interesting that of the thousands upon thousands of public servants being paid solely by taxpayers, this particular individual gets so much outside loot — and only from organizations with a very heavy, one-sided, pro-AGW agenda.
Big money like that is a corrupting influence; that’s its unstated purpose. It’s like a local hood paying off the beat cop on the side. Justice takes a back seat. Every time Hansen has taken a big chunk of cash he has ratcheted up his wild-eyed AGW scenarios. That should tell you all you need to know about what’s going on.
If Hansen hands that cash to the U.S. Treasury or to charity, I’ll retract. But Hansen appears to be bought and paid for by outside interests. Where does that leave honest science? Where does that leave the taxpaying public?

Andrew
January 17, 2009 7:24 am

Steven,
My follow-up about any analysis is this: Does it sound like good judgement to say the recorded past temp is not good enough to use by itself, but it is good enough to stack adjustments on? That it’s wrong to use but right enough to use?
It would seem to me that you would have to know what the temp is supposed to be before you can do any reliable adjustment for your analysis.
Andrew

Pamela Gray
January 17, 2009 8:32 am

The gold standard study on facilitated communication used untrained volunteer college students. The subjects were told to only allow the person with autism to lay their forearms in their open hands above a desk. They were not told why or what the person with autism was supposed to do and did not see the keyboard until they entered the room. They were then given a question along with the person with autism, and then “supported the forearms” of the person with autism while they typed out the answer. The answers were clearly connected to the question. A second random group of untrained volunteer college students were given the same task of “facilitating” the person with autism to type out the answer to a question but were not told what the question was, only the person with autism was given the question. The typed answers were nonsense. Unwittingly, but with good intentions, the facilitators were internally motivated to help produce what they thought was the desired outcome, but only if they had come to some conclusion in their own mind of what the outcome was supposed to be beforehand. It is human nature to be biased.
Scientists must always struggle, daily and even every hour, to remain relatively free of bias. Trust me, I know this feeling. Hansen does not appear to even try to guard himself against it, and because of that lack of diligence, I believe he is at high risk of biasing his own research.

Andrew
January 17, 2009 12:04 pm

Pamela,
Indeed. We need to have a disclaimer attached to everything from climate science that says:
*All temps are subject to change at a later date. The data we used to reach that conclusion is subject to change at a later date. The conclusion you just read is subject to change at a later date. D’oh… Never mind.
Andrew ♫

Jeff Alberts
January 18, 2009 12:24 pm

Unwittingly, but with good intentions, the facilitators were internally motivated to help produce what they thought was the desired outcome, but only if they had come to some conclusion in their own mind of what the outcome was supposed to be beforehand.

I still seriously doubt the majority did so unwittingly. There’s no way to tell if they were doing something on purpose or not, except their word.
