This provides more evidence that normalizing to null is conservative. Had I normalized to 0.03, the slope would have been reduced further.
How did the El Chichón and Pinatubo volcanic eruptions affect global temperature records? – Part 2
14 01 2009
Categories : climate data
Distribution analysis suggests GISS final temperature data is hand edited – or not
14 01 2009
UPDATE: As I originally mentioned at the end of this post, I thought we should “give the benefit of the doubt” to GISS, as there may be a perfectly rational explanation. Steve McIntyre indicates that he has also done an analysis and doubts the other analyses:
I disagree with both Luboš and David and don’t see anything remarkable in the distribution of digits.
I tend to trust Steve’s intuition and analysis skills, as his track record has been excellent. So at this point we don’t know what the root cause is, or even whether there is any human touch to the data. But as Luboš said on CA, “there’s still an unexplained effect in the game”.
I’m sure it will get much attention as the results shake out.
UPDATE2: David Stockwell writes in comments here:
Hi,
I am gratified by the interest in this very preliminary analysis. There are a few points from the comments above.
1. False positives are possible, for a number of reasons.
2. Even though data are subjected to arithmetic operations, distortions in digit frequency at an earlier stage can still be observed.
3. The web site is still in development.
4. One of the deviant periods in GISS seems to be around 1940, the same as the ‘warmest year in the century’ and the ‘SST bucket collection’ issues.
5. Even if in the worst case there was manipulation, it wouldn’t affect AGW science much. The effect would be small. It’s about something else. Take the Madoff fund. Even though investors knew the results were managed, they still invested because the payouts were real (for a while).
6. To my knowledge, no one has succeeded in exactly replicating the GISS data.
7. I picked that file as it is the most used – global land and ocean. I haven’t done an extensive search of files as I am still testing the site.
8. Luboš replicated this study more carefully, using only the monthly series, and got the same result.
9. Benford’s law (on the first digit) has a logarithmic distribution, and really only applies to data across many orders of magnitude. Measurement data that often has a constant first digit doesn’t work, although the second digit seems to. I don’t see why the last digit wouldn’t work, and it should approach a uniform distribution according to Benford’s postulate.
That’s all for the moment. Thanks again.
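For reference, point 9 contrasts two expected distributions. The sketch below is a minimal illustration, assuming the standard statement of Benford’s law for a leading digit d, P(d) = log10(1 + 1/d). The names `benford_first_digit` and `UNIFORM_LAST_DIGIT` are illustrative only and are not taken from WikiChecks.

```python
import math


def benford_first_digit():
    """Expected leading-digit probabilities (digits 1-9) under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


# Final digits, by contrast, are compared against a flat expectation:
# each of the ten digits is assumed equally likely.
UNIFORM_LAST_DIGIT = {d: 0.1 for d in range(10)}

if __name__ == "__main__":
    for digit, prob in benford_first_digit().items():
        print(f"first digit {digit}: {prob:.3f}")
```

The first-digit probabilities fall off logarithmically from 1 to 9, which is why a series whose first digit is nearly constant (such as temperature anomalies in a narrow range) is a poor fit for the first-digit test.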
This morning I received an email outlining some work that David Stockwell has done in checking the GISS global Land-Ocean temperature dataset:
Detecting ‘massaging’ of data by human hands is an area of statistical analysis I have been working on for some time, and devoted one chapter of my book, Niche Modeling, to its application to environmental data sets.
The WikiChecks web site now incorporates a script for doing a Benford’s analysis of digit frequency, sometimes used in numerical analysis of tax and other financial data.
The WikiChecks Site Says:
‘Managing’ or ‘massaging’ financial or other results can be a very serious deception. It ranges from rounding numbers up or down, to total fabrication. This system will detect the non-random frequency of digits associated with human intervention in natural number frequency.
Stockwell runs a test on GISS and writes:
One of the main sources of global warming information, the GISS data set from NASA, showed significant management, particularly a deficiency of zeros and ones. Interestingly, the moving window mode of the algorithm identified two years, 1940 and 1968 (see here).
You can run this test yourself: visit the WikiChecks web site, paste in the URL for the GISS dataset
http://data.giss.nasa.gov/gistemp/tabledata/GLB.Ts+dSST.txt
and press submit. Here is what you get as output from WikiChecks:
GISS Frequency of each final digit: observed vs. expected
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | Totals |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Observed | 298 | 292 | 276 | 266 | 239 | 265 | 257 | 228 | 249 | 239 | 2609 |
| Expected | 260 | 260 | 260 | 260 | 260 | 260 | 260 | 260 | 260 | 260 | 2609 |
| Variance | 5.13 | 3.59 | 0.82 | 0.08 | 1.76 | 0.05 | 0.04 | 4.02 | 0.50 | 1.76 | 17.75 |
| Significant | * | . | | | | | | * | | | |

| Statistic | DF | Obtained | Prob | Critical |
|---|---|---|---|---|
| Chi Square | 9 | 17.75 | <0.05 | 16.92 |
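The figures in the table can be checked by hand. The sketch below is an assumption about how the numbers arise, not WikiChecks’ actual code: the per-digit “Variance” entries are reproduced if each is treated as a chi-square contribution with a 0.5 continuity correction and an expected count of 2609/10 = 260.9 (displayed as 260 in the table). The helper names `final_digit_counts` and `chi_square_uniform` are hypothetical, and how WikiChecks extracts digits from the file is not known here.

```python
from collections import Counter

# Observed final-digit counts for digits 0-9, copied from the table above.
OBSERVED = [298, 292, 276, 266, 239, 265, 257, 228, 249, 239]

# Critical value for 9 degrees of freedom at p = 0.05, as quoted in the table.
CRITICAL_9DF_05 = 16.92


def final_digit_counts(tokens):
    """Count the final character of each string token when it is a digit 0-9."""
    counts = Counter(t[-1] for t in tokens if t and t[-1] in "0123456789")
    return [counts.get(str(d), 0) for d in range(10)]


def chi_square_uniform(observed, correction=0.5):
    """Chi-square test against a uniform expectation.

    Assumption: a continuity correction of 0.5 is subtracted from each
    absolute deviation, which is what reproduces the table's values.
    Returns (per-digit contributions, total statistic).
    """
    expected = sum(observed) / len(observed)
    contributions = [
        max(abs(o - expected) - correction, 0.0) ** 2 / expected
        for o in observed
    ]
    return contributions, sum(contributions)


if __name__ == "__main__":
    contributions, total = chi_square_uniform(OBSERVED)
    for digit, c in enumerate(contributions):
        print(f"digit {digit}: {c:.2f}")
    print(f"chi-square = {total:.2f}, critical = {CRITICAL_9DF_05}")
```

With these inputs the statistic comes to about 17.75, only slightly above the 16.92 critical value, so the result sits close to the 5% threshold rather than far beyond it.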
Categories : Science, climate data










