Distribution analysis suggests GISS final temperature data is hand edited – or not

UPDATE: As I originally mentioned at the end of this post, I thought we should “give the benefit of the doubt” to GISS, as there may be a perfectly rational explanation. Steve McIntyre indicates that he has also done an analysis and doubts the other analyses:

I disagree with both Luboš and David and don’t see anything remarkable in the distribution of digits.

I tend to trust Steve’s intuition and analysis skills, as his track record has been excellent. So at this point we don’t know what the root cause is, or even whether there is any human touch to the data. But as Lubos said on CA, “there’s still an unexplained effect in the game.”

I’m sure it will get much attention as the results shake out.

UPDATE2: David Stockwell writes in comments here:

Hi,

I am gratified by the interest in this very preliminary analysis. There are a few points from the comments above.

1. False positives are possible, for a number of reasons.

2. Even though data are subjected to arithmetic operations, distortions in digit frequency at an earlier stage can still be observed.

3. The web site is still in development.

4. One of the deviant periods in GISS seems to be around 1940, the same as the ‘warmest year in the century’ and the ‘SST bucket collection’ issues.

5. Even if, in the worst case, there was manipulation, it wouldn’t affect AGW science much. The effect would be small. It’s about something else. Take the Madoff fund. Even though investors knew the results were managed, they still invested because the payouts were real (for a while).

6. To my knowledge, no one has succeeded in exactly replicating the GISS data.

7. I picked that file as it is the most used – global land and ocean. I haven’t done an extensive search of files as I am still testing the site.

8. Lubos replicated this study more carefully, using only the monthly series, and got the same result.

9. Benford’s law (on the first digit) has a logarithmic distribution, and really only applies to data spanning many orders of magnitude. Measurement data that often has a constant first digit doesn’t work, although the second digit seems to. I don’t see why the last digit wouldn’t work; it should approach a uniform distribution according to Benford’s postulate (see the short sketch just after this update).

That’s all for the moment. Thanks again.
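As a quick illustration of point 9 above, here is a small sketch contrasting Benford’s logarithmic first-digit distribution with the uniform distribution expected of final digits. Python is used purely for illustration; the analyses discussed in this post were done with WikiChecks and Mathematica, and nothing here is their code.

```python
# Benford's law gives a logarithmic distribution for the *first* digit,
# P(d) = log10(1 + 1/d), while the *last* digit of measurement data is
# expected to be roughly uniform (0.1 for each of 0..9).
import math

for d in range(1, 10):
    print("first digit", d, "expected fraction", round(math.log10(1 + 1 / d), 3))

print("last digit, each of 0..9: expected fraction 0.1")
```

This is why the digit-frequency tests below compare observed final-digit counts against a flat one-tenth expectation rather than against the Benford curve.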


This morning I received an email outlining some checking that David Stockwell has done on the GISS global Land-Ocean temperature dataset:

Detecting ‘massaging’ of data by human hands is an area of statistical analysis I have been working on for some time, and I devoted one chapter of my book, Niche Modeling, to its application to environmental data sets.

The WikiChecks web site now incorporates a script for doing a Benford’s analysis of digit frequency, sometimes used in numerical analysis of tax and other financial data.

The WikiChecks site says:

‘Managing’ or ‘massaging’ financial or other results can be a very serious deception. It ranges from rounding numbers up or down, to total fabrication. This system will detect the non-random frequency of digits associated with human intervention in natural number frequency.

Stockwell runs a test on GISS and writes:

One of the main sources of global warming information, the GISS data set from NASA showed significant management, particularly a deficiency of zeros and ones. Interestingly the moving window mode of the algorithm identified two years, 1940 and 1968 (see here).

You can run this test yourself: visit the WikiChecks web site and paste the URL for the GISS dataset

http://data.giss.nasa.gov/gistemp/tabledata/GLB.Ts+dSST.txt

into it and press submit. Here is what you get as output from WikiChecks:

GISS

Frequency of each final digit: observed vs. expected

Digit             0      1      2      3      4      5      6      7      8      9   Totals
Observed        298    292    276    266    239    265    257    228    249    239     2609
Expected        260    260    260    260    260    260    260    260    260    260     2609
Variance       5.13   3.59   0.82   0.08   1.76   0.05   0.04   4.02   0.50   1.76    17.75
Significant       *      .                                            *

Statistic      DF   Obtained    Prob   Critical
Chi Square      9      17.75   <0.05      16.92
RESULT: Significant management detected. Significant variation in digit 0: (Pr<0.05) indicates rounding up or down. Significant variation in digit 1: (Pr<0.1) indicates management. Significant variation in digit 7: (Pr<0.05) indicates management.
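For anyone who wants to reproduce the arithmetic behind this table, here is a minimal sketch of the test as I understand it; it is not the WikiChecks code itself, just a chi-square goodness-of-fit comparison of the observed last-digit counts against a uniform expectation.

```python
# Chi-square test of the observed final-digit counts (from the table above)
# against a uniform expectation of one tenth of the total per digit.
from scipy.stats import chisquare

observed = [298, 292, 276, 266, 239, 265, 257, 228, 249, 239]  # digits 0..9
total = sum(observed)                      # 2609 final digits
expected = [total / 10.0] * 10             # 260.9 per digit if digits were uniform

stat, p = chisquare(observed, f_exp=expected)
print(f"chi-square = {stat:.2f} on 9 df, p = {p:.4f}")
```

Depending on how the expected counts are rounded, the statistic computed this way can differ slightly from the 17.75 reported above, but either way it exceeds the 16.92 critical value at the 0.05 level.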

Stockwell writes of the results:

The chi-square test is prone to produce false positives for small samples. Also, there are a number of innocent reasons that digit frequency may diverge from expected. However, the tests are very sensitive. Even if arithmetic operations are performed on data after the manipulations, the ‘fingerprint’ of human intervention can remain.

I also ran it on the UAH data and RSS data and it flagged similar issues, though with different deviation scores. Stockwell did the same and writes:

The results, from lowest deviation to highest, are listed below.

RSS – Pr<1

GISS – Pr<0.05

CRU – Pr<0.01

UAH – Pr<0.001

Numbers such as missing values in the UAH data (-99.990) may have caused its high deviation. I don’t know about the others.

Not being familiar with this mathematical technique, I could do little to confirm or refute the findings, so I let it pass until I could get word of replication from some other source.

It didn’t take long. About two hours later, Lubos Motl of The Reference Frame posted the results he obtained independently, via another method, when he ran some checks of his own:

David Stockwell has analyzed the frequency of the final digits in the temperature data by NASA’s GISS led by James Hansen, and he claims that the unequal distribution of the individual digits strongly suggests that the data have been modified by a human hand.

With Mathematica 7, such hypotheses take a few minutes to be tested. And remarkably enough, I must confirm Stockwell’s bold assertion.

But that’s not all, Lubos goes on to say:

Using the IPCC terminology for probabilities, it is virtually certain (more than 99.5%) that Hansen’s data have been tampered with.

To be fair, Lubos runs his test on UAH data as well:

It might be a good idea to audit our friends at UAH MSU where Stockwell seems to see an even stronger signal.

In plain English, I don’t see any evidence of man-made interventions into the climate in the UAH MSU data. Unlike Hansen, Christy and Spencer don’t seem to cheat, at least not in a visible way, while the GISS data, at least their final digits, seem to be of anthropogenic origin.

Steve McIntyre offered an explanation based on the rounding that occurs when converting from Fahrenheit to Centigrade, but Lubos can’t quite reproduce the pattern he sees in the GISS data from rounding alone:

Steve McIntyre has immediately offered an alternative explanation of the non-uniformity of the GISS final digits: rounding of figures calculated from other units of temperature. Indeed, I confirmed that this is an issue that can also generate a non-uniformity, up to 2:1 in the frequency of various digits, and you may have already downloaded an updated GISS notebook that discusses this issue.

I can’t get 4,7 underrepresented but there may exist a combination of two roundings that generates this effect. If this explanation is correct, it is a result of much less unethical approach of GISS than the explanation above. Nevertheless, it is still evidence of improper rounding.
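To see how a units conversion by itself can skew final digits, here is a toy illustration of the mechanism McIntyre and Lubos are discussing. It is not GISS’s actual processing chain; the temperature range, precision, and rounding order are assumptions made purely for illustration.

```python
# Values recorded to 0.1 F, converted to C and re-rounded to 0.1 C, end up
# with a markedly non-uniform distribution of final digits -- up to about
# 2:1, which is the size of effect Lubos mentions above.
from collections import Counter

counts = Counter()
for tenths_f in range(320, 1320):            # 32.0 F .. 131.9 F, in 0.1 F steps
    f = tenths_f / 10.0
    c = round((f - 32.0) * 5.0 / 9.0, 1)     # convert, then round to 0.1 C
    counts[int(round(c * 10)) % 10] += 1     # final digit of the 0.1 C value

for digit in range(10):
    print(digit, counts[digit])
```

Which digits come out over- or under-represented depends on the offsets and on how the two roundings interact, which is why a rounding artifact need not reproduce the particular pattern seen in the GISS file.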

Pretty strong stuff from Lubos, but given the divergence of the GISS signal from the other datasets, unsurprising. I wonder if it isn’t some artifact of the GISS homogenization process for surface temperature data, which I view as flawed in its application.

But let’s give the benefit of the doubt here. I want to see what GISS has to say about it; there may be a perfectly rational explanation that will demonstrate that these statistical accusations are without merit. I’m sure they will post something on RC soon.

Stay tuned.

George E. Smith
January 14, 2009 4:16 pm

So who has done an analysis on the digits of (Pi) or (e) to see if they have been hand manipulated?
Just asking!

Phil
January 14, 2009 4:18 pm

The text in the article says a problem in the data is “a deficiency of zeros and ones” (oh God no!) and then shows a chart with an excess of zeroes and ones compared to the arbitrary ‘expected’ amounts.
Since temperature measurements are not actually random numbers could someone explain why the data should contain an even distribution of all digits 0-9? Seriously, try to explain this because I’m in the mood for a good belly laugh.

January 14, 2009 4:29 pm

First, you won — Congratulations on the ‘Best Science Blog’ — Anthony. Hats off for the great job you do running this blog.
It’s way past time for the community to demand that the process be opened up for all to see. It’s tables of numbers, not nuclear secrets here. If there is to be any credibility left in the process of establishing facts, then the process, methods, software and raw data need to be opened up to scrutiny. Only then can we all agree on what the real facts are.
You cannot hope to make policy on this sort of crappy foundation, nor do good science. The point of having a clearing house like this is so the data that is ultimately produced can be relied upon. Whether they are actually fiddling the numbers or not, is not the question, it’s why would they behave the way they do if they weren’t?

Bill Illis
January 14, 2009 4:39 pm

A large number of 4’s are rounded up to 7’s.
Doesn’t sound like a big deal except we are only dealing with tenths of a degree here in the first place.
The rate that temps have increased is only about 0.0064C each year. Over 100 years, the increase is only 0.64C (which implies that global warming may not be such a big problem.)
Oops, I’ve rounded the 0.64C over 100 years to 0.67C and now the increase is 0.7C over 100 years (global warming is now within the range of errors of the models again).
The global warming models are projecting temps to increase at 0.2C per decade (the only number that gets temps close to +3.0C by 2100).
The actual observations since 1980 show an increase of only 0.14C per decade which indicates temps will increase less than 2.0C by 2100 – oops, I’ve now rounded that up to 0.17C per decade and voila, the global warming models are correct again and, yes, there will be dangerous global warming of +3.0C by 2100.
Rounding of such small numbers, especially the 4’s up to 7’s makes a huge impact in the long-term trends of 100 to 200 years that we are talking about here.

Molon Labe
January 14, 2009 4:48 pm

You need to bury this post, this argument, this “analysis” and pretend like it never happened. This silly effort could unravel years of effort in confronting the warmers.

Ed Scott
January 14, 2009 4:55 pm

Candy for number crunchers. Excuse the random excerpts.
————————————————————-
HEAT STORED BY GREENHOUSE GASES
When investigating the propagation of energy, we must take into account the science of thermodynamics which allows us to predict the trajectory of the process; and the science of heat transfer to know the modes by which energy is propagated from one system to other systems.
To understand heat transfer we have to keep in mind that heat is not a substance, but energy that flows from one system toward other systems with lower density of energy.
http://biocab.org/Heat_Storage.html
Note: If we take the last report from Mauna Loa for this algorithm, the mass of CO2 would be 0.00069 Kg. The change of temperature would be 0.0062 °C. The difference between the ΔT produced by 0.000614 Kg and the ΔT produced by 0.00069 Kg of CO2 is negligible (0.0062 – 0.00553 = 0.00067).
To cause a variation in the tropospheric temperature of 0.52 °C (average global temperature anomaly in 1998; UAH) required 1627.6 ppmv of CO2, a density of atmospheric CO2 which has never been recorded or documented anywhere in the last 420,000 years. (Petit et al. 1999)
The total change in the tropospheric temperature of 0.75 °C was given for the duration of one minute of one year (1998) (UAH); however, CO2 increased the tropospheric temperature by only 0.01 °C. We know now that 1934 was the warmest year of the last century. Where did the other 0.74 °C come from? Answer: it came from the Sun and from the remnants of supernovas.
CHANGE OF THE TROPOSPHERIC TEMPERATURE BY SOLAR IRRADIANCE
Planet Earth would not be warming if the Sun’s energy output (Solar Irradiance) was not increasing. Favorably, our Sun is emitting more radiation now than it was 200 years ago, and so we should have no fear of a natural cycle that has occurred many times over in the lifetime of our Solar System.
When the concentration of atmospheric carbon dioxide increases, the strong absorption lines become saturated. Thereafter its absorptivity increases logarithmically not linearly or exponentially; consequently, carbon dioxide convective heat transfer capacity decreases considerably.
ALGORITHM FOR METHANE (CH4)
ΔT = 0.00013 cal-th /0.0012 Kg (533.3 cal/Kg*°C) = 0.00013 cal / 0.64 cal*°C = 0.0002 °C
Consequently, Methane is not an important heat forcing gas at its current concentration in the atmosphere.
THE CASE ON 14 APRIL 1998 (RADIATIVE “FORCING”)
When we introduce real standards and apply the proper algorithms, the temperature increase caused by CO2 is no more than 0.1 K.
CO2 SCIENCE: THE CASE ON JUNE 22, 2007 (RADIATIVE “FORCING”):
A common error among some authors is to calculate the anomaly taking into account the whole mass of atmospheric CO2, when for any calculation we must take into account only the increase of the mass of atmospheric CO2. The error consists of taking the bulk mass of CO2 as if it were entirely the product of human activity, when in reality the increase in human CO2 contribution is only 34.29 ppmv out of a total of 381 ppmv (IPCC). This practice is misleading because the anomaly is caused not by the total mass of CO2, but by an excess of CO2 from an arbitrarily fixed “standard” density. There is however no such thing as a “standard” density of atmospheric CO2.
Does this mean that air temperature would increase by 0.02 °C per second until it reached scorching temperatures? No, it does not, as almost all of the absorbed heat is emitted in the very next second. Thus the temperature anomaly caused by CO2 cannot go up if the heat source does not increase the amount of energy transferred to CO2.
0.27 K/s is only 1.24% of the temperature difference between the ground and the air, which was 21.8 K. We can see that carbon dioxide is not able to cause the temperature anomalies that have been observed on Earth.
We would be mistaken if we were to think that the change of temperature was caused by CO2 when, in reality, it was the Sun that heated up the soil. Carbon dioxide only interfered with the energy emitted by the soil and absorbed a small amount of that radiation (0.0786 Joules), but carbon dioxide did not cause any warming. Please never forget two important points: the first is that carbon dioxide is not a source of heat, and the second is that the main source of warming for the Earth is the Sun.
WATER VAPOR:
It is evident that water vapor is a much better absorber-emitter of heat than carbon dioxide. Under the same conditions, water vapor transfers 160 times more heat than carbon dioxide.

crosspatch
January 14, 2009 5:00 pm

I wrote earlier:

I have wondered why nobody has attempted to “recover” the station data to see if it changes those temperatures. It seems like it could be put to rest easily enough, but nobody ever has.

There should be an easy test to see if those missing stations have biased GISS’s global temperature. See if the difference between GISS and one of the satellite (RSS, for example) measurements suddenly widens when the rural stations fall off the GISS data map. If the difference between them stays the same or narrows, then the rural stations didn’t matter much and the step up in temperatures is real. If the gap widens, you can then give some quantitative value to the amount of difference those “missing” stations made.
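As a rough sketch of the comparison crosspatch suggests, one could difference the two series and compare the gap before and after the station drop-out. The file names, column layout, and cutoff date below are hypothetical placeholders, not the real file formats.

```python
# Does the GISS-minus-satellite difference widen around the time the rural
# stations drop out of the GISS network? (Illustrative sketch only.)
import pandas as pd

giss = pd.read_csv("giss_monthly_anomaly.csv", index_col="date", parse_dates=True)["anomaly"]
rss = pd.read_csv("rss_monthly_anomaly.csv", index_col="date", parse_dates=True)["anomaly"]

diff = (giss - rss).dropna()          # month-by-month difference of the two series
cutoff = "1990-01-01"                 # assumed date of the rural station drop-out

before = diff[diff.index < cutoff]
after = diff[diff.index >= cutoff]
print("mean GISS-RSS before:", round(before.mean(), 3))
print("mean GISS-RSS after: ", round(after.mean(), 3))
```

If the gap stays about the same, the missing stations probably didn’t matter much; if it widens, the size of the change gives a rough quantitative estimate of their effect, as the comment proposes.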

January 14, 2009 5:07 pm

tarpon:

Whether they are actually fiddling the numbers or not, is not the question, it’s why would they behave the way they do if they weren’t?

Exactly.
Why is the taxpaying public being stonewalled over access to the raw data that is collected by GISS? It’s not like GISS is guarding national defense secrets. This is the weather we’re talking about.

Andrew
January 14, 2009 5:10 pm

What’s good for the goose is good for the gander. If you can look at numbers and do an analysis, and it’s believable, why should anyone complain? This is the game we’re playing in climate science, isn’t it?
Andrew

Greg Smith
January 14, 2009 5:24 pm

Read between the lines of this latest release from GISS and you see that there are cracks appearing in the system! About time
http://data.giss.nasa.gov/gistemp/2008/
Congratulations Anthony on your great win and blog

davidgmills
January 14, 2009 5:32 pm

Meanwhile ….
Williston, N.D. — Minus 40F

January 14, 2009 5:36 pm

GISS Surface Temperature Analysis
Global Temperature Trends: 2008 Annual Summation
Updated Jan. 13, 2009
“…it still seems likely that a new global temperature record will be set within the next 1-2 years…”

January 14, 2009 5:40 pm

I disagree with the analysis also. Nothing remarkable here that I can see.
Note the table and an alpha probability of .05. Each one tagged as significant was just above the critical value (which must be above 3.59, column 1, and at or below 4.02, column 7). The other way to state columns 1 and 7 would be to display the probability that the data is not manipulated as something less than .05 for each, but not much less, since it’s close (for the sake of expediency I’m not going to calculate the P values…)
Just rounding to .05 since it’s close, the probability of seeing a false positive using the binomial distribution (events=10, p=.05) is 31.5% for finding 1 event above .05, and 7.5% for seeing 2 events. The probability of finding all zeros in the last row is about 59.9%, so anyone expecting to see all blank spaces in the “significant” row is going to be disappointed about 40% of the time.
So doing this analysis on purely random data and finding this result (2 events with P of around .05) should happen at least 7.5% of the time. Not significant.
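For anyone who wants to check the binomial arithmetic in this comment, a short sketch follows, assuming (as the comment does) ten independent digits each flagged with probability 0.05.

```python
# Binomial probabilities for the number of digits flagged at p = 0.05.
from scipy.stats import binom

n, p = 10, 0.05
print("P(exactly 1 flagged):", round(binom.pmf(1, n, p), 4))   # ~0.315
print("P(exactly 2 flagged):", round(binom.pmf(2, n, p), 4))   # ~0.075
print("P(no digit flagged): ", round(binom.pmf(0, n, p), 4))   # ~0.599
```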
Since any adjustments Hansen makes are run through a virtually unknowable meat grinder of calculations before the corrected historical temperature arrives (naturally much colder than it really was then), I wouldn’t expect to be able to discern any fingerprint, even if something untoward was going on with the original data. That something remarkable is happening with the result, on the other hand, is obvious even to the most casual observer.
But may I suggest: I think you would have a higher probability of finding something refutable by simply analyzing the “adjusted” historical temperatures for all stations remaining in the network, compared to the original readings. As Anthony has shown many times, some stations show a colder adjusted temperature many years ago, when we know quite well that people knew how to read thermometers back then just as well as we can today. I have a gut feeling it’s much more than siting, UHI, etc.
The null hypothesis is that the adjusted historical temperature of the nation (or region) is the same in a given year as the original record indicates, with about the same number of stations adjusted high as adjusted low. The alternative hypothesis is that the adjusted average is lower than the originally recorded average. Repeat test on many years.
That test would be easy to analyze, and if the hypothesis is rejected at P=.01 or so, it would be a strong indictment against Hansen’s methods, and could trigger a more in-depth audit.
Anthony, if you have any insights on how to collect such data, I can do the analysis and write up the results…
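A minimal sketch of the test proposed in this comment might look like the following. The CSV of per-station annual means and its column names are hypothetical, since, as the commenter notes, the data would still have to be collected.

```python
# One-sample test of per-station adjustments (adjusted minus original) for a
# single year; the proposal above is to repeat this over many years.
import pandas as pd
from scipy.stats import ttest_1samp

stations = pd.read_csv("station_year_means.csv")        # assumed columns: station, original, adjusted
deltas = stations["adjusted"] - stations["original"]     # per-station adjustment for that year

t, p = ttest_1samp(deltas, 0.0)                          # H0: mean adjustment is zero
print("mean adjustment (C):", round(deltas.mean(), 3))
print("stations adjusted down:", int((deltas < 0).sum()), "of", len(deltas))
print("two-sided p:", round(p, 4))
# A negative mean with a small p-value (halved for the one-sided alternative
# that adjusted < original), repeated over many years at p = .01 or so, is the
# stronger test the comment has in mind.
```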

sprats
January 14, 2009 6:04 pm

Meanwhile. Toronto tonight negative 20 C. Normally negative 2 or 5 C. We’re told by the morning news stations that this has not happened since 2005. Not so long ago but Al Gore told me that it would get hotter and hotter each year (somewhat like Siberia has been). Longtime reader, great site that arms me for the odd conversation with “Nichola” types.

G Alston
January 14, 2009 6:09 pm

Molon Labe (16:48:36) :
You need to bury this post, this argument, this “analysis” and pretend like it never happened. This silly effort could unravel years of effort in confronting the warmers.
AGREED.
There is no good that can come of this. At best it sounds picayunish; at worst it sounds like a group of halfwits gunning for Dr. Hansen with claims that he looked askance at his female intern. Or thereabouts.

Bill Illis
January 14, 2009 6:11 pm

Speaking of rounding, Hansen has made a significant “rounding down” of global warming expectations over the past few days.
The warming trend has been reduced to 0.15C per decade (from the over +0.2C per decade GISS and the IPCC previously used.)
At this new rate of 0.15C per decade, there is no way to reach +3.0C by 2100 and only a little over +2.0C can be reached at this rate.
Here is the quote from Hansen of note:
“Greenhouse gases: Annual growth rate of climate forcing by long-lived greenhouse gases (GHGs) slowed from a peak close to 0.05 W/m2 per year around 1980-85 to about 0.035 W/m2 in recent years due to slowdown of CH4 and CFC growth rates [ref. 6]. Resumed methane growth, if it continued in 2008 as in 2007, adds about 0.005 W/m2. From climate models and empirical analyses, this GHG forcing trend translates into a mean warming rate of ~0.15°C per decade. ”
These numbers were repeated in Hansen’s personal blog and on the official GISSTemp page today.
http://data.giss.nasa.gov/gistemp/2008/
http://www.columbia.edu/~jeh1/mailings/2009/20090113_Temperature.pdf

Roger Knights
January 14, 2009 6:14 pm

There were 32 more “1s” than expected, and 32 fewer “7s”. Maybe somebody misread a sloppily written “7” as a “1”?
This could easily happen in the US, because we lack the European convention of crossing the 7 to avoid this sort of mis-read.

Richard deSousa
January 14, 2009 6:18 pm

http://data.giss.nasa.gov/gistemp/2008/
What I find interesting is that the above link furnished by Greg Smith makes no mention of the PDO. Strange… its influence on the climate of the northern hemisphere is just as important as the ENSO/La Nina/El Nino ocean cycles.

January 14, 2009 6:37 pm

G Alston and Molon Labe,
You may well be right. But why are skeptics’ concerns never addressed? Can you suggest a method of obtaining the raw data that Hansen hides from everyone outside of his clique? [For “clique” see the Wegman Report to Congress.]
It is their refusal to archive the raw data that makes plenty of people justifiably suspicious of GISS, which has a record of diddling with the official temperature chronology: click [look closely]
And note how GISS is out of step with UAH, HadleyCRUT3 and RSS data: click
For an example of a local area “homogenized” by GISS: click
Michael Mann also refuses to disclose his raw data and methodology, resulting in: click
…Which is refuted by: click
These major inconsistencies could be quickly resolved by publicly archiving the raw [taxpayer-funded] data. But Hansen, Mann, GISS and the UN/IPCC refuse to do so. The question is: what are they hiding??

crosspatch
January 14, 2009 6:48 pm

“But can you suggest a method of obtaining the raw data that Hansen hides from anyone outside of his clique? ”
He isn’t hiding the raw data.
Hansen’s raw input data are provided by NOAA. The raw data is available for download. His algorithm for adjustment is archived and also available for download.

January 14, 2009 6:52 pm

crosspatch:
My response was apparently parroting the comments of others which I accepted at face value, and for that I apologize. Can you provide a link to the raw data, and the methodology, that GISS uses? Thanks.

Earle Williams
January 14, 2009 7:06 pm

OT
Heard on Alaska public radio this morning:
Sea Ice in the Bering Sea causes problems for crab fishermen
The APRN.org web site wasn’t responding. 🙁

January 14, 2009 7:14 pm

sprats
Good data! Sounds mighty cold. But in the interest of not cherry-picking the temperature data, it is mighty warm in Southern California this week. We are having a heat wave with highs of 83 or so, and lows of 60 more or less. Had an all-time high earlier this week of 88 degrees. All those temps in F, of course!
Roger E. Sowell
Marina del Rey, California

January 14, 2009 7:18 pm

A little help, guys?
I saw an article by somebody, whom I have forgotten, detailing how as CO2 concentrations rise, their ability to absorb radiation begins declining at a certain point. And the gist of the conclusion was that CO2 would not be able to supply the feedback effect promised by the IPCC.
I saw this in the last month, but can’t find it in my 4-5 dozen GW articles I have collected.
anybody know who and what this was? I realize this is way OT. But any help here?

Pamela Gray
January 14, 2009 7:22 pm

You guys sometimes make me giggle. You are quibbling (and I do sometimes as well because when I haven’t had a glass of red wine and a bit of chocolate, accuracy is important) over whether or not winters were actually .02 degrees colder than the official data claims.
Well, it has been a glass and a half later and I am more concerned about whether or not to buy the shoes with a racy strap that angles over my ankle and rides high in the back, or the mary janes that have two cute little straps that cross in front.