Analysis: CRU tosses valid 5 sigma climate data

Above: map of mean temperature and departure by state for February 1936 in the USA, a 5 sigma event. Source: NCDC’s map generator at http://www.ncdc.noaa.gov/oa/climate/research/cag3/cag3.html

Steve Mosher writes in to tell me that he’s discovered an odd and interesting discrepancy in CRU’s global land temperature series. It seems that they are tossing out valid data that is 5 sigma (five standard deviations from the mean) or greater. In this case, an anomalously cold February 1936 in the USA. As a result, CRU’s figure for that month was almost 2C warmer than his analysis. That this month was an extreme event is backed up by historical accounts and US surface data. Wikipedia says about it:

The 1936 North American cold wave ranks among the most intense cold waves of the 1930s. The states of the Midwest United States were hit the hardest. February 1936 was one of the coldest months recorded in the Midwest. The states of North Dakota, South Dakota, and Minnesota saw their coldest month on record. What was so significant about this cold wave was that the 1930s had some of the mildest winters in US history. 1936 was also one of the coldest years in the 1930s. And the winter was followed by one of the warmest summers on record, which brought on the 1936 North American heat wave.

This finding of tossing out 5 sigma data is all part of an independent global temperature program he’s designed called “MOSHTEMP”, which you can read about here. He’s also found that the difference between CRU and Moshtemp appears to be seasonal. When they toss 5 sigma events it appears that the tossing happens November through February.
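To make the mechanics concrete, here is a minimal sketch in R of the kind of screen being described (this is not CRU’s actual code; the function name and the toy numbers are only illustrative): station anomaly values more than five station-level standard deviations from the base-period mean are simply set to missing.

# Sketch only, not CRU code: drop station anomalies beyond 5 sigma.
# 'anoms' = one station's monthly anomalies (deg C) from a 1961-90 base,
# 'station_sd' = that station's standard deviation over a reference period.
screen_5sigma <- function(anoms, station_sd) {
  keep <- abs(anoms) <= 5 * station_sd   # TRUE for values within 5 sigma
  anoms[!keep] <- NA                     # tossed values become missing
  anoms
}

screen_5sigma(c(-14.2, -1.0, 0.2, 1.6), station_sd = 2.5)
# [1]  NA -1.0  0.2  1.6   (the -14.2 exceeds 5 x 2.5 = 12.5 and is dropped)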

His summary and graphs follow: Steve Mosher writes:

A short update. I’m in the process of integrating the Land Analysis and the SST analysis into one application. The principal task in front of me is integrating some new capability in the ‘raster’ package. As that effort proceeds I continue to check against prior work and against the accepted ‘standards’. So, I reran the Land analysis and benchmarked against CRU, using the same database, the same anomaly period, and the same CAM criteria. That produced the following:

My approach shows a lot more noise, something not seen in the SST analysis, which matched nicely. Wondering if CRU had done anything else, I reread the paper.

“Each grid-box value is the mean of all available station anomaly values, except that station outliers in excess of five standard deviations are omitted.”

I don’t do that!  Curious, I looked at the monthly data:

The month where CRU and I differ THE MOST is  Feb, 1936.

Let’s look at the whole year of 1936.

First CRU:

> had1936
[1] -0.708 -0.303 -0.330 -0.168 -0.082  0.292  0.068 -0.095  0.009  0.032  0.128 -0.296

> anom1936
[1] "-0.328" "-2.575" "0.136"  "-0.55"  "0.612"  "0.306"  "1.088"  "0.74"   "0.291"  "-0.252" "0.091"  "0.667"

So Feb 1936 sticks out as a big issue.
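A quick R sketch of how that kind of month can be located (this is not Moshtemp code; the vectors below just reuse the first six months of the 1936 values listed above):

# CRU (had1936) and Moshtemp (anom1936) values for Jan-Jun 1936, from above
cru  <- c(-0.708, -0.303, -0.330, -0.168, -0.082,  0.292)
mosh <- c(-0.328, -2.575,  0.136, -0.550,  0.612,  0.306)
months <- c("1936-01", "1936-02", "1936-03", "1936-04", "1936-05", "1936-06")

worst <- which.max(abs(cru - mosh))   # index of the largest disagreement
months[worst]                          # "1936-02"
cru[worst] - mosh[worst]               # about +2.27: CRU warmer than Moshtemp here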

Turning to the anomaly data for 1936, here is what we see in UNWEIGHTED Anomalies for the entire year:

> summary(lg)

      Min.    1st Qu.     Median       Mean    3rd Qu.       Max.       NA's
 -21.04000   -1.04100    0.22900    0.07023    1.57200   13.75000 31386.00000
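One crude way to see what a 5 sigma screen would flag in a vector of unweighted anomalies like lg above is sketched below. It uses a single pooled standard deviation, which is not CRU’s method (they use per-station standard deviations), so treat it only as a feel for the tail; the name lg is assumed to be the anomaly vector summarized above.

# Sketch only: values a pooled 5-sigma screen would flag in a vector 'x'
five_sigma_hits <- function(x) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  x[!is.na(x) & abs(x - m) > 5 * s]
}
# five_sigma_hits(lg)   # the -21.04 minimum above would likely show up here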

When you look at the detailed data, the issue is, for example, some record cold in the US: 5 sigma type weather.

Looking through the data you will find that in the US you have Feb anomalies beyond the 5 sigma mark with some regularity. And if you check Google, of course it was a bitter winter. Just an example below. Much more digging is required here and in other places where the method of tossing out 5 sigma events appears to cause differences (apparently in both directions). So, no conclusions yet, just a curious place to look. More later as time permits. If you’re interested, double-check these results.

> had1936
[1] -0.708 -0.303 -0.330 -0.168 -0.082  0.292  0.068 -0.095  0.009  0.032  0.128 -0.296

> anom1936
[1] "-0.328" "-2.575" "0.136"  "-0.55"  "0.612"  "0.306"  "1.088"  "0.74"   "0.291"  "-0.252" "0.091"  "0.667"


Previous post on the issue:

CRU, it appears, trims out station data when it lies outside 5 sigma. For certain years where there was actually record cold weather, that leads to discrepancies between CRU and me; it probably happens in warm years as well. Overall this trimming of data amounts to around 0.1C (the mean of all differences).

Below, see what 1936 looked like: the average for every month, max anomaly, min anomaly, and 95% CI (orange). Note that these are actual anomalies from the 1961-90 baseline, so that’s a -21C departure from the average. With a standard deviation around 2.5, that means CRU is trimming departures greater than 13C or so. A simple look at the data showed bitterly cold weather in the US, weather that gets snipped by a 5 sigma trim.
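A back-of-the-envelope check of those figures, using only the numbers quoted above:

sigma <- 2.5          # rough station-level standard deviation, deg C
5 * sigma             # [1] 12.5 -> departures beyond roughly 12-13C get trimmed
-21 < -5 * sigma      # [1] TRUE -> a -21C departure falls outside the 5 sigma band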

And more interesting facts: If one throws out data because of outlier status one can expect outliers to be uniformly distributed over the months. In other words, bad data has no season. So I sorted the ‘error’ between CRU and Moshtemp. Where do we differ? Uniformly over the months, or does the dropping of 5 sigma events happen in certain seasons? First let’s look at when CRU is warmer than Moshtemp. I take the top 100 months in terms of positive error. Months here are expressed as fractions, 0 = Jan.

Next, we take the top 100 months in terms of negative error. Is that uniformly distributed?
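A sketch of how such a seasonal tally can be run in R (the names delta and month are assumptions for illustration, not Moshtemp’s variables; the random values only make the snippet self-contained):

set.seed(1)
delta <- rnorm(1560)            # stand-in for monthly CRU minus Moshtemp differences
month <- rep(1:12, times = 130) # calendar month for each entry (130 years x 12)

top_pos <- order(delta, decreasing = TRUE)[1:100]  # months where CRU is warmest vs Moshtemp
top_neg <- order(delta)[1:100]                     # months where CRU is coldest vs Moshtemp

table(month[top_pos])   # uniform across 1-12, or bunched in Nov-Feb?
table(month[top_neg])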

If this data holds up upon further examination, it would appear that CRU processing has a seasonal bias: really cold winters and really warm winters (5 sigma events) get tossed. Hmm.

The “delta” between Moshtemp and CRU varies with the season. The worst months on average are Dec/Jan. The standard deviation for the winter month delta is twice that of other months. Again, if these 5 sigma events were just bad data, we would not expect this. Overall Moshtemp is warmer than CRU, but when we look at TRENDS it matters where these events happen.
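The per-month spread behind that claim can be eyeballed the same way, using the same assumed delta and month vectors as the sketch above (or the real difference series if you have it):

tapply(delta, month, mean, na.rm = TRUE)  # average CRU minus Moshtemp delta by calendar month
tapply(delta, month, sd,   na.rm = TRUE)  # spread of the delta by calendar month; the claim is
                                          # that Dec/Jan show roughly twice the sd of other months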


114 Comments
Ken Hall
September 5, 2010 2:57 pm

It seems to me that they could be omitting this data for two possible reasons.
1. To reduce the amount of noise in the trend. This makes any “anthropogenic” fingerprint appear more obvious and greater than it actually is.
2. To reduce the appearance of unusual extreme weather anomalies in their records so that modern extreme weather events could falsely be labelled as “unprecedented”, a word that is wrongly and massively overused by the CAGW alarmists.
These are just more tricks used to bolster the very weak CAGW theory.

Theo Goodwin
September 5, 2010 3:37 pm

So-called scientists who prefer massaged data to empirical data are not worthy of the name ‘scientist’. Once again, these people have been caught red-handed. Would everyone please stop giving them grant money?

DirkH
September 5, 2010 4:00 pm

Phil. says:
September 5, 2010 at 1:55 pm

“DirkH says:
Phil., i hope you don’t work in avionics?”
Why Dirk, would you rather not eliminate bad data and spurious points?”
Why? Because you would endanger lives.
And no, in engineering I wouldn’t eliminate “bad data” and “spurious points”. Rather, I would examine exactly these “spurious points”. Very, very closely.
Think about your answer. You just said that the cold wave of 1936 was bad data; a spurious point – something that has not really happened.
If all scientists work this way, no wonder they never achieve anything.

Harold Pierce Jr
September 5, 2010 4:14 pm

ATTN: Anthony and Steve
In the 1930s, what level of accuracy was used for temperature measurements?
It certainly was not to +/- 0.001 deg F. Were not temperatures measured to +/- 1 deg F? These computed numbers should be rounded back to the level of accuracy used for the temperature measurements.
If temperature data are rounded off to nearest whole deg C, global warming vanishes!

September 5, 2010 4:21 pm

DirkH says:
September 5, 2010 at 4:00 pm
Phil. says:
September 5, 2010 at 1:55 pm

“DirkH says:
Phil., i hope you don’t work in avionics?”
Why Dirk, would you rather not eliminate bad data and spurious points?”
Why? Because you would endanger lives.
And no, in engineering I wouldn’t eliminate “bad data” and “spurious points”. Rather, I would examine exactly these “spurious points”. Very, very closely.
Think about your answer. You just said that the cold wave of 1936 was bad data; a spurious point – something that has not really happened.

No I didn’t, and neither did CRU; their procedure has been incorrectly described here.
As I pointed out several hours ago:
http://wattsupwiththat.com/2010/09/05/analysis-cru-tosses-valid-5-sigma-climate-data/#comment-475769
What was actually done was to use the 5-sigma test as a screen and then examine those data for signs of problems:
“To assess outliers we have also calculated monthly standard deviations for all stations with at least 15 years of data during the 1921–90 period. All outliers in excess of five standard deviations from the 1961–90 mean were compared with neighbors and accepted, corrected, or set to the missing code. Correction was possible in many cases because the sign of the temperature was wrong or the temperature was clearly exactly 10……

Ian W
September 5, 2010 4:29 pm

John A says:
September 5, 2010 at 8:51 am
If this is true, then it cannot be seen as anything other than scientific misconduct, unless the CRU can justify this step on physical grounds.

It looks more like carelessness and the application of rules in algorithms that a meteorologist would say were not realistic, in the same way a botanist would say using tree rings for temperature is not realistic. But the result was ‘in line with expectations’, so confirmation bias made it all look good.
This is the problem when guesstimation algorithms are used to ‘correct’ or ‘adjust’ data: they can lead to systemic errors that are not immediately apparent.
So scientific misconduct is perhaps a little strong – but it was (is) an amateurish approach to collating data, demonstrating a lack of quality control or quality management.

Jim Powell
September 5, 2010 4:43 pm

I started tracking weather statistics for Huntley, Montana back in 2006. At that time I downloaded the historical data from the United States Historical Climatology Network, http://cdiac.ornl.gov/epubs/ndp/ushcn/ushcn.html . This January I ran across some discrepancies, so I downloaded the same file again and compared it to the original file. When I graphed the result it looked like someone had gone through the entire file with a computer program, decreasing all of the temperature records prior to 1992. I still have both original files. February 1936 from the original file downloaded in 2006 was -0.62 for the month, and February 1936 from the file downloaded in January was -2.1 degrees F. My comments at that time were: “Isn’t climate science fun! Every day you learn new ways to commit fraud. I’d love to see the programmers’ comments on this little piece of code. I can imagine that it is something like ‘Hallelujah Brother, 2009 is the warmest year on record!!!!!’”

September 5, 2010 4:50 pm

Orkneygal says:
September 5, 2010 at 2:38 pm

If I had presented an assignment at university with such sloppy and incomplete graphs, it would have been returned without marking and I would have received no credit.
The standard of presentation at this site is slipping.

…—…—…
Er, uhm, no.
This was NOT a “university assignment”. It was NOT a graded paper, nor a funded paper (of any kind) nor a response to a privately funded nor commercial “assignment” to be returned to the boss for a profit and loss analysis.
It was (is!) a privately-volunteered, unfunded, spontaneous self-assigned question by a private individual. It WAS then bravely “volunteered” by that individual for public review, comment, and correction – which was subsequently provided by the private author in his response above.
The supposed “science” that has created Mann-made global warming IS publicly funded and has been awarded Nobel Prizes and billions of dollars of new funding over the past 20+ years. THAT “science” is hidden, has no review outside of friends and associates, and is NOT corrected nor publicly reviewed. But that “science” is responsible for 1.3 trillion in new taxes and today’s recession.

kadaka (KD Knoebel)
September 5, 2010 4:55 pm

Looks like according to GISS this is a non-issue.
Let’s look at the numbers for “Annual and five-year running mean surface air temperature in the contiguous 48 United States (1.6% of the Earth’s surface) relative to the 1951-1980 mean,” from the originating page here, using the tabular data. For 1936 the annual mean anomaly was only +0.13°C.
The global average temperature anomaly is of course more important as it tracks the AGW signal. By the land-only (aka meteorological stations-only) numbers, tabular data here, the 1936 annual mean anomaly was exactly 0.00°C. With the Contiguous US (CONUS) covering only 1.6% of the Earth’s surface, and the seas occupying 70%, CONUS is only about 5.3% of the land area; so if there was a little snip of the 1936 US winter numbers, it still wouldn’t have meaningfully changed the more important global anomaly number.
Also, from the article:

And more interesting facts: If one throws out data because of outlier status one can expect outliers to be uniformly distributed over the months.

Why?
In the winter months, around the temperate zones, there are the temporary snowfall accumulations. Snow falls, the albedo switches from dark ground to shiny snow, snow melts, the albedo is back to dark ground. With air temperatures checked so near the ground, the heating from sunlight during the day is important. So one December the meteorological conditions give CONUS little if any snow, next year there’s enough that the more northern areas have snow on the ground for the entire month. The albedo change will yield a larger temperature difference for that month between those years than a similar change in precipitation for a summer month like July, even when other factors (ratio of cloudy to clear days, etc) would yield little or no difference. Thus the albedo change gives a potential for greater temperature variances during the winter months than the rest of the year. Amazingly enough, it looks like that’s what you’ve found.
So, someone thought they had found something that looked suspicious with a temperature dataset, made up some graphs, posted here while noting he’s not making conclusions just noticing something that looks off, then certain commentators point out there is nothing wrong, it’s in the documentation, it makes no difference even if true, his analysis is faulty, etc.
Would it be wrong to say Mosher has just pulled a Goddard?

——-
Side notes: GISS now says, for CONUS, 1934 had only 1.20 Celsius anomaly units, 1998 and 2006 were clearly warmer at 1.32 and 1.30 respectively. For the global numbers, the January-July Mean Surface Temperature Anomaly (°C) graph shows 2010 the warmest of the 131 year record.
2010 is gonna be da hottest evah!

September 5, 2010 4:57 pm

Ron Broberg says:
September 5, 2010 at 7:17 am
You can see the code for this in the MET’s released version of CRUTEM in the station_gridder.perl
[From the code notes!]
# Round anomalies to nearest 0.1C – but skip them if too far from normal
if ( defined( $data{normals}[$i] )
&& $data{normals}[$i] > -90
—…—…—…—
Harold Pierce Jr says:
September 5, 2010 at 4:14 pm
ATTN: Anthony and Steve
In the 1930s, what level of accuracy was used for temperature measurements?
It certainly was not to +/- 0.001 deg F. Were not temperatures measured to +/- 1 deg F? These computed numbers should be rounded back to the level of accuracy used for the temperature measurements.
—…—…—…—…—…—…
Look above: CRU input data is “rounded off” to the nearest 1/10 of one degree BEFORE analysis begins – yet the entire CAGW theory is based on a change of less than 1/2 of one degree.
Worse, “Phil” proudly claims that the subsequent “corrections” to the original data for “exactly 10 degrees” differences (???), plus eliminating data outside of 5 sigma, are “prudent” and good science.

September 5, 2010 5:18 pm

Thanks Anthony!
May be good:
FOX News: Hannity Special: The Green Swindle
Sunday, Sep 5, 9 PM EST

JTinTokyo
September 5, 2010 6:47 pm

If this information is correct, the researchers at CRU are guilty of laziness and sloppiness in their research. Good statistical researchers take the time to get to know their data and understand why extreme values are present while lazy researchers write programs that toss out these extreme values. Looks like someone messed up here.

wayne
September 5, 2010 6:48 pm

“When they toss 5 sigma events it appears that the tossing happens November through February.”
Of course it’s the low temperatures. Look at DMI’s arctic temperatures at http://ocean.dmi.dk/arctic/meant80n.uk.php, any year. It’s the same for temperature readings at all latitudes. Maximums don’t vary as much as minimums do. I still wonder exactly why this is so, in the physics of the atmosphere I mean.
So, there’s the hike to the temperatures. We all knew it was there somewhere. And I’m sorry, but that IS called manipulation of the data. I still want to better understand our atmosphere, so keep the science coming, Anthony; it’s thanks to you, no doubt.
Does that have to do with the MSM rarely covering cold events?? Were they deleted too?? ☺

Orkneygal
September 5, 2010 7:21 pm

Well RACookPE1978, by the nature of your response, I see that you are in complete agreement with me.
The posting would not be accepted as an assignment response at my University.
Sloppy graphs are a sign of sloppy logic.

September 5, 2010 7:26 pm

JTinTokyo says:
September 5, 2010 at 6:47 pm
If this information is correct, the researchers at CRU are guilty of laziness and sloppiness in their research.

As I’ve pointed out above they do not do this.

wayne
September 5, 2010 7:26 pm

Any real scientist would know that if you have such an effect, with high summer temperatures having a fraction of the winter temperatures’ standard deviation (stdev), you would have to do this “purification” on a monthly or even weekly basis. And since certain stations show large variance relative to the normal variance in their area, such as a station in a valley where the others are mainly on the plains, it would really have to be performed on a per-station basis.
I don’t see why any are being tossed, even if it’s clearly a mistake, say 15 entered instead of 51, 25 instead of -25.
If it’s really cold around an area and the readings of all other stations are, let’s say, -4.3 stdev, why toss a few that are -5.1 stdev? Now if all surrounding temperatures are +3 stdev and one lone station nearby comes in at -6 stdev, that clearly should be examined on a one-by-one basis by human eyes and hopefully with a brain. Such cases would be very rare and could easily be handled. If it clearly should have been 51 and not 15, judging by the surrounding stations, change it, with tags and notes of the correction; don’t toss it.
It would shock me if these “scientists” in the climate agencies were found to be applying such proper critique to the data they so control.

Policyguy
September 5, 2010 7:35 pm

So,
How many other arbitrary data “corrections” do CRU and GISS rely upon?
Seasonal bias is a big deal. How is this acceptable to climate scientists?
How about the granting entities that have spent so much money on building models that provide great graphics, but such poor prognostication?
Doesn’t anyone care that the foundation data is based upon cooked books?
Virginia is poised to answer some of these questions.

JK
September 5, 2010 7:58 pm

Weren’t the late 1930s warmer than the present?

kadaka (KD Knoebel)
September 5, 2010 8:07 pm

From: wayne on September 5, 2010 at 6:48 pm

Maximums don’t vary as much as minimums do. I still wonder exactly why this is so, in the physics of the atmosphere I mean.

It’s just physics. Systems like to shed heat until they equalize with their surroundings. For maximums you’re filling a leaking bucket faster; for minimums you’re either filling it more slowly, not at all, or perhaps even removing heat (colder air moves in, water evaporates after rainfall). Thus for the same absolute values of energy rate changes, you’ll get larger changes in minimums than in maximums (for sunlight, gaining an extra 5 watts per square meter yields a smaller temperature increase than the temperature decrease from losing 5).

wayne
September 5, 2010 8:11 pm

Ron Broberg says:
September 5, 2010 at 7:17 am
Thanks for the pointer to the code. http://www.metoffice.gov.uk/climatechange/science/monitoring/reference/station_gridder.perl
I want to puke. Looks like quick and dirty code to me. Takes me back to programming in the 70’s. Guess I’m too used to writing 100% pure code with every single boundary verified.
What happens in Perl if an “undef”ed standard deviation is then used in calculations? NaN? Zero? Exception? Nothing? Is that “undef” for standard deviations set in the block above the one you supplied?

Harold Pierce Jr
September 5, 2010 8:47 pm

ATTN: RACookPE1978
I downloaded some temperature data for remote Telluride, CO, from the USHCN.
The monthly temperature data for Tmax and Tmin are reported to +/- 1 deg F (ca. 0.5 deg C). The computed monthly Tmean was reported to +/- 0.00001 deg F. The SD was reported to +/- 0.000001. This is nuts!
The problem is that most “climate scientists” are computer and math jocks, not experimentalists. Nowadays digital data from instruments is assumed to be 100% accurate, especially by beginning grad students.

September 5, 2010 9:23 pm

wayne says:
September 5, 2010 at 7:26 pm
I don’t see why any are being tossed, even if it’s clearly a mistake, say 15 entered instead of 51, 25 instead of -25.
If it’s really cold around an area and a reading of all other stations are, let’s say -4.3 stdev, why toss a few that are -5.1 stdev. Now if all surrounding temperatures are +3 stdev and one lone station nearby comes in at -6 stdev, that clearly should be examined on a one by one basis by human eyes and hopefully with a brain. They would be very rare and could easily be handled. If it’s clearly should have been 51 and not 15 by checking surrounding stations, change it, with tags and notes of the correction, don’t toss it.

That’s pretty much what CRU does.
It is rather rare as you surmise:
“We made changes to about 500 monthly averages, with approximately 80% corrected and 20% set to missing. In terms of the total number of monthly data values in the dataset this is a very small amount (<0.01%)."

henry
September 5, 2010 9:37 pm

So how would the current “5 sigma event tossing” have dealt with this event?
“…The world record for the longest sequence of days above 100° Fahrenheit (or 37.8° on the Celsius scale) is held by Marble Bar in the inland Pilbara district of Western Australia. The temperature, measured under standard exposure conditions, reached or exceeded the century mark every day from 31 October 1923 to 7 April 1924, a total of 160 days…”
Over a hundred degrees for 160 days.
In the early 20’s.
And to think, the world is much warmer now…

September 5, 2010 10:10 pm

Phil.
Ya, it looks like the big trim comes in BEFORE CRU. If I read Ron correctly, CRU are using an adjusted version while I’m using GHCN raw. Of course this just turns the discussion to the differences between the raw and adjusted data. IN THE END, the differences end up not being that great, but 2C in one month floored me, especially when I had matched the SST results to damn near 1/100th.

wayne
September 5, 2010 10:30 pm

Phil. says:
September 5, 2010 at 9:23 pm
Assumed a quote from CRU:
“We made changes to about 500 monthly averages, with approximately 80% corrected and 20% set to missing. In terms of the total number of monthly data values in the dataset this is a very small amount (<0.01%)."

That tells me nothing. Well… no, it actually does. It tells me it takes a mere change of 0.01% of the records to make a rather large error in the trend that Steven Mosher is preliminarily showing.
You seem to know right where to go to get the core information. Do you happen to know where those alterations/droppings are documented, with the from and to values and the reasons, or for those dropped, what they were and why? That wouldn’t be a very big file for them to keep and make public, and it would save a lot of duplicate work. Or is it up to an investigator to try to recreate what they have done to the temperature records?