Above: map of mean temperature and departure by state for February 1936 in the USA, a 5 sigma event. Source: NCDC’s map generator at http://www.ncdc.noaa.gov/oa/climate/research/cag3/cag3.html
Steve Mosher writes in to tell me that he's discovered an odd and interesting discrepancy in CRU's global land temperature series: it appears they are tossing out valid data that lies 5 sigma (5σ) or more from the mean, in this case an anomalously cold February 1936 in the USA. As a result, CRU's data for that month is almost 2°C warmer than his analysis. That this month was an extreme event is backed up by historical accounts and US surface data. Wikipedia says about it:
The 1936 North American cold wave ranks among the most intense cold waves of the 1930s. The states of the Midwest United States were hit the hardest. February 1936 was one of the coldest months recorded in the Midwest. North Dakota, South Dakota, and Minnesota saw their coldest month on record. What made this cold wave so significant was that the 1930s had some of the mildest winters in US history. 1936 was also one of the coldest years of the 1930s. And the winter was followed by one of the warmest summers on record, which brought on the 1936 North American heat wave.
This finding is part of an independent global temperature program he's designed called "MOSHTEMP", which you can read about here. He's also found that the effect appears to be seasonal: when CRU tosses 5 sigma events, the tossing happens mostly from November through February, so the difference between CRU and Moshtemp varies with the season.
His summary and graphs follow. Steve Mosher writes:
A short update. I'm in the process of integrating the Land analysis and the SST analysis into one application. The principal task in front of me is integrating some new capability in the 'raster' package. As that effort proceeds I continue to check against prior work and against the accepted 'standards'. So, I reran the Land analysis and benchmarked it against CRU, using the same database, the same anomaly period, and the same CAM criteria. That produced the following:
My approach shows a lot more noise, something not seen in the SST analysis, which matched nicely. Wondering if CRU had done anything else, I reread the paper:
"Each grid-box value is the mean of all available station anomaly values, except that station outliers in excess of five standard deviations are omitted."
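In R terms, the quoted rule amounts to something like this minimal sketch (illustrative names only; this is neither CRU's nor MOSHTEMP's actual code):

gridbox_mean <- function(anoms, sds) {
  # Per the quoted rule: keep a station's anomaly only when it lies
  # within 5 standard deviations of its normal, then average the rest
  keep <- !is.na(anoms) & !is.na(sds) & abs(anoms) <= 5 * sds
  mean(anoms[keep])
}
gridbox_mean(c(-2.1, -1.8, -14.0), c(1.2, 1.1, 2.5))  # drops the -14.0 (a 5.6 sigma value)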
I don’t do that! Curious, I looked at the monthly data:
The month where CRU and I differ THE MOST is Feb, 1936.
Let’s look at the whole year of 1936.
First CRU's numbers (had1936), then mine (anom1936):
> had1936
 [1] -0.708 -0.303 -0.330 -0.168 -0.082  0.292  0.068 -0.095  0.009  0.032  0.128 -0.296
> anom1936
 [1] "-0.328" "-2.575" "0.136"  "-0.55"  "0.612"  "0.306"  "1.088"  "0.74"   "0.291"  "-0.252" "0.091"  "0.667"
So Feb 1936 sticks out as a big issue.
Turning to the anomaly data for 1936, here is what we see in UNWEIGHTED Anomalies for the entire year:
> summary(lg)
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max.       NA's
-21.04000   -1.04100    0.22900    0.07023    1.57200   13.75000 31386.00000
The issue, when you look at the detailed data, is record cold in places like the US: 5 sigma type weather.
Looking through the data you will find that in the US, February anomalies beyond the 5 sigma mark occur with some regularity. And if you check Google, of course it was a bitter winter; the 1936 data above is just one example. Much more digging is required, here and in other places where the method of tossing out 5 sigma events appears to cause differences (in both directions, apparently). So, no conclusions yet, just a curious place to look. More later as time permits. If you're interested, double check these results.
Previous post on the issue:
CRU, it appears, trims out station data when it lies outside 5 sigma. For certain years where there was actually record cold weather, that leads to discrepancies between CRU and me. It probably happens in warm years as well. Overall this trimming of data amounts to around 0.1°C (the mean of all differences).
Below, see what 1936 looked like: the average for every month, the max anomaly, the min anomaly, and the 95% CI (orange). Note these are actual anomalies from the 1961-90 baseline, so that is a -21°C departure from the average. With a standard deviation of around 2.5, the 5 sigma rule means CRU is trimming departures greater than about 12.5°C (5 × 2.5). A simple look at the data showed bitterly cold weather in the US, weather that gets snipped by a 5 sigma trim.
And more interesting facts: if one throws out data because of outlier status, one can expect the outliers to be uniformly distributed over the months. In other words, bad data has no season. So, I sorted the 'error' between CRU and Moshtemp. Where do we differ? Uniformly over the months? Or does the dropping of 5 sigma events happen in certain seasons? First let's look at when CRU is warmer than Moshtemp: I take the top 100 months in terms of positive error. Months here are expressed as fractions, 0 = January.
Next, we take the top 100 months in terms of negative error. Is that uniformly distributed?
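For anyone who wants to replicate that sorting, here is a minimal R sketch of the idea; 'err' and 'dates' are hypothetical stand-ins (random data here), not the actual Moshtemp-vs-CRU series:

# Hypothetical monthly time index and CRU-minus-Moshtemp difference series
dates <- seq(as.Date("1900-01-01"), as.Date("2009-12-01"), by = "month")
err   <- rnorm(length(dates))
frac  <- (as.integer(format(dates, "%m")) - 1) / 12  # month as a fraction, 0 = Jan
top.pos <- order(err, decreasing = TRUE)[1:100]      # 100 largest positive errors
top.neg <- order(err)[1:100]                         # 100 largest negative errors
hist(frac[top.pos], breaks = seq(0, 1, by = 1/12), main = "CRU warmer than Moshtemp")
hist(frac[top.neg], breaks = seq(0, 1, by = 1/12), main = "CRU cooler than Moshtemp")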
If this data holds up upon further examination, it would appear that CRU processing has a seasonal bias: really cold winters and really warm winters (5 sigma events) get tossed. Hmm.
The "delta" between Moshtemp and CRU varies with the season. The worst months on average are Dec/Jan, and the standard deviation of the winter-month delta is twice that of the other months. Again, if these 5 sigma events were just bad data we would not expect this. Overall Moshtemp is warmer than CRU, but when we look at TRENDS it matters where these events happen.
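A quick way to check that seasonal spread, using the same hypothetical 'err' and 'dates' as in the sketch above:

# Per-month standard deviation of the delta; with the real series the
# winter months should stand out if the seasonal-bias claim holds
month <- format(dates, "%b")
round(tapply(err, month, sd), 2)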
So CRU are a bunch of Tossers….?…….:-)
Am I missing something? I do not see any labels for the lines in three of Mosher's graphs.
I wonder if there is code that does that in the Climategate software release.
I don’t know if anyone has taken much of a look at the Climategate code beyond what is in Harry’s Readme file.
Or, given the recommendations of increased openness at CRU, if the relevant source is readily available now.
Steve Mosher: Please add title blocks to your graphs that specify what the graphs represent and the color coding of the datasets. I have no idea which dataset is which.
Title blocks are time consuming but they give the viewers an idea of what you’re illustrating. The more info the better.
An old and wise rule of thumb in data analysis is 'never discard data for statistical reasons alone', such as a 5-sigma cut-off. Only discard data on the basis of subject-matter expertise (e.g. declaring the values to be physically impossible, or so extraordinary as to be incredible) or a specific investigation revealing measurement or other problems. Once again, the geographers and programmers behind climate alarmism have revealed a weakness in their statistical practice.
Five sigma from what, a thirty year average?
[REPLY – HadCRUt goes back to around 1850. ~ Evan]
A couple of quick thoughts:
Winter/cold temps are more volatile because the air is drier. Therefore the standard deviation in winter is higher.
In the same vein, for a normal distribution a 5 sigma monthly event occurs at p ≈ 3 × 10^-7. For 10,000 stations reporting monthly, this implies that one of them will see a 5 sigma event every 30 years or so. What are they talking about?
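That back-of-the-envelope figure checks out in R (assuming normality and counting one-tailed exceedances, as the commenter does):

p <- pnorm(-5)              # one-tailed probability of a 5 sigma event, about 2.9e-7
obs_per_year <- 10000 * 12  # 10,000 stations, 12 monthly values per year
1 / (p * obs_per_year)      # expected years between events: roughly 29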
More BS from the CRU. The “consensus” continues to crumble.
Well, clearly it is invalid: it does not fit the computer models, the CO2-induced climate change paradigm and propaganda, or the political thrust behind the scam, so bin it!
Hide the decline?
And yet, "Military Games" in the APAC region exclude a very big chunk of the world in this area. Hmmm…
Seems to me the military machine is growing in the APAC space. I'll take wagers there will be a war for resources (in particular water) in my lifetime in the APAC region, and, sadly, New Zealand might bear the worst brunt of that battle (well, to be honest, they do have more "freely" available fresh water than Australia or any other country in this space).
You know, it would be nice if we had some sort of profession where people gathered data, tried to ensure it was accurate, and recorded and disseminated such data along with trends and patterns. As opposed to fudging, deleting, massaging, and re-imagining data to fit an agenda.
I wonder what we would call such a profession?
The 1930s in the USA had the lowest lows and the highest highs. I think there is good reason for this, in particular in 1936. The PDO was about to flip into negative, but perhaps more important is the solar position. SC16, which peaked around 1928, was weak, with a count of around 75 SSN as counted today. Take off the Waldmeier/Wolfer inflation factor and this cycle would be close to a Dalton Minimum cycle. 1936 is near cycle minimum, so with the already weak preceding solar max, the EUV values would have been similar to or less than today's.
This is the pattern that looks to occur during low EUV: extremes at both ends of the temperature scale, probably because of pressure differential changes that produce unusual pressure cell configurations, which in turn form blocking patterns in the jet streams. We have seen this occur in many regions over the past 2-3 years: Russia burning, South America freezing, Japan at record highs, Australia recording 30-year-high snowfalls and now amidst severe flooding like Pakistan's. Go back a little further and we see record high temps in Australia, with massive bush fires that rocked our world, while the NH winter was a white-out.
EUV is a big player in extremes. Add to that the concurrent PDO and Arctic/Antarctic oscillations, with the likelihood of stronger and more frequent La Niña episodes coupled with less ocean heat uptake, and the stage is set for more extremes, but with a downward trend.
You can see the code for this in the Met Office's released version of CRUTEM, in station_gridder.perl:
# Round anomalies to nearest 0.1C - but skip them if too far from normal
# (normals/sds values of -90 or below appear to flag missing data;
#  the final condition is the 5 standard deviation cut)
if (   defined( $data{normals}[$i] )
    && $data{normals}[$i] > -90
    && defined( $data{sds}[$i] )
    && $data{sds}[$i] > -90
    && abs( $data{temperatures}{$key} - $data{normals}[$i] ) <=
       ( $data{sds}[$i] * 5 ) )
{
    $data{anomalies}{$key} = sprintf "%5.1f",
        $data{temperatures}{$key} - $data{normals}[$i];
}
It is well documented that the winter season has the highest variability.
Jones et al., 1982, Variations in Surface Air Temperatures: Part 1. Northern Hemisphere, 1881-1980
Jones et al., 1999, Surface Air Temperature and Its Changes Over the Past 150 Years
As to documenting the standard deviation drops…
Jones and Moberg, 2003, Hemispheric and Large-Scale Surface Air Temperature Variations: An Extensive Revision and an Update to 2001
Jones notes in an earlier paper that the corrections were prevalent for Greenland. These weather-record coding errors remind me of the sort of thing that Anthony himself was pointing out here.
Does Jones inappropriately drop or adjust Feb 1936 records? Someone should go look. Most of the CRU data sets are available here:
http://www.metoffice.gov.uk/climatechange/science/monitoring/subsets.html
———–
Just a side note: Mosher's material was posted prematurely, with unlabeled graphs and abbreviated discussion (for instance, which paper is 'the paper' Mosher reread?). Given how careful he was with the response to McKitrick, I wonder if he personally approved posting these notes in draft form.
For us more ordinary people, please point out what “5 sigma” stands for. We await your elucidation.
Personally I am familiar with 6 sigma, as in being told that's "about one in a million" when the machine shop I was at, right after ISO 9002 certification, decided the Six Sigma quality assurance program was something the customers wanted and would get us more business. "6 sigma" would have been the maximum defect rate allowed on parts that got shipped. You may freely guess the results; anyway, "that's a goal to work towards." And as with the American economy and the Stimulus Bill, if we hadn't had that plan then the layoffs would have been much worse, of course, obviously, without a doubt, as those who recommended the Six Sigma (SS?) plan were certain of.
For the love of God, please put comprehensive labels on the axes, and include keys when there are multiple graphs in one plot. You could be reaching a MUCH larger audience with just a little more work. I’m a Ph.D. research scientist, and if I can’t discern what’s in a graph, what hope is there for the average first-time visitor?
Anthony says: "CRU's data for that month is almost 2°C warmer than his analysis."
Mosher says: "Overall Moshtemp is warmer than CRU…"
So with these 5 sigma deletions, CRU is almost 2°C warmer than Moshtemp, and Moshtemp is overall warmer than CRU.
Is this one of those dangerous CAGW-inducing positive feedbacks?
What do the various colors signify on the graphs?
Detailed analysis on a regional and country basis by E.M. Smith (Chiefio) has also shown consistent trends of winter records being manipulated and adjusted, but with no clear explanation of how and why. Maybe this is identifying the same thing?
Although this paper is not adequately illustrated and summarized, I get enough to see an interesting problem with tossing 5 sigma events in the future! If we are to have runaway global warming, soon the temperature anomalies will more frequently be over 5 sigma. Are they going to toss those, too? You can bet your life they won't. This is a temporary thing to ensure the trend until they don't need it anymore to maintain the present as hotter than before; then, like GISS, they will re-adjust the earlier numbers.
Trimming what seem to be extreme outliers ("5-sigma" events) is never a good policy, because these tend to indicate inflection points where positive regimes reverse to negative, and vice versa. Given valid data, such selective processing is a form of arbitrary smoothing, unjustifiably excluding anomalies which may in fact be no such thing.
So-called psychic researchers such as J.B. Rhine play this game in reverse, ignoring null results in favor of "flash-points" supportive of their subjective theses. CRU's unacknowledged resort to such seasonal manipulation completely invalidates any and all Warmist conclusions, obviating the very nature of their pseudo-scientific enterprise. If the Hadley Centre's fancy academics do not know, much less admit to, this, they stand as one with René Blondlot, Trofim Lysenko, Immanuel Velikovsky, and other charlatans of the same ilk.
Sorry Bob,
The plots are just made on the fly as I walk through the data. For the chart in question, red is the MAXIMUM departure from normal, blue is the minimum, orange is ±1.96 sd, and black is the mean. "Index" is the month, from 1-12. For me this is just a curiosity that others can go take a look at while I'm working on debugging other stuff.
The process of throwing out 5 sigma events has a seasonal bias.
If this is true, then it cannot be seen as anything other than scientific misconduct, unless the CRU can justify this step on physical grounds.
Noblesse Oblige says:
September 5, 2010 at 6:24 am
> Winter/cold temps are more volatile because the air is drier. Therefore the standard deviation in winter is higher.
More than just that: the coldest region, the northern US, can get air masses from the polar, arctic, Pacific, and Gulf of Mexico regions during the winter. During the summer, the northern air masses generally don't make it down between mid-June and mid-August.
It shouldn't matter; the 5 sigma range widens in the winter. Umm, 5 sigma of exactly what? If it's 5 sigma of all monthly temperatures, then that really sucks. If it's 5 sigma of all the Februaries, that's another matter. If it's 5 sigma of daily temps, then there's a decent chance it's throwing away bad data (e.g. missing signs, though for Feb '36 I could believe North Dakota being either -14 or +14°F).
When I used to acquire large quantities of data electronically, I used Chauvenet's criterion for outlier rejection: basically, you reject data that falls outside a (normal) probability of 1/(2n). In my datasets that amounted to about 3.5 sigma. In the case of CRU data, where they're picking up data from all over the world, it should help to get rid of the more egregious errors (like the Finnish dropped-negative-sign data, for example). Seems prudent.
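As a rough illustration of how Chauvenet's cut-off scales with sample size (a minimal R sketch, not the commenter's actual code):

# Chauvenet's criterion: reject a point if fewer than half an observation
# that extreme is expected in a sample of size n, i.e. if its two-tailed
# normal probability falls below 1/(2n)
chauvenet_cutoff <- function(n) qnorm(1 - 1 / (4 * n))
chauvenet_cutoff(1000)   # about 3.48 sigma, close to the ~3.5 the commenter saw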
That’s a lot like what’s happening now. Not quite as bad yet.
What made this cold wave so significant was that the 1930s had some of the mildest winters in US history. 1936 was also one of the coldest years of the 1930s. And the winter was followed by one of the warmest summers on record, which brought on the 1936 North American heat wave.
That’s classic weather perturbed by volcanoes, and there were a lot of them in the ’30s.
http://www.volcano.si.edu/world/find_eruptions.cfm
So, in line with the missing "m" for minus in the airport data, is it "exclude real data if it exceeds the preconceived results"? Or would the missing "m" not have mattered at CRU, because all cold winter weather will now be 5 sigma off the expected trend? More "torture the data until it confesses"?