Monthly Averages, Anomalies, and Uncertainties

Guest Post by Willis Eschenbach

I have long suspected a theoretical error in the way that some climate scientists estimate the uncertainty in anomaly data. I think that I’ve found clear evidence of the error in the Berkeley Earth Surface Temperature data. I say “I think”, because as always, there certainly may be something I’ve overlooked.

Figure 1 shows their graph of the Berkeley Earth data in question. The underlying data, including error estimates, can be downloaded from here.

Figure 1. Monthly temperature anomaly data graph from Berkeley Earth. It shows their results (black) and other datasets. ORIGINAL CAPTION: Land temperature with 1- and 10-year running averages. The shaded regions are the one- and two-standard deviation uncertainties calculated including both statistical and spatial sampling errors. Prior land results from the other groups are also plotted. The NASA GISS record had a land mask applied; the HadCRU curve is the simple land average, not the hemispheric-weighted one. SOURCE

So let me see if I can explain the error I suspected. I think that the error involved in taking the anomalies is not included in their reported total errors. Here’s how the process of calculating an anomaly works.

First, you take the actual readings, month by month. Then you take the average for each month. Here’s an example, using the temperatures in Anchorage, Alaska from 1950 to 1980.

Figure 2. Anchorage temperatures, along with monthly averages.

To calculate the anomalies, from each monthly data point you subtract that month’s average. These monthly averages, called the “climatology”, are shown in the top row of Figure 2. After the month’s averages are subtracted from the actual data, whatever is left over is the “anomaly”, the difference between the actual data and the monthly average. For example, in January 1951 (top left in Figure 2) the Anchorage temperature is minus 14.9 degrees. The average for the month of January is minus 10.2 degrees. Thus the anomaly for January 1951 is -4.7 degrees—that month is 4.7 degrees colder than the average January.
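
To make the arithmetic concrete, here is a minimal sketch of that anomaly calculation in Python. The numbers are invented for illustration; they are not the actual Anchorage record.

```python
import numpy as np

# Invented monthly mean temperatures (deg C): rows are years 1950-1980,
# columns are Jan..Dec. NOT the real Anchorage data, just plausible values.
rng = np.random.default_rng(0)
seasonal = np.array([-10.2, -8.0, -4.5, 2.0, 8.5, 13.0,
                     14.5, 13.5, 9.0, 1.5, -5.0, -9.0])
temps = seasonal + rng.normal(0.0, 2.0, size=(31, 12))

# The "climatology": the average of each calendar month over the period.
climatology = temps.mean(axis=0)

# The anomaly: each month's value minus that month's long-term average.
anomalies = temps - climatology

# e.g. a January reading of -14.9 against a -10.2 January average:
print(round(-14.9 - (-10.2), 1))  # -4.7
```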

What I have suspected for a while is that the error in the climatology itself is erroneously not taken into account when calculating the total error for a given month’s anomaly. Each of the numbers in the top row of Figure 2, the monthly averages that make up the climatology, has an associated error. That error has to be carried forward when you subtract the monthly averages from the observational data. The final result, the anomaly of minus 4.7 degrees, contains two distinct sources of error.

One is the error associated with that individual January 1951 average, -14.9°C. For example, the person taking the measurements may have consistently misread the thermometer, or the electronics might have drifted during that month.

The other source of error is the error in the monthly averages (the “climatology”) which are being subtracted from each value. Assuming the errors are independent, which of course may not be the case but is usually assumed, these two errors add “in quadrature”. This means that the final error is the square root of the sum of the squares of the errors.
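
In symbols, if the individual monthly value has an error of sigma_month and the climatology has an error of sigma_clim, then sigma_anomaly = sqrt(sigma_month^2 + sigma_clim^2). Here is a minimal check in Python, with made-up stand-in values for the two errors:

```python
import math

# Adding independent errors "in quadrature": the combined standard error
# is the square root of the sum of the squares of the individual errors.
sigma_month = 0.20  # made-up error in the individual monthly value (deg C)
sigma_clim = 0.41   # made-up error in that month's climatology (deg C)

sigma_anomaly = math.sqrt(sigma_month**2 + sigma_clim**2)
print(round(sigma_anomaly, 3))  # 0.456; never smaller than sigma_clim
```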

One important corollary of this is that the final error estimate for a given month’s anomaly cannot be smaller than the error in the climatology for that month.

Now let me show you the Berkeley Earth results. To their credit, they have been very transparent and reported various details. Among the details in the data cited above is their estimate of the total, all-inclusive error for each month. And fortunately, their reported results also include the following information for each month:

Figure 3. Berkeley Earth estimated monthly land temperatures, along with their associated errors.

Since they are subtracting those climatological values from each of the monthly temperatures to get the anomalies, the total Berkeley Earth monthly errors can never be smaller than the errors in those values.

Here’s the problem. Figure 4 compares those monthly error values shown in Figure 3 to the actual reported total monthly errors for the 2012 monthly anomaly data from the dataset cited above:

Figure 4. Error associated with the monthly average (light and dark blue) compared to the 2012 reported total error. All data from the Berkeley Earth dataset linked above.

The light blue months are months where the reported error associated with the monthly average is larger than the reported 2012 monthly error … I don’t see how that’s possible.

Where I first suspected the error (but have never been able to show it) is in the ocean data. The reported accuracy is far too great given the number of available observations, as I showed here. I suspect that the reason is that they have not carried forward the error in the climatology, although that’s just a guess to try to explain the unbelievable reported errors in the ocean data.

Statistics gurus, what am I missing here? Has the Berkeley Earth analysis method somehow gotten around this roadblock? Am I misunderstanding their numbers? I’m self-taught in all this stuff and I’ve been wrong before, so am I off the rails here? Always more to learn.

My best to all,

w.

266 Comments
August 18, 2013 4:08 am

I see, one variable (T) and its anomalies are climate?

dmacleo
August 18, 2013 4:18 am

I’m not qualified to speak on the actual data/topic but wanted to say I am glad there are people always examining the data.
thanks willis.

AndyL
August 18, 2013 4:23 am

Great discussion.
Would it be possible to design a worked example to test out what everyone is saying? Just specifying it would probably help clarify what people mean by different types of error and how they are accounted for.

Nick Stokes
August 18, 2013 4:23 am

Willis Eschenbach says: August 18, 2013 at 12:39 am
“It’s not clear what you mean by “rather small”. I also don’t understand the part about 1/30 of the total. Take another look at the data in Figure 2. Each individual anomaly at any time is calculated by taking the observations (with an associated error) and subtracting from them the climatology (again with an associated error). The month-by-month standard error of the mean for the 30-year reference period ranges from 0.15 to 0.72°C, with an average of 0.41°C, without adjusting for autocorrelation. Is that “rather small”? Seems large to me.”

I mean small relative to the month-to-month variation. If you calculate a global or regional trend, say, then the error you will associate with that derives ultimately from the variance of the monthly readings about the climatology. Anomalies have greater variance, because you subtract a 30-year mean. But that mean has a variance smaller by a factor of 30 than the individual errors, so the additional variance makes that 3.3% difference, approx.
But I see now that the figures that you have quoted in Fig 3 seem to be a measurement error. The climatology error is indeed quite large compared to those. So…
“If (as you say) I want to know whether January 2012 was hotter than February 2011, I need to know the true error of the anomalies. Otherwise, if the two months come up half a degree different, is that significant or not? I can’t say without knowing the total error of the anomalies.”
For the individual station, you don’t need anomalies at all for that, but adding any number, if it’s the same for both, won’t change the difference, or your uncertainty about it. If you’re aggregating anomalies in a global or regional average, the same is still true. The climatology has sampling error, but it’s always the same actual number added to or subtracted from each.
But coming back to Fig 3, that seems to be a weather measurement error (instrumental etc). You go through some arithmetic procedure to compute the anomaly, and they have calculated how the measurement error aggregates. It’s the error you would see if you could go back and re-measure exactly the same weather subject to instrument variation.
The anomaly error is a sampling error, which is of a different kind. It’s an error that would be reflected if you went back and measured a different set of weather, under the same climate conditions. So it is relevant whenever you want to express something that is a measure of climate, like temperature trend. Then what matters is how large it is compared to the other things that are making it hard to deduce climate from weather. That is, mainly, month to month variation.
“It also comes into play when comparing say the changes in the spring, summer, fall, and winter temperatures. We need to know the errors to understand the significance of what we find.”
Yes, you do. But now you’re thinking climate. And your uncertainty will be dominated by the monthly variation in temperature. Climatology error will be small relative to that, as will be instrumental uncertainty.
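
A quick Monte Carlo check of the factor-of-30 point above; this is a sketch assuming independent, equal-variance errors, which is the simple case Nick describes:

```python
import numpy as np

# If each monthly value carries an independent error of variance s^2, the
# 30-year mean (the climatology) has variance s^2/30. Subtracting an
# independently estimated climatology then gives
#   var(anomaly) = s^2 + s^2/30,
# i.e. about 3.3% more than s^2.
rng = np.random.default_rng(1)
n = 200_000
base = rng.normal(0.0, 1.0, size=(n, 30)).mean(axis=1)  # climatology error
month = rng.normal(0.0, 1.0, size=n)                     # one month's error
print(np.var(month - base))  # ~1.033, i.e. 1 + 1/30
```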

August 18, 2013 4:29 am

When going back to older data, not only is there an error associated with the reading, there is an additional error associated with any calculated average. The averages were possibly calculated differently in different countries, and they did not have as many readings as you would today. So the averages will sometimes contain an additional error coming from the agreed way to approximate them.

Jeff Condon
August 18, 2013 4:30 am

Pat Frank,
“I carried that argument, Steve. If you don’t understand that after all that was written, then your view at best reflects incompetence.”
Without reopening a sore spot, I don’t remember it that way.
My opinion is that the BEST CI method is likely to be close but has the potential for big errors depending on the distribution of values.

Nick Stokes
August 18, 2013 4:34 am

Jeff Condon says: August 17, 2013 at 5:27 pm
Jeff, sorry about the slow response on this. It’s been a busy day here. I didn’t follow your series of posts very thoroughly initially, so I’m catching up. Hope to be able to make more sense soon.

David
August 18, 2013 4:54 am

It is a certainty that the measurement errors are systematic, and that is a can of worms, varying by instrument, by location changes, by nation, by number of stations, by station locations. It is also a certainty that the recorded official T is still being changed. The recorded data sets have been changed many times, and are still changing, so from which data set do you get your measurements and anomalies?
A far more accurate way of getting the measurement uncertainty is to observe it, instead of teasing it out through numeric sophistry. That is, observe the land-based differences in anomaly trends versus the satellite trend over the same land area. Otherwise it is fair to ask why the vast majority of the changes to measured T warm the present and cool the past.
Of course, this is all academic, as all the reported disasters of CAGW are not happening. (Even NH sea ice is now strongly rebounding with changes in ocean currents and jet streams; besides which, no one ever gave observational evidence of world calamity with melting NH sea ice.) So the disasters are not happening, and neither is the warming. In CAGW neither the “C” nor the “W” (for the past 15 years) is happening. The entire SH never warmed much at all, so the “G” is also missing. Very Shakespearean: “Much ado about nothing.”

HaroldW
August 18, 2013 4:57 am

Willis: “the final error estimate for a given month’s anomaly cannot be smaller than the error in the climatology for that month.”
This statement is not true in general. A static bias term will be present in the climatology and the monthly (absolute) error estimates, but will not be present in the anomaly.
However, I don’t know if this is the case here.
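
A tiny numeric illustration of HaroldW’s point, reusing the Anchorage numbers from the post with a made-up bias of 0.5°C:

```python
# A constant bias b contaminates both the monthly reading and the
# climatology, so it cancels when the anomaly is formed.
T, clim, b = -14.9, -10.2, 0.5
anomaly = (T + b) - (clim + b)
print(round(anomaly, 1))  # -4.7, identical to T - clim
```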

John Norris
August 18, 2013 5:16 am

Pat Frank
“shield irradiance, ground albedo (including winter snow), and wind speed all impact the measurement accuracy of otherwise well-functioning air temperature sensors. ….”
Okay, so those all sound like results of an imperfect sensor system, so I’d categorize that as error type #1 and would then choose to include it in uncertainty, unlike error type #3.

Nick Stokes
August 18, 2013 5:44 am

Willis,
“When we “remove seasonality” by using a base period, as is quite common in climate science, we introduce a repeating series of twelve different errors. And contrary to your and Nick’s claims, that act of removing seasonality does indeed increase the standard error of the trend.”
You won’t change the trend for individual months. That’s because you’ve added the same number to each month, and if it fluctuates due to anomaly sampling, that won’t matter.
It will make a small difference to the annual trend. The arithmetic is: figure what difference the base errors would make to a 1-year trend, then divide by the number of years in the trend. It’s an end effect.

Nick Stokes
August 18, 2013 5:56 am

AndyL says: August 18, 2013 at 4:23 am
“Great discussion.
Would it be possible to design a worked example to test out what everyone is saying?”

It’s probably best to just think of a Monte Carlo. You’d set up an anomaly calc procedure and start by just generating measurement variation. You’d get the kind of errors shown in Fig 3.
Then you’d have to figure a distribution for monthly weather variation. Add that in to the Monte Carlo and you’d get the anomaly base error. But it makes sense only relative to the weather variation you have assumed.
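
A minimal sketch of the Monte Carlo Nick describes, with made-up magnitudes for the instrument error and the monthly weather variation:

```python
import numpy as np

rng = np.random.default_rng(2)
trials, years = 10_000, 30

# Assumed magnitudes, for illustration only:
sigma_instr = 0.2    # per-month measurement error (deg C)
sigma_weather = 1.0  # month-to-month weather variation (deg C)

# Step 1: measurement variation alone (errors of the kind in Fig 3).
measured = rng.normal(0.0, sigma_instr, size=(trials, years))
print(measured.std())  # ~0.2

# Step 2: add weather variation, then form anomalies against a base mean.
weather = rng.normal(0.0, sigma_weather, size=(trials, years))
obs = measured + weather
anoms = obs - obs.mean(axis=1, keepdims=True)
# The anomaly spread now reflects both sources, and the climatology's
# sampling error is only meaningful relative to the assumed weather term.
print(anoms.std())  # ~1.0
```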

Tom in Florida
August 18, 2013 5:57 am

This may not be relevant to this thread, but I have always wondered why monthly temperature measurements are grouped by man-made calendars. Wouldn’t it make more sense to compare daily temperatures over a period by using celestial starting and ending points so they are consistent over time? The Earth is not at the same point relative to the Sun on January 1st every year; will this type of small adjustment make any difference? Perhaps full moon to full moon as a period to average?

AndyL
August 18, 2013 6:25 am

Nick Stokes says:
August 18, 2013 at 5:56 am
It’s probably best to just think of a Monte Carlo

I was thinking of a much more basic thought experiment.
Start with lots of devices that measure twice a day.
Each device has an assumed accuracy of x
Human reading is to the nearest 0.5 degree which introduces error y
Tmax and Tmin are averaged to give daily average which does zz to the error
Daily average is averaged again to give monthly accuracy
Interpolations to cover missing values
Outliers are removed etc etc
Give all these errors names so that everyone is clear they are talking about the same thing, then show how they are accounted for and whether or not they are significant
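
Something like AndyL’s specification could be sketched as follows; every magnitude here is an assumption for illustration (his x, y, and the Tmax/Tmin step), and the interpolation and outlier steps are left out:

```python
import numpy as np

rng = np.random.default_rng(3)
days, devices = 30, 100

# Named error sources, with assumed magnitudes:
sigma_device = 0.3  # AndyL's "x": instrument accuracy (deg C)
rounding = 0.5      # his "y": humans read to the nearest 0.5 deg

true_tmax = 15.0 + rng.normal(0, 2, size=(devices, days))
true_tmin = 5.0 + rng.normal(0, 2, size=(devices, days))

def observe(t):
    # Instrument error, then the reading rounded to the nearest 0.5 deg.
    noisy = t + rng.normal(0, sigma_device, size=t.shape)
    return np.round(noisy / rounding) * rounding

daily_avg = (observe(true_tmax) + observe(true_tmin)) / 2  # Tmax/Tmin step
monthly_avg = daily_avg.mean(axis=1)

true_monthly = ((true_tmax + true_tmin) / 2).mean(axis=1)
print((monthly_avg - true_monthly).std())  # empirical monthly-average error
```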

Bill Illis
August 18, 2013 6:34 am

How many Anchorage temperature datasets are there?
Berkeley Earth splits one station – Anchorage Merrill Field – into 16 different separate stations. What is the monthly average when you jack-knife a single station into 16 separate ones?
http://berkeleyearth.lbl.gov/auto/Stations/TAVG/Figures/165577-TAVG-Comparison.pdf
http://berkeleyearth.lbl.gov/stations/165577
And then beyond Merrill Field, there are 12 other Anchorage stations – each with its own number of separate jack-knife stations – extending only back to 1916.
http://berkeleyearth.lbl.gov/station-list/?phrase=anchorage
It is just a mess and there is no way the methodology itself has not introduced significant systematic errors.

Richard M
August 18, 2013 6:55 am

Interesting discussion, but to me it’s attacking the gnat on the elephant’s butt instead of looking at the elephant. Personally, I’ve always felt the entire approach to generating adjusted historic temperature data sets was misguided (all of them).
What I see is potentially biased opinions being inserted into the data, whereas they should be placed in the error bars. TOBS should NOT be used to adjust data; it should be used to create error bars. Same with UHI and all other modifiers to the current data. Leave the poor data alone. I suspect if this were done we would see an entirely different (and more accurate) view of historic temperatures.

August 18, 2013 7:12 am

Can anyone confirm or refute the accuracy of the following simple-minded explanation that I put together for myself to understand what the issue is? I may not be the only reader here who is not as comfortable with the jargon as the disputants are.
If you want to know whether Aprils are getting warmer, say the BEST proponents according to my understanding, it doesn’t much matter what the error is in the average-April number you subtract from the individual-April numbers to get the April anomaly. E.g., it doesn’t matter much whether those anomalies advance from year to year as 0.1, 0.2, 0.3, …, or as 0.5, 0.6, 0.7; the trend is still 0.1/year. So the error in the “climatology” (average of the April temperatures over some number of years) does not contribute to an error in the trend.
But as I (erroneously, no doubt) understand it, Mr. Eschenbach isn’t worrying about whether Aprils are getting warmer. He’s concerned with whether a given year’s March is warmer on a seasonally adjusted basis than its April (than which, on an absolute rather than a seasonally adjusted basis, March is usually cooler here in the Northern Hemisphere). That is, is that year’s March warmer than usual by a greater extent than that year’s April? If we think average March and April temperatures are usually 40 and 50 deg. F respectively when actually they’re respectively 42 and 48, then the answer you’ll come to when a given year’s values are respectively 41 and 49 will be affected by that error: “climatology” errors do matter to this question.
Is this anywhere near a statement of what the question before the house is? I may not be the only layman who would be grateful for an occasional translation of the issues into English.
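
A quick check of the first half of that explanation, with made-up numbers: subtracting any constant climatology, right or wrong, leaves the trend unchanged.

```python
import numpy as np

years = np.arange(2000, 2010)
april = 10.0 + 0.1 * (years - 2000)  # made-up April temperatures

for clim in (9.8, 10.0, 10.5):       # right or wrong climatology
    anoms = april - clim
    trend = np.polyfit(years, anoms, 1)[0]
    print(round(trend, 3))            # 0.1 every time
```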

August 18, 2013 7:20 am

The statement:
“If you want to know whether Aprils are getting warmer, say the BEST proponents according to my understanding, it doesn’t much matter what the error is in the average-April number you subtract from the individual-April numbers to get the April anomaly.”
seems debated. My focus is that, on top of that, if some today measure the April average as the average taken every second during the month, while in an earlier year it was measured as the average of three readings per day, then a huge error is added. Similarly, if different stations use different methods for their averaging, and then stations are used to adjust each other, errors are introduced.
— Mats —

Pamela Gray
August 18, 2013 7:23 am

My comment is very basic and has to do with linear trend. A linear trend is a straight line through a data series, placed such that the data points above and below that line are at the smallest total distance from the line over the entire series. Another statistic that can be generated from that distance is a single value of difference, which can be considered an “error” value. The greater that calculated value, the less one can say about the trend demonstrating a possible correlation between variables. The smaller that value, the more confidence one can place in the data demonstrating a possible correlation. If one looks at any linear trend line through observed absolute or calculated anomaly data, I am eyeballing that the “error” calculation is quite large.
Please correct me if I am wrong in my simplified explanation, but sometimes we worry and fret over complicated statistical maths when the most basic and telling ones are overlooked.

richardscourtney
August 18, 2013 7:32 am

Willis:
Your observation is good.
However, there is a more basic problem; viz.
there is no possible calibration for global temperature because the metric is meaningless and not defined.
This problem is not overcome by use of anomalies. I explain this as follows.
Each team preparing a global temperature time series uses a different method (i.e. different selection of measurement sites, different weightings to measurements, different interpolations between measurement sites, etc.). And each team often alters the method it uses such that past data is changed; see e.g. http://jonova.s3.amazonaws.com/graphs/giss/hansen-giss-1940-1980.gif
Hence, each determination of global temperature has no defined meaning: it is literally meaningless. And an anomaly obtained from a meaningless metric is meaningless.
If global temperature were defined then a determination of it would have a meaning which could be assessed if it could be compared to a calibration standard. But global temperature is not a defined metric and so has no possible calibration standard.
A meaningless metric is meaningless, the errors of an undefined metric cannot be determined with known accuracy, and the errors of an uncalibrated measurement cannot be known.
The errors of a measurement are meaningless and undefinable when they are obtained for a meaningless, undefined metric with no possibility of calibration.
Richard


highflight56433
August 18, 2013 7:40 am

There is an obvious problem with daily high/low averages, and it is this: the daytime high might be 72F for three hours vs. 72F for 5 minutes, or the low might be -15F for 5 hours vs. 1 hour. See the problem with leaving duration out of the picture?

Luther Wu
August 18, 2013 8:02 am

Nick Stokes says:
August 17, 2013 at 5:13 pm
An irony here is that skeptics have been nagging climate scientists to get the help of statisticians. So when a group of statisticians (at BEST) do get involved, what we hear here is “Climate science gets it wrong again”.
_________________________
It is very simple, Nick: Why don’t you show us what “Climate Science” gets right?

Jeff Condon
August 18, 2013 8:09 am

Thanks Nick
