Monthly Averages, Anomalies, and Uncertainties

Guest Post by Willis Eschenbach

I have long suspected a theoretical error in the way that some climate scientists estimate the uncertainty in anomaly data. I think that I’ve found clear evidence of the error in the Berkeley Earth Surface Temperature data. I say “I think” because, as always, there may well be something I’ve overlooked.

Figure 1 shows their graph of the Berkeley Earth data in question. The underlying data, including error estimates, can be downloaded from here.

Figure 1. Monthly temperature anomaly data graph from Berkeley Earth, showing their results (black) and other datasets. ORIGINAL CAPTION: Land temperature with 1- and 10-year running averages. The shaded regions are the one- and two-standard deviation uncertainties calculated including both statistical and spatial sampling errors. Prior land results from the other groups are also plotted. The NASA GISS record had a land mask applied; the HadCRU curve is the simple land average, not the hemispheric-weighted one. SOURCE

So let me see if I can explain the error I suspected. I think that the error involved in taking the anomalies is not included in their reported total errors. Here’s how the process of calculating an anomaly works.

First, you take the actual readings, month by month. Then you take the average for each month. Here’s an example, using the temperatures in Anchorage, Alaska from 1950 to 1980.

Figure 2. Anchorage temperatures, along with monthly averages.

To calculate the anomalies, from each monthly data point you subtract that month’s average. These monthly averages, called the “climatology”, are shown in the top row of Figure 2. After the month’s averages are subtracted from the actual data, whatever is left over is the “anomaly”, the difference between the actual data and the monthly average. For example, in January 1951 (top left in Figure 2) the Anchorage temperature is minus 14.9 degrees. The average for the month of January is minus 10.2 degrees. Thus the anomaly for January 1951 is -4.7 degrees—that month is 4.7 degrees colder than the average January.
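To make the arithmetic concrete, here is a minimal sketch in Python of the climatology-and-anomaly calculation just described. The numbers are randomly generated stand-ins for the Anchorage record, not the actual data.

```python
# A minimal sketch of the anomaly calculation described above,
# using made-up numbers in place of the Anchorage record.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 31 years x 12 months of temperatures (deg C), 1950-1980
temps = rng.normal(loc=2.0, scale=8.0, size=(31, 12))

# The "climatology": the average of each calendar month over all years
climatology = temps.mean(axis=0)          # shape (12,)

# The anomaly: each month's reading minus that month's long-term average
anomalies = temps - climatology           # broadcasting over the 31 years

# e.g. January 1951 anomaly = (January 1951 temperature) - (January average)
jan_1951_anomaly = temps[1, 0] - climatology[0]
print(round(jan_1951_anomaly, 2))
```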

What I have suspected for a while is that the error in the climatology itself is erroneously not taken into account when calculating the total error for a given month’s anomaly. Each of the numbers in the top row of Figure 2, the monthly averages that make up the climatology, has an associated error. That error has to be carried forwards when you subtract the monthly averages from the observational data. The final result, the anomaly of minus 4.7 degrees, contains two distinct sources of error.

One is the error associated with that individual January 1951 average, -14.9°C. For example, the person taking the measurements may have consistently misread the thermometer, or the electronics might have drifted during that month.

The other source of error is the error in the monthly averages (the “climatology”) which are being subtracted from each value. Assuming the errors are independent, which of course may not be the case but is usually assumed, these two errors add “in quadrature”. This means that the final error is the square root of the sum of the squares of the errors.

One important corollary of this is that the final error estimate for a given month’s anomaly cannot be smaller than the error in the climatology for that month.
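Here is a minimal sketch of that “in quadrature” addition, using illustrative error values rather than Berkeley Earth’s; the closing assertion is just the corollary stated above.

```python
# A minimal sketch of adding independent errors "in quadrature".
# The numbers are illustrative, not Berkeley Earth values.
import math

sigma_month = 0.15        # hypothetical error in the January 1951 average (deg C)
sigma_climatology = 0.20  # hypothetical error in the January climatology (deg C)

# anomaly = monthly value - climatology, so (assuming independence)
# the anomaly error is the root-sum-square of the two component errors
sigma_anomaly = math.sqrt(sigma_month**2 + sigma_climatology**2)

print(round(sigma_anomaly, 3))            # 0.25
# The corollary: the anomaly error can never be smaller than the climatology error
assert sigma_anomaly >= sigma_climatology
```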

Now let me show you the Berkeley Earth results. To their credit, they have been very transparent and reported various details. Among the details in the dataset cited above is their estimate of the total, all-inclusive error for each month. Fortunately, their reported results also include the following information for each month:

Figure 3. Berkeley Earth estimated monthly land temperatures, along with their associated errors.

Since those climatology values are subtracted from each of the monthly temperatures to get the anomalies, the total Berkeley Earth error for any given month can never be smaller than the error in that month’s climatology value.

Here’s the problem. Figure 4 compares those monthly error values shown in Figure 3 to the actual reported total monthly errors for the 2012 monthly anomaly data from the dataset cited above:

Figure 4. Error associated with the monthly average (light and dark blue) compared to the 2012 reported total error. All data from the Berkeley Earth dataset linked above.

The light blue months are months where the reported error associated with the monthly average is larger than the reported 2012 monthly error … I don’t see how that’s possible.
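For anyone who wants to run the same check against the downloaded dataset, here is a minimal sketch of the comparison. The two arrays are hypothetical placeholders; substitute the reported climatology errors (Figure 3) and the reported 2012 total errors (Figure 4) from the Berkeley Earth file.

```python
# A minimal sketch of the consistency check described in the text: flag months
# whose reported total 2012 error is smaller than the reported error on that
# month's climatological average. The numbers are hypothetical placeholders,
# not the actual Berkeley Earth values.
import numpy as np

months = np.array(["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                   "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"])
climatology_error = np.full(12, 0.08)                              # hypothetical, deg C
total_error_2012 = np.array([0.05, 0.06, 0.09, 0.10, 0.07, 0.11,
                             0.12, 0.06, 0.09, 0.10, 0.05, 0.08])  # hypothetical, deg C

# If the climatology error were propagated in quadrature, this should never happen:
flagged = months[total_error_2012 < climatology_error]
print("Months where total error < climatology error:", flagged)
```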

Where I first suspected the error (but have never been able to show it) is in the ocean data. The reported accuracy is far too great given the number of available observations, as I showed here. I suspect that the reason is that they have not carried forwards the error in the climatology, although that’s just a guess to try to explain the unbelievable reported errors in the ocean data.

Statistics gurus, what am I missing here? Has the Berkeley Earth analysis method somehow gotten around this roadblock? Am I misunderstanding their numbers? I’m self-taught in all this stuff and I’ve been wrong before, am I off the rails here? Always more to learn.

My best to all,

w.


266 Comments
kuhnkat
August 17, 2013 6:58 pm

“If you want to call this something different than temperatures then humpty dumpty has a place on the wall next to him.”
Nice to know where to find you Moshpup!! 8>)

August 17, 2013 7:01 pm

“I carried that argument, Steve. If you don’t understand that after all that was written, then your view at best reflects incompetence.”
huh? Pat you lost. repeatedly.

Robert of Ottawa
August 17, 2013 7:01 pm

Your reasoning is correct. Total error cannot be smaller than measurement error.

August 17, 2013 7:05 pm

For those who’d like to evaluate Steve Mosher’s view of the debate, my first post at Jeffid’s on systematic measurement uncertainty in the global air temperature record is here, and the continuation is here.
The exchanges stand on their own, and I don’t intend to reignite the debate here. However, Steve has made a personal attack against me; my complete defense is the record.

BarryW
August 17, 2013 7:08 pm

In essence, an anomaly is just creating an offset from some (somewhat) arbitrary baseline. No different than setting an offset on an oscilloscope. The question I have is related to the multiple baselines used. The data from each month’s average is converted into an anomaly relative to that month’s baseline. Each month’s baseline will have a different associated error, hence won’t there be a potential misalignment between different months? Even worse, if the range used for a particular month is biased relative to the entire dataset for that month (say a cold series of winters), wouldn’t it induce a bias in the anomaly for that month relative to the other months?

August 17, 2013 7:09 pm

Steve: “huh? Pat you lost. repeatedly.”
If you really believe that, Steve, then you didn’t understand the substantive content then and still don’t understand it today.

Jeff Cagle
August 17, 2013 7:10 pm

Willis,
I join you in wondering about uncertainties. However, I don’t think the uncertainty in the baseline monthly average is a source of error here.
Here’s why: Since we are looking for trends — mostly secular trends using linear regression — we are only concerned with the errors that affect the confidence interval of the slope.
But the baseline monthly average has an effect only on the confidence interval of the intercept of our anomaly graph. So that might affect some sensationalist headlines (“Temperatures are up 2.0 deg C since 1850”, when the true value might be 1.8 deg C), but it will not affect most of them, for the simple reason that most of the headlines focus on the purported effects of forcing, expressed in terms of a slope: “We expect a 0.2 deg C rise per decade for the next two decades.” Those remain unchanged by an epsilon in the monthly baseline.
In other words, we can safely accept the baseline monthlies as givens, and look for trends from there.
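To make Jeff’s point concrete, here is a minimal sketch with synthetic data: adding a constant offset to a series (an error in the baseline climatology) moves the fitted intercept but leaves the fitted slope, and hence the trend, unchanged.

```python
# A minimal sketch: shifting a series by a constant changes the fitted
# intercept but not the fitted slope. The data are synthetic.
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1950, 2011)
series = 0.01 * (years - 1950) + rng.normal(0, 0.1, size=years.size)

slope1, intercept1 = np.polyfit(years, series, 1)
slope2, intercept2 = np.polyfit(years, series + 0.3, 1)   # add a constant offset

print(np.isclose(slope1, slope2))          # True: slope unchanged
print(round(intercept2 - intercept1, 3))   # ~0.3: only the intercept moves
```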

August 17, 2013 7:11 pm

“Given the weather/climate is far from a stable system why are we using a moving average to calculate anomalies from. Surely when you take the impacts of natural forcing’s on temperature over a long period of time a moving average is not a good measure to be using to calculate an anomaly, even the same month over a long period of time?”
We don’t use moving averages to calculate anomalies from.
1. Take every temperature recording at x,y,z and time t.
2. Remove the climate (geographically based) at every x,y,z,t.
You are left with a random residual called the weather:
T = C + W
The temperature at any given time/location is a function of the climate (C) and the weather (W).
C is expressed as a function of geography only (and time).
The residual (W) is then interpolated using kriging.
So for every time and place you have two fields: a climate field and a weather field.
Anomalies are not even a part of our approach. In the end we have a field that is in temperature. There is no base period. At every time step we have used all the data for that time step to solve the field at that time step. No base period. No anomalies.
After it’s all said and done we integrate the field. Other folks take averages of stations. We don’t.
We construct a field and then integrate the field. That gives you a time series in temperature (C).
Then, because other folks want to compare their anomalies to our temperatures, we provide anomaly data. And we give you the climatology (unlike Hansen or Spencer) so that you can go back to the field in C.
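For readers trying to picture the T = C + W description, here is a loose toy sketch of the idea. It is not the Berkeley Earth code, and plain inverse-distance weighting stands in for kriging: a simple climatology is fitted as a function of geography, the residual is treated as weather and interpolated spatially, and the two fields are added back together at a target location.

```python
# A toy illustration of a T = C + W decomposition for a single time step.
# Inverse-distance weighting stands in here for kriging; everything is synthetic.
import numpy as np

rng = np.random.default_rng(2)

# Toy station network: latitude (deg), elevation (km), observed temperature (deg C)
lat = rng.uniform(-60, 70, size=200)
elev = rng.uniform(0, 3, size=200)
temp = 25 - 0.5 * np.abs(lat) - 6.5 * elev + rng.normal(0, 1.5, size=200)

# Climatology C: temperature modeled as a simple linear function of geography
X = np.column_stack([np.ones_like(lat), np.abs(lat), elev])
coef, *_ = np.linalg.lstsq(X, temp, rcond=None)
climate = X @ coef
weather = temp - climate                      # residual "weather" field W

# Interpolate W to a target location, then reconstruct T = C + W there
target_lat, target_elev = 45.0, 0.5
dist = np.abs(lat - target_lat) + 1e-6        # crude 1-D distance for the toy example
w = 1.0 / dist**2
weather_at_target = np.sum(w * weather) / np.sum(w)
climate_at_target = coef @ np.array([1.0, abs(target_lat), target_elev])
print(round(climate_at_target + weather_at_target, 2))
```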

August 17, 2013 7:14 pm

BarryW, global temperature anomalies are typically referenced against a 30-year mean; GISS uses 1951-1980 and CRU has typically used 1961-1990, for example.

geran
August 17, 2013 7:16 pm

Steven Mosher says:
August 17, 2013 at 6:56 pm
Steve, thanks for the direct response. I sincerely respect what you have done for climate science. I only criticize you because I don’t want you to fail your calling. You see through most of the false science, and you are repelled. You are to be praised for that.
Don’t spend your time putting down us “extremists” (aka the “D” word). Just do your task to bring out the TRUTH.

Geoff Withnell
August 17, 2013 7:20 pm

From the NIST Engineering Statistics Handbook: “Accuracy is a qualitative term referring to whether there is agreement between a measurement made on an object and its true (target or reference) value.” Since we do not have a reference value, and the true value is what we are trying to determine, accuracy statements are essentially meaningless in this case. Individual measurements can have uncertainty associated with them from their precision (repeatability/reproducibility), and uncertainty can be calculated for aggregate quantities such as averages. But since the “true” or “reference” value is essentially unknowable, accuracy as such is not a useful term.

David Riser
August 17, 2013 7:24 pm

Steve,
The issue I have with your methods directly relates to the work that Anthony did on siting issues with temperature stations. Your methods follow NOAA and NASA bias (excerpt from your methodology: “In most practical uses of Kriging it is necessary to estimate or approximate the covariance matrix in equation (9) based on the available data [1,2]. NOAA also requires the covariance matrix for their optimal interpolation method. We will adopt an approach to estimating the covariance matrix that preserves the natural spatial considerations provided by Kriging, but also shares characteristics with the local averaging approach adopted by NASA GISS [3,4]. If the variance of the underlying field changes slowly as a function of location, then the covariance function can be replaced with the correlation function, R(a⃗, b⃗), which leads to the formulation that:”)
In NASA’s data they have made an assumption that low temps are outliers, which, from reading the appendix, BEST did as well. Because of the extra weighting and the scalpel effects, you essentially increased the apparent UHI effect, increasing systematic error instead of reducing it. I am still not sold on the idea that statistics can magically remove error. If you went back through the data and did a recalc using the highest temps as outliers, based on the idea of UHI being a serious issue, then perhaps your graph would turn out differently, but I am still not sure that it would be an accurate representation of θ(t).
I find it interesting that the trend for BEST is much steeper in slope than the trend for the satellite data over land. I am pretty sure that the satellite trend has fewer errors in it than any land-based multi-station system.
http://www.woodfortrees.org/plot/best-lower/from:1980/to:2010/plot/best-upper/from:1980/to:2010/plot/uah-land/from:1980/to:2010/plot/rss-land/from:1980/to:2010/plot/best/from:1980/to:2010/trend/plot/rss-land/from:1980/to:2010/trend/plot/uah-land/from:1980/to:2010/trend

dp
August 17, 2013 7:26 pm

Is it not the case that the time series used to calculate the average typically is older by a good margin than the data set under analysis? This is why we see modern anomalies compared to 1979-2000 data, for example.

u.k.(us)
August 17, 2013 7:33 pm

Steven Mosher says:
August 17, 2013 at 7:11 pm
“Then, because other folks want to compare their anomalies to our temperatures we provide anomaly data. And we give you the climatology ( unlike hansen or spenser ) so that you can go back to the field in C”
===============
Then other folks use it as an excuse to further their agenda, misguided as it might be.

August 17, 2013 7:37 pm

Steve: “T = C + W … the temperature at any given time/location is a function of the climate (C) and the weather W.”
That’s the same methodological mistake made at CRU, explicitly revealed by Phil Brohan, & co, 2006.
Measured temperature is what’s being recorded. “Measured” is the critical adjective here.
In your formalism, Steve, actual measured temperature = T_m_i = C + W_i + e_i, where “i” is the given measurement, and e_i is the error in that measurement. Most of that error is the systematic measurement error of the temperature sensor itself. CRU completely ignores that error, and so does everyone else who has published a global average.
My own suspicion is that people ignore systematic measurement error because it’s large and it cannot be recovered from the surface air record for most of 1880-2013. Centennial systematic error could be estimated by rebuilding early temperature sensors and calibrating them against a high-accuracy standard, such as a two-point calibrated RM Young aspirated PRT probe (accurate to ~(+/-)0.1 C). To get a good estimate of error and accuracy one would need a number of each such sensor scattered globally about various climates, with data collection over at least 5 years. But that’s real work, isn’t it. And some important people may not like the answer.
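A minimal numerical illustration of the distinction Pat is drawing, with made-up numbers: averaging many readings beats down random noise, but a shared systematic sensor bias survives the averaging untouched.

```python
# A minimal sketch: random error shrinks when many readings are averaged,
# but a shared systematic offset does not. Numbers are illustrative only.
import numpy as np

rng = np.random.default_rng(3)
true_temp = 15.0
systematic_bias = 0.4                       # hypothetical sensor bias, deg C
n = 10_000

readings = true_temp + systematic_bias + rng.normal(0, 0.5, size=n)

mean_error = readings.mean() - true_temp
print(round(mean_error, 2))   # ~0.4: averaging removed the noise, not the bias
```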

Admin
August 17, 2013 7:47 pm

My understanding is you can’t treat a temperature series as independent, because temperatures are “sticky”: a cold year is much more likely to follow a previous cold year. So each point on, say, your monthly series is not truly independent of measurements of the same month in adjacent years. I don’t know what this does to the error calculation.
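One common way to account for that “stickiness” is the lag-1 autocorrelation adjustment to the effective sample size, N_eff ≈ N(1 - r)/(1 + r); here is a minimal sketch with synthetic data showing how few effectively independent samples an autocorrelated series really contains.

```python
# A minimal sketch: for an AR(1)-like series with lag-1 autocorrelation r,
# a common approximation is N_eff ~= N * (1 - r) / (1 + r), which widens
# the error of the mean relative to the naive sigma/sqrt(N).
import numpy as np

def effective_n(x):
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    r = np.sum(x[:-1] * x[1:]) / np.sum(x * x)     # lag-1 autocorrelation estimate
    return len(x) * (1 - r) / (1 + r)

rng = np.random.default_rng(4)
# Generate an autocorrelated series: x[t] = 0.7 * x[t-1] + noise
x = np.zeros(1000)
for t in range(1, 1000):
    x[t] = 0.7 * x[t - 1] + rng.normal()

print(len(x), round(effective_n(x), 0))   # far fewer effective samples than 1000
```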

TalentKeyHole Mole
August 17, 2013 8:34 pm

Hello,
A very good discussion. Lots to learn and digest.
Yes. I asked a question earlier about the probability of a particular temperature occurring at a particular thermometer, whether on land (not actually ‘on’ the ground, but usually about 1 to 2 meters above it) or in the ocean (within a meter below the surface, or nearer 5 meters below in stormy conditions, because of the effect of ocean wave dynamics on a sea-going mercury thermometer).
I do hope that in the future, yes I really do, those who are reading the mercury thermometer, such as on land, employ an accurate chronometer in order to know the real time, and the longitude (a very important number), of the reading, and that they awake in the early morning hours without the effects of lack of sleep and the associated effects of alcohol over-consumption, sometime long ago referred to simply as “consumption,” as the papers of the day, written in the US Prohibition Era, would report “death by consumption.”
Cheers

August 17, 2013 8:50 pm

Pat Frank said August 17, 2013 at 7:37 pm

My own suspicion is that people ignore systematic measurement error because it’s large and it cannot be recovered from the surface air record for most of 1880-2013. Centennial systematic error could be estimated by rebuilding early temperature sensors and calibrating them against a high-accuracy standard, such as a two-point calibrated RM Young aspirated PRT probe (accurate to ~(+/-)0.1 C). To get a good estimate of error and accuracy one would need a number of each such sensor scattered globally about various climates, with data collection over at least 5 years. But that’s real work, isn’t it. And some important people may not like the answer.

Bingo!

Third Party
August 17, 2013 8:57 pm

What about the handling of the 1/4 day/year shift of the months vs. the solar input? There should be some 4-year-period signal in the data that needs to be handled correctly.

John Andrews
August 17, 2013 9:12 pm

What no one has discussed is the error on climate projections.

August 17, 2013 9:26 pm

John Andrews, I have a manuscript about that submitted to a professional journal. Suffice it to say for now that a straightforward analysis shows the projection uncertainty increases far faster than the projected global surface air temperature. Climate models have no predictive value.

August 17, 2013 9:28 pm

Dear Willis, I am afraid you are confused. What’s calculated at the end is the average temperature and the error of the *average* of course *is* smaller than the error of the individual averaged quantities simply because the average also contains 1/N, the division by the number of entries.
So while the error of the sum grows like sqrt(N) if you correctly add (let us assume) comparable errors in quadrature, the (statistical) error of the average goes down like 1/sqrt(N).
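A minimal numerical sketch of those two scalings, using an arbitrary per-value error:

```python
# A minimal sketch: for N independent values with comparable error sigma,
# the error of the *sum* grows like sqrt(N), while the (statistical) error
# of the *average* shrinks like sigma/sqrt(N).
import math

sigma = 0.5      # hypothetical per-value error
for n in (10, 100, 1000):
    err_sum = sigma * math.sqrt(n)
    err_mean = sigma / math.sqrt(n)
    print(n, round(err_sum, 3), round(err_mean, 4))
```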

Rud Istvan
August 17, 2013 9:34 pm

Willis, speaking to you as a Ph.D.-level econometrician, BEST’s mission was hopeless and their elegant methods a waste of time. Therefore your mission to deconstruct any error therein is very difficult in the Japanese sense. Don’t fall for faux-sceptic Muller’s hype.
Recall AW has shown many land records are contaminated by paint, siting, and worse, by an amount greater than two centuries’ worth of anomalies. Recall that sea records were sketchy at best prior to the satellite era (buckets, engine inlets, trade routes, …), yet oceans comprise about 71% of Earth’s surface and are a heat sink much greater than land. All the homogenizations in the world of bad data can only result in bad pseudodata, as shown many times by folks like Goddard, who has documented upward GISS homogenization biases over time.
Any BEST re-interpretation of bad/biased data can only produce bad/biased results, no matter how valid the fancy methods used. GIGO applies to Berkeley, to Muller, and to data dreck.
Good global temperature data came only with the satellite era since 1979, as interpreted by UAH or RSS.
Trying to interpret others’ interpretations of bad data is rather like interpreting the interpreter of a Delphic oracle. The true meaning of a steaming pile of entrails is… (self snip).

Crispin in Waterloo
August 17, 2013 9:47 pm

Hmmm… Where to enter the mix.
As Geoff says, precision is not accuracy. The target shooting example was very useful. Let’s use two examples. First fix the rifle on a permanent stand. Shoot 6 times at the target. The grouping of the hits is a measure of the precision of the rifle.
Now have a person fire the rifle at a fixed target. The grouping of the hits is a combination of the precision of the rifle and the precision of the shooter.
The nearness of the centre of the group of hits to the bullseye is the accuracy of the shooter.
A thermometer may be inaccurate (or not) at different temperatures. It may be an imprecise instrument with poor repeatability, or not. It may be read incorrectly.
These three problems persist in all temperature records.
Because the thermometers are not located in the same place, there is no way to increase any particular reading’s precision or accuracy. Because all readings are independent, grouping them can’t yield an answer that is more precise. It would be like claiming the accuracy of 1000 shooters is greater if we average the positions of the centers of all the groups of hits, and that the calculated result is more precise than the rifle. All systematic and experimental errors accumulate and are carried forward.

AlexS
August 17, 2013 9:51 pm

“The raw data represents itself as temperatures recorded in C
Using that data we estimate the field in C
If you want to call this something different than temperatures then humpty dumpty has a place on the wall next to him.”
The raw data is the problem: you don’t have enough of it, whether in quantity, location, or time (history), for such small differences.