Monthly Averages, Anomalies, and Uncertainties

Guest Post by Willis Eschenbach

I have long suspected a theoretical error in the way that some climate scientists estimate the uncertainty in anomaly data. I think that I’ve found clear evidence of the error in the Berkeley Earth Surface Temperature data. I say “I think”, because as always, there certainly may be something I’ve overlooked.

Figure 1 shows their graph of the Berkeley Earth data in question. The underlying data, including error estimates, can be downloaded from here.

Figure 1. Monthly temperature anomaly data graph from Berkeley Earth. It shows their results (black) and other datasets. ORIGINAL CAPTION: Land temperature with 1- and 10-year running averages. The shaded regions are the one- and two-standard deviation uncertainties calculated including both statistical and spatial sampling errors. Prior land results from the other groups are also plotted. The NASA GISS record had a land mask applied; the HadCRU curve is the simple land average, not the hemispheric-weighted one. SOURCE

So let me see if I can explain the error I suspected. I think that the error involved in taking the anomalies is not included in their reported total errors. Here’s how the process of calculating an anomaly works.

First, you take the actual readings, month by month. Then you take the average for each month. Here’s an example, using the temperatures in Anchorage, Alaska from 1950 to 1980.

Figure 2. Anchorage temperatures, along with monthly averages.

To calculate the anomalies, from each monthly data point you subtract that month’s average. These monthly averages, called the “climatology”, are shown in the top row of Figure 2. After the month’s averages are subtracted from the actual data, whatever is left over is the “anomaly”, the difference between the actual data and the monthly average. For example, in January 1951 (top left in Figure 2) the Anchorage temperature is minus 14.9 degrees. The average for the month of January is minus 10.2 degrees. Thus the anomaly for January 1951 is -4.7 degrees—that month is 4.7 degrees colder than the average January.
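In code, the whole thing is just a subtraction. Here is a minimal Python sketch using the January 1951 numbers above; the array at the end is random stand-in data, not the actual Anchorage record.

```python
import numpy as np

# The January 1951 example from the text: observed value minus that month's
# climatological average gives the anomaly.
jan_1951 = -14.9    # observed January 1951 monthly mean, degrees C
jan_clim = -10.2    # 1950-1980 average of all Januaries (the "climatology")

print(round(jan_1951 - jan_clim, 1))   # -4.7: that January was 4.7 degrees colder than average

# The same operation on a whole record: a (years x 12) array of monthly means.
# The random numbers below are stand-in data, NOT the actual Anchorage record.
temps = np.random.normal(loc=0.0, scale=5.0, size=(31, 12))
climatology = temps.mean(axis=0)      # twelve monthly averages, one per calendar month
anomalies = temps - climatology       # subtract each month's average, column by column
```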

What I have suspected for a while is that the error in the climatology itself is wrongly left out when calculating the total error for a given month's anomaly. Each of the numbers in the top row of Figure 2, the monthly averages that make up the climatology, has an associated error. That error has to be carried forward when you subtract the monthly averages from the observational data. The final result, the anomaly of minus 4.7 degrees, contains two distinct sources of error.

One is the error associated with that individual January 1951 value, -14.9°C. For example, the person taking the measurements may have consistently misread the thermometer, or the electronics might have drifted during that month.

The other source of error is the error in the monthly averages (the “climatology”) which are being subtracted from each value. Assuming the errors are independent, which of course may not be the case but is usually assumed, these two errors add “in quadrature”. This means that the final error is the square root of the sum of the squares of the errors.

One important corollary of this is that the final error estimate for a given month’s anomaly cannot be smaller than the error in the climatology for that month.
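Numerically it looks like this; a minimal sketch with illustrative numbers, not the actual Berkeley Earth values:

```python
import math

# Adding two independent errors "in quadrature", as described above.
# The numbers are illustrative, not the actual Berkeley Earth values.
err_month = 0.30   # uncertainty in the individual monthly value, degrees C
err_clim  = 0.46   # uncertainty in that month's climatological average, degrees C

err_anomaly = math.sqrt(err_month**2 + err_clim**2)
print(round(err_anomaly, 2))   # 0.55 -- larger than either input, so never
                               # smaller than the climatology error alone
```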

Now let me show you the Berkeley Earth results. To their credit, they have been very transparent and have reported various details. Among the details in the data cited above is their estimate of the total, all-inclusive error for each month. And fortunately, their reported results also include the following information for each month:

Figure 3. Berkeley Earth estimated monthly land temperatures, along with their associated errors.

Since they are subtracting those values from each of the monthly temperatures to get the anomalies, the total Berkeley Earth monthly errors can never be smaller than those error values.

Here’s the problem. Figure 4 compares those monthly error values shown in Figure 3 to the actual reported total monthly errors for the 2012 monthly anomaly data from the dataset cited above:

Figure 4. Error associated with the monthly average (light and dark blue) compared to the 2012 reported total error. All data from the Berkeley Earth dataset linked above.

The light blue months are months where the reported error associated with the monthly average is larger than the reported 2012 monthly error … I don’t see how that’s possible.
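The check itself is simple. Here is a sketch of it, with placeholder numbers standing in for the two sets of error values that would be read from the Berkeley Earth file:

```python
import numpy as np

# A sketch of the check behind Figure 4. The two arrays would be read from the
# Berkeley Earth files linked above; the values shown here are placeholders,
# not the actual reported numbers.
clim_err = np.array([0.46, 0.43, 0.37, 0.31, 0.28, 0.26,
                     0.25, 0.26, 0.28, 0.31, 0.37, 0.43])   # climatology errors (Fig. 3)
err_2012 = np.array([0.30, 0.29, 0.27, 0.26, 0.25, 0.24,
                     0.24, 0.25, 0.26, 0.27, 0.29, 0.30])   # reported 2012 total errors

# If the climatology error were carried forward in quadrature, err_2012 could
# never be smaller than clim_err. Flag the months where it is.
months = np.arange(1, 13)
print(months[err_2012 < clim_err])
```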

Where I first suspected the error (but have never been able to show it) is in the ocean data. The reported accuracy is far too great given the number of available observations, as I showed here. I suspect that the reason is that they have not carried forward the error in the climatology, although that's just a guess to try to explain the unbelievable reported errors in the ocean data.

Statistics gurus, what am I missing here? Has the Berkeley Earth analysis method somehow gotten around this roadblock? Am I misunderstanding their numbers? I’m self-taught in all this stuff and I’ve been wrong before, am I off the rails here? Always more to learn.

My best to all,

w.

266 Comments
geran
August 17, 2013 4:56 pm

Nick, the TRUTH is the big picture–CO2 does not cause global warming. The attempt to distort the temp data does not contribute to the proper outcome.
Your choice, TRUTH or fiction?

August 17, 2013 5:00 pm

Nick is wrong, and so are you, Steve. The uncertainty in a mean is the root-sum-square of the errors of the individual entries going into the average. Nick is giving the average uncertainty in a single year, which is not the uncertainty in the mean.

August 17, 2013 5:07 pm

Since anomalies from different temperature ranges represent completely different values from an energy flux perspective, they cannot be compared, averaged, or trended. There is no physics argument I have ever seen that justifies doing so. So whatever mathematical errors their analysis may contain, it is sorta like pointing out to someone that they’ve put a band-aid on wrong while treating a severed limb.

A. Scott
August 17, 2013 5:10 pm

I agree with Nick …. I don’t know if he (or Willis) is right or wrong – but I do know denigrating comments absent supporting evidence do zero towards understanding the issue or finding answers.
Science should be about collaborative effort. And anytime you have knowledgeable folks willing to engage in discussion you should take advantage of it.

AlexS
August 17, 2013 5:12 pm

“In the end if you want to use absolute temperature willis then use them.”
They are not absolute temperatures because you don’t have a way to measure them.

Nick Stokes
August 17, 2013 5:13 pm

Pat Frank says: August 17, 2013 at 4:53 pm
“Nick, the 1/30 uncertainty you cite is the average error in a single year of an average of 30 years. However, the uncertainty in the mean itself, the average value, is the single year errors added in quadrature, i.e., 30x the uncertainty you allow.”

The proposition that the standard error of the mean (of N iid random variables) is the individual error divided by sqrt(N-1) is ancient. What’s your alternative value? And yes, there are corrections for autocorrelation, but I don’t think that’s the point being made here.
An irony here is that skeptics have been nagging climate scientists to get the help of statisticians. So when a group of statisticians (at BEST) do get involved, what we hear here is “Climate science gets it wrong again”.
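For what it's worth, the iid result is easy to check numerically. A minimal sketch (whether iid is the right model for systematic station errors is of course the point in dispute):

```python
import numpy as np

# Monte Carlo check of the textbook result for N independent, identically
# distributed errors: the scatter of the sample mean falls roughly as
# sigma / sqrt(N). Systematic (shared) errors do not shrink this way.
rng = np.random.default_rng(0)
sigma, N, trials = 0.5, 30, 100_000

means = rng.normal(0.0, sigma, size=(trials, N)).mean(axis=1)
print(round(means.std(), 3))          # ~0.091
print(round(sigma / np.sqrt(N), 3))   # 0.091
```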

Jeff Condon
August 17, 2013 5:27 pm

Nick,
I wonder if you could address my critiques?

geran
August 17, 2013 5:39 pm

A. Scott says:
August 17, 2013 at 5:10 pm
Science should be about collaborative effort. And anytime you have knowledgeable folks willing to engage in discussion you should take advantage of it.
>>>>>
That is what WUWT is all about.
Welcome aboard.

August 17, 2013 5:45 pm

The errors I’m discussing aren’t iid, Nick. They’re systematic.
However, I did make a mistake: the uncertainty in the empirical mean of N values is sqrt{[sum-over-N-(errors)^2]/(N-1)}.
The uncertainty in any monthly anomaly of a 30-year climatology is the uncertainty of the mean plus the uncertainty in the monthly temperature added in quadrature, which is approximately 1.4x(mean uncertainty), not 1/30.
With a mean annual systematic station measurement uncertainty of ~(+/-)0.5 C, the approximate uncertainty in any annual anomaly is ~(+/-)0.7 C = (+/-)1-sigma, and there goes the pretext for alarm right out the inaccuracy window.
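For concreteness, the arithmetic behind the 1.4x figure, taking the assumed (+/-)0.5 C at face value (a minimal sketch):

```python
import math

# If the anomaly is (monthly value) - (climatological mean) and each carries
# roughly the same uncertainty u, then adding in quadrature gives sqrt(2) * u.
u = 0.5                                   # the assumed +/- 0.5 C systematic uncertainty
print(round(math.sqrt(u**2 + u**2), 2))   # 0.71, i.e. about 1.4 x 0.5
```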

August 17, 2013 5:48 pm

Nick: “So when a group of statisticians (at BEST) do get involved, what we hear here is “Climate science gets it wrong again”.
When did statistics become science, Nick?

August 17, 2013 5:55 pm

Certainly, you are right here: an anomaly value of 0.10+/-0.18 is meaningless, but it is still put into the datastream.
The Argo data you question: as far as I can see, the only way you can get the 0.001C values that are reported is if you take the data and use a “quadrature”. However, in order to use quadrature to reduce error, I believe you have to be taking multiple readings of the same item using the same equipment in the same way. The Argo floats move, and there are 3500 of them. Each day they take readings at the same depth but of different water and different temperatures (even if off by a bit), and each one is mechanically different – same as all the land stations. The inherent error in each – as far as I can see – is the minimum of the instrumental reading at any given time. You cannot reduce the error estimate by the square root method because you are NOT bouncing around a steady state: all readings are not attempts to get at a constant truth.
The reduction in error here is like being at a firing range with two distant targets, one still and one moving. If you bang away at the still target, your grouping will eventually include the target center. The only variable – the influence on the “error” of your shot – is the shakiness of your hand. Now, try to hit a moving target. Here your variables are not just the shakiness of your hand, but your general targeting, wind conditions, elevation etc. Six shots at a moving target do not give you the same certainty of hitting the target as six shots at a stationary one.
The Argo floats and the land stations are dealing with a changing environment – a moving target. The reading of 14.5C +/- 0.5C means that the temperature could be 14.0C or 15.0C. The next reading of 13.5C +/- 0.5 means the temperature could be 13.0C or 14.0C. Note that the second reading has no application to the first reading. This principle applies to different stations also.
It is assumed that over multiple years temperatures will fluctuate similarly, so you try to read 14.5C and 13.5C at various times. True. So for those same attempts, multiple readings can be used to reduce the error for an average. But which ones? Did you try to re-read 14.5C or was it really 14.3C the second time?
If the observed parameter isn’t stable, then repeated measurements will not get you closer to the “true” value than the error of any individual measurement. With multiple stations READING DIFFERENT ITEMS, Argo or land, the combined error cannot be any better than that of an individual station reading those same unstable items.
A statistical treatment of multiple stations in which you could say the trends of all are the same, and thus IMPOSE a trend on all, forcing individual stations to correct to that trend, could give you a lower error estimate of reality. But this would not be a “measured” feature, but an assumption-based calculation which might just as well reflect your imposed view as a feature of the environment.
The fact is that nature is messy and measurements are mostly crude. If what you seek to find is smaller than the tools at your disposal can handle, then you will not find nothing, you will find the culmination of the errors of your tools.
This is a fundamental fact of investigation: once you look, you will not find nothing but something.
In religious or spiritual circles it is a well-known phenomenon, and one which was firm enough for me to raise my kids in the Catholic Church: if you bring up a child to believe in nothing, he will not believe in nothing, but in ANYTHING. All the Children of God (and other cult) members showed the same thing: the absence of belief is only temporary. In science, bad data does not get dismissed, but is considered “good” data until it is forcefully overthrown.
The trend of adjustments in global temperatures and sea level looks to me like trend fulfillment as a result of insisting on finding “truth” in a mish-mash of partial truths, none of which represent multiple insights around the same, unchanging aspect of our universe.
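A toy sketch of the still-target versus moving-target point above, with made-up numbers and noise level:

```python
import numpy as np

# Averaging many noisy readings of ONE fixed value beats any single reading;
# a single noisy reading each of MANY different values pins down their average,
# but no individual value is known any better than the instrument's +/- noise.
rng = np.random.default_rng(1)
noise = 0.5

fixed_true  = 14.5
fixed_reads = fixed_true + rng.normal(0.0, noise, 1000)
print(round(abs(fixed_reads.mean() - fixed_true), 3))       # typically ~0.02 or less: shrinks as 1/sqrt(N)

moving_true  = rng.uniform(10.0, 20.0, 1000)                # 1000 different temperatures
moving_reads = moving_true + rng.normal(0.0, noise, 1000)
print(round(np.abs(moving_reads - moving_true).mean(), 2))  # ~0.4: each one still only known to ~noise
```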

u.k.(us)
August 17, 2013 6:07 pm

Nick Stokes says:
August 17, 2013 at 5:13 pm
An irony here is that skeptics have been nagging climate scientists to get the help of statisticians. So when a group of statisticians (at BEST) do get involved, what we hear here is “Climate science gets it wrong again”.
================
Ironing, not to mention nagging, has what to do with statistics ?

August 17, 2013 6:11 pm

The whole “averaging” process is problematic in itself, and the anomalies are, or could be, meaningless, even over many, many measurements. The averages of -10 and +10 and of -5 and +5 are the same, but the “climate” may be very different. If the average in one location changes as noted, there is no anomaly, but the climate changed. It also says nothing about the “mean” temperature: if the temperature were to stay at 20 degrees all day and, due to a wind shift, drop to 10 for an hour (as happens in northern Canada coastal stations), then you get an “average” from the high/low of 15 but a time-weighted average of about 19.6. So how useful are these anomalies in telling us about climate? I have always wondered about this, so I am taking this opportunity for this group to educate me. Thanks.
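For concreteness, a toy version of that 20-degree day with a one-hour drop to 10 (illustrative numbers only):

```python
import numpy as np

# One day held at 20 C except for a single hour at 10 C (e.g. a wind shift).
hourly = np.full(24, 20.0)
hourly[13] = 10.0

print((hourly.max() + hourly.min()) / 2)   # 15.0 -- the (Tmax + Tmin) / 2 "average"
print(round(hourly.mean(), 1))             # 19.6 -- the time-weighted mean
```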

David Riser
August 17, 2013 6:17 pm

Nick,
Neither Muller nor Rohde is a statistician, and based on reading their methodology I am certain that they designed BEST with the intention of proving CAGW. The methodology they are employing is sketchy at best. Please read http://www.scitechnol.com/GIGS/GIGS-1-103.php
and you will see what I mean.
v/r,
David Riser

Scott
August 17, 2013 6:18 pm

Background:
I am involved in supply chain planning, in which one of the main inputs is a forecast based on historical usage. The forecast error is then used to help calculate the inventory safety stock requirement. The forecast error is the difference between the forecast and the actual result, i.e. the anomaly.
Findings from supply chain management on calculating the forecast:
Using a moving average is better than nothing; however, it is usually one of the worst of the available statistical techniques for estimating a forecast, except for some very stable systems. Reducing the forecast error is one of the main aims of the supply chain manager, since it reduces the amount of safety stock needed in the system.
My points for discussion:
Given that the weather/climate is far from a stable system, why are we using a moving average to calculate anomalies from? Surely, when you take the impacts of natural forcings on temperature over a long period of time, a moving average is not a good measure to be using to calculate an anomaly, even for the same month over a long period?
Therefore, given the above, shouldn’t the error in the calculations be much larger, given the simplistic measure used as the datum, particularly the further back in time you go?

August 17, 2013 6:21 pm

“AlexS says:
August 17, 2013 at 5:12 pm
“In the end if you want to use absolute temperature willis then use them.”
They are not absolute temperatures because you don’t have a way measure them.”
The raw data represents itself as temperatures recorded in C.
Using that data we estimate the field in C.
If you want to call this something different than temperatures, then Humpty Dumpty has a place on the wall next to him.

Steven Mosher
August 17, 2013 6:23 pm

“I also point out again, that the published uncertainties in global averaged air temperature never include the uncertainty due to systematic measurement error.”
No pat those uncertainties do include the uncertainty due to all error sources including systematics. look at the nugget

August 17, 2013 6:28 pm

Steve, I’ve read the papers and assessed the method. Systematic measurement error is nowhere to be found.

geran
August 17, 2013 6:33 pm

Steven, could you please explain this to us underlings, lest we think you are “drunk blogging”?
“No pat those uncertainties do include the uncertainty due to all error sources including systematics. look at the nugget”
(Or, if I need to translate–Steefen oils you plea espalne to us usndresk , lest we think ysoru are dared belongings.)
Thanks

Steven Mosher
August 17, 2013 6:34 pm

“An irony here is that skeptics have been nagging climate scientists to get the help of statisticians. So when a group of statisticians (at BEST) do get involved, what we hear here is “Climate science gets it wrong again”.
Not only that but
1. We used a suggestion made a long time ago by willis: to scalpel
2. We used kriging as has been suggested many times on Climate audit
3. Our chief statistician is a friend of RomanM who worked with JeffID and he consulted
Roman and jeffs work. in fact we use a very similar approach in estimating the
entire field at once as opposed to having baseline periods
4. We tested the method using synthetic data as suggested many times on jeffids and climate audit and showed that the method was more accurate than GISS and CRU, as theory holds it should be.. and yes the uncertainty estimates held up.
And yet here again is pat frank repeating the same arguments he lost at lucia’s and jeffids
Its not ironic. its typical

Theo Goodwin
August 17, 2013 6:35 pm

Scott says:
August 17, 2013 at 6:18 pm
Good to hear from a pro. I want to add emphasis to your post. In your work, you make decisions about a system that is very well understood and that gives you feedback on a regular basis. By contrast, the BEST people or anyone working on the same data have little understanding of what their data points represent and receive no feedback at all.

August 17, 2013 6:47 pm

Steve: “And yet here again is pat frank repeating the same arguments he lost at lucia’s and jeffids. … Its not ironic. its typical
I carried that argument, Steve. If you don’t understand that after all that was written, then your view at best reflects incompetence.

geran
August 17, 2013 6:49 pm

Steven Mosher says:
August 17, 2013 at 6:34 pm
Not only that but
1. We used a suggestion made a long time ago by willis: to scalpel
2. We used kriging as has been suggested many times on Climate audit
3. Our chief statistician is a friend of RomanM who worked with JeffID and he consulted
Roman and jeffs work. in fact we use a very similar approach in estimating the
entire field at once as opposed to having baseline periods
4. We tested the method using synthetic data as suggested many times on jeffids and climate audit and showed that the method was more accurate than GISS and CRU, as theory holds it should be.. and yes the uncertainty estimates held up.
>>>>>
Not only that but—you still didn’t get it right!
(Hint–Nah, it would not be accepted….)

u.k.(us)
August 17, 2013 6:53 pm

Steven Mosher says:
August 17, 2013 at 6:34 pm
“Its not ironic. its typical”
================
Care to enlighten us ?
We’re all ears, that is why we are here.

August 17, 2013 6:56 pm

geran.
When I pestered Gavin for Hansen’s code, one thing he said stuck with me:
“You’ll never be satisfied, Steve. You’ll just keep asking stats questions you could
research yourself, you’ll ask questions about the code, and you’ll never publish
anything.”
So I told him: No. Give me the code and the data. I know how to read. I’ll never bug
you again. I’ll try to improve your work to come up with a better answer, because better
answers matter. I’m not out to waste your time and I’m not begging for a free education.
Just free the data and code.
That said, I don’t expect everyone to share my willingness to actually do the work. So I will give you a few pointers.
http://en.wikipedia.org/wiki/Variogram
http://www.scitechnol.com/GIGS/GIGS-1-103a.pdf
see the discussion about the unexplained variance when the correlation length goes to zero.