Monthly Averages, Anomalies, and Uncertainties

Guest Post by Willis Eschenbach

I have long suspected a theoretical error in the way that some climate scientists estimate the uncertainty in anomaly data. I think that I’ve found clear evidence of the error in the Berkeley Earth Surface Temperature data. I say “I think”, because as always, there certainly may be something I’ve overlooked.

Figure 1 shows their graph of the Berkeley Earth data in question. The underlying data, including error estimates, can be downloaded from here.

Figure 1. Monthly temperature anomaly data graph from Berkeley Earth. It shows their results (black) and other datasets. ORIGINAL CAPTION: Land temperature with 1- and 10-year running averages. The shaded regions are the one- and two-standard deviation uncertainties calculated including both statistical and spatial sampling errors. Prior land results from the other groups are also plotted. The NASA GISS record had a land mask applied; the HadCRU curve is the simple land average, not the hemispheric-weighted one. SOURCE

So let me see if I can explain the error I suspected: the uncertainty introduced by taking the anomalies is not included in their reported total errors. Here's how the process of calculating an anomaly works.

First, you take the actual readings, month by month. Then you take the average for each month. Here’s an example, using the temperatures in Anchorage, Alaska from 1950 to 1980.

Figure 2. Anchorage temperatures, along with monthly averages.

To calculate the anomalies, from each monthly data point you subtract that month's average. These monthly averages, called the "climatology", are shown in the top row of Figure 2. After the month's averages are subtracted from the actual data, whatever is left over is the "anomaly", the difference between the actual data and the monthly average. For example, in January 1951 (top left in Figure 2) the Anchorage temperature is minus 14.9 degrees. The average for the month of January is minus 10.2 degrees. Thus the anomaly for January 1951 is -4.7 degrees; that month is 4.7 degrees colder than the average January.
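For those who prefer to see the arithmetic as code, here is a minimal sketch of that calculation in Python. The numbers are invented for illustration; they are not the actual Anchorage record.

```python
import numpy as np

# Toy stand-in for Figure 2: 31 years (1950-1980) x 12 months of temperatures.
rng = np.random.default_rng(0)
years, months = 31, 12
seasonal = 10.0 * np.sin(2 * np.pi * (np.arange(months) - 3) / 12) - 2.0
temps = seasonal + rng.normal(0.0, 2.0, size=(years, months))   # deg C

# The "climatology": the average of each calendar month over all the years.
climatology = temps.mean(axis=0)        # 12 values, the top row of Figure 2

# The anomaly: each month's temperature minus that month's average.
anomalies = temps - climatology

# e.g. the second January in the record (the "January 1951" of the example):
print("Jan temp:    %.1f" % temps[1, 0])
print("Jan average: %.1f" % climatology[0])
print("Jan anomaly: %.1f" % anomalies[1, 0])
```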

What I have suspected for a while is that the error in the climatology itself is erroneously not taken into account when calculating the total error for a given month's anomaly. Each of the numbers in the top row of Figure 2, the monthly averages that make up the climatology, has an associated error. That error has to be carried forwards when you subtract the monthly averages from the observational data. The final result, the anomaly of minus 4.7 degrees, contains two distinct sources of error.

One is the error associated with that individual January 1951 average, -14.9°C. For example, the person taking the measurements may have consistently misread the thermometer, or the electronics might have drifted during that month.

The other source of error is the error in the monthly averages (the “climatology”) which are being subtracted from each value. Assuming the errors are independent, which of course may not be the case but is usually assumed, these two errors add “in quadrature”. This means that the final error is the square root of the sum of the squares of the errors.

One important corollary of this is that the final error estimate for a given month’s anomaly cannot be smaller than the error in the climatology for that month.
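In code, the propagation argument looks like this. The two uncertainty values are purely illustrative; the point is the quadrature sum and the corollary that follows from it.

```python
import numpy as np

err_month = 0.3   # illustrative 1-sigma error in the individual monthly value
err_clim  = 0.2   # illustrative 1-sigma error in that month's climatology

# Independent errors add in quadrature when the climatology is subtracted:
err_anomaly = np.sqrt(err_month**2 + err_clim**2)
print("anomaly error: %.2f" % err_anomaly)    # ~0.36, larger than either input

# The corollary: the anomaly error can never be smaller than the climatology error.
assert err_anomaly >= err_clim
```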

Now let me show you the Berkeley Earth results. To their credit, they have been very transparent and reported various details. Among the details in the data cited above is their estimate of the total, all-inclusive error for each month. And fortunately, their reported results also include the following information for each month:

Figure 3. Berkeley Earth estimated monthly land temperatures, along with their associated errors.

Since they are subtracting those values from each of the monthly temperatures to get the anomalies, the total Berkeley Earth monthly errors can never be smaller than those error values.

Here’s the problem. Figure 4 compares those monthly error values shown in Figure 3 to the actual reported total monthly errors for the 2012 monthly anomaly data from the dataset cited above:

Figure 4. Error associated with the monthly average (light and dark blue) compared to the 2012 reported total error. All data from the Berkeley Earth dataset linked above.

The light blue months are months where the reported error associated with the monthly average is larger than the reported 2012 monthly error ... I don't see how that's possible.
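For anyone who wants to repeat the check behind Figure 4, the logic is nothing more than the comparison below. The arrays here are placeholders, not the actual Berkeley Earth numbers or file headers, so they would need to be replaced with the values from the downloaded dataset.

```python
import numpy as np

# Placeholder values standing in for the two columns of the Berkeley Earth file:
clim_error = np.array([0.16, 0.15, 0.13, 0.10, 0.08, 0.07,
                       0.07, 0.08, 0.09, 0.11, 0.13, 0.15])   # monthly-average errors
total_2012 = np.array([0.14, 0.16, 0.12, 0.11, 0.09, 0.08,
                       0.08, 0.09, 0.10, 0.12, 0.11, 0.16])   # reported 2012 total errors

# Flag months where the reported total error is SMALLER than the climatology
# error -- which, if the quadrature argument above holds, should never happen.
for month, c, t in zip("JFMAMJJASOND", clim_error, total_2012):
    if c > t:
        print("month %s: climatology error %.2f > reported total error %.2f" % (month, c, t))
```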

Where I first suspected the error (but have never been able to show it) is in the ocean data. The reported accuracy is far too great given the number of available observations, as I showed here. I suspect that the reason is that they have not carried forwards the error in the climatology, although that’s just a guess to try to explain the unbelievable reported errors in the ocean data.

Statistics gurus, what am I missing here? Has the Berkeley Earth analysis method somehow gotten around this roadblock? Am I misunderstanding their numbers? I’m self-taught in all this stuff and I’ve been wrong before, am I off the rails here? Always more to learn.

My best to all,

w.

Dr Burns
August 17, 2013 2:12 pm

Jones, CRU: “The global and hemispheric averages are now given to a precision of three decimal places” http://cdiac.ornl.gov/trends/temp/jonescru/jones.html details here http://www.metoffice.gov.uk/hadobs/hadcrut3/HadCRUT3_accepted.pdf
He assumes that more data increases accuracy. On this basis one could calculate very accurate global temperature by getting people around the world to hold their fingers in the air and take estimates.
Even recording accuracy was only +/- 0.5 deg C until recently.

geran
August 17, 2013 2:21 pm

First: You need “perfect” temp data.
Then, the anomalies are just basic arithmetic.
Second: See “First”.
Soooo Willis, you are definitely on the right track in questioning the uncertainties.

David Riser
August 17, 2013 2:23 pm

Willis,
Interesting post; it made me read their "Robert Rohde, Richard A. Muller, et al. (2013) Berkeley Earth Temperature Averaging Process," which is kind of disturbing in how they treated the data. After reading that I am pretty much going to ignore BEST from here on out. But relevant to your post is this note buried in their paper, which by the way is supposed to make the data more accurate through a mathematical process. Not sure you can do that without putting your own biases into the data but it's what they have done.
"Note, however, that the uncertainty associated with the absolute temperature value is larger than the uncertainty associated with the changes, i.e. the anomalies. The increased error results from the large range of variations in bi from roughly 30°C at the tropics to about -50°C in Antarctica, as well as the rapid spatial changes associated with variations in surface elevation. For temperature differences, the C(x) term cancels (it doesn't depend on time) and that leads to much smaller uncertainties for anomaly estimates than for the absolute temperatures."
I hope this helps.
v/r,
David Riser

dearieme
August 17, 2013 2:23 pm

“precision” != “accuracy”
http://en.wikipedia.org/wiki/Accuracy_and_precision
Of course imprecision and inaccuracy may be more suitable concepts for Climate Science. As may dishonesty.

August 17, 2013 2:37 pm

The light blue months are months where the reported error associated with the monthly average is larger than the reported 2012 monthly error ... I don't see how that's possible.

Nor do I. It looks unreasonable.
But have you asked BEST directly?
What do they say?

Steve Fitzpatrick
August 17, 2013 2:43 pm

Willis,
On one station you are correct. But there are lots of stations, and the land averages tend to average out the uncertainties.

Jeff Condon
August 17, 2013 2:46 pm

Willis,
I'm not sure how they are reporting monthly error, but there is a serious logical problem in the published Berkeley error reporting, which uses an incorrectly modified version of the jackknife method for calculation.
http://noconsensus.wordpress.com/2011/11/20/problems-with-berkeley-weighted-jackknife-method/
http://noconsensus.wordpress.com/2011/11/01/more-best-confidence-interval-discussion/
http://noconsensus.wordpress.com/2011/10/30/overconfidence-error-in-best/
The authors have failed to either attempt a correction or even respond with any kind of discussion on the matter.
The error you report is exactly the type of thing which could result from their broken jackknife method.

Nick Stokes
August 17, 2013 2:52 pm

Willis,
The uncertainty of an anomaly was discussed in a recent thread at Climate Audit and one at Lucia’s. That concerned the Marcott estimates, and I calculated an exact value here.
The point that caused discussion there is that, yes, there is error in the climatology which affects uncertainty. It's rather small, since it is the error in a 30 year mean being added to the error of a single month, so the extra error is prima facie (in that case) about 1/30 of the total. That's a version of the reason why at least 30 years is used to calculate anomalies.
The wrinkle discussed in those threads is that within the anomaly base period, the random variation of the values and the anomaly average are not independent, and the error is actually reduced, but increased elsewhere.
But in another sense the error that you are talking about matters less, because it is the same number over all years. Put another way, often when you are talking about anomalies, you don’t care so much about the climatology (or the base years). You want to know whether a particular year was hotter than other years, or to calculate a trend. The error you are discussing won’t affect trends.

August 17, 2013 2:53 pm

The shaded regions are the one- and two-standard deviation uncertainties calculated including both statistical and spatial sampling errors.
It’s always bothered me that they can quantify spatial sampling uncertainty.
Tom Karl on the subject,
Long-term (50 to 100 years) and short-term (10 to 30 years) global and hemispheric trends of temperature have an inherent unknown error due to incomplete and nonrandom spatial sampling.
http://journals.ametsoc.org/doi/abs/10.1175/1520-0442(1994)007%3C1144%3AGAHTTU%3E2.0.CO%3B2

August 17, 2013 2:55 pm

“For example, in January 1951 (top left in Figure 2) the Anchorage temperature is minus 14.7 degrees.”
Am I really this dense ... isn't the actual figure in figure 2 ... -14.9° and therefore the anomaly is -4.7°? ==>> not that it matters in relation to the discussion but ... just thought I'd mention it?
[No, I’m really that dense … fixed, thanks. -w.]

August 17, 2013 3:08 pm

You're right, Willis. The total uncertainty in the anomalies must include the uncertainty in the climatology average and the uncertainty in each individual monthly average temperature. The uncertainty in the monthly average temperature must include the uncertainty in the daily averages, and the uncertainty in the daily average must include the errors in the individual measurements. They all add in quadrature: square root of the sum of the squares.
The CRU, GISS, and BEST people assume that the temperature measurement errors are random and uncorrelated. This allows them to ignore individual measurement errors, because random errors diminish as 1/sqrt(N) in an average. When N is thousands of measurements per year, they happily decrement the uncertainty into unimportance.
However, sensor field calibration studies show large systematic air temperature measurement errors that cannot be decremented away. These errors do not appear in the analyses by CRU, GISS or BEST. It’s as though systematic error does not exist for these scientists.
I’ve published a thoroughly peer-reviewed paper on this exact problem: http://meteo.lcd.lu/globalwarming/Frank/uncertainty_in%20global_average_temperature_2010.pdf (869.8 KB)
I’ve also published on the total inadequacy of the CRU air temperature measurement error analysis: http://multi-science.metapress.com/content/t8x847248t411126/fulltext.pdf (1 MB)
These papers were originally submitted as a single manuscript to the Journal of Applied Meteorology and Climatology where, after three rounds of review taking a full year, the editor found a pretext to reject. His reasoning was that a careful field calibration experiment is no more than a single-site error, implying that instrumental calibration experiments reveal nothing about the accuracy of similar instruments deployed elsewhere. One might call his view cultural-relativistic science, asserting that basic work done in one region cannot be valid elsewhere.
I had an email conversation with Phil Brohan of CRU about these papers. He could not refute the analysis, but was dismissive anyway.

August 17, 2013 3:08 pm

It's rather small, since it is the error in a 30 year mean being added to the error of a single month, so the extra error is prima facie (in that case) about 1/30 of the total.

Let us say I have been driving an automobile for 30 years, & carefully recording the mileage & the amount of fuel used (but nothing else).
What is the formula to derive the error of my odometer from the mileage & fuel used?
If I fuel up twice as often (& thus record twice as many mileage & fuel entries), do my figures get more or less accurate? By what percentage?
How can I calculate my neighbour's mileage from this?

geran
August 17, 2013 3:12 pm

Nick Stokes says:
August 17, 2013 at 2:52 pm
The wrinkle discussed in those threads is that within the anomaly base period, the random variation of the values and the anomaly average are not independent, and the error is actually reduced, but increased elsewhere.
>>>>>
We all needed a good laugh, thanks, Nick.

chris y
August 17, 2013 3:18 pm

Nick Stokes- you say
"It's rather small, since it is the error in a 30 year mean being added to the error of a single month, so the extra error is prima facie (in that case) about 1/30 of the total."
How’s that again? I think you mean 1/sqrt(30).
I hope.

Nick Stokes
August 17, 2013 3:31 pm

chris y says: August 17, 2013 at 3:18 pm
"How's that again? I think you mean 1/sqrt(30)."

No. As Pat Frank says, the errors add in quadrature. The deviation in variance is about 1/30. And then the error in the sqrt of that is not the sqrt of the error, but about 1/2. So arguably it is 1/60, but it depends on the variance ratio.
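A quick numerical check of the 1/30-versus-1/60 arithmetic, assuming a 30-year mean of independent values that each carry the same error as the single month:

```python
import numpy as np

sigma = 1.0                                  # error of a single monthly value
err_clim = sigma / np.sqrt(30)               # error of the 30-year mean
err_anom = np.sqrt(sigma**2 + err_clim**2)   # anomaly error, added in quadrature

print(err_clim**2 / sigma**2)     # 1/30: the extra variance, relative to sigma^2
print(err_anom / sigma - 1.0)     # ~1/60: the extra error, relative to sigma
```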

GlynnMhor
August 17, 2013 3:34 pm

So, if I understand this correctly, the anomaly is a comparison between the monthly value for each station and the average of all the stations for that month over whatever number of years?
Or is it the monthly value for each station compared to the average of that station for that month over whatever number of years?
In the latter case systematic error is present and cancels out (being equal in both sets of figures) in both the monthly and the monthly ‘average over years’ values, whereas in the former case systematic errors common to any one station would be random over all the stations (and thus cancelling with increasing number of samples), but still present for the data from any one station.

richard verney
August 17, 2013 3:35 pm

dearieme says:
August 17, 2013 at 2:23 pm
//////////////////////////////
And:
incompetence; and
statistical and mathematical illiteracy

Nick Stokes
August 17, 2013 3:52 pm

Willis,
As far as I can tell, in Fig 3 you quote the error of individual monthly errors. But the error in climatology is the error in the mean of 30, which is much less. So I don’t think your Fig 4 works.

Brad
August 17, 2013 3:52 pm

"the final error estimate for a given month's anomaly cannot be smaller than the error in the climatology for that month."
There's a trick in robotics where we get around low accuracy sensors by using Kalman filters. Basically it's the math needed to reduce your error by combining multiple low accuracy sensors.
Now I see no evidence BEST is using a related trick but it's relevant to the conversation because it lets you get a total error lower than the piece part error.
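For what it's worth, the static version of that trick is just inverse-variance weighting; with two independent, unbiased readings of the same quantity, the combined error is smaller than either one alone. A toy sketch (this is not a claim about what BEST does), and note that it buys nothing against a shared systematic bias:

```python
import numpy as np

# Two independent, unbiased readings of the SAME quantity, with 1-sigma errors:
x1, s1 = 14.2, 0.5
x2, s2 = 14.8, 0.7

w1, w2 = 1.0 / s1**2, 1.0 / s2**2            # inverse-variance weights
x_comb = (w1 * x1 + w2 * x2) / (w1 + w2)     # combined estimate
s_comb = np.sqrt(1.0 / (w1 + w2))            # combined 1-sigma error

print("combined: %.2f +/- %.2f" % (x_comb, s_comb))   # error ~0.41, below both inputs
```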

Nick Stokes
August 17, 2013 3:53 pm

Correction “quote the error” ==> “quote the value”

geran
August 17, 2013 4:14 pm

Nick Stokes says:
August 17, 2013 at 3:53 pm
Correction "quote the error" ==> "quote the value"
>>>>>>
"Quoth the raven"
(Nick, get to the TRUTH, not spin, and you will not be such an easy target.)

August 17, 2013 4:21 pm

"But in another sense the error that you are talking about matters less, because it is the same number over all years. Put another way, often when you are talking about anomalies, you don't care so much about the climatology (or the base years). You want to know whether a particular year was hotter than other years, or to calculate a trend. The error you are discussing won't affect trends."
Precisely.
Also if people want to go back to absolute temperature with our series they can. We solve the field for temperature. In our mind we would never take anomalies except to compare with other series. Or we’d take the anomaly over the whole period not just thirty years and then adjust it accordingly to line up with other series. So, anomalies are really only important when you want to compare to other people ( like roy spencer ) or ( jim hansen) who only publish anomalies. Anomalies are also useful if you are calculating trends in monthly data.
In the end if you want to use absolute temperature willis then use them. But when you try to calculate a trend then you might want to remove seasonality.

Nick Stokes
August 17, 2013 4:43 pm

geran says: August 17, 2013 at 4:14 pm
"(Nick, get to the TRUTH, not spin, and you will not be such an easy target.)"

Well, geran, if you want to make an actual contribution, you could explain the TRUTH to us. About Monthly Averages, Anomalies, and Uncertainties, that is.

August 17, 2013 4:53 pm

Nick, the 1/30 uncertainty you cite is the average error in a single year of an average of 30 years. However, the uncertainty in the mean itself, the average value, is the single year errors added in quadrature, i.e., 30x the uncertainty you allow. I cover that point in the first of the two papers I linked above.
I also point out, again, that the published uncertainties in global averaged air temperature never include the uncertainty due to systematic measurement error. That average annual uncertainty is easily (+/-)0.5 C, not including the uncertainty in the climatology mean (which is also not included in the published record).
All the standard published global air temperature records are a false-precision crock.

geran
August 17, 2013 4:56 pm

Nick, the TRUTH is the big picture–CO2 does not cause global warming. The attempt to distort the temp data does not contribute to the proper outcome.
Your choice, TRUTH or fiction?

August 17, 2013 5:00 pm

Nick is wrong, and so are you, Steve. The uncertainty in a mean is the root-sum-square of the errors of the individual entries going into the average. Nick is giving the average uncertainty in a single year, which is not the uncertainty in the mean.

August 17, 2013 5:07 pm

Since anomalies from different temperature ranges represent completely different values from an energy flux perspective, they cannot be compared, averaged, or trended. There is no physics argument I have ever seen that justifies doing so. So whatever mathematical errors their analysis may contain, it is sorta like pointing out to someone that they’ve put a band-aid on wrong while treating a severed limb.

A. Scott
August 17, 2013 5:10 pm

I agree with Nick …. I don’t know if he (or Willis) are right or wrong – but I do know denigrating comments absent supporting evidence do zero towards understanding the issue or finding answers.
Science should be about collaborative effort. And anytime you have knowledgeable folks willing to engage in discussion you should take advantage of it.

AlexS
August 17, 2013 5:12 pm

“In the end if you want to use absolute temperature willis then use them.”
They are not absolute temperatures because you don't have a way to measure them.

Nick Stokes
August 17, 2013 5:13 pm

Pat Frank says: August 17, 2013 at 4:53 pm
"Nick, the 1/30 uncertainty you cite is the average error in a single year of an average of 30 years. However, the uncertainty in the mean itself, the average value, is the single year errors added in quadrature, i.e., 30x the uncertainty you allow."

The proposition that the standard error of the mean (of N iid random variables) is the individual error divided by sqrt(N-1) is ancient. What's your alternative value? And yes, there are corrections for autocorrelation, but I don't think that's the point being made here.
An irony here is that skeptics have been nagging climate scientists to get the help of statisticians. So when a group of statisticians (at BEST) do get involved, what we hear here is "Climate science gets it wrong again".

Jeff Condon
August 17, 2013 5:27 pm

Nick,
I wonder if you could address my critiques?

geran
August 17, 2013 5:39 pm

A. Scott says:
August 17, 2013 at 5:10 pm
Science should be about collaborative effort. And anytime you have knowledgeable folks willing to engage in discussion you should take advantage of it.
>>>>>
That is what WUWT is all about.
Welcome aboard.

August 17, 2013 5:45 pm

The errors I’m discussing aren’t iid, Nick. They’re systematic.
However, I did make a mistake: the uncertainty in the empirical mean of N values is sqrt{[sum-over-N-(errors)^2]/(N-1)}.
The uncertainty in any monthly anomaly of a 30-year climatology is the uncertainty of the mean plus the uncertainty in the monthly temperature added in quadrature, which is approximately 1.4x(mean uncertainty), not 1/30.
With a mean annual systematic station measurement uncertainty of ~(+/-)0.5 C, the approximate uncertainty in any annual anomaly is ~(+/-)0.7 C = (+/-)1-sigma, and there goes the pretext for alarm right out the inaccuracy window.

August 17, 2013 5:48 pm

Nick: "So when a group of statisticians (at BEST) do get involved, what we hear here is 'Climate science gets it wrong again'."
When did statistics become science, Nick?

August 17, 2013 5:55 pm

Certainly, you are right here: an anomaly value of 0.10+/-0.18 is meaningless, but it is still put into the datastream.
The Argo data you question: as far as I can see the only way you can get the 0.001C values that are reported is if you take the data and use a "quadrature". However, in order to use a quadrature to reduce error, I believe you have to be taking multiple readings of the same item using the same equipment in the same way. The Argo floats move, and there are 3500 of them. Each day they take readings at the same depth but of different water and different temperatures (even if off by a bit), and each one is mechanically different - same as all the land stations. The inherent error in each - as far as I can see - is the minimum of the instrumental reading at any given time. You cannot reduce the error estimate by the square root method because you are NOT bouncing around a steady state: all readings are not attempts to get to a constant truth.
The reduction in error here is like being at a firing range with two distant targets, one still and one moving. If you bang away at the still target, your grouping will eventually include the target center. The only variable – the influence on the “error” of your shot – is the shakiness of your hand. Now, try to hit a moving target. Here your variables are not just the shakiness of your hand, but your general targeting, wind conditions, elevation etc. Six shots at a moving target do not give you the same certainty of hitting the target as six shots at a stationary one.
The Argo floats and the land stations are dealing with a changing environment – a moving target. The reading of 14.5C +/- 0.5C means that the temperature could be 14.0C or 15.0C. The next reading of 13.5C +/- 0.5 means the temperature could be 13.0C or 14.0C. Note that the second reading has no application to the first reading. This principle applies to different stations also.
It is assumed that over multiple years temperatures will fluctuate similarly, so you try to read 14.5C and 13.5C at various times. True. So for those same attempts, multiple readings can be used to reduce the error for an average. But which ones? Did you try to re-read 14.5C or was it really 14.3C the second time?
If the observed parameter isn't stable, then repeated measurements will not get you closer to the "true" value than the error of any individual measurement. With multiple stations READING DIFFERENT ITEMS, Argo or land, the combined error cannot be any better than the individual station reading those same unstable items.
A statistical treatment of multiple stations in which you could say the trends of all are the same, and thus IMPOSE a trend on all, forcing individual stations to correct to that trend, could give you a lower error estimate of reality. But this would not be a “measured” feature, but an assumption-based calculation which might just as well reflect your imposed view as a feature of the environment.
The fact is that nature is messy and measurements are mostly crude. If what you seek to find is smaller than the tools at your disposal can handle, then you will not find nothing, you will find the culmination of the errors of your tools.
This is a fundamental fact of investigation: once you look, you will not find nothing but something.
In religious or spiritual circles it is a well-known phenomena and one which was firm enough for me to raise my kids in the Catholic Church: if you bring up a child to believe in nothing, he will not believe in nothing, but in ANYTHING. All the Children of God (and other cult) members showed the same thing: the absence of belief is only temporary. In science, bad data does not become dismissed, but becomes considered “good” data until it is forcefully overthrown.
The trend of adjustments in global temperatures and sea-level looks to me like trend fulfillment as a result of insisting on finding "truth" in a mish-mash of partial truths, none of which represent multiple insights around the same, unchanging aspect of our universe.

u.k.(us)
August 17, 2013 6:07 pm

Nick Stokes says:
August 17, 2013 at 5:13 pm
An irony here is that skeptics have been nagging climate scientists to get the help of statisticians. So when a group of statisticians (at BEST) do get involved, what we hear here is "Climate science gets it wrong again".
================
Ironing, not to mention nagging, has what to do with statistics ?

Wayne Delbeke
August 17, 2013 6:11 pm

The whole "averaging" process is problematic in itself and the anomalies are or could be meaningless, even over many, many measurements. The average of -10 and +10 and the average of -5 and +5 are the same, but the "climate" may be very different. If the average in one location changes as noted, there is no anomaly, but the climate changed. It also says nothing about the "mean" temperature, since if the temperature were to stay at 20 degrees all day and, due to a wind shift, drop to 10 for an hour (as happens in northern Canada coastal stations), then you get an "average" from the high/low of 15 but a weighted average of 19.8. So how useful are these anomalies in telling us about climate? I have always wondered about this so I am taking this opportunity for this group to educate me. Thanks.
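A quick check of that 24-hour example, modeling the wind-shift dip as a full one-hour step (a briefer dip would give a weighted mean closer to the 19.8 quoted above):

```python
import numpy as np

hourly = np.full(24, 20.0)   # 20 C all day ...
hourly[5] = 10.0             # ... except a one-hour dip to 10 C

print("(Tmax + Tmin) / 2 :", (hourly.max() + hourly.min()) / 2)   # 15.0
print("time-weighted mean:", round(hourly.mean(), 2))             # ~19.6
```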

David Riser
August 17, 2013 6:17 pm

Nick,
Neither Muller nor Rohde is a statistician, and based on reading their methodology I am certain that they designed BEST with the intention to prove CAGW. The methodology they are employing is sketchy at best. Please read http://www.scitechnol.com/GIGS/GIGS-1-103.php
and you will see what I mean.
v/r,
David Riser

Scott
August 17, 2013 6:18 pm

Background:
I am involved in supply chain planning, where one of the main inputs is a forecast based on historical usage. The forecast error is then used to help calculate the inventory safety stock requirement. The forecast error is the difference between the forecast and the actual result, i.e. the anomaly.
Findings from supply chain management on calculating the forecast:
Using a moving average is better than nothing, however it is usually one of the worst of available statistical techniques for estimating a forecast except for some very stable systems. The drive to reduce the forecast error is one of the main aims of the supply chain manager to reduce the amount of safety stock needed in a system.
My points for discussion:
Given the weather/climate is far from a stable system, why are we using a moving average to calculate anomalies from? Surely when you take the impacts of natural forcings on temperature over a long period of time, a moving average is not a good measure to be using to calculate an anomaly, even for the same month over a long period of time?
Therefore, given the above, shouldn't the error in the calculations be much larger given the simplistic measure used as the datum, particularly the further back in time you go?
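To illustrate the supply-chain point with a toy series (invented numbers, not anyone's actual forecast): a trailing moving average lags a drifting series, so its forecast error is systematically larger than it is for a stable one.

```python
import numpy as np

rng = np.random.default_rng(1)

def ma_forecast_mae(series, window=12):
    """Forecast each point with the trailing moving average of the previous
    `window` points; return the mean absolute forecast error (the 'anomaly')."""
    errors = [abs(series[t] - series[t - window:t].mean())
              for t in range(window, len(series))]
    return float(np.mean(errors))

t = np.arange(240)
stable   = 10 + rng.normal(0, 0.5, t.size)               # no trend
drifting = 10 + 0.1 * t + rng.normal(0, 0.5, t.size)     # slow upward drift

print("stable series   MAE: %.2f" % ma_forecast_mae(stable))    # ~0.4
print("drifting series MAE: %.2f" % ma_forecast_mae(drifting))  # ~0.7, larger
```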

August 17, 2013 6:21 pm

"AlexS says:
August 17, 2013 at 5:12 pm
"In the end if you want to use absolute temperature willis then use them."
They are not absolute temperatures because you don't have a way to measure them."
The raw data represents itself as temperatures recorded in C
Using that data we estimate the field in C
If you want to call this something different than temperatures then humpty dumpty has a place on the wall next to him.

August 17, 2013 6:23 pm

"I also point out, again, that the published uncertainties in global averaged air temperature never include the uncertainty due to systematic measurement error."
No pat those uncertainties do include the uncertainty due to all error sources including systematics. look at the nugget

August 17, 2013 6:28 pm

Steve, I’ve read the papers and assessed the method. Systematic measurement error is nowhere to be found.

geran
August 17, 2013 6:33 pm

Steven, could you please explain this to us underlings, lest we think you are "drunk blogging"?
"No pat those uncertainties do include the uncertainty due to all error sources including systematics. look at the nugget"
(Or, if I need to translate–Steefen oils you plea espalne to us usndresk , lest we think ysoru are dared belongings.)
Thanks

August 17, 2013 6:34 pm

"An irony here is that skeptics have been nagging climate scientists to get the help of statisticians. So when a group of statisticians (at BEST) do get involved, what we hear here is 'Climate science gets it wrong again'."
Not only that but
1. We used a suggestion made a long time ago by willis: to scalpel
2. We used kriging as has been suggested many times on Climate audit
3. Our chief statistician is a friend of RomanM who worked with JeffID and he consulted
Roman and jeffs work. in fact we use a very similar approach in estimating the
entire field at once as opposed to having baseline periods
4. We tested the method using synthetic data as suggested many times on jeffids and climate audit and showed that the method was more accurate than GISS and CRU, as theory holds it should be.. and yes the uncertainty estimates held up.
And yet here again is pat frank repeating the same arguments he lost at lucia’s and jeffids
Its not ironic. its typical

Theo Goodwin
August 17, 2013 6:35 pm

Scott says:
August 17, 2013 at 6:18 pm
Good to hear from a pro. I want to add emphasis to your post. In your work, you make decisions about a system that is very well understood and that gives you feedback on a regular basis. By contrast, the BEST people or anyone working on the same data have little understanding of what their data points represent and receive no feedback at all.

August 17, 2013 6:47 pm

Steve: "And yet here again is pat frank repeating the same arguments he lost at lucia's and jeffids. ... Its not ironic. its typical"
I carried that argument, Steve. If you don’t understand that after all that was written, then your view at best reflects incompetence.

geran
August 17, 2013 6:49 pm

Steven Mosher says:
August 17, 2013 at 6:34 pm
Not only that but
1. We used a suggestion made a long time ago by willis: to scalpel
2. We used kriging as has been suggested many times on Climate audit
3. Our chief statistician is a friend of RomanM who worked with JeffID and he consulted
Roman and jeffs work. in fact we use a very similar approach in estimating the
entire field at once as opposed to having baseline periods
4. We tested the method using synthetic data as suggested many times on jeffids and climate audit and showed that the method was more accurate than GISS and CRU, as theory holds it should be.. and yes the uncertainty estimates held up.
>>>>>
Not only that but—-you still didn’t get it right!
(Hint–Nah, it would not be accepted….)

u.k.(us)
August 17, 2013 6:53 pm

Steven Mosher says:
August 17, 2013 at 6:34 pm
"Its not ironic. its typical"
================
Care to enlighten us ?
We’re all ears, that is why we are here.

August 17, 2013 6:56 pm

geran.
when i pestered gavin for hansens code one thing he said stuck with me.
“you never be satisfied steve. you’ll just keep asking stats questions you could
research yourself, you’ll ask questions about the code, and you’ll never publish
anything”
so I told him. No. give me the code and the data. I know how to read. Ill never bug
you again. Ill try to improve your work to come up with a better answer, because better
anwers matter. Im not out to waste your time and Im not begging for an free education.
just free the data and code.
That said. I dont expect everyone to share my willingness to actually do the work. So I will give you a few pointers.
http://en.wikipedia.org/wiki/Variogram
http://www.scitechnol.com/GIGS/GIGS-1-103a.pdf
see the discussion about the unexplained variance when the correlation length goes to zero.

kuhnkat
August 17, 2013 6:58 pm

“If you want to call this something different than temperatures then humpty dumpty has a place on the wall next to him.”
Nice to know where to find you Moshpup!! 8>)

August 17, 2013 7:01 pm

I carried that argument, Steve. If you don't understand that after all that was written, then your view at best reflects incompetence.
huh? Pat you lost. repeatedly.

Robert of Ottawa
August 17, 2013 7:01 pm

Your reasoning is correct. Total error cannot be smaller than measurement error.

August 17, 2013 7:05 pm

For those who’d like to evaluate Steve Mosher’s view of the debate, my first post at Jeffid’s on systematic measurement uncertainty in the global air temperature record is here, and the continuation is here.
The exchanges stand on their own, and I don’t intend to reignite the debate here. However, Steve has made a personal attack against me; my complete defense is the record.

BarryW
August 17, 2013 7:08 pm

In essence, an anomaly is just creating an offset from some (somewhat) arbitrary base line. No different than setting an offset on an oscilloscope. The question I have is related to the multiple baselines used. The data from each month’s average is converted into an anomaly relative to that months baseline. Each month’s baseline will have a different associated error, hence won’t there be be a potential misalignment between different months? Even worse, if the range used for a particular month is biased relative to the entire dataset for that month (say a cold series of winters), wouldn’t it induce a bias in the anomaly for that month relative to the other months?

August 17, 2013 7:09 pm

Steve: "huh? Pat you lost. repeatedly."
If you really believe that, Steve, then you didn’t understand the substantive content then and still don’t understand it today.

Jeff Cagle
August 17, 2013 7:10 pm

Willis,
I join you in wondering about uncertainties. However, I don’t think the uncertainty in the baseline monthly average is a source of error here.
Here’s why: Since we are looking for trends — mostly secular trends using linear regression — we are only concerned with the errors that affect the confidence interval of the slope.
But the baseline monthly average has an effect only on the confidence interval of the intercept of our anomaly graph. So that might affect some sensationalist headlines ("Temperatures are up 2.0 deg C since 1850", when the true value might be 1.8 deg C), but it will not affect most of them, for the simple reason that most of the headlines focus on the purported effects of forcing, expressed in terms of a slope: "We expect a 0.2 deg C rise per decade for the next two decades." Those remain unchanged by an epsilon in the monthly baseline.
In other words, we can safely accept the baseline monthlies as givens, and look for trends from there.
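That reasoning is easy to check numerically: for a single month's anomaly series, an error in the baseline is one constant offset, and shifting a series by a constant changes the intercept of a least-squares fit but not its slope. A toy example:

```python
import numpy as np

rng = np.random.default_rng(2)
years = np.arange(1950, 2011)
anoms = 0.015 * (years - 1950) + rng.normal(0, 0.1, years.size)   # toy anomalies

slope, intercept = np.polyfit(years, anoms, 1)
slope2, intercept2 = np.polyfit(years, anoms + 0.3, 1)   # add a 0.3 C baseline error

print("slope unchanged: ", np.isclose(slope, slope2))        # True
print("intercept shifts:", round(intercept2 - intercept, 3)) # 0.3
```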

August 17, 2013 7:11 pm

"Given the weather/climate is far from a stable system, why are we using a moving average to calculate anomalies from? Surely when you take the impacts of natural forcings on temperature over a long period of time, a moving average is not a good measure to be using to calculate an anomaly, even for the same month over a long period of time?"
we dont use moving averages to calculate anomalies from.
1. Take every temperature recording at x,y,z and time t
2. remove the climate ( geographically based) at every x,y,z,t
you are left with a random residual called the weather
T = C + W
the temperature at any given time/location is a function of the climate ( C) and the weather W
C is expressed as a function of geography only ( and time )
The residual ( W) is then interpolated using kriging.
So for every time and place you have two fields. a climate field and a weather field
Anomalies are not even a part of our approach. In the end we have a field that is in temperature. there is no base period. at every time step we have used all the data for that time step to solve the field at that time step. no base period. no anomalies.
After its all said and done we integrate the field. other folks take averages of stations. we dont.
we construct a field and then integrate the field. That gives you a time series in temperature ( C)
Then, because other folks want to compare their anomalies to our temperatures we provide anomaly data. And we give you the climatology ( unlike hansen or spenser ) so that you can go back to the field in C
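A much-simplified toy of that T = C + W idea, emphatically not the Berkeley Earth code: here the climate term is a crude least-squares fit on latitude and elevation, and the kriging of the residuals is replaced by a simple Gaussian-kernel weighting, just to show the decomposition.

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented "station" data: latitude (deg), elevation (km), observed temperature (C).
n = 200
lat  = rng.uniform(-60, 70, n)
elev = rng.uniform(0, 3, n)
temp = 28 - 0.45 * np.abs(lat) - 6.5 * elev + rng.normal(0, 1.5, n)

# 1) Climate term C: a smooth function of geography (here just a linear fit).
X = np.column_stack([np.ones(n), np.abs(lat), elev])
coef, *_ = np.linalg.lstsq(X, temp, rcond=None)

# 2) Weather term W: the residual, interpolated by distance-weighted averaging
#    (a crude stand-in for kriging).
weather = temp - X @ coef

def weather_at(lat0, length=10.0):
    w = np.exp(-((lat - lat0) ** 2) / (2 * length ** 2))
    return float(np.sum(w * weather) / np.sum(w))

# Reconstructed field T = C + W at an arbitrary location (45N, 0.5 km elevation):
c0 = coef @ np.array([1.0, 45.0, 0.5])
print("T(45N, 0.5 km) ~ %.1f C" % (c0 + weather_at(45.0)))
```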

August 17, 2013 7:14 pm

BarryW, global temperature anomalies are typically referenced against a 30-year mean. GISS uses 1951-1980, CRU has typically used 1961-1990, e.g.

geran
August 17, 2013 7:16 pm

Steven Mosher says:
August 17, 2013 at 6:56 pm
Steve, thanks for the direct response. I sincerely respect what you have done for climate science. I only criticize you because I don’t want you to fail your calling. You see thru most of the false science, and you are repelled. You are to be praised for that.
Don’t spend your time putting down us “extremists” (aka “the “D” word). Just do your task to bring out the TRUTH.

Geoff Withnell
August 17, 2013 7:20 pm

From the NIST Engineering Statistics Handbook “Accuracy is a qualitative term referring to whether there is agreement between a measurement made on an object and its true (target or reference) value.” Since we do not have a reference value, and the true value is what we are trying to determine, accuracy statements are essentially meaningless in this case. Individual measurements can have uncertainty associated with them from the precision (repeatability/reproducibility) and the uncertainty of aggregate quantities such as averages calculated. But since the “true” or “reference” value is essentially unknowable, accuracy as such is not a useful term.

David Riser
August 17, 2013 7:24 pm

Steve,
The issue I have with your methods directly relates to the work that Anthony did on siting issues with temperature stations. Your methods follow NOAA and NASA bias (excerpt from your methodology: "In most practical uses of Kriging it is necessary to estimate or approximate the covariance matrix in equation (9) based on the available data [1,2]. NOAA also requires the covariance matrix for their optimal interpolation method. We will adopt an approach to estimating the covariance matrix that preserves the natural spatial considerations provided by Kriging, but also shares characteristics with the local averaging approach adopted by NASA GISS [3,4]. If the variance of the underlying field changes slowly as a function of location, then the covariance function can be replaced with the correlation function, R(a, b), which leads to the formulation that:")
In NASA's data they have made an assumption that low temps are outliers, which from reading the appendix BEST did as well. Because of the extra weighting and the scalpel effects, you essentially increased the apparent UHI effect, increasing systematic error instead of reducing it. I am still not sold on the idea that statistics can magically remove error. If you went back through the data and did a recalc using the highest temps as outliers, based on the idea of UHI being a serious issue, then perhaps your graph would turn out differently, but I am still not sure that it would be an accurate representation of θ(t).
I find it interesting that the trend for BEST compared to the trend for the sat data over land is much steeper in slope. I am pretty sure that the SAT trend has fewer errors in it than any land based multi station system.
http://www.woodfortrees.org/plot/best-lower/from:1980/to:2010/plot/best-upper/from:1980/to:2010/plot/uah-land/from:1980/to:2010/plot/rss-land/from:1980/to:2010/plot/best/from:1980/to:2010/trend/plot/rss-land/from:1980/to:2010/trend/plot/uah-land/from:1980/to:2010/trend

dp
August 17, 2013 7:26 pm

Is it not the case that the time series used to calculate the average typically is older by a good margin than the data set under analysis? This is why we see modern anomalies compared to 1979-2000 data, for example.

u.k.(us)
August 17, 2013 7:33 pm

Steven Mosher says:
August 17, 2013 at 7:11 pm
“Then, because other folks want to compare their anomalies to our temperatures we provide anomaly data. And we give you the climatology ( unlike hansen or spenser ) so that you can go back to the field in C”
===============
Then other folks use it as an excuse to further their agenda, misguided as it might be.

August 17, 2013 7:37 pm

Steve: “T = C + W … the temperature at any given time/location is a function of the climate ( C) and the weather W
That’s the same methodological mistake made at CRU, explicitly revealed by Phil Brohan, & co, 2006.
Measured temperature is what's being recorded. "Measured" is the critical adverb here.
In your formalism, Steve, actual measured temperature = T_m_i = C + W_i + e_i, where "i" is the given measurement, and e_i is the error in that measurement. Most of that error is the systematic measurement error of the temperature sensor itself. CRU completely ignores that error, and so does everyone else who has published a global average.
My own suspicion is that people ignore systematic measurement error because it's large and it cannot be recovered from the surface air record for most of 1880-2013. Centennial systematic error could be estimated by rebuilding early temperature sensors and calibrating them against a high-accuracy standard, such as a two-point calibrated RM Young aspirated PRT probe (accurate to ~(+/-)0.1 C). To get a good estimate of error and accuracy one would need a number of each such sensor scattered globally about various climates, with data collection over at least 5 years. But that's real work, isn't it. And some important people may not like the answer.
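The general point about systematic error is easy to show with a toy simulation (all numbers invented): averaging many stations shrinks the random part of the error like 1/sqrt(N), but a bias shared by the sensors passes straight through to the average.

```python
import numpy as np

rng = np.random.default_rng(4)
n_stations = 1000
true_temp  = 15.0

random_err  = rng.normal(0.0, 0.5, n_stations)   # independent station noise
shared_bias = 0.3                                # a common systematic offset

measured = true_temp + random_err + shared_bias

print("error of the mean: %.3f" % (measured.mean() - true_temp))  # ~0.3, the bias survives
print("random part alone: %.3f" % random_err.mean())              # ~0.0 +/- 0.016
```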

Admin
August 17, 2013 7:47 pm

My understanding is you can’t treat a temperature series as independent, because temperatures are “sticky” – a cold year is much more likely to follow a previous cold year. So each point on say your monthly series is not truly independent from measurements of the same month in adjacent years. I don’t know what this does to the error calculation.

TalentKeyHole Mole
August 17, 2013 8:34 pm

Hello,
A very good discussion. Lots to learn and digest.
Yes. I asked a question earlier about the probability of 'a' particular temperature 'occurring' at 'a' particular thermometer, whether on - not actually 'on' but usually about 1 meter to 2 meters above the ground - or in - within a meter below the ocean surface, or due to storminess more nearly 5 meters below the ocean surface because of ocean wave dynamics on a sea-going mercury thermometer.
I do hope that in the future, yes I really do, those who are 'reading' the mercury thermometer - such as "on" land - do employ an accurate chronometer in order to know the 'real' time, and longitude a very important number, of the "reading" of the mercury thermometer, and do awake in the early morning hours without effects of lack of sleep and 'associated' effects of alcohol over-consumption - sometime long ago referred to as "consumption", as the papers of the day would read "death by consumption", said papers having been written in the US 'Prohibition Era.'
Cheers

August 17, 2013 8:50 pm

Pat Frank said @ August 17, 2013 at 7:37 pm

My own suspicion is that people ignore systematic measurement error because it's large and it cannot be recovered from the surface air record for most of 1880-2013. Centennial systematic error could be estimated by rebuilding early temperature sensors and calibrating them against a high-accuracy standard, such as a two-point calibrated RM Young aspirated PRT probe (accurate to ~(+/-)0.1 C). To get a good estimate of error and accuracy one would need a number of each such sensor scattered globally about various climates, with data collection over at least 5 years. But that's real work, isn't it. And some important people may not like the answer.

Bingo!

Third Party
August 17, 2013 8:57 pm

Whatta bout the handling of the 1/4 day/year shift of the months vs the solar input. There should be some 4-year period signal in the data that needs to be handled correctly.

John Andrews
August 17, 2013 9:12 pm

What no one has discussed is the error on climate projections.

August 17, 2013 9:26 pm

John Andrews, I have a manuscript about that submitted to a professional journal. Suffice to say now that a straight-forward analysis shows the projection uncertainty increases far faster than the projected global surface air temperature. Climate models have no predictive value.

August 17, 2013 9:28 pm

Dear Willis, I am afraid you are confused. What’s calculated at the end is the average temperature and the error of the *average* of course *is* smaller than the error of the individual averaged quantities simply because the average also contains 1/N, the division by the number of entries.
So while the error of the sum grows like sqrt(N) if you correctly add (let us assume) comparable errors in quadrature, the (statistical) error of the average goes down like 1/sqrt(N).
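Luboš's 1/sqrt(N) statement is the textbook result for purely random, independent errors, and it simulates in a few lines; the dispute in this thread is over whether the real errors satisfy those conditions.

```python
import numpy as np

rng = np.random.default_rng(5)
sigma, trials = 1.0, 2000

for n in (10, 100, 1000):
    # Standard deviation of the mean of n independent errors, estimated from
    # repeated trials, compared with the theoretical sigma / sqrt(n):
    means = rng.normal(0, sigma, size=(trials, n)).mean(axis=1)
    print(n, round(means.std(), 3), round(sigma / np.sqrt(n), 3))
```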

August 17, 2013 9:34 pm

Willis, speaking to you as a Ph.D.-level econometrician, BEST's mission was hopeless and their elegant methods a waste of time. Therefore your mission to deconstruct any error therein is very difficult in the Japanese sense. Don't fall for faux sceptic Muller's hype.
Recall AW has shown many land records are contaminated by paint, siting, and worse, by an amount greater than two centuries' worth of anomalies. Recall that sea records were less than sketchy prior to the satellite era (buckets, engine inlets, trade routes, ...), yet oceans comprise about 71% of Earth's surface and a heat sink much greater than land. All the homogenizations in the world of bad data can only result in bad pseudodata. As shown many times by folks like Goddard, who has documented upward GISS homogenization biases over time.
Any BEST re-interpretation of bad/ biased data can only produce bad/ biased results, no matter how valid the fancy methods used. GIGO applies to Berkeley, to Muller, and to data dreck.
Good Global temp data came only with the sat era since 1979, UAH or RSS interpreted.
Trying to interpret others interpretations of bad data is rather like interpreting the interpreter of a Delphic oracle. The true meaning of a steaming pile of entrails is… (self snip).

Crispin in Waterloo
August 17, 2013 9:47 pm

Hmmm… Where to enter the mix.
As Geoff says, precision is not accuracy. The target shooting example was very useful. Let’s use two examples. First fix the rifle on a permanent stand. Shoot 6 times at the target. The grouping of the hits is a measure of the precision of the rifle.
Now have a person fire the rifle at a fixed target. The grouping of the hits is a combination of the precision of the rifle and the precision of the shooter.
The nearness of the centre of the group of hits to the bullseye is the accuracy of the shooter.
A thermometer may be inaccurate (or not) at different temperatures. It may be an imprecise instrument with poor repeatability, or not. It may be read incorrectly.
These three problems persist in all temperature records.
Because the thermometers are not located in the same place there is no way to increase any particular reading's precision or accuracy. Because all readings are independent, grouping them can't yield any answer that is more precise. It would be like claiming the accuracy of 1000 shooters is greater if we average the positions of the centers of all the groups of hits, and that the calculated result is more precise than the rifle. All systematic and experimental errors accumulate and are carried forward.

AlexS
August 17, 2013 9:51 pm

“The raw data represents itself as temperatures recorded in C
Using that data we estimate the field in C
If you want to call this something different than temperatures then humpty dumpty has a place on the wall next to him.”
The raw data is the problem: you don't have enough, in quantity, location, or time (history), for the small differences.

u.k.(us)
August 17, 2013 9:55 pm

Crispin in Waterloo says:
August 17, 2013 at 9:47 pm
Hmmmā€¦ Where to enter the mix.
==============
You just opened a Pandora's box of bench shooters :-)

Crispin in Waterloo
August 17, 2013 10:03 pm

Lubos, is this true only in cases where the measurements are of the same thing?
"So while the error of the sum grows like sqrt(N) if you correctly add (let us assume) comparable errors in quadrature, the (statistical) error of the average goes down like 1/sqrt(N)."
I think there is confusion about the nature of the raw data. I think the above applies when each recording station has made measurements at all stations and the data sets are averaged. I don’t think it applies to a set of individual data sets, one from each locality. No calculated result made from imprecise data, where each has measured a different thing, can have a greater precision than the raw data.

John Norris
August 17, 2013 10:08 pm

If I may try to break it down a little differently, there are three high-level sources of error you are discussing: 1) temperature sensor error, in which I think you can include siting errors like UHI, 2) human reading/writing/documenting error, and then 3) climatological error, where a specific location may have just had a particularly warm or cold month or year. I don't see #3 as real error; that is just actual variation in the data. If you magically fixed #1 and #2, #3 would be perfectly measured and reported data, despite it not being average. That's not error, that's good data. Why would perfect data that is not average cause any uncertainty?
Sounds like fair game to me to average out #3. However #1 and #2 you can't average out. That is error, and that would be cheating. The difficulty of course is establishing how much of that result is #1 and #2, and how much is #3.

August 17, 2013 10:09 pm

Luboš, descent by 1/sqrt(N) is true only when the error is random.

August 17, 2013 10:21 pm

John Norris, shield irradiance, ground albedo (including winter snow), and wind speed all impact the measurement accuracy of otherwise well-functioning air temperature sensors. Calibration experiments under ideal site conditions show this. Even a PRT sensor inside a CRS (Stevenson) screen shows ~(+/-)0.5 C average measurement error about an average bias, and that inaccuracy is not randomly distributed.

Rik
August 17, 2013 10:47 pm

Willis and all, surely the main problem is not the error of the average. This must be small due to the large sampling. The main problem is that the statistics only catch random errors, and we have no reason whatsoever to think that they are random. Many things have changed that are not random but systematic: sea routes, the built and paved environment, the increase and then decrease of land thermometers, cultivation... and on and on. These are not random errors! McKitrick showed a correlation to industrialization, Pielke showed a correlation to land use. Correlations that shouldn't be there if errors were random!

markx
August 17, 2013 11:02 pm

Doug Proctor says: August 17, 2013 at 5:55 pm
"... in order to use a quadrature to reduce error, I believe you have to be taking multiple readings of the same item using the same equipment in the same way. The Argo floats move, and there are 3500 of them. Each day they take readings at the same depth but of different water and different temperatures (even if off by a bit), and each one is mechanically different - same as all the land stations ..."
Nail hit right on head right here. These are NOT repeated measures of the same thing.

Sleepalot
August 17, 2013 11:16 pm

Mosher, I’m still waiting for your list of 39,000 stations for 1880 – or any particular year of your choosing for that matter.

Gail Combs
August 17, 2013 11:36 pm

Pat Frank says: @ August 17, 2013 at 7:37 pm
Measured temperature is what's being recorded. "Measured" is the critical adverb here.
In your formalism, Steve, actual measured temperature = T_m_i = C + W_i + e_i, where "i" is the given measurement, and e_i is the error in that measurement. Most of that error is the systematic measurement error of the temperature sensor itself. CRU completely ignores that error, and so does everyone else who has published a global average.
My own suspicion is that people ignore systematic measurement error because it's large and it cannot be recovered from the surface air record for most of 1880-2013. Centennial systematic error could be estimated by rebuilding early temperature sensors and calibrating them against a high-accuracy standard....
>>>>>>>>>>>>>>>>>>>>>>>>>
Those are my thoughts exactly.
Also the statistics used are for repeated samples of the same thing, whereas the actual sample size is ONE. Temperature is one measurement at one location on earth during one point in time. It is not repeated measurements of the same location simultaneously with matched instruments. Heck, it is often not even the same actual location or, in the case of the 'Official global temperature', the same number of thermometers.
The 'Station drop out' problem
Thermometer Zombie Walk
From what I can see the Julian date in successive years is being treated as if it were repeat measurements of the same place with the same equipment at the same time of day (they don’t get that repeated either.) The assumption being that July 4th in podunk midworst should have the exact same temperature in 2013 as it did in 1931. However from the work that Anthony has done we know it does not. Aside from the random variables; clouds rain/fog/snow, cold fronts/warm fronts, droughts or whatever there are the systematic changes.
The cow pasture became a plowed field and then becomes a village which grows into a city. Someone like Asa Sheldon comes along with his oxen and men, removes nearby Pemberton Hill and dumps it into the Back Bay salt marsh, changing the microclimate. The thermometer is broken and replaced. The observer retires and is replaced. The Stevenson screen is freshly whitewashed and then weathers; it is replaced by a latex- or oil-paint-painted screen or a vinyl screen. Trees grow up near the weather station blocking the wind and are then cut down. The gravel parking lot is paved and air conditioning is added to the nearby building...
ALL of these are systematic and do not produce random errors. I know from watching my local rural temperature (now a local airport) that it is 'Adjusted' up by as much as 3F the next day after 'homogenization' to give the 'Official Reading', so there is that sort of 'error' too.
When I think of these sources of error and then look at the 'Precision' of the reported numbers fed to us as 'Accurate' estimates of global temperature, all I can do is laugh. You might, if you are real lucky, get an error of +/-1°C.
I am not a statistician though I have had some training in statistics. I have spent decades trying to reduce error in laboratory measurements of production batches or continuous processes in a QC lab setting.

Paul Vaughan
August 17, 2013 11:50 pm

As surely as you need to draw another breath to survive, statisticians will always make the assumptions upon which their enterprise is theoretically founded.
The data are useful for exploratory purposes, but the whole enterprise of climate stat inference rests upon contextually fundamentally indefensible assumptions, notably random iid. The confidence intervals cannot sensibly be interpreted as meaning what pure theory would suggest they should mean.
Our understanding of climate is at best in the exploratory stage. It is only by understanding natural climate at a profoundly deeper level that bad assumptions can be weeded out of fundamentally corrupted attempts at climate stat inference.

Gail Combs
August 18, 2013 12:03 am

Here is another quicky analogy. If I am looking at widgets from a molding machine and measure 200 samples on the same day from the same cavity, I will have a standard deviation of A. If the machine has 5 cavities and I take a random sample for that machine, the standard deviation B will be greater than A. If there are 5 machines in the factory (all using the same raw material), a random sample from the factory will have a standard deviation C that is greater than B. If you then include all the machines in the company, in 10 factories scattered across the country, you get a standard deviation D which in general is much greater than C, which is why customers will often designate product from one factory only as ‘Approved’.
Remember this is with everyone in the company doing their darnedest to try and stay within tolerance. In some cases all the product from one cavity will be rejected as out of tolerance.
As was said about Scott’s (August 17, 2013 at 6:18 pm) example, in industry we get feedback; with the temperature data you get potluck.
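For what it is worth, a minimal sketch (Python, with invented variance components) of the nested-sampling point above: the more levels you pool across, the larger the spread you see, even when every individual cavity is in control.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical variance components (widget dimension, arbitrary units):
within_sd  = 0.10   # scatter within one cavity
cavity_sd  = 0.05   # cavity-to-cavity offsets
machine_sd = 0.08   # machine-to-machine offsets
factory_sd = 0.12   # factory-to-factory offsets

# Simulate 10 factories x 5 machines x 5 cavities x 200 widgets each
f = rng.normal(0, factory_sd, (10, 1, 1, 1))
m = rng.normal(0, machine_sd, (10, 5, 1, 1))
c = rng.normal(0, cavity_sd,  (10, 5, 5, 1))
w = rng.normal(0, within_sd,  (10, 5, 5, 200))
widgets = f + m + c + w

print("A: one cavity        ", round(widgets[0, 0, 0, :].std(), 3))
print("B: one machine       ", round(widgets[0, 0, :, :].std(), 3))
print("C: one factory       ", round(widgets[0, :, :, :].std(), 3))
print("D: the whole company ", round(widgets.std(), 3))
# Expected: A < B < C < D, even though every cavity is individually in control.
```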

AndyG55
August 18, 2013 1:10 am

I’m with Pat on this one.
The error reduction of 1/sqrt(N) ONLY applies when you are measuring similar objects that are meant to be UNCHANGING.
The systematic error in a changing system INCREASES in the long term by about sqrt(2).
Now, as the systematic error (or absolute error) of a thermometer graduated in 1°C is ±0.5°C, the systematic error of the average of a changing quantity tends towards 0.7°C.

Stephen Rasey
August 18, 2013 1:13 am

Willis, I’m with you. But I think the problem is even more basic. This is a link to a lengthy comment from Aug. 12, 2012 that identifies three sources of error in the calculation of the anomalies:

The process has been to first convert the Tmin and Tmax into a daily Tave, having no uncertainty associated with it. Everything from then on uses the Tave, and the only uncertainty that is used past that point is what derives from the difference of Tave – Tbase to get an anomaly.
But if we go back to what we actually measured, the Tmin and Tmax, the uncertainty in the system really is large, and it must be factored into estimates of uncertainty in the slope of the calculated temperature trends.

We do not measure Tave. Tave is a calculation of two components we do measure: Tmin and Tmax. If Tmin and Tmax are separated by just 10 deg C, then the mean std error of a month’s Tave, 30 readings of Tmin and Tmax, must be at least (10/2)/sqrt(30), or about 0.9 deg C. This is the error of the monthly anomaly against a known base.
But the base, too, has uncertainty, not from 30 years of 30 Tave values for each month, but from 30*30 Tmins and Tmaxes. The 30-year average base temperature for each month should have a mean std error of at least (10/2)/sqrt(30*30), or 5/30, or 0.16 deg C.
To get the anomaly, you subtract Tave – Tbase, but Tave has an uncertainty of 0.9 and Tbase has an uncertainty of 0.16. Uncertainties add via root sum of squares, so the uncertainty of the anomaly is 0.914. So the uncertainty of Tbase can be neglected, because the uncertainty in each month’s Tave dominates the result.
BEST compounds this error by slicing long temperature records into shorter ones, ignoring the absolute temperature readings and working only with short slopes. I have repeatedly objected to this from a Fourier Analysis / Information Theory approach, with its loss of low-frequency content. The scalpel may be removing the essential recalibration and keeping the instrument drift as real data.
But let’s just look at the implication of BEST slicing records from an uncertainty analysis point of view. The uncertainty of the slope of a linear regression is (I think) proportional to 1/(square of the time length of the series). Cut a 400-month time series into two 200-month series, and the uncertainty in the 200-month slope will be 4 times the uncertainty of the 400-month series. Sure, you now have two records, and optimistically you might reduce uncertainty by sqrt(2). Slicing the record was a losing game, for the uncertainty in slope is at best increased by 4/sqrt(2) = 2*sqrt(2), or almost a factor of 3.

Stephen Rasey
August 18, 2013 1:22 am

Correction:
If Tmin and Tmax are separated by just 10 deg C, then the mean std error of a month’s Tave, 30 readings of Tmin and Tmax (each), must be at least (10/2)/sqrt(60), or about 0.65 deg C.
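A quick check of the corrected arithmetic (a sketch only; the 10 deg C Tmax–Tmin spread, the 60-readings-per-month counting, and the 30-year base are the assumptions stated in the two comments above):

```python
from math import sqrt

spread = 10.0              # assumed Tmax - Tmin separation, deg C
sd = spread / 2.0          # rough per-reading standard deviation

se_month = sd / sqrt(60)          # 30 Tmax + 30 Tmin readings in one month
se_base  = sd / sqrt(60 * 30)     # a 30-year base built from such months
se_anom  = sqrt(se_month**2 + se_base**2)   # combined in quadrature

print(round(se_month, 2), round(se_base, 2), round(se_anom, 2))
# -> 0.65, 0.12, 0.66: the base error is swamped by the single-month error.
```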

August 18, 2013 1:49 am

Take it for what it is worth, Willis, but I think you are right.
You say that the final error estimate for a given month’s anomaly cannot be smaller than the error in the climatology for that month.
This error seems to be the computed deviation from normal distribution from the monthly datasets.
There are of course other contributing factors like the daily reading error and the monthly averaging of the daily data, but they will only contribute to make the final error larger, not smaller, so I think you are correct.
I used statistics actively as a research scientist back in the 1990s so I know what I am talking about.
So if you have calculated the numbers right (I have not checked that), I think you are right.

Man Bearpig
August 18, 2013 1:50 am

There is something that has been bothering me about ‘error’ in statistics, in particular when standard deviations are used on ‘average’ calculations. I don’t mean the average of a dataset, I mean an average of the averages of a number of datasets.
So you have a group of averages that are then averaged out again, for example the Arctic Sea Ice Extent – on one of the charts you see the ±2 standard deviations. But what is the SD calculated from? Is it the total individual data points, or is it the average of the data points? Or daily average temperatures (h+l)/2?
If I give an analogy of why this is not right it may help explain my question.
If each year the height of each schoolchild on their 3rd birthday is taken at each school in a county and the average is calculated (which would be around 37 inches), this would include those that were say 33 inches to 42 inches, which would (probably) be within the normal distribution. However, when the averages are sent to a central state center and the standard deviation is calculated on the averages, then the distribution will be much, MUCH smaller, and those at the extremes of the distribution would be considered abnormally short or tall. This is probably not how height statistics are calculated, but for Arctic sea ice and temperature I cannot see how they would account for the fact that (daily high + daily low)/2 is not a true average. Also, do they physically go out and measure every single ice floe in the Arctic?
I think what would be better for temperatures would be to take the daily highs and lows separately, then do the statistics on those; but to collate them all together and come up with an average and SD would not be truly representative of the statistics.
If I am wrong in these assumptions, then I have learnt something, but if I am right I hope someone else learns something.
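A minimal sketch of the height analogy above (Python, invented numbers), showing why a standard deviation computed from averages is so much tighter than one computed from the raw data:

```python
import numpy as np

rng = np.random.default_rng(1)

# 200 schools, 100 three-year-olds each; individual heights ~ N(37, 1.5) inches
heights = rng.normal(37.0, 1.5, size=(200, 100))

sd_individuals = heights.ravel().std()        # ~1.5 inches
sd_school_avgs = heights.mean(axis=1).std()   # ~1.5/sqrt(100) = ~0.15 inches

print(round(sd_individuals, 2), round(sd_school_avgs, 2))
# A +/-2 SD band computed from the school averages is roughly ten times
# tighter than one computed from the individual children: the two bands
# answer different questions.
```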

Bert Walker
August 18, 2013 2:02 am

In 2011 W. Briggs did a highly critical review of the “Berkeley Earth temperature averaging process” at: http://wmbriggs.com/blog/?p=4530
Some of Briggs’ points on using smoothed data as input: http://wmbriggs.com/blog/?p=735
and http://wmbriggs.com/blog/?p=195

Gail Combs
August 18, 2013 2:23 am

SIGHhhh….
This seems to be a discussion where the same word is used to mean two entirely different things. (So what else is new in the Post-Normal World?)
To those of us in industry the word ‘Error’ has a very specific meaning. It is the deviation from the ‘correct’ or ‘true’ value.

Definition:
A statistical error is the (unknown) difference between the retained value and the true value.
Context:
It is immediately associated with accuracy since accuracy is used to mean “the inverse of the total error, including bias and variance” (Kish, Survey Sampling, 1965). The larger the error, the lower the accuracy.
http://stats.oecd.org/glossary/detail.asp?ID=5058

Mosher, on the other hand, is talking about a different definition of error. Kriging is a method of interpolation used when there is not enough data. Within the concept of Kriging there is the nugget effect.

…The nugget effect can be attributed to measurement errors or spatial sources of variation at distances smaller than the sampling interval, or both. Measurement error occurs because of the error inherent in measuring devices…
http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//0031000000mq000000

That is the ‘Error’ Mosher is talking about.
Kriging comes out of gold mining and other geological work where the geologist is trying to make the best guess with insufficient data. The ‘Error’ in Kriging is the error of that guess, and really has nothing to do with the error of the data which Pat Frank and I and others are talking about.
It also seems to be ‘Controversial’, to say the least. According to J. W. Merks (Vice President, Quality Control Services, with the SGS Organization, a worldwide network of inspection companies that acts as referee between international trading partners), it is a fraud for producing data when you do not have any. See Geostatistics: From Human Error to Scientific Fraud, http://www.geostatscam.com/
Also see the Wikipedia talk section on Kriging: http://en.wikipedia.org/wiki/Talk:Kriging#Further_revision_proposal_by_Scheidtm
Hope that helps Willis.

AndyG55
August 18, 2013 2:57 am

I strongly suggest that people read this link.
http://www.rit.edu/~w-uphysi/uncertainties/Uncertaintiespart2.html
in particular part a)
We are making individual readings of a changing quantity.
The error in the average will equal the average of the error, and if we take, say, Tmax and Tmin on a thermometer graduated in degrees C, then the error each time is ±0.5°C… no standard deviation involved.
So the error in Tavg (daily) is also 0.5°C, because we add, then divide by 2,
and the error in Tavg (monthly) is also 0.5°C.
There is NO contraction or decrease of the error when averaging changing quantities.

johnmarshall
August 18, 2013 2:58 am

So you average a month, then do a difference between that average and the data to show the anomaly. An anomaly of what? The average is one month, not the average temperature of the same month over a climate cycle, roughly 30 years. Meaningless, really. You are assuming that your month of choice is truly average, but none of them are, since every month of a cycle will be slightly different.
This is not a method I would use since it assumes too much and gets the guessing index up.

August 18, 2013 3:03 am

Dear Crispin and Pat Frank,
yes, 1/sqrt(N) is the behavior of the statistical error, e.g. a part of the error from individual errors that are independent for each measurement.
Then there can be a systematic error that is “shared” by the quantities that are being averaged. The systematic error of the average of similar pieces of data doesn’t decrease with N but it is the same as it is for the individual entries – it can never be greater.

Dodgy Geezer
August 18, 2013 3:21 am

A. Scott says:

I agree with Nick…. I don’t know if he (or Willis) is right or wrong – but I do know denigrating comments absent supporting evidence do zero towards understanding the issue or finding answers.
Science should be about collaborative effort. And anytime you have knowledgeable folks willing to engage in discussion you should take advantage of it.

Indeed. We should be encouraging Nick and helping him critique this piece as much as possible. This is exactly how the scientific process works – you make a hypothesis and then get people to do their damnedest to find a flaw. Someone who disbelieves in your hypothesis is FAR better at that than any number of supporters…

Colonial
August 18, 2013 3:39 am

On August 17, 2013 at 10:09 PM, Pat Frank wrote:
Luboš, descent by 1/sqrt(N) is true only when the error is random.
It’s my recollection that the condition for “descent by 1/sqrt(N)” is not that the error be random, but that the error be normally distributed. Given what we know about the propensities for readings to be integers or mid-range decimals (i.e., not xx.1 or xx.9), I find it hard to believe that the error is normally distributed. If the error isn’t normally distributed, “descent by 1/sqrt(N)” doesn’t apply, and the actual error will be larger.

gutowski
August 18, 2013 3:40 am
August 18, 2013 4:08 am

I see, one variable (T) and its anomalies are climate?

dmacleo
August 18, 2013 4:18 am

I’m not qualified to speak on the actual data/topic but wanted to say I am glad there are people always examining the data.
Thanks, Willis.

AndyL
August 18, 2013 4:23 am

Great discussion.
Would it be possible to design a worked example to test out what everyone is saying? Just specifying it would probably help clarify what people mean by different types of error and how they are accounted for.

Nick Stokes
August 18, 2013 4:23 am

Willis Eschenbach says: August 18, 2013 at 12:39 am
“It’s not clear what you mean by “rather small”. I also don’t understand the part about 1/30 of the total. Take another look at the data in Figure 2. Each individual anomaly at any time is calculated by taking the observations (with an associated error) and subtracting from them the climatology (again with an associated error). The month-by-month standard error of the mean for the 30-year reference period ranges from 0.15 to 0.72°C, with an average of 0.41°C, without adjusting for autocorrelation. Is that “rather small”? Seems large to me.”

I mean small relative to the month-to-month variation. If you calculate a global or regional trend, say, then the error you will associate with that derives ultimately from the variance of the monthly readings about the climatology. Anomalies have greater variance, because you subtract a 30-year mean. But that mean has a variance smaller by a factor of 30 than the individual errors, so the additional variance makes that 3.3% difference, approx.
But I see now that the figures that you have quoted in Fig 3 seem to be a measurement error. The climatology error is indeed quite large compared to those. So…
“If (as you say) I want to know whether January 2012 was hotter than February 2011, I need to know the true error of the anomalies. Otherwise, if the two months come up half a degree different, is that significant or not? I can’t say without knowing the total error of the anomalies.”
For the individual station, you don’t need anomalies at all for that, but adding any number, if it’s the same for both, won’t change the difference, or your uncertainty about it. If you’re aggregating anomalies in a global or regional average, the same is still true. The climatology has sampling error, but it’s always the same actual number added to or subtracted from each.
But coming back to Fig 3, that seems to be a weather measurement error (instrumental etc). You go through some arithmetic procedure to compute the anomaly, and they have calculated how the measurement error aggregates. It’s the error you would see if you could go back and re-measure exactly the same weather subject to instrument variation.
The anomaly error is a sampling error, which is of a different kind. It’s an error that would be reflected if you went back and measured a different set of weather, under the same climate conditions. So it is relevant whenever you want to express something that is a measure of climate, like temperature trend. Then what matters is how large it is compared to the other things that are making it hard to deduce climate from weather. That is, mainly, month to month variation.
“It also comes into play when comparing say the changes in the spring, summer, fall, and winter temperatures. We need to know the errors to understand the significance of what we find.”
Yes, you do. But now you’re thinking climate. And your uncertainty will be dominated by the monthly variation in temperature. Climatology error will be small relative to that, as will be instrumental uncertainty.
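A short numerical check of the “variance smaller by a factor of 30” point above (a sketch; sigma is an arbitrary monthly weather SD, and the 30 base years are treated as independent):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 2.0     # assumed month-to-month weather SD, deg C (arbitrary)

# 31 independent "Januaries": 30 for the base period, 1 target year
months = rng.normal(0.0, sigma, size=(100_000, 31))
climatology = months[:, :30].mean(axis=1)
anomaly = months[:, 30] - climatology

print(round(anomaly.var() / months[:, 30].var(), 3))
# -> ~1.033, i.e. subtracting a 30-year base adds roughly 1/30 = 3.3%
#    to the variance of the monthly value.
```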

August 18, 2013 4:29 am

When going back to older data, not only is there an error associated with the reading, there is an additional error associated with any calculated average. The averages were possibly calculated differently in different countries, and they did not have as many readings as you would today. So the averages will sometimes contain an additional error coming from the agreed way to approximate them.

Jeff Condon
August 18, 2013 4:30 am

Pat Frank,
“I carried that argument, Steve. If you don’t understand that after all that was written, then your view at best reflects incompetence.”
Without reopening a sore spot, I don’t remember it that way.
My opinion is that the BEST CI method is likely to be close but has the potential for big errors depending on the distribution of values.

Nick Stokes
August 18, 2013 4:34 am

Jeff Condon says: August 17, 2013 at 5:27 pm
Jeff, sorry about the slow response on this. It’s been a busy day here. I didn’t follow your series of posts very thoroughly initially, so I’m catching up. Hope to be able to make more sense soon.

David
August 18, 2013 4:54 am

It is a certainty that the measurement errors are systematic, and that is a can of worms, varying by instruments, by location changes, by nations, by number of stations, by station locations. It is also a certainty that the recorded official T is still being changed. The recorded data sets have been changed many times, and are still changing, so from which data set do you get your measurements and anomalies?
A far more accurate way of getting the measurement uncertainty is to observe it, instead of teasing it out through numeric sophistry. That is, observe the land-based differences in anomaly trends versus the satellite trend over the same land area. Otherwise it is fair to ask why the vast majority of the changes to measured T warm the present and cool the past.
Of course, this is all academic, as all the reported disasters of CAGW are not happening. (Even NH sea ice is now strongly rebounding with changes in ocean currents and jet streams; besides which, no one ever gave observational evidence of world calamity with melting NH sea ice.) So the disasters are not happening, and neither is the warming. In CAGW neither the “C” nor the “W” (for the past 15 years) is happening. The entire SH never warmed much at all, so the “G” is also missing. Very Shakespearean: “Much ado about nothing”.

HaroldW
August 18, 2013 4:57 am

Willis: “the final error estimate for a given month’s anomaly cannot be smaller than the error in the climatology for that month.”
This statement is not true in general. A static bias term will be present in the climatology and the monthly (absolute) error estimates, but will not be present in the anomaly.
However, I don’t know if this is the case here.

John Norris
August 18, 2013 5:16 am

Pat Frank
“shield irradiance, ground albedo (including winter snow), and wind speed all impact the measurement accuracy of otherwise well-functioning air temperature sensors. ….”
Okay, so those all sound like results of an imperfect sensor system so I’d categorize that as error type #1 and then would choose to include it in uncertainty, unlike error type #3.

Nick Stokes
August 18, 2013 5:44 am

Willis,
“When we “remove seasonality” by using a base period, as is quite common in climate science, we introduce a repeating series of twelve different errors. And contrary to your and Nick’s claims, that act of removing seasonality does indeed increase the standard error of the trend.”
You won’t change the trend for individual months. That’s because you’ve added the same number to each month, and if it fluctuates due to anomaly sampling, that won’t matter.
It will make a small difference to the annual trend. The arithmetic is – figure what difference the base errors would make to a 1-year trend, then divide by the number of years in the trend. It’s an end effect.

Nick Stokes
August 18, 2013 5:56 am

AndyL says: August 18, 2013 at 4:23 am
“Great discussion.
Would it be possible to design a worked example to test out what everyone is saying?”

It’s probably best to just think of a Monte Carlo. You’d set up an anomaly calc procedure and start by just generating measurement variation. You’d get the kind of errors shown in Fig 3.
Then you’d have to figure a distribution for monthly weather variation. Add that in to the Monte Carlo and you’d get the anomaly base error. But it makes sense only relative to the weather variation you have assumed.
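A minimal sketch of that sort of Monte Carlo (Python; the 0.5 °C instrument SD, 2 °C weather SD and 30-year base period are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n_trials, n_base = 20_000, 30
meas_sd, weather_sd = 0.5, 2.0     # assumed instrument and weather SDs, deg C
true_clim = -10.2                  # e.g. a January normal (hypothetical)

# 30 base years and one target year, each with weather plus measurement noise
base_true   = true_clim + rng.normal(0, weather_sd, (n_trials, n_base))
base_read   = base_true + rng.normal(0, meas_sd, (n_trials, n_base))
target_true = true_clim + rng.normal(0, weather_sd, n_trials)
target_read = target_true + rng.normal(0, meas_sd, n_trials)

anomaly = target_read - base_read.mean(axis=1)
true_anomaly = target_true - true_clim

print("measurement error alone:    ", round((target_read - target_true).std(), 3))
print("climatology sampling error: ", round((base_read.mean(axis=1) - true_clim).std(), 3))
print("total error of the anomaly: ", round((anomaly - true_anomaly).std(), 3))
# The three spreads combine roughly in quadrature.
```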

Tom in Florida
August 18, 2013 5:57 am

This may not be relevant to this thread, but I have always wondered why monthly temperature measurements are grouped by man-made calendars. Wouldn’t it make more sense to compare daily temperatures over a period by using celestial starting and ending points, so they are consistent over time? The Earth is not at the same point relative to the Sun on January 1st every year; will this type of small adjustment make any difference? Perhaps full moon to full moon as a period to average?

AndyL
August 18, 2013 6:25 am

Nick Stokes says:
August 18, 2013 at 5:56 am
It’s probably best to just think of a Monte Carlo

I was thinking of a much more basic thought experiment.
Start with lots of devices that measure twice a day.
Each device has an assumed accuracy of x
Human reading is to the nearest 0.5 degree which introduces error y
Tmax and Tmin are averaged to give daily average which does zz to the error
Daily average is averaged again to give monthly accuracy
Interpolations to cover missing values
Outliers are removed etc etc
Give all these errors names so that everyone is clear they are talking about the same thing, then show how they are accounted for and whether or not they are significant

Bill Illis
August 18, 2013 6:34 am

How many Anchorage temperature datasets are there?
Berkeley Earth splits one station – Anchorage Merrill Field – into 16 different separate stations. What is the monthly average when you jack-knife a single station into 16 separate ones?
http://berkeleyearth.lbl.gov/auto/Stations/TAVG/Figures/165577-TAVG-Comparison.pdf
http://berkeleyearth.lbl.gov/stations/165577
And then beyond Merrill Field, there are 12 other Anchorage stations – each with its own number of separate jack-knife stations – extending only back to 1916.
http://berkeleyearth.lbl.gov/station-list/?phrase=anchorage
It is just a mess, and there is no way the methodology itself has not introduced significant systematic errors.

Richard M
August 18, 2013 6:55 am

Interesting discussion, but to me it’s attacking the gnat on the elephant’s butt instead of looking at the elephant. Personally, I’ve always felt the entire approach to generating adjusted historic temperature data sets was misguided (all of them).
What I see is potentially biased opinions being inserted into the data, whereas they should be placed in the error bars. TOBS should NOT be used to adjust data; it should be used to create error bars. Same with UHI and all other modifiers to the current data. Leave the poor data alone. I suspect if this were done we would see an entirely different (and more accurate) view of historic temperatures.

August 18, 2013 7:12 am

Can anyone confirm or refute the accuracy of the following simple-minded explanation that I put together for myself to understand what the issue is? I may not be the only reader here who is not as comfortable with the jargon as the disputants are.
If you want to know whether Aprils are getting warmer, say the BEST proponents according to my understanding, it doesn’t much matter what the error is in the average-April number you subtract from the individual-April numbers to get the April anomaly. E.g., it doesn’t matter much whether those anomalies advance from year to year as 0.1, 0.2, 0.3, …, or as 0.5, 0.6, 0.7; the trend is still 0.1/year. So the error in the “climatology” (average of the April temperatures over some number of years) does not contribute to an error in the trend.
But as I (erroneously, no doubt) understand it, Mr. Eschenbach isn’t worrying about whether Aprils are getting warmer. He’s concerned with whether a given year’s March is warmer on a seasonally adjusted basis than its April (which, on an absolute rather than a seasonally adjusted basis, is usually warmer than March here in the Northern Hemisphere). That is, is that year’s March warmer than usual by a greater extent than that year’s April? If we think average March and April temperatures are usually 40 and 50 deg. F respectively when actually they’re respectively 42 and 48, then the answer you’ll come to when a given year’s values are respectively 41 and 49 will be affected by that error: “climatology” errors do matter to this question.
Is this anywhere near a statement of what the question before the house is? I may not be the only layman who would be grateful for an occasional translation of the issues into English.
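For what it is worth, a toy numerical version of both halves of that question (Python, using the invented 40/50 vs 42/48 figures from the comment above):

```python
import numpy as np

years = np.arange(2000, 2010)
april = 10.0 + 0.1 * (years - 2000)        # invented Aprils warming at 0.1/yr

# (1) The trend of the April anomaly is the same whether the April
#     climatology is right or off by a couple of degrees:
for clim in (10.0, 12.0):
    print(round(np.polyfit(years, april - clim, 1)[0], 3))   # 0.1 both times

# (2) But comparing one year's seasonally adjusted March and April
#     does depend on the climatology used for each month:
march_obs, april_obs = 41.0, 49.0          # one year's readings, deg F
for clim_mar, clim_apr in [(42.0, 48.0), (40.0, 50.0)]:      # true vs erroneous
    print(march_obs - clim_mar, april_obs - clim_apr)
# -> (-1.0, +1.0) with the true normals, (+1.0, -1.0) with the wrong ones:
#    the "which month was more unusual" verdict flips.
```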

August 18, 2013 7:20 am

The statement:
“If you want to know whether Aprils are getting warmer, say the BEST proponents according to my understanding, it doesn’t much matter what the error is in the average-April number you subtract from the individual-April numbers to get the April anomaly.”
seems debated. My focus is that, on top of that, if the April average is today measured as the average taken every second during the month, while in an earlier year it was measured as the average of three readings per day, then a large error is added. Similarly, if different stations use different methods for their averaging, and the stations are then used to adjust each other, errors are introduced.
— Mats —

Pamela Gray
August 18, 2013 7:23 am

My comment is very basic and has to do with linear trend. Linear trend is a straight line through a data series that is placed such that the data points above and below that line are at the smallest distance from that line over the entire series. Another statistic that can be generated from those distances is a single value of difference, which can be considered an “error” value. The greater that calculated value is, the less one can say about the trend demonstrating a possible correlation between variables. The smaller that value is, the more confidence one can place in the data demonstrating a possible correlation. If one looks at any linear trend line through observed absolute or calculated anomaly data, I am eyeballing that the “error” calculation is quite large.
Please correct me if I am wrong in my simplified explanation, but sometimes we worry and fret over complicated statistical maths, when the most basic and telling ones are overlooked.

richardscourtney
August 18, 2013 7:32 am

Willis:
Your observation is good.
However, there is a more basic problem; viz.
there is no possible calibration for global temperature because the metric is meaningless and not defined.
This problem is not overcome by use of anomalies. I explain this as follows.
Each team preparing a global temperature time series uses a different method (i.e. different selection of measurement sites, different weightings to measurements, different interpolations between measurement sites, etc.). And each team often alters the method it uses such that past data is changed; see e.g. http://jonova.s3.amazonaws.com/graphs/giss/hansen-giss-1940-1980.gif
Hence, each determination of global temperature has no defined meaning: it is literally meaningless. And an anomaly obtained from a meaningless metric is meaningless.
If global temperature were defined then a determination of it would have a meaning which could be assessed if it could be compared to a calibration standard But global temperature is not a defined metric and so has no possible calibration standard.
A meaningless metric is meaningless, the errors of an undefined metric cannot be determined with known accuracy, and the errors of an uncalibrated measurement cannot be known.
The errors of a measurement are meaningless and undefinable when they are obtained for a meaningless, undefined metric with no possibility of calibration.
Richard

highflight56433
August 18, 2013 7:39 am

There is on obvious problem with daily high/low averages, that is this: if the day time high is 72F for three hours vs 72 for 5 minutes, or the low is -15F for 5 hours vs 1 hour. See the problem with leaving duration our of the picture?

highflight56433
August 18, 2013 7:40 am

Repost with correct words: There is an obvious problem with daily high/low averages, that is this: if the day time high is 72F for three hours vs 72 for 5 minutes, or the low is -15F for 5 hours vs 1 hour. See the problem with leaving duration out of the picture?

Luther Wu
August 18, 2013 8:02 am

Nick Stokes says:
August 17, 2013 at 5:13 pm
An irony here is that skeptics have been nagging climate scientists to get the help of statisticians. So when a group of statisticians (at BEST) do get involved, what we hear here is “Climate science gets it wrong again”.
_________________________
It is very simple, Nick: Why don’t you show us what “Climate Science” gets right?

Jeff Condon
August 18, 2013 8:09 am

Thanks Nick

highflight56433
August 18, 2013 8:14 am

The entire method of taking daily averages is faulty. The current method is actually the midrange – the middle of the daily max and min – not the true mean or average. If 24 readings are taken, one every hour over a given day, then those 24 should be averaged rather than taking the midrange value. Several other factors are not included: time of day, wind and overcast affect the temperature’s heating effect. Personally, I would take the entire year’s data set to come up with one average: 8760 data points per station, given 24 readings per day.
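A small sketch of that distinction (Python, with an invented diurnal profile in which the afternoon peak is brief):

```python
import numpy as np

hours = np.arange(24)
# Invented day: long cool night, brief warm spike in mid-afternoon
temps = 10.0 + 8.0 * np.exp(-((hours - 15) / 2.0) ** 2)

midrange = (temps.max() + temps.min()) / 2   # the (Tmax+Tmin)/2 convention
true_mean = temps.mean()                     # average of 24 hourly readings

print(round(midrange, 2), round(true_mean, 2))
# -> ~14.0 vs ~11.2: the two differ whenever the warm (or cold) part of the
#    day is short-lived, which is the duration point made above.
```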

LdB
August 18, 2013 8:47 am

Stokes says:
August 18, 2013 at 5:56 am
It’s probably best to just think of a Monte Carlo
You need, however, to remember a basic problem with that… is climate deterministic? I think everyone agrees it is slightly chaotic, and that is probably the biggest source of error you are trying to work out how to handle.
Climate scientists actually need to do a little discussion with engineers who are well versed in slightly chaotic signals and signal processing. Brad above actually gave you one possibility by using an Unscented Kalman Filter (UKF … because it’s not linear) and I see a few in climate science have played with trying to adjust Lorenz (1996).
Until you people all realize you can’t solve these sorts of problems with statistics alone, you are going to continue to bang your head against the wall.
Willis, you know I respect you and your tenacity, but do you see the problem? There is no way into this by statistics solely; you need to deal with the slight chaos, something statistics can’t do by itself. Willis, spend some time talking to someone at a local university about signal processing of chaotic signals. The other choice is someone involved in deep space radio comms, such as flicker noise spectroscopy and the like.

August 18, 2013 9:02 am

LdB says:
August 18, 2013 at 8:47 am
“…Climate scientists actually need to do a little discussion with engineers who are well versed in slightly chaotic signals and signal processing….”
There is no meaning in applying any model to a system if the errors in your signal are too high compared to your wanted output. Aiming the discussion at the average of Arctic and Antarctic, or northern and southern, etc., instead of at Arctic plus Antarctic, northern plus southern, etc., builds too much simplification on top of too much measurement error in both the sources and the result.
— Mats —

Pamela Gray
August 18, 2013 9:16 am

LdB, to further your discussion, and maybe not in the direction you intended (correct me if I am wrong), we filter out the very thing we should be studying. The noise. It is screaming the strength of natural variation as loud as it can but half the world is not listening. By studying it, we will be able to see that it is not random white noise. The signals of oceanic and atmospheric teleconnections and oscillations are abundantly clear.

Yancey Ward
August 18, 2013 9:40 am

There are so many problems in this field that, I’m forced to say, you really need a time machine to get a meaningful answer.
However, I would really like to second AndyL’s comment at August 18, 2013 at 6:25 am. A detailed example would be really helpful here.

Typhoon
August 18, 2013 9:42 am

This entire historical “mean global temperature” reconstruction comes across as an exercise in GIGO data processing:
1/ Only a small part of the planet was instrumented back in, say, 1800. This means that massive interpolation, which is nothing more than guessing, must be done across large areas of the planet.
No amount of data processing can recreate physical measurements where none were originally made.
Such guesstimates are a type of systematic error/bias.
2/ How well known are the systematic errors/biases in the measuring devices used back in, say, 1800?
The accuracy, precision, and calibration drift over time? There is no a priori reason to assume that such systematic errors of different instruments are normally distributed and average out. Rather, such systematic errors, assuming that they are independent, are added together in quadrature in standard error analysis and estimation.
http://www.ocean-sci.net/9/683/2013/os-9-683-2013.html
http://meetingorganizer.copernicus.org/EGU2009/EGU2009-5747-3.pdf
The weather station siting and other systematic bias issues investigated and documented by A. Watts et al.
3/ Given the planet is an open dynamical system far from thermodynamic equilibrium, does the measurement of an intensive quantity such as temperature to calculate a spatial average have any physical meaning? Should an extensive quantity such as heat content not be used instead?
Given the above, the claim that the “mean global temperature” back in 1800 is known to within +/- 0.1C [Hansen et al] not only strains credulity, but tosses it in the blender.

Joseph Murphy
August 18, 2013 9:48 am

Steven Mosher says:
August 17, 2013 at 4:21 pm
“But in another sense the error that you are talking about matters less, because it is the same number over all years. Put another way, often when you are talking about anomalies, you don’t care so much about the climatology (or the base years). You want to know whether a particular year was hotter than other years, or to calculate a trend. The error you are discussing won’t affect trends.”
Precisely.
Also if people want to go back to absolute temperature with our series they can. We solve the field for temperature. In our mind we would never take anomalies except to compare with other series. Or we’d take the anomaly over the whole period, not just thirty years, and then adjust it accordingly to line up with other series. So, anomalies are really only important when you want to compare to other people (like Roy Spencer) or (Jim Hansen) who only publish anomalies. Anomalies are also useful if you are calculating trends in monthly data.
<<<<<<<<<<<<<<<<<<<<<<<<<
From my non statistical mind I feel like a bait and switch has happened here. Many low resolution readings will lower the error bars if they are all trying to read the same thing, if today actually had a temperature. But, it does not. The day has many temperatures and we are trying to create a single one. Since each thermometer is measuring something different (literally) the trick of lowering the error bars does not seem to apply.

climatebeagle
August 18, 2013 10:29 am

What do the retroactive changes imply about the errors? If the temperature at Teigarhorn in Jan 1900 can change from 0.7°C to -0.5°C, does that imply the errors on any station’s monthly average are at least ±1.2°C?
http://endisnighnot.blogspot.co.uk/2013/08/the-past-is-getting-colder.html

Gene L
August 18, 2013 10:31 am

Seems to me that global temperature pretty much follows number of windmills: http://upload.wikimedia.org/wikipedia/commons/1/1a/Wind_generation-with_semilog_plot.png
/sarc

richardscourtney
August 18, 2013 10:44 am

climatebeagle:
At August 18, 2013 at 10:29 am you ask

What do the retroactive changes imply about the errors? If the temperature at Teigarhorn in Jan 1900 can change from 0.7°C to -0.5°C, does that imply the errors on any station’s monthly average are at least ±1.2°C?
http://endisnighnot.blogspot.co.uk/2013/08/the-past-is-getting-colder.html

No, it means the errors are potentially infinite because the value of global temperature can be computed to be whatever you want it to be.
Please see my above post at August 18, 2013 at 7:32 am.
This link jumps to it
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1393835
Richard

August 18, 2013 11:05 am

Luboš, “The systematic error of the average of similar pieces of data doesn’t decrease with N but it is the same as it is for the individual entries – it can never be greater.”
That’s true only when the systematic error is constant. Systematic error is not necessarily constant when the problem is uncontrolled experimental variables. That is the case in air temperature measurements, for which the effects of insolation, wind speed, and albedo, among others, are all uncontrolled and variable. In these cases, systematic error can increase or decrease with N, but the direction is always unknown. So, one can only get an average of systematic error by carrying out a calibration experiment with a known standard, and recording the observed error. That error is then applied as an uncertainty in the measurements of the experimental unknown.
Gail Combs certainly knows this approach to experimental measurement error, and her discussion on this thread is exactly pertinent. Average systematic error is always an empirical quantity. The true magnitude of systematic error in a given experiment is unknown. Data containing systematic error can look and behave just like good-quality data. The contaminated data may correlate between laboratories, and may pass all manner of statistical tests. The only way to get a handle on it is to track down the sources and eliminate them (if possible). The typical way to deal with systematic error is by calibrating the instruments and carrying a known standard through the experiment, to make sure that any such error isn’t large enough to wreck the experiment.

Carrick
August 18, 2013 11:06 am

Willis Eschenbach:

I understand all of that, Steven, and thanks for the explanation. The part I don’t understand is, how can the error in your published climatology be GREATER than the error in your published anomaly. That’s the part that you and Nick haven’t explained (or if so, I sure missed it).

If I’m following your question, you can have larger errors on an absolute scale than on a relative scale. If there is an overall offset error that is constant across measurements, using one (or a subset) of measurements as a new base line that you subtract all measurements from, reduces this offset error.
There’s a bit of discussion in this long thread on Lucia’s blog that may be helpful.

August 18, 2013 11:16 am

Jeff, it’s there in your Reply Part 1, with your admission, finally, that, “I get that Pat hasn’t included weather noise in his final calculations for Table 1,2 and the figures…” That admission took the heart out of your critical position.

Crispin in Waterloo
August 18, 2013 11:24 am

@Lubos and the all other wise contributors
“yes, 1/sqrt(N) is the behavior of the statistical error, e.g. a part of the error from individual errors that are independent for each measurement.
“Then there can be a systematic error that is “shared” by the quantities that are being averaged. The systematic error of the average of similar pieces of data doesn’t decrease with N but it is the same as it is for the individual entries – it can never be greater.”
+++++++++
Thanks. I almost entirely agree. There could be systematic errors in the way the averaging is done so I hold open that caveat. I think the problem with temperatures is the instruments have different precisions and accuracies but there are ways to account for that. That does not contradict your point at all, however.
My second worry on this relates to the kinda loose manner in which ‘accuracy’ and ‘precision’ are being used in peer-reviewed works. When one gets a lot more data points within a territory, extending the shooting range metaphor I used above, it means getting a lot more shooters to each use a different rifle to take shots at their respective targets. None of that increase makes the shooters more accurate or precise, and again, the precision of each rifle is not improved by having more of them.
My refinement of the perspective is this: imagine you did not see any of the shooting take place. You do not know where the bullseye was for any shooter. All you have is a blank target with a large number of holes shot into it. The task at hand is to work out where the bullseye really was – the actual average temperature. The calculations demonstrated do not decrease the size of the error bars, which is rooted in the precision of the rifle and the shooter and the accuracy of the shooters.
What is increased is the precision of the location of that point where we can confidently place the center of the error bar. We can have more confidence that it lies at exactly a certain point, to several significant digits, but in no way does this certain knowledge reduce the vertical height of the error bars. For temperatures it is still ±0.5°C for most of the land record.
This point about confidence as opposed to knowledge is not being stated clearly. Increased precision in our knowledge of the position of the center of the error bar is being claimed as an increase in the precision of the calculated answer. They are two different things entirely. One is like the GPS coordinates of a car, the other is like the size of the car.
Here is another example in the form of a question:
If I measure the same mass piece with 1000 different calibrated scales, each having a resolution of 20 g, how precisely and how accurately can I state that single piece’s true mass?
If I measure 1000 different mass pieces, one mass piece weighed once on each scale, with what precision and accuracy can I claim to know the average mass of all the pieces?
These are conceptually different problems. The latter question is the one that applies to temperature measurements at different stations. The final answer cannot be better than the best instrument but might be worse than the worst instrument because of various systematic errors in the processing of data, or experimental errors in acquiring, transcribing or archiving it.
Regards
Crispin
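A sketch of the two weighing questions above (Python; the 500 g nominal mass, the 25 g piece-to-piece spread, and the independence of the scales’ errors are all assumptions made purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
# 20 g resolution modelled as +/-10 g uniform rounding error, assumed
# independent from scale to scale (the contested assumption).
reading_err = rng.uniform(-10, 10, n)

# Case 1: one 500 g piece weighed once on each of 1000 scales.
case1 = 500.0 + reading_err
print("case 1: mean", round(case1.mean(), 2),
      " SE of mean ~", round(case1.std() / np.sqrt(n), 2), "g")
# The mean pins down that single mass far better than any one scale could,
# but only because the errors are independent; a shared bias would remain.

# Case 2: 1000 different pieces, each weighed once.
true_masses = rng.normal(500.0, 25.0, n)     # invented piece-to-piece spread
case2 = true_masses + reading_err
print("case 2: mean", round(case2.mean(), 2),
      " spread of readings", round(case2.std(), 1), "g")
# The average of all pieces is again known tightly, but no individual piece
# is known to better than its own +/-10 g reading.
```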

LdB
August 18, 2013 11:31 am

Pamela Gray says:
August 18, 2013 at 9:16 am
LdB, to further your discussion, and maybe not in the direction you intended (correct me if I am wrong), we filter out the very thing we should be studying. The noise. It is screaming the strength of natural variation as loud as it can but half the world is not listening.
You need to think about how you are filtering it out…. Look at Nick’s example above: he is using a 30-year average. If there are 100-year, 200-year, 400-year, 1000-year chaotic signals, they go straight through any filter you construct. There are very likely to be fluctuations of that size because of the earth’s size and thermodynamic inertia.
The problem is you can’t keep stretching the time, because you are killing the climate signal you are actually looking for. See the problem? The low-frequency chaotic noise is going to slide easily through any filter you construct for climate change, because it will have components of the same sort of timespans. They may not be large, but they are probably going to be the largest error post-filtering, and there is no easy way to separate them.
That’s why it becomes a digital filtering exercise; if you look at J. D. Annan et al.’s work with oceans, that is sort of heading in that direction, but it looks like they have troubles with some of the models.

JDN
August 18, 2013 11:37 am

@Willis: Super-resolution microscopy uses techniques where the total error (measuring the position of a light emitter) is smaller than the measurement error (the diffraction limit of light).
See: http://en.wikipedia.org/wiki/Super_resolution_microscopy
The PALM & STORM methods use the aggregate measurement and infer true position from assumptions about the point spread function and the probability of having more than one emitter in the field.
I’m not sure how that affects your argument, but these techniques have been validated in subcellular structures in many ways, and they are examples of what you were asking for.

August 18, 2013 11:37 am

Crispin in Waterloo says:
August 18, 2013 at 11:24 am
“…What is increased is the precision of the location of that point where we can confidently place the center of the error bar. We can have more confidence that it lies at exactly a certain point, to several significant digits, but in no way does this certain knowledge reduce the vertical height of the error bars. For temperatures it is still ±0.5°C for most of the land record…”
Imagine the shooters do not have such different rifles. Instead, all rifles had come from one factory, “CRUT” (Center of Rifles Used for Testing). They are of somewhat different models (CRUT1, CRUT2, …), but with similar construction (like methods for averaging and compensating for errors). All rifles tend to go a little high to the right. You do not have a better clue of where the bullseye was, as long as there is a systematic error.
— Mats —

Theo Goodwin
August 18, 2013 11:57 am

Pat Frank says:
August 18, 2013 at 11:05 am
Great work, Pat Frank. Too bad that those whom you intend to help just cannot imagine that empirical matters are relevant to their statistical claims. Please keep up your good work because many benefit from it, as does empirical science generally.
To those who have no respect for empirical matters in the surface temperature record, especially BEST, I have one question. When are you going to accept Anthony’s five-fold classification of measurement stations, work your statistical magic on his numbers, and then address his claims about bias in the measurement records?

August 18, 2013 12:05 pm

On systematic errors.
The largely unstated assumption is that Tmin represents some climatologically significant value, specifically how cool a location gets at night, even though Tmin typically occurs after dawn.
The value of Tmin is dependent on the time it occurs. There are several factors that affect the timing of Tmin, including early morning insolation, which is affected by aerosols and low level clouds, both of which are known to have declined over recent decades.
Therefore changes in Tmin have 2 components. One, the change in night time temperatures, is climatologically significant, arguably our signal. The other, the change in early morning insolation, is not climatologically significant.
As I have previously written, the change in early morning insolation could be as much as half the change in Tmin over the last 60 years, and a major reason for the satellite – surface temperature divergence, as insolation changes only affect surface temperatures.

Pamela Gray
August 18, 2013 12:51 pm

There likely is not any kind of hard-sided cycle in climate. The same “cycle”, if you want to use that term, may have a periodicity with highly stretchable onset and offset period, not to mention a highly malleable duration. In addition, the cycles likely do not cancel out between the warm and cool phase. Any kind of filter at all, removes the most important part of the data series in my opinion. However, the raw data does need some kind of statistical work. That is why I have always wanted to see a three month moving average along with seasonal combined month averages and warm/cool changes, much like what is done with oceanic and atmospheric data sets.
Of even more importance is this fact, well accepted by climate researchers: the anthropogenic portion of the absolute temperature data or anomaly cannot be teased out of a global or regional average. It can be teased out of local data if a control match gold standard is available. And now we are back to calibration, of which there is none.

Nick Stokes
August 18, 2013 12:58 pm

Willis,
“The point of your whole post seems to be that the error is small. First, we don’t know that overall, and it’s certainly not small in the Anchorage data.”
I agree that anomaly base error is not small relative to measurement error. My point is that it is inappropriate to make that comparison alone. They are different kinds of error.
We have two distinct uncertainties – one about the weather that was, and one about the weather that might have been. We actually calculate the anomaly using the weather that was. It’s a weighted sum of a whole lot of monthly averages, in turn sums of daily readings, and our uncertainty about that is mainly in how well those numbers were measured. So if May 2012 Anchorage is in that sum, how uncertain are we about that figure? That collected uncertainty is what Fig 3 expresses.
The anomaly error you are writing about is not concerned with the weather that was. We know the weather that was in 1951-80 (subject to measurement error).
The weather that might have been is what we need to think about when making deductions about climate. For anomalies, what if 1951-80 had by chance been a run of hot years? Now, instead of just a bunch of numbers that we were wondering about whether we’d measured them right, we’re thinking of them as an instance of a stochastic process. And we need a model for that process. Once we have that, and not before, we can calculate the anomaly error that you describe. That is the weather that might have been.
But once we do that, we are dealing with the other uncertainties about the weather that might have been – basically estimated by month-month variation. Eyeballing Anchorage, that looks like about 2°C. That’s when you need to worry about anomaly base error, and it’s a fairly fixed (and small) proportion. For any calc in which you are treating the temps as an instance of a stochastic process, the anomaly base error is about 3% added to the monthly anomaly instance. It’s quite large relative to measurement error, but put another way, measurement error is a small part of your uncertainty about climate.

August 18, 2013 1:02 pm

This is a test. Have I been banned?

Coldlynx
August 18, 2013 1:12 pm

To allow for future improvement of measuring technology, would it not be useful to use absolute degree-hours to calculate the monthly average? K × h, summed for each month. More frequent readings will show up as a more exact value. Old values will still be usable. Then add the monthly variance when presenting the resulting trends, to give the reader a sanity check to see whether the trend is within normal climate variance.

August 18, 2013 1:12 pm

richardscourtney said @ August 18, 2013 at 7:32 am

Hence, each determination of global temperature has no defined meaning: it is literally meaningless. And an anomaly obtained from a meaningless metric is meaningless.

It amuses me that those engaging in this numerological exercise so often claim that medieval philosophers spent their time debating how many angels could dance on the head of a pin, a claim for which there is absolutely no evidence.

Carrick
August 18, 2013 1:17 pm

Since it’s been brought up a few times, there was a bit of discussion on Jeff’s blog of Pat Frank’s E&E paper. This is a good place to start.
Lucia had a follow-up here.
This is another place the basic issues surrounding measurement uncertainty in absolute versus relative temperature are explored.

Pamela Gray
August 18, 2013 1:42 pm

I think at issue here is also that an anomaly value taken from an average of data does not transfer the range of that raw data to the anomaly calculation. That range is not an error in the common understanding of the term; it is just the range of temperatures collected at all points at one time in a climate region – for example, the NWS’s recently improved delineation of weather pattern regions in the US. To be sure, the error in calibration is relatively small compared to the range of temperatures collected for a day within a single weather pattern region. Maybe raw data needs to be grouped into weather pattern regions to calculate that range, and go from there. Is there a way to transfer the raw data range into the anomaly?

Jeff Condon
August 18, 2013 2:19 pm

Pat,
My critical position hasn’t changed. One thing you did do, that a lot of your critics don’t, is put your work out there and I hope people understand that is no small thing. It would be good to see more effort put into the data but we are not government paid climate citizens.

Pamela Gray
August 18, 2013 2:28 pm

Note to self – USE THE PREVIEW BUTTON!

X Anomaly
August 18, 2013 3:18 pm

“Is there a way to transfer the raw data range into the anomaly?”
Bob Tisdale calculates weekly anomalies all the time. If you had daily data (and lots of it), you could calculate daily anomalies. If you had hourly data, say 10 years worth, or 87600 points of data, then yeah sure, you could do hourly anomalies as well.
I always find it fascinating that with just about all climate data anomalies there is an underlying cycle /trend (diurnal, seasonal) that is removed so that we can see the departures from what is considered ‘normal’. Sea Ice is an interesting anomaly recently because now even the anomalies themselves have a cycle embedded in them (recurring loss of summer sea ice). Maybe removing this new ‘recurring loss’ sea ice from the data might highlight something interesting (i.e. an anomaly of an anomaly!!!).
As for all this stuff about errors, it’s given me a headache! I will only add that when removing a recurring underlying cycle from the data, such as when calculating anomalies, the standard deviation for each month will be exactly the same whether it is an anomaly or not. So in essence the (monthly) data are unchanged.

bw
August 18, 2013 3:42 pm

Nice responses by Proctor, Istvan, Riser and Frank. Most of the “data” were never intended to have any scientific validity for a global analysis. Hoffer’s conclusion seems apt.
Note also substantial boundary layer effects on surface thermometers reported by Pielke, Sr.
http://pielkeclimatesci.files.wordpress.com/2009/10/r-321.pdf
The Pielke paper has already examined most of this subject and more. He recommends rejecting most of the data, and only using Ocean Heat Content for analysis.
Another conclusion is that the daily Tmax would be more appropriate than the Tave to detect global-scale temperature changes.

ZT
August 18, 2013 4:20 pm

In science (sorry to mention that largely taboo subject) you model known information and show that your treatment/theory/etc. can explain known observations. When you have convinced yourself (and others) of the validity of your treatment, you then make predictions. Applying this ‘scientific’ approach to the discussion here, one would have various options. For example, you could generate some synthetic temperature data for a set of geographically distributed measurement sites with known variations, trends, noise, etc. based on random numbers (aka Monte Carlo), and show that the error estimates and trends recovered by your analysis match those used in creating the synthetic input. Or you could divide your input information in half and use half your input to show that it never falls outside the projected estimates obtained from the data input to your procedure, etc. (This might in fact turn out to be a reasonable use of a GCM – see if the land temperature analysis can recover the inputs to the GCM…).
From what I’ve read here, the true believers (and funding recipients) are (as usual) trying to convince the unwashed masses by shouting rather than showing how it is that their analysis can magically reduce the errors in inputs. How about providing a demonstration? (And no – I don’t have to provide a demonstration – the climatologists are trying to ‘prove’ something – not me!).

August 18, 2013 5:01 pm

Interesting response.
Regarding the use of quadrature to reduce error in final (averaged) results: it seems a number agree with me that quadrature is appropriate only when there are multiple readings of the same fixed parameter using the same devices and the same procedures. Using quadrature when combining data from 1500+ land stations and 3500 mobile ARGO sea stations is therefore inappropriate: the error one should use in these is that of the average individual station, not the square root of the increase in data (1/10th the error with 100X the data?).
This is a fundamental consideration. Many commenters have not addressed this even as a response to a comment. I ask the learned statisticians: is this a correct position wrt statistical analyses of a parameter with varying values, using varying instruments?
I know that "adjustments" are made to address the varying instruments and procedures, but there are a limited number of issues here, and again the lack of "sameness" applies: thousands of stations that are independent, each having its own error potential.
It is only possible to use the quadrature procedure if you believe that every station, even if unique, is consistent in its error – not sometimes high, sometimes low, even with the same actual temperature. If, instead, whether human- or machine-read, the +/- means either that a repeat of the same temperature event would read higher or lower within the band, or that any temperature event within the high/low band would give the same reading, you are stuck with the minimum error reading of, at best, the P50 group.
What I see is a disconnect within the concept of error: we get an error-bar for temperature, sea-level or Mann’s dendrochronology, and then behave as if the center is the correct value down to hundredths of a part. We don’t respect that at any given time the actual event measurement may be at the top or at the bottom, that a strange, random walk may be occurring within the error bars that actually defines the true state of affairs. If this is so, then a small variation – like the ARGO deep sea numbers – would be statistically meaningless, because it would be smaller than the recognized random walk of the world/universe/temperature history of the Earth as a globe and in its parts.
Comments?
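
A minimal numerical sketch of the distinction being asked about, in hypothetical Python (the station count and the 0.5 °C per-station uncertainty are invented for illustration; whether the 1/sqrt(N) reduction applies to a real station network is exactly the point in dispute here):

import math

# Hypothetical per-station measurement uncertainty (deg C) and station count.
sigma = 0.5
n_stations = 1500

# If the readings were independent repeated measurements of ONE fixed quantity,
# the errors add in quadrature and the mean's uncertainty shrinks as 1/sqrt(n):
u_mean_same_quantity = math.sqrt(n_stations * sigma**2) / n_stations  # = sigma/sqrt(n), about 0.013

# If each station measures a DIFFERENT local temperature, pooling does not
# reduce the per-station uncertainty itself; only the sampling uncertainty of
# the spatial average shrinks, and only if the station errors are independent.
u_single_station = sigma  # still 0.5

print(u_mean_same_quantity, u_single_station)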

Crispin in Waterloo
August 18, 2013 6:21 pm

P
Final paragraph is quite right. I see it all the time in crazily correct climate claims (C^4).
@matsibengtsson
“All rifles tend to go a little high to the right. You do not have a better clue of where the bullseye where, as long as there is a systematic error.”
That is an accuracy problem – the shooter should have compensated for a known issue – it should have shown up in calibration. That begs the question about who is doing the calibrations and how well are they doing it. Think about the usefulness of calibrating really well and getting a really precise measurement of a temperature that is strongly affected by UHI.
Measurement systems require system-level, rational analysis. I see great opportunities for mathematical sleight-of-hand when passing casually back and forth between anomalies. Willis is really poking the right dog here.
When I see someone in the lab measure something with a thermocouple that is precise to 0.02 degrees and then [black box, hum-haw, kerfuffle, kersproing] Presto! Out comes a reading to 6.5 digit precision! Then I know there is a carpet bag in the cloak room.

David L. Hagen
August 18, 2013 6:23 pm

Pat Frank notes August 17, 2013 at 3:08 pm:

However, sensor field calibration studies show large systematic air temperature measurement errors that cannot be decremented away. These errors do not appear in the analyses by CRU, GISS or BEST.

GlynnMhor August 17, 2013 at 3:34 pm
re: “systematic error is present and cancels out”
Pat Frank observes: Aug 18, 11:05 am

That is the case in air temperature measurements, for which the effects of insolation, wind speed, and albedo, among others, are all uncontrolled and variable. In these cases, systematic error can increase or decrease with N, but the direction is always unknown.

Watts et al. 2012 find

1) Class 1 & 2 stations 0.155 C/decade
2) Class 3, 4 & 5 stations 0.248 C/decade
3) NOAA final adjusted data 0.309 C/decade.

Further thoughts:
1) The Urban Heat Island effect (UHI) increases with population over time.
2) UHI increases with increasing energy use over time.
3) UHI for a station can change drastically with changing station microclimate.
E.g., by relocation nearer to asphalt, walls, air conditioners, etc.
Or by adding air conditioners, or changing ground cover to concrete, cinder or asphalt.
4) UHI overall decreases as NOAA withdraws class 5 stations.
5) The UHI input to Type B error shows up in rising Tmin greater than Tmax.
6) NOAA adjustments to the data add the equivalent of ~0.61 C/decade Type B error.
7) With increasing population and relocating stations etc., UHI varies and generally increases with time.
The majority of such UHI errors appear to be Type B errors, which do NOT decline as 1/sqrt(N) the way Type A errors do. It would help to distinguish and break out Type A and Type B errors and trends.
With nonlinearly increasing time varying Type B errors, all the Type B errors will NOT cancel out when using a single long term average.
Willis
For the formal official international definitions and full equations, see JCGM 100:2008 and NIST TN1297 (1994) references below.
From my cursory review, I think time-varying Type B variances need to be accounted for and distinguished from Type A variances, to account for the varying UHI effects above and how these in turn affect the mean used when calculating the anomalies. I will let you more mathematically inclined types dig into the details.
Regards
David
Uncertainty Analysis
Evaluation of measurement data – Guide to the expression of uncertainty in measurement. JCGM 100: 2008 BIPM (GUM 1995 with minor corrections) Corrected version 2010
Note the two categories of uncertainty:
A. those which are evaluated by statistical methods,
B. those which are evaluated by other means.
See the diagram on p53 D-2 Graphical illustration of values, error, and uncertainty.
Type B errors are often overlooked. E.g.

3.3.2 In practice, there are many possible sources of uncertainty in a measurement, including:
a) incomplete definition of the measurand;
b) imperfect realization of the definition of the measurand;
c) nonrepresentative sampling – the sample measured may not represent the defined measurand;
d) inadequate knowledge of the effects of environmental conditions on the measurement or imperfect measurement of environmental conditions;
e) personal bias in reading analogue instruments;
f) finite instrument resolution or discrimination threshold;
g) inexact values of measurement standards and reference materials;
h) inexact values of constants and other parameters obtained from external sources and used in the data-reduction algorithm;
i) approximations and assumptions incorporated in the measurement method and procedure;
j) variations in repeated observations of the measurand under apparently identical conditions.
These sources are not necessarily independent, and some of sources a) to i) may contribute to source j). Of course, an unrecognized systematic effect cannot be taken into account in the evaluation of the uncertainty of the result of a measurement but contributes to its error.

Type B uncertainties including unknown unknowns could be comparable to Type A uncertainties.
Furthermore:

when all of the known or suspected components of error have been evaluated and the appropriate corrections have been applied, there still remains an uncertainty about the correctness of the stated result, that is, a doubt about how well the result of the measurement represents the value of the quantity being measured.

( See also Barry N. Taylor and Chris E. Kuyatt, Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results, NIST TN1297 PDF)
The Type A standard uncertainty of the mean of n measurements is the square root of (1/(n*(n-1))) times the sum of the squared deviations from the mean.
See JCGM 100 2008 4.2.2 & 4.2.3, or TN1297 Appendix equation (A-5)
The Type B standard uncertainty of measurements depends on the distribution of the uncertainty but NOT on the number of measurements. See JCGM 100:2008 4.3.7 or TN1297 Appendix equation (A-7)
“5.1 The combined standard uncertainty of a measurement result, suggested symbol uc, is taken to represent the estimated standard deviation of the result. It is obtained by combining the individual standard uncertainties ui (and covariances as appropriate), whether arising from a Type A evaluation or a Type B evaluation, using the usual method for combining standard deviations. This method, which is summarized in Appendix A [Eq. (A-3)], is often called the law of propagation of uncertainty and in common parlance the "root-sum-of-squares" (square root of the sum-of-the-squares) or "RSS" method of combining uncertainty components estimated as standard deviations.”

PS JCGM 100:2008 0.7: “There is not always a simple correspondence between the classification into categories A or B and the previously used classification into ‘random’ and ‘systematic’ uncertainties. The term ‘systematic uncertainty’ can be misleading and should be avoided.”
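
A minimal sketch of the Type A / Type B machinery described above, in hypothetical Python (the six readings and the 0.3 °C Type B half-width are invented for illustration; the formulas are the JCGM 100:2008 / TN1297 ones cited in this comment):

import math

# Hypothetical repeated readings of one monthly mean temperature (deg C).
readings = [10.2, 10.5, 9.9, 10.3, 10.1, 10.4]
n = len(readings)
mean = sum(readings) / n

# Type A standard uncertainty of the mean (JCGM 100:2008 4.2.3 / TN1297 Eq. A-5):
# sqrt( (1/(n*(n-1))) * sum of squared deviations from the mean ); shrinks as 1/sqrt(n).
u_a = math.sqrt(sum((x - mean) ** 2 for x in readings) / (n * (n - 1)))

# Type B standard uncertainty from an assumed rectangular distribution of
# half-width a = 0.3 deg C (JCGM 100:2008 4.3.7 / TN1297 Eq. A-7); it does NOT
# shrink as more readings are taken.
a = 0.3
u_b = a / math.sqrt(3)

# Combined standard uncertainty by the root-sum-of-squares ("RSS") method
# (TN1297 5.1 / Eq. A-3), assuming the two components are independent.
u_c = math.sqrt(u_a ** 2 + u_b ** 2)

print(round(u_a, 3), round(u_b, 3), round(u_c, 3))  # roughly 0.088, 0.173, 0.194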

LdB
August 18, 2013 7:02 pm

@Pamela Gray says:
August 18, 2013 at 12:51 pm
Any kind of filter at all, removes the most important part of the data series in my opinion.
The problem is that you then miss a very important scientific fact: a signal, as opposed to noise, cannot be removed by just any filter. It needs a specific filter, because a signal is not at all random.
This problem has a very important historic overtone .. look at the Holmdel horn antenna.
http://en.wikipedia.org/wiki/Holmdel_Horn_Antenna
What Arno Penzias and Robert Wilson were trying to work out is why they couldn’t silence and filter the noise out of the Holmdel horn antenna. They realized it was impossible because it was a signal, and a signal is not random; you cannot filter it out without understanding how the signal is distorting your data. It led to one of the most important signals of all time, the cosmic background radiation, and the signal was everywhere.
To deal with a signal and filter it out, you need to understand the signal; then you can construct a filter to remove it, because it isn’t a random effect. It’s not 0.5 here and -0.5 there; it has a pattern, and that pattern is not random.
The only way you could ignore the effect is if you were absolutely sure that the interfering signals were very small, and you could even assign an error value to them, so you would say that the background chaotic signals had a value of +/- ? degree. For example, in a normal radio scenario the CMBR is unimportant because the signal is so large compared to it, but as you saw, once you start looking for signals from deep space it became a problem. The CMBR level varies over the frequency range, and that itself also became important, but that’s another story.
The same is probably true of the climate signal: there will be different chaotic signals at different frequencies, and so you will need to understand the background to remove them.
Eventually, Pamela, if climate science is right, and I suspect it is at least somewhat right, the climate signal will grow and you will obviously be able to separate it from the background (the radio equivalent here on earth); at the moment, however, you are in the deep-space-signal mode, looking for a very small signal with other signals, possibly of equal size, mixed into the back of it.

Geoff Withnell
August 18, 2013 7:08 pm

Calibration is comparison to a standard. What is the standard? Remember, we are using the location of the bullet holes (the temperature readings from the instruments) to determine the location of the bullseye (the “true” temperature). So any calibration compensation would be based on what information? How do you know what direction to move the actual instrumental temperature readings to get them to center on the “correct temperature”?

Jeff Cagle
August 18, 2013 7:15 pm

OK, so I want to retract my statement above that the errors in monthly averages would affect only the intercept. This was a clear case of having the wrong mental picture: since the errors in monthly averages are all different, they add a 12-month periodic error sequence to the anomalies.
However, I’ve done some experiments that convince me that any reasonable-sized error in the monthly averages will have negligible effect on the trends in anomalies.
Here are the experiments. Take the BEST Anchorage, AK data from 1900 on (no missing values from then on). Compute the trendline.
Now, generate a random set of 12 monthly errors. I used a normal distribution mu = 0, sigma = 0.1 on the theory that this is worst-case: since there is a 30-day sample period per month, even one year’s worth of data should have an error in the mean of less than 0.1 per month.
Now systematically add those random errors to the monthly anomalies and recompute trends. I did this for the following time periods:
1900 – 1902
1900 – 1910
1900 – 1920
1900 – 1930
for 100 runs for each time period.
Results:
1900 – 1902. Baseline trend: -0.0253. Average of trends with error: -0.0256. Std dev: 0.0020
1900 – 1910. Baseline trend: -0.0026. Average of trends with error: -0.0026. Std dev 8E-5
1900 – 1920. Baseline trend: 0.002859. Average of trends with error: 0.002861. Std dev 2E-5
1900 – 1930. Baseline trend: 0.00406. Average of trends with error: 0.00406. Std dev 9E-6.
I conclude from this that an error of size 0.1 in the monthly averages — and this is large given a 30-day sample period, I would imagine — is negligible in terms of error in trends.
Thoughts?
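
For readers who want to reproduce the kind of experiment Jeff describes, here is a hypothetical Python sketch; the synthetic series (trend, noise level, seasonal-error size) is invented and stands in for the BEST Anchorage data, which is not reproduced here, and numpy is assumed to be available:

import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a monthly anomaly series: a small trend plus large
# month-to-month noise (values are invented, not the BEST Anchorage data).
years = 30
t = np.arange(years * 12)
anom = 0.003 * t + rng.normal(0.0, 2.0, size=t.size)

def trend(y):
    # Least-squares slope, in degrees per month.
    return np.polyfit(np.arange(y.size), y, 1)[0]

baseline = trend(anom)

# Add a fixed 12-month-periodic error (sigma = 0.1) to mimic an error in the
# monthly climatology, and see how much the recovered trend moves.
slopes = []
for _ in range(100):
    monthly_err = rng.normal(0.0, 0.1, size=12)
    slopes.append(trend(anom + np.tile(monthly_err, years)))

print(baseline, np.mean(slopes), np.std(slopes))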

LdB
August 18, 2013 7:43 pm

The problem is simple, Jeff, and it is the problem Pamela above is facing: your massive assumption is that the error is random.
I am going to introduce a new small error: the problem is your shooter is now shooting outdoors, and the target is a long way away in the open outdoors.
Can you run your analysis in such a situation without also recording wind strength, which will affect the bullet’s flight?
Think hard about what happens to your filtering of the data from the above.
Do you see what happens? The accuracy of the shot will vary with wind speed, and that isn’t random; you can’t filter it out, and you aren’t guaranteed to shoot on 50% windy days and 50% still days.
You do your technique and you end up with complete rubbish: if I shot on all windy days you get one result; if I shot on all still days you get a totally different one. Your error term is totally dependent on another signal.
That’s the problem of interfering signal noise versus random noise. You are all assuming you only have random noise in the data, but you need to be damn sure of that given the level of accuracy you are trying to work at.

Jeff Cagle
August 18, 2013 8:04 pm

@all: The time intervals above are “off-by-one.” They should read
1900 – 1901 (inclusive)
1900 – 1909
1900 – 1919
1900 – 1929
@LdB: I don’t know much about shooters and wind. However, a normal distribution of errors *for the monthly averages* is not a necessary assumption. The reason is simple, and has to do with why my experiment gives such consistent results.
Take the BEST data from 1900 to present and run a linear regression with your favorite stats package. I happen to be using JMP. Here is the output:
T(t) = β_1 t + β_0 + ε_t
Results:
β_1 = 0.0013, 95% CI: [0.0009, 0.0017]
r^2 = 0.029, σ = 2.8609
Notice two things about this. First, the variability is huge — 2.86 even with 1359 observations. This means that although the confidence interval for the mean is very tight, the confidence interval for individual estimates is very wide.
The second thing to notice is an obvious corollary: r^2 is tiny, laughably so.
The accompanying scatter plot tells the tale. The variability is humongous (and is approximately normally distributed).
In light of that large variability, a small error in the monthly averages is meaningless. That’s really the story here. If we add in quadrature the natural variability of the anomalies together with the error in the monthly averages, we essentially get the natural variability.
So I could change my distribution to be anything at all and get pretty much the same result. I bet you 10 quatloos.
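
Jeff’s quadrature point can be checked with one line of arithmetic (hypothetical Python, using the numbers he reports above):

import math

natural_sd = 2.8609    # residual standard deviation of the anomalies, from the regression above
climatology_err = 0.1  # assumed error in a monthly-average (climatology) value

# If the two sources are independent they combine in quadrature; the result is
# barely distinguishable from the natural variability alone (about 0.06% larger).
print(math.sqrt(natural_sd ** 2 + climatology_err ** 2))  # ~2.8626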

JFD
August 18, 2013 8:42 pm

Very interesting, Willis. Good work as usual. You stirred them up with this one. The first thing I learned as a process plant engineer is that you can’t average temperatures. The enthalpy changes with temperature. The earth’s temperature ranges from minus 75 F to 145 F. One would have to make a substantial enthalpy correction as well when dealing with a gas, in this case mostly N2 + O2. Even with an enthalpy correction, the answer would still be subject to the two-readings-a-day problem.
I do this from time to time: I look at the 29 temperature station readings within a 15 mile radius of my home. The wind is dead calm across the area. The elevation is 200 feet plus or minus 20 feet. The temperature stations are mostly Rapid Fire, with a few Madis and a few Normal. The area is pine trees with openings for many homes, a few light manufacturing sites, paved roads and a concrete interstate highway, one city, one small town, several shopping centers and some pasture land for cattle. This time the readings in F were, starting at the highest:
1. 88, 88 =2
2. 84.2, 84 = 2
3. 83.8, 84, 83.9, 83.7, 83.3 = 5
4. 82.4, 82.2, 82.9, 82.4 = 4
5. 81.9, 81.2 = 2
6. 80.6, 80.1, 80.1, 80.7, 80.6 = 5
7. 79 = 1
8. 78.1, 78.4 = 2
9. 77, 77.9 = 2
10. 76.7 = 1
11. 73.2 = 1
You could throw out the two high readings and the two low readings and the spread is still almost 7F. You could weight average the readings and narrow the spread but the difference would still be about 4F.
All of the palaver about errors is simply bull palaver. Using homogenization and averaging a series of temperature readings is for people who haven’t had to work out in the weather, in the hot, in the cold and sometimes in the nice. Those who have know that weather is variable from season to season, year to year and decade to decade.

LdB
August 18, 2013 9:12 pm

Cagle
@LdB: I don’t know much about shooters and wind. However, a normal distribution of errors *for the monthly averages* is not a necessary assumption. The reason is simple, and has to do with why my experiment gives such consistent results.
But you are missing the point: you are assuming a controlled environment, and this is real-world data; you may only be getting consistency based on a fallacy.
Let’s extend the problem: initially the shooters only shot on fine days, because windy days make their sights wobble, so they avoid shooting on windy days. Later, sights are improved and they shoot on windy days and sunny days, and suddenly your neat assumption goes to pieces.
Anthony’s own urban heat island argument is a classic in this sort of problem: you need to understand the problem properly; you can’t just assume you can average it away.
This is why particle physics, radio communications and telescope observatories study their backgrounds in detail; they actually study the background almost as much as they study the signals.
Some interesting stories on background noise controls recently and how they deal with them
Ethan Segal on why Earth telescopes fire lasers into space
http://scienceblogs.com/startswithabang/2013/07/24/why-observatories-shoot-lasers-at-the-universe/
Tommaso Doringo on which is more sensitive to the Higgs ATLAS or CMS
http://www.science20.com/quantum_diaries_survivor/atlas_vs_cms_higgs_results_which_experiment_has_more_sensitivity-113044

August 18, 2013 10:34 pm

Crispin in Waterloo says:
August 18, 2013 at 6:21 pm
“…That is an accuracy problem – the shooter should have compensated for a known issue – it should have shown up in calibration…”
Lots of dangerous assumptions there. They did not know; they had belief in CRUT. Thus the assumption you made about finding the bullseye is faulty. And all attempts to find the removed bullseye fail due to the unhandled, non-ignorable systematic errors behind it. The more systematically bad CRUT models that were used, the worse your error becomes, since you assume their models were correct. If you had realised some models were wrong, and known which bullet holes came from which rifle, and adjusted for which models were the worst, you would be better off than assuming they all were the same. But you would still have to know the systematic error to get it right.
The issue was not known at the time. All rifles had come from one factory, “CRUT” (Center of Rifles Used for Testing), and it was believed all their models were correct. But all tests were done at the internal shooting range, where the wind blew from up and to the right, so all models at CRUT tended to shoot up to the right.
— Mats —

August 18, 2013 11:22 pm

Willis: On target!!
 
I know this has been discussed ad nauseam before, to my mind without viable solution;
And no, I am not requesting that the issue needs resolution. I’m just adding my nervousness to the overall worry about temperature and anomaly errors.
Every record of temperature starts as a temperature recorded during a 24-hour period called a day.
These 24-hour periods are problematic in themselves, as ‘sunlight hours’ are retimed to better correlate with ‘business hours’ within localities. (I just plain take my watch off in Arizona and use the local clocks, or I ask locals what time it is.)
 
Every one of these temperature recordings has its own error possibilities, only some of which are ever exactly recorded as meta-data.
 
Now I understand, Willis, that for this thread you’re focusing on the monthly anomalies, but as other process analysts, e.g. Gail Combs, have mentioned, errors are carried forward. Contributing errors towards a monthly anomaly error range are: station, instrument, individual and daily errors.
Adding to the error morass are the mass ‘adjustments’ made to temperature records, supposedly correcting for some of these sample errors. Every adjustment adds to the error of the measurement, unless each adjustment is verified and validated against a known error in a specific measurement.
 

“Nick Stokes says: August 17, 2013 at 2:52 pm
Willis,
…The error you are discussing won’t affect trends. "

 
Trends? This hour? Today’s? This week? This month? This year? Or that magic 30-year trend?
If an error is significant enough to affect the trend within a day, it affects the monthly and yearly trend. Raising the view of the trend to the highest levels does not eliminate or correct errors, it only masks them.
Assuming that errors cancel out is an assumption requiring proof, for every instance. Dismissal of discussion fails to provide the proofs.
 
Or is the inference in this statement that errors do not matter unless the trend is affected?
Then why calculate an anomaly at all? Temperatures themselves will give similar trends, only there is no ‘average’ then, just the absolute of the temperature.
 
In a way, weather is presented that way already. e.g. Today’s high was 93°F (33.9°C). Our record high (97°F, 36°C) for this day was recorded in 1934. Frankly, I think this easily beats the alarmist statements that our ‘trend’ is xx°C higher and we are all going to die as it gets hotter. The latter statement informs me of nothing, while the former clearly tells me that today is, within reason, normal. And that normal could be hotter, could be colder, could be, could be, could be; none of them disastrous.
 
Which leaves me with the suspicion that trends are the sneaky pie charts for climatology.
 

“@matsibengtsson
“All rifles tend to go a little high to the right. You do not have a better clue of where the bullseye where, as long as there is a systematic error.”

“That is an accuracy problem – the shooter should have compensated for a known issue – it should have shown up in calibration. That begs the question about who is doing the calibrations and how well are they doing it. Think about the usefulness of calibrating really well and getting a really precise measurement of a temperature that is strongly affected by UHI…”


 
From a shooter’s perspective, no.
 
Adjustment for shooting can be taken in two formats. Physical adjustment of the firearm, sights, optics, shooting stance, breathing, trigger squeeze… Or mental adjustment, as in noticing the range wind sock indicates the wind just shifted and ‘Kentucky windage’ in aiming is applied.
All of these ‘adjustments’ can be made ‘on the fly’ as so many do, or they can be noted in a journal for future reference in improving one’s ability to shoot well.
 
All of which ignores what ‘bench shooters’ do while shooting versus, say the Olympic competitions. The latter try and place their shots within the X ring (center, so to speak). The former avoid shooting out their aiming point, bullets in the X ring are immaterial; group size is everything. There are reasons why, but no reason to explain them here.
The point is there are different kinds of shooters, who shoot their targets entirely differently. What is a month’s worth of targets worth if different shooters are shooting, under different conditions on different days?
 
We’re back to that meta-data issue; without accurate detailed meta-data, the data itself is suspect. All adjustments, transforms, stats are just a form of mystic passes hoping that the end result is better than the beginning result.
 

“Tom in Florida says: August 18, 2013 at 5:57 am
This may not be relevant to this thread but I have always wondered why monthly temperature measurements are grouped by man made calendars. Wouldn’t it make more sense to compare daily temperatures over a period by using celestial starting and ending points so they are consistant over time?. The Earth is not at the same point relative to the Sun on January 1st every year, will this type of small adjustment make any difference? Perhaps full moon to full moon as a period to average?"

 
Time as mankind defines it into 24 hour periods, 365 days is a celestial calendar of sorts. Mankind marks their time using regular intervals now measured using a cesium fountain clock, e.g. NIST-F1. That doesn’t mean everything keeps the same measurements.
 
Your question echoes some other issues I have with anomalies. Neither climate nor weather is influenced by mankind’s calendar. Climate is affected by earth’s conditions while whirling around our sun.
Exactly why are summer’s hottest days or winter’s coldest days after the solstices? Well, the current theory is about the local presence of water.
 
What is missing from the whole daily/weekly/monthly temperature anomaly scenario is any allowance for climate as it relates to seasonal progression. If this year’s winter ended six weeks early, doesn’t that send a spike into six weeks of temperatures?
 
A rogue six weeks of hot spring weather is not the question though. The question is, given what little we know about climate; how do we identify, quantify and account for all cycles in climate?
 
Truth is, we can’t yet. Our measurements are minimal and crude, our records maintenance brutal and harsh, our information demands outrageous. We have distinct issues with identifying short weather cycles let alone understanding short climate cycles.
 
Yup, mystic passes.
Keep working the issues Willis!
PS It is also great to see so many gifted math folks threshing this issue out! Yes, this also means Steve Mosher and Nick Stokes. Terrific discussion!

August 18, 2013 11:31 pm

As long as “temperatures” and their “anomalies” continue to be used to demonstrate (or not) “global warming”, our knickers will remain in the proverbial twist on this whole subject. Heat energy can be measured directly by satellites, so why persevere with temperatures, a man-made proxy, devised to determine how relatively hot or cold it is?

richardscourtney
August 19, 2013 12:30 am

Crispin in Waterloo:
At August 18, 2013 at 6:21 pm you say

That begs the question about who is doing the calibrations and how well are they doing it.

YES! Nobody is “doing the calibrations” because no calibration is possible.
Please see my post at August 18, 2013 at 7:32 am. To save you needing to find it, I copy it here.

Willis:
Your observation is good.
However, there is a more basic problem; viz.
there is no possible calibration for global temperature because the metric is meaningless and not defined.
This problem is not overcome by use of anomalies. I explain this as follows.
Each team preparing a global temperature time series uses a different method (i.e. different selection of measurement sites, different weightings to measurements, different interpolations between measurement sites, etc.). And each team often alters the method it uses such that past data is changed;
see e.g. http://jonova.s3.amazonaws.com/graphs/giss/hansen-giss-1940-1980.gif
Hence, each determination of global temperature has no defined meaning: it is literally meaningless. And an anomaly obtained from a meaningless metric is meaningless.
If global temperature were defined then a determination of it would have a meaning which could be assessed if it could be compared to a calibration standard. But global temperature is not a defined metric and so has no possible calibration standard.
A meaningless metric is meaningless, the errors of an undefined metric cannot be determined with known accuracy, and the errors of an uncalibrated measurement cannot be known.
The errors of a measurement are meaningless and undefinable when they are obtained for a meaningless, undefined metric with no possibility of calibration.
Richard

This thread is equivalent to discussion of the possible errors in claimed measurements of the length of Santa’s sleigh.
A rational discussion would be of why determinations of global temperature are not possible and ‘errors’ of such determinations are misleading.
Richard

LdB
August 19, 2013 1:04 am

@richardscourtney says:
August 19, 2013 at 12:30 am
YES! Nobody is “doing the calibrations” because no calibration is possible.
No, Richard, what you mean to say is that you don’t know how to calibrate it, because climate scientists seem to understand only statistics, and that technique is useless here: you can’t filter your way to a result, and it doesn’t work in any other science either. There are ways to do it, and other sciences do it often, and on much more complicated and more technically challenging systems than climate.
@richardscourtney says:
August 19, 2013 at 12:30 am
This thread is equivalent to discussion of the possible errors in claimed measurements of the length of Santa’s sleigh.
That is true, so if climate scientists stop trying to claim a measurement of Santa’s sleigh to some unrealistic accuracy, no one would have a problem. If what you claim is true, the signal will grow to an obviously measurable problem in time … see, there is an easy answer if you don’t want to do the work …. oh wait, there is a political agenda here, I forgot.
So back to the drawing board: if you want to claim the length of Santa’s sleigh to some incredible accuracy (AKA the climate signal), then do your homework on the background signals and noise, and eliminate them like every other non-political science does, and stop whining about it.

richardscourtney
August 19, 2013 1:19 am

LdB:
Thankyou for your reply at August 19, 2013 at 1:04 am
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1394337
to my post at August 19, 2013 at 12:30 am
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1394322
Unfortunately, you miss the point and (offensively) ‘put words in my mouth’ to state what you would have preferred me to have said. Then you discuss your ‘red herring’.
I stated my point as being

Each team preparing a global temperature time series uses a different method (i.e. different selection of measurement sites, different weightings to measurements, different interpolations between measurement sites, etc.). And each team often alters the method it uses such that past data is changed;
see e.g. http://jonova.s3.amazonaws.com/graphs/giss/hansen-giss-1940-1980.gif
Hence, each determination of global temperature has no defined meaning: it is literally meaningless. And an anomaly obtained from a meaningless metric is meaningless.

You say you want to determine the number of ‘angels on a pin’, and I am saying there is no ‘pin’.
To prove me wrong you only need to state an agreed definition of ‘global temperature’ which does not alter from month-to-month (i.e. show me that the ‘pin’ exists).
A rational discussion of the error in the estimated value (i.e. the number of ‘angels’) is not possible until that agreed definition is stated.

Please comment on what I said and NOT what you wish I had said.
Richard

LdB
August 19, 2013 1:58 am

@richardscourtney says:
August 19, 2013 at 1:19 am
I see, so is your argument above that there is no global temperature because such a thing does not exist, or are you trying to say that we are saying such a thing does not exist?
I certainly think such a thing exists, and it is no different to "rest mass" in science: we don’t know exactly what causes gravity, but we can and certainly do parameterize rest mass to other well known and well understood variables.
So that argument itself is total rubbish: just do the same trick and parameterize the term "global temperature" to the other well understood variables rather than insisting on a specific absolute value.
Again, do you think climate science is special and these sorts of things haven’t been seen before? Most of the world of particle physics is reparametrisations.

richardscourtney
August 19, 2013 2:21 am

LdB:
I concluded my post addressed to you at August 19, 2013 at 1:19 am
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1394340
saying

Please comment on what I said and NOT what you wish I had said.

Your reply at August 19, 2013 at 1:58 am
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1394353
DOES IT AGAIN!
It begins saying

I see, so is your argument above that there is no global temperature because such a thing does not exist, or are you trying to say that we are saying such a thing does not exist?

NO!
I am saying – and have repeatedly explained – that
1.
there is no agreed definition of global temperature
2.
each team that provides global temperature data uses a different definition
3.
each team that provides global temperature data often changes the definition it uses.
So, there is no DEFINED metric of global temperature.
Therefore, the metric(s) said to be global temperature are meaningless: they may be different next month and – history shows – they probably will be.
There can be NO meaningful determination of the error in a datum, and there cannot be a calibration standard, for a metric which does not have an agreed definition.
This is true whether or not ‘global temperature’ exists in the real world.
I am reaching the conclusion that your repeated misrepresentations and refusal to address the issue I am presenting are examples of you being deliberately obtuse.
Richard

August 19, 2013 3:24 am

@LdB Why don’t you answer Richard’s challenge and provide a reference to the standard definition of Earth’s temperature? It’s not enough to merely assert that there is one, or that one might be possible. You have made some interesting comments in this thread, but I’m with Richard on this until the angels and the pinhead are defined. As I understand this, there have been many statements of what Earth’s temperature is over the years with no two experts stating the same number. It’s also notable that the magnitude of this stated temperature has declined, rather than increased.

johnmarshall
August 19, 2013 3:32 am

Averaging a time series is bad statistical practice. The anomaly is an average of an average of a time series. A double whammy.
It is impossible to get the true temperature of any object until that object is at thermodynamic equilibrium. The earth never is at this impossible state.

richardscourtney
August 19, 2013 4:55 am

The Pompous Git:
My post at August 19, 2013 at 1:19 am said to LdB

To prove me wrong you only need to state an agreed definition of ‘global temperature’ which does not alter from month-to-month (i.e. show me that the ‘pin’ exists).

And your post at August 19, 2013 at 3:24 am asked LdB

Why don’t you answer Richard’s challenge and provide a reference to the standard definition of Earth’s temperature?

It is not possible for him to provide such a reference because as UCAR says
https://www2.ucar.edu/climate/faq/what-average-global-temperature-now

Since there is no universally accepted definition for Earth’s average temperature, several different groups around the world use slightly different methods for tracking the global average over time, including:
NASA Goddard Institute for Space Studies
NOAA National Climatic Data Center
UK Met Office Hadley Centre

{emphasis added: RSC}
Were LdB to state what he thinks is a “universally accepted definition for Earth’s average temperature”, his statement would be wrong because at most his definition would only be accepted by one of NCAR, NASA and NOAA. And that definition would be transient because NCAR, NASA and NOAA each often changes the definition it uses.
My repeatedly stated point is that
There can be NO meaningful determination of the error in a datum, and there cannot be a calibration standard, for a metric which does not have an agreed definition.
My point is true for every metric including global temperature.
Richard

LdB
August 19, 2013 5:15 am

richardscourtney says:
August 19, 2013 at 4:55 am
No wonder climate science is in a mess if they think like you. I may sound like a Pompous Git, but perhaps just stop and listen.
You are talking about ENERGY not some nebulous concept … ENERGY is invariant it can not be created and destroyed and the particular question you are asking is the energy on the earth increasing or decreasing because of the addition of CO2 into the atmosphere.
I don’t care how you define your point to measure ENERGY; the datum exists because ENERGY is defined clearly and precisely.
Your problem in climate science is that people want to choose a different condition, which you are calling the metric of global temperature.
The answer is simple: make them tie that particular definition (we call it a frame of reference) to that of the global energy balance; after all, that is the question you are asking.
Look at invariant mass and guess how we make them define it
http://en.wikipedia.org/wiki/Invariant_mass
See what we make scientists do:
The invariant mass, rest mass, intrinsic mass, proper mass, or (in the case of bound systems or objects observed in their center of momentum frame) simply mass, is a characteristic of the total energy and momentum of an object or a system of objects that is the same in all frames of reference related by Lorentz transformations.
Pick whatever frame of reference you like, make up your own definitions, make up your own units, do what you like, because you MUST tie it to the total energy balance of the earth; then by definition there has to be a transform between any two frames of reference.
If you make every climate science group, as part of their choosing whatever reference point they like to measure, tie it to the global energy balance, you can create a translation of results, GUARANTEED, because energy cannot be created or destroyed.
If the scientists in climate science haven’t worked that out by now, they have serious issues and need to get some real physicists involved; ENERGY is not a toy you can redefine.

LdB
August 19, 2013 5:19 am

The Pompous Git says:
August 19, 2013 at 3:24 am
@LdB Why don’t you answer Richard’s challenge and provide a reference to the standard definition of Earth’s temperature?
Pick whatever definition you like I don’t care provide me the relationship to global energy balance and I can convert any two definitions between each other … problem solved it’s not hard.

richardscourtney
August 19, 2013 5:34 am

LdB:
Your post addressed to me at August 19, 2013 at 5:15 am is yet another in your series of evasions and ‘red herrings’.
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1394432
It makes no reference – and has no relation – to anything I have written in this thread (but adds another of your ad homs).
It makes no mention of temperature, global temperature, measurement theory or error estimation.
It says

You are talking about ENERGY not some nebulous concept … ENERGY is invariant it can not be created and destroyed and the particular question you are asking is the energy on the earth increasing or decreasing because of the addition of CO2 into the atmosphere.

BOLLOCKS!
We are talking about temperature and NOT energy.

To be specific, we are talking about the possibility of error estimation in global temperature determinations. And that is what I have been discussing: I have made no mention of “energy” (it is an irrelevance) and your assertion that I have is a falsehood.
Temperature is NOT energy.
Your post seems to be an indication of desperation because I refuse to accept that you are so ignorant that you don’t know temperature is not energy.
Are you involved in compiling a global temperature data set and, if so, are you obtaining income from it?
Richard

richardscourtney
August 19, 2013 5:43 am

LdB:
Your post at August 19, 2013 at 5:19 am
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1394434
addressed to The Pompous Git is yet another of your ‘red herrings’ and misrepresentations.
It says

Pick whatever definition you like I don’t care provide me the relationship to global energy balance and I can convert any two definitions between each other … problem solved it’s not hard.

The required definition was of “global temperature”.
It was NOT about “global energy balance”.
And nobody can “convert” between “global temperature” and “global energy balance” because they can vary independently (e.g. because of ocean currents, spatial variation in surface temperatures, variations in ice formation or melting, etc.).
Richard

LdB
August 19, 2013 6:08 am

richardscourtney says:
August 19, 2013 at 5:34 am
BOLLOCKS!
We are talking about temperature and NOT energy.
Temperature is NOT energy
OMG Richard please don’t say another word
Here is google for you:
http://en.wikipedia.org/wiki/Temperature
Please read it before you make any more statements. I am not trying to pick on you, simply explaining, and you obviously need some things explained.

LdB
August 19, 2013 6:43 am

Richard I found a reasonable link for you I am not being smart just making sure you follow how it works.
http://www.ohio.edu/mechanical/thermo/Intro/Chapt.1_6/Chapter3a.html
Chapter 3: The First Law of Thermodynamics for Closed Systems

Stephen Wilde
August 19, 2013 6:47 am

LdB said:
“ENERGY is invariant it can not be created and destroyed and the particular question you are asking is the energy on the earth increasing or decreasing because of the addition of CO2 into the atmosphere.”
More energy need not affect temperature.
Potential Energy is not registered as heat by thermometers.
If a gas molecule rises more of its energy converts from KE to PE and it cools.
Any molecule that absorbs energy so as to become warmer than its surroundings then becomes part of an expanded, less dense and lighter air parcel, which rises against gravity until it cools once more, so the net effect on temperature of radiative characteristics is zero but PE increases instead.
No need for a rise in surface temperature to push the atmosphere higher. One only needs the presence above the surface of molecules capable of carrying more energy in PE form.

richardscourtney
August 19, 2013 7:15 am

LdB:
It seems I owe you an apology. In my post to you at August 19, 2013 at 5:34 am
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1394449
I wrote

I refuse to accept that you are so ignorant that you don’t know temperature is not energy.

Clearly, I was wrong to imply that you are being disingenuous because your posts at August 19, 2013 at 6:08 am and August 19, 2013 at 6:43 am
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1394467
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1394488
proclaim that you ARE so ignorant that you don’t know temperature is not energy.
I apologise for suggesting you were being disingenuous when you were merely demonstrating your ignorance of elementary physics.
Please read the comment of Stephen Wilde at August 19, 2013 at 6:47 am
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1394490
It may initiate your education in elementary physics.
And, concerning your mention of thermodynamics, I cannot help you to understand thermodynamics until you gain sufficient knowledge of elementary physics for you to understand why temperature is not energy.
Richard

LdB
August 19, 2013 7:16 am

Stephen Wilde says:
August 19, 2013 at 6:47 am
Potential Energy is not registered as heat by thermometers.
If a gas molecule rises more of its energy converts from KE to PE and it cools.
You are dealing with earth’s energy balance, Stephen; cycling energy like that is irrelevant, as you already worked out: the energy is the same, so it doesn’t matter.
PE or KE is irrelevant; energy is energy. If the earth is retaining it, that is a problem, because it can easily become temperature, as you have already shown above.
The energy can’t hide; that is why you make every group reconcile the full energy balance.
What happens at the moment, yes, like you are trying to do here, is that you can slide the energy into somewhere you aren’t measuring; your system is open … you have to close the system, and everyone must close their system; you enforce it .. no closure = no publish.
What then usually happens is several groups will pick certain frames of reference and others will do just small bits and tie back to these group positions rather than having to do a full balance themselves.
So yes, you can change the energy from KE to PE, but it matters not; it will get accounted for.
No more playing hide the heat or energy anymore 🙂

Joseph Murphy
August 19, 2013 7:17 am

LdB says:
August 19, 2013 at 6:43 am
Richard I found a reasonable link for you I am not being smart just making sure you follow how it works.
http://www.ohio.edu/mechanical/thermo/Intro/Chapt.1_6/Chapter3a.html
Chapter 3: The First Law of Thermodynamics for Closed Systems
>>>>>>>>>>>>>>>>>>>>
I need to learn how to quote. But anyways, energy = temp – work, if I am not mistaken, so no, energy and temp are not the same thing. Honest question, why would we consider the earth a closed system?

Jeff Cagle
August 19, 2013 7:18 am

@LdB: You wrote,
But you are missing the point: you are assuming a controlled environment, and this is real-world data; you may only be getting consistency based on a fallacy.
I think you’re misunderstanding the purpose of the experiment. I am not trying to use a linear model to predict trends in monthly anomalies. That would be futile: the r^2 value is 3%, which is ludicrous.
All I’m doing is to try to answer the question that Willis posed: Does an error in the monthly averages have an effect on secular trends?
The answer is, No.
Then you wrote, Let’s extend the problem: initially the shooters only shot on fine days, because windy days make their sights wobble, so they avoid shooting on windy days. Later, sights are improved and they shoot on windy days and sunny days, and suddenly your neat assumption goes to pieces.
Anthony’s own urban heat island argument is a classic in this sort of problem: you need to understand the problem properly; you can’t just assume you can average it away.

In other words, you suggest that improvements in thermometers and UHI are two potentially confounding variables that affect trends? I agree. And if I were trying to predict trends, your objection would need to be fully accounted for.
However, I cannot think of a reason for either of those variables to alter the answer to Willis’ question. Can you? If you can, then construct a multivariate experiment with all three variables, and show the effect when you control for each.
It will significantly advance the discussion if you can quantify your objections rather than arguing by analogy. As it stands, your shooter analogy does not match the state of the data very well. Instead of a marksman (tight variance on shots) we have a novice (only hits the target — anywhere! — about 68% of the time, corresponding to a temp variance of 2.86 deg C). Instead of a wind (random large variance), we have a kind of “target-gremlin” (small variance with a pattern repeating every twelve shots, corresponding to an error in monthly averages of 0.1 deg C). Arguments by analogy are potentially misleading or confusing; straight math would be better.
So for example, if you think my estimate of 0.1C is too small, tell me why.

August 19, 2013 7:27 am

Man Bearpig says:
August 18, 2013 at 1:50 am
If each year the height of each schoolchild on their 3rd birthday is taken at each school in a county and the average is calculated (which would be around 37 inches) .. This would include those that were say 33 inches to 42 inches which would (probably) be in the normal deviation … However, when the averages are sent to a central state center and the standard deviation is calculated on the averages, then the distribution will be much MUCH smaller and those at the extremes of the distribution would be considered abnormally short or tall.

You are wrong, Man Bearpig, because you mix two notions: one is "standard deviation" and the other is "standard error of the mean". The first one will not be smaller for the central state statistics than for the school. However, the standard error of the mean will decrease toward zero as the sample size increases. Standard deviation is a measure of the spread, and standard error of the mean is a measure of the accuracy of the mean in a finite sample size.
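
A quick numerical illustration of that distinction, in hypothetical Python (the heights are simulated, not real data):

import math
import random

random.seed(1)

# Simulated heights (inches) of 10000 three-year-olds, centred on 37 with spread 2.
heights = [random.gauss(37.0, 2.0) for _ in range(10000)]
n = len(heights)
mean = sum(heights) / n

# Standard deviation: the spread of individual children; it does NOT shrink
# as the sample grows.
sd = math.sqrt(sum((h - mean) ** 2 for h in heights) / (n - 1))

# Standard error of the mean: the uncertainty of the average; it shrinks as 1/sqrt(n).
sem = sd / math.sqrt(n)

print(round(sd, 2), round(sem, 3))  # roughly 2.0 and 0.02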

richardscourtney
August 19, 2013 7:45 am

LdB:
At August 19, 2013 at 7:16
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1394500
you write

So yes you can change the energy from KE to PE but it matters not it will get accounted.

NO! Not by a temperature measurement it won’t.
Since you failed to understand the point made by Stephen Wilde, I offer you another example.
Consider ice cubes melting in a glass of water. The temperature in the glass is 0 deg.C. Add heat and the temperature stays at 0 deg.C, but some of the ice melts. The heat in the glass has changed but the temperature has not.
Ice melts and freezes at places over the surface of the Earth, too.
This thread is about assessment of errors in determinations of global temperature.
It is NOT about heat.
And I now understand why you could not comprehend my statement

There can be NO meaningful determination of the error in a datum, and there cannot be a calibration standard, for a metric which does not have an agreed definition.
This is true whether or not ‘global temperature’ exists in the real world.

Your subsequent posts demonstrate that you don’t comprehend my statement because you need to buy a clue concerning the subject under discussion.
I offer you some kindly and sincere advice: remember the First Rule Of Holes and stop digging.
Richard

LdB
August 19, 2013 7:49 am

Joseph Murphy says:
August 19, 2013 at 7:17 am
I need to learn how to quote. But anyways, energy = temp – work, if I am not mistaken, so no, energy and temp are not the same thing. Honest question, why would we consider the earth a closed system
Sure, so you just subtracted two things that are not the same and you got something else, so an apple – orange = banana 🙂 They are all the same thing, or you couldn’t do what you just did.
Put "change of" in front of energy in your equation and you are right, and yes, work is also energy 🙂
Earth isn’t closed; however, earth’s energy balance IS CLOSED BY DEFINITION
Earth’s Energy Balance extended definition:
Earth’s Energy balance describes how the incoming energy from the sun is used and returned to space. If incoming and outgoing energy are in balance, the earth’s temperature remains constant.

richardscourtney
August 19, 2013 8:00 am

LdB:
I see you have rejected my advice and have continued to ‘dig’ by posting your comment at August 19, 2013 at 7:49 am
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1394521
which says

If incoming and outgoing energy are in balance, the earth’s temperature remains constant.

NO!
If incoming and outgoing energy are in balance, the earth’s EFFECTIVE RADIATIVE TEMPERATURE remains constant.
Unless, of course, you want to claim that either
(a) global temperature is the same in glacial and interglacial periods
or
(b) the Sun is a unique g-type star because it is variable.
LdB, you have added another meter to the depth of your hole.
Richard

LdB
August 19, 2013 8:10 am

@Richardscourtney says:
August 19, 2013 at 7:45 am
NO! Not by a temperature measurement it won't.
Since you failed to understand the point made by Stephen Wilde, I offer you another example.
No, you failed to understand my answer… I understand what you think you are doing.
Temperature = energy … it is actually roughly the kinetic energy of the molecules; it's a slight approximation, but we can ignore that for now.
PE = energy … as Stephen says.
You can argue whether PE is climate change or not; that's up to each group's view. So in Stephen's example the excess energy is going into PE, but at least we know that and account for it, because he has to balance the full budget.
The likelihood of that energy coming back out to haunt you is something the groups can argue about, but you aren't arguing about "hiding energy"; everything is out on the table.
You also get large amounts of energy disappearing into chemical processes, predominantly photosynthesis by plants, and that can come back to haunt you in the same way.
Now some groups may add those different energies into climate change, some may not; at least everyone has the same total energy and there is no missing energy.
Wherever you want to put the energy is up to each group's framework, but anyone can easily follow what energy is where, and that is how you start to build reference frames.

Pamela Gray
August 19, 2013 8:14 am

There is no way to separate out a natural climate signal from an anthropogenic signal, because the two signals act exactly like each other. Natural climate is based on a statistical calculation from weather data, and so is an anthropogenic climate. They have the same up and down character and trends. You cannot filter one from the other and then say "this one" is anthropogenic and "that one" is natural. In human brains you cannot detect the auditory brainstem brainwave signal just by filtering through all the brainwave signals. You have to tickle the auditory nerve to fire on and off at regular intervals in order to then filter it out of background brainwaves. The same is true of weather pattern variations. You cannot simply take a stream of weather data and filter out such a tiny component unless you can cause some component of weather to fire at regular intervals. We would have to set up a large CO2 pump that we fire up at regular intervals into the background noise of weather (and do it thousands of times), and then remove the pumped-in CO2 just as rapidly, to see if there is a tiny regular signal buried in the background.
But who cares. We haven’t thoroughly studied the weather signal and its weather pattern variations yet. HUGE assumptions are still made about weather pattern variations (erroneously referred to as “noise”). Which is why I propose several different ways of averaging, combining, and graphing the temperature data, aka weather pattern variations. A global number is such a low hanging fruit statistic for such a complicated thing. The data and graphs we have for ocean and atmospheric oscillations are WAY ahead in that department.
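A toy sketch, with entirely made-up numbers and numpy assumed, of the evoked-response averaging idea Pamela describes: a tiny pulse forced at known, regular intervals can be averaged out of heavy noise, which is exactly what an unforced signal does not allow:

import numpy as np

rng = np.random.default_rng(1)
n_epochs, epoch_len = 10000, 100

pulse = np.zeros(epoch_len)
pulse[40:60] = 0.1                        # tiny forced "signal", far below the noise level

# noise sd 1.0: the pulse is invisible in any single epoch
epochs = pulse + rng.normal(0.0, 1.0, size=(n_epochs, epoch_len))

recovered = epochs.mean(axis=0)           # average epochs aligned to the known forcing times
print("baseline samples:", recovered[:5].round(2))     # near 0.0
print("pulse samples   :", recovered[45:50].round(2))  # near 0.1, give or take ~0.01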

LdB
August 19, 2013 8:25 am

richardscourtney says:
August 19, 2013 at 8:00 am
NO!
If incoming and outgoing energy are in balance, the earth's EFFECTIVE RADIATIVE TEMPERATURE remains constant.
Woot, you got that bit… we can do a full closure if you simply now measure the sun's output, because that varies.
So now we drill each of the earth energies apart.
Full balance … so you are monitoring energy in temperature change
Full balance … so you would be monitoring Stephen's PE
Full balance … so you would be monitoring chemical uptakes, ocean/plant etc
Full balance … so you monitor any thermal energy coming out of the earth
I am not a climate scientist; there are probably a lot more, like winds etc.
Different groups may argue about which things are climate change; some may want to stick Stephen's PE in, some won't, arguing they are only interested in temperature.
It matters not; you can easily work the views between those two reference frames.
So we have group A, who says temperature is the only climate change. Group B says no, climate change is temperature and PE gain.
Group B publishes a report saying PE is increasing, we have terrible climate change. Group A says no, look, the temperature is constant, there is no climate change. See, you can reconcile the two views, and eventually groups will start talking about energy, which is the reality of what you are trying to do.

E.M.Smith
Editor
August 19, 2013 9:30 am

The basic problem is that the warmistas use averaging as a panacea for errors. Averaging can only remove RANDOM error, not systematic error. Most of the error is systematic. Bad siting. Bad adjustment methods. Etc. You can sink months into trying to get them to understand that and get nowhere. I learned it in high school chemistry, but they didn’t, it would seem.
There is at least one degree F error bar being ignored.
The reality is that we can’t get 1/2 C warming measured. Period.
The other polite lie is that climate is a 30-year average of weather. It isn't. There are 60-year weather cycles, so a 30-year average as "climatology" is just a very wide systematic error.
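A small sketch (numpy assumed, idealized sinusoid) of the point about a 30-year "climatology" sampled from a 60-year cycle:

import numpy as np

t = np.arange(0, 120, 1.0)                    # 120 years, annual steps
cycle = np.sin(2 * np.pi * t / 60.0)          # a pure 60-year oscillation, amplitude 1, zero mean

# successive 30-year "climatologies" of a zero-mean cycle
for start in range(0, 120, 30):
    block = cycle[start:start + 30]
    print(f"years {start:3d}-{start + 29:3d}: mean = {block.mean():+.2f}")

# the 30-year means swing between roughly +0.64 and -0.64 of the amplitude, so a
# baseline taken from any one 30-year window carries that offset as a systematic bias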

richardscourtney
August 19, 2013 9:39 am

LdB:
I am replying to your post at August 19, 2013 at 8:25 am
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1394540
Firstly, I remind that this thread is about assessment of measurement errors in determinations of global temperature. It is NOT a forum intended for you to demonstrate your ignorance.
Your post makes no attempt to discuss my point which demonstrated you don’t understand the difference between the Earth’s global temperature and the Earth’s effective radiative temperature. Instead, it digs your hole deeper by another kilometer.
However, your post does have a slight relevance to this thread – or to be precise – refutation of it has a relevance.
You are asserting that the Earth's thermal balance determines the Earth's global temperature. That would only be true if the Earth had achieved thermal equilibrium, and the Earth NEVER achieves thermal equilibrium.
The pertinence of your claim to this thread is as follows.
1.
There has been a stasis in the Earth’s global temperature for more than 16 years.
2.
Climastrologists had predicted the Earth’s global temperature would rise over the period.
3.
Some climastrologists claim this failure of their prediction is because heat has gone into the deep ocean.
4.
The claim is improbable because there is no indication of the ‘missing heat’ in the oceans and no indication of how it got to deep ocean, but the claim is not impossible because there are possible ways the heat may have got there.
5.
The possibility of this claim alone demonstrates that your assertion is wrong over the time scales being assessed by determinations of global temperature: i.e. the thermal balance of the Earth does not determine global temperature over time scales less than centuries.
Hence, your assertion is (just) relevant to this thread because refutation of it (here listed as points 1 to 5) shows the importance of a clearly defined global temperature metric with sufficient accuracy to assess the change in global temperature over the time of the present century.
Richard

Pamela Gray
August 19, 2013 9:48 am

If we had regional 3-month moving averages of the kind we get with the various ENSO sections of the Pacific, we might have a much better understanding of "cycles". That said, I hate the word "cycles". I prefer weather pattern variations. It removes the connotation that weather cycles – or oscillations – can be mathematically cancelled out. We know from Bob's work that La Nina does not cancel out El Nino effects. Or vice versa. Weather pattern variations do not cancel each other out either.
What we need is, again, a set of 3-month running averages on a regional basis (and I would use the broad regions in the US as defined under the ENSO temperature and precipitation boundaries arrived at through statistical means) to better understand these patterns. Basically, whatever is used should be the same type of averaged data sets used to report oceanic and atmospheric data.

richardscourtney
August 19, 2013 10:07 am

Gail Combs:
Thankyou very much for your comment at August 19, 2013 at 9:50 am which draws attention to your earlier comment at August 18, 2013 at 2:23 am
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1393740
I, too, write to draw attention to your earlier comment which I have linked in this post.
It is directly pertinent to my point repeatedly made in this thread; viz.
There can be NO meaningful determination of the error in a datum, and there cannot be a calibration standard, for a metric which does not have an agreed definition.
My point is true for every metric including global temperature which has no agreed definition.
Richard

Gail Combs
August 19, 2013 10:34 am

Richard,
I was trying to make the point that the word ‘ERROR’ has been redefined to mean something entirely different than what it means to a Quality Engineer. Determining and minimizing systematic ‘Error’ is a major job for QC Engineers so using ‘Nugget’ in a computer model to redefine ‘Error’ goes against the grain.
Here is the NEW definition again.

…The nugget effect can be attributed to measurement errors or spatial sources of variation at distances smaller than the sampling interval or both. Measurement error occurs because of the error inherent in measuring devices…
http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//0031000000mq000000

No need to get off your duff and do the hard and frustrating work of chasing down systematic errors; just use a computer model for infilling data never measured and the 'nugget' for 'estimating error'.

August 19, 2013 10:47 am

Jeff, your critical position was that I had mistaken weather noise for error and that I had represented state uncertainty as an error. You publicly abandoned the first position as untenable after finally looking at the Figures and Tables in my paper, so I fail to see how your "critical position hasn't changed."

August 19, 2013 10:51 am

Right on, Gail. There's nothing like actually struggling with an instrument to give one a handle on the intractability of systematic error. The only way to deal with it is to find and eliminate the sources and/or to calibrate its impact against known standards.

richardscourtney
August 19, 2013 10:53 am

Gail Combs:
Thankyou for your post at August 19, 2013 at 10:34 am which says to me

I was trying to make the point that the word 'ERROR' has been redefined to mean something entirely different than what it means to a Quality Engineer.

Yes, I understood that and the definition used by “a Quality Engineer” was the definition used by scientists. As I said, I think your point and explanation are very important which is why I wrote to draw attention to it.
Indeed, although I am writing to acknowledge your post to me, I think your point to be so important that I will use this post as an excuse to again post a link to your point for the benefit of any who have not yet read it.
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1393740
Again, thankyou.
Richard

prjindigo
August 19, 2013 10:57 am

So what I’m getting from all this long long discussion is that the predictions of yearly temperature increases fall well within the first sigma of the ERROR of the maths being used.
A farmer once told me math couldn't be used to make labor easier: "a squared-up pile of bullshit may look neater but you still have to shovel it someplace".

Stephen Wilde
August 19, 2013 10:59 am

LdB.
The conversion of energy to and fro between KE and PE is a critical issue because in each case that conversion is a negative system response to the radiative characteristics of GHGs or any other forcing element other than more mass, more gravity or more insolation.
If a molecule absorbs more energy than those around it, it will rise until the excess KE is cancelled by the conversion to PE.
If a molecule absorbs less energy than those around it, it will fall until the KE deficit is restored by a conversion from PE.
The process of up and down convection automatically uses density differentials to sort molecules of differing thermal characteristics so that they alter position within the gravitational field, which cancels the thermal effect by juggling between KE and PE.
So the atmosphere freely expands and contracts according to compositional variations, but it all happens within the vertical column with no effect on surface temperature, except perhaps a regional or local redistribution of surface energy by a change in the global air circulation.
We see such redistributions of surface energy in the form of shifting climate zones and jet stream tracks but all the evidence is that sun and oceans cause similar effects that are magnitudes greater than the effect from changing GHG quantities.

E.M.Smith
Editor
August 19, 2013 12:19 pm

@Gail:
Oh dear. A whole ‘nother layer of jiggery pokery…
@Stephen Wilde:
At the point where he was equating temperature (KE by definition) with total energy, I wrote him off. Wasting your breath….

Gail Combs
August 19, 2013 12:26 pm

Richard and Pat,
I am not a statistician and I am 'Computer Challenged', though I have been on teams that used computers to do designed experiments, and I ran computer-assisted lab equipment.
What I am seeing is two sets of people. One set is engineers/applied scientists and the other set are theoretical and computer types.
To my mind the big question is: Is a temperature data point a single sample or can you, with statistical validity, group it with the temperature in the next city or hour or day?
The second question is IF you do such groupings what does that do to the error?
Here are three cities close together in NC (within 100 miles) with today's temp at 3:00 PM:
Mebane NC (76.1 °F) with Lat: 36.1° N
Durham NC (73.6 °F) with Lat: 36.0° N
Raleigh NC (72.2 °F) with Lat: 35.8° N
We know anomalies are used to deal with such differences, so let's look at yesterday's high, low, and mean.
This is what Weather Underground gives me for historical data for those cities (not exactly what I actually wanted):
Burlington, NC (75.0 °F) (66.9 °F) (70 °F)
Raleigh-Durham Airport NC (80.1 °F) (68.0 °F) (74 °F)
Raleigh-Durham Airport NC (80.1 °F) (68.0 °F) (74 °F)
However, note that today's Mebane/Burlington is ~3 °F warmer, while yesterday it was ~5 °F cooler. The computer and theoretical types will try to tell us they can perform magic and make the numbers the same. The engineers/applied scientists will start looking for WHY the numbers are different.
So back to the questions.
Is grouping these data points statistically valid and if you do group them does not the ERROR increase instead of decreasing?

richardscourtney
August 19, 2013 1:03 pm

Gail Combs:
Your post addressed to me and Pat at August 19, 2013 at 12:26 pm
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1394718
asks

Is grouping these data points statistically valid and if you do group them does not the ERROR increase instead of decreasing?

The grouping(s) may or may not be valid and the grouping(s) may increase or decrease the error depending on why you grouped them and how you grouped them.
For example, each datum you have presented provides temperature with known accuracy and precision for one place and time. If that information is what you wanted then it would be wrong to ‘group’ the datum with any of the others: that would provide a metric with less accuracy, less precision and greater error.
But you may want some average temperature for the region over the two days. The mean, median and mode are each a valid average. Do you want the mean, median or mode of the grouped data? Or do you want some other average (e.g. a weighted mean)? There are an infinite number of possible averages.
Which – if any – of these averages is an appropriate indicator of whatever you wanted an average to indicate? Choose the wrong one and you get an increase in error because the wrong definition of ‘average’ was applied. Choose the right one and you get an ‘average’ which is a better indicator of what you wanted than any one of the measurements: it has more accuracy, more precision, and less error.
Hence, as I keep saying:
There can be NO meaningful determination of the error in a datum, and there cannot be a calibration standard, for a metric which does not have an agreed definition.
This point is true for every metric including global temperature which has no agreed definition.
Richard
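A tiny illustration of how the choice of average matters, with hypothetical numbers (Python assumed):

import numpy as np
from collections import Counter

temps = np.array([72, 73, 73, 74, 74, 74, 75, 76, 90, 95])     # hypothetical daily maxima, deg F

print("mean  :", round(float(temps.mean()), 1))                 # 77.6, pulled up by the two hot days
print("median:", float(np.median(temps)))                       # 74.0, the typical day
print("mode  :", Counter(temps.tolist()).most_common(1)[0][0])  # 74, the most frequent value

Which of the three is the "right" indicator depends entirely on the question being asked, which is the point about needing an agreed definition before the error can mean anything.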

Jeff Cagle
August 19, 2013 2:08 pm

Gail,
Your question goes to the project I’m working on, which is a look at the ARGO data.
The temperature of an object is supposed to be a measure of the average — there’s the “a” word — kinetic energy of the particles in that object.
For a homogeneous object (such as a well-mixed beaker) this is trivial: stick a thermometer in. Repeat if needed. Average.
For a heterogeneous object (such as the atmosphere) this is not trivial. The thermometer reading at each point gives a local measure of the average kinetic energy of the particles in the vicinity of that thermometer. So averaging temperatures in the numeric sense is really an incorrect procedure. Temperatures do not add to give a physically meaningful quantity. Instead, what we really want to do is to add up the total heat represented by each of the measurements, and present that as “total heat content.” I believe this is the approach taken by BEST (“Berkeley Earth Temperature Averaging Process”, p. 2)
The total heat content divided by an “average heat capacity” would then give an average temperature (actually, BEST divides by area, which is different than I would do it. But they have a lot more letters behind their names, so I don’t say this to criticize).
But now: how to deal with error in measurements? One approach — the one my project takes — is to divide up the atmosphere or ocean into grids, assume as a simplification that the grid elements are each homogeneous, and then use the measurements within the grids as *samples* of the temperature of each grid. Compute heat content for each sample, then average. The smaller the grid elements, the more accurate this procedure will be.
This gives a kind of — *kind of* — Monte Carlo estimate of the total heat content.
The real issue, and the one my project focuses on, is computing variances for this quantity.
This is a long-winded way to answer your question. If you group temps over a larger area, you get more samples — your error of the mean goes down. But if you group temps over a larger area, you get more error in the total heat estimate for that cell. It becomes less true that each sample represents the average kinetic energy of a homogeneous cell grid.
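A rough sketch of the grid-and-sample bookkeeping Jeff describes; the cell values, sample counts, and per-cell heat capacities are all invented, and this is not BEST's or the ARGO project's actual code:

import numpy as np

rng = np.random.default_rng(2)

# hypothetical setup: 4 grid cells, each assumed homogeneous, a few temperature
# samples per cell (deg C), and a nominal heat capacity per cell (J/K)
samples = [rng.normal(mu, 0.3, size=n) for mu, n in [(15, 5), (18, 3), (10, 8), (22, 2)]]
heat_capacity = np.array([2.0e9, 2.0e9, 2.5e9, 1.5e9])        # made-up values

cell_means = np.array([s.mean() for s in samples])
cell_sems  = np.array([s.std(ddof=1) / np.sqrt(len(s)) for s in samples])

heat   = heat_capacity * cell_means           # heat content relative to 0 deg C, per cell
d_heat = heat_capacity * cell_sems            # sampling uncertainty scales with the capacity weight

total_heat  = heat.sum()
total_sigma = np.sqrt((d_heat ** 2).sum())    # add in quadrature, assuming independent cells

print(f"total heat content: {total_heat:.3e} J  +/- {total_sigma:.2e} J")

The fewer samples a cell has, and the less homogeneous it really is, the more the "homogeneous cell" simplification costs, which is the trade-off described above.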

Bart
August 19, 2013 2:17 pm

FTA: “One important corollary of this is that the final error estimate for a given monthā€™s anomaly cannot be smaller than the error in the climatology for that month.”
Yep. Statistical sleight of hand has been SOP during the whole fiasco. You cannot reduce error below the long term error using short term data. Most basic statistical tests assume independence of measurements at some level. When the data are not genuinely independent, those tests produce garbage.
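A toy illustration of the independence point, using an AR(1) series as a stand-in for autocorrelated data (all parameters assumed):

import numpy as np

rng = np.random.default_rng(3)

def ar1(n, phi=0.9):
    # one realization of a strongly autocorrelated series
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

n, trials = 360, 2000
means = np.array([ar1(n).mean() for _ in range(trials)])

naive_sem = ar1(n).std(ddof=1) / np.sqrt(n)     # the 1/sqrt(N) rule, valid only for independent data
true_sem  = means.std(ddof=1)                   # the actual spread of the mean across realizations

print(f"naive 1/sqrt(N) error: {naive_sem:.3f}")
print(f"actual error of mean : {true_sem:.3f}")   # several times larger for phi = 0.9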

dp
August 19, 2013 2:58 pm

I just followed LdB’s link to Wikipedia’s fable about temperature where they say this:

Between two bodies with the same temperature no heat flows.

This is utterly impossible. Countervailing heat flows between two bodies with the same temperature. Neither is aware of the temperature of the other. No intelligence is expressed in radiation as to the energy state of objects around radiating objects. They don't swap tales of heat exchange. They just radiate and, in turn, receive radiation. The rate of exchange is entirely dependent on the relative energy levels of the radiating objects.

Gail Combs
August 19, 2013 3:26 pm

Jeff Cagle,
I get what you mean about kinetic energy. (I assume you have to take into account the humidity)
However, if I am looking for the true value of the temperature in a mixed batch:
#1. There IS a true value.
#2. I need to make sure the instruments I use to take the test data are calibrated
#3. The more data points I take the better chance I have to ‘approach’ the true value.
#4. The more instruments I use and the more observers I use, the larger the error.
However if I recall from my very ancient stat courses, if I have a series of numbers
71
72
71
73
70
71
Then I can estimate the true value as 71.3 but I am not allowed to call it 71.3333.
If I am looking at collecting those numbers over a long time period I also have to take into account the systematic error: the stirring motor having a worn bearing and causing the value 73, an observer having a parallax problem, instrument drift, and such.
However, once you make it a dynamic process, with chemicals being added and finished product getting removed, the statistics start to get a lot more interesting (I worked for a chemical company doing continuous process).
You can get a better estimate of the true value using statistics; however, statistics cannot make up numbers out of thin air, it can't get rid of systematic error like UHI, and, more important, you are not allowed to go changing the historic data.

cd
August 19, 2013 4:13 pm

I don’t know why poor S Mosher gets such a bad rap around here. He comes here, into the Lion’s Den if you will, and never patronises nor insults others.
I could be wrong here, but when he talks about removing the seasonal trend I think he means accounting for cyclical changes in the global average through the year (wavelength = 6 months).
I'm not sure why Willis is spending time on this. IMHO it's rather a moot point and I think the post is a distraction.

cd
August 19, 2013 4:17 pm

Steven Mosher
Why, when using Kriging, do you not integrate the spatial uncertainty into your estimates? Reading the methodology, it appears that you use an indirect method of solving the Kriging system, which may not give you the Kriging variance. This may be more efficient, but you lose one of the main benefits of the technique. The argument that clustering is taken care of in the estimates is correct, and there are other methods that can be used to do this also, but one of the main reasons for using Kriging is that it gives you an estimate of uncertainty that is derived from the same statistic that is used to produce your estimate.
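For readers who have not met the kriging variance cd refers to, here is a minimal ordinary-kriging sketch in one dimension; the variogram model, its parameters, and the station values are invented and are not BEST's:

import numpy as np

def gamma(h, sill=1.0, corr_range=50.0):
    # exponential variogram model (assumed for illustration), gamma(0) = 0
    return sill * (1.0 - np.exp(-np.abs(h) / corr_range))

x  = np.array([0.0, 30.0, 80.0, 200.0])   # hypothetical station locations, km
z  = np.array([0.4, 0.7, 0.1, -0.3])      # hypothetical anomalies, deg C
x0 = 60.0                                 # interpolation point

n = len(x)
A = np.zeros((n + 1, n + 1))
A[:n, :n] = gamma(x[:, None] - x[None, :])
A[:n, n] = 1.0
A[n, :n] = 1.0                            # unbiasedness constraint row; A[n, n] stays 0

b = np.append(gamma(x - x0), 1.0)
sol = np.linalg.solve(A, b)
w, mu = sol[:n], sol[n]                   # kriging weights and Lagrange multiplier

estimate = w @ z
kriging_variance = w @ gamma(x - x0) + mu # falls out of the same linear system for free

print(f"estimate at x0   : {estimate:.3f}")
print(f"kriging variance : {kriging_variance:.3f}")

The point of the sketch is only that the per-point variance is a by-product of solving the kriging system directly; an indirect solver that skips it gives up that by-product.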

cd
August 19, 2013 4:20 pm

Sorry, a correction to my post of August 19, 2013 at 4:13 pm: 1/2 wavelength = 6 months.

August 19, 2013 5:03 pm

Gail Combs says:
August 19, 2013 at 12:26 pm

To my mind the big question is: Is a temperature data point a single sample or can you, with statistical validity, group it with the temperature in the next city or hour or day?

I think when you start looking at the actual measured data, the answer to this question puts the surface record set in a poor light.
For instance, Merrill Field in Anchorage was mentioned above. NCDC Summary of the Day lists 2 different stations for Merrill Field. Here's a swim-lane chart of the data (sorry for the format):
STATION_NUMBER 702735 702735
WBAN 99999 26409
LAT 61200 61217
LON -149833 -149855
ELEV +00420 +00418
NAME MERRILL FLD MERRILL FLD
CTRY US US
YR_1940 0 0
YR_1941 0 0
YR_1942 0 0
YR_1943 0 0
YR_1944 0 0
YR_1945 0 364
YR_1946 0 365
YR_1947 0 365
YR_1948 0 366
YR_1949 0 365
YR_1950 0 365
YR_1951 0 365
YR_1952 0 366
YR_1953 0 305
YR_1954 0 0
YR_1955 0 0
YR_1956 0 0
YR_1957 0 0
YR_1958 0 0
YR_1959 0 0
YR_1960 0 0
YR_1961 0 0
YR_1962 0 0
YR_1963 0 0
YR_1964 0 0
YR_1965 0 0
YR_1966 0 0
YR_1967 0 0
YR_1968 0 0
YR_1969 0 0
YR_1970 0 0
YR_1971 0 0
YR_1972 0 0
YR_1973 0 0
YR_1974 0 0
YR_1975 0 13
YR_1976 0 366
YR_1977 0 365
YR_1978 0 365
YR_1979 0 365
YR_1980 0 366
YR_1981 0 362
YR_1982 0 365
YR_1983 0 365
YR_1984 0 366
YR_1985 0 362
YR_1986 0 365
YR_1987 0 365
YR_1988 0 364
YR_1989 0 365
YR_1990 0 365
YR_1991 0 365
YR_1992 0 364
YR_1993 0 365
YR_1994 0 362
YR_1995 0 365
YR_1996 0 366
YR_1997 0 365
YR_1998 0 365
YR_1999 0 360
YR_2000 365 0
YR_2001 365 0
YR_2002 365 0
YR_2003 365 0
YR_2004 361 0
YR_2005 0 364
YR_2006 0 365
YR_2007 0 365
YR_2008 0 366
YR_2009 0 365
YR_2010 0 365
YR_2011 0 365
YR_2012 0 366
The count is the number of daily samples by year for each of the 2 stations.

August 19, 2013 5:09 pm

Here are the stations in the Anchorage area and the sample counts for them.
STATION_NUMBER 702730 999999 702725 702725 702735 702735 997381 702720 999999 702720 702736 702700 702746
WBAN 26451 26451 26491 99999 99999 26409 99999 99999 26452 26401 99999 99999 26497
LAT 61175 61175 61179 61183 61200 61217 61233 61250 61250 61253 61267 61267 61416
LON -149993 -149993 -149961 -149967 -149833 -149855 -149883 -149800 -149800 -149794 -149650 -149650 -149507
ELEV +00402 +00402 +00402 +00220 +00420 +00418 +00030 +00590 +00631 +00649 +01150 +01150 +00293
YR_1940 0 0 0 0 0 0 0 0 0 0 0 0 0
YR_1941 0 0 0 0 0 0 0 294 0 0 0 0 0
YR_1942 0 0 0 0 0 0 0 365 0 0 0 0 0
YR_1943 0 0 0 0 0 0 0 365 0 0 0 0 0
YR_1944 0 0 0 0 0 0 0 366 0 0 0 0 0
YR_1945 0 0 0 0 0 364 0 365 0 0 0 0 0
YR_1946 0 0 0 0 0 365 0 365 0 0 0 0 0
YR_1947 0 0 0 0 0 365 0 365 0 0 0 0 0
YR_1948 0 0 0 0 0 366 0 366 0 0 0 0 0
YR_1949 0 0 0 0 0 365 0 365 0 0 0 0 0
YR_1950 0 0 0 0 0 365 0 365 0 0 0 0 0
YR_1951 0 0 0 0 0 365 0 365 0 0 0 0 0
YR_1952 0 0 0 0 0 366 0 366 0 0 0 0 0
YR_1953 0 60 0 0 0 305 0 365 70 0 0 0 0
YR_1954 0 365 0 0 0 0 0 365 365 0 0 0 0
YR_1955 0 365 0 0 0 0 0 365 365 0 0 0 0
YR_1956 0 366 0 0 0 0 0 366 104 0 0 0 0
YR_1957 0 365 0 0 0 0 0 365 0 0 0 0 0
YR_1958 0 365 0 0 0 0 0 365 0 0 0 0 0
YR_1959 0 365 0 0 0 0 0 365 0 0 0 0 0
YR_1960 0 366 0 0 0 0 0 366 0 0 0 0 0
YR_1961 0 365 0 0 0 0 0 365 0 0 0 0 0
YR_1962 0 365 0 0 0 0 0 365 0 0 0 0 0
YR_1963 0 365 0 0 0 0 0 365 0 0 0 0 0
YR_1964 0 364 0 0 0 0 0 366 0 0 0 0 0
YR_1965 0 365 0 0 0 0 0 365 0 0 0 0 0
YR_1966 0 365 0 0 0 0 0 365 0 0 0 0 0
YR_1967 0 365 0 0 0 0 0 365 0 0 0 0 0
YR_1968 0 366 0 0 0 0 0 366 0 0 0 0 0
YR_1969 0 365 0 0 0 0 0 365 0 0 0 0 0
YR_1970 0 365 0 0 0 0 0 365 0 0 0 0 0
YR_1971 0 365 0 0 0 0 0 0 0 0 0 0 0
YR_1972 0 366 0 0 0 0 0 0 0 0 0 0 0
YR_1973 364 0 0 0 0 0 0 364 0 0 0 30 0
YR_1974 365 0 0 0 0 0 0 365 0 0 0 355 0
YR_1975 365 0 0 0 0 13 0 365 0 0 0 355 0
YR_1976 366 0 0 0 0 366 0 366 0 0 19 198 0
YR_1977 365 0 0 0 0 365 0 365 0 0 232 257 0
YR_1978 365 0 0 0 0 365 0 365 0 0 277 281 0
YR_1979 365 0 0 0 0 365 0 365 0 0 325 325 0
YR_1980 366 0 0 0 0 366 0 366 0 0 358 358 0
YR_1981 365 0 0 0 0 362 0 365 0 0 336 336 0
YR_1982 365 0 0 0 0 365 0 365 0 0 359 359 0
YR_1983 365 0 0 0 0 365 0 365 0 0 194 357 0
YR_1984 366 0 0 0 0 366 0 366 0 0 0 366 0
YR_1985 365 0 0 0 0 362 0 365 0 0 0 365 0
YR_1986 365 0 0 0 0 365 0 365 0 0 0 365 0
YR_1987 365 0 0 0 0 365 0 365 0 0 0 365 0
YR_1988 366 0 0 0 0 364 0 366 0 0 0 366 0
YR_1989 365 0 0 0 0 365 0 365 0 0 0 365 0
YR_1990 365 0 0 0 0 365 0 365 0 0 0 365 0
YR_1991 365 0 0 0 0 365 0 365 0 0 0 365 0
YR_1992 366 0 0 0 0 364 0 366 0 0 0 366 0
YR_1993 365 0 0 8 0 365 0 363 0 0 0 355 0
YR_1994 365 0 0 332 0 362 0 365 0 0 0 269 0
YR_1995 365 0 0 313 0 365 0 365 0 0 0 162 0
YR_1996 366 0 0 323 0 366 0 359 0 0 0 0 0
YR_1997 365 0 0 316 0 365 0 365 0 0 0 0 0
YR_1998 365 0 0 339 0 365 0 365 0 0 0 0 0
YR_1999 365 0 0 363 0 360 0 363 0 0 0 0 0
YR_2000 366 0 0 366 365 0 0 366 0 0 0 0 0
YR_2001 365 0 0 365 365 0 0 365 0 0 0 0 0
YR_2002 365 0 362 0 365 0 0 362 0 0 0 0 0
YR_2003 365 0 365 0 365 0 0 365 0 0 0 0 0
YR_2004 366 0 358 0 361 0 0 366 0 0 0 0 0
YR_2005 365 0 365 0 0 364 184 365 0 0 0 0 0
YR_2006 365 0 365 0 0 365 356 0 0 364 0 0 365
YR_2007 365 0 365 0 0 365 365 0 0 365 0 0 362
YR_2008 366 0 366 0 0 366 366 0 0 366 0 0 362
YR_2009 365 0 365 0 0 365 347 0 0 365 0 0 352
YR_2010 365 0 365 0 0 365 355 0 0 365 0 0 359
YR_2011 365 0 365 0 0 365 365 0 0 365 0 0 360
YR_2012 366 0 366 0 0 366 364 0 0 366 0 0 336

1sky1
August 19, 2013 5:27 pm

Long-term systematic bias due to UHI and land-use changes, rather than sampling variability of "anomalies," is what afflicts BEST's results the most. "Kriging" spreads that bias spatially, resulting in the highest trends, and "scalpeling" produces the lowest low-frequency spectral content in their manufactured time series.

LdB
August 19, 2013 6:53 pm

@Stephen Wilde says:
August 19, 2013 at 10:59 am
The conversion of energy to and fro between KE and PE is a critical issue because in each case that conversion is a negative system response to the radiative characteristics of GHGs or any other forcing element other than more mass, more gravity or more insolation.
That may be true, Stephen, I don't know, and it's not something I really care about. As I said, you guys, even reparameterised, will still argue about what is and isn't climate change; you will break into groups, I expect it.
The issue I was dealing with was trying to get people to realize that this perceived problem of the definition of temperature and measuring point and the like is all garbage.
These terms are all junk classic physics, and people are trying to give them some sort of precise meaning, thinking that makes the problem better. I mean, temperature is not even a real thing; it's a group of quantum properties that sort of looks a bit like kinetic energy, but really, at its basis, it goes back to the historic property that a group of quantum properties made a liquid expand up a tube when you heated it.
What they lost in their argument was that they couldn't get a common reference, because classic physics is wrong and junk and has been for 100 years, and you can't define your way out of it. The most important thing that temperature = energy (because energy equals quantum information) got lost in the stupidity that is classic physics. It still stuns me that people don't realize that fact, because they lost its meaning in the mumbo-jumbo garbage that is classic physics.
Energy is the universal reference frame that works everywhere in the universe that we know, and all the hard sciences reference it because of that fact. If climate science is wallowing in inane arguments about definitions, then it simply needs to reference itself to energy like every other science, and that was the point I was making.

August 19, 2013 10:13 pm

Gail, good questions. Before getting to them, let me just clarify that my interest is in the minimalist physical accuracy of the measured temperatures themselves.
This means the accuracy of the thermometer (sensor) itself, under conditions of ideal siting and repair. The accuracy of any reported temperature cannot be any better than obtained under those conditions. UHI, siting irregularities, instrumental changes, etc., etc., only add to the minimum error due to the limit of instrumental accuracy. I’m interested in that maximum of accuracy and minimum of error for any given temperature measurement.
To your questions: First, any temperature measurement is unique and physically independent in terms of it being a separate measurement. It is like a single UV-visible spectrum: its inherent magnitude and properties depend in no way on any system you’ve measured before.
The magnitude and properties of the measured magnitude do depend in some way on the instrument one used, just like the UV-vis spectrum. Resolution, accuracy, linearity, and so on. Instrumental characteristics and errors don’t affect the properties of the observable. They affect the properties of the observation. Hence the need for calibration.
As with electronic spectra, one can combine temperatures from various places and in various ways — but one must keep very careful track of what one is doing, and why, and what one is trying to accomplish. One can linearly combine separate and independent electronic spectra to estimate the effects of an impurity, for example, but one cannot represent that composite *as* the spectrum of product plus impurity.
So, anyone can average temperatures from anywhere, but must keep careful track of exactly what that average means, and must report only that meaning. An average of temperatures from San Francisco and New York might be interesting to track and one could announce how that average changes, and how SF and NY depart from their common average over time.
The physical uncertainty in the average will depend on the accuracy of the measurements taken from each of the two thermometers. This uncertainty tells us about the error bars around the average — its reliability in other words. If the departure of an anomaly is less than the uncertainty in the average, then that anomaly has no obvious physical meaning with respect to anything going on with temperature at SF (or NY).
There is a second kind of uncertainty, which follows from your second question. This second uncertainty is not an error, but instead is a measure of the variability inherent in the system: state uncertainty.
Here’s an example: Suppose we can measure individual temperatures at SF and NY to infinite physical accuracy. We have this daily 2 pm temperature series (in C):
NY SF
25 15
22 17
18 18
25 16
28 19
Avg: 20.3(+/-)4.4 C
That (+/-)4.4 C is not an error, but it is an uncertainty. It’s a measure of how the day-by-day average bounces around. It’s a measure of the variation in the state of the system, where the system is SF + NY. Concisely, it’s state uncertainty.
Suppose after 100 years of data, the state uncertainty remains (+/-)4.4 C, then this magnitude is a measure of the inherent variability in the system at any given time over that century. Any annual SF-NY average takes its meaning only within the state variability of (+/-)4.4 C.
So, if one wanted to compare annual SF and NY anomalies vs. a centennial average of temperatures, one would have to take cognizance of the fact that the 20.3 C average by itself is not a true representation of the system. The true representation includes the inherent state variability. To evaluate the divergence of any one anomaly would require assessing it against the (+/-)4.4 C variability inherent to the system.
Turning off the infinite accuracy in our measurements, the uncertainty in any pair average temperature is the root-mean-square of the individual measurement errors. The uncertainty in the centennial average temperature is the r.m.s. of the measurement errors in all 73,000 individual measurements.
The uncertainty in any anomaly is the r.m.s. of the individual temperature measurement error and the uncertainty in the centennial average. I.e., the uncertainty in the anomaly is larger than the error in the measured temperature from which it’s derived. This uncertainty is a measure of the physical accuracy of the anomaly, the reliability of the anomaly — how much we can trust the number.
Finally, the physical meaning of an anomaly is determined by the combined physical accuracy and the state uncertainty.
That’s a long answer to your questions, but I hope it’s readable.
After all that, notice Jeff Cagle’s description of the ARGO SST methodology. No disrespect meant, but notice there’s not one word about the accuracy of the individual ARGO measurements.
ARGO buoys have never been field-calibrated. They’re calibrated on land, and then released. No one knows whether the temperatures they measure at sea reach the accuracy of the land calibrations. Re-calibrations on land show no drift, but this is not the same as knowing whether environmental exposure affects the temperature readings.
As an example using the more familiar land-station, the ARGO project is like lab-testing a land-station PRT after a year, finding that it hadn’t drifted, and deciding that therefore all the outside recorded temperatures were good to (+/-)0.1 C (lab accuracy).
Such a decision would ignore that the first-order inaccuracies in measured land temperatures all come from environmental effects that mostly impact the screen, rather than from sensor failures. Nearby drifting buoys that should record the same sea surface temperature are known to show a bias (difference) of ~0.15 C, and an r.m.s. divergence of (+/-)0.5 C. [1] Is this divergence included in SST uncertainty estimates, or is it just averaged away as 1/sqrt(N), as though it were mere Gaussian noise? Probably the latter, I’m sorry to say.
[1] W. J. Emery, et al., (2001) “Accuracy of in situ sea surface temperatures used to calibrate infrared satellite measurements” JGR 106(C2) 2387-2405.
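A short sketch that simply mechanizes the arithmetic Pat lays out above, using his NY/SF numbers; the 0.5 C per-reading error is an assumption for illustration, and the r.m.s. combination follows the rule stated in the comment rather than any official procedure:

import numpy as np

ny = np.array([25, 22, 18, 25, 28], dtype=float)   # the hypothetical 2 pm series above, deg C
sf = np.array([15, 17, 18, 16, 19], dtype=float)
both = np.concatenate([ny, sf])

print(f"average           : {both.mean():.1f} C")           # 20.3 C
print(f"state uncertainty : +/-{both.std(ddof=1):.1f} C")    # +/-4.4 C: variability, not error

sigma_reading = 0.5                                          # assumed instrumental error per reading

def rms(errors):
    return np.sqrt(np.mean(np.square(errors)))

sigma_baseline = rms(np.full(73000, sigma_reading))          # stays 0.5 C no matter how many readings
sigma_anomaly  = np.sqrt(sigma_reading**2 + sigma_baseline**2)

print(f"anomaly uncertainty: +/-{sigma_anomaly:.2f} C")      # larger than either contributing error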

August 19, 2013 10:47 pm

@ Pat
I take it that with the Argo buoys you are referring to something similar to what happened when several different temperature sensors were tested inside a Stevenson screen here in Oz. While all the sensors agreed extremely closely on what the freezing and boiling points of water are when calibrated, they didn't agree on the temperatures when placed inside the Stevenson screen. And position inside the screen affected the measured temperature as well.

Gail Combs
August 20, 2013 2:15 am

Pat Frank says: @ August 19, 2013 at 10:13 pm
…..
Thanks Pat,
I could follow your comment without a problem and understand exactly what you are talking about.
As I said, my Stat courses are decades old so I do not have the words. However, after sitting through screaming matches between the analytical chemist who devised the test method and the mix-room manager with out-of-spec batches, and being the one having to straighten the mess out, I have a pretty good feel for error from the test method and for variability inherent in the batch due to poor mixing or, worse, electrostatic forces that cause the batch to segregate the chemicals the more it is mixed.
Slapping some liquids in a tank and turning on a mixer, or dumping some finely ground solids in a tumbler and turning it on, does NOT mean you are going to get a uniform batch. Hence my cynicism when it comes to the 'CO2 is well mixed in the atmosphere' assumption and the similar assumptions I see being made with temperature: that temperature does not vary much over a wide area, and that a few points can effectively give you an accurate picture of the true temperature with the precision claimed. Past experience says that just doesn't pass the smell test, even though I can't explain why.
And as LdB and others have said time and again, temperature is the WRONG parameter to be measuring in the first place, but we seem to be stuck with it.

cd
August 20, 2013 4:12 am

1sky1
Is your point about Kriging true? It depends, and is highly dependent on what type of kriging methodology is employed.
But I agree that the issue should not revolve around which gridding system is best but whether or not the raw data is reliable and unbiased. I have noticed that in this field everyone is data-processing mad, ignoring the most important step in science: experimental setup.

richardscourtney
August 20, 2013 5:29 am

Friends:
LdB says at August 19, 2013 at 6:53 pm

The most important thing that temperature = energy (because energy equals quantum information) got lost in the stupidity that is classic physics.

{emphasis added: RSC}
I should have seen that coming. Another Myrrh has turned up.
Richard

richardscourtney
August 20, 2013 5:35 am

Pat Frank:
You provide an excellent post at August 19, 2013 at 10:13 pm.
I write to draw attention to it and provide this link which jumps to it for any who may have missed it
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1395124
Richard

LdB
August 20, 2013 5:56 am

@Gail Combs says:
August 20, 2013 at 2:15 am
And as LdB and others have said time and again temperature is the WRONG parameter to be measuring in the first place but we seem to be stuck with it
Haha, true, but hey, if we have a brave and crazy game to try… I can give you the proper definitions, but you probably need a quick intro course at uni. The problem is climate scientists probably don't want to go back to uni, and to be fair they do manage to get it right using classic simplifications most times 🙂
DEFINITIONS FOR THE BRAVE AND CRAZY (I think they are right; classic physics always freaks me out… nah, they are right, I cheated and decided to copy Lasalle)
Energy of particle: energy associated with the occupied quantum state of a single particle – many quantum levels are available, but only one is occupied at any point in time
Thermal modes: those quantum interactions with energy gaps small enough that changes in temperature can effect a change in the population of states
Non-thermal modes: those quantum interactions whose energy gaps between adjacent quantum states are too large for population to be affected by temperature
Energy expectation value <E>: <E> = average energy for a single molecule… averaged across all possible quantum states… and weighted by the probability of each state
Total energy: N·<E>, where N = total number of particles
Energy: the capacity to do work
Internal energy of a system (U): combined energy of all the molecular states
Heat (q): thermal transfer of energy to/from the system to the surroundings. Occurs through random collisions of neighboring molecules.
Temperature (T): parameter that describes the energy distribution across the quantum states available to the system
Thermal energy: kT = average Boltzmann energy level of molecules in the surroundings
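A small worked example of the Boltzmann-weighted expectation value <E> in the list above, for a made-up ladder of energy levels (numpy assumed):

import numpy as np

k_B = 1.380649e-23   # J/K

def energy_expectation(levels, T):
    # Boltzmann-weighted <E> over the listed, non-degenerate levels
    w = np.exp(-levels / (k_B * T))
    return (levels * w).sum() / w.sum()

levels = np.arange(5) * 2.0e-21   # J; spacing chosen to be comparable to kT near 300 K

for T in (150.0, 300.0, 600.0):
    print(f"T = {T:5.0f} K   <E> = {energy_expectation(levels, T):.2e} J   kT = {k_B * T:.2e} J")

As temperature rises, higher levels become more populated and <E> climbs toward the middle of the ladder, which is all the "energy distribution across quantum states" definition of T is saying.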

Gail Combs
August 20, 2013 6:35 am

LdB says…
……………..
ARRGHhhh, Science Fiction Physics and Thermo! I passed those courses by the skin of my teeth and a lot of late nights studying.
When I was in school (as a chemist) we didn’t have to take Stat. at all.
>>>>>>>>>>>>>>>>>>>>>
cd says:
…that the issue should not revolve around which gridding system is best but whether or not the raw data is reliable and unbiased. I have noticed that in this field everyone is data-processing mad, ignoring the most important step in science: experimental setup.
………………
Correct. And I think the point J. W. Merks, in Geostatistics: [Kriging] From Human Error to Scientific Fraud, is trying to get across is that you cannot make a silk purse out of a sow's ear.
It all goes back to the assumptions you make. You assume the data is reliable and unbiased; you assume there is enough data to define the surface Kriging is describing.
Merks' page Sampling Paradox lists the problems we are trying to get at.
[QUOTE]
Mathematical Statistics vs. Geostatistics:
- Functional independence fundamental vs. Functional dependence ubiquitous
- Weighted averages have variances vs. Kriged estimates lack variances
- Variances are statistically sound vs. Kriging variances are pseudo variances
- Spatial dependence verified vs. Spatial dependence assumed
- Degrees of freedom indispensable vs. Degrees of freedom dismissed
- Unbiased confidence limits quantify risk vs. Unbiased confidence limits are lacking
- Variograms display spatial dependence vs. Semi-variograms make pseudo science
- Smoothing makes no statistical sense vs. Smoothing makes geostatistical sense
- Mathematical statistics is a science vs. Geostatistics is a scientific fraud
[I] wonder about the nimble workings of geostatistical minds as degrees of freedom became a burden when a small set of measured data gives a large set of calculated distance-weighted averages-cum-kriged estimates.
[UNQUOTE]
That last line is exactly what I am trying to articulate.

August 20, 2013 8:52 am

PG, you’ve got it by the short hairs. šŸ™‚

cd
August 20, 2013 8:59 am

Gail
I don’t know where this hatred of a generally accepted statistical methodology comes from – its not an entity trying to profligate some view of the world. Kriging, or Spatial Linear Regression, as a statistician would probably refer to it, is like any other statistical method – just that…a method. It makes assumptions that work well with one dataset and less well with another (because the underlying mathematical assumptions aren’t met). But again there are always work arounds and hence the palette of Kriging methods. It is by far and away the most sophisticated and robust gridding method because it doesn’t make any assumption beyond what the experimentally derived bivariate statistic (the variogram) tells us. So where we have sparse data it does not manufacture a trend when interpolating between control points, if the range of spatial “correlation” is exceeded. That’s one of its key strengths – it doesn’t, if you do the experimental stage and statistical stage correctly, manufacture artifacts where there isn’t any information.
My argument is that the improved “accuracy” of using more robust data processing methods is far out weighed by the benefits one would get from getting more accurate observations.

August 20, 2013 9:07 am

Gail, summers as an undergrad, I worked in the analytical lab for a small detergent mixing house, now defunct, called Klix Chemicals. They made large batches of powdered and liquid detergents, mostly for janitorial supply and for the military (MILSPEC was really something; detailed rules for guidance of the inept) and occasionally saponified 4000 gallons of vegetable oil to make Castile soap. I got to titrate stuff for total alkalinity, assess [phosphate], etc., and wash windows and cloth patches to test the efficacy of our products. So, I have an idea of your experience. No screaming matches as I recall, but those big mixers were something.

Gail Combs
August 20, 2013 10:18 am

cd says: @ August 20, 2013 at 8:59 am
I don't know where this hatred of a generally accepted statistical methodology….
>>>>>>>>>>>>>>>>>
First, whenever I see "a generally accepted statistical methodology" a red flag goes up, because that is the same as saying the science is settled. In my work experience it meant: we know we screwed up, we know we have been doing it wrong, but we are not going to admit it. Instead we are going to take the average of all our factories, call that the "generally accepted value", and hope our customers don't catch on.
Again, I am 'Computer Challenged' with a minor bit of statistics training; however, I have worked with statistical computer programs in industry long enough to cringe after seeing the many ways they get used incorrectly by scientists. Also, I may be a lightweight but Merks is not. More important, he is dealing not with Climate Science, where there is no real way to go back and check whether the Kriging works, but with an area where that checking can be done.
I am bringing this viewpoint up to the readers of WUWT because there seems to be a general acceptance of a method few have any knowledge of, and I feel this is very dangerous.
Merks says of himself:

I am an author, a consultant, and a lecturer….
I worked at the Port of Rotterdam, the world’s largest port for bulk solids and liquids, and at the Port of Vancouver, Canada’s largest port in the Pacific Northwest. A background in analytical chemistry, chemical engineering, mining engineering, and mathematical statistics underpin my career in metrology, the science of measurement, as it applies to the international commodity trade in general, and to mineral exploration, mining, processing, smelting and refining in particular.
I was Vice President, Quality Control Services, with the SGS Organization, a worldwide network of inspection companies that acts as referee between international trading partners….
I performed technical audits for clients in Australia, Canada, Europe, South America and the USA.
I used the concept of bias detection limits for statistical risks as a measure for the power of Student’s t-test, the bias test par excellence. I defined the concept of probable ranges as a measure for the limits within which an observed bias is expected to fall. I conceptualized and evaluated a mechanical sampling system for cathode copper that became the de facto standard method at copper refineries in different parts of the world. I designed mechanical sampling systems and modules to routinely select pairs of interleaving primary samples from crushed ore and slurry flows.
In the early 1990s, I reported to the Canadian Institute of Mining, Metallurgy and Petroleum and to the Ontario Securities Commission that geostatistics [Kriging] is an invalid variant of mathematical statistics…..
link

OKAY, I am a rabble-rouser, but without a discussion of the pros and cons we don't know if Merks is correct or not. Saying "…hatred of a generally accepted statistical methodology…" doesn't cut it as a discussion.
My husband, who is trained as a physicist and does computer work, mutters "Nyquist frequency". He had the care and feeding of the computers at MIT Lincoln Lab that were used by seismologists and electrical engineers. He has a worse opinion of Kriging than I do. (I asked him to help me understand it.)
He says:
One of the problems that had to be overcome was the belief that computers invariably represent real numbers accurately. He believed that although all the scientists in the group knew about the problem, because of expediency an individual scientist might believe his work was immune to it.

Gail Combs
August 20, 2013 10:37 am

Pat Frank, yes, those big mixers give you a real appreciation of the 'well mixed' concept, don't they? (Darn it, NO, you can't cut the mixing time by fifteen minutes so you can fit in another batch….)
Coming up with the correct sampling plan (and the correct parameters to measure) was always the biggest headache in QC.

Ken
August 20, 2013 12:03 pm

@cd >> I'm not sure why Willis is spending time on this.
Because he wants to. Does there need to be another reason?

1sky1
August 20, 2013 1:18 pm

cd:
In many regions of the globe the only century-long station records available are from major cities, whose temperatures manifest various UHI effects. No matter what “kriging” method is used in those regions, the systematic, but highly localized, urban bias is spread much more widely. The uncorrupted regional average temperature signal is rarely known with any accuracy.
In many cases, due to sparse spatio-temporal coverage, BEST winds up manufacturing a “local” time-series employing very distant urban stations that are not even subject to the same weather regimes. The claim that their kriging algorithm consistently produces accurate time-series results in 3D space is entirely specious. You can’t get such results from absent and/or faulty data.

cd
August 20, 2013 1:37 pm

Ken – fair enough.

cd
August 20, 2013 1:55 pm

Gail
You misunderstand. Statistical methodologies are developed in order to deal with specific problems – there is only a judgement for the best method based on the central aims. It really doesn't matter what your opinion is, or your husband's, nor do his qualifications. The only thing that matters is that you use the best tools for a specific problem. For example, using the arithmetic mean is not a good measure of central tendency in a log-normal distribution, but it doesn't mean it isn't a good approach in other circumstances.
I'm not going to get into the whole "after several years working as a mathematical genius (or this or that)…". It adds no weight to your argument. For spatial interpolation there are many methods: exact vs inexact, biased vs unbiased, those that honour the regional gradient and those that don't. Kriging is, in my opinion, the best, and by some measure. It may not give you the most interesting pictures, but then that should not be the aim of the choice of algorithm.
Again, there is no single Kriging method; it is more of a paradigm that approaches spatial regression from a particular angle. For example, the quoted critique you provided mentions pseudo-variance, and indeed routine types of Kriging do this, but if you want a reliable cdf, indicator kriging of continuous variables would be the choice.
So, since we're all wrong, what would your choice of gridding algorithm be?

cd
August 20, 2013 2:00 pm

Gail
BTW, what has the Nyquist frequency to do with Kriging, unless you're trying to Krige a signal or maybe using an FFT-based algorithm for solving the large linear systems that can arise from Kriging with large numbers of controls? If you can sample at (or above) the Nyquist frequency you really only need a spline; Kriging would be overkill!
Perhaps you can expand.

cd
August 20, 2013 2:19 pm

1sky1
I’m not disagreeing with you. I don’t think BEST ever claimed to eradicate the UHI effect using Kriging. I don’t think you can. As I remember it, they did suggest it did resolve the problem of data clustering – which it does.

richardscourtney
August 20, 2013 2:24 pm

cd:
Your post to Gail Combs at August 20, 2013 at 1:55 pm
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1395592
makes a point and asks a question which I write to address.
You say

Statistical methodologies are developed in order to deal with specific problems – there is only a judgement for the best method based on the central aims.

Yes, I explained that in my post to Gail at August 19, 2013 at 1:03 pm
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1394753
And that post from me also hinted at my answer to your question; viz.

So, since we're all wrong, what would your choice of gridding algorithm be?

I would not have one.
I would admit that no meaningful result is obtainable.
And I would point out that a wrong result is more misleading than no result.
Richard

cd
August 20, 2013 2:58 pm

Richard
I don’t know what you mean by grouping data points together. Kriging works on controls, the only real data you have; conceptually each interpolation point lies on “known” correlation surfaces (defined via the variogram) centered about (and for) each control point. Therefore, there will be a unique solution that will satisfy all the surfaces for all the points given the statistical model (the variogram).
Now if you’re saying that the controls are not accurate measurements, then that is different argument to the one I am making.

1sky1
August 20, 2013 3:08 pm

cd:
I realize that you’re not disagreeing with me. What I’m pointing out to the general audience is the
flimsy basis of BEST’s claim that, after subjecting all available data to its “quality control” and “analysis,” there was no significant difference evident in trends between urban and “rural” records. Had they actually examined vetted century-long records, instead of resorting to piecemeal syntheses from scraps of data, the foolishness of such a claim would have been evident.
BTW, in any serious scientific study of spatial variability using discrete data, aliasing is no less a concern than in temporal analysis. The mathematically convenient property of smooth changes over widely homogenous fields has to be empirically established, instead of simply being decreed by analytic fiat.

richardscourtney
August 20, 2013 3:21 pm

cd:
Thankyou for your reply to me at August 20, 2013 at 2:58 pm
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1395639
I am answering the two points to which I think you want me to respond. If I have missed any then please tell me.
Firstly, you say to me

I don't know what you mean by grouping data points together.

I was answering the question put to me by Gail Combs
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1394718
and I was using her terminology. I understood her phrase “grouping data points” to mean ‘obtaining an average’. And my answer addressed that
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1394753
You also say to me

Now if you're saying that the controls are not accurate measurements, then that is a different argument to the one I am making.

No. I did not mention Kriging – or any other methodology – so the use of controls was not my point.
I was making a general point in answer to your question, and my point covers every type of estimating and averaging (including Kriging).
I remind that you asked

So, since we're all wrong, what would your choice of gridding algorithm be?

and I answered

I would not have one.
I would admit that no meaningful result is obtainable.
And I would point out that a wrong result is more misleading than no result.

In other words, I would openly admit that there is no valid method available to obtain a meaningful average and, therefore, I would advise that no average should be calculated because any obtained result would be more misleading than coping with the absence of the desired average.
Richard

cd
August 20, 2013 3:48 pm

1sky1
Thanks for the reply. I agree with your points. However, is this practically possible? Either way, their claim is a little stretched.
On your second point, and not wanting to go off in a different direction, I'm not sure exactly what you mean by homogeneous fields in relation to spatial data. But I can't see why one would get hung up on aliasing when carrying out spatial interpolation; the aim is akin to, but different from, signal processing. In the real world pragmatism is required: one must make some type of reasonable estimate based on limited and sparse data. In kriging the variogram can be seen just as a prior, even if an incomplete one. If the aim were to preserve signal integrity or to avoid processing artifacts resulting from aliasing, then one would require exhaustive sampling.
I think a lot of comments here seem to be asking a lot and applying standards that one might expect in a controlled laboratory environment to the global environment.
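
For what it’s worth, the “variogram as a prior” that cd mentions is usually estimated by binning half the squared differences of the data by separation distance (the classical Matheron estimator). A minimal sketch with made-up station coordinates and values:

```python
import numpy as np

def empirical_variogram(xy, z, bin_edges):
    """Classical (Matheron) estimator: for each distance bin, the mean of
    0.5 * (z_i - z_j)^2 over all station pairs whose separation falls in it."""
    i, j = np.triu_indices(len(z), k=1)
    h = np.linalg.norm(xy[i] - xy[j], axis=-1)
    halfsq = 0.5 * (z[i] - z[j]) ** 2
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for b in range(len(bin_edges) - 1):
        in_bin = (h >= bin_edges[b]) & (h < bin_edges[b + 1])
        if in_bin.any():
            gamma[b] = halfsq[in_bin].mean()
    return gamma

# Made-up station coordinates (km) and one month's anomalies -- not real data.
rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 1000.0, size=(40, 2))
z = np.sin(xy[:, 0] / 300.0) + 0.2 * rng.standard_normal(40)
print(empirical_variogram(xy, z, np.arange(0.0, 1001.0, 200.0)))
```

A model curve (spherical, exponential, etc.) fitted to these binned values is the “prior” that then drives the kriging weights.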

cd
August 20, 2013 3:58 pm

OK, Richard, we obviously got our wires crossed. I think there is no harm in trying; the danger is not in the estimates, it’s in the reported high degrees of confidence.

LdB
August 20, 2013 4:55 pm

@cd says:
August 20, 2013 at 3:48 pm
I think a lot of comments here seem to be asking a lot and applying standards that one might expect in a controlled laboratory environment to the global environment.
Totally agree with you @cd, and worse, many seem not to understand what they are dealing with because they have lost, somewhere in the haze of classical physics, the realization that all this stuff is energy.
I am sitting on the fence: I think climate change probably is real, but I am damned if I accept any of the accuracies that I am being asked to accept, and the statistics is just making it worse.

1sky1
August 20, 2013 6:35 pm

cd:
Admittedly, vetted century-long station records from effectively non-urban sites are available only sparsely around the globe. Nevertheless, such records are required to avoid seriously trend-biased estimates of GSAT. As a practical scientific matter, one must opt for geographically incomplete coverage by reliable station data at fixed locations over the illusion of continuous coverage provided by kriging variously corrupted data stitched together in time from ever-changing locations.
This mandate is made all the more imperative by the empirical recognition that the temperature field is usually NOT spatially homogeneous (invariant, aside from a constant offset and a scale factor) over distances greater than a few hundred km. Nor is it isotropic (directionally independent), as assumed by BEST’s universal “correlation length” – i.e., their effective “variogram”. The real world is considerably more complex than that!
And then there’s temporal variability – a feature generally not treated adequately in geostatistics. That’s why I prefer to work with cross-spectral techniques in estimating regional temperature variations. Contrary to patent academic hubris, no reliable average time-history can be obtained in many regions around the globe for all the years prior to the satellite era.

RACookPE1978
Editor
August 20, 2013 6:53 pm

I am much more troubled by the assumption – apparently required throughout this entire conversation – that any given temperature (weather, that is) at any given place at any time during the centuries can be represented by an “average” temperature WITH a plus-or-minus “error” before any statistical processing can even begin.
Statistical Process Control, and hence its foundation of statistical processing, even something as basic as averages and standard deviations, MUST begin with repeated measurements of the same thing, or of similar things repeatedly measured the same way.
But temperature is NOT a standard “thing”. It is NOT static, nor does it change linearly, straightforwardly, or in the same direction every time. It is chaotic. It is NOT measured several times to “get an average”. Temperatures were (are!) measured twice a day. The measurements are never repeated: the next day, under the next day’s “weather”, gets two more unique measurements. Over time (a decade or a quarter century) there may emerge a trend in successive unique temperature measurements, but they NEVER repeat the same “weather”. Ever.
If I have 10,000 ball bearings coming down a chute every day, I can tell you what the standard deviation is of the set, what the average is, and what the error might be in my measurement tool or in the grinding wheel or the bar stock. But I cannot do that for the daily temperatures. If that run of 10,000 1.000 cm ball bearings is combined with 5,000 2.000 cm ball bearings, I don’t have 15,000 “average” ball bearings. I still have two sets of unique ball bearings.
With daily temperatures, the location matters and time-of-day matters. But “mushing” that unique data over ever wider and wider geographic regions to make it appear that the heating was widespread and dramatic only started Hansen’s original series of errors.
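
A trivial arithmetic check of the ball-bearing point, using the numbers given above: the pooled mean describes a bearing size that does not exist in either set.

```python
import numpy as np

small = np.full(10000, 1.000)              # 10,000 bearings of 1.000 cm
large = np.full(5000, 2.000)               # 5,000 bearings of 2.000 cm
pooled = np.concatenate([small, large])

print(pooled.mean())                       # ~1.333 cm: a size no actual bearing has
print(np.unique(pooled))                   # the population is really two distinct sets
```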

August 21, 2013 12:49 am

LdB said @ August 20, 2013 at 4:55 pm

I think a lot of comments here seem to be asking a lot and applying standards that one might expect in a controlled laboratory environment to the global environment.
Totally agree with you @cd, and worse, many seem not to understand what they are dealing with because they have lost, somewhere in the haze of classical physics, the realization that all this stuff is energy.
I am sitting on the fence: I think climate change probably is real, but I am damned if I accept any of the accuracies that I am being asked to accept, and the statistics is just making it worse.

We already knew that it’s all about enthalpy, or at least most of us did. We are “lost in a haze of classical physics” because we live in a classical physics world, not a quantum world. I have yet to find a thermometer in a superposition of states 😉
Climate change is real and nobody around here doubts that. You are new here. Please respect the fact that most of us are here to learn — physics, quantum or classical, doesn’t figure in our day jobs. We have an excellent physics tutor (a refresher for many) in RG Brown, who teaches physics at Duke and wrote some texts, but most days he’s busy. If you relax and listen a bit more/longer, you might learn some stuff, too. The engineers and geologists have some fascinating insights into climate.

The Pompous Git
August 21, 2013 12:51 am

RACookPE1978 said @ August 20, 2013 at 6:53 pm

I am much more troubled by the assumption – apparently required throughout this entire conversation – that any given temperature (weather, that is) at any given place at any time during the centuries can be represented by an “average” temperature WITH a plus-or-minus “error” before any statistical processing can even begin.
Statistical Process Control, and hence its foundation of statistical processing, even something as basic as averages and standard deviations, MUST begin with repeated measurements of the same thing, or of similar things repeatedly measured the same way.
But temperature is NOT a standard “thing”. It is NOT static, nor does it change linearly, straightforwardly, or in the same direction every time. It is chaotic. It is NOT measured several times to “get an average”. Temperatures were (are!) measured twice a day. The measurements are never repeated: the next day, under the next day’s “weather”, gets two more unique measurements.

As John Daley used to say: All science is mathematics, but not all mathematics is science. I believe that he shared your concern.

richardscourtney
August 21, 2013 1:24 am

RACookPE1978 and The Pompous Git:
Sincere thanks for your posts at August 20, 2013 at 6:53 pm
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1395783
and August 21, 2013 at 12:51 am
http://wattsupwiththat.com/2013/08/17/monthly-averages-anomalies-and-uncertainties/#comment-1395923
respectively.
YES! You add to what I have been trying to say throughout this thread; i.e.
There is no known valid method to obtain an average global temperature and that is why
(a) there are several methods used to obtain ‘global temperature’
and
(b) those methods are each often changed.
Hence, any presented determination of global temperature is misleading. Any datum for global temperature presented for any past time could be different next month and – history shows – it probably will be.

Whatever anyone wants to call determinations of global temperature,
the determinations and the determined values of global temperature are certainly NOT science.
Richard

cd
August 21, 2013 3:33 am

1sky1
This mandate is made all the more imperative by the empirical recognition that the temperature field is usually NOT spatially homogeneous (invariant, aside from a constant offset and a scale factor) over distances greater than a few hundred km.
Sorry, I’m probably being a bit slow here: this all sounds very significant, but I don’t know what it means. Do you mean that temperature varies (is NOT spatially homogeneous)? Would that not make it variant rather than invariant? I’m assuming that you mean transform invariant (scale, translation and rotation). But I’m still not sure why you raise the point and what it actually means.
Nor is it isotropic (directionally independent), as assumed by BEST’s universal “correlation length” – i.e., their effective “variogram”
My understanding was that they used a deterministic (functional) temperature sphere to detrend the data => stationary. This may have also catered for structural and/or geometric anisotropy. If so, then one would expect an isotropic variogram model, although admittedly deriving the residuals from a deterministic model seems a little contrived.
And then there’s temporal variability – a feature generally not treated adequately in geostatistics
Agreed, but then they were only trying to reduce a data array to a single global value.
That’s why I prefer to work with cross-spectral techniques in estimating regional temperature variations.
Are you referring to the Blackman-Tukey method? Why on Earth would you do this? I can only guess that you’re defining regions, then computing the power density spectrum via the autocovariance (BTW, the “inverse” of the variogram), and then comparing these for each region. Why is this better?

Frank de Jong
August 21, 2013 8:34 am

Willis,
I guess you could run a couple of simulations to support or falsify your point. Take some ground truth (sinusoidal, or a real temperature data set if you want to be realistic). Add different types of noise. Apply their algorithms for determining monthly averages etc. Calculate the (simulated) errors in anomaly and climatology using your ground truth. This should give you a reasonable estimate after a few trial runs.
Frank
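
One possible way to set up the simulation Frank suggests, sketched under simplifying assumptions (a sinusoidal ground truth, independent Gaussian measurement noise, and a plain monthly-mean climatology rather than BEST’s actual algorithm); it simply reports the simulated spread of the climatology and anomaly errors against the known truth:

```python
import numpy as np

rng = np.random.default_rng(42)
years, sigma, trials = 30, 0.5, 2000       # assumed record length, noise level, runs

m = np.arange(years * 12)
truth = 10.0 * np.sin(2 * np.pi * m / 12.0)      # assumed ground-truth seasonal cycle
true_clim = truth[:12]                            # its true monthly means (no trend)

clim_err, anom_err = [], []
for _ in range(trials):
    obs = truth + sigma * rng.standard_normal(truth.shape)   # add measurement noise
    clim = obs.reshape(years, 12).mean(axis=0)               # estimated climatology
    anom = obs - np.tile(clim, years)                        # estimated anomalies
    clim_err.append(clim - true_clim)                        # climatology error vs truth
    anom_err.append(anom - (truth - np.tile(true_clim, years)))  # anomaly error vs truth

print("per-month noise sigma           :", sigma)
print("simulated climatology error std :", np.std(clim_err))
print("simulated anomaly error std     :", np.std(anom_err))
```

Swapping in different noise models (drift, step changes, correlated errors) or a real station record as the ground truth would follow the same pattern.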

1sky1
August 21, 2013 4:03 pm

cd:
The tacit assumption of wide-range spatial homogeneity (uniformity of stochastic variation) is the justification for the objectionable practice of combining anomalies from DIFFERENT stations at DIFFERENT time-intervals to produce a long-term “regional” time-series from mere segments of data. In reality, outside a relatively narrow range, the anomalies DIFFER substantially over both space and time in most cases. Their stochastic behavior often changes quite abruptly in transitional climate zones between maritime and continental regimes or where mountain ranges intervene. And these changes are by no means uniform across the power density spectrum. In other words, the total correlation – either spatial or temporal – is not a fully adequate measure for discerning important differences.

1sky1
August 21, 2013 4:17 pm

cd:
WordPress flashing prompted me to post before I completed my thoughts:
Cross-spectrum analysis reveals the entire linear relationship between any pair of records, including the coherence and relative phase in each spectral band. (BTW, it need not be calculated by the B-T algorithm.) It’s almost a sine qua non for analyzing real-world time-series, instead of simplistic academic ideas. Can’t take more time to explain.

1sky1
August 21, 2013 7:17 pm

cd:
Found some time for a very brief addendum:
Cross-spectrum analysis (which is not just a power-density comparison) is indispensable not only in delineating areas of effective homogeneity, where anomalies from different stations can legitimately be used to synthesize a longer time-series, but also in identifying corrupted station records with non-climatic components, whose indiscriminate inclusion in regional averages introduces a bias in the results. Hope this helps your understanding.
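
For concreteness, a minimal sketch of the kind of pairwise cross-spectral check described above, using scipy.signal on two made-up monthly series that share a low-frequency signal; the series, segment length, and the 0.5 coherence threshold are assumptions for illustration only, not 1sky1’s procedure.

```python
import numpy as np
from scipy.signal import coherence, csd

rng = np.random.default_rng(1)
fs = 12.0                          # samples per year (monthly data)
n = 12 * 50                        # 50 years of monthly values
t = np.arange(n) / fs

shared = np.sin(2 * np.pi * 0.25 * t)                 # common low-frequency variation
station_a = shared + 0.3 * rng.standard_normal(n)
station_b = shared + 0.3 * rng.standard_normal(n)     # same signal, independent noise

# Magnitude-squared coherence per frequency band: near 1 where the two records
# share variance, near 0 where they do not.
f, coh = coherence(station_a, station_b, fs=fs, nperseg=120)

# The cross-spectral density also gives the relative phase in each band.
_, pxy = csd(station_a, station_b, fs=fs, nperseg=120)
phase = np.angle(pxy)

print("bands (cycles/year) with coherence > 0.5:", f[coh > 0.5])
```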

cd
August 22, 2013 2:34 am

1sky1
Thanks for your reply.
I think you’ve got the wrong end of the stick (or perhaps I’m misunderstanding you).
justification for the objectionable practice of combining anomalies from DIFFERENT stations at DIFFERENT time-intervals
As far as I am aware…
The aim of using any gridding algorithm is to get a global mean for a single point in time (there is no temporal component – the data is assumed to be static), say for a month (using monthly average station values). You do this for each month in order to build up a time series. That is all.
Obviously the controls for each month’s gridding run may vary through time, but in order to get each point in the time series you are not mixing anomalies as you suggest.
Their stochastic behavior often changes quite abruptly in transitional climate zones between maritime and continental regimes or where mountain ranges intervene
But as I said they used a deterministic model of climate to remove such differences and to effectively produce a stationary data set. It is common – in fact often necessary – practice to remove local trends before Kriging. However, there are types of Kriging that can account for local/abrupt changes.
I admit that BEST’s approach seemed less conventional.
And these changes are by no means uniform across the power density spectrum.
Again, I don’t see what this has to do with gridding in the current context.
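
As a concrete version of the month-by-month reduction cd describes: once a month’s values have been interpolated onto a regular latitude–longitude grid (by whatever method), the single global number is just an area-weighted mean with weights proportional to the cosine of latitude. A minimal sketch with a made-up gridded field:

```python
import numpy as np

def global_mean(field, lat_deg):
    """Area-weighted mean of a gridded (lat x lon) field, weights ~ cos(latitude)."""
    weights = np.cos(np.deg2rad(lat_deg))
    return np.average(field.mean(axis=1), weights=weights)

# Illustrative 2.5-degree grid of one month's interpolated values -- not real data.
lats = np.arange(-88.75, 90.0, 2.5)
lons = np.arange(0.0, 360.0, 2.5)
rng = np.random.default_rng(0)
month_field = 0.3 + 0.5 * rng.standard_normal((lats.size, lons.size))

# One point of the global time series; repeat for each month's grid to build it up.
print(global_mean(month_field, lats))
```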

cd
August 22, 2013 2:46 am

1sky1
Replying to your last comment.
Cross-spectrum analysis (which is not just a power-density comparison) is indispensable not only in delineating areas of effective homogeneity, where anomalies from different stations can be legitimately be used to synthesize a longer time-series
Look, I know why one might use cross-spectrum analysis. But what on Earth has this to do with gridding?
Are you suggesting that this should be done prior to gridding in order to vet stations in terms of suitability? I’m not going to go there, as I don’t believe you can do this statistically. It is an experimental problem that cannot be solved remotely without a reliable base case for every station to compare with – you don’t have an array of these, and if you did you’d just use those instead. In the end you just go round in circles, commonly compounding the bias or creating a new one. Again, everyone is data-processing crazy in this field.

1sky1
August 22, 2013 5:27 pm

cd:
I believe that you are grabbing the wrong end of the stick. Gridding to obtain a single “global” value at a point in time is not the avowed purpose. In fact, BEST claims that gridding can introduce artifacts, which are avoided by using kriging. Their purpose is to synthesize long time-series as a continuous function of spatial position, which can be integrated over to obtain regional and global average time-series. I am simply pointing out the many ways that the database and their analytic presumptions are not up to this daunting task.

August 24, 2013 2:37 pm

A self-serving and inaccurate recommendation, Carrick