Uncertain Uncertainties

Guest Post by Willis Eschenbach

Well, I’ve been thinking for a while about how to explain what I think is wrong with how climate trend uncertainties are often calculated. Let me give it a shot.

Here, from a post at the CarbonBrief website, is an example of some trends and their claimed associated uncertainties. The uncertainties (95% confidence intervals in this instance) are indicated by the black “whisker bars” that extend below and above each data point.

Figure 1. Some observational and model temperature trends with their associated uncertainties.

To verify that I understand the graph, here is my own calculation of the Berkeley Earth trend and uncertainty.

Figure 2. My own calculation of the Berkeley Earth trend and uncertainty (95% confidence interval), from the Berkeley Earth data. Model data is taken directly from the CarbonBrief graphic.

So far, so good, I’ve replicated their Berkeley Earth results.

And how are that trend and the uncertainty calculated? It’s done mathematically using a method called “linear regression”. Below are the results of a linear regression, using the computer program R.

Figure 3. Berkeley Earth surface air temperature, with seasonal variations removed. The black/yellow line is the linear regression trend.

The trend is shown as the “Estimate” of the change in time listed as “time(tser)” in years, and the uncertainty per year is the “Std. Error” of the change in time.

This gives us a temperature trend of 0.18°C per decade (shown in the "Coefficients" as 1.809E-2°C per year), with an associated uncertainty of ±0.004°C per decade (shown as 3.895E-4°C per year).
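For anyone who wants to reproduce that kind of calculation, here's a minimal R sketch of a least-squares trend and its standard error. The data below are synthetic stand-ins (a made-up 0.018°C/year trend plus noise), not the Berkeley Earth series, and the object names are just illustrative.

# Minimal sketch of the regression described above, on synthetic data
set.seed(1)
time <- seq(1960, 2022, by = 1/12)                              # decimal years, monthly steps
temp <- 0.018 * (time - 1960) + rnorm(length(time), sd = 0.15)  # ~0.18 °C/decade plus noise
fit  <- lm(temp ~ time)
summary(fit)$coefficients["time", c("Estimate", "Std. Error")]
# multiply both numbers by 10 to express them per decade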

So … what’s not to like?

Well, the black line in Figure 3 is not the record of the temperature. It's the record of the temperature with the seasonal variations removed. Here's an example of how we remove the seasonal variations, this time using the University of Alabama in Huntsville Microwave Sounding Unit (UAH MSU) lower troposphere temperature record.

Figure 4. UAH MSU lower troposphere temperature data (top panel), the average seasonal component (middle panel), and the residual with the seasonal component removed.

The seasonal component is calculated as the average temperature for each month. It repeats year after year for the length of the original dataset. The residual component, shown in the bottom panel, is the original data (top panel) minus the average seasonal variations (middle panel).
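Here's a minimal R sketch of that decomposition, using made-up monthly data rather than the UAH record; "temp", "month", "seasonal", and "residual" are my own illustrative names.

# Seasonal component = the average for each calendar month, repeated every year.
# Residual = original data minus that repeating cycle.
set.seed(2)
month    <- rep(1:12, times = 44)                                         # 44 years of monthly data
temp     <- 2 * sin(2 * pi * month / 12) + rnorm(length(month), sd = 0.3) # synthetic temperatures
seasonal <- ave(temp, month, FUN = mean)    # mean of each month, repeated for every year
residual <- temp - seasonal                 # data minus the seasonal cycle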

Now, this residual record (actual data minus seasonal variations) is very useful. It allows us to see minor variations from the average conditions for each month. For example, in the residual data in the bottom panel, we can see the temperature peaks showing the 1998, 2011, and 2016 El Niños.

To summarize: the residual is the data minus the seasonal variations.

Not only that, but the residual trend of 0.18°C per decade shown in Figure 3 above is the trend of the data itself minus the trend of the seasonal variations. (The trend of the seasonal variations is close to, but not exactly, zero because of end effects that depend on exactly when the data starts and stops.)

So … what is the uncertainty of the residual trend?

Well, it’s not what is shown in Figure 3 above. Following the rules of uncertainty, the uncertainty of the difference of two values, each with an associated uncertainty, is the square root of the sum of the squares of the two uncertainties. But the uncertainty of the seasonal trend is quite small, typically on the order of 1e-6 or so. (This tiny uncertainty is due to the standard errors of the averages of each monthly value.)
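As a quick worked example of that rule (the numbers here are illustrative only, not the actual Berkeley Earth values):

# Uncertainty of a difference, added in quadrature
u_data     <- 0.05     # hypothetical uncertainty of the data trend
u_seasonal <- 1e-6     # tiny uncertainty of the seasonal trend
sqrt(u_data^2 + u_seasonal^2)   # ~0.05, dominated entirely by the data uncertainty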

So the uncertainty of the residual is basically equal to the uncertainty of the data itself.

And this is a much larger number than what is usually calculated via linear regression.

How much larger? Well, for the Berkeley Earth data, on the order of eight times as large.

To see this graphically, here’s Figure 2 again, but this time showing both the correct (red) and the incorrect (black) Berkeley Earth uncertainties.

Figure 5. As in Figure 2, but showing the actual uncertainty (95% confidence interval) for the Berkeley Earth data.

Here's another example. Much is made of the difference in trends between the UAH MSU satellite-measured lower troposphere temperature trend and ground-based trends like the Berkeley Earth trend. Here are those two datasets, with their associated trends and the uncertainties (one standard deviation, also known as one-sigma (1σ) uncertainties) incorrectly calculated via linear regression of the data with the seasonal variations removed.

Figure 6. UAH MSU lower troposphere temperatures and Berkeley Earth surface air temperatures, along with the trends showing the linear regression uncertainties.

Since the uncertainties (transparent red and blue triangles) don't overlap, it looks as though the two datasets have statistically different trends.

However, when we calculate the uncertainties correctly, we get a very different picture.

Figure 7. UAH MSU lower troposphere temperatures and Berkeley Earth surface air temperatures, along with the trends showing the correctly calculated uncertainties.

Since the one-sigma (1σ) uncertainties basically touch each other, we cannot say that the two trends are statistically different.

CODA: I’ve never taken a statistics class in my life. I am totally self-taught. So it’s possible my analysis is wrong. If you think it is, please quote the exact words that you think are wrong, and show (demonstrate, don’t simply claim) that they are wrong. I’m always happy to learn more.

As always, my best wishes to everyone.

w.



465 Comments
June 27, 2023 5:11 am

Wow, after reading 73 comments here about different ways to use/interpret statistics- then I see in the msm that climate science is “settled”- it’s obvious that the msm authors are even dumber than I thought.

Reply to  Joseph Zorzin
June 27, 2023 7:15 am

Formal measurement uncertainty analysis is not an easy subject, and there can be multiple paths taken to a final result.

June 27, 2023 10:05 am

I can’t argue with the analysis but the underlying GIGO problem remains. The datasets are full of junk (Garbage In).

bdgwx
Reply to  Willis Eschenbach
June 27, 2023 7:25 pm

(A digression to explain 0.289. Suppose we have 12 monthly numbers, each with an uncertainty of ± 1. The mean is the sum of each of the 12 monthly numbers divided by 12. The uncertainty on one-twelfth of each of those monthly numbers is 1/12.
The uncertainties add in quadrature, so the overall uncertainty =
square root of (12 times 1/12^2) = sqrt(1/12) = .289
So average monthly uncertainties = annual uncertainties / .289
End of digression …)

You are of course correct (at least for the uncorrelated case). If the challenge is applied equally to you as it is to Bellman, Nick, myself, etc., you are going to be flayed alive for this post, and all manner of erroneous reasoning and algebra mistakes will ensue in an effort to rationalize the challenge.

FWIW…It turns out that monthly uncertainties exhibit some correlation so the 1/sqrt(N) rule doesn't apply exactly. Instead of an effective degrees of freedom of 12 it is actually closer to 2, so u(month) = u(annual) / sqrt(1/2), or equivalently u(annual) = u(month) / sqrt(2). But that's a digression for another time.
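To put numbers on that in R (1 is an arbitrary monthly uncertainty; the 0.289 and the effective degrees of freedom of 2 are the figures quoted above, not derived here):

u_month <- 1
u_month / sqrt(12)   # 0.289: the uncorrelated 1/sqrt(N) case
u_month / sqrt(2)    # the case with an effective degrees of freedom of only 2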

Reply to  Willis Eschenbach
June 28, 2023 7:52 am

(A digression to explain 0.289. Suppose we have 12 monthly numbers, each with an uncertainty of ± 1. The mean is the sum of each of the 12 monthly numbers divided by 12. The uncertainty on one-twelfth of each of those monthly numbers is 1/12.

Be careful here. The measurand you are declaring is "annual mean temperature". The data used to calculate the uncertainty of the mean is basically the SEM of the distribution used to calculate the mean, that is, (σ/√12). According to TN 1900 that should be expanded using a t-factor with degrees of freedom of 11. (I get 2.201) This coverage expansion will give a confidence of 95% that the actual mean lies within that interval.

I have never learned R so I can’t address your calculations.

I can say for baseline monthly averages in Topeka, Kansas I have found expanded experimental standard uncertainties for Tmax to be from 0.7 – 1.1 °C and Tmin to be from 0.4 – 0.7 °C.
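For concreteness, the TN 1900-style expansion described above looks like this in R; the twelve monthly values are made-up placeholders, not Topeka data.

monthly <- c(1.2, 0.8, 1.5, 0.9, 1.1, 1.4, 1.0, 1.3, 0.7, 1.2, 1.1, 0.9)  # placeholder values
sem <- sd(monthly) / sqrt(12)       # standard uncertainty of the mean, sigma/sqrt(12)
k   <- qt(0.975, df = 11)           # coverage factor for 11 degrees of freedom, about 2.201
mean(monthly) + c(-1, 1) * k * sem  # expanded 95 % interval for the annual mean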

Reply to  Jim Gorman
June 28, 2023 8:53 am

The data used to calculate the uncertainty of the mean is basically the SEM of the distribution used to calculate the mean, that is, (σ/√12). According to TN 1900 that should be expanded using a t-factor with degrees of freedom of 11. (I get 2.201) This coverage expansion will give a confidence of 95% that the actual mean lies within that interval.

Just not true. The uncertainties quoted by BEST are for the 95% confidence.

Reply to  Bellman
June 28, 2023 9:48 am

Ha, ha, haaaa. You are saying that NIST and the people who wrote the GUM know less than BEST. Good luck with that. You need to take it up with them. I would advise you to have more references than just what BEST claims for their uncertainty.

Reply to  Jim Gorman
June 28, 2023 10:17 am

Pathetic.

No. I am not criticizing NIST in any way. I'm pointing out that your assumption that the uncertainty intervals were standard uncertainties is wrong. They are the 95% confidence interval – a fact you could have figured out for yourself if you just checked the files.

% Temperatures are in Celsius and reported as anomalies 
% relative to the Jan 1951-Dec 1980 average. Uncertainties represent the 95% confidence 
% interval for statistical and spatial undersampling effects as well as ocean biases.

The fact you seem to think NIST invented the idea of multiplying a standard error by a coverage factor, or they discovered the Student distribution, says more about your understanding than it does about NIST.

Reply to  Bellman
June 28, 2023 11:50 am

Nothing from BEST that you have posted describes the calculations for determining the experimental standard deviation of the data. Basically all you are doing is repeating your opinion that BEST has done it correctly.

Do you really think I have had no exposure to NIST in my career? Dude, I am an electrical engineer. Standards for measurements have been part of NIST since long before I was born. My first real experience with NIST was using a NIST-calibrated resistor (including a correction chart) for measuring resistance in a Wheatstone Bridge.

Here is a question for you to answer about BEST. When they calculated their anomalies, did they add the expanded experimental standard uncertainty of the monthly data to the expanded experimental standard uncertainty of the baseline to achieve a combined expanded experimental standard uncertainty of the anomaly?

Reply to  Jim Gorman
June 28, 2023 12:36 pm

Basically all you are doing is repeating your opinion that BEST has done it correctly.

All I'm doing is pointing out you are wrong to claim the quoted BEST uncertainty was the standard uncertainty and therefore had to be doubled to get the 95% confidence interval. Now whilst admitting you haven't actually examined their methods, you assert that they could be lying when they claim the figure is the 95% interval.

Do you really think I have had no exposure to NIST in my career?

I didn’t suggest anything of the sort. What I suggested is you have a poor grasp of the methods and think that there is something novel in using a Student distribution just because NIST do it in one example. You keep talking about the NIST method, or the NIST protocol as if it’s anything other than the standard way of calculating a confidence interval for a small sample size.

Here is a question for you to answer about BEST.

Why should I? I'm not an employee or representative for BEST. It's not even the data set I use most of the time. I couldn't possibly tell you how good their Jackknife uncertainty estimate is or any of their methods.

There’s nothing stopping you from reading the documentation, or writing to BEST and telling them why you think they are doing it wrong (just as you keep insisting I write to NIST). Better yet, you could produce your own data set and describe its uncertainty.

Reply to  Bellman
June 28, 2023 1:03 pm

Why should I? I'm not an employee or representative for BEST.

Because you are using them as a reference to show how other assertions like mine are incorrect. If you can’t show chapter and verse as I do in my references, then all you are doing is the argumentative fallacy of Argument by Authority.

I am using TN 1900 that has the data and calculations explicitly shown. They reference their procedure by using the GUM as the base.

The results of my investigation agree fairly well with TN 1900 when using their methods. Guess what, they are two orders of magnitude greater than what many of the databases claim. Something is amiss with adherence to international standards for calculating expanded experimental standard uncertainties.

This is quote from one of Willis’s posts.

In the analysis that Berkeley Earth conducts, the uncertainty on the mean temperature is approximately 0.03 °C (0.05 °F) for recent years.

In order to obtain an anomaly, something is subtracted from something else. IOW, (x – y). "x" and "y" are considered random variables by the GUM's definition. That means when they are subtracted, the uncertainties are added via RSS. Doing this with temperatures recorded as integers would be something like 20 ± 0.02. That makes no sense. It would mean "20" essentially has no uncertainty!

Reply to  Jim Gorman
June 28, 2023 1:17 pm

Because you are using them as a reference to show how other assertions like mine are incorrect.

The only assertion of yours I corrected here was your assertion that the BEST uncertainty was the standard error of the mean and not a 95% uncertainty interval. If you think you are correct and the BEST file is mistaken, then you show the evidence and point out the mistake to BEST.

If you can’t show chapter and verse as I do in my references, then all you are doing is the argumentative fallacy of Argument by Authority.

I missed the part where you showed chapter and verse to prove that the BEST uncertainty was a SEM, whereas I did quote the exact passage showing they claim it’s a 95% interval.

I am using TN 1900 that has the data and calculations explicitly shown.

TN1900 makes no mention of BEST data, nor does it attempt to show how to construct a global average anomaly data set.

Guess what, they are two orders of magnitudes greater than many of the data bases claim.

When will you publish your highly uncertain global data set?

Doing this with temperatures recorded as integers would be something like 20 ± 0.02. That makes no sense. It would mean “20” essentially has no uncertainty!

Sorry, you’ve completely lost me with this gibbering. How are you getting an anomaly of 20? Is this for the global average or just one station?

Reply to  Bellman
June 30, 2023 3:51 am

I see I didn’t get to this to provide an answer. I didn’t mean for the 20 to be an anomaly. I intended it to be an example of an integer absolute temperature used to calculate an anomaly.

You couldn’t subtract 20.1 ±0.5 from 20.5 ±0.5 and get an anomaly of 0.4 ± 0.02! The best you could get would be 0.4 ± √(0.5² + 0.5²) = 0.4 ± .7
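In R, that subtraction with the uncertainties combined in quadrature looks like this (same illustrative numbers as above):

anomaly   <- 20.5 - 20.1
u_anomaly <- sqrt(0.5^2 + 0.5^2)
c(anomaly, u_anomaly)   # 0.4 and roughly 0.7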

Reply to  Jim Gorman
June 30, 2023 7:47 am

“You couldn’t subtract 20.1 ±0.5 from 20.5 ±0.5 and get an anomaly of 0.4 ± 0.02! ”

Indeed, but the uncertainty of a base period is not going to be the same as for a single year. On the usual assumptions of randomness etc., the uncertainty of the 30-year base period should be only 1/√30 of that of a single year, and when you add the uncertainties in quadrature this will have a tiny impact on the overall uncertainty. Then you are combining thousands of these individual uncertainties to get a global average.

If there is a systematic component in your uncertainties this reduces the uncertainty in the anomaly, because it’s the same in both the base period and the current observations.

And, as I keep trying to point out, none of this should have any effect on the trend, as you are just removing a constant from each annual value.

Reply to  Bellman
June 28, 2023 1:36 pm

It is the 95% interval for where the population mean might lie – it is *NOT* the uncertainty interval for the population mean!

Do you just not get it? How closely you can calculate the population mean has nothing to do with the accuracy of that population mean.

It is the uncertainty of the data that determines the uncertainty of the mean; how precisely you can calculate the mean does not! If by some miracle you could calculate the mean down to the 1 x 10^-1000000 digit, it *still* wouldn't tell you the accuracy of that mean!

If the mean turns out to be an infinitely repeating decimal does that mean that the accuracy of the mean is perfect? No uncertainty at all?

Reply to  Tim Gorman
June 28, 2023 2:40 pm

It is the 95% interval for where the population mean might lie

Not strictly correct, but I take it you accept it is the 95% interval and not the SEM. So I guess it’s time for you to move the goal posts again and say the real problem is it’s not taking in all possible systematic errors.

It is the uncertainty of the data that determines teh uncertainty of the mean

It’s a lot more than that.

If by some miracle you could calculate the mean down the 1 x 10^-1000000 digit, it *still* wouldn’t tell you the accuracy of that mean!

Indeed, which is why it would be a pointless thing to do.

If the mean turns out to be an infinitely repeating decimal does that mean that the accuracy of the mean is perfect?

Gibberish. How could the mean turn out to be any specific figure if you don't know what it is? Nobody cares what the exact mean is to more than a couple of decimal places, and I doubt you would ever be capable of determining it to that level of precision or accuracy. All temperature data sets are an approximation. You want them to be as good an approximation as possible, but you are never going to get that good an approximation.

Reply to  Willis Eschenbach
June 28, 2023 10:28 am

I tell my students to not report data beyond the first sig fig of the uncertainty, i.e.  0.183°C ± 0.037°C per decade becomes  0.18°C ± 0.04°C per decade.


Reply to  Phil.
June 28, 2023 12:37 pm

Let’s see, 0.18 – 0.04 = 0.14. That’s pretty close to UAH and now NOAA STAR.

Reply to  Willis Eschenbach
June 29, 2023 9:37 pm

Why not just factor the monthly cycle out (ShMth = 1 to 12 as a factor) and work with the residuals? That will remove the monthly means, which are a cycle having zero uncertainty. If you want to re-scale, just add back on the grand mean. [You did this anyway in Figure 4.] lm(data ~ ShMth); the residuals are then seasonally stationary.

If you are ticklish about within-month variation, calculate day of the year (1-366) for each observation and factor that out (Data ~ ShYrDay(factor)), then calculate average monthly anomalies.

You could go further and deduct the SOI signal, which has no trend.

As such factors are additive I see no reason to build a complicated model while exploring the data.
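For instance, a minimal sketch of the lm(data ~ ShMth) approach above, on synthetic data (ShMth and temp here are made-up stand-ins for the real month factor and series):

set.seed(8)
ShMth <- factor(rep(1:12, times = 44))                # calendar month as a factor
temp  <- 2 * sin(2 * pi * as.integer(ShMth) / 12) + rnorm(length(ShMth), sd = 0.3)
fit   <- lm(temp ~ ShMth)                             # removes the monthly means
resid_seasonal <- residuals(fit)                      # seasonally stationary residuals
resid_rescaled <- resid_seasonal + mean(temp)         # add back the grand mean if desired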

You could also build a model with everything, then determine the proportion of total sums of squares (you need to calculate that) accounted for by each of the components by extracting the analysis of variance table for the model. Then build the model again by entering terms that have the biggest effect in rank-order.

What I’m uncomfortable with is that both datasets are constrained to 0,0 at time=0. Perhaps steve_showmethedata maybe able to throw some light on that issue.

Cheers,

Bill Johnston

ferdberple
June 27, 2023 1:59 pm

Take 3 numbers in a straight line that you know are wrong. They were tampered with. But since they are on a straight line, the statistics tell us there is zero uncertainty.

ferdberple
June 27, 2023 2:08 pm

If the seasonal component is calculated as the trend to be subtracted from the actual, why do this last step? The seasonal component is the average trend. The difference is the deviation, not the average.

Reply to  Willis Eschenbach
June 27, 2023 3:26 pm

We will never, for example, be able to detect the difference between two pencil leads that differ by a few thousandths of a millimeter, by using only an ordinary ruler in the normal way.

This is correct if the resolution of the instrument is the limiting factor. Essentially the resolution imposes a systematic error. If all your measurements are identical it's probably because there is a rounding error which is always pushing you to the same value.

The same is true with the usual weather thermometers, which have an uncertainty of 0.5°C. Yes, averaging will give us better answers, but there is a limit to that process, just as with measuring a pencil lead with a ruler.

I’d agree that there will always be a limit, but I don’t agree it’s the same as the pencil example. The difference is you are no longer measuring the same thing and you are not trying to reproduce any one measurement. The uncertainty caused by the resolution is no longer a systematic error. If I measure one temperature it’s as likely to be rounded up or down, hence each measurement has a random rounding uncertainty, which should cancel when averaging many different temperatures.
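A quick simulation of that purely random rounding case (ignoring any systematic component, which is a separate issue):

set.seed(3)
true_temps <- runif(10000, min = -10, max = 30)   # many different true temperatures
rounded    <- round(true_temps / 0.5) * 0.5       # recorded at 0.5 °C resolution
mean(rounded) - mean(true_temps)                  # typically a few thousandths of a degree or less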

They say their monthly measurements have an uncertainty of 0.17°C and the recent annual measurements have an uncertainty of 0.05 … one decimal point past the instrument resolution of 0.5°C.

Whilst it might be a reasonable rule of thumb, the problem here is that many think it’s the instrument uncertainty that cause the uncertainty in the global averages. In reality it’s much more due to the sampling, and all the other factors that go into the calculations. You could be measuring with devices that gave the temperature to the nearest 0.001°C. It still wouldn’t mean you have a much more accurate global average, if the range of temperatures are varying by 10°C.

Reply to  Bellman
June 27, 2023 5:37 pm

"If I measure one temperature it's as likely to be rounded up or down, hence each measurement has a random rounding uncertainty, which should cancel when averaging many different temperatures."

You just can’t get it, can you?

The error in each measurement *will have* systematic uncertainty. You can't cancel systematic uncertainty. And since each measurement taken by a different device will have different random error and systematic error contributions you cannot just assume cancellation of anything. You *can* assume there will be a partial cancellation and this is recognized by adding the uncertainty growth using quadrature addition. That still results in the growth of uncertainty as you add more measurements.

You keep falling back on the old, disproven meme of “all error is random, Gaussian, and cancels”. You just can’t help yourself. You say you don’t assume that but it comes through in everything you post.

Reply to  Tim Gorman
June 27, 2023 6:33 pm

You just can’t get it, can you?

Get what? A comment from you that doesn’t start with an ad hominem?

The error in each measurement *will have* systematic uncertainty.

True in all probability. But again you are just moving the goal posts. The point I was responding to was not talking about systematic errors. If you have systematic errors all bets are off. You can measure your pencil tip to 0.1mm, but if your instrument adds 1cm to the result you can’t say anything about the uncertainty.

But the point still stands that you are less likely to have a consistent systematic error measuring different things with different instruments, than if you measure the same thing with the same instrument.

And since each measurement taken by a different device will have different random error and systematic error contributions you cannot just assume cancellation of anything.

Why not? Random errors will tend to cancel, different systematic errors will tend to cancel. As the GUM says, the distinction between random and systematic uncertainties is not clear-cut – depending on circumstances, random uncertainties can be systematic and systematic can be random.

The obvious solution is to make sure you measure as many different things with as many different instruments as possible to reduce the possibility of a persistent bias.

That still results in the growth of uncertainty as you add more measurements.

Your inability to grasp your simple mistake is getting beyond boring. Just remember you are the easiest person to fool. Just try reading any site or book on the central limit theorem. The standard error for a sum is standard deviation times root N, and for an average it’s standard deviation divided by root N. People didn’t discover those rules just to annoy you – they are based on sound maths, and easily demonstrated.
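Those two rules are easy to check by simulation in R, for example:

set.seed(4)
N     <- 100
sums  <- replicate(10000, sum(rnorm(N)))    # sums of N values with sd = 1
means <- replicate(10000, mean(rnorm(N)))   # means of N values with sd = 1
c(sd(sums),  sqrt(N))      # both near 10:  SE of a sum  = sd * sqrt(N)
c(sd(means), 1 / sqrt(N))  # both near 0.1: SE of a mean = sd / sqrt(N)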

You keep falling back on the old, disproven meme of “all error is random, Gaussian, and cancels”

Lie, lie, lie. If you repeat these lies again I might start to get a bit grumpy.

Reply to  Bellman
June 28, 2023 6:37 am

"True in all probability."

No, it is true in ALL cases. No measurement device is perfect. There will *always* be systematic uncertainty. All that matters is how large it is.

If you are trying to calculate averages out to the hundredths digit then systematic uncertainty in the hundredths digit will impact the final value. Even the Argo floats are considered to have an uncertainty of +/- 0.5C. How can you then calculate an average of them out to the hundredths digit since part of that +/- 0.5C *will* be a systematic bias? You can't "cancel" systematic bias, only random error.

Do *YOU* know how much of that +/- 0.5C is due to systematic bias?

Reply to  Tim Gorman
June 28, 2023 9:32 am

Of course you can 'cancel' systematic bias: it's called calibration!

Reply to  Phil.
June 28, 2023 9:55 am

What happens after the instrument leaves the cal lab?

Reply to  Phil.
June 28, 2023 1:10 pm

Field instruments never survive with calibration intact. Even in a Stevenson Screen, there will be systematic bias exposed even if the thermometer stays calibrated, which it won't. That is why there are calibration intervals for all measuring devices, thermometers included.

Reply to  Jim Gorman
June 29, 2023 2:40 pm

Dear Jim,

Do you have any data? Have you ever observed the weather? You say: "That is why there are calibration intervals for all measuring devices, thermometers included" – can you document that?

A thermometer (and a PRT-probe) is actually a robust instrument. They deteriorate over periods of decades not weeks, and deterioration is visually apparent. Also, reset temperatures provide a cross-check from day to day.

More likely a thermometer will be broken in service than deteriorate in service, or become un-serviceable due to a bubble forming in the column (rough handling; or wind-shake), or the scale becomes un-readable.

All the best,

Bill

Reply to  Phil.
June 28, 2023 1:40 pm

Do you own a time machine? How do you cancel the systematic error introduced by component drift for temperatures taken 6 months after the last calibration? Can you go back in time and do a recalibration in the past?

Reply to  Tim Gorman
June 28, 2023 3:26 pm

You recalibrate on a regular basis and modify your strategy based on performance.

Reply to  Phil.
June 29, 2023 1:08 pm

Yep. But even the calibration drift determined at calibration time can't be used on past data because you will not know the path the calibration drift took. It could have been positive because of one component and then drifted negative because of another component. All you can do is estimate what the uncertainty interval is for the instrument and take the readings with a grain of salt. They may have been in the + side of the uncertainty interval part of the time and in the – side of the uncertainty interval at other times. What you *can't* do is assume that the stated values are 100% correct and ignore the measurement uncertainty – unless, of course, you are a climate scientist.

Reply to  Willis Eschenbach
June 27, 2023 3:47 pm

The same is true with the usual weather thermometers, which have an uncertainty of 0.5°C. Yes, averaging will give us better answers, but there is a limit to that process, just as with measuring a pencil lead with a ruler.

As Clyde Spencer pointed out above, the number of repetitions for air temperature measurements can never be greater than one.

The square root of one equals … one.

Claiming 50 milli-Kelvin is absurd.

Reply to  Willis Eschenbach
June 27, 2023 6:58 pm

I have been trying to explain some of the differences in computing an uncertainty. The 0.5C you mention is pertinent to single measurands. It is a Type B a priori minimum uncertainty for each reading taken. NOAA/NWS has some good documents on the accuracy of different types of measuring devices. I can't find the document that gave early 20th century accuracy, but I'm pretty sure that when the NWS was established, the figure was ±2.0F. Min/max is 1.0F, as is ASOS; CRN has a 0.3C accuracy. These are all pertinent to single readings.

When defining a measurand as a monthly average calculated from daily readings, one is dealing with experimental situations where the variance in the data itself provides the uncertainty. It is no different than running an experiment in chemistry where multiple experiments to determine reaction products are done. They are all as similar as one can make them but there will still be differences, i.e., variance. The mean of the experiments can be used to define an experimental value, but the uncertainty in that mean is the expanded experimental uncertainty as defined by the GUM and NIST.

To summarize, one must define what the measurand is and what the distribution of data consists of. Is it a single item, measured multiple times, with the same device? In that case, the DISTRIBUTION of those readings allows one to calculate a mean that is the true value for that single item. Is the distribution of measurements made up of multiple single readings? Then the data distribution is used to determine the uncertainty as outlined in NIST TN 1900.

NIST recommendations and the GUM were developed so that a common base of how to calculate both readings and uncertainty is used, so that repeatable findings are available. Their recommendations should not be cast aside for the purpose of calculating unreasonable values. My biggest pet peeve is seeing early twentieth century temperatures recorded to the nearest units digit used to obtain anomalies of 1/100ths. It just ain't possible to do that without using statistics to create new information out of the clear blue sky.

Reply to  Jim Gorman
June 27, 2023 9:04 pm

It is no different than running an experiment in chemistry where multiple experiments to determine reaction products are done.

It is important to recognize that in some disciplines, controlled experiments can be performed in the laboratory and multiple measurements can be performed. However, meteorological controlled experiments don’t exist and usually one only has one opportunity to take sample measurements of a particular air mass.

Reply to  Clyde Spencer
June 28, 2023 6:49 am

And calibrations drift over time.

bdgwx
Reply to  Willis Eschenbach
June 27, 2023 7:09 pm

This is borne out by the Berkeley Earth numbers I discussed here. They say their monthly measurements have an uncertainty of 0.17°C and the recent annual measurements have an uncertainty of 0.05 … one decimal point past the instrument resolution of 0.5°C.

BEST monthly anomalies and uncertainties are here. Recently the monthly uncertainty is about 0.03-0.05 C. The annual uncertainty is about 0.03-0.04 C. The reason why the annual uncertainty does not scale as 1/sqrt(12) (or 1/sqrt(13) for a 13m centered mean) is because of correlation. There is a similar problem with the grid uncertainties. If you computed the grid average you might naively think the average scales as 1/sqrt(15984), but since individual grid cell uncertainties are also correlated there aren't 15984 degrees of freedom in the grid. It's a similar problem with the UAH grid as well. There are 9508 cells but the effective degrees of freedom is only 26. That's why Spencer and Christy only scale their global average uncertainty by 1/sqrt(26) instead of 1/sqrt(9508).

BTW…I did a type A evaluation of uncertainty between GISTEMP, BEST, and HadCRUT and got about 0.06 C which is consistent with BEST’s estimates. Note that the multi-dataset type A evaluation would include the component of uncertainty arising from methodology whereas BEST’s estimate does not. In that regard we expect the type A uncertainty to be slightly higher and indeed it is.
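As a toy illustration of why correlation blocks the naive 1/sqrt(N) scaling described above: the shared/independent split below is invented for illustration, not BEST's or UAH's actual error structure.

set.seed(5)
N      <- 1000                                   # number of "grid cells"
trials <- replicate(10000, {
  shared      <- rnorm(1)                        # error component common to every cell
  independent <- rnorm(N)                        # cell-by-cell component
  mean(0.7 * shared + 0.7 * independent)         # average over the "grid"
})
c(sd(trials), 1 / sqrt(N))   # spread stays near 0.7; naive 1/sqrt(N) scaling would predict ~0.03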

bdgwx
Reply to  Willis Eschenbach
June 28, 2023 8:33 am

I think that makes sense. When I estimate the uncertainty on the trend of the seasonal values repeated from 1979/01 to 2022/12 I get ±0.34 C/decade using the AR(1) method.

I think what is more interesting is that you get ±0.037 C/decade for the anomaly trend from 1960. I get ±0.044 C/decade from 1979 with my method (based on AR(1)) while the Cowtan calculator (based on ARMA) says ±0.029 C/decade and ±0.018 C/decade from 1979 and 1960 respectively. You and I are in the same ballpark. The ARMA method seems to underestimate the uncertainty IMHO. But I'm not an expert in this so I don't know.

Reply to  Willis Eschenbach
June 29, 2023 3:35 pm

You do realize that gives an interval of 0.423 to -0.050, which makes 0.183 irrelevant. Any value within the interval is possible. That is why you would want an uncertainty of something like 0.183 +/- 0.005.

Reply to  Willis Eschenbach
June 28, 2023 7:39 am

The NIST uncertainty machine (please note use of the term “machine”) is just a computer program and still subject to GIGO.

Putting the average formula into it is not valid because the formula is not a valid measurement model. Thus any results from the machine are invalid.

Reply to  karlomonte
June 29, 2023 1:16 pm

And it *still* doesn't propagate measurement uncertainty in a valid manner.

Reply to  Tim Gorman
June 29, 2023 1:34 pm

Today bg tried to tell Pat Frank (!) that he needed to go play with the uncertainty machine after accusing him of using the wrong math.

The cojones attached to this person are not small.

bdgwx
Reply to  Willis Eschenbach
June 28, 2023 8:44 am

I will say that BEST doesn’t actually use the correlation method which is more of a bottom-up approach. They use jackknife resampling which is more of a top-down approach. And I believe they jackknife at both the monthly and annual level separately. The reason why the ratio of uncertainty is changing as depicted in the blue line in the bottom graph is likely due to the changing nature of the station counts, density, and coverage. Anyway, we can reverse engineer the correlation based on an analysis of their monthly and annual uncertainties, but that’s not how they’re actually doing it.

Reply to  bdgwx
June 28, 2023 6:44 am

"The monthly uncertainty is about 0.03-0.05 C"

That is the variation in the STATED VALUES. It assumes that actual measurement uncertainty is insignificant or totally cancels.

It’s the old meme of “all uncertainty is random, Gaussian, and cancels” that appears in climate science over and over again.

Reply to  Willis Eschenbach
June 28, 2023 10:13 am

Yes Willis that’s why when choosing an instrument to measure a quantity I’d want a resolution that was significantly smaller than the range of values of interest. For measuring a pencil lead I’d use my micrometer (just did it, my pencil lead is 0.58mm (nominal 0.5mm), second time 0.56mm). 😉 You’d like to be able to measure a quantity with a distribution of values over about ten values.

Reply to  Phil.
June 28, 2023 1:05 pm

You sound like someone with a machinist’s background.

Reply to  Jim Gorman
June 28, 2023 3:18 pm

No, a scientist who’s run research labs for ~50 years and developed measurement systems.

Reply to  Willis Eschenbach
June 29, 2023 2:48 pm

No they don't. Australian Celsius thermometers have 0.5 degC indexes. Instrument resolution (uncertainty) is therefore 1/2 × 0.5 = 0.25, which rounds up to 0.3 degC.

Cheers,

Bill

Reply to  Bill Johnston
June 29, 2023 3:21 pm

And here Bill’s formal thermometer uncertainty analysis ends.

June 28, 2023 7:43 am

The trend is shown as the “Estimate” of the change in time listed as “time(tser)” in years, and the uncertainty per year is the “Std. Error” of the change in time.

What were the standard deviations of all the averages performed in this process?

Reply to  karlomonte
June 29, 2023 1:18 pm

In the cult of climate science, variance doesn't add when you combine random variables, so you can just ignore standard deviation in all calculations.

Reply to  Tim Gorman
June 29, 2023 1:36 pm

Don’t forget that after ignoring all the variance, you then go back into historic data and reduce the values to eliminate “bias”.

Reply to  Tim Gorman
June 29, 2023 1:42 pm

In the cult of climate science, variance doesn’t add when you combine random variables

Are you ever going to figure out that you can combine random variables in more than one way?

When you add random variables the variances add. When you divide random variables by a constant the variance scales by the square of the constant. What happens when you take an average of random variables is left as an exercise for the reader.

Reply to  Bellman
June 29, 2023 5:42 pm

When you combine Minnesota temps in July with Brazil temps in July what are you doing?

Reply to  Tim Gorman
June 29, 2023 6:04 pm

What do you want to do with them? If you want to add them you add them. If you want to average them you average them. I’d have thought you’d be able to work that out by now.

If you were adding them, the variance would add, and you would be left with a meaningless sum because, as we all know, temperatures are intensive so their sum is meaningless.

If you want their average, which is slightly more meaningful, you have to add then divide by 2. var((M + B) / 2) = (var(M) + var(B)) / 4.
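That relationship is easy to check numerically; the two distributions below are made up, not actual Minnesota or Brazil temperatures.

set.seed(6)
M <- rnorm(100000, mean = 25, sd = 3)   # stand-in for one set of temperatures
B <- rnorm(100000, mean = 22, sd = 1)   # stand-in for the other
c(var((M + B) / 2), (var(M) + var(B)) / 4)   # both close to 2.5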

Reply to  Bellman
June 30, 2023 12:08 pm

Are you ever going to figure out that you can combine random variables in more than one way?

That is not true. When you find the mean and variance of a distribution, you follow the GUM and standard statistical practice.

———————————————
Section 4.2.1 as follows.

q̅ = (1/n)Σqₖ

As you can see, the sum of the data points is divided by n. I think you and bdgwx have a misunderstanding concerning the definition of a functional relationship. The mean is ALREADY defined in statistical terms. You can not divide the mean by "n" again. As I show below, this is what you would need to do if you divide "s²(qₖ)" or "s²(q̅)" by n more than once.

You will notice that the equations below utilize “n” in the computations. Even though “n” was used in the definition of q̅, the equations are not divided again by “n”. Your argument is that “s²(qₖ)/n” should be the appropriate mathematical operation since the mean is a sum divided by “n”. That’s hokum analysis.
———————————————

Section 4.2.2

The experimental variance of the observations, which estimates the variance σ² of the probability distribution of q, is given by

s²(qₖ) = (1/(n-1)) Σ (qⱼ – q̅)²

Please note, “s²(qₖ)” is not divided by “n” again, even though the mean was divided by “n”.

———————————————-

Section 4.2.3

The best estimate of σ²(q̅) = σ²/n , the variance of the mean, is given by

s²(q̅) = s²(qₖ)/n

Please note, “s²(q̅)” is not divided by “n” again, even though the mean was divided by “n”.

As to your last statement.

When you divide random variables by a constant the variance scales by the square of the constant. What happens when you take an average of random variables is left as an exercise for the reader.”

Anomalies are not calculated by dividing anything. The are calculated by “Tᵐᵒⁿᵗʰ minus Tᵇᵃˢᵉˡᶦⁿᵉ. Just plain old subtraction, no dividing, no multiplying. Consequently, their variances just add.

Now let's discuss finding the average of two random variables, which is where I believe you are getting the "dividing by a constant". Look closely at the image I have uploaded from Dr. Taylor. It shows two distributions with means of X and Y. These are histograms. The x-axis contains the value of random variables. The y-axis contains the number of times the value of the random variable appears, its probability if you will.

Now if you add X and Y, you move the resulting mean to a new point on the x-axis, but the y-axis values are a combination of the two distributions and you end up with a new distribution with a new variance, i.e., σ²(X+Y) = σ²(X) + σ²(Y).

If you then divide (X + Y) by 2, you don’t change the y-axis frequency values, you simply move the distribution to a different point on the x-axis. That operation does not change the variance which is determined by the frequency values. Only by changing the y-axis frequency values, i.e., the shape of the distribution, can you change the variance of the distribution.

[Attached image: random variable addition.jpg]
Reply to  Jim Gorman
June 30, 2023 1:49 pm

That is not true

If X and Y are two random variables you can combine them by X + Y, XY, X / Y, (X + Y) / 2, or many other ways. Just saying it isn’t true doesn’t make it so.

q̅ = (1/n)Σqₖ

Yes, that’s the formula for an average. It says nothing about the uncertainty.

You can not divide the mean by “n” again.

Well you can, but I don’t see why you would want to.

Your argument is that “s²(qₖ)/n” should be the appropriate mathematical operation since the mean is a sum divided by “n”.

No. s²(qₖ)/n². To be clear that gives you the variance of qₖ / n.

s²(qₖ) = (1/(n-1))Σ(qʲ – q̅)²

Yes. That’s the formula for the variance of qₖ.

Please note, "s²(qₖ)" is not divided by "n" again, even though the mean was divided by "n".

Why would you? You are not at that point calculating the variance of the mean, but the variance of one element qₖ.

s²(q̅) = s²(qₖ)/n

Yes, that's the variance of the mean of n lots of qₖ. It's the sum of the n variances s²(qₖ) divided by n², as I said.

s²(q̅) = (s²(qₖ,1) + s²(qₖ,2) … + s²(qₖ,n)) / n² = ns²(qₖ) / n² = s²(qₖ) / n

Please note, "s²(q̅)" is not divided by "n" again, even though the mean was divided by "n".

Why would you? It's the single division that results in the variance being smaller.

Of course the final step is to take the square root so this becomes

s(q̅) = s(qₖ) / √n



Reply to  Bellman
June 30, 2023 2:02 pm

Anomalies are not calculated by dividing anything.

If you are averaging anomalies or anything else you are dividing by something.

They are calculated as Tᵐᵒⁿᵗʰ minus Tᵇᵃˢᵉˡᶦⁿᵉ. Just plain old subtraction, no dividing, no multiplying. Consequently, their variances just add.

Yes, that’s how you calculate one anomaly. But then you want to average them to get an average anomaly.

Now let’s discuss finding the average of two random variables which is where I believe you are getting the “dividing by a constant”.

You’ve already demonstrated that in all your examples from GUM.

These are histograms

They’re not. But let’s not split hairs.

If you then divide (X + Y) by 2, you don’t change the y-axis frequency values, you simply move the distribution to a different point on the x-axis.

No you don't. You have to scale all points on the x-axis. Suppose X and Y are both throws of a 6-sided die. There is a 1/36 chance of both coming up as 6, which would be 12 on the X + Y graph. On the (X+Y)/2 graph that would be a 6, so there has to be the same chance of hitting a 6 on the average graph as there is of hitting a 12 on the sum graph. The same for all values. The average graph has to be identical to the sum graph, but shrunk by 1/2.
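The dice example is simple to simulate:

set.seed(7)
X <- sample(1:6, 100000, replace = TRUE)
Y <- sample(1:6, 100000, replace = TRUE)
table(X + Y) / length(X)           # probabilities of the sums 2..12
table((X + Y) / 2) / length(X)     # the same probabilities, at values 1, 1.5, ..., 6
c(var(X + Y), var((X + Y) / 2))    # the variance drops by a factor of 4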

Reply to  Bellman
June 30, 2023 5:24 pm

"These are histograms"

"They're not. But let's not split hairs."

Now you're being stupid. The x-axis contains the values even if the bins are infinitely small. The y-values are frequencies/probabilities.

Of course you don’t want split hairs because you can’t figure out an intelligent response.

It is obvious that the x-axis has the values or it wouldn't be labeled X & Y, would it?

You are even using an example that doesn’t generate a normal distribution. Do you know the distribution you are basing your opinion on?

Graphing numerical values against their frequency is a histogram and is an indication of the probability.

Reply to  Jim Gorman
June 30, 2023 5:32 pm

Now you're being stupid. The x-axis contains the values even if the bins are infinitely small. The y-values are frequencies/probabilities.

Sorry, hadn’t realized Taylor actually counted an infinite number of real data points rather than just drew a normal distribution.

Of course you don’t want split hairs because you can’t figure out an intelligent response.

Says someone who’s ignored everything else I said, but wrote a whole comment to attack a throwaway remark.

It is obvious that the x-axis has the values or it wouldn’t be labeled X & Y, would it.

Not sure what that’s a response to.

You are even using an example that doesn’t generate a normal distribution.

Missing the point by a mile and obsessing over normal distributions again.

Geoff Sherrington
June 29, 2023 12:20 am

In my younger days I was part owner of an analytical chemistry laboratory that I equipped with the best instruments that we could find.
Repeatedly, we were unable to analyse to an accuracy anything like as good as the manufacturers claimed.
Sadly, our competitor labs were quoting performance from the manufacturers' literature, so we started to lose business because the other labs did not quote their own performance.
I became responsible for the accuracy of uranium analysis from the important new discovery of Ranger One. With the Australian Atomic Energy Commission, we optimised uranium analysis with delayed neutrons from immersion in the MOATA reactor. They were the best analytical results I ever saw. Not being able to buy a MOATA for myself, I got out of the business because it was costing me too much sleep.
The question that arises is, how many people commenting here have had dirty hands from working with concepts like accuracy, precision, error, detection limits, uncertainty? I can suggest that you get different results from textbook knowledge compared to working with the measurements.
Dr Bill Johnston writing here knows this and writes about it.
….
Attempting an overall summary (which is far too brief to be 100% correct), I suggest that the biggest blunder in assessing uncertainty is failing to regard all possible sources of causative errors. I tried to make this point with sea level work that fails to acknowledge the movement of the walls of the oceans. You cannot calculate a meaningful uncertainty until you have measured the change in ocean volume with time caused by change in the walls and floors that make the vessels that the seas fill. To my knowledge, this has not yet been done.

There is also terminological inexactitude because different authors here have different impressions of what uncertainty means. Pat Frank has tried to explain it, plus the confusion with error, in regard to his paper on cumulative errors in climate models because of the errors in measuring cloud effects.
Geoff S

Reply to  Geoff Sherrington
June 29, 2023 7:31 am

Geoff—The problem you write about demonstrates why laboratory accreditation according to ISO 17025 is so important. Accredited labs are periodically audited and must provide detailed uncertainty analyses according to the GUM for any numeric values they measure. I can tell you from first-hand experience that these documents can run to many pages. Additionally, labs must perform inter-laboratory comparisons with other labs — here is where the rubber meets the road and problems can’t be easily hidden.

Just declaring “we are the best!” shouldn’t and doesn’t cut it.

Reply to  karlomonte
June 29, 2023 2:13 pm

Given the background you allege, karlomonte, how can you possibly claim that instruments that have calibration certificates from NATA-certified labs, or that in the olden days were compared with Kew standards maintained by colonial meteorologists (whose job was to test and certify instruments – chronometers, barometers and thermometers), were 'inaccurate' or not suitable for measuring the temperature of the air, which was their purpose?

While I have checked the calibration of instruments that may be suspect, colleagues and I had no reason to believe the instruments were not accurate or fit for purpose. Further, the whole issue of laboratory certification and instrument standards is relatively new. I don't think too many postmasters at Albany in the early 1900s worried much about the state of their Stevenson screen or thermometers; there was no accreditation (or even training) until post-WWII (1946), or even as late as when metrication happened in Oz in 1972.

As someone who is experienced at the business end of using instruments, it becomes tiresome being lectured at by people with little or no experience in either using instruments in the field, or assessing the quality of data up to a century after they were collected.

No one observing the weather runs around with a bunch of GUM documents; likewise those writing and ruminating about GUM, NIST and NATA certification etc. seldom observe the weather. Some don’t know the difference between a dry-bulb and recording thermometers (max & min), or even how a thermometer works, yet they still comment as though they do.

Same applies to statistics and I have worked with several talented statisticians. However, while they work with data and may provide advice, they seldom go out in the field and actually take measurements.

All the best,

Bill Johnston

Reply to  Bill Johnston
June 29, 2023 3:22 pm

Any experience with ISO 17025, Bill?

Reply to  karlomonte
June 29, 2023 8:19 pm

Don’t need to, neither does your healthcare professional need to, or your local meter-reader. ISO 17025 also only came into existence in 1999 so it was not much use for weather observers at Albany in 1907.

That is why the Kew (England) standards were adopted (and maintained) by astronomers and meteorologists located at observatories around the world. Kew standard instruments were very expensive to purchase …. I could go on but you are probably nodding-off by this.

You have provided some useful insights.

All the best,

Bill

Reply to  Bill Johnston
June 29, 2023 8:36 pm

Then you don’t know WTF you are typing about.

There is a whole lot more to metrology than just met station thermometers in Aust.

Oh and BTW, NOWHERE did I state anything remotely of the sort you alleged here:

Given the background you allege, karlomonte, how can you possibly claim that instruments that have calibration certificates from NATA-certified labs, or that in the olden days were compared with Kew standards maintained by colonial meteorologists (whose job was to test and certify instruments – chronometers, barometers and thermometers), were 'inaccurate' or not suitable for measuring the temperature of the air, which was their purpose?

You are now in Nick Stokes territory—lying.

Reply to  karlomonte
June 29, 2023 9:55 pm

Dear karlomonte,

I’m not lying at all. Met equipment has an interesting history and arguably one of the most well preserved observatories is Sydney Observatory.

Compared to the length of time observations have been made, NATA and ISO 17025 are relatively new (1999). What do you think happened before that? And yes, our organisation had a NATA certified laboratory.

While I do not know anything about you, and you prefer anonymity, you give the impression that you are familiar and knowledgeable about this (i.e., a reliable source of information).

However, your grumpy response and name-calling is not in the least self-flattering, so please desist.

Yours sincerely,

Dr Bill Johnston

Reply to  Bill Johnston
June 29, 2023 4:54 pm

My biggest problem is that when examining temps in the U.S. before 1980, they are all recorded to the nearest integer. Yet somehow the statisticians tease out anomalies in the 1/1000th decimal place and uncertainty to match. That just isn't possible from temps with resolution in the units digit. All of my lab teachers in college would have failed me if I had tried that. And that was before calculators.

My engine customers would have crapped if I had used a slide caliper to obtain readings in the 1/1000ths with uncertainties in the 1/1000ths.

Reply to  Jim Gorman
June 29, 2023 5:58 pm

Yet somehow the statisticians tease out anomalies in the 1/1000th decimal place and uncertainty to match.

Could you point to one data set that claims uncertainty of less than 1/100th of a degree? Berkeley Earth gives monthly uncertainties of a few hundredths of a degree at best.

All of my lab teachers in college would have failed me if I had tried that. And that was before calculators.

Were you making averages based on tens of thousands of observations at college, without a calculator?

Reply to  Willis Eschenbach
June 30, 2023 7:35 am

Thanks. But that isn’t a global data set. Maybe I should have been clearer, but I was responding to Jim Gorman’s comment that surface data before 1980 was only reported as an integer.

Reply to  Bellman
June 30, 2023 3:55 am

If you see an uncertainty of 0.013, what decimal point do you think the uncertainty was calculated to before rounding?

Reply to  Jim Gorman
June 30, 2023 7:49 am

I would hope it was calculated to as many decimal places as possible. There's zero point in prematurely rounding any intermediate values. (Assuming you are not doing it by hand.)

Reply to  Geoff Sherrington
June 29, 2023 2:15 pm

Spent many an hour overhauling engines in my father's shop. Between racing engines and diesel farm engines expected to run for thousands of hours, measurements were critical. It was always the last digit that was the problem.

Spent many an hour measuring RF amps for bandwidth and noise figure. Again, that last digit was the problem when dealing with microvolts.

All of this gives one a very large appreciation for measurement uncertainty. One unhappy customer because you screwed up is quite a teacher.

steve_showmethedata
June 29, 2023 4:18 pm

Willis wrote about Figure 6: "However, when we calculate the uncertainties correctly, we get a very different picture." "Incorrect vs correct" with no proof why he is correct and standard methods are wrong! Willis is absolutely wrong in the statistical methods he is applying to the inferences he is making. Bill Johnston stated one of these errors in his third point in more deferential language, "you may be confused …" (June 26, 2023 3:52 pm), but I am less inclined to be deferential. Again Willis is absolutely, unequivocally wrong! See my post dated June 29, 2023 1:04 am.

Reply to  steve_showmethedata
June 29, 2023 10:01 pm

Dear steve_showmethedata,

I wonder if you could comment on the implications that the starting point of both temperature series is x=0, y=0, on examining differences between regressions.

Yours sincerely,

Bill Johnston

steve_showmethedata
Reply to  Bill Johnston
June 30, 2023 9:38 pm

Dear Bill

The data in Fig 6 are temperature anomalies, which means the original mean temperature records for each series have been adjusted to be differences from the mean temperature for a given starting (i.e., base) year of 1979. So both data series have been artificially constrained to have zero anomaly at that base year. This raises an important issue: the adjustment is different for each series, and that difference is unknown to the reader of Fig 6. Therefore, a potential difference between the two series has been removed artificially.

That said, if you accept that, then the test between series is now only for the slopes of the lines from the base year, which assumes a zero intercept at zero value of Year_adj = (year – base_year) (i.e., y=0 at x=0). You can fit that in R by excluding the intercept as lmTA <- lm(formula=TempAnom ~ -1+SeriesFactor.Year_adj, data=…). The SE of the predicted regressions for each series is obtained using predict(lmTA, se.fit=T).

However, in determining how the trend for the two series differs over a given year range, I would use the average temperature itself and not the anomaly, which as described above hides a difference in the two series from the get-go. I would fit a standard regression with estimated intercepts and test for common slopes and separate intercepts using the extra sum of squares F-test (i.e., H0: the extra parameter specifying series-specific slopes is zero). If H0 is not rejected I would test for common intercepts as well, and therefore identical regression lines. If the intercepts are statistically significantly different that would be of interest, but using temperature anomalies you don't get to do that test. Importantly, a conclusion of common slopes and different intercepts for average temperature is a quite different result to different slopes for temperature anomaly, since how do we know that differentially adjusting the data and fitting zero intercepts has not contributed to the difference in slopes in Figure 6!
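For readers wanting to try the common-slope comparison, here is a minimal sketch of the extra-sum-of-squares F-test described above, on made-up data; the column names, trends, and noise level are invented for illustration, not taken from either temperature series.

set.seed(9)
dat <- data.frame(
  Year_adj = rep(0:43, times = 2),
  Series   = factor(rep(c("A", "B"), each = 44))
)
dat$TempAnom <- ifelse(dat$Series == "A", 0.014, 0.019) * dat$Year_adj +
                rnorm(nrow(dat), sd = 0.1)                           # invented trends and noise
fit_common   <- lm(TempAnom ~ Series + Year_adj,        data = dat)  # common slope
fit_separate <- lm(TempAnom ~ Series + Series:Year_adj, data = dat)  # series-specific slopes
anova(fit_common, fit_separate)   # F-test of H0: the extra slope parameter is zero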

The Figure 3 in this paper (DOI: 10.1002/aqc.2373), for which I did all the stats analysis while I was Senior Applied Statistician at the Australian Antarctic Division, was investigating sink rates of pelagic longlines as a component of our research on mitigating seabird bycatch. The data are logically constrained to be zero depth at zero time, and variation of replicate profiles about their average increases with time and thus depth. The graphical method I developed for this paper to compare differences between two or more regression lines (in this case cubic regression splines) can be applied to any regression/curve-fitting analysis (e.g. empirical distributions using quantiles as the response, as in my Fisheries Research paper; http://dx.doi.org/10.1016/j.fishres.2014.05.002).

Reply to  steve_showmethedata
July 1, 2023 12:53 am

Thank you Steve.

I believe your “This raises an important issue that the adjustment is different for each series where that difference is unknown to the reader of Fig 6. Therefore, a potential difference between the two series has been removed artificially.” is a vital point missing in many conversations.

While I am not particularly interested in the data in question (and I don’t have it), if I could pose a question:

As ‘trend’ is the issue under discussion, and the intercept (if there is one) is mostly not informative, why not add a dummy-value to each of the datasets to separate them by an interval, and then test for interaction using lm and AOV? While there is no covariate (but SOI could be), the model could then be lm(data ~ Shseries(factor) * year).

While the lm summary would show a difference between the ‘intercept’ and series 2 (the dummy offset between the two), my question is: would AOV validly indicate whether the interaction term was significant (i.e., that the slopes were not homogeneous)?

Feel free to roast me if I’m barking up the wrong tree.

Yours sincerely,

Bill Johnston

Reply to  Bill Johnston
July 2, 2023 12:33 am

Steve,

On re-reading your post above, the question I put was indeed barking up the wrong tree. Assuming any intercept (including x = 0, y = 0) will fix the slope.

Thanks for your insights,

Cheers,

Bill

steve_showmethedata
Reply to  Bill Johnston
July 2, 2023 2:15 am

Hi Bill

Thanks for pointing to the issue we both see as crucial: the hidden difference due to the adjustment that gives zero anomalies for each series.

I am not sure what your point is with reference to “add a dummy-value to each of the datasets to separate them by an interval, and then test for interaction using lm and AOV?“.
The lm you give of
lm(data ~ Shseries(factor) * year)
specifies main effects (different intercepts) combined with a year regression, plus an extra parameter quantifying the difference of the level-2 slope from the level-1 slope as the interaction with the Shseries factor. Because there are just 2 levels of this factor, the H0 of equal slopes, which is equivalent to this extra parameter being zero, can be tested using the t-statistic (that parameter estimate divided by its standard error), comparing it to the nominal alpha-level two-sided t-distribution critical value.
Not sure what extra I can add.
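As a rough illustration of that test (the data frame df and its columns Temp, Year and the two-level factor Series are hypothetical):

fit <- lm(Temp ~ Series * Year, data = df)
summary(fit)   # the Series:Year interaction row gives the estimate, SE and
               # t-statistic for the difference between the two slopes
# With only two levels, that t-test is equivalent to comparing nested models:
anova(lm(Temp ~ Series + Year, data = df), fit)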

Reply to  steve_showmethedata
July 1, 2023 7:47 am

You are only considering the stated values and not their uncertainty. Go down the data set subtracting the uncertainty from all the even-numbered points, and then run your R program. Then add the uncertainty to them and rerun it. Bet your slopes and intercepts all come out differently. Then do the same for the odd-numbered points.

This is all meant to show that when you consider uncertainty there is a multiplicity of possible trend lines: up, down, sideways, and even zig-zag. Which one is the true regression line?
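A rough sketch of that experiment, using made-up data and a made-up per-point uncertainty u (purely illustrative):

set.seed(1)
year <- 1:40
temp <- 0.02 * year + rnorm(40, sd = 0.15)                   # hypothetical series
u    <- 0.2                                                  # hypothetical uncertainty
even <- seq_along(temp) %% 2 == 0                            # the "even-numbered" points
fit_base <- lm(temp ~ year)
fit_lo   <- lm(replace(temp, even, temp[even] - u) ~ year)   # even points shifted down
fit_hi   <- lm(replace(temp, even, temp[even] + u) ~ year)   # even points shifted up
sapply(list(base = fit_base, lo = fit_lo, hi = fit_hi), coef)  # compare slopes and intercepts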

steve_showmethedata
Reply to  Tim Gorman
July 2, 2023 2:43 am

The data points use temperature means, so if the corresponding sample sizes (m_i, i = 1…n) are insufficient to make the sampling variances of these (mean) data points negligible in relative terms, then the specific estimates of the sample variance of each mean can be included as known values within the n x n diagonal matrix added to the R-matrix (see my post dated June 29, 2023 1:04 am), assuming the within-mean data values are independent. R packages such as MCMCglmm can do this using the mev option (see my paper DOI:10.9734/ARRB/2021/v36i1230460 for the details). If there is a measurement-resolution error variance, this can also be included in the same way, noting that this variance is also divided by the sample size m_i (despite what some on this website assert; see my ResearchGate post: https://www.researchgate.net/publication/366175488_Response_to_WUWT_Plus_or_Minus_Isn't_a_Question).
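A hedged sketch of that option (the data frame df, its columns, and the per-mean sampling variances s_i^2 / m_i are hypothetical; see the MCMCglmm documentation and the cited paper for the details):

library(MCMCglmm)
fit <- MCMCglmm(TempAnom ~ Year_adj,
                mev  = df$sampling_var,   # known sampling variances of each mean data point
                data = df)
summary(fit)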

Reply to  steve_showmethedata
July 2, 2023 7:14 am

Having read your paper, I question the following.

“So that uncertainties in the sample mean value DO get smaller as the sample size increases with all variances (i.e. both sample variances of the true values and stochastic measurement error variances) scaled by dividing by the sample size.”

Total variance in a sample may or may not decrease with a larger sample size. If a sample exactly mirrored the population distribution, then its variance would not change. That is not likely, so increasing the sample size can result in either less variance or more variance, i.e. sampling error.

Increasing the sample size does not affect the values of the data, nor does it affect their uncertainties. Variance is a property of the distribution of the data, not of the value of the data itself. In turn, uncertainty is attached to the data and is not changed by performing statistical calculations. Dividing uncertainty by “n” is an invalid operation.

Mathematicians must remember that neither means nor variances are measured data. They are statistics of a sample distribution or parameters of a population. A mean is a calculated value of the central tendency of a group of data. A variance is a description of the spread of the data. Uncertainty of measurements has rules of propagation when combining data: uncertainty adds, either directly or in quadrature, and these operations are not changed by the number of data points that are available. The number of data points added together can only increase the total uncertainty.
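For reference, the two computations being argued over, written out for n independent measurements that each carry the same standard uncertainty u (this is the standard propagation for independent random errors; whether those assumptions hold for measurement data is exactly what is in dispute here, and the numbers are illustrative only):

n <- 100
u <- 0.5
u_sum  <- sqrt(n) * u   # combined uncertainty of the SUM of the n values, added in quadrature
u_mean <- u / sqrt(n)   # uncertainty of the MEAN, i.e. u_sum divided by n
c(u_sum = u_sum, u_mean = u_mean)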

steve_showmethedata
Reply to  Jim Gorman
July 2, 2023 5:35 pm

Thanks for taking the time to read my short unpublished note on ResearchGate. The statement you quote follows from mathematically explicit definitions and calculations for the stochastic error model discussed. Please respond, in mathematically explicit terms, with what you think is incorrect about the mathematical statistics I describe and the conclusions I draw. Your loosely connected, poorly defined and at times ambiguous non-mathematical expressions of what you are trying to communicate are not the language of theory and practice used in professional discourse. If the maths is hard to present here (I often struggle with that limitation, thus the ResearchGate short note) then do a similar thing and post a link.

This WUWT post by Willis, with its ill-defined descriptions arising from the lack of mathematical language, combined with dogmatic and unproven assertions that are supposed to have us in awe at the overturning of well-proven, standard textbook statistical theory by self-taught “experts” suffering from a high degree of hubris, demeans the credibility of this website. Refusing to respond to legitimate and rigorously proven critiques is not true science, which is meant to be self-correcting (by “self” meaning scientists at times correcting other scientists, for the common goal of advancing theory and practice through best-practice data collection, processing, and analyses, including theoretical/mathematical/statistical modelling). It appears that the claims of being willing to be corrected and to learn were insincere. Hint: get a professional statistician to look over your new “proving the experts wrong again” statistical methods before you publish.

Reply to  steve_showmethedata
July 2, 2023 5:49 pm

I don’t need to prove anything. You have made assertions in your paper that are not necessarily correct in a measurement environment. You need to show how your assertion fits into the GUM or a NIST framework dealing with measurements. These sources are the gospels for finite measurements.

steve_showmethedata
Reply to  steve_showmethedata
June 30, 2023 1:18 am

Willis wrote: “I’ve never taken a statistics class in my life. I am totally self-taught. So it’s possible my analysis is wrong. If you think it is, please quote the exact words that you think are wrong, and show (demonstrate, don’t simply claim) that they are wrong.” (I’ve done that using standard statistical theory and even pointed to one of my own contributions to the statistical methods literature that is relevant to the discussion). “I’m always happy to learn more”. (Really???). So is the above outbreak of humility for the present or just aspirational??

steve_showmethedata
Reply to  steve_showmethedata
July 4, 2023 12:02 am

I did not see a response from Willis to this post of June 29, 2023 1:04 am, but I see now there was a non-mathematical response to half my critique (i.e. no response on the invalidity of using confidence-bound overlaps, including the published papers I referenced) in the original response, posted after the date of the above post of June 29, 2023 4:18 pm. So I did get a response of sorts, but was confused by the absence of any response to the June 29, 2023 4:18 pm post that challenged the statistical methods in this article generally. So I withdraw my comments about there being no response at all, but my defence is the above lack of response and thus the lack of a cross-reference.

The bottom line is that it is useless and totally frustrating trying to debate statistical methods with one side using no mathematical notation. That is an exercise in futility. All technical discussions in the peer-reviewed statistical methods literature speak in the language of mathematics, for obvious reasons. I cannot respond to strings of English sentences that are so vague and poorly defined, in the absence of clear mathematical expressions and proofs, that it becomes impossible to have a proper technical discussion. It is sheer frustration trying to argue against what are commonly maths-free, techno-jargon-sounding word salads, including deflections and non sequiturs. Confusion due to maths-free and thus poorly defined methods will continue, and will result in the inability to clearly challenge incorrectly applied statistical methods!

A classic case: Willis said about my post, “I see nothing in your comment that refutes that” (i.e. “that” = his counter-argument), with no specific reference to the maths I presented or the paper I referenced demonstrating that it is incorrect to use overlapping confidence bounds for the inferential purpose Willis used them for. I said in my post, “Given how he is applying regression methods in the inferences about uncertainty which should involve only estimation error and not sampling error as well (see below) he is incorrect in using only the residual error variance and a t-statistic to calculate a CI about the regression line.” and “So neither (1) nor (2) correspond to using only sig2_hat combined with a t-statistic with appropriate DFs but (1) is the correct method for the inference he is making.” But Willis says, in effect, “nothing to see here” about those challenges to the methods he uses, via the above vague motherhood statement! What! It is patently just a method of deflection: “I see nothing…”! In some ways that’s worse than no response, because it is a pretense at a response: it doesn’t engage at all with the arguments I make. That’s even more frustrating than the lack of maths notation!

KB
June 30, 2023 3:54 am

So … what is the uncertainty of the residual trend?
Well, it’s not what is shown in Figure 3 above. Following the rules of uncertainty, the uncertainty of the difference of two values, each with an associated uncertainty, is the square root of the sum of the squares of the two uncertainties. But the uncertainty of the seasonal trend is quite small, typically on the order of 1e-6 or so. 

I’m not sure about this. If you plot any data and calculate a trend for it, the uncertainty on that trend comes only from the scatter on the data you have plotted. It does not matter how each data point has been calculated, although there is a presumption that the uncertainties on the points are approximately equal to each other.
The fact that you have detected a statistically significant trend means exactly that there is a non-random source of variation in the data set. If the uncertainty on each point was truly high then a statistically significant trend wouldn’t be detected. The plot would be a random scatter.
I too carry no qualifications in this field, but I still think there is a fundamental misunderstanding in this article.
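A quick simulated illustration of that point (made-up data): with ordinary least squares, the reported slope standard error is driven by the scatter of the plotted points.

set.seed(42)
x  <- 1:100
y1 <- 0.02 * x + rnorm(100, sd = 0.1)   # low scatter
y2 <- 0.02 * x + rnorm(100, sd = 0.5)   # high scatter
summary(lm(y1 ~ x))$coefficients["x", "Std. Error"]
summary(lm(y2 ~ x))$coefficients["x", "Std. Error"]   # roughly five times larger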

Reply to  Willis Eschenbach
June 30, 2023 5:00 pm

Prototypical skewed distribution. Hard to believe anyone thinks you can get any good statistical parameters from this. Any standard deviation from this will be worthless from a probability standpoint. I expect a 68% interval would be +2, −12, with a mean of 29?

To evaluate this would require proper sampling to ensure you could get a normal distribution of sample means.

Reply to  Jim Gorman
July 1, 2023 1:14 am

So would you not log transform in the first instance, and reason that solar radiation is not normally distributed over the surface?

Although attenuated by the monsoon, radiation (max) occurs between the tropic meridians and diminishes poleward. The data also show possible bi-modality around Tanom = 5 °C. Superimposing an |absolute latitude| band on the x-axis would aid interpretation.

Cheers,

Bill Johnston

Reply to  Bill Johnston
July 1, 2023 6:03 am

A log transform wouldn’t change the data, just the way it looks on the page.

Good catch on the modality. The average of a multi-modal distribution is of even less use than one of a skewed distribution.

Have climate scientists never heard of the 5-number statistical descriptor that is often used for skewed distributions? Or would that cause everyone to question their assumption that all error is random, Gaussian, and cancels?

Reply to  Tim Gorman
July 1, 2023 2:10 pm

Tim, you say “Have climate scientists never heard of the 5-number statistical descriptor …” Of course they have. Most use R these days. Many would also use a basic package such as PAST, from the University of Oslo, to undertake preliminary analyses. PAST will provide a data summary that includes the quartile ranges, medians, etc.

One thing to do would be to create the histogram graph as above, overlay a normal distribution curve, transform the y-axis to logs (by ticking the box), and see what happens.

Also, draw a Q-Q plot using raw vs. log-transformed data ….
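A short sketch of those checks in base R, on hypothetical right-skewed data:

x <- rlnorm(500, meanlog = 1, sdlog = 1)    # made-up skewed data
fivenum(x)                                   # the 5-number summary mentioned above
op <- par(mfrow = c(1, 2))
qqnorm(x, main = "Raw data");               qqline(x)
qqnorm(log(x), main = "Log-transformed");   qqline(log(x))
par(op)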

There are some pretty knowledgeable and cluey climate scientists around ….

All the best,

Bill Johnston

Reply to  Bill Johnston
July 2, 2023 5:07 am

If they have heard of it then why do we never see them use it?

Reply to  Tim Gorman
July 2, 2023 1:55 pm

You don’t know that they don’t, Tim. Remember too that t-tests and the like are fairly robust to non-normality as the sample size increases above about N = 30.

BTW, the log-transform does more than just “change the way data look on the page”. It also causes skewed data (and ratio data) to be more symmetrical. It also causes multiplicative effects to become additive, which I’m sure you know anyway. It is mainly used to make non-normal data amenable to statistical tests that require data to be Gaussian (normal) in their distribution.

I use the log10 transform for examining the ratio of counts of upper-range daily temperature extremes relative to low-range extremes. Also, although I have not published any streamflow data, nutrients in runoff, etc., such data are strongly skewed and require transformation before analysis.

The point is that if one calculates average streamflow, soil or nutrient wash-off rates using raw data dominated by a small proportion of extreme events, the means will considerably overstate the true average (by factors of 10 and 20, for instance). So transforming the data to be symmetrical is important in levelling the playing field and understanding what is going on.
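A small illustration of that overstatement with simulated skewed data (lognormal, purely hypothetical; the factor depends on how heavy the tail is):

set.seed(7)
flow <- rlnorm(1000, meanlog = 1, sdlog = 1.5)   # made-up streamflow-like values
mean(flow)             # arithmetic mean, pulled up by a few extreme events
exp(mean(log(flow)))   # geometric mean (back-transformed mean of the logs)
median(flow)           # robust central value for comparison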

I hope this helps.

All the best,

Bill Johnston

Reply to  Willis Eschenbach
July 1, 2023 1:42 pm

Or elevation or longitude or time or …..

Reply to  Willis Eschenbach
July 1, 2023 1:57 pm

Thanks Willis,

I would plot an additional axis with latitude aligned with, say, each 5-degree increment, to indicate the overarching effect of latitude on T (it may have to be a log scale, but I would not know that from here).

Drawing the graph is nice, but there is more to be gleaned by interpreting it well. Serving up more information assists understanding, which I see as important.

I have a model that calculates SR (which is part of another model). As I recall, the bimodality is due to the earth not being a perfect sphere.

All the best,

Bill Johnston
