Uncertain Uncertainties

Guest Post by Willis Eschenbach

Well, I’ve been thinking for a while about how to explain what I think is wrong with how climate trend uncertainties are often calculated. Let me give it a shot.

Here, from a post at the CarbonBrief website, is an example of some trends and their claimed associated uncertainties. The uncertainties (95% confidence intervals in this instance) are indicated by the black “whisker bars” that extend below and above each data point.

Figure 1. Some observational and model temperature trends with their associated uncertainties.

To verify that I understand the graph, here is my own calculation of the Berkeley Earth trend and uncertainty.

Figure 2. My own calculation of the Berkeley Earth trend and uncertainty (95% confidence interval), from the Berkeley Earth data. Model data is taken directly from the CarbonBrief graphic.

So far, so good, I’ve replicated their Berkeley Earth results.

And how are that trend and the uncertainty calculated? It’s done mathematically using a method called “linear regression”. Below are the results of a linear regression, using the computer program R.

Figure 3. Berkeley Earth surface air temperature, with seasonal anomalies removed. The black/yellow line is the linear regression trend.

The trend is the “Estimate” shown for the “time(tser)” coefficient, in °C per year, and the per-year uncertainty is that coefficient’s “Std. Error”.

This gives us a temperature trend of 0.18°C per decade (shown in the “Coefficients” as 1.809E-2 °C per year), with an associated uncertainty of ±0.004°C per decade (shown as 3.895E-4 °C per year).
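
For anyone who wants to reproduce this kind of calculation, here is a minimal R sketch of the regression step. The data below are synthetic placeholders, not the Berkeley Earth series; only the variable name tser matches the output above.

# Minimal sketch of the regression step, on made-up placeholder data
set.seed(1)
anoms <- 0.018 * (0:527) / 12 + rnorm(528, sd = 0.15)   # fake monthly anomalies, deg C
tser  <- ts(anoms, start = c(1979, 1), frequency = 12)
fit   <- lm(tser ~ time(tser))
summary(fit)$coefficients              # "Estimate" = trend in deg C/yr, "Std. Error" = its 1-sigma uncertainty
coef(fit)[2] * 10                      # trend per decade
summary(fit)$coefficients[2, 2] * 10   # standard error per decade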

So … what’s not to like?

Well, the black line in Figure 3 is not the record of the temperature. It’s the record of the temperature with the seasonal variations removed. Here’s an example of how we remove the seasonal variations, this time using the University of Alabama at Huntsville Microwave Sounding Unit (UAH MSU) lower troposphere temperature record.

Figure 4. UAH MSU lower troposphere temperature data (top panel), the average seasonal component (middle panel), and the residual with the seasonal component removed.

The seasonal component is calculated as the average temperature for each month. It repeats year after year for the length of the original dataset. The residual component, shown in the bottom panel, is the original data (top panel) minus the average seasonal variations (middle panel).
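
For illustration, here is a minimal R sketch of that month-by-month averaging, on a synthetic placeholder series rather than the UAH MSU data:

# Split a monthly series into a repeating seasonal component and a residual
temps    <- ts(263 + 1.2 * sin(2 * pi * (1:528) / 12) + rnorm(528, sd = 0.3),
               start = c(1979, 1), frequency = 12)   # placeholder data
seasonal <- ave(as.numeric(temps), cycle(temps))     # average of each calendar month, repeated every year
residual <- as.numeric(temps) - seasonal             # original data minus the seasonal component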

Now, this residual record (actual data minus seasonal variations) is very useful. It allows us to see minor variations from the average conditions for each month. For example, in the residual data in the bottom panel, we can see the temperature peaks showing the 1998, 2011, and 2016 El Niños.

To summarize: the residual is the data minus the seasonal variations.

Not only that, but the residual trend of 0.18°C per decade shown in Figure 3 above is the trend of the data itself minus the trend of the seasonal variations. (The seasonal variations trend is close to but not exactly zero, because of the end effects based on exactly when the data starts and stops.)

So … what is the uncertainty of the residual trend?

Well, it’s not what is shown in Figure 3 above. Following the rules of uncertainty, the uncertainty of the difference of two values, each with an associated uncertainty, is the square root of the sum of the squares of the two uncertainties. But the uncertainty of the seasonal trend is quite small, typically on the order of 1e-6 or so. (This tiny uncertainty is due to the standard errors of the averages of each monthly value.)

So the uncertainty of the residual is basically equal to the uncertainty of the data itself.
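
As a numerical illustration of that rule, using hypothetical magnitudes of the kind described above (both as 1-sigma values in deg C per year):

u_data     <- 4e-3    # hypothetical uncertainty of the raw-data trend
u_seasonal <- 1e-6    # hypothetical uncertainty of the seasonal-component trend
sqrt(u_data^2 + u_seasonal^2)   # root-sum-square combination: essentially identical to u_data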

And this is a much larger number than what is usually calculated via linear regression.

How much larger? Well, for the Berkeley Earth data, on the order of eight times as large.

To see this graphically, here’s Figure 2 again, but this time showing both the correct (red) and the incorrect (black) Berkeley Earth uncertainties.

Figure 5. As in Figure 2, but showing the actual uncertainty (95% confidence interval) for the Berkeley Earth data.

Here’s another example. Much is made of the difference in trends between the UAH MSU satellite-measured lower troposphere temperature trend and ground-based trends like the Berkeley Earth trend. Here are those two datasets, with their associated trends and the uncertainties (one standard deviation, also known as one-sigma (1σ) uncertainties) incorrectly calculated via linear regression of the data with the seasonal variations removed.

Figure 6. UAH MSU lower troposphere temperatures and Berkeley Earth surface air temperatures, along with the trends showing the linear regression uncertainties.

Since the uncertainties (transparent red and blue triangles) don’t overlap, it looks as though the two datasets have statistically different trends.

However, when we calculate the uncertainties correctly, we get a very different picture.

Figure 7. UAH MSU lower troposphere temperatures and Berkeley Earth surface air temperatures, along with the trends showing the correctly calculated uncertainties.

Since the one-sigma (1σ) uncertainties basically touch each other, we cannot say that the two trends are statistically different.
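
For readers who prefer a formal check, a standard way to compare two independent trend estimates is to divide their difference by the root-sum-square of their uncertainties. A minimal R sketch, with placeholder numbers rather than the actual UAH or Berkeley Earth values:

b1 <- 0.18; se1 <- 0.03   # trend and 1-sigma uncertainty, dataset 1 (deg C/decade), placeholder
b2 <- 0.13; se2 <- 0.04   # trend and 1-sigma uncertainty, dataset 2 (deg C/decade), placeholder
z  <- (b1 - b2) / sqrt(se1^2 + se2^2)   # difference in units of its combined uncertainty
2 * pnorm(-abs(z))                      # two-sided p-value; above 0.05 means not statistically different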

CODA: I’ve never taken a statistics class in my life. I am totally self-taught. So it’s possible my analysis is wrong. If you think it is, please quote the exact words that you think are wrong, and show (demonstrate, don’t simply claim) that they are wrong. I’m always happy to learn more.

As always, my best wishes to everyone.

w.

465 Comments
Richard Page
June 26, 2023 10:24 am

I haven’t taken any statistics classes either, Willis, but I think you are essentially correct. I’ve been saying for a while now that what is stated as ‘uncertainty’ is actually a mathematically derived probability that the correct answer ‘probably’ lies within that range. The actual uncertainty is far larger and, if correctly applied, would make a mockery of the climate enthusiasts’ ‘0.05’ or similar uncertainty range.

Reply to  Richard Page
June 27, 2023 4:54 am

The uncertainty would be as they say- if and only if all the assumptions that went into the argument were true and if all measurements were correct.

I took a one credit course in statistics in college. It was terribly taught. The book was awful. It presumed the student is an idiot, so instead of really explaining the development of statistics as a tool and how the formulas were developed, it just gave formulas. I was a good math student in high school and took advanced calculus there, and it was well taught, so I really understood it. I wonder if many scientists who do take advanced statistics courses were also poorly taught this branch of math. If you don’t really understand it well enough to derive the formulas yourself, you aren’t going to be able to use it properly, in my opinion. I only wish I had taken a more advanced course, as I do believe it’s an extremely powerful tool when used correctly. But it’s so obscure to most people that it’s easy to misuse it for devious reasons.

June 26, 2023 10:31 am

“And this is a much larger number than what is usually calculated via linear regression.”

If I’m understanding this correctly, the much larger uncertainty is based on the seasonal variation. But this is not random variation – it’s quite predictable that summers will be warmer than winters. This is a type of auto-correlation, and should be accounted for in the uncertainty analysis.

The assumption behind the standard uncertainty of the slope is that all the data are independent. This is not the case because of auto-correlation, and that includes the seasonal changes.

But a much easier way is to use the seasonally adjusted data, or just use annual data.
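
For illustration, a minimal R sketch of the “just use annual data” route, on a placeholder monthly series:

monthly <- ts(rnorm(528), start = c(1979, 1), frequency = 12)   # placeholder data
annual  <- aggregate(monthly, FUN = mean)                       # one value per calendar year
summary(lm(annual ~ time(annual)))                              # trend fitted to the annual means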

June 26, 2023 10:58 am

And of course there is the elephant in the room- we are dealing with one very thin slice of a slice of a segment of climatological history.

June 26, 2023 11:12 am

It is evident that the estimated 5-95% uncertainty stated in temperature datasets is not the real uncertainty, as every time they go through a version change, as from HadCRUT3 to 4 to 5, or UAH 5 to 6, the new set of temperatures is way outside of the previously estimated uncertainty. The real uncertainty is close to ±0.1°C.

JCM
Reply to  Javier Vinós
June 26, 2023 11:20 am

 every time they go through a version change, as from HadCRUT3 to 4 to 5, or UAH 5 to 6, the new set of temperatures is way outside of the previously estimated uncertainty. 

bingo.

June 26, 2023 11:48 am

I think autocorrelation needs to be factored in. Don’t ask me how to do this, but the fact that month to month temperature variations tend to be connected, as in they are not completely random, makes a difference.

Richard Page
Reply to  TheFinalNail
June 26, 2023 12:28 pm

You mean that the temperature in one month is a variation of the preceding month’s temperature? And so on throughout the year? I think that might work if the difference from one month to the next was quite small but might come unstuck if there was a large drop or increase in one (or more) month – outside the probability range.

Reply to  Richard Page
June 26, 2023 12:50 pm

For a given location there’s ‘usually’ not that big a difference between consecutive monthly temperatures. I believe (and don’t quote me) that linear regression operates on the principle that each consecutive value is independent of the next/previous, even if constrained within certain limits. For correlated data like monthly temp anomalies a correction has to be made. Tamino covered this a few years ago, but the maths is beyond me.

Reply to  TheFinalNail
June 27, 2023 11:33 am

… linear regression operates on the principle that each consecutive value is independent of the next/previous, …

That is true if the data are de-trended. However, for a time-series with one or more cyclical components (real world data), including the large-amplitude cyclical component(s) increases the standard deviation. One has to carefully define just what is being summarized with the statistics.

Reply to  Clyde Spencer
June 27, 2023 11:52 am

You also have the problem that different portions of the cyclical components have different variances, e.g. winter vs summer and NH vs SH among others. How do you de-trend such data?

Reply to  Tim Gorman
June 27, 2023 7:37 pm

I think that the implication is that one has to analyze the data by segments and can’t apply a simple process to the whole Earth and the entire time-series.

June 26, 2023 12:00 pm

I’m an Econometrician by training. I have dealt with time series data for decades. The approach I would take is to first deseasonalize the monthly data by taking the 12th difference. Yes, you lose 12 months of data. I have always found the process of taking a 30 year average to deseasonalize a bit weird. The deseasonalized data can be used for all sorts of stationarity tests. I’m not sure why there is such a difference in the approach to analyzing climate time series data as compared to economic time series data.
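
For illustration, the 12th-difference approach described above is a one-liner in R (placeholder data, not any real record):

monthly        <- ts(rnorm(528), start = c(1979, 1), frequency = 12)   # placeholder series
deseasonalised <- diff(monthly, lag = 12)   # year-over-year differences; the first 12 months are lost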

Reply to  Nelson
June 26, 2023 12:15 pm

“deseasonalizing” won’t remove the uncertainty in the data. That uncertainty should be propagated through to any statistical analysis that is done on it. But in climate science that is never done. It’s because most climate scientists *are* statisticians that have never been trained in metrology and uncertainty. If you give them a database of values given as “stated values +/- uncertainty” all they will do is analyze the stated values.

Reply to  Willis Eschenbach
June 26, 2023 3:14 pm

‘What am I missing here?’

If past is prologue, probably nothing. I can see why someone looking to apply ARIMA to the raw monthly data would first difference the data (I=12) to make it stationary. Likewise, I can see why someone would fit a sine wave to the data in order to analyze the residuals. The first is a black box, the second is an attempt at understanding causality.

Allan MacRae
Reply to  Willis Eschenbach
June 28, 2023 12:21 am

Hi Willis,

I haven’t posted on wattsup in a long time, and may not again for a long time.
I wanted to thank you for your many interesting posts over the years.
In particular, your comment of 21Mar2020 on the Covid-19 lockdowns, together with my independent post of the same date, were to my knowledge the earliest and most accurate calls on the Covid-19 global scam.

I’ve fallen far down that Covid-19 rabbit hole, and written two books on the Climate and Covid scams. The latest is here:
COVID & CLIMATE CHRONICLES – THE BIG CULL
[excerpt re Covid-19]
“I’ve watched this carnage unfold – identified it in Feb2020, published my warning on 21Mar2020, and was ignored. There were 13 million Covid-19 vaxx-deaths worldwide to end 2022, excluding China; increasing 19 million by end 2023.”
Denis Rancourt et al (Feb2023) also calculated 13 million Covid-19 vaxx-deaths to end 2022.
A recent Chapter:
THE CORRUPTION OF OUR INSTITUTIONS 5
Addendum to 18June2023 – Denis Rancourt et al
https://allanmacrae.substack.com/p/rancourt-also-states-no-real-pandemic

I’m sure, Willis, that you realized long ago that the warming-alarmist side of the climate debate never argued the science; they merely screeched the propaganda of Lenin, Goebbels and Alinsky – an effective tactic to convince the ignorant and innumerate.
In summary, the left lies about everything, and they can fabricate lies much faster than we can disprove them.

I wish you well Willis. Look me up if you ever get to Calgary.
Over and (probably) out, Allan MacRae

Post Script
I’ve recently become aware of the huge global child trafficking business. It appears highly probable that the leaders of our two countries are pedeaux (French, plural of pedeau, as in Justin Pedeau) and part of a powerful global network of same. I’m not sure I want to go down that rabbit hole.

“THE OLIGARCHS WHO RULE OUR WORLD (THE COMMITTEE OF 300) ARE PEDOPHILES WHO CONTROL THE SYSTEMS OF GLOBAL CHILD SEX TRAFFICKING AND CONTROL THE UN AND WEF.”
“We know that there are more than 8 million children per year who disappear.”
“The situation with Donald Trump was a big shock…   The plandemic was supposed to be in 2016….  A disaster regarding food and water, they want to arrive in 2025 now…”
– Calin Georgescu – Former United Nations Executive and Former President of the Club of Rome for Europe.

bdgwx
Reply to  Nelson
June 26, 2023 7:19 pm

Like Willis I’m interested in a bit more detail about what you’re describing.

bdgwx
June 26, 2023 12:03 pm

Following the rules of uncertainty, the uncertainty of the difference of two values, each with an associated uncertainty, is the square root of the sum of the squares of the two uncertainties.

That is for uncorrelated values. For values that are correlated it is more complicated.

For example…

263.0 ± 0.20 K minus 262.5 ± 0.20 K with a correlation of r = 0.00 is 0.5 ± 0.28 C

263.0 ± 0.20 K minus 262.5 ± 0.20 K with a correlation of r = 0.25 is 0.5 ± 0.25 C

263.0 ± 0.20 K minus 262.5 ± 0.20 K with a correlation of r = 0.50 is 0.5 ± 0.20 C

263.0 ± 0.20 K minus 262.5 ± 0.20 K with a correlation of r = 0.75 is 0.5 ± 0.14 C

263.0 ± 0.20 K minus 262.5 ± 0.20 K with a correlation of r = 0.90 is 0.5 ± 0.09 C

This can be computed via JCGM 100:2008 equation 16. Or you can use the NIST uncertainty machine. Note that when r = 0 equation 16 reduces to equation 10. And when the measurement model is y = f(a, b) = a – b then equation 10 reduces to the familiar root sum square formula.
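
For illustration, equation 16 reduced to a simple difference y = a - b can be written as a one-line R function (a sketch, not the NIST uncertainty machine itself):

u_diff <- function(ua, ub, r) sqrt(ua^2 + ub^2 - 2 * r * ua * ub)   # combined standard uncertainty of a - b
u_diff(0.20, 0.20, 0.00)   # ~0.28
u_diff(0.20, 0.20, 0.50)   # 0.20
u_diff(0.20, 0.20, 0.90)   # ~0.09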

Reply to  bdgwx
June 26, 2023 12:22 pm

The problem is that temperatures are *NOT* highly correlated when considering they are taken at different times under different physical conditions using different measurement devices. See the attached map of temps in northeast Kansas at 2pm. Just how correlated are those temperatures? They vary with respect to geography (altitude, closeness to water), land use (urban, rural, planted to crops, used as pasture), and weather fronts among a myriad of other impacts.

All you can basically say is that they are daytime summer temperatures – which doesn’t mean they are correlated and which has no impact on the uncertainty of their average – the uncertainty of the average is just the plain growth of the individual uncertainties, be it grown directly or by root-sum-square.

This has been pointed out to you multiple times in the past. Why do you always conveniently ignore it?

[Attached image: july_2023.jpg]
Reply to  Tim Gorman
June 26, 2023 4:44 pm

Just how correlated are those temperatures?

It’s impossible to tell from just a single value. If you want to know how well correlated any pair of stations are you would need to look at all the data across a certain time period.

I expect the correlation would be very high, especially if you measured it for an entire year.

Reply to  Bellman
June 27, 2023 3:25 am

Single values are what makes up the average for an area! The issue is the uncertainty introduced from measuring multiple different things using different devices and assuming they represent multiple measurements of the same thing using the same device, so that their average is a “true value”. It isn’t a true value because the assumption is wrong from the very start!

When you calculate correlation using only the stated values without considering the uncertainty in those values, even over a time period, you are making the unspoken and unwritten assumption that all the stated values are 100% accurate because all uncertainty cancels. In other words, if you truly consider the uncertainty introduced from measuring multiple things using different devices you won’t *know* what the correlation actually is.

It’s like picking three boards from the scrap pile at a construction site and assuming their lengths are all correlated and the uncertainties in their lengths don’t add. How do you know their lengths are correlated?

Reply to  Tim Gorman
June 27, 2023 5:55 am

Single values are what makes up the average for an area!

Which has nothing to do with your question about what the correlation was between the temperatures.

It’s like picking three boards from the scrap pile at a construction site and assuming their lengths are all correlated and the uncertainties in their lengths don’t add. How do you know their lengths are correlated?

You really don’t understand what correlation means, do you?

Reply to  Bellman
June 27, 2023 8:12 am

I understand perfectly what it means.

The population growth of swans in Denmark and the number of babies born in Denmark are highly correlated. Why can’t you just average the two, divide the uncertainty in both counts by 2, and reduce the total uncertainty!

Reply to  Tim Gorman
June 27, 2023 12:35 pm

The population growth of swans in Denmark and the number of babies born in Denmark are highly correlated.

I’ll take your word for it – but some citation would be useful. What is the correlation coefficient? Over what period are you taking the correlation?

Why can’t you just average the two

You can, but that’s got nothing to do with correlation. You can average two uncorrelated or correlated values. Correlation will just increase the uncertainty. No idea why you’d want to average this though.

divide the uncertainty in both counts by 2

why would you do that?

Reply to  Bellman
June 27, 2023 11:41 am

And do you know the difference between correlation and cause-and-effect? It is an important distinction. If the correlation is spurious or just coincidence, then it is unimportant. Auto-correlation is important when trying to predict a dependent variable based on the value of the causative independent variable. In the extreme case, perfect auto-correlation means that the independent variable has no predictive value.

Reply to  Clyde Spencer
June 27, 2023 12:28 pm

And do you know the difference between correlation and cause-and-effect?

Yes. How many times have I had to point out that correlation does not imply causation here?

I could probably explain it better if I had the faintest idea what Tim wanted to know about correlation. This thread started because he was asking what the correlation was between various temperatures taken on a single day. I simply pointed out you couldn’t tell from a single value for each station. But rather than address that we are on to some obsession about swans, and I’ve still no idea what point he thinks he’s making.

bdgwx
Reply to  Willis Eschenbach
June 27, 2023 8:23 am

For the seasonal data as follows and repeating each year from 1979 to 2022 I get a trend of 0.000 ± 0.056 C/decade using the standard uncertainty from an ordinary linear regression. But using the AR(1) method which factors in the auto-correlation I get 0.000 ± 0.197 C/decade with an AR(1) expansion factor of v = 3.54.

Jan 263.18 K
Feb 263.27 K
Mar 263.43 K
Apr 263.84 K
May 264.45 K
Jun 265.10 K
Jul 265.42 K
Aug 265.23 K
Sep 264.64 K
Oct 263.95 K
Nov 263.41 K
Dec 263.19 K

Here is how the AR(1) uncertainties evolve over time, ending at 2022/12, for each of the 3 graphs in Figure 4 in your post. Notice that the variability caused by the seasonal component results in significantly higher uncertainties as a result of the very high correlation of the seasonal data.

The top graph (absolute) has a trend of +0.134 ± 0.202 C/decade at the end of the period.


The middle graph (seasonal) has a trend of 0.000 ± 0.197 C/decade at the end of the period.


The bottom graph (anomaly) has a trend of +0.134 ± 0.046 C/decade at the end of the period.

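For illustration, one common AR(1) adjustment (inflating the ordinary OLS standard error using the lag-1 autocorrelation of the residuals) can be sketched in R as follows. This is an assumed version of the method, run on placeholder data; it is not necessarily the exact calculation behind the numbers above.

set.seed(42)
y   <- ts(arima.sim(list(ar = 0.6), n = 528), start = c(1979, 1), frequency = 12)   # placeholder autocorrelated series
fit <- lm(y ~ time(y))
se  <- summary(fit)$coefficients[2, 2]        # ordinary OLS standard error of the slope
r1  <- acf(resid(fit), plot = FALSE)$acf[2]   # lag-1 autocorrelation of the residuals
v   <- sqrt((1 + r1) / (1 - r1))              # expansion factor
c(ols = se, ar1_adjusted = se * v)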

Reply to  bdgwx
June 26, 2023 7:11 pm

You are using the wrong measurand. As NIST TN 1900 points out, the measurand you are using is the average temperature as determined by a monthly average. The correct uncertainty in that monthly average is contained in the variation of the collection of data, i.e., the daily readings for a month.

You are trying to use the individual uncertainty of each reading as the uncertainty in the average. As I have tried to point out, that is improper. An average of temperature readings is, in essence, an average of individual experimental determinations of Tmax, taken under as similar conditions as possible.

NIST TN 1900 lays out very succinctly what is the measurand and how it is determined. It is also very succinct about how to calculate the uncertainty of the average of those experimental determinations. And, it uses the GUM as a reference.

Reply to  Jim Gorman
June 26, 2023 8:57 pm

Dear Jim,
 
All you have is the data that you have and without additional information, a datapoint cannot imply anything about ‘accuracy’.
 
Unless one has experienced the process of undertaking standard weather observations, used the various instruments under a wide range of conditions (and reported the same) as I have, the meaning of the term accuracy of an observation, and of its partner-term uncertainty (which mean different things), cannot be defined and therefore has no bearing on the discussion.
   
Uncertainty is, by definition, half the interval scale used in estimating a datapoint. For a 1mm graduation on a tape measure it is +/- 0.5mm. For a ruler graduated in 1/10ths of an inch it is (1/10)/2 = 0.05 inches. For a Fahrenheit meteorological thermometer with 1 degF increments it is 0.5 degF; for a Celsius met-thermometer with ½ degC divisions it is 0.25 degC, which rounds to 0.3 degC.
 
However, the Fahrenheit thermometer has 180 divisions between freezing and boiling (212–32 = 180), whereas the Celsius thermometer has 100 full divisions between zero and 100 degC, so at face value the Fahrenheit thermometer is more accurate. Nevertheless, observed to the nearest ½ degC (for which there are 200 sub-divisions), the Celsius thermometer is (200/180) = 1.11 times more accurate than a Fahrenheit one.
 
While ignoring eyeball-error, transcription-error, re-transcription-error …. it is generally regarded that the uncertainty of comparing two values is +/- 0.3 degC.
 
Then there is the site uncertainty. Whether the site was watered or not; passing vehicles and aircraft, the state of the Stevenson screen, slackness of the observer, and all the rest, over which there is no control, and about which there is almost zero knowledge.  
 
Taking account of all this, it is meaningless to talk about the accuracy of an observation. It is also reasonably meaningless to wax in a conjective sense (existing neither in the subject nor in the object but in the group) about uncertainty (see https://sweettalkconversation.com/2015/04/02/individual-volition-and-conjective-reality/).
 
Which takes me to the opening sentence, where I say “In the first place, all you have is the data that you have, and without additional information a datapoint cannot imply anything about ‘accuracy”.
 
Ahhh but wait …. You can average and derive an uncertainty for that. You can undertake a regression analysis and derive uncertainties for slopes and intercepts; calculate confidence intervals for the line, prediction intervals for new data, you can also calculate limits – the temperature when time=zero, you could also decide that nothing makes sense and toss the data in the bin.
 
You can also compare say one set of observations (say UAH data) with another (say Berkeley Earth), decide they are different, then wonder why. After all they both use the same satellites, the same microwave thingos but they result in different trends. Which trend is “better” or more “accurate”? Mmmm, without going back to first principles, hard to decide really.
 
Both trends are especially perplexing given that the only trend in Australian temperature data is that resulting from homogenisation.
 
Homogenisation is the process used by Australia’s Bureau of Meteorology to make data, say for Halls Creek or Cairns agree with models (https://www.bomwatch.com.au/bureau-of-meterology/part-6-halls-creek-western-australia/; https://www.bomwatch.com.au/data-quality/climate-of-the-great-barrier-reef-queensland-climate-change-at-cairns-a-case-study/).
 
All the best,
 
Dr Bill Johnston
 www.bomwatch.com.au
 

Reply to  Bill Johnston
June 27, 2023 1:31 am

I said that “While ignoring eyeball-error, transcription-error, re-transcription-error …. it is generally regarded that the uncertainty of comparing two values is +/- 0.3 degC.” However, I was wrong.

As the uncertainty of an observation is +/- 0.3 degC, the uncertainty of comparing two independent observations is +/- 0.6 degC.

Cheers,

Bill

bdgwx
Reply to  Bill Johnston
June 27, 2023 7:43 am

As the uncertainty of an observation is +/- 0.3 degC, the uncertainty of comparing two independent observations is +/- 0.6 degC.

Not quite. For an instrument with 1/2 C divisions the uncertainty is ±0.25 C (rectangular). If the measurement model is y = a – b then u(y) = 0.2 C. Equivalently you can say it is ±0.5 C (triangular). Note that 0.2 is the standard deviation of a triangular distribution with ±0.5 bounds.

I encourage you to use the NIST uncertainty machine to verify this yourself.
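
For illustration, a quick Monte Carlo check of that statement in R: two readings, each with a rectangular resolution uncertainty of half-width 0.25 C, combined as y = a - b.

n <- 1e6
a <- runif(n, -0.25, 0.25)   # rectangular resolution error of reading a
b <- runif(n, -0.25, 0.25)   # rectangular resolution error of reading b
sd(a - b)                    # ~0.20 C; the difference follows a triangular distribution of half-width 0.5 C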

Reply to  bdgwx
June 27, 2023 3:42 pm

Good morning bdgwx,

The uncertainty (observational error) strictly of a Celsius meteorological thermometer is as you say 0.25 degC (half the interval scale). As one cannot read a met-thermometer to two decimal places this conventionally rounds up to +/- 0.3 degC. Errors are additive when comparing two values; thus the combined error of a difference is +/- 0.6 degC. The error in this case means the true value lies within 0.3 degC of the observed value and the true difference lies within 0.6 degC. (Search for measurement error.)

Another point to make is that met-observations are made by just one person. Error in the sense of a distribution (several eyeballs examining the same thermometer independently) is not determinable from a single observation. Likewise whether temperature was observed “accurately” by that person cannot be known.

Also, accuracy with respect to the measurand (the air being measured) would require a minimum of three thermometers being read independently at each time (i.e. replication), which is also not possible when just one observer undertakes observations.

If we were to plot daily observations and conduct a linear regression analysis, each observation would have an identical error bar of +/-0.3 degC

All the best,

Bill Johnston

Reply to  Bill Johnston
June 27, 2023 5:39 pm

You pretty much nailed it! One observation does not create a distribution.

bdgwx
Reply to  Bill Johnston
June 27, 2023 5:42 pm

Errors are additive when comparing two values

That rule applies when the uncertainty of the two values is rectangular. It is a useful rule, but it's limited. The exact rule for uncorrelated measurements of a comparison (i.e. y = a – b) is that the combined standard uncertainty is the square root of the sum of the squares of the standard uncertainty of the two values. It is an important distinction because if you want to use the comparison y in yet another calculation you need to leave its combined uncertainty in standard form so that it can be propagated correctly.

To demonstrate this consider two values 5.5 ± 0.3 (triangular) and 2.5 ± 0.3 (triangular). The difference between the two is 3.0 ± 0.1 (1σ nearly gaussian). Clearly the additive rule is way off. You can verify this result with the NIST uncertainty machine.

Reply to  bdgwx
June 27, 2023 6:01 pm

 is the square root of the sum of the squares of the standard uncertainty of the two values. “

This *ONLY* applies if you can assume partial cancellation of the uncertainty contributions. With only two values to compare it’s going to require pretty strict justification for assuming partial cancellation. It doesn’t matter what distribution you assume, if it is symmetric then there is an equal possibility that both contributions will be on the same side of the stated value compared to one being on one side and one on the other. That is why when you only have two measurements it is usual to use direct addition of the uncertainty and not quadrature addition.

Reply to  Bill Johnston
June 27, 2023 3:31 am

“Uncertainty is, by definition, half the interval scale used in estimating a datapoint”

Your assumption here is wrong. If you have a difference in environment for each data point then an uncertainty is introduced. A tape measure can expand in high temps and contract in low temps. That introduces an uncertainty totally unrelated to the “interval” markings on the tape. A mercury thermometer can have hysteresis, i.e. it reads differently in falling temps than in rising temps, which introduces uncertainty that is totally unrelated to the “interval” markings on the device.

Reading a device manually is only ONE factor introducing uncertainty. What if the device has a digital readout? Is the uncertainty of the indicated value zero?

Reply to  Tim Gorman
June 27, 2023 5:00 am

A digital display will at best be rounding to the last displayed digit, so surely the uncertainty cannot be less than +/- half the magnitude of the last displayed digit?

Reply to  DavsS
June 27, 2023 7:22 am

Agreed, but there should be no uncertainty in the *reading* of the measurement. It will be exactly what is displayed, no parallax error, no estimation error. As you say however, there will *still* be uncertainty associated with the digital reading, and not just +/- half the magnitude of the last displayed digit but also with the circuitry that develops that reading. If the oscillator driving the measuring device is off frequency, the reading won’t be accurate. If the resistance values are not spot on, the reading won’t be accurate. And on and on and on …..

Meaning that the uncertainty of the measuring device will be *more* than +/- half the last significant digit.

Reply to  Tim Gorman
June 27, 2023 4:28 pm

If it was really important, you would calibrate the scales and adjust the balance accordingly.

You can even calibrate the implied error by using laboratory grade weights that overstep the 0.05 interval by 0.01g. It can be done, and in the olden days of laboratory balances, each was supplied with a box of counterweights, which one would handle using tweezers to avoid weight contamination from sticky fingers.

As I recall from physics 1.01 scales should be corrected for altitude, possibly air pressure, while balances do not require correction (too long ago!).

Cheers,

Bill

bdgwx
Reply to  DavsS
June 27, 2023 7:45 am

Correct. There will be other sources of uncertainty as well so the total uncertainty will likely be a little more than the resolution uncertainty.

Reply to  Tim Gorman
June 27, 2023 4:15 pm

Dear Tim,

It is not my assumption (see above). The environment is the measurand; the thermometer is the means of measuring the difference between them.

Stretching of a tape does not change the interval scale. You are actually talking about bias. Measuring air pressure using a mercury barometer requires a bias correction for expansion due to ambient air-T. (They have a thermometer on the side for that purpose.) A mercury thermometer with a break in the column (which is not uncommon) also introduces bias to the observation; in that case the bias is systematic.

Hysteresis is also a non-issue for mercury meteorological and laboratory grade thermometers. The digital readout of your bathroom or kitchen scales measuring whole grams is +/- 0.5g. Measuring to 0.1g, it is +/- 0.05g.

We had precision scales in our lab situated on a concrete bench under an airtight cover able to measure to four decimal places. Precision was +/- 0.00005 g. (The scales could detect movement in the lab and trucks going past to the workshop.)

Siding Spring Observatory near Coonabarabran consists of two large concrete cylinders separated by an air-gap. Settings of the telescope on the inner cylinder, which required great accuracy, could be disturbed by a careless footfall.

All the best,

Bill

Reply to  Bill Johnston
June 27, 2023 5:49 pm

“You are actually talking about bias.”

Yep. It is *systematic* bias. It’s due to calibration error. And it can’t be eliminated using a statistical analysis. Since every measurement is a combination of random error and systematic bias, and you can’t determine the systematic bias, you can’t determine the random error either. All you can do in a field instrument is assume an uncertainty that is large enough to encompass both components (but no more than large enough).

Hysteresis is certainly an issue in any measurement device. Even a mercury lab thermometer. You can calibrate such an instrument at the freezing point and the boiling point but you can’t make the mercury rise and fall at the same rate at the bottom of the device and at the top of the device. That *is* hysteresis and it is a systematic bias that can’t be eliminated through statistical analysis.

“able to measure to four decimal places. Precision was +/- 0.00005 g.”

Precision is not accuracy. See the attached picture, especially the one on the right. High precision, low accuracy. It doesn’t matter what the precision of the instrument is if it is not accurate.

[Attached image: average_vs_precision.png]
Reply to  Bill Johnston
June 27, 2023 11:48 am

… the Celsius thermometer is (200/180) = 1.11 times more accurate than a Fahrenheit one.

No, the C thermometer has 1.11X greater precision.

Reply to  Clyde Spencer
June 27, 2023 12:03 pm

Bingo. Hitting the same spot on the target every time is precision. If that spot is nowhere near the center of the target then the accuracy is poor and the precision doesn’t help much if the goal is to hit the center of the target.

Reply to  Tim Gorman
June 27, 2023 5:40 pm

But Tim, an observer does not necessarily hit the same spot every time. The minimum ‘accuracy’ is dictated by the intervals on the instrument. If the intervals are precise, the uncertainty is still 1/2 the interval scale.

My mention of degF above is only relevant in comparing pre-metric degF, with post-metric degC.

Anyway here is a data snip for Cunnamulla post office in Queensland. This is what I derive using code I run in R. It takes about 3 seconds to summarise 100 years of daily observations.

When analysing data I use much more information than simple averages. In addition I sometimes analyse frequencies within classes, sometimes undertake percentile regression etc. depending on the problem of interest.

In terms of satellite data, which is the focus of this post, I can’t derive any attribute statistics that would allow an assessment of the quality of the data in the same way as I do for specific sites. While satellite data may be perfect, it still is a leap of faith to accept them at face value. That is really the point.

Way back I raised the question as to why UAH data for Australia shows a trend, when hundreds of surface T datasets from across the continent (and Tasmania), including Cunnamulla (1957 to 2017 – needs updating), show no trend. Why the discrepancy? Furthermore, given they use exactly the same instruments on the same satellites (i.e. the same observers), why is there a difference between UAH and Berkeley Earth?

Yours sincerely,

Bill

[Attached image: CunamullaDataSnip.jpg]
Reply to  Bill Johnston
June 27, 2023 5:56 pm

u_total = u_random + u_systematic.

You are basically saying that u_random is the MINIMUM uncertainty. Of course that is true. But no field instrument has 0 u_systematic. Calibration always drifts. If it didn’t there would be no need for periodic re-calibration. Since when that drift occurred is impossible to pinpoint all you can do, as I said in another post, is to assume an uncertainty interval that encompasses both factors.

Reply to  Tim Gorman
June 27, 2023 7:40 pm

One can say that they are “precisely wrong!” 🙂

Reply to  Clyde Spencer
June 27, 2023 4:36 pm

Thanks Clyde,

While in the application of measuring temperature by eyeball, accuracy and precision are interchangeable concepts, you are correct.

Cheers,

Bill

Reply to  Bill Johnston
June 27, 2023 7:51 pm

There is a fine, but important, distinction. That is the number of significant figures that one is required to display to imply the precision, and the maximum number that are justified being displayed. As Tim points out above, precision implies repeatability within a narrow range, whereas accuracy implies numbers that cluster around the true value. One can have high precision with low accuracy, or moderate accuracy with low precision; the ideal situation is high precision with high accuracy. Typically, if one has high precision, there is hope that a calibration procedure can transform that into high accuracy as well.

bdgwx
Reply to  Clyde Spencer
June 28, 2023 8:12 am

And like Bellman and I keep saying if the uncertainty is explicitly stated per JCGM 100:2008 section 7 then we don’t have to imply anything.

Reply to  bdgwx
June 28, 2023 9:34 am

Which you and he still don’t understand.

Reply to  bdgwx
June 28, 2023 11:13 am

And like Bellman and I keep saying if the uncertainty is explicitly stated per JCGM 100:2008 section 7 then we don’t have to imply anything.

Let’s look at some things in JCGM GUM-6:2020.

11.1.2

Models of this kind underlie all Type A evaluations of uncertainty, even if they are not articulated explicitly: in table H.2 of JCGM 100:2008, five replicated readings of electric current are combined into their arithmetic average, and the experimental standard deviation of this average is computed according to equation (5) in the GUM. The choice of the average as presumably ‘best’ (in the sense of minimum mean squared error) to summarize those observations suggests a measurement model where the observations are equal to the true value of the current plus normally distributed errors, all of the same precision. Reliance on equation (5) suggests that these observations are believed to be uncorrelated.

Hmmmm? Experimental standard deviation? What is that?

Do you see a reference to expanding the resolution beyond that of the “observations”?

11.1.4

Statistical models employ probability distributions to describe sampling variability, or uncertainty more generally, which render details of the observations (empirical data) unpredictable. Such uncertainty clouds the relationship between true values of observable properties and true values of properties of interest that are not accessible to direct observation, whose values need to be inferred from the experimental data.

Hmmm? Observations are unpredictable. Temperature for the following day?

2.26 (3.9)

measurement uncertainty

non-negative parameter characterizing the dispersion of the quantity values being attributed to a measurand, based on the information used

NOTE 2 The parameter may be, for example, a standard deviation called standard measurement uncertainty (or a specified multiple of it), or the half-width of an interval, having a stated coverage probability.

Now let’s go to TN 1900.

7 Uncertainty evaluation for measurands defined by observation equations starts from the realization that observation equations are statistical models where the measurand appears either as a parameter of a probability distribution, …

These parameters need to be estimated from experimental data, …

Parameter of a probability distribution. One statistical parameter is the mean. Another statistical parameter is standard deviation. Both are based on “experimental data”. Do I need to show more references of what experimental data is? There are numerous examples contained in these documents.

You all need to give some references rather than just proclaiming you are correct. Referring to a general document is NOT a reference. Willis has provided you with detailed descriptions of his data. You should do the same. Just relying on the proclamations of BEST is not sufficient for arguing.

Reply to  Jim Gorman
June 28, 2023 1:07 pm

Let’s look at some things in JCGM GUM-6:2020.

A lot of random quotes. A pity none of them actually offers any evidence for your claim.

Experimental standard deviation? What is that?

A fancy name for sample standard deviation.

Do you see a reference to expanding the resolution beyond that of the “observations”?

Do you see a prohibition on it? It’s your claim; you should be able to find some reference in any of the GUMs or NIST documents saying in no uncertain terms that you must never quote an average to more decimal places than the individual measurements.

measurement uncertainty
non-negative parameter”

See. I told Tim you can’t have a negative uncertainty.

Parameter of a probability distribution.

Like a mean of a probability distribution. Something I keep being told doesn’t exist.

Both are based on “experimental data”.

I’m not sure what question you think this is answering. We seem a long way from significant digits rules.

Reply to  Bill Johnston
June 28, 2023 6:24 am

Go read Annex B in the GUM.

Reply to  karlomonte
June 28, 2023 2:22 pm

Dear karlomonte,

I have looked at Annex B in the GUM but I don’t think anyone who writes all these theories or tosses around these ideas has ever seen a meteorological thermometer, let alone used one under realistic conditions. Splitting hairs over instrument uncertainty/error/calibration versus eyeball error/accuracy based on no information or experience at all, while interesting, is not a fruitful exercise.

At the end of the day all you have is a number someone wrote down in a field book, possibly in a hurry or under adverse circumstances; that somebody else transcribed to a register, that somebody else then typed onto a card or tape or directly into a database, and possibly that somebody else checked. As an eyeball estimate, whether that number is any good cannot be determined from sampling theory or the GUM or anything else.

Assessing whether that number is useful, or an outlier post hoc takes a different skill-set than theorizing about it. Some people don’t even know how to deduct cycles from data or why that is necessary.

While adapted from someone else, I believe I’m the first person to have routinely assessed the precision with which temperature observations have been made. Who would know that the Australian database transformed degF data to DegC to one decimal place and that on re-transforming to degF rounding frequencies (precision) were returned in the ratio of 54% x.0, and ~ 23% x.1 and x.9. Determining whether thermometers were observed in whole degF before 1972 requires those frequencies to be summed. Likewise proportions of whole and 1/2 degF were not returned as expected.

Yet here we are digressing into the general area of ‘precision’ with no knowledge of what it means in the real world, or how to assess it in real data. The GUM is no help in this regard.

All the best,

Bill Johnston

http://www.bomwatch.com

Reply to  Bill Johnston
June 28, 2023 3:44 pm

I have looked at Annex B in the GUM but I don’t think anyone who writes all these theories or tosses around these ideas has ever seen a meteorological thermometer, let alone used one under realistic conditions. Splitting hairs over instrument uncertainty/error/calibration versus eyeball error/accuracy based on no information or experience at all, while interesting, is not a fruitful exercise.

I have no idea what you are on about here. Splitting hairs? You haven’t grasped what it is about at all.

Reply to  karlomonte
June 28, 2023 4:29 pm

With respect karlomonte I have the Annex B in the GUM right in front of me and there is nothing in the document that relates to the issues being discussed. A measured T value encompasses all the possible components of error, only one of which (instrument uncertainty) can be objectively specified.

Getting bogged-down by endless discussion about precision, accuracy, averaging, and what happens to uncertainties, linear regression CIs etc. is a bit of a time waster.

What I came up with (from https://sciencing.com/how-to-calculate-uncertainty-13710219.html) is:

“TL;DR (Too Long; Didn’t Read)

If you’re adding or subtracting quantities with uncertainties, you add the absolute uncertainties. If you’re multiplying or dividing, you add the relative uncertainties. If you’re multiplying by a constant factor, you multiply absolute uncertainties by the same factor, or do nothing to relative uncertainties. If you’re taking the power of a number with an uncertainty, you multiply the relative uncertainty by the number in the power.”

At this point I have to get on with something else. Thanks for your insights re. satellite-T.

All the best,

Bill Johnston

Reply to  Bill Johnston
June 28, 2023 6:14 pm

If you want to put these terms into a blender and mix them all up, this is your prerogative, but they have very specific definitions in metrology.

Why does NIST bother to report uncertainty values for fundamental constants?

Reply to  karlomonte
June 29, 2023 12:41 am

Dear karlomonte,

They have specific definitions in statistics too.

I have read through the reference you gave and found nothing of relevance to the measurement of temperature, or for that matter transforming degC to degF (where they drop a decimal point), and all the other issues that I’ve outlined in relation to met data. Arguing finer points like accuracy vs precision is a bit of a time-waster when an observation (the number) is the product of both, and neither can be objectively quantified.

There is also a problem with comparing regression lines constrained to zero at their intercept (no variance), and increasing uncertainty over time using a 1-sigma interval but aside from mentioning it, I don’t feel confident about debating the issue.

I found it interesting that, while I was taught uncertainties were additive in comparing two values, there is much more to error propagation than that; it simply had not been on my radar. Also, as I left most of my algebra and calculus behind at high school in 1965, I find not all references are useful. Also, maths (or math) alone won’t necessarily resolve practical issues.

Your insights in relation to satellite-T are interesting, but in my mind there are still outstanding issues. A lot of data has also been kriged to support the warming narrative, so except for data you can put your hands on, I’m not sure what to trust any more.

All the best,

Bill

June 26, 2023 12:12 pm

And how are that trend and the uncertainty calculated? It’s done mathematically using a method called “linear regression”. Below are the results of a linear regression, using the computer program R.”

This is “best-fit” measurement. The best fit is done using the stated values of the measurement. But the actual measurement should be given as “stated value +/- uncertainty”.

Each graph you post has the black line showing only the stated value of the temperature measurement, or the average value (e.g. monthly) of the stated value – no uncertainty interval is provided with the stated value.

The issue is that the trend line for the stated values may or may not be anywhere near what the actual trend line would be since you aren’t showing the uncertainties in the actual measurement data.

The true value of each measurement can be anywhere in the uncertainty interval, it can be at +u or -u or anywhere in between. The monthly average for June could be clear at the positive end of the uncertainty interval while for July it could be all the way to the negative side of the uncertainty interval. YOU SIMPLY DON’T KNOW.

The differences between the trend line and the stated values is not the only possibility that must be considered. The true trend line could be up, down, sideways, or even a zig-zag as long as it is inside the uncertainty interval of the actual measurements. You simply can’t tell which one would be correct.

This doesn’t even begin to address the uncertainty in monthly averages which must be propagated onto the average from the individual measurements – and which is typically ignored in climate science. The monthly average is typically considered to be a “true value” which is 100% accurate. This is done by assuming all uncertainty is random, Gaussian, and will cancel over a large number of measurements. The problem is that this is only true where you have multiple measurements of the same thing using the same device. Tmin and Tmax are *not* the same thing so this restriction gets violated right off when a mid-range value is calculated from Tmin and Tmax. It gets further violated when combining mid-range values from different measurement stations. Meaning the assumption that all uncertainty is random, Gaussian, and cancels is an assumption of convenience, not of physical reality.

The physical reality is that the temperature databases aren’t fit for purpose. The uncertainty that *should* be propagated forward from the individual measurements actually gets so large that even stating anomalies in the tenths digit is a lost cause. You simply can’t say that the temperature at Station A (70degF +/- 1degF) averaged with the temperature at Station B (72degF +/- 1degF) is 71degF. It’s actually 71degF +/- 2degF (or 1.4degF if you add by root-sum-square). The uncertainty of the average is larger than that for the individual measurements. The uncertainty GROWS, it doesn’t reduce.

The most common mistake made is to take a large number of measurements and calculate the standard deviation of the averages taken from that large number of measurements. All that tells you is how close you have calculated the population average. It does *NOT* tell you how accurate that population average is. If the individual temperatures have large uncertainties then the average will have even larger uncertainties because uncertainty GROWS, it doesn’t reduce when you are measuring different things using different devices.

I know this was a long rant. I apologize. But the subject is not amenable to just assuming all uncertainty cancels like is done in climate science.

Reply to  Willis Eschenbach
June 27, 2023 3:56 am

Sorry to take so long in replying. Had family in from out-of-state and was tied up.

You missed my point. The uncertainties make the slope indeterminate. You simply can’t avoid that. I sketched out a very poor attempt to explain this. It uses three points to try and define the trend line in temperatures over the past three days at my location.

If the uncertainty in the temps was +/- 1degF then the actual slope for those three points can range from an upward slope to no slope to a zig-zag. No amount of averaging and no Monte Carlo simulation can fix this. It is a function of “not knowing”. You can’t fix “not knowing” with a statistical analysis.

When you are averaging values derived from different instruments you do “not know” the systematic bias that is introduced by each device. Any number of authors on uncertainty will tell you that data values with different systematic biases are just not amenable to statistical analysis. That includes Taylor, Bevington, and Possolo. No amount of Monte Carlo simulations can fix “not knowing”.

Take a look at TN1900 written by Possolo. In order to find an average Tmax in a month at a fixed location using the same measuring device he had to make some simplifying assumptions: 1. The measurements were all of the same thing, 2. No uncertainty in the stated values. He didn’t even attempt to use mid-range values for each day because there is no way that could be MEASURING the same thing. In the real world neither of these assumptions actually hold – yet they are the unwritten assumptions that litter all of climate science.

If your data has uncertainty then you simply can’t just ignore it and base your statistical analysis on the stated values alone. You would be violating one of Feynman’s rules – don’t fool yourself. You are only fooling yourself that you know something you can’t possibly know.

[Attached image: slopes.jpg]
Reply to  Tim Gorman
June 27, 2023 10:06 pm

Tim you say “The uncertainties make the slope indeterminate”.

The least squares linear regression line must pass through mean(x) and mean(y) (i.e., the centroid of the data from which the line is constructed). If you know the slope and the means, you can draw the line freehand. With that constraint, the slope is bounded by the confidence intervals describing the line’s location (which are parallel). Confidence limits of the slope coefficient determine the likely (95%) limit of the line’s rotation.

Say for Cunnamulla in Queensland (1957-2109): the raw data trend is 0.243 degC/decade (P(slope=0) = 6.33E-05; SE = 0.0565 degC/decade). Bootstrapped 95% slope CI = 0.122, 0.360 degC/decade. So therein lie the limits of rotation of the line around x-bar, y-bar. Intercept limits (Tmax when time = zero) really don’t matter in this case. (Analysis was done using PAST from the University of Oslo.)

As residuals show data for Cunnamulla are not homogeneous, the slope of the line is entirely spurious.

Going to Marble Bar (1901-2020), the R output is:

> summary(LinearModel.6)
 
Call:
lm(formula = MaxAv ~ Year, data = Dataset)
 
Residuals:
    Min      1Q  Median      3Q     Max
-1.29804 -0.40981 0.04881 0.37442 1.40013
 
Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept) -15.464330 12.446042  -1.243  0.22048   
Year         0.023716   0.006226   3.809  0.00042 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
 
Residual standard error: 0.579 on 45 degrees of freedom
Multiple R-squared: 0.2438,   Adjusted R-squared: 0.227
F-statistic: 14.51 on 1 and 45 DF, p-value: 0.0004201

Calculated as 2*SE the 95% CI for Year is 0.0125 degC/Yr
(more correctly 1.96*0.00623 = +/-0.0122 degC/yr)

Again as the data are affected by site changes they are not homogeneous, thus the slope (0.024 degC/Yr) is spurious.

The main point is that in both cases, the least squares line passes through mean(x), mean(y) while the slope (rotation) of the line is constrained within confidence intervals.

All the best,

Bill
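
For illustration, the centroid property described above is easy to check in R on placeholder data (not the Cunnamulla or Marble Bar series):

x   <- 1901:2020
y   <- 0.02 * (x - 1900) + rnorm(length(x), sd = 0.5)   # placeholder data
fit <- lm(y ~ x)
predict(fit, newdata = data.frame(x = mean(x)))         # fitted value at mean(x) ...
mean(y)                                                 # ... equals mean(y), up to floating-point error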

Reply to  Bill Johnston
June 28, 2023 6:56 am

“The least squares linear regression line must pass through mean(x) and mean(y)”

Sorry, but I missed this message.

You are assuming that the mean(x) and mean(y) are 100% accurate! No uncertainty in the mean.

If mean(x) can be from +1 to -1 and mean(y) can be from +1 to -1 then which mean(x) and which mean(y) do you pick to determine your least squares linear regression line?

A +1 to -1 would give a negative slope. A -1 to +1 would give a positive slope. A 0 to a 0 would give a horizontal line. You have an almost infinite number of possible mean(x) and mean(y) values you can choose to determine line.

Reply to  Tim Gorman
June 28, 2023 3:07 pm

BS Tim. Instead of making-up fairy stories, find some data and check it for yourself.

By definition the least squares line must pass through the data centroid. The limits of rotation of the line are given by the 95% CIs.

b.

Reply to  Bill Johnston
June 29, 2023 12:56 pm

No, the least squares line must pass through the centroid of the data being used. If that data is the stated values and ignores the uncertainty in the stated values then whoever is doing the analysis is only fooling themselves. As Feynman said, yourself is the easiest person to fool.

98F +/- 1F, 99F +/- 1F, and 102F +/- 1F will give one trend line if only the stated values are used.

If, however, you use 99, 98, and 101 (all possible values within the uncertainty interval) you will get a different trend line, one that is *STILL* WITHIN THE UNCERTAINTY INTERVAL.

Which one is correct?

Climate science assumes the stated values are 100% correct and uses the trend line developed from those as the “true trend line” while ignoring the uncertainty of the measurements. They are fooling only themselves, and apparently you as well.
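
(Putting numbers on the example above: with x = 1, 2, 3, a quick R check of the two trend lines.)

x <- c(1, 2, 3)
stated  <- c(98, 99, 102)          # stated values
shifted <- c(99, 98, 101)          # values still inside each ±1 F interval
coef(lm(stated ~ x))[2]            # slope 2.0 F per step
coef(lm(shifted ~ x))[2]           # slope 1.0 F per step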

bdgwx
Reply to  Willis Eschenbach
June 27, 2023 9:56 am

Thanks, Tim, and basically I agree with you, particularly your last sentence.

Do you agree with this statement?

If the individual temperatures have large uncertainties then the average will have even larger uncertainties because uncertainty GROWS, it doesn’t reduce when you are measuring different things using different devices.

Reply to  bdgwx
June 27, 2023 12:00 pm

“No man ever steps in the same river twice, for it’s not the same river and he’s not the same man.”
― Heraclitus

This summarizes well the issue of measuring something, and the distinction between stationarity and non-stationarity. If one wants to measure precisely the diameter of a ball bearing that has negligible ellipticity, a high-quality micrometer and a temperature-controlled environment will suffice. The precision can be increased in proportion to the square root of the number of measurements because the ball bearing has a unique diameter and the only variations encountered will be random and self-cancelling.
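
(A minimal R sketch of that point, with assumed numbers: one ball bearing of true diameter 10 mm measured repeatedly with an independent random error of 0.01 mm; the spread of the mean shrinks roughly as 1/sqrt(N).)

set.seed(1)
true_d <- 10      # assumed true diameter, mm
u      <- 0.01    # assumed per-reading random error, mm
for (N in c(1, 4, 16, 64)) {
  means <- replicate(20000, mean(rnorm(N, true_d, u)))
  cat(N, "readings: sd of mean =", round(sd(means), 4),
      " vs u/sqrt(N) =", round(u / sqrt(N), 4), "\n")
}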

On the other hand, if one is measuring the temperature of the river of air passing a weather station, what one is doing is sampling air parcels that have no fixed temperature. The parcels tend to increase in temperature during the day, and decrease at night, but also have unpredictable increases and decreases. That is, there are daily, seasonal, and annual trends where not only the temperature changes, but also the mean and standard deviation change over time. A time-series of recorded temperatures is a classic example of non-stationarity. One is not justified in claiming increased measurement precision simply by averaging many readings. What one obtains with many measurements is a distribution with statistical parameters that vary with the starting and ending point of the sampling period. Notably, the standard deviation will increase if the time-series has a positive trend. The precision of the recorded temperatures is the inherent precision of the temperature-measuring device because there is only one opportunity to measure a particular parcel of air with the same instrument. All subsequent measurements are of different parcels of air. One can calculate the statistical parameters of a large number of samples, but presenting results with greater precision than the measuring device can resolve produces meaningless results. Indeed, the longer the time period, the less meaningful high-precision summary statistics become.

Consider the case of measuring the height of a human. A useful approximation is obtained easily. However, an attempt at high precision is thwarted by the fact that one’s height will change through the course of the day, with how strongly the hair on one’s head is compressed, with one’s posture, and with whether an instantaneous measurement is taken during the systolic or diastolic phase of the blood pressure. That is, the boundary for the extent of the human body becomes fuzzy with smaller increments of length. The problem is further exacerbated in trying to obtain an average for a population of humans of different ages, ethnicities, and hairstyles. Trying to assign unwarranted precision to measurements is a fool’s errand.

Reply to  Clyde Spencer
June 27, 2023 12:37 pm

Nice summation.

Reply to  Clyde Spencer
June 27, 2023 12:44 pm

The precision of the recorded temperatures is the inherent precision of the temperature-measuring device because there is only one opportunity to measure a particular parcel of air with the same instrument. 

Exactly! Many are the times I have tried to show the trendologists, without success, that for air temperature measurements n is always equal to one.

Reply to  karlomonte
June 27, 2023 6:21 pm

This is going round and round.

There are two types of thermometers. One provides continuously changing measurements; the other, recording type provides only one instantaneous measurement between resets.

The recording type is either a maximum thermometer (it can only ratchet up) or a minimum thermometer, which records only on the down-step. They have two different mechanisms that mark the max and min. The max has a constriction that breaks the column when the mercury retreats; the min has a small metal bar that slides down under the meniscus and remains at the lowest point as the meniscus rises. Max/min thermometers are almost horizontal, the others are vertical.

Average T is calculated as (Max + Min)/2

The accuracy/inaccuracy issue is set by the interval as perceived by the observer.

The non-recording thermometer is a dry-bulb or, if covered by a patch of muslin irrigated from a water well, a wet-bulb. Cooling of the wet-bulb relative to the dry-bulb estimates relative humidity and dew-point temperature. Again the accuracy/inaccuracy issue is set by the interval as perceived by the observer.

So inside the Stevenson screen there are four thermometers, or thermometers + temperature probes, and possibly a chart recorder or two (thermograph, hygrograph and possibly an aneroid barometer, each with a rotation speed of 7 days).

In Australia, cooperating stations take observations at 9am local time, sometimes 3pm; Bureau sites commence at 3am, and take three-hourly observations through the day until 2100h.

That is it.

While the air-river goes past, instruments only measure two values each day. The rest is used for forecasting, and for aviation purposes, and for telling listeners the temperature at fixed times during the day when announcers read the news.

Cheers,

Bill

Reply to  Bill Johnston
June 27, 2023 8:01 pm

The accuracy/inaccuracy issue is set by the interval as perceived by the observer.

The accuracy/inaccuracy is determined by comparison with some standard(s). The process is called calibration. More formally, it is mapping the intervals (or voltages/resistances) to some defined scale such as 100 major divisions between freezing and boiling of water. The precision is a function of the minor divisions that can be resolved.

Reply to  Clyde Spencer
June 28, 2023 2:43 pm

Oh gee, calibration that is a novel idea (sarc).

Do you have any evidence that thermometers and PRT-probes are used without being calibrated? Do you seriously think a met-thermometer is also used to measure boiling water?

Have you ever seen one, or undertaken observations?

b.

Reply to  Bill Johnston
June 28, 2023 3:49 pm

Why don’t you know that the only calibrations many Pt RTDs have are based solely on the manufacturer’s specifications? I thought you were the expert.

Reply to  Bill Johnston
June 29, 2023 12:49 pm

From NIST:

GMP 11, Good Measurement Practice for Assignment and Adjustment of Calibration Intervals for Laboratory Standards, 1 Introduction, Purpose:

Measurement processes are dynamic systems and often deteriorate with time or use. The design of a calibration program is incomplete without some established means of determining how often to calibrate instruments and standards. A calibration performed only once establishes a one-time reference of uncertainty. Periodic recalibration detects uncertainty growth, serves to reset values while keeping a bound on the limits of errors and minimizes the risk of producing poor measurement results. A properly selected interval assures that an item will be recalibrated at the proper time. Proper calibration intervals allow specified confidence intervals to be selected and they support evidence of metrological traceability. The following practice establishes calibration intervals for standards and instrumentation used in measurement processes.

From GMP 11, Table 10

Table 10. Recommended intervals for temperature standards

Standards | Initial Cal Interval (months) | Source
25.5 ohm SPRT | 36 | NIST
100 ohm PRT′s | 12 | Accredited Lab
Standard Thermistor | 12 | Accredited Lab
Check Standards | 12 | Accredited Lab
Liquid-in-glass standards* | 6* | Accredited Lab

I hope the table comes out correctly. If not go look at GMP 11

36 months is a long time for *ANY* field measurement device to go without being calibrated.

Reply to  Bill Johnston
June 28, 2023 6:25 am

A rising mercury column is working against the force of gravity. A falling mercury column is working with the force of gravity. You don’t consider that to be a hysteresis component of the max/min temperature readings?

“Average T is calculated as (Max + Min)/2”

That is not an average; it is a mid-range value. Since daytime temps are approximately sinusoidal and nighttime temps are exponential/polynomial decays, their means are not represented by Tmax and Tmin. The actual average would be the average value of the daytime sinusoid and the nighttime decay curve.

Using Tmax and Tmin introduces its own biases into the temperature record that are somehow not recognized in climate science but are in agricultural science. Just one more count against the way climate science works.
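
(To illustrate the point, here is a small R sketch with an assumed, idealised daily profile rather than real data: a quarter-sine rise from Tmin to Tmax over a 10-hour day and an exponential decay back toward Tmin over a 14-hour night. The duration-weighted mean differs from (Tmax + Tmin)/2.)

Tmin <- 10; Tmax <- 30; dT <- Tmax - Tmin            # assumed values, degC
t_day   <- seq(0, 10, by = 0.01)
day_T   <- Tmin + dT * sin(pi * t_day / 20)          # quarter-sine, reaches Tmax at hour 10
t_night <- seq(0, 14, by = 0.01)
night_T <- Tmin + dT * exp(-t_night / 5)             # cools from Tmax toward Tmin
c(time_weighted_mean = mean(c(day_T, night_T)),      # about 19.2 for these assumptions
  midrange           = (Tmax + Tmin) / 2)            # 20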

Reply to  Tim Gorman
June 28, 2023 2:35 pm

No I don’t Tim.

Tmax and Tmin thermometers are held at about 5 deg off horizontal. This is to prevent instruments being reset by ‘wind shake’. Dry bulb thermometers are calibrated in a vertical position and the ‘air’ above the mercury column is actually mercury vapour in a vacuum.

All thermometers are provided with a calibration certificate and can be checked against laboratory grade instruments using a water bath, oil bath or ice bath. Also in undertaking 9am observations, max/min reset values are noted and can be checked against DB readings.

Maximum and minimum temperatures are essentially spot-values that are usually analysed separately. We have been down this path before and in my opinion your arguments are not convincing.

Regards,

Bill Johnston

Reply to  Bill Johnston
June 29, 2023 7:10 am

Bill,

Any tilt at all makes a difference. The size of that difference is what is important. Is it negligible? Or, when added to the other uncertainties, does it tilt the uncertainty estimate?

Calibration is a non sequitur. Field temperature measurement devices are not calibrated in a water bath before each measurement. Once that instrument leaves the lab, calibration begins to drift – ALWAYS. If it didn’t, then periodic calibration would not be necessary. Since you don’t know the amount of calibration drift at any point in time, the uncertainty interval is what you use to account for it.

Reply to  Bill Johnston
June 28, 2023 6:27 am

Four instruments, four different uncertainties. To be rigorous you need to do an uncertainty analysis on each one. And how do you know there are no temperature gradients inside the housing?

Reply to  karlomonte
June 28, 2023 7:12 am

How do you know one is over green grass and one is over brown grass? How do you compare the impacts of the microclimate on the calibration that is done in a lab before installation?

Reply to  Tim Gorman
June 28, 2023 7:25 am

Simple, just ignore them and divide by root-N.

Reply to  Tim Gorman
June 28, 2023 2:51 pm

Hardly worth an answer, but you never know it could be a serious question.

Do you think they calibrate a medical thermometer by sticking it in someone’s ear? The instrument measures the microclimate. That is its purpose.

b.

Reply to  Bill Johnston
June 29, 2023 12:52 pm

Hubbard and Lee showed clear back in 2002 that measurement station microclimate causes reading differences – e.g. green grass underneath in the summer and brown grass/snow in the winter.

The instrument does *NOT* account for these. Neither do most observations, be they human or automated.

Reply to  Tim Gorman
June 30, 2023 12:53 pm

Let me add that there are many items involved in the micro-climate that change over time: paint fading, spider webs, and, here in Kansas, floating cottonwood seeds, etc.

Reply to  karlomonte
June 28, 2023 2:46 pm

No there are not four different uncertainties. There are just two, the most important one being eyeball error. Of course there are gradients within the Stevenson screen. So …

b

Reply to  Bill Johnston
June 28, 2023 3:51 pm

You should just stop now, every additional post only highlights your ignorance.

bdgwx
Reply to  Clyde Spencer
June 27, 2023 5:11 pm

Your post is interesting and worthy of discussion. It touches on sampling uncertainty. That is the component of uncertainty that arises due to the selection of when or where the measurements occur. Because there is variability in the actual temperature of air parcels or the heights of people depending on when the measurements occurred this can introduce a source of uncertainty on an aggregation (like an average) of those measurements. NIST TN 1900 E2 presents a scenario that is similar. In that example they want to determine the average Tmax for a month using only 22 samples and determine the uncertainty including the component arising from the variability of temperature itself.

Anyway, aside from the fact that your post is interesting I’m not sure it is relevant to my post. I’m simply asking if Willis believes that the uncertainty of the average is larger than the uncertainty of the individual elements that went into it in direct contradiction to well established texts on the propagation of uncertainty like JCGM 100:2008, NIST TN 1297, and the like. Basically I want to know if he accepts the law of propagation of uncertainty.
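
(For concreteness, here is a minimal R sketch of that TN 1900 E2-style calculation with made-up daily Tmax values; the numbers below are illustrative, not Possolo’s data.)

tmax <- c(24.1, 25.3, 23.8, 26.0, 25.1, 24.7, 26.4, 25.9, 24.4, 25.6,
          23.9, 26.2, 25.0, 24.8, 25.7, 26.1, 24.3, 25.2, 25.8, 24.6, 25.4, 24.9)  # 22 days
n <- length(tmax)
m <- mean(tmax)              # monthly average Tmax
u <- sd(tmax) / sqrt(n)      # standard uncertainty of the average
U <- qt(0.975, n - 1) * u    # expanded uncertainty (95%, t with 21 df)
c(mean = m, u = u, U95 = U)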

Reply to  bdgwx
June 27, 2023 5:52 pm

“I’m not sure it is relevant to my post.”

Oh, it *is* relevant. It is just an inconvenient truth for you to admit to.

bdgwx
Reply to  Willis Eschenbach
June 28, 2023 6:55 am

Agreed. However, the Gormans and several other WUWT participants will say it is the root sum square of the individual uncertainties, resulting in 16.433 ± 3.5.

Reply to  bdgwx
June 28, 2023 7:26 am

Incorrect, try again.

Reply to  bdgwx
June 28, 2023 7:34 am

Because that *is* what it is. Trend lines based solely on the stated values of measurements while ignoring the measurement uncertainties that go along with those stated values only serve to fool yourself.

You don’t know what you don’t know and can never know. The stated value of a measurement is *never* 100% accurate. There is *always* some uncertainty in the measurement, especially in field devices.

If one measurement’s true value is at the +1 end of the uncertainty interval and the second measurement is at the -1 end of the uncertainty interval then will a trend line through those points have the same slope as a trend line through the stated value?

bdgwx
Reply to  Willis Eschenbach
June 28, 2023 10:20 am

bdgwx: However, the Gormans and several other WUWT participants will say it is the root sum square of the individual uncertainties resulting in 16.433 ± 3.5.

TG: Because that *is* what it is.

See what I mean?

bdgwx
Reply to  bdgwx
June 28, 2023 8:16 pm

And if you point out absurdities with the argument it only ends in even more absurd arguments. For example, if you were to point out that an ASOS station actually takes 6 readings per minute and averages them, you have 6 * 60 * 24 * 365 = 3153600 measurements in a year. If you were to apply RSS (erroneously) you get the ridiculously absurd uncertainty of the annual average of ±1776 C. And it is defended vehemently.
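
(The arithmetic behind both numbers, for reference, assuming each reading carries a ±1 C standard uncertainty as in the example above.)

n <- 6 * 60 * 24 * 365     # 3,153,600 readings in a year
u <- 1                     # assumed per-reading uncertainty, C
sqrt(n) * u                # RSS of the individual uncertainties: ~1776 C
u / sqrt(n)                # textbook result for the mean of independent
                           # random errors: ~0.00056 C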

Reply to  bdgwx
June 29, 2023 7:14 am

WRONG. You still don’t understand what you are yapping about.

Reply to  bdgwx
June 29, 2023 8:19 am

Your argument is specious to the point of absurdity.

If the “measurand” is defined as an annual average as computed by a distribution consisting of 3153600 data points, then this is an experimental situation. You need to compute the standard deviation of the distribution and continue as in NIST TN 1900.

You keep ignoring the GUM. It is really simple.

1) If you measure the same thing multiple times with the same device(s) the uncertainty is the addition of relative uncertainties involved. This is done by RSS.

2) If you measure different things that are done under repeatable conditions, then you must determine uncertainty by using experimental protocols. TN 1900 addresses this and references the GUM Sections 4.2.3, 4.4.3, and G.3.2.

I also want to mention that TN 1900 has the following, which is very applicable.

“(7a) Observation equations are typically called for when multiple observations of the value of the same property are made under conditions of repeatability (VIM 2.20), or when multiple measurements are made of the same measurand (for example, in an interlaboratory study), and the goal is to combine those observations or these measurement results.”

“EXAMPLES: Examples E2, E20, and E14 involve multiple observations made under conditions of repeatability. In Examples E12, E10, and E21, the same measurand has been measured by different laboratories or by different methods.”

EXAMPLE E2 INVOLVES MULTIPLE OBSERVATIONS MADE UNDER CONDITIONS OF REPEATABILITY.

Exactly what you describe.

I also want to make the observation that with this kind of data, the preferred scientific method to find a common temperature would be to integrate the data into a “temperature•time” value like degree•year. A much better unit to use in comparing year over year changes.
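
(A rough sketch of that “temperature•time” idea in R, using trapezoidal integration of a made-up hourly series; the diurnal curve below is purely illustrative.)

hours <- 0:23
temps <- 15 + 8 * sin(pi * (hours - 6) / 12)                             # assumed diurnal curve, degC
deg_hours <- sum(diff(hours) * (head(temps, -1) + tail(temps, -1)) / 2)  # trapezoid rule, degC·hours
deg_days  <- deg_hours / 24                                              # degree·day value for the day
deg_days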

Reply to  Jim Gorman
June 29, 2023 8:42 am

“If the “measurand” is defined as an annual average as computed by a distribution consisting of 3153600 data points”

That’s not how it’s defined. At least it’s not how I’d define it. The measurand is the average global temperature over the period of interest. It’s the limit of the average of the entire globe divided into small patches as the patches tend to zero.

The 3153600 data points are just a way of estimating the theoretical average.

“You need to compute the standard deviation of the distribution and continue as in NIST TN 1900.”

In which case you are agreeing that the uncertainty decreases as the sample size increases. And I assume you are happy to ignore the measurement uncertainty and just use the stated values. But for some reason that never seems to be what is claimed

“integrate the data into a “temperature•time” value like degree•year. A much better unit to use in comparing year over year changes”

Gosh. Exactly what I was suggesting when we were going through the whole “it’s impossible to average temperatures as they are intensive” nonsense.

Reply to  Bellman
June 29, 2023 1:23 pm

Idiot.

Reply to  karlomonte
June 29, 2023 1:36 pm

Thanks for sharing the sum total of your wisdom. I’ll take your advice to heart.

Reply to  Bellman
June 29, 2023 1:41 pm

It’s all your illogic and sophistry are worth, certainly not any of my time to refute your nonsense.

Reply to  Willis Eschenbach
June 29, 2023 1:16 am

I could not get Willis’ R code to work in base R, possibly because he did not specify a package (neither was a seed value specified so I could repeat his random numbers).

I then installed the package errors, scanned in the same values as above, ran the same commands, and got “16.4(3)”, whatever that means. Here is the code:

> library(errors)   # load the errors package used by errors() below
> thevars <- scan()
1: 13.5
2: 17.2
3: 13.3
4: 15
5: 18.6
6: 18
7: 15.5
8: 13.9
9: 19.7
10: 16.6
11: 17.5
12: 18.4
13:
Read 12 items
> thevars
 [1] 13.5 17.2 13.3 15.0 18.6 18.0 15.5 13.9 19.7 16.6 17.5 18.4
> errors(thevars)=1 # assign errors
> sum(thevars/12) # mean and uncertainty
16.4(3)

So the mean is the same but the (3)??
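
(For what it’s worth, the “(3)” is the parenthesis shorthand for the uncertainty in the last displayed digit, i.e. 16.4 ± 0.3. A base-R check of where it comes from, assuming each of the 12 values carries an independent standard uncertainty of 1:)

u_each <- 1
n      <- 12
sqrt(n * (u_each / n)^2)   # RSS of the 12 terms of sum(thevars/12): 0.2887...,
                           # which prints as "(3)" in the last digit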

I’m now reading the vignette (https://cran.r-project.org/web/packages/errors/vignettes/rjournal.pdf) and it covers much of the ground traversed over the last few days by the seemingly unresolved discussion about ‘precision’, ‘accuracy’, and error propagation.

To be clear, beyond observing the weather and comparing two values or populations of values or regression lines, error propagation per se holds little interest and I don’t think it is such a big deal for my work.

Here are some maximum temperature observations, measured in Fahrenheit by an unknown observer, probably using a Stevenson screen in the backyard of the post office at Albany WA, from 2 January 1907 to 18 January 1907, that have been transformed to Celsius.

21.9 22.8 17.9 23.1 19.6 22.4 22.3 22.8 23.9 24.2 24.7 23.6 24.9 27 26.9 19.7 18.4

From all the stuff that has been said and argued about, can anyone tell me anything about the uncertainty of any of the data-points? Is 17.9 an outlier, for example – mis-transcribed perhaps? Is 26.9 affected by parallax? This is what data looks like.

1907 being the start year, can anyone then tell me about using these data to baseline the long-term trend at Albany? Surely the GUM would say something? No? What about NIST 1900? No? Well, who? What about the uncertainty machine? No, Nooooooo, nah, blah blah.

Casting all that aside, why use 1-sigma to compare two satellite timeseries essentially derived from the same instruments over the same time-frame, given that both commence at x=0, y=0, a point with no variance? It is an unusual scenario for which I can find no statistical analogue.

If there is something here I’m missing, please get in touch.

Yours sincerely,

Bill Johnston

http://www.bomwatch.com.au

(Who can I charge two days of my time to? Oh that’s right, I’m just a mug-punter.)

Reply to  Willis Eschenbach
June 29, 2023 8:46 pm

Thanks Willis.

I also calculated the propagated error using two approaches in Excel. Both gave the same result.

My long-running belief that when comparing two values, errors are simply additive (as per physics 1.01, 1966) has been shattered. (Not that it makes much real difference in this case.)

Did you read the response at https://wattsupwiththat.com/2023/06/26/uncertain-uncertainties/#comment-3740668? Although I have other commitments going on and I have not fully digested the entire post, steve_showmethedata raises some important issues.

Cheers,

Bill

Reply to  bdgwx
June 27, 2023 10:17 pm

No bdgwx,

A search for “uncertainties of averages” brings up:

The average value becomes more and more precise as the number of measurements N increases. Although the uncertainty of any single measurement is always Δx, the uncertainty in the mean Δx_avg becomes smaller (by a factor of 1/√N) as more measurements are made.

(https://www.physics.upenn.edu/sites/default/files/Managing%20Errors%20and%20Uncertainty.pdf).

Bill

bdgwx
Reply to  Bill Johnston
June 28, 2023 5:30 am

Yeah. I think some of the confusion is that when I, Bellman, Nick, etc. say the uncertainty of the average decreases as the number of measurements increases, they think we’re saying that the uncertainty of the individual measurements also decreases, which isn’t what we’re saying.

Reply to  bdgwx
June 28, 2023 7:03 am

What you are saying is that you can calculate the population mean more and more precisely as you add additional measurements. That says NOTHING about the accuracy of the mean you calculate or the accuracy of the population mean!

You want to substitute how precisely you can calculate the population average for the uncertainty of the population average.

The SEM is *NOT* the accuracy of the mean. It is only a measure of how precisely you have calculated the mean!

The average uncertainty is not the accuracy of the mean either, especially when you are measuring different things.

The speed of light averaged with the speed of sound is *NOT* a meaningful average. Neither is the average of temperatures. All of these are different things. The average of multiple measurements of different things tells you nothing about reality.

Reply to  Bill Johnston
June 28, 2023 6:30 am

This is ONLY true if you are making multiple measurements of the same measurand with the same instrument.

None of this is true for air temperature measurements.

Reply to  karlomonte
June 28, 2023 2:59 pm

Correct karlomonte, we throw away the thermometer every day, use a bendy bit of straw some days, the finger test on others, and on days ending in p, we count how many layers of clothing are needed to stay warm.

Measuring with a thermometer each day is like measuring with a thermometer each day.

b.

Reply to  Bill Johnston
June 28, 2023 3:55 pm

Heading back into clown territory, again.

Again, you have not the SLIGHTEST idea about what I’m talking about.

Go try to get your met station accredited by a third-party accreditation agency, then maybe it will be possible to communicate with you.

Reply to  karlomonte
June 29, 2023 1:38 pm

You are either not understanding the taking of repeat observations, using the same instrument at the same place under similar conditions, or you are being deliberately provocative.

Been to hospital or the Dr lately and had your temperature measured using a calibrated instrument? It’s like that, except in a Stevenson screen.

b.

Reply to  Bill Johnston
June 29, 2023 3:18 pm

Well pin a bright shiny star on your chest.

Reply to  Bill Johnston
June 29, 2023 5:48 pm

“You are either not understanding the taking of repeat observations, using the same instrument at the same place under similar conditions, or you are being deliberately provocative.”

Temperatures are *NOT* taken under similar conditions; weather ensures that. Repeated measurements that have systematic error don’t cancel out.

My wife was an RN at the local hospital for 42 years. One of her recurring gripes was measuring equipment that disappeared into the bowels of the maintenance crew room leaving the floor lacking needed equipment.

What were the maintenance people doing with that equipment?

They were recalibrating it. Why were they recalibrating it? Because of calibration drift over time.

Something you seem to be stubbornly resistant to admitting.

Reply to  Tim Gorman
June 29, 2023 9:02 pm

You are hypothesizing, Tim. The whole purpose of taking repeat measurements is to observe changes in prevailing conditions. For that to happen, conditions under which measurements are made need to be held constant. Otherwise, Tim, the data are not homogeneous, i.e., both the site conditions and the environment being monitored are changeable. Good observing practice requires a consistent site and good data-hygiene.

The effect of site changes on historic observations can also be detected using independent statistical tests. Such tests are the basis of BomWatch protocols. Outlines including case studies, and the underlying datasets are available as reports at http://www.bomwatch.com.au. See for example: https://www.bomwatch.com.au/climate-data/climate-of-the-great-barrier-reef-queensland-climate-change-at-gladstone-a-case-study/.

Regards,

Bill Johnston

Reply to  Tim Gorman
June 26, 2023 3:55 pm

You simply can’t say that the temperature at Station A (70degF +/- 1degF) averaged with the temperature at Station B (72degF +/- 1degF) is 71degF. It’s actually 71degF +/- 2degF (or 1.4degF if you add by root-sum-square). The uncertainty of the average is larger than that for the individual measurements.

Let’s try rolling that boulder up the hill again.

If, as you say, an uncertainty of ±1°F means that the true temperature could be anywhere within ±1°F of the measured temperature, then by your definition the maximum Station A could be is 71°F, and the maximum for Station B is 73°F. If the true value for each is the maximum then the true average is (71 + 73) / 2 = 72°F.

Similarly, if both are at the bottom end of the interval then the true average is (69 + 71) / 2 = 70°F.

Hence the true average is between 70 and 72, that is 71 ± 1°F. It is not possible for the uncertainty to be ±2 as that would imply the true average could be 73°F, which would require the true values of stations A and B to be 72 and 74, which would mean each would have an uncertainty of ±2°F (and then by your logic the uncertainty of the average would be ±4°F, which in turn would mean each station would need an even bigger interval, and so forth).

In general it is not possible for the uncertainty of an average to be greater than the biggest uncertainty of an individual component – and short of a consistent systematic error or an infeasible coincidence the uncertainty of the average will be less than that of the individual components.
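
(The same point as a short enumeration in R of the endpoint cases for A = 70 ± 1 and B = 72 ± 1:)

errs <- c(-1, 1)
grid <- expand.grid(a = 70 + errs, b = 72 + errs)   # all endpoint combinations
range((grid$a + grid$b) / 2)                        # 70 to 72, i.e. 71 ± 1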

Reply to  Bellman
June 26, 2023 6:35 pm

Try again.

If each and every data point in the regression was 1° higher, then the resulting trend line would also be 1° higher at each and every point. If each and every data point was 1° lower, then the resulting trend line would also be 1° lower. Therefore, the uncertainty in the trend line comes not from the residuals surrounding the “middle” value but from the uncertainty of the real value of each and every data point.

Reply to  Jim Gorman
June 26, 2023 6:46 pm

Tim wasn’t talking about a regression. He was specifically talking about the average of two stations. And specifically repeating his nonsense that the uncertainty of the mean is the same as the uncertainty of the sum. You need to at some point figure out if you still think this is the case, or if you agree with your NIST Ex2 calculation that specifically involves dividing the standard deviation by root N.

As far as linear regression is concerned, if as you say each data point was 1° higher then you either have a systematic error, or a miracle – but in either case the uncertainty of the slope is not affected.

Reply to  Bellman
June 27, 2023 4:15 am

You simply cannot know the true value of either data point. That means you can’t know what the true value of the average is. When you divide by n you are finding the AVERAGE UNCERTAINTY, not the uncertainty of the average!

Possolo made two basic assumptions in Ex. 2. The main one is that all the stated values were 100% accurate. He then went on to find the variation of the stated data values and defined that variation as the uncertainty of the average. That assumption simply doesn’t hold in the real world. Yet it is the unwritten and unspoken assumption that you and all the other climate scientists continually make. As usual, all uncertainty is random, Gaussian, and cancels. You can deny you make that assumption but you can’t even admit that Possolo specifically used it in Ex. 2.

Reply to  Tim Gorman
June 27, 2023 6:41 am

You simply cannot know the true value of either data point.

Gosh, almost if there was uncertainty.

“That means you can’t know what the true value of the average is.

Interesting. Maybe someone needs to come up with a word for this “not knowing”.

When you divide by n you are finding the AVERAGE UNCERTAINTY, not the uncertainty of the average!

When you divide what by n? The thing I keep wanting to divide by n is the sum, in order to get an average, and I also point out that you then have to divide the uncertainty of the sum by n to get the uncertainty of the average. This is not the same as the average uncertainty. It’s not the same because the uncertainty of the sum is not the same as the sum of the uncertainties (unless you subscribe to Kip’s logic).

You should know this because I’ve explained it to you dozens of times before, yet you still keep making the same mistake. For some reason you think that a good meaningless soundbite is better than a reasoned argument.

Possolo made two basic assumptions in Ex. 2.

He makes many assumptions, not all of which I would agree with. But it’s meant to be an example, not a blueprint. Like most things, you have to learn to think for yourself, and decide what assumptions are useful and what will cause problems.

The main one is that all the stated values were 100% accurate.

It’s a useful simplifying assumption. (Note: assumption does not mean you believe it to be true.) Any measurement errors are present in the stated values, and the measurement uncertainties are likely to be small and irrelevant compared with the variations in the daily values.

He then went on to find the variation of the stated data values and defined that variation as the uncertainty of the average.

No he doesn’t. The standard deviation of the daily values is taken to be the uncertainty of the daily values. The uncertainty in the average is the good old standard error of the mean (or whatever the PC brigade insist we call it now), the standard deviation divided by the square root of the number of observations.

That assumption simply doesn’t hold in the real world.

So why keep bringing it up?

Yet it is the unwritten and unspoken assumption that you and all the other climate scientist continually make.

I am not a climate scientist. Nor am I a statistician. I’m pretty much in the same boat as the author, largely self-taught. And if you could ever make a reasoned argument I’d give it the attention it deserves. But I’ve really no idea what unspoken assumption you think all climate scientists are making.

It’s really strange following these discussions. Half the time we have Jim attacking climate scientists for not following the method of TN1900, the rest of the time we have you attacking them for making the same assumptions as TN1900.

And, I’ll have to keep repeating this, as you keep ignoring it. Nobody does this, because the actual real world global average anomaly is a completely different beast to looking at a single average at a single station, and the uncertainty calculations are definitely not based on just taking the SEM of all measurements.

And none of this has anything to do with the actual topic of this post, which is the much more interesting subject of calculating the uncertainty of a trend line with seasonality.

As usual, all uncertainty is random, Gaussian, and cancels.

Wrong, wrong, partially wrong. But keep fighting those strawmen.

Reply to  Bellman
June 27, 2023 8:19 am

“The thing I keep wanting to divide by n is the sum, in order to get an average, and I also point out that you then have to divide the uncertainty of the sum by n to get the uncertainty of the average.”

What you get is the AVERAGE UNCERTAINTY, not the uncertainty of the average!

Again: let x = x1 + x2 + x3 + … + xn
u(x) = u(x1) + u(x2) + u(x3) + … + u(xn)

Now let q_avg = x/n

According to Taylor Eq 3.8, the uncertainty of q_avg is:

u(q_avg)/q = u(x)/x + u(n)/n

IT IS NOT u(q)/q = [u(x)/x] / n!

Uncertainties add. When working with uncertainties all factors should be uncertainty factors.

[u(x)/x] / n is mixing uncertainty factors with non-uncertainty factors.

all [u(x)/x] / n tells you is the average uncertainty, not the uncertainty of the average!

bdgwx
Reply to  Tim Gorman
June 27, 2023 9:07 am

TG said: Again: let x = x1 + x2 + x3 + … + xn

u(x) = u(x1) + u(x2) + u(x3) + … + u(xn)

ALGEBRA MISTAKE #26: u(x) = u(x1) + u(x2) + u(x3) + … + u(xn) does not follow from Taylor 3.16.

Applying Taylor 3.16 we see that:

u(x) = sqrt[ u(x1)^2 + u(x2)^2 + u(x3)^2 + … + u(xn)^2 ]

TG said: According to Taylor Eq 3.8, the uncertainty of q_avg is:

u(q_avg)/q = u(x)/x + u(n)/n

ALGEBRA MISTAKE #27 : u(q_avg)/q = u(x)/x + u(n)/n does not follow from Taylor 3.8.

Taylor 3.8 says δq/q ≈ δx/x + … + δz/z. Notice that this is the provisional rule using the tilde equal sign ≈ and not the exact equal sign =.

Anyway, so if we want to know δq_avg then by substitution Taylor 3.8 becomes δq_avg/q_avg ≈ δx/x + … + δz/z, or using your nomenclature it is u(q_avg)/q_avg ≈ u(x)/x + u(n)/n. Starting from there we have the following.

u(q_avg)/q_avg ≈ u(x)/x + u(n)/n

u(q_avg)/q_avg ≈ u(x)/x + 0/n

u(q_avg)/q_avg ≈ u(x)/x

u(q_avg) ≈ u(x)/x * q_avg

u(q_avg) ≈ u(x)/x * (x/n)

u(q_avg) ≈ u(x)/n

Of course that is but a provisional rule with an approximate answer only. To handle this exactly with independent inputs we need to apply Taylor 3.18. Of course application of Taylor 3.18 yields u(q_avg) = u(x) / n as well in this specific case.

As always I implore you to use a computer algebra system so that you do not make algebra mistakes in the future.

Reply to  bdgwx
June 27, 2023 11:46 am

“ALGEBRA MISTAKE #26: u(x) = u(x1) + u(x2) + u(x3) + … + u(xn) does not follow from Taylor 3.16.
Applying Taylor 3.16 we see that:
u(x) = sqrt[ u(x1)^2 + u(x2)^2 + u(x3)^2 + … + u(xn)^2 ]”

You are kidding me, right?

Taylor specifically says in Chapter 3 that direct addition is the worst case and quadrature addition is the best case. You have to make a judgement on a case by case basis which to use!

And you are trying to tell me my math is wrong (i.e. Taylor is wrong) because I used direct addition instead of quadrature addition?

Give it a rest troll!

“ALGEBRA MISTAKE #27: u(q_avg)/q = u(x)/x + u(n)/n does not follow from Taylor 3.8.”

Of course it does!

“u(q_avg) ≈ u(x) / n”

This is the AVERAGE UNCERTAINTY! It is not the uncertainty of the average! How many times does this have to be pointed out?

If you take 100 boards, each with a different measurement uncertainty, you truly believe that the uncertainty associated with using them to build a beam is the average uncertainty? These are different things measured with different things. There is no guarantee of *any* cancellation of the uncertainties at the end of the process. There may be *some* cancellation which is why you add in quadrature. But you *really* think that the average uncertainty actually tells you anything about the end product?

bdgwx
Reply to  Tim Gorman
June 27, 2023 12:11 pm

TG said: u(q_avg)/q = u(x)/x + u(n)/n

BG said: u(q_avg)/q = u(x)/x + u(n)/n does not follow from Taylor 3.8.

TG said: Of course it does!

No. It does not. u(q_avg)/q is not the same thing as u(q_avg)/q_avg. If you’re going to use formulas (which I encourage) then you are required to substitute values into the formula correctly. If you are declaring that q = q_avg then you must substitute q_avg in place of all instances of q you see in the formula.

TG said: This is the AVERAGE UNCERTAINTY!

ALGEBRA MISTAKE #28: Σ[u(xn), 1, n] / n does NOT equal u(Σ[xn, 1, n]/n)

They are different concepts with different values. You cannot use one in place of the other. And u(q_avg) is clearly the uncertainty of q_avg and q_avg is clearly the average of the sample such that q_avg = Σ[xn, 1, n]/n. Therefore u(q_avg) = u(Σ[xn, 1, n]/n) is the uncertainty of the average. It is NOT the average of the individual uncertainties of x. This is not debatable. It is a mathematical fact.

Can you please start using a computer algebra system?

And can you please stop defending previous algebra mistakes by making yet more algebra mistakes?

I’m not trying to be rude or insulting here. I’m trying to convince you to perform your algebra carefully and correctly. I don’t think that is unreasonable.

Reply to  bdgwx
June 27, 2023 12:55 pm

“No. It does not. u(q_avg)/q is not the same thing as u(q_avg)/q_avg.”

Nit picker!

So I missed an “avg”.

My algebra is just fine. So is Taylor’s. It’s Taylor you have an argument with. He does *NOT* show any division by “n” being used to reduce uncertainty.

Reply to  bdgwx
June 27, 2023 12:00 pm

Clown.

Reply to  bdgwx
June 27, 2023 12:52 pm

I have probably misled you by stating x = x1 + x2 + … + xn

Break it down without that simplifying equation.

q = x1 + x2 + x3 + … + xn

q_avg = (x1 + x2 + x3 + … + xn)/n

u(q_avg)/q_avg = u(x1)/x1 + u(x2)/x2 + … + u(xn)/xn

Do your multiplication and you get

u(q_avg) = (x/n)(u(x1)/x1) + (x/n)(u(x2)/x2) + … + (x/n)(u(xn)/xn)

Where does your cancellation happen?

What does (x/n) (u(x1)/x1) equal?
What does [(x1 + x2 + … + xn)/n] times u(x1)/x1 equal?

That is going to work out to be complicated.

[(x1 + x2 + … + xn) u(x1)] / (n x1)? It’s not a simple cancellation.

bdgwx
Reply to  Tim Gorman
June 27, 2023 6:28 pm

TG said: q = x1 + x2 + x3 + … + xn

q_avg = (x1 + x2 + x3 + … + xn)/n

Let’s do the derivation step by step using Taylor 3.8.

(1) δq/q ≈ δx/x + … + δw/w

To avoid confusion with the symbology of Taylor 3.8 we will make the following declaration consistent with what you did above just with different variables.

a = x1 + x2 + x3 + … + xn

b = a / n = (x1 + x2 + x3 + … + xn) / n

The goal is to find δb. Since b = a / n we can use Taylor 3.8 by substituting b for q, a for x, and n for w exactly as the rule dictates.

(2) δb/b ≈ δa/a + δn/n

Since δn = 0 we further simplify.

(3) δb/b ≈ δa/a + 0/n

(4) δb/b ≈ δa/a

Now we are stuck because we don’t know δa. No worries. Let’s put a pin in (4) and work δa via Taylor 3.16.

(5) δq = sqrt[ δx^2 + … + δz^2 ]

Complying with the rule we need to substitute a for q, x1 for x, and so on up to xn for z.

(6) δa = sqrt[ δx1^2 + δx2^2 + δx3^2 + … + δxn^2 ]

Then if we let δx = δx1 = δx2 = δx3 = … = δxn then we can reduce further.

(7) δa = sqrt[ n * δx^2 ]

Now that we know δa we can pick back up from equation (4).

(4) δb/b ≈ δa/a

(8) δb/b ≈ sqrt[ n * δx^2] / a

(9) δb ≈ sqrt[ n * δx^2] / a * b

And because b = a / n we can simplify further.

(10) δb ≈ sqrt[ n * δx^2] / a * (a/n)

(11) δb ≈ sqrt[ n * δx^2] / n

(12) δb ≈ sqrt[n] * sqrt[δx^2] / n

(13) δb ≈ sqrt[n] * δx / n

(14) δb ≈ δx * (sqrt[n]/n)

(15) δb ≈ δx / sqrt[n]

The following also holds, though I’m not going to go through the steps right now. I’ll just state it.

(16) δb = δa / n = δx / sqrt[n]

That is it. I verified all reduction steps with a computer algebra system. You get the same result if you used Taylor 3.18 instead of 3.8 except instead of the tilde equal (≈) you get an exact equal (=).
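
(A quick numeric check of steps (7) and (16) in R, with assumed values δx = 1 and n = 12:)

dx <- 1; n <- 12
u_a <- sqrt(n * dx^2)                          # step (7): uncertainty of the sum a
u_b <- u_a / n                                 # step (16): uncertainty of the average b = a/n
c(u_b = u_b, dx_over_sqrt_n = dx / sqrt(n))    # both 0.2887...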

Reply to  bdgwx
June 27, 2023 3:22 pm

Anyway, so if we want to know δq_avg then by substitution Taylor 3.8 becomes δq_avg/q_avg ≈ δx/x + … + δz/z 

This is an unproven substitution you have made. Please show the proof that δq/q = δq_avg/q_avg so that you can make the substitution. You have taken a formula made by an expert and modified it without showing a mathematical proof that it is appropriate to do so.

My guess is that the two are not equal since one is the total of the sum and the other is divided by n making them unequal and not eligible for substitution.

bdgwx
Reply to  Jim Gorman
June 27, 2023 6:30 pm

JG: This is an unproven substitution you have made.

This is standard introductory algebra level stuff. If I know z = x + y then I can substitute x + y in place of any z I see in another equation.

Reply to  bdgwx
June 27, 2023 7:20 pm

It is incorrect basic algebra to substitute one variable for another without a proof. You can not substitute one variable for another until you have PROVEN that the two variables are exactly equivalent.

I would note that your response does not include a proof, only an unproven assertion that my point is wrong. Not one math teacher that I know of would let a student substitute the way you have done.

YOU made the assertion that the substitution is legitimate. It is up to YOU to show the proof that the two variables are equivalent values. In geometry we call that congruent, i.e., equal and similar.

I have made the assertion that they are not equal. One is a sum total, the other is an average based on that sum. They can not be equal. Show otherwise.

bdgwx
Reply to  Jim Gorman
June 28, 2023 5:41 am

JG: It is incorrect basic algebra to substitute one variable for another without a proof.

That’s what the equal sign does. When you see z = x + y then it is proven that x + y is equal to z and that they can be used interchangeably.

Reply to  bdgwx
June 28, 2023 6:32 am

You don’t have that. You have:

y = x
z = x/n

You can’t substitute z for y!

bdgwx
Reply to  Tim Gorman
June 28, 2023 8:02 am

TG: You can’t substitute z for y!

Nor would I try to. Algebra only allows you to substitute x for y and x/n for z.

Reply to  Jim Gorman
June 27, 2023 6:39 pm

This is an unproven substitution you have made.

It’s the whole point of these rules for propagation. Any uncertainty calculation can be propagated into another calculation. See Taylor 3.8 Propagation Step by Step.

Reply to  Bellman
June 28, 2023 4:23 am

You totally missed the point, as usual. They are two different things. You can’t substitute one of them for the other without showing they are the same. If you have two equations

y = x
z = x/n

you can’t substitute z for y unless you can show z and y are equal. How do you do that? It would only be true if n = 1 and that can’t be the case since if you have only one observation you don’t have a distribution subject to statistical analysis.

Reply to  Jim Gorman
June 28, 2023 6:30 am

You are STILL calculating the average uncertainty and not the uncertainty of the average!

The average uncertainty is meaningless, which you would recognize if you had ever tried to build a beam from different components or build a stud wall from different components.

The average uncertainty won’t tell you if your beam will be long enough. The average uncertainty won’t tell you if your top plate will even reach some of the individual studs or will result in a wavy top plate.

The fact that you can’t accept that inconvenient truth only means that you are a religious nut.

Reply to  Tim Gorman
June 27, 2023 2:30 pm

What you get is the AVERAGE UNCERTAINTY, not the uncertainty of the average!

Obviously this mantra has some religious significance to you so there’s not much point trying to keep pointing out why you are wrong.

u(x) = u(x1) + u(x2) + u(x3) + … + u(xn)

Except that’s not the usual way to propagate random independent uncertainties. I would take the square root of the sum of the squares, which is a somewhat smaller figure.

u(q_avg)/q = u(x)/x + u(n)/n

Correct.

“IT IS NOT u(q)/q = [u(x)/x] / n!”

Also correct.

But you keep avoiding the central point. q_avg = x_sum / n. So

u(q_avg)/q_avg = u(x_sum)/x_sum + 0

so

u(q_avg) / [x_sum / n] = u(x_sum) / x_sum

hence

u(q_avg) = [x_sum / n] (u(x_sum) / x_sum)

and the x_sums cancel

u(q_avg) = u(x_sum) / n

all [u(x)/x] / n tells you is the average uncertainty, not the uncertainty of the average!

All it tells me is you still can’t distinguish between fractional and absolute uncertainties.

Reply to  Bellman
June 27, 2023 12:10 pm

But I’ve really no idea what unspoken assumption you think all climate scientists are making.

It has been made explicit by no less than Nick Stokes that the Law of Large Numbers justifies presenting a calculated mean with more precision than the individual measurements used to calculate the mean. Therefore, one can present temperature anomalies with more significant figures than were available in the raw data.

Nick Stokes
Reply to  Clyde Spencer
June 27, 2023 12:47 pm

“Therefore”
Non sequitur.

Reply to  Nick Stokes
June 27, 2023 1:02 pm

The truth hurts, eh? It’s not a non sequitur. It is *exactly* the assumption made in climate science.

The law of large numbers doesn’t increase the accuracy of the mean – period.

Nick Stokes
Reply to  Tim Gorman
June 27, 2023 7:04 pm

But he’s not even talking about the mean. He said that it justifies presenting anomalies with higher accuracy. No-one says that.

Reply to  Nick Stokes
June 27, 2023 8:11 pm

He said that it justifies presenting anomalies with higher accuracy (rather, precision).

Maybe “no-one” actually says that, but it is implied by the tables of anomalies listed. There was a time when NASA presented tables of anomalies with three significant figures to the right of the decimal point, even though the original measurements had much less precision. I think that they have stopped doing that, but even two is probably one too many unless it is explicitly presented as a guard digit.

bdgwx
Reply to  Clyde Spencer
June 28, 2023 6:11 am

This is why it is always best to just state the uncertainty explicitly. That way you don’t have to infer it from significant figure rules.

Personally, I wish GISTEMP published 3 decimal places. If they did then I wouldn’t have to run GISTEMP on my own machine. As it is now I have to modify the gio.py source code file and run GISTEMP on my own to see 3 decimal places so that I can get a more precise estimate of the current ytd average and better predict the final year end average. It may seem subtle, but that 3rd digit can swing the probabilities by up to 10% when participating in prediction markets like Kalshi.

Nick Stokes
Reply to  Clyde Spencer
June 28, 2023 1:43 pm

“Maybe “no-one” actually says that, but it is implied by the tables of anomalies listed.”

As often, hopeless carelessness with language. An anomaly is just the temperature with an expected value subtracted. No-one says that should be more accurate. What you are now talking about is a global mean of anomalies. Yes, the mean is known more accurately than the individual values, anomalies or otherwise. That is because a lot more information went into deriving it. Taking the mean improves accuracy. Forming anomalies does not.

Reply to  Nick Stokes
June 28, 2023 1:58 pm

Taking the mean improves accuracy.

Nonsense, and it throws away information.

Reply to  karlomonte
June 29, 2023 6:24 am

Most statisticians have no understanding of what variance means other than it is a statistical descriptor. Variance is also a measure of accuracy in a data set. The wider the variance the less likely the average is a “true value” since other values have a higher probability of occurring in relation to the value of the “average”. It’s why they always ignore the variance of the temperature data and substitute how precisely they can calculate the mean value. How accurately they can calculate the mean tells you nothing about the variance in the underlying data – as you say, you throw away information – *valuable* information.

The concept that variance in the data is related to uncertainty in measurement never enters their mind. They have absolutely no understanding of the concept of “uncertainty”.

When you consider that an annual average temp includes both warm summer data and cold winter data it becomes obvious to most non-statisticians that the variance in the data is huge – meaning the uncertainty in the average goes UP, not down.

It’s only when you are a cult member in the religion of CAGW that increased variance means “more accuracy”.

Reply to  Tim Gorman
June 29, 2023 7:17 am

They don’t understand (and don’t care) that standard uncertainty is quantified by variance; without root-N they are lost at sea.

Reply to  Nick Stokes
June 29, 2023 6:16 am

“No-one says that should be more accurate.”

They don’t have to “say” it. Actions mean more than words. When they quote anomalies out to the hundredths digit from measurement data that barely justifies values in the tenths digit they are *implicitly* stating that the average should be more accurate than the underlying data.

Go look up the definition of “implicit”.

Never mind, I’ll give it to you:

implicit: Implied or understood though not directly expressed.

“That is because a lot more information went into deriving it.”

More information does *NOT* increase accuracy. More information can be just as wrong as less information. You cannot measure a board down to the millimeter using a device marked only in centimeters by just making more measurements, i.e. more information! You simply cannot increase resolution beyond what the measuring device provides. YOU DON’T KNOW AND CAN NEVER KNOW THE TRUE VALUE BEYOND THE RESOLUTION OF THE MEASURING DEVICE.

Even with a digital voltmeter you cannot tell what the actual value of a signal is beyond the 3 1/2 digit resolution of the meter. YOU DON’T KNOW AND CAN NEVER KNOW. No amount of measurements can let you *know*.

And that *is* the problem with trying to use the hundredths digit in the global average. You don’t know and can never know what that hundredths digit is when your information consists of data in the unit or tenths digit.

Only statisticians with no real world experience think statistics can increase measurement resolution!

Reply to  Nick Stokes
June 28, 2023 6:41 am

What Clyde says!

It’s implied by calculating anomalies out to the hundredths digit!

Reply to  Nick Stokes
June 28, 2023 6:32 am

Irony overflow.

Reply to  Clyde Spencer
June 27, 2023 1:01 pm

And that the SEM is the uncertainty of the average and not just a measure of how closely you have calculated the population mean. How close you are to the population mean says NOTHING about the accuracy of the population mean.

Reply to  Tim Gorman
June 27, 2023 1:59 pm

You keep saying this and I’m sure you are misunderstanding something. The population mean is what you are measuring – it’s the measurand if you like. It is not possible for the population mean to be inaccurate because that implies it is different from itself.

It’s possible the population you are looking at isn’t what you actually want, and it’s possible your sampling doesn’t give you the correct answer, but it’s meaningless to say the population mean is inaccurate.

Reply to  Bellman
June 27, 2023 5:29 pm

The population mean is *NOT* the measurand when you are measuring different things using different devices under different environmental conditions.

The average has no meaning when you are measuring different things using different devices. The data is not related in any way. It is no different than averaging the swan population in Denmark with babies being born – they are different things. Averaging different things is meaningless.

“but it’s meaningless to say the population mean is inaccurate.”

See the attached picture. You can calculate the average of the points in the right target very precisely. But it will be HIGHLY inaccurate.

This is what you get when you average different things. You can calculate the average right down to the gnat’s behind, take it out to as many decimal places as you want. But it is still highly inaccurate because it doesn’t represent reality! There is no such thing as the “average of different things”. Be it swans/babies or the temperature in Denver vs the temperature in Miami. They are *different things*. Even the max temp in Denver is different from the minimum temperature. The mid-range between the two is not reality, just like the 4′ average of a 3′ board and a 5′ board. That average board doesn’t exist. It’s not reality. Saying that is reality violates Feynman’s rule of “you are the easiest person for you to fool”.

[Attached image: image_2023-06-27_192136848.png]
Reply to  Tim Gorman
June 27, 2023 6:13 pm

“The population mean is *NOT* the measurand when you are measuring different things using different devices under different environmental conditions.
The average has no meaning when you are measuring different things using different devices.”

Remind me of Argument by Dismissal again.

I say the population mean is what I’m trying to measure, you just say it can’t be.

I say the average can be meaningful, you just say it can’t be if it’s made up of different things.

The data is not related in any way.

And I say it is. It’s related by the fact it’s part of the population you are trying to find the mean of. If I want to know the average temperature of the earth, then any measurements of temperature are related by being part of the entirety of the global temperature.

It is no different than averaging the swan population in Denmark with babies being born – they are different things.

Well it is because swans and babies are not really related. You may be able to average them to get some sort of index, but they are not coming from a single population. That’s not to say you can’t average them, but it’s difficult to see what meaning you could attach to the average.

Averaging different things is meaningless.

You still haven’t defined exactly what you mean by “different things”. Are two temperatures measured on different days at the same stations different things? Are two temperatures recorded at the same time in the same city different things? Are two different trees in the same forest different things?

Regardless, the obvious objection to your dismissal of averages of different things as being meaningless is that people are constantly finding meaning in the averages of different things.

You can calculate the average of the points in the right target very precisely. But it will be HIGHLY inaccurate.

What is the population you are measuring on the right? If it’s the population of all shots fired at the target then you have accurately assessed the population mean and discovered it is not on target. That’s one of the main reasons for looking at the mean – is a particular population what is expected? In this case the expectation was that the population mean would be close to the center of the target, and with a large enough sample size you can show that that particular gun or shooter was not on target.

If your population is actually on target and for some reason you are just measuring the shots wrongly then you have a systematic error in your sample. That does not mean the population mean is inaccurate it means your sample is wrong.

This is what you get when you average different things.

You're saying the four bullets fired are different things and there is no point in averaging them? So what was the purpose of the exercise? You've got a clear pattern of the shots being off target but you are saying it's impossible to draw any conclusion from that.

But it is still highly inaccurate because it doesn't represent reality!

If the sample doesn’t represent reality then you have a problem, but it isn’t a problem with the concept of averaging different things. It’s just a bad experiment.

There is no such thing as the “average of different things”.

There’s that dismissal again. At what point do you want to inform every scientist and statistician over the last 100 years or so that they’ve been wasting their time drawing conclusions from averages of different things?

Even the max temp in Denver is different from the minimum temperature.

Big, if true!

The mid-range between the two is not reality

Is it just the mean daily temperature that ceases to exist, or is it all temperatures between min and max?

Saying that is reality violates Feynman’s rule of “you are the easiest person for you to fool”.

I may have said this before, but does everyone who quotes that seem to assume that by “you” he meant everyone except themselves?

Reply to  Bellman
June 27, 2023 8:32 pm

I say the population mean is what I’m trying to measure, …

One can rarely measure the population mean in real world data. One, instead, tries to estimate the population mean by determining the mean of a sample of the population.

The average of different things only has meaning if it is stated that the average represents the average weight or diameter of apples and oranges. I'm not sure why one would want to know that, however. The problem is that climatologists present an average of 'fruit' and don't make it clear that it is actually apples and oranges (maybe with some pineapples thrown in for good measure). They imply that what they are measuring is a suitable sample of the air such that they can make judgments about the behavior of the entire air mass. I'm questioning the assumption that the way the air temperature is measured, and the way the data are processed, is fit for purpose.

Reply to  Bellman
June 28, 2023 6:18 am

I say the population mean is what I’m trying to measure, you just say it can’t be.”

I didn’t just say it can’t be. I gave the reason – it doesn’t exist in reality!

"If I want to know the average temperature of the earth, then any measurements of temperature are related by being part of the entirety of the global temperature."

That's like saying I can get the average height of Watusis by measuring the heights of Mexican natives.

Before you can calculate the average temp of the globe by using different locations you must first establish the relationship between the temperatures at each point on the globe (anomalies don't do it because the variance of the temperature distribution at each point is different). If you don't do that then you have no idea of what weight each value should have when calculating the total – a major problem that is totally ignored by climate science – and you. It's like you have never heard of physical science before – which I can believe is true.

Well it is because swans and babies are not really related.”

The temperature on Pikes Peak is not directly related to the temperature in Miami either. So what makes you think including both in a common average makes sense? The temperature in Edmonton is not directly related to the temperature in Rio de Janeiro. So what makes you think including both in a common average makes sense?

“but they are not coming from a single population.”

Neither is the temperature in Las Vegas and Miami. The diurnal interval is different – so they are not the same thing.

“That’s not to say you can’t average them, but it’s difficult to see what meaning you could attach to the average.”

EXACTLY! And what does the average of the temperatures in Point Barrow and Madrid tell you? What meaning can be attached to that average?

” than you have accurately assessed the population mean and discovered it is not on target. “

You keep making the points I am asserting and then totally ignoring what the assertions mean!

” with a large enough sample size you can show that that particular gun or shooter was not on target.”

And how do you determine if the global average temperature is not on target?

If your population is actually on target and for some reason you are just measuring the shots wrongly then you have a systematic error in your sample. That does not mean the population mean is inaccurate it means your sample is wrong.”

No, it means your measuring device was wrong and the population mean will then be wrong, i.e. inaccurate. The population mean is determined by the individual data elements. If the individual data elements are wrong then the population mean will be wrong as well. Do you *really* think the population mean exists as a separate entity from the individual data elements used to determine the mean?

You’re saying the four bullets fired are different things and there is no point in averaging them?”

That's not what I'm saying at ALL! Why do you have such a hard time reading the English language? I'm saying that it doesn't matter how precisely you measure something, if the measuring device is inaccurate then the calculated mean will be wrong as well no matter how many digits you use to determine the mean!

If that rifle put all four shots into the same hole, i.e. perfect precision, and they are not hitting the bullseye then the rifle is inaccurate no matter how precise it is.

Why is it so hard for you to understand that simple concept? One that a four year old could figure out?

There’s that dismissal again. At what point do you want to inform every scientist and statistician over the last 100 years or so that they’ve been wasting their time drawing conclusions from averages of different things?”

That's what the Catholic Church said to Galileo! Wrong is wrong. You are trying to justify it by using the argumentative fallacy of Argument to Tradition. "We've always done it this way so it must be right". You need to go watch the movie "Fiddler on the Roof" someday! You and Tevye would get along fine!

Phoenix is surrounded by high plateaus in an arid environment. Hays, KS is on the flat plains of western Kansas. You seem to think that the temperatures in these locations are the same thing. That averaging them together will tell you something about the environment that exists around them and between them. They are, however, unique in their environments – different elevations, different geography, different terrain, different land use, different wind patterns, different weather patterns, and on and on and on. They *ARE* different things.

If you measure the heights of a group of Watusis and another group of pygmies you would say you are measuring the same thing – heights. And the average would tell you something meaningful – like you could order T-shirts sized to fit the average for the whole population. What you would find, however, is that the T-shirts would likely fit no one, too small for the Watusis and too big for the pygmies.

Temperatures are the same thing in your view but in reality they are not. And their average tells you nothing meaningful. If you were measuring heat content and averaging it you might get much closer to a meaningful average – but climate science doesn’t do this even though the information has been available for over 40 years to do so. TRADITION RULES!

Is it just the mean daily temperature that ceases to exist, or is it all temperatures between min and max?”

Even here you totally ignore physical reality. The mid-range value, (Tmin + Tmax)/2 is *NOT* the mean temperature. The mean of the daytime temps, a sinusoid, and of the nighttime temps, an exponential/polynomial decay, are not Tmax and Tmin. If you really want the mean daily temp then you need to find the average of the daytime mean and the nighttime mean first! The mean is *NOT* the mid-range value and therefore you don’t know the real mean – implying that it doesn’t exist. Does a tree falling in the forest make a “sound” if there is no one there to hear it?

Reply to  Tim Gorman
June 28, 2023 7:34 am

What would the target look like if you fired a rifle with a scope from the prone position, a rifle with notch sights standing, a snub-nose 38, and a 12-gauge shotgun?

Then of course average the results!

Reply to  karlomonte
June 28, 2023 7:37 am

The average uncertainty would indicate the .38 snubby must be very accurate, right?

Reply to  Tim Gorman
June 28, 2023 7:46 am

Yes!

Reply to  Tim Gorman
June 28, 2023 9:35 am

I didn’t just say it can’t be. I gave the reason – it doesn’t exist in reality!

You think just stating “it doesn’t exist” isn’t an argument by dismissal?

I seriously doubt you exist in reality but that doesn’t mean I can’t argue with you. But when everything’s coming down to these pedantic philosophical platitudes I lose interest. You can debate whether anything mathematical “exists in reality”, it doesn’t mean it’s not useful.

That's like saying I can get the average height of Watusis by measuring the heights of Mexican natives.

Apart from it being not remotely like saying that, you are correct. What it's like is saying you can estimate the average height of all the people on the planet by taking a large random sample from all the people on the planet. Or you can estimate the average height of a Watusi by taking a random sample from amongst the Watusi population.

If you don’t do that then you have no idea of what weight each value should have when calculating the total

Why do you want to weight each value? If you can take a large enough random sample then there is no need to weight any value – they all had an equal probability of being selected.

If, as is the case with global temperature, you do not have a random sample, you are going to want to weight the values, but you do so on the basis of land area.
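As a quick illustration of that weighting point, here is a minimal R sketch of an area-weighted mean; the three anomalies and the areas are made-up numbers, purely for illustration.

```r
# Minimal sketch: an area-weighted mean versus a plain mean.
# The anomalies and areas below are invented for illustration only.
anom <- c(0.4, 0.9, 0.2)            # regional anomalies (made up)
area <- c(5, 2, 3)                  # regional areas, arbitrary units (made up)
weighted.mean(anom, w = area)       # each value weighted by the area it represents
mean(anom)                          # unweighted mean, for comparison
```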

a major problem that is totally ignored by climate science

Apart from every climate data set, all of which use some form of weighting.

The temperature on Pikes Peak is not directly related to the temperature in Miami either

They are related by both being part of the global temperature you are trying to measure. It doesn't matter how closely they are correlated; in fact ideally there is no correlation, as you'd like a completely independent sample.

So what makes you think including both in a common average makes sense?

Because they are both part of a common population, and our objective is to estimate the mean of that population.

The temperature in Edmonton is not directly related to the temperature in Rio de Janeiro. So what makes you think including both in a common average makes sense?

See my previous answer.

Neither is the temperature in Las Vegas and Miami.

Of course Las Vegas and Miami are both part of the same population (the population being global temperatures). Or are you now going to claim that the USA is not part of the globe?

The diurnal interval is different – so they are not the same thing.

That’s a good thing. If all your sample consisted of the same thing it wouldn’t be much of a random sample.

EXACTLY!

It would be great if at some point you actually explained what this nonsense about Danish swans is about. You seem to think that somehow saying there is no point averaging Danish swans and babies means that all averages are meaningless. I'll take that as evidence that you don't understand basic logic.

Also, I'm still waiting for your evidence that swan populations and babies are strongly correlated. I'm assuming this is another cherry pick from a large number of possible correlations, but it's also possible it's just one of your fabrications.

And what does the average of the temperatures in Point Barrow and Madrid tell you? What meaning can be attached to that average?

Very little, which is why you don’t want to base a global average on a sample of size two. Though by your logic that should still be a lot more accurate than a sample of 10000.

Enough for now.

Reply to  Bellman
June 28, 2023 1:39 pm

You keep making the points I am asserting and then totally ignoring what the assertions mean!

I think these metaphors are getting too mixed. All I’m saying is that if your estimate of the population mean is wrong because all your measurements have a systematic error, or because there is a bias in your sampling or whatever – that does not mean the population mean is inaccurate, it means your measurement of the mean is inaccurate.

I think what you are saying is that the population mean is whatever you measure it as, whereas I'm saying the population mean is the thing you are measuring.

If you go into a forest and measure every tree (i.e. the entire population) but your tape measure adds 2m to each tree, then your measurement of the mean will be 2m too large. But that doesn't make the population mean wrong, it's your measurements that are wrong. The population mean is the same as it ever was.

The population mean is determined by the individual data elements.”

Yes, by the elements, not by their measurements.

Why do you have such a hard time reading the English language?

It’s the Gorman language I have a problem with. If you explained your point better rather than coming up with cute metaphors and insults at every turn, maybe you would be easier to understand.

That’s what the Catholic Church said to Galileo!

They said and did a lot more than I said.

By all means, if you have a theory that proves everything we knew about probability for the last 100+ years was wrong, say it and I won't put you on trial, however bad it is. All I'm saying is you need to understand that what I'm saying is very basic and well accepted theory. For some reason you want to cling to the idea you are Galileo whilst painting me as the heretic.

I think that's as much as it's worth commenting on. The rest is just more repetition and the traditional appeal to Fiddler on the Roof.

Reply to  Bellman
June 29, 2023 6:02 am

All I’m saying is that if your estimate of the population mean is wrong because all your measurements have a systematic error, or because there is a bias in your sampling or whatever – that does not mean the population mean is inaccurate, it means your measurement of the mean is inaccurate.”

Unbelievable. The average doesn’t exist as an entity that can be measured. It isn’t a “thing” that exists in reality. It can only be determined as a statistical descriptor of things that actually exist in reality.

And here you are basically saying that the population mean can be right while also being wrong. The problem is that you don't *know* what the "right" population mean actually is if the things that determine it are wrong.

That’s the whole purpose of the concept of “uncertainty”. YOU DO NOT KNOW AND CAN NEVER KNOW.

Even multiple measurements of the same thing don't guarantee that the average is the "true value". If your measuring device wears as the multiple measurements are being made, the average is *not* the true value. In such a case there is no such thing as a "true value". That's why you have to justify in each and every case that the multiple measurements of the same thing using the same device result in a true Gaussian distribution. *Any* skewness, any at all, makes the assumption that the average is the "true value" a false assumption.

You have been reduced to bare, basic sophistry to try and justify the unjustifiable.

Reply to  Bellman
June 28, 2023 3:41 pm

You think just stating “it doesn’t exist” isn’t an argument by dismissal?”

I suggest you go back and read up on your argumentative fallacies. Not being a real thing *is* a reason for stating something is not real. That’s not a dismissal for no reason, it is a dismissal *for* a reason!

” I lose interest”

You lose interest when you’ve been backed into a corner and you know it.

"average height of all the people on the planet by taking a large random sample from all the people on the planet"

Can you then take that average height and order T-shirts for all the people in the population and have it fit them? That’s what the GAT attempts to do – say the planet is the average of all kinds of different things. The problem is that average has so much uncertainty no one knows for sure it actually fits anyone let alone everyone!

"Because they are both part of a common population, and our objective is to estimate the mean of that population."

The problem is that the temperatures are *NOT* part of a common population. You have different variances, different uncertainties, and different distributions all shoved into the same data set. It's like shoving in Shetland ponies with quarter horses and saying that the average height represents the average height of *all* horses. It doesn't!



Reply to  Tim Gorman
June 28, 2023 4:07 pm

Can you then take that average height and order T-shirts for all the people in the population and have it fit them?

It's a strong field, but that might be the dumbest idea you've had this week. Do you really think that knowing the average height of a population means that everyone magically becomes that height?

That’s what the GAT attempts to do – say the planet is the average of all kinds of different things.

OK, second dumbest idea. Nobody, apart from you, thinks that knowing the average temperature on the planet means that everywhere becomes that average temperature.

The problem is that the temperatures are *NOT* part of a common population.

Yes they are, and the fact that you don’t understand that is one of your many problems.

It's like shoving in Shetland ponies with quarter horses and saying that the average height represents the average height of *all* horses.

Do you ever tire of these repetitive meaningless examples? Judging the average height of all horses from a sample which includes just two breeds would be like taking your and your brother's IQs and claiming the average represented the average human IQ.

Reply to  Bellman
June 29, 2023 1:13 pm

It's a strong field, but that might be the dumbest idea you've had this week. Do you really think that knowing the average height of a population means that everyone magically becomes that height?"

Like it or not that *is* what climate science does – assumes everyone is “warming” at an alarming rate so everyone has to wear the same T-shirt!

You can’t even be consistent in your analyses. Assuming everyone is warming the same is ok but everyone wearing the same T-shirt is not. I’d put it down to just plain hypocrisy but I suspect it is the same culprit you always suffer from – absolutely no knowledge of or experience with the real world.

Reply to  Tim Gorman
June 29, 2023 1:47 pm

Like it or not that *is* what climate science does – assumes everyone is “warming” at an alarming rate so everyone has to wear the same T-shirt!

Well done. It’s now only the third dumbest statement.

No one says everywhere is warming at the same rate, and what does that have to do with everyone needing the same size of t-shirt?

Assuming everyone is warming the same is ok but everyone wearing the same T-shirt is not.

OK, fourth dumbest.

absolutely no knowledge of or experience with the real world.

Says someone who thinks that knowing the average size of a person means everyone can wear the same size t-shirt.

Reply to  Bellman
June 29, 2023 3:19 pm

Just quit now while you are drowning in the quicksand, it's not working for you.

Reply to  Bellman
June 29, 2023 5:51 pm

Oh man! Get out of your basement sometime!

What does Global Net Zero mean to you?

What does it mean to struggling 3rd world countries that are trying to advance their standard of living so they can have electricity 24/7/365? Instead of intermittent solar and wind?

GNZ means everyone wears the same T-shirt. And that simply goes right over your head!

Reply to  Tim Gorman
June 29, 2023 6:16 pm

Wow, talk about a non sequitur.

So far we have gone from – knowing the average height means you think everyone can wear the same size t-shirt, to knowing the global average temperature means everywhere is warming at the same rate, and therefore everyone must wear the same t-shirt, to net zero is like everyone wearing the same t-shirt.

Look – I know this is hard for your one track mind to understand, but I am trying to explain to you how basic statistics work, not espouse any energy policy. If you don’t like a policy argue against it, but don’t pretend to misunderstand statistics just to justify your beliefs.

Reply to  Bellman
June 30, 2023 4:57 am

ROFL!

Do you think it goes unnoticed that you didn't address why forcing 3rd world countries into the GNZ meme isn't forcing everyone to wear the same T-shirt?

Basic statistics do not handle single measurements of multiple things being jammed into the same data set. Basic statistics doesn’t even recognize uncertainty in any example I can find in the five statistics books I have collected. There is not a single example in any of them showing the data being presented as “stated value +/- uncertainty” and developing methods for handling the uncertainty portion. It’s all just “stated value” with the implication that the stated values are 100% accurate.

The very same thing *YOU* do, bdgwx does, Nick does, and right on down into most of the climate science alarmist cliques.

Reply to  Tim Gorman
June 30, 2023 7:55 am

"Do you think it goes unnoticed that you didn't address why forcing 3rd world countries into the GNZ meme isn't forcing everyone to wear the same T-shirt?"

You meant that to be a serious question!?

One is a policy about how best to reduce carbon emissions, and one is getting someone to wear a t-shirt. Even as a dumb metaphor, I can't begin to figure out why you think they are similar.

Why is building a lawn mower not the same as walking round with a traffic cone on your head? The only answer is that they are not the same.

Reply to  Tim Gorman
June 30, 2023 8:02 am

“The very same thing *YOU* do, bdgwx does, Nick does, and right on down into most of the climate science alarmist cliques.”

As does Monckton, as does NIST, as does your brother, as does Taylor, as does anyone who understands how this works. There is little to no point in worrying about the measurement uncertainty when the data varies by an order of magnitude more than that from measurement errors. And the variation already includes the variation from the measurements.

Reply to  Bellman
June 27, 2023 8:17 pm

The population mean is what you are measuring

No, what is being calculated is the mean of the samples from the population. It may or may not be close to the mean of the population. That is the essence of the meaning of accuracy when one is trying to determine the mean of the population. Inaccuracy may result from poor calibration, drifting electronics, noise, or a poor sampling protocol.

Reply to  Clyde Spencer
June 30, 2023 8:31 am

The real problem Bellman has is that he has ONE large sample of something like >9000 stations. The CLT only applies when you have multiple samples whose individual means form a distribution. With just one sample, you DO NOT have a sample means distribution. If you make the choice that there is ONE sample of >9000, then you must live with that choice. You can't even invoke experimental conditions because you have only one experiment, albeit with multiple measurements.

Reply to  Jim Gorman
June 30, 2023 10:22 am

“The real problem Bellman has is that he has ONE large sample of something like >9000 stations.”

Stop trying to personalise this. I am not responsible for any data set. If I was I certainly would not create it by taking an average of 9000 stations, nor would I just use the CLT to estimate the uncertainty. Nor does any data set I know of.

Please don’t consider my attempts to explain your misunderstandings about the CLT as meaning that’s what you would use to construct a global anomaly data set.

"The CLT only applies when you have multiple samples whose individual means form a distribution."

Wrong. I’ve explained to you on numerous occasions why it’s wrong. You won’t listen to me, and you don’t understand any of the explanations you keep quoting.

” With just one sample, you DO NOT have a sample means distribution.”

The whole point of the CLT is to tell you what distribution a single sample comes from. If you have enough individual samples you can estimate the standard error of the mean without needing the CLT. You do this by simulating multiple samples, e.g. bootstrapping, and that is how I think most of the uncertainty estimates are made. This is a substitute for the CLT.
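As a minimal R sketch of those two routes, here is one simulated sample with its SEM computed both from the usual s/√n formula and by bootstrapping that same single sample; the data and settings are invented for illustration.

```r
# One simulated sample; compare the formula SEM with a bootstrap SEM.
set.seed(42)
x <- rexp(200, rate = 1)                               # a single sample of size 200

sem_formula <- sd(x) / sqrt(length(x))                 # s / sqrt(n)
boot_means  <- replicate(10000, mean(sample(x, replace = TRUE)))
sem_boot    <- sd(boot_means)                          # spread of the resampled means

c(formula = sem_formula, bootstrap = sem_boot)         # the two agree closely
```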

Reply to  Bellman
June 30, 2023 12:35 pm

The ability to estimate the SEM without using the CLT requires proof that the single sample has a normal distribution. Otherwise, the standard deviation is not symmetric around the mean. I have never seen any proof from you, nor in studies that this is the case.

You seem to forget that the primary purpose of using sampling is to obtain a normal distribution (as the CLT shows) by doing correct sampling. It allows one to then calculate the population mean. The formula for doing so is “(SEM • √n) = σ”. Without a normal distribution this formula won’t hold and the CLT is worthless.

At best, the mean you get from a sample will also have the characteristic standard deviation of that sample. You can’t lower it with larger samples or more samples. You had better hope your sample has a small σ.

Reply to  Jim Gorman
June 30, 2023 1:30 pm

The ability to estimate the SEM without using the CLT requires proof that the single sample has a normal distribution.

No it doesn't. I don't know why this has to keep being repeated, but the point about the CLT is that the sampling distribution will tend to normal regardless of the population distribution. The distribution may affect the rate at which it converges; that's one reason why resampling methods can be better. But for sufficiently large samples, it's reasonable to assume the sampling distribution is normal.

Otherwise, the standard deviation is not symmetric around the mean.

You keep confusing normality with symmetry. But in any event it doesn't matter.

You seem to forget that the primary purpose of using sampling is to obtain a normal distribution

No it isn’t. The primary purpose of sampling is to estimate the population distribution – especially the mean.

The formula for doing so is “(SEM • √n) = σ””

How is that a formula for calculating the mean? The best estimate of the population mean is the sample mean. An indication of how good that estimate is comes from the SEM, that is σ / √n. If, as will normally be the case, you don't know σ, it's estimated from the sample standard deviation.

Without a normal distribution this formula won’t hold and the CLT is worthless.

Completely untrue. You’ve even shown me simulations demonstrating why it’s untrue.
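A minimal simulation of that point, assuming a strongly skewed (exponential) population with σ = 1: the means of repeated samples come out roughly normal, with a spread close to σ/√n. Settings are invented for illustration.

```r
# Sampling distribution of the mean from a skewed population (illustration only).
set.seed(2)
n <- 50
means <- replicate(20000, mean(rexp(n, rate = 1)))    # exponential population, sigma = 1
c(sd_of_means = sd(means), sigma_over_sqrt_n = 1 / sqrt(n))
hist(means, breaks = 60, main = "Sample means from a skewed population")
```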

Reply to  Bellman
July 1, 2023 5:20 am

"sampling distribution"

One sample does not make a distribution. You can’t define a distribution with just one value. You need multiple samples to have a distribution of samples – i.e. a sampling distribution.

Reply to  Tim Gorman
July 1, 2023 3:42 pm

You can’t define a distribution with just one value.

You can define a distribution with zero values. It’s just something that defines the probability of random values.

You need multiple samples to have a distribution of samples – i.e. a sampling distribution.

I think this is your general problem in not understanding maths. You only understand concepts when it's something you can hold in your hand or touch, and if you can't you insist it doesn't exist.

A sampling distribution is a distribution that defines the probability of a sample parameter, e.g. a mean. You might derive it experimentally by taking a large number of different samples, but normally you just calculate / estimate it from your knowledge of the population. You do not need multiple samples to estimate the sampling distribution of a given population, the CLT for example can do that. The only reason you usually need one sample is to estimate what the population is, from which you can estimate the sampling distribution.

Reply to  Bellman
July 1, 2023 5:04 pm

Boy dude you are really getting out there!

"You can define a distribution with zero values. It's just something that defines the probability of random values."

“define a distribution with zero values”. Go on, tell us how you define that distribution. What’s on the x-axis? What’s on the y-axis? With no values, I think that is going to be eye opening!

"You do not need multiple samples to estimate the sampling distribution of a given population, the CLT for example can do that."
From:
https://www.scribbr.com/statistics/central-limit-theorem/

"The central limit theorem states that if you take sufficiently large samples from a population, the samples' means will be normally distributed, even if the population isn't normally distributed."

“samples’ means” What do you think this means?

Normally it means there are a large number of samples and each sample has a mean. A distribution of all those sample means will be normal.

Here is a screenshot from:

https://onlinestatbook.com/stat_sim/sampling_dist/

Look closely at the range of how many choices there are for the number of samples.

Experiment with the size of the samples.

Lastly verify the formula:

SEM • √n = σ

Where SEM is the standard deviation of the sample means and "n" is the number of elements in each sample.
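In the spirit of the simulator linked above, here is a minimal R sketch with made-up settings: the spread of the sample means depends on the size of each sample, not on how many samples are drawn.

```r
# sd of many sample means, for different sample sizes and sample counts (illustration only).
set.seed(5)
sigma <- 10
sd(replicate(2000,  mean(rnorm(25,  sd = sigma))))    # about sigma/sqrt(25) = 2
sd(replicate(20000, mean(rnorm(25,  sd = sigma))))    # still about 2: more samples, same spread
sd(replicate(2000,  mean(rnorm(100, sd = sigma))))    # about sigma/sqrt(100) = 1: larger n, smaller spread
```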

Screenshot_20230701-185532.png
Reply to  Jim Gorman
July 1, 2023 5:31 pm

“define a distribution with zero values”. Go on, tell us how you define that distribution.

OK. A normal distribution with mean 100 and standard deviation 10. A uniform distribution from 0 to 1. A Poisson distribution with λ = 4.

Do you want any more examples?
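For what it's worth, each of those can be written down in R from nothing but its parameters; no data values are involved. A minimal sketch:

```r
# Three distributions defined entirely by their parameters (no data).
curve(dnorm(x, mean = 100, sd = 10), from = 60, to = 140)    # Normal(100, 10)
curve(dunif(x, min = 0, max = 1), from = -0.2, to = 1.2)     # Uniform(0, 1)
plot(0:15, dpois(0:15, lambda = 4), type = "h")              # Poisson(lambda = 4)
```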

“samples’ means” What do you think this means?

I’ve told you what that means numerous times before. It’s describing in a simple way what a sampling distribution means. If you ever bothered to read beyond the opening paragraph you would see it says:

Fortunately, you don’t need to actually repeatedly sample a population to know the shape of the sampling distribution. The parameters of the sampling distribution of the mean are determined by the parameters of the population.

Reply to  Bellman
July 1, 2023 6:03 pm

"A normal distribution with mean 100 and standard deviation 10."

How do you have a mean with no values? How do you have a standard deviation without values?
Heck, 100 is a value! 10 is a value. An SD of 10 tells you that 68% of the area under the curve lies within an interval of ±10.

Give it up dude. If you know the statistical parameters of the population, then you know the statistics the sample means will have.

The estimated mean will be close to the population mean. The Standard Deviation of the sample means will be "SEM = σ/√n". Where "n" is the size of the samples (not the # of samples).

You give me a mean and an SD and I’ll draw you a normal distribution that has those. Look up “norm.dist” in Excel.

Reply to  Jim Gorman
July 1, 2023 6:49 pm

How do you have a mean with no values?

This is painful.

The values talked about were the means from a sampling distribution made up of a large number of samples. The claim was you needed all these values to define the distribution. You do not.

If you know the statistical parameters of the population, then you know the statistics the sample means will have.

Yes, yes, yes. That’s what I’m saying. The claim you were defending was that you could only know the distribution by taking many samples.

The estimated mean will be close to the population mean.

It may be, it may not. That’s the point of estimating the sampling distribution, to say how likely you are to be close.

Where “n” is the size of the samples (not the # of samples).

You do realize how you keep saying this as if it’s meant to be a revelation when it’s precisely what I keep telling you.

So do you now accept that it’s wrong to say

One sample does not make a distribution. You can’t define a distribution with just one value. You need multiple samples to have a distribution of samples

or will we have to go through all this again in a few days time when you have forgotten this conversation?

Reply to  Bellman
July 2, 2023 5:40 am

The claim was you needed all these values to define the distribution."

In other words, *YOU* can know the unknowable.

Reply to  Tim Gorman
July 2, 2023 9:30 am

No. You cannot know what you do not know. But that doesn't mean you can know nothing about the unknown. I really don't understand why this is a difficult concept for you, given it's the point of all the uncertainty analysis you keep banging on about.

The whole point of uncertainty is to say you do know something about the unknown. If you say a measurement has an uncertainty interval, you are saying you know something. It doesn’t mean you know what the error is, but you are putting limits on that error.

Reply to  Bellman
July 2, 2023 4:47 pm

You didn’t even bother to read what I have posted from Bevington.

The uncertainty interval is typically a GUESS, an educated guess but still a guess, at what interval the true value might lie in. It is not a 100% guaranteed interval. So you are *NOT* putting limits on anything!

You *REALLY* need to start studying both Taylor and Bevington instead of just cherry picking. Both explain that uncertainty in experimental measurements is used to inform others of what to expect if they do the same experiment. The uncertainty specification doesn't "limit" the possible results in any way; it is only used as a factor in judging how well the measurement describes the measurand for that *one* experiment.

Reply to  Tim Gorman
July 2, 2023 5:09 pm

A great point. How many experiments have been made over the years to measure the speed of light, chemical reaction products under different conditions, or the percentages of gases in the atmosphere? Each new experiment weighs into the "measured value".

Reply to  Bellman
July 2, 2023 6:29 pm

You'll notice I said, IF YOU KNOW THE POPULATION, you can calculate the sample means, and the sample means' standard deviation.

If you don't know the population statistical parameters, you must sample to determine sample mean statistics which can provide estimates of the population statistical parameters.

With one sample, a sample mean distribution does not exist. You may say one sample may give an estimate but it is subject to much error.

Reply to  Jim Gorman
July 2, 2023 6:51 pm

You'll notice I said, IF YOU KNOW THE POPULATION, you can calculate the sample means, and the sample means' standard deviation

And you’ll notice I agreed with you.

If you don't know the population statistical parameters, you must sample to determine sample mean statistics which can provide estimates of the population statistical parameters.

Yes, that’s the point about sampling. But you only need one sample to do it. I pointed this out where it said this in your own source. Did you see that?

With one sample, a sample mean distribution does not exist.

And we are back to square one.

You may say one sample may give an estimate but it is subject to much error.

It's generally good enough if the sample size is reasonable. It's less certain than knowing the population standard deviation; that's why you divide by N – 1 and use the Student's t-distribution. But again, your logic is completely flawed. If you can't get a large sample, then you can't get multiple samples. If you can get a large number of small samples, you can combine them into one large sample. It's completely perverse to say take ten samples of size 20, just to let you know that a sample of size 20 isn't very good, when you could have had a sample of size 200.
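A minimal check of that last point, using simulated data with invented settings: ten samples of 20 pooled into one sample of 200 give a smaller standard error than any one small sample.

```r
# Pooling small samples into one large sample (simulated data, illustration only).
set.seed(3)
small  <- replicate(10, rnorm(20, mean = 15, sd = 5), simplify = FALSE)
pooled <- unlist(small)
sd(small[[1]]) / sqrt(20)               # SEM from one sample of 20
sd(pooled) / sqrt(length(pooled))       # SEM from the pooled sample of 200
```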

Reply to  Clyde Spencer
June 30, 2023 10:30 am

Sorry missed this comment.

“No, what is being calculated is the mean of the samples from the population. ”

I disagree. You are using the sample to estimate the population mean. That's the point of sampling. The value you calculate is the mean of the sample, but the purpose is to measure, with a degree of uncertainty, the population mean.

“Inaccuracy may result from poor calibration, drifting electronics, noise, or a poor sampling protocol.”

Yes, all that and more. But even with perfect measurements and sampling, there is still the inaccuracy that comes from the randomness of a sample.

Reply to  Bellman
July 1, 2023 5:23 am

One sample is *NOT* the point of sampling. You can't even correctly parse English grammar. "Sampling" – plural. "Sample" – singular.

Reply to  Tim Gorman
July 1, 2023 8:59 am

“You can’t even correctly parse English grammar.”

“Sampling” is a present participle or a gerund, not a plural.

“In statistics, quality assurance, and survey methodology, sampling is the selection of a subset (a statistical sample) of individuals from within a statistical population to estimate characteristics of the whole population.”

https://en.m.wikipedia.org/wiki/Sampling_(statistics)

Reply to  Bellman
July 1, 2023 9:22 am

Then tell us what a temperature database contains: samples, or the entirety of all temperatures everywhere (the population)? Your answer will go a long way to determining what statistical tools are used.

Reply to  Jim Gorman
July 1, 2023 3:33 pm

Then tell us what a temperature database contains

Any particular one. At a guess temperatures. Possibly other stuff as well such as rain and sunshine.

samples or is it the entirety of all temperatures everywhere (population).

Obviously not all temperatures everywhere. That would take far too long to download.

Your answer will go a long way to determining what statistical tools are used.

Anything else to add, or is that the end of your wisdom for today?

Reply to  Clyde Spencer
June 27, 2023 2:17 pm

It has been made explicit…

If it’s explicit then it isn’t an unspoken assumption. And it’s hardly an assumption, it’s just standard statistics that some here refuse to accept.

Reply to  Bellman
June 27, 2023 8:37 pm

I presume that you are referring to,

It has been made explicit by no less than Nick Stokes that the Law of Large Numbers justifies presenting a calculated mean with more precision than the individual measurements used to calculate the mean.

You apparently don’t have an ear for sarcasm.

Reply to  Bellman
June 27, 2023 5:32 am

You still have no idea of what the term uncertainty truly means. Each data point has an uncertainty. When calculating the residuals, that means each residual also has an uncertainty inherited from the uncertainty in the data points. For a sensitivity analysis, assume all the data on the left is at the high end of its uncertainty and all the data on the right is at the low end. Whoops, the slope just changed. Now reverse and guess what? The slope changed to the opposite sign. That is uncertainty! You simply don't know and can never know where the correct values lie.

Reply to  Jim Gorman
June 27, 2023 6:10 am

For a sensitivity analysis, assume all the data on the left is at the high end of its uncertainty and all the data on the right is at the low end. Whoops, the slope just changed. Now reverse and guess what? The slope changed to the opposite sign.

Yes. That's exactly what the equations for calculating the uncertainties of the trend line are looking at. They effectively say that if you looked at all possible random changes in the residuals, and what effect they had on the trend line, you could calculate how much variation there would be in the trend line.

Now, this is based on simple assumptions, especially the idea of independence; especially in a time series, more sophisticated methods are needed to correct for the lack of independence, amongst other things. And to try to bring this back on topic, that's exactly the problem with this post's claim about uncertainty including seasonal change. The seasonal changes are not independent.

That is uncertainty! You simply don't know and can never know where the correct values lie.

Exactly. And I don’t know why you keep shouting it as if it was some sort of gotcha. You do not know what the correct value is. If you did there would be no uncertainty.
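A minimal R sketch of what those equations are doing, on purely synthetic data: the slope standard error reported by lm() is close to the spread of slopes you get by randomly re-assigning the residuals. All values below are invented for illustration.

```r
# Compare lm()'s slope standard error with a residual-reshuffling simulation.
set.seed(1)
n <- 120
x <- 1:n
y <- 0.02 * x + rnorm(n, sd = 0.3)            # synthetic trend plus noise

fit   <- lm(y ~ x)
se_lm <- summary(fit)$coefficients["x", "Std. Error"]

slopes <- replicate(5000, {
  y_star <- fitted(fit) + sample(resid(fit))  # same residuals, random order
  coef(lm(y_star ~ x))["x"]
})
c(lm_std_error = se_lm, reshuffled_sd = sd(slopes))   # should be close
```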

Reply to  Jim Gorman
June 26, 2023 9:09 pm

Residuals are calculated for every datapoint as (data minus the estimate – i.e. the line [as described by the equation] for that point) and by definition the mean of the residuals = zero. In addition, input data (y) are expected to be independent of previous data.

Cheers,

Bill Johnston

Reply to  Bill Johnston
June 27, 2023 5:10 am

But remember, EACH data point has an uncertainty associated with it. What this means is that each residual should also have an uncertainty interval. This is a point constantly overlooked when assessing the relevance and uncertainty of the trend line.

What’s worse is when the anomaly is calculated. Each month has an uncertainty interval and each baseline has an uncertainty interval. If these have been calculated as in NIST TN 1900, when the two random variables are subtracted, the two uncertainties^2 are ADDED!
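A minimal illustration of that subtraction rule, with invented uncertainty values:

```r
# Anomaly = monthly value - baseline; the standard uncertainties combine in quadrature.
u_month    <- 0.20                         # invented, for illustration
u_baseline <- 0.10                         # invented, for illustration
sqrt(u_month^2 + u_baseline^2)             # uncertainty of the anomaly, ~0.22
```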

Reply to  Jim Gorman
June 27, 2023 12:19 pm

… and each baseline has an uncertainty interval.

While it isn’t done, one could create a baseline for The Common Era, and define it as being exact, for example, the integer 14.

However, you are correct that if the baseline is defined as the mean for a certain 30-year interval, then the uncertainty for that period should be dealt with in calculating the anomalies.

Reply to  Clyde Spencer
June 27, 2023 1:04 pm

And since the uncertainty is usually larger than the anomaly you simply don’t know if the anomaly is positive or negative. So climate science just ignores the uncertainty and pretends the stated value is 100% accurate with infinite precision.

Crispin in Val Quentin
Reply to  Bellman
June 26, 2023 11:10 pm

Bellman:

In general it is not possible for the uncertainty of an average to be greater than the biggest uncertainty of an individual component “

This is not how uncertainties propagate. The uncertainty of the average of two values, each of which has an uncertainty, is always going to be larger than that of either of the components. Mainly the uncertainty is rooted in the accuracy of the instruments (not their precision). If there are 5 calculation steps to create an output there are 5 increases in the resulting uncertainty. The basics are explained well on Wikipedia.

I agree with the comment above that climate scientists do not report the uncertainty of predictions correctly.

Uncertainties are never reduced by repetition, because they are inherent in the measurement system. All we can say with replication is that it improves our calculation of where the middle of the uncertainty range lies. Knowing that to 5 decimal places doesn't reduce the uncertainty that accompanies each measurement.

Further, the distribution of temperature measurements is not Normal, because radiation is to the 4th power of temperature K. One is much less likely to see a high temperature than a low one. For a variable insolation power that is Normally distributed, the consequent temperature change at each site is quite different, attenuated on the high side by greater radiation.

Reply to  Crispin in Val Quentin
June 27, 2023 4:18 am

The truly sad thing is that you’ll never convince Bellman, bdgwx, et al of the simple fact that uncertainty grows, it doesn’t reduce. They are so tied into the assumption that the average uncertainty is the uncertainty of the average that they can’t see the forest for the trees. They won’t even admit that systematic bias in the measuring system makes it impossible to use statistical analyses of the data. Taylor, Possolo, and Bevington all specifically state this in their books.

Reply to  Tim Gorman
June 27, 2023 5:41 am

The truly sad thing is that you’ll never convince Bellman, bdgwx, et al of the simple fact that uncertainty grows, it doesn’t reduce

You never consider that it may be because your arguments are unconvincing. I’ve given you numerous chances to explain why you think that, but it just comes back to mangled maths, and an assertion that you are right and everyone else is wrong. I’ve asked you to test your claims using simulations, or demonstrate how it could work in the real world. But you just reject this out of hand.

I’ll ask again, if you have 100 thermometers each with an uncertainty of ±0.5°C, and you take the exact average of all of them, how is it possible that the average could have an uncertainty of ±5°C? You need to explain where this extra uncertainty is coming from, because it defies mathematics and logic. An uncertainty of 5°C implies that the average could be wrong by 5°C. But that’s only possible if every thermometer reading was out by 5°C, by the simple definition of an average. But that contradicts the claim that the uncertainty of individual readings was only 0.5°C.
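As a minimal illustration of the question, assuming each of the 100 readings carries an independent random error somewhere within ±0.5°C, a quick simulation of the resulting error of the average (values are invented):

```r
# Error of the average of 100 readings, each with an independent error in [-0.5, 0.5].
set.seed(7)
avg_errors <- replicate(10000, mean(runif(100, -0.5, 0.5)))
c(typical_spread = sd(avg_errors), largest_seen = max(abs(avg_errors)))
```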

They are so tied into the assumption that the average uncertainty is the uncertainty of the average that they can’t see the forest for the trees.

Says someone who still can't see we are not saying the average uncertainty is the uncertainty of the average, no matter how many times we explain this. If that were the case the uncertainty of the average would be the same as the individual uncertainties; it would not reduce if you increased the sample size. The only person I've seen claiming the uncertainty of the average is the average uncertainty is Kip Hansen, and you agreed with him.

They won’t even admit that systematic bias in the measuring system makes it impossible to use statistical analyses of the data. Taylor, Possolo, and Bevington all specifically state this in their books.”

Nobody denies that. It's just an example of you moving the goal posts and still not justifying your claim.

You were the one who insisted that if all uncertainties were random and independent the uncertainty of the average would increase with sample size. Random and independent mean no systematic uncertainties. You then jump to claiming that all the uncertainties might be systematic, but fail to understand that in that case the uncertainty of the average is the average uncertainty. It does not decrease, but it does not increase either.

Reply to  Bellman
June 27, 2023 8:10 am

I’ll ask again, if you have 100 thermometers each with an uncertainty of ±0.5°C, and you take the exact average of all of them, how is it possible that the average could have an uncertainty of ±5°C?”

Because uncertainties ADD. When calculating uncertainties “n” doesn’t divide the sum of data uncertainty. It *adds* to the total uncertainty. And since the uncertainty of a constant is zero the factor “n” cannot reduce or grow the total uncertainty!

According to Taylor Eq 3.18 if

q = x/y

then the uncertainty of q is

u(q)/q = u(x)/x + u(y)/y

It is *NOT* u(q)/q = [u(x)/x] / y

It is the same in Eq 3.18 but doing addition in quadrature.

If x = x1 + x2 + x3 + … + xn

and q_avg = x/n then the uncertainty of q_avg is

u(q_avg)/q_avg = u(x)/x + u(n)/n

and u(x) = u(x1) + u(x2) + u(x3) + … + u(xn)

u(n)/n = ZERO!

And it doesn't matter if you do direct addition or addition in quadrature. "n" simply does not reduce the uncertainty of the average! It remains u(x)/x in all cases!

Why you can’t understand this is simply beyond me. Why anyone can’t understand this is beyond me.

You only divide total uncertainty by “n” if you want to know the average uncertainty. This does nothing but spread the uncertainty evenly across all data elements – you still get the total uncertainty when you add them all up.

When working with uncertainties you work with uncertainties. All factors must be uncertainties. You, bdgwx, and the rest of the so-called climate scientists want to divide uncertainty factors by a non-uncertainty factor “n”. Why?

Reply to  Tim Gorman
June 27, 2023 9:06 am

Same old same old. I'm not asking you to demonstrate your misunderstanding of the maths. I'm trying to get you to examine how your maths compares with the logic.

If your maths leads to an illogical conclusion, such as the uncertainty of the mean being a lot larger than the uncertainty of any individual measurement, the sensible thing to do is check that your maths is correct.

“u(q)/q = u(x)/x + u(y)/y”

And you still can't understand what that means for the uncertainty of an average? It's not difficult or a trick. You are saying q = x / y. If the uncertainty of y is 0, you have

u(q)/q = u(x)/x

So what is u(q) compared with u(x) if q is smaller than x? It's a simple question of relative proportions. If q is equal to x / y then u(w) must be equal to u(x) / y. It's the only way for the equality to hold.

“Why you can’t understand this is simply beyond me.”

Ditto. But rather than keep going round in circles, might I suggest you try to understand what I'm saying and consider if it's just possible you're the one who is misunderstanding something. And if you are still convinced I'm wrong, try to demonstrate it with a real testable example that doesn't involve you substituting a sum for an average.

Reply to  Bellman
June 27, 2023 11:21 am

IT'S STILL AN ADDITION OF UNCERTAINTIES!

You don’t divide the total uncertainty by a non-uncertainty.

” If q is equal to x / y than u(w) must be equal to u(x) / y. It’s the only way for the equality to hold.”

Where in Pete’s name did you get u(w)? It’s not in any of the math in my post!

You just can't admit that uncertainties add, can you? So you have to invent things not in the math.

Your argument isn’t with me, it’s with Taylor. It’s his math, right out of his book!

Why don’t you write him and tell him he’s a dunce when it comes to math!

Reply to  Tim Gorman
June 27, 2023 11:34 am

“Where in Pete’s name did you get u(w)? ”

Typo. u(w) should be u(q).

“Your argument isn’t with me, it’s with Taylor. It’s his math, right out of his book!”

You mean the book that explains that, as a consequence of the equation for division and multiplication, we have the special case that if q = Bx, where B has no uncertainty, then u(q) = B u(x)? That book?
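A minimal numerical sketch of that rule applied to an average, assuming an illustrative ±0.5 uncertainty on each of 100 readings and assuming the readings are independent:

```r
# q = sum(x)/n: combine independent uncertainties in quadrature, then scale by the exact 1/n.
u     <- rep(0.5, 100)              # assumed uncertainty of each reading (illustrative)
u_sum <- sqrt(sum(u^2))             # 0.5 * sqrt(100) = 5
u_avg <- u_sum / length(u)          # the exact constant 1/n scales the uncertainty: 0.05
c(u_sum = u_sum, u_avg = u_avg)
```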

Reply to  Bellman
June 27, 2023 11:59 am

You simply can’t read simple English can you? Is your first language English or something else?

If you have a stack of 100 sheets of paper, each and every one identical, then the measurement of the stack will have an uncertainty. The uncertainty of each individual, identical sheet will be u(total)/B. It's simple addition and you can't seem to get it.

u(1) + u(2) + … + u(100) = total uncertainty.

u(1) = u(2) = u(3) … = u(100) = total uncertainty/100!

If the sheets are not identical then this simply doesn’t apply! You just can’t get it through your head that multiple measurements of the same thing (i.e. individual, identical sheets of paper) is not the same as multiple measurements of different things. Doing so would shake your religious beliefs so you just block it out!

Reply to  Tim Gorman
June 27, 2023 12:46 pm

You simply can’t read simple English can you? Is your first language English or something else?

Could you for once make a point without resorting to some pathetic ad hominem?

What particular point do you think I’ve misread?

If you have a stack of 100 sheets of paper, each and everyone identical then the measurement of the stack will have an uncertainty.

I see you're still obsessing over that one example rather than accepting the point it illustrates.

Let me spell this out. Do you agree with Taylor or not that if q = Bx, then u(q) = B u(x), where B is an exact figure with no uncertainty?

If you do, then your argument that dividing a value by n leaves the uncertainty of the average the same as the uncertainty of the sum is plainly wrong. If you don't agree with it then explain why you think Taylor is wrong.

You of course will admit neither, but instead will keep trying to find spurious loopholes in every example, whilst insisting I don’t speak English good.

Reply to  Bellman
June 27, 2023 12:40 pm

… If your maths leads to an illogical conclusion, such as the uncertainty of the mean being a lot larger than the uncertainty of any individual measurement, …

Consider the case of a single measurement with an uncertainty of 0.1 deg C. One takes a large number of measurements starting prior to dawn at Tmin and ending in the late afternoon at the time of Tmax. The mean will have an intermediate value, and the estimate of the uncertainty will be the standard deviation, which increases with every measurement. Even correcting for the increasing mean, the standard error of the mean will increase when the series has a positive trend.

Reply to  Clyde Spencer
June 27, 2023 2:12 pm

The mean will have an intermediate value

I’m not sure what point you are making here. When I talk about the average it’s always going to be over a fixed period of time. It doesn’t make sense to look at continuous intermediate values when the data isn’t stationary.

the standard error of the mean will increase when the series has a positive trend

Maybe, but that's just because your standard deviation is increasing. Again, the assumption is we are taking a random IID sample from a fixed population.

To put it another way, the reason your means may be getting less certain is not because you are taking more measurements; it's because you are taking a mean of a longer, more varied period.

Reply to  Bellman
June 27, 2023 8:48 pm

… the assumption is we are taking a random IID sample from a fixed population.

No, the size of the population is open ended. It gets larger every day, without limit, approaching infinity eventually. That is the point of the issue of stationarity. Global temperature data does not have a unique, unchanging value. The statistics that we calculate today will be different from what we calculate at any point in the future.

It doesn’t make sense to look at continuous intermediate values when the data isn’t stationary.

That is exactly my point!

Reply to  Tim Gorman
June 27, 2023 11:30 pm

While uncertainties “ADD”, on average, half of the uncertainties of whatever the 100 values are, are likely to be within -0.5 degC, and half are likely to be within +0.5 degC i.e., they do not add in absolute terms.

Given your scenario, the values would form a distribution from x + 0.5 to x - 0.5. So if x was say 25 degC, its average would still be 25 degC, and the uncertainty would be close to an average of +/- 0.5 degC.

I have no idea how this would fit with your algebra.

Bill

Reply to  Bill Johnston
June 28, 2023 6:38 am

No, the point about uncertainty that you are missing is that it includes factors that cannot be assumed to go away by averaging a lot of numbers.

Again, go read the GUM terminology.

Reply to  Bellman
June 27, 2023 10:42 am

I'm not going to argue. You are trying to conflate the measurement error from multiple measurements of the same thing with the same device with measurements of different, but similar, things.

They are two different things. Measurement uncertainty that is characterized by multiple measures of the same thing, provided the measures are randomly distributed, will allow the characterization of a mean value and a standard deviation of the distribution that will characterize the range of possible true values.

If you read section 4.2.1 of the GUM, you will see that it pointedly does not require multiple measurements of the same thing, only the same conditions of measurement. In other words, the same experiment at different times with as close as possible similar protocols.

Section 4.2.3 defines experimental standard deviation (uncertainty) of the mean calculations. Note 1 describes the use of the t-distribution to deal with the difference between s²(q_bar) and σ²(q_bar).

This is all discussed in NIST TN 1900 Example 2. It is what should be used to develop the experimental standard uncertainty of monthly temperatures.

Sections B.2.17 and B.2.18 adequately describe the use of the experimental standard deviation of the mean. Note 1 mentions using “multiples” of the experimental standard deviation of the mean as the measurement uncertainty value.

I have provided you with resources that you should read. I refuse to argue with you about your unsupported assertions. If you wish to discuss the references I have given, I suggest you contact NIST to lodge your complaints about their recommendation.
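For readers following along, here is a minimal R sketch in the style of the NIST TN 1900 Example 2 calculation described above; the daily values below are invented for illustration, not the example's actual data.

```r
# Monthly mean of daily readings with its experimental standard uncertainty,
# expanded by a Student-t coverage factor (values invented for illustration).
tmax <- c(25.1, 24.3, 26.0, 25.5, 23.8, 24.9, 26.2, 25.0, 24.4, 25.7, 25.3,
          24.1, 26.5, 25.8, 24.6, 25.2, 23.9, 25.4, 26.1, 24.8, 25.6, 24.2)
n <- length(tmax)
u <- sd(tmax) / sqrt(n)                   # experimental standard deviation of the mean
k <- qt(0.975, df = n - 1)                # coverage factor for about 95 % coverage
c(mean = mean(tmax), std_uncertainty = u, expanded_uncertainty = k * u)
```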

Reply to  Jim Gorman
June 27, 2023 11:49 am

It’s an article of faith with him. NIST is a heretic!

Reply to  Tim Gorman
June 27, 2023 12:37 pm

NIST is a heretic!

What makes you think that? I agree with most of the things I've seen from NIST. You're the one who insisted their method didn't work in the real world.

Reply to  Bellman
June 27, 2023 11:13 pm

While this discussion has gone well off the rails, I’m thinking that Tim is ignoring that uncertainty is 2-sided, not just a number with a + in front of it. As regression residuals are zero-centred, 50% of their ‘weight’ is negative relative to the line of best fit, and 50% is positive.

The attached is for Mildura (1890 to 2018). On the right are the linear trends, on the left are zero-centred residuals – that part of each signal not explained by trend.

It is unarguable that uncertainty of an average decreases (and the mean becomes steady) as sample size increases and I have done numerous preliminary experiments examining sample number in relation to stability and precision of an estimated mean, including using rating methods to determine feed on offer in field trials. While over-sampling is costly, under-sampling can be fatal.

I have another example based on temperature data for a paired site comparison at Townsville, shown as Figure 8 in https://www.bomwatch.com.au/wp-content/uploads/2023/06/Statistical_Tests_TownsvilleCaseStudy_03June23.pdf

All the best,

Bill

Mildura&Residuals.JPG
Reply to  Bill Johnston
June 28, 2023 6:39 am

I’m thinking that Tim is ignoring that uncertainty is 2-sided, not just a number with a + in front of it.

I can assure you that you are completely wrong here.

Reply to  Bill Johnston
June 28, 2023 7:28 am

"It is unarguable that uncertainty of an average decreases (and the mean becomes steady) as sample size increases and I have done numerous preliminary experiments examining sample number in relation to stability and precision of an estimated mean,"

Sorry, that just isn’t the case.

The uncertainty of an average decreases if you are measuring the same thing using the same device under the same conditions.

It's why racing engines running at 10,000 rpm for hours use parts that are machined to exacting tolerances, based on multiple measurements using devices calibrated over and over again against a gauge block, instead of, say, pistons purchased at the local auto store.

You can go down and buy eight off-the-shelf pistons, measure each one, and come up with an average. But that average may or may not coalesce to an accurate mean that is usable in a racing engine. It's because you are measuring different things that may not have even come from the same factory, let alone the same production run. Some may be too big, some too small, some may weigh too much, some may weigh too little. The small differences make a big difference at 10,000 rpm.

Why do you leave out the measurement uncertainty for Mildura? A regression line based solely on stated values may or may not represent reality.

Reply to  Tim Gorman
June 28, 2023 7:48 am

You can go down and buy eight off-the-shelf pistons, measure each one, and come up with an average. But that average may or may not coalesce to an accurate mean that is usable in a racing engine.

Or just skip the work and use the piston tolerance given by the manufacturer. Disaster city!

Reply to  Crispin in Val Quentin
June 27, 2023 5:28 am

The average of two values each of which has an uncertainty is always going to be larger than that of either of the components.

The trouble is you are making exactly the same mistake as Tim Gorman does, and I’ve been trying to explain to him why he’s wrong for over two years and it’s a futile exercise. So why would I think explaining to you why you are wrong will be any more productive?

The basics are explained well on Wikipedia.

I’ve been through every source thrown at me, and shown how they all agree that the uncertainty of the average is less than the uncertainty of the sum. It follows from the rules for propagating uncertainty when dividing, it follows from the general equation using partial differentials, it follows from understanding how random variables combine. None of this matters because Tim is always capable of misunderstanding and mangling the simplest equation.

Uncertainties are never reduced by repetition, because it is inherent in the measurement system.

Then you are disagreeing with every metrology source that recommends taking an average of several measurements to reduce uncertainty. You are even disagreeing with Tim who insists that the rules do apply if you are measuring the same thing multiple times.

All we can say with replication is it improves our calculation of where the middle of the uncertainty range lies

Yet somehow you think that hasn’t reduced the uncertainty, but increased it.

Further, the distribution of temperature measurements is not Normal, because radiation is to the 4th power of temperature K.

The distribution is irrelevant to Tim’s point which is entirely about measurement uncertainty. I’ve tried to explain that the bigger uncertainty is down to sampling, but I’m just shouted down.

Reply to  Bellman
June 27, 2023 7:51 am

"I've been through every source thrown at me, and shown how they all agree that the uncertainty of the average is less than the uncertainty of the sum."

That’s because you are calculating the average uncertainty and not the uncertainty of the average!

It is so simple a six year old can figure it out! If I take 10 boards of length x +/- 1 and try to span a distance of 10x will the beam make it across? You could wind up with a beam that is 10 units too short! The average uncertainty in this case is +/- 1 but the total uncertainty is +/- 10! The average value is x and the total would be 10x – maybe!

The exact same logic applies to temperatures.
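
For what it is worth, here is a minimal simulation of the 10-board example under one specific assumption (independent errors, uniform within ±1 unit for each board); it shows both readings of the numbers being argued over, the worst case and the statistical spread. It is a sketch of the arithmetic, not either commenter's position.

```r
# A minimal sketch of the 10-board example, assuming each board's error is
# independent and uniform within +/- 1 unit (an assumption for illustration).
set.seed(1)
totals <- replicate(100000, sum(runif(10, -1, 1)))  # total error of the 10-board span
range(totals)    # worst cases approach -10 and +10 units
sd(totals)       # statistical spread of the total, ~ sqrt(10/3) ~ 1.8 units
sd(totals) / 10  # spread of the average, ~ 0.18 units
```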

Reply to  Tim Gorman
June 27, 2023 8:50 am

“It is so simple a six year old can figure it out! If I take 10 boards of length x +/- 1 and try to span a distance of 10x will the beam make it across? ”

A six year old could also spot your misdirection. If you want to know the total length of 10 boards you are using the sum, not the average.

Reply to  Bellman
June 27, 2023 11:15 am

Uncertainty is *always* a sum. Even if you have a set of totally random measurements of the same thing using the same device under the same environmental conditions the total uncertainty is a SUM. A group of positive uncertainties added to an equal number and values of negative uncertainties = ZERO! (if there are no systematic biases) It’s why the average of a set of random, Gaussian measurements can be averaged to get a true value!

If you don’t get total cancellation then the uncertainty is not the average uncertainty but the sum of the uncertainties.

Why is this so hard to figure out?

Reply to  Tim Gorman
June 27, 2023 11:52 am

“Even if you have a set of totally random measurements of the same thing using the same device under the same environmental conditions the total uncertainty is a SUM. ”

So why would you ever take an average of multiple measurements? If all it's going to do is increase the uncertainty, what's the point?

“an equal number and values of negative uncertainties ”

What in the name of Taylor is a negative uncertainty?

I think this is why trying to ban the e-word only causes confusion.

“Why is this so hard to figure out?”

Why do you think asking these pointless rhetorical questions is helpful?

The answer is always going to be the same. It's hard to figure out because it's wrong or because you can't explain it properly, or both.

Reply to  Bellman
June 27, 2023 12:10 pm

"So why would you ever take an average of multiple measurements? If all it's going to do is increase the uncertainty, what's the point?"

That *IS* the point! If I grab a board out of the ditch at the corner of Main and Broad Street and a different one at the corner of Washington and Lincoln then what does their average tell me? I won't have a board of that value. I won't be able to build a dog house with the average value.

Why do you think the average of multiple different temperatures taken under different environmental conditions using different devices is of some value or that its uncertainty won't grow as you add more measurements of different things?

“What I’m the name of Taylor is a negative uncertainty?”

Typical. +/- actually means nothing to you I guess.

Reply to  Tim Gorman
June 27, 2023 1:51 pm

If I grab a board out of the ditch at the corner of Main and Broad Street and a different one at the corner of Washington and Lincoln then what does their average tell me?

You tell me. You're the one who keeps insisting on doing this sort of thing. I was talking about taking multiple measurements of the same thing to get a better estimate of the true value, whilst you insist this just makes the average less certain. And to be clear I said this in response to you saying

Even if you have a set of totally random measurements of the same thing using the same device under the same environmental conditions the total uncertainty is a SUM.

Why do you think the average of multiple different temperatures taken under different environmental conditions using different devices is of some value

Your question is why do I think it’s of some value to know or at least estimate how the global temperature is changing? Well, for one thing it keeps Monckton employed with his endless no warming since 3:30 last Tuesday posts, or his realityomiters.

Typical. +/- actually means nothing to you I guess.

± does not mean negative uncertainty. It represents an interval.

Reply to  Bellman
June 27, 2023 12:00 pm

Show where in the GUM, that averaging is done to calculate an uncertainty. Don’t fall back on the fact that µ is the mean and it is divided by a number. That does not imply that the uncertainty sum is also divided by the number of values when dealing with experimental standard deviations.

Look at the Section 4.4.3 example. Do you see the standard uncertainty of the mean being divided by the number of temperature items (20) to obtain an average uncertainty? It says:

The experimental standard deviation s(tₖ) calculated from Equation (4) is s(tₖ) = 1,489 °C ≈ 1,49 °C, and the experimental standard deviation of the mean s(t_bar) calculated from Equation (5), which is the standard uncertainty u(t_bar) of the mean t_bar is
u(t_bar) = s(t_bar) = s(tₖ )/√20 = 0,333 °C ≈ 0,33 °C.
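
The quoted numbers reproduce directly; here is just the arithmetic, in R (as used in the head post), with the values exactly as given in the GUM example.

```r
# Arithmetic of the GUM 4.4.3 example quoted above (values as given there).
s_tk   <- 1.489           # experimental standard deviation of the 20 readings, deg C
u_tbar <- s_tk / sqrt(20)
u_tbar                    # ~0.333 deg C, the standard uncertainty of the mean
```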

Reply to  Jim Gorman
June 27, 2023 1:09 pm

Show where in the GUM, that averaging is done to calculate an uncertainty.

That’s not something I’ve claimed. I’m just pointing out the deceit in claiming something about the uncertainty of an average and then trying to prove it using an example that is about adding.

All I'm saying is that if you are only interested in measurement uncertainty of an exact average, the rules laid out in Taylor, or in the GUM or anywhere else show that the uncertainty should decrease and certainly shouldn't increase as sample size increases. This, as we've gone through so many times before, is present in equation 10 of the GUM, or more simply can be derived from the standard rules laid out for propagating errors or uncertainty, in particular the one saying that when you divide two values the relative uncertainties add.

Don’t fall back on the fact that µ is the mean and it is divided by a number.

OK. I won't – mainly because I've no idea what you are talking about at this point. Why are you dividing the mean by a number?

That does not imply that the uncertainty sum is also divided by the number of values when dealing with experimental standard deviations.

I’m still not sure what point you are making here, but if you want an “experimental standard deviation”, which I think is just the same as a sample standard deviation, then you do have to divide the sum of the squares of the deviations by N – 1, to get the variance, and then take the square root to get the experimental standard deviation.

Do you see the standard uncertainty of the mean being divided by the number of temperature items (20) to obtain an average uncertainty?

No. Why should you? The standard uncertainty of the mean is what we want. No need to divide it by anything.

All that example is doing is what I keep saying – take a number of measurements to get an average. Calculate the sample standard deviation of the results to get the uncertainty of an individual measurement and divide by root N to get the standard uncertainty of the mean. If you wanted to know the uncertainty of the sum you would multiply by root N rather than divide.

At no point do they suggest you use the uncertainty of the sum of the 20 measurements as the uncertainty of the mean.

Reply to  Bellman
June 27, 2023 7:54 pm

All I’m saying is that if you are only interested in measurement uncertainty of an exact average, the rules laid out in Taylor, or in the GUM or anywhere else show that the uncertainty should decrease and certainly shouldn’t increase as sample size increases. 

None of those references say that. You have significantly misinterpreted them. Look closely at both those references. The uncertainty only reduces (cancels) when there are multiple measurements of the same thing, with the same device, and when the “errors” form a Gaussian distribution.

When you only have single readings of different but similar things you are dealing with experimental uncertainty. The readings you collect form a distribution from which a mean and an experimental uncertainty can be calculated. DO NOT MIX THE TWO SCENARIOS.

From the GUM

3.2.2

NOTE 1 The experimental standard deviation of the arithmetic mean or average of a series of observations (see 4.2.3) is not the random error of the mean, although it is so designated in some publications. It is instead a measure of the uncertainty of the mean due to random effects. The exact value of the error in the mean arising from these effects cannot be known.

4.2.2

The individual observations qk differ in value because of random variations in the influence quantities, or random effects (see 3.2.2). The experimental variance of the observations, which estimates the variance σ^2 of the probability distribution of q

The daily Tmax observations for each day of the month are essentially 30 days' worth of experiments. THE DATA VARIANCE of those observations determines the uncertainty of the mean that is calculated from those observations. You make the same mistake as many. A measurement requires the definition of a measurand and the data distribution that is used to calculate the mean and standard deviation for that measurand. You can't define the mean as an average of daily observations and then use a different set of variances to calculate the uncertainty of that mean.

Reply to  Jim Gorman
June 28, 2023 6:45 am

You can’t define the mean as an average of daily observations and then use a different set of variances to calculate the uncertainty of that mean.

None of the trendologists will ever acknowledge this, it would collapse their entire act. Their milli-Kelvin “uncertainty” numbers disappear in a puff of greasy green smoke. (And my boundaries for the set of trendologists has expanded while reading this set of comments.)

Reply to  Jim Gorman
June 29, 2023 8:26 am

Sorry, I missed this comment in all the noise. It makes an interesting point.

"DO NOT MIX THE TWO SCENARIOS."

The thing is, I think I’ve tried my best to keep all the scenarios separate. The problem is whichever scenario I describe someone will jump in to confuse it with another scenario.

Let me lay out some of the scenarios I’ve been interested in:

Scenario 1
The exact average of multiple things.

This isn’t that common, but it was the scenario that started all this. I.e. the average of 100 thermometers, each with a random independent uncertainty of ±0.5°C. If you only want the exact average, the only uncertainty is from the measurement errors, and if all uncertainties are the same this becomes u / √N.

Scenario 2
Measuring the same thing multiple times, with known measurement uncertainty.

The purpose in making multiple measurements is to get an average that will be a more certain estimate of the true measurand than any individual measurement. Uncertainty is u / √N.

Scenario 3
Measuring the same thing multiple times when uncertainty is not known

Same as scenario 2, but rather than using a known or assumed uncertainty, you use the distributions of measurements to estimate the uncertainty. That is use the standard deviation as the measurement uncertainty. So uncertainty of the mean is SD / √N.

Scenario 4
Average of different things to estimate the population mean.

In this case the uncertainty mainly comes from the randomness of the sample. This comes either from taking random elements from a population, or because each measurement comes from a random distribution. (This is the NIST example for a monthly average).

The method is the same as for scenario 3; it's just that the standard deviation now comes from the spread of different values rather than from measurement errors. SD / √N.

All of these uncertainties assume random, independent samples / measurement errors. Lack of independence can increase the uncertainty, but in no normal case is the uncertainty of the mean greater than the individual uncertainties. (A small simulation of these scenarios is sketched below.)
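
Here is a small simulation sketch of the two basic cases (scenarios 2/3 versus scenario 4), under the stated assumption of independent random errors and samples; the numbers used (u = 0.5 °C per reading, population SD = 5 °C) are illustrative only, not anyone's data.

```r
# A minimal simulation sketch of the scenarios above; all numbers are illustrative.
set.seed(1)
N    <- 100
u    <- 0.5      # assumed measurement uncertainty of a single reading, deg C
reps <- 10000

# Scenarios 2/3: one true value measured N times with random error of SD u
means_same <- replicate(reps, mean(20 + rnorm(N, 0, u)))
sd(means_same)   # ~ u / sqrt(N) = 0.05

# Scenario 4: N different things drawn from a population with SD 5, each measured once
means_diff <- replicate(reps, mean(rnorm(N, 20, 5) + rnorm(N, 0, u)))
sd(means_diff)   # ~ sqrt(5^2 + u^2) / sqrt(N), about 0.5
```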

Reply to  Bellman
June 29, 2023 1:27 pm

Just another load of sophistry from one of the gurus of the 1/root-N cult.

Reply to  karlomonte
June 29, 2023 1:38 pm

I take it you didn’t agree. Now if only you could figure out why you don’t agree.

Reply to  Bellman
June 29, 2023 3:19 pm

NUTS

Reply to  karlomonte
June 29, 2023 6:20 pm

I knew there had to be a reason.

Reply to  Bellman
June 29, 2023 8:41 pm

I knew

Liar.

Reply to  Bellman
June 28, 2023 6:31 am

You are even disagreeing with Tim who insists that the rules do apply if you are measuring the same thing multiple times.

But you are not measuring the same thing. You are measuring a defined measurand under similar conditions. IOW, they are not the same thing. Daily temperatures, products of several trials of a chemical reaction, the lengths of multiple units made by cutting, etc. all involve different but similar things, hopefully under similar conditions.

In those cases, you must find the variation amongst the multiple measurements. The result is not necessarily the true value; it is simply the mean of several (or many) experiments.

Reply to  Jim Gorman
June 28, 2023 11:45 am

“But you are not measuring the same thing.”

I was responding to the sentence

“Uncertainties are never reduced by repetition, because it is inherent in the measurement system.”

I took the word “never” to mean under all circumstances. That is including measuring the same thing.

bdgwx
Reply to  Crispin in Val Quentin
June 27, 2023 5:58 am

The average of two values each of which has an uncertainty is always going to be larger than that of either of the components.

NIST and JCGM disagree. In fact, you can prove this statement is wrong very quickly with the NIST uncertainty machine. Use two input quantities x0 and x1 with u(x0) = u(x1) = 2 and enter the measurement model y = (x0+x1)/2 and observe the result of u(y) = 1.4.

Mainly the uncertainty is rooted in the accuracy of the instruments (not their precision).

In that case we say the systematic effect is larger than the random effect. When this happens r(x0, x1) > 0.5 because each measurement will have the same systematic bias.

If there are 5 calculation steps to create an output there are 5 increases in the resulting uncertainty.

That’s not correct. It depends on how the partial derivatives of each step play out and the correlation between inputs. For example, the measurement model y = a – b where r(a, b) = 0.75 and u(a) = u(b) = 2 results in u(y) = 1.4. This is an example where a decrease in uncertainty occurs.

Uncertainties are never reduced by repetition

That’s true for individual measurements. But, it is incorrect for an aggregate of measurements like an average. For example, if the measurement model is y = Σ[x_i, 1, N] / N and u(x) = u(x_i) for all x_i then u(y) = u(x) / sqrt(N) when r(x_i, x_j) = 0.

Knowing that to 5 decimal places doesn’t reduce the uncertainty that accompanies each measurement.

But it does reduce the uncertainty of an average of those measurements.

Further, the distribution of temperature measurements is not Normal

The procedures for propagation of uncertainty do not require a normal distribution.

I encourage you to read through JCGM 100:2008 and play around with NIST uncertainty machine.
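
For anyone who wants to check those numbers without the online tool, the same results fall out of the GUM propagation formulas; the following is my own arithmetic sketch, not output from the NIST uncertainty machine itself.

```r
# Checking the three numeric claims above with the GUM propagation formulas.

# y = (x0 + x1)/2 with u(x0) = u(x1) = 2 and r = 0: sensitivities are 1/2 each
sqrt((0.5 * 2)^2 + (0.5 * 2)^2)      # 1.414..., i.e. the quoted u(y) = 1.4

# y = a - b with u(a) = u(b) = 2 and r(a, b) = 0.75 (GUM eq. 16, sensitivities +1 and -1)
sqrt(2^2 + 2^2 - 2 * 0.75 * 2 * 2)   # 1.414..., again the quoted u(y) = 1.4

# y = mean of N uncorrelated x_i with u(x) = 2 each
N <- 100
2 / sqrt(N)                          # 0.2, i.e. u(x) / sqrt(N)
```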

Reply to  bdgwx
June 27, 2023 6:21 am

Not this nonsense, again. Give it up.

Reply to  karlomonte
June 28, 2023 12:07 am

Well said, I’m almost on your side.

b.

Reply to  Bill Johnston
June 28, 2023 5:54 am

Arguing with the trendologists is pointless, and bg-whatever luvs to post this “play around with the NIST uncertainty machine” again and again and again and again. And yes, it is nonsense.

bdgwx
Reply to  Bill Johnston
June 28, 2023 8:51 am

Bill Johnston: Well said, I’m almost on your side.

You take the side of random WUWT comments over JCGM, NIST, ISO, UKAS, etc?

bdgwx
June 26, 2023 12:16 pm

Since the one-sigma (1σ) uncertainties basically touch each other, we cannot say that the two trends are statistically different.

While I agree with the statement, I'm not sure I agree with the method. To be honest I'm not sure I understand the method, so I'm not challenging it here. Though if the method was assuming an uncorrelated difference as described earlier in the blog post, then there may be an issue. The way I handle this is to use the AR(1) method or ARMA method, similar to how Foster & Rahmstorf 2011 describe. ARMA is probably more representative of the true uncertainty, but I've found with global temperature time series that it is not significantly different from the AR(1) method, at least over a sufficient period of time, but it does add a substantial amount of complexity. As a result I typically only compute the AR(1) corrections to the uncertainty in my own workflow. Both BEST and UAH exhibit about ±0.05 C/decade of AR(1) uncertainty. That is 2σ and so the +0.13 C/decade from UAH and +0.19 C/decade value from BEST are consistent with each other within the uncertainty envelope anyway. I don't have access to my workflow right now, but when I get time I can report the actual values.
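
For readers unfamiliar with the AR(1) correction, here is a minimal R sketch of one common version of it (inflating the OLS standard error by an effective-sample-size factor). Whether this matches bdgwx's exact workflow is an assumption on my part, and `tser` is a placeholder for a monthly anomaly series.

```r
# A minimal sketch of an AR(1)-adjusted trend uncertainty; `tser` is assumed to be
# a numeric vector of monthly anomalies. One common recipe, not necessarily the
# exact workflow described above.
ar1_trend <- function(tser) {
  t      <- seq_along(tser) / 12                  # time in years
  fit    <- lm(tser ~ t)
  se     <- summary(fit)$coefficients["t", "Std. Error"]
  phi    <- acf(resid(fit), plot = FALSE)$acf[2]  # lag-1 autocorrelation of residuals
  se_adj <- se * sqrt((1 + phi) / (1 - phi))      # inflate SE for the reduced effective N
  c(trend_per_decade = unname(coef(fit)["t"]) * 10,
    ci95_per_decade  = 2 * se_adj * 10)
}
```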

Reply to  bdgwx
June 27, 2023 4:26 am

"That is 2σ and so the +0.13 C/decade from UAH and +0.19 C/decade value from BEST are consistent with each other within the uncertainty envelope anyway. I don't have access to my workflow right now, but when I get time I can report the actual values."

They are consistent in assuming stated values are 100% accurate with no uncertainty. If you use the real world, there is no way to define whether the trends are up, down, or sideways, let alone out to the hundredths digit! You are making the same assumptions that Possolo did in TN1900 – no uncertainty in the data values and therefore the variation in the stated values is the "uncertainty" of the average. In other words, all uncertainty is random, Gaussian, and cancels. All that data has different systematic biases associated with each value. Bevington, Taylor, and Possolo all state that when you have systematic biases the data is not amenable to statistical analysis. So what does climate science, and YOU, do? You just assume that it all goes away somehow. That is "science"?

bdgwx
Reply to  bdgwx
June 27, 2023 7:20 am

I don’t have access to my workflow right now, but when I get time I can report the actual values.

From 1979/01 to 2022/12 the trend and AR(1) uncertainty is as follows.

BEST is +0.19 ± 0.04 C/decade
UAH is +0.13 ± 0.05 C/decade

So BEST and UAH have an overlap between 0.15 and 0.18 C/decade.

I should probably note that BEST and UAH are not measuring the same thing so a comparison of this nature should be considered carefully.

June 26, 2023 12:49 pm

How does one calculate the uncertainty of an individual thermometer reading for the huge area it is supposed to represent?

It’s one thing to say someone’s eyeball is good to +- half or quarter of a degree, or such and such an instrument has such and such an accuracy and such and such a precision, but what about how well any one particular reading represents the thousands of sq kilometers it is meant to?

Reply to  PCman999
June 27, 2023 4:29 am

It doesn't. It's why homogenization is a joke, an absolute joke. The temperature at any given location is dependent on so many independent variables that it is impossible to impute the value at one location to another. Wind, pressure, humidity, cloud cover, elevation, geography, and terrain are just some of the variables involved. And not one single homogenization study I've seen considers even one of them, let alone all of them.

June 26, 2023 1:07 pm

Willis, the uncertainty you calculate must be about right. Looking at the distribution of trends from the UAH website (see the attached map), they range from -0.15 to +0.45 C/decade. Although the trend is geographically increasing from South to North, the distribution of values around the average of about 0.15 C/decade looks pretty symmetric. Assuming it is Gaussian, the sigma would be, given this wide 0.6 range, around 0.15 C/decade. Much smaller than that would be difficult to imagine.

Trend_to_201812.PNG
Reply to  ad huijser
June 26, 2023 3:23 pm

Looks like the heater has been shifted northward.

The Dark Lord
June 26, 2023 1:33 pm

the data is not fit for purpose … it is a waste of time to try and calculate the uncertainty of values that are corrupted, manipulated and basically garbage …

Nick Stokes
June 26, 2023 1:35 pm

Willis,
I think the issue here is distinguishing between variability and uncertainty. You say that the residuals with seasonal variation give much more uncertainty in the trend than just the residuals. But as bdgwx says, seasonal variation is not uncertain. It happens every year. You are adding variation.

If you calculate the trend of sin(x), R will give you a trend (which depends on whether there are fractions of a period) and a σ. But the σ is not an uncertainty. There is no randomness. Nothing about sin(x) is uncertain; you could calculate σ analytically.

It only makes sense to identify σ  as uncertainty if you have taken out all aspects that could have been predicted, which is why you should only think about this when you have taken out seasonality.
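
Nick's sin(x) point is easy to reproduce; the following is a minimal sketch of it, nothing more.

```r
# lm() reports a slope and a "Std. Error" for a purely deterministic series,
# even though nothing here is random; the sigma is variability, not uncertainty.
x   <- seq(0, 10.5 * 2 * pi, length.out = 500)  # 10.5 periods, so end effects give a nonzero slope
fit <- lm(sin(x) ~ x)
summary(fit)$coefficients["x", ]                # an Estimate and a Std. Error appear regardless
```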

Nick Stokes
Reply to  Nick Stokes
June 26, 2023 5:56 pm

The same issue of whether you can interpret variability is simpler to see with means. I looked at the daily maxima of Mildura, Victoria. The mean is 23.9°C, and the standard deviation is 7.4°C.
But if I deseasonalise by subtracting the mean for each day of the year, I get a standard deviation of 4.3°C.
So if I fly into Mildura, how uncertain am I about the max temp that day?
If I have no idea what time of the year it is, 7.4 would be a good estimate.
But if I do, then 4.3.
If I know the temperature yesterday, it could be lower.
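
For concreteness, the deseasonalising step Nick describes looks like this in R; the data frame `mildura` (with columns `doy` and `tmax`) is hypothetical here, standing in for the actual record.

```r
# Deseasonalise by subtracting the mean for each day of the year; `mildura` is a
# hypothetical data frame with columns `doy` (day of year) and `tmax`.
sd(mildura$tmax)                                     # raw spread (7.4 C in Nick's figures)
clim <- ave(mildura$tmax, mildura$doy, FUN = mean)   # climatological mean for each day of year
anom <- mildura$tmax - clim                          # deseasonalised values
sd(anom)                                             # smaller spread (4.3 C in Nick's figures)
```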

Mr.
Reply to  Nick Stokes
June 26, 2023 8:53 pm

I spent a week in Mildura one Thursday.

In the airconditioned bar I was holed up in, everyone was speculating how hot it must have been standing in the 400-metres long queue into Centrelink, where most of the local population seemed to be.

(Centrelink being the government unemployment welfare registry)

Reply to  Nick Stokes
June 26, 2023 9:09 pm

Nick,
So your voodoo mathematics know better whether or not a cold front with sleet is going to hit Mildura next day?
You keep failing to include all uncertainty factors in your estimates of uncertainty, including some big ones that are hard to measure.
Similar to the stats work going on with ocean level change. People are leaving out the potentially big factor of the ocean walls and floors changing (we know that they do) by assuming that their motion has no effect on the level measurements. Thus, one can argue for two estimates of uncertainty: one by forgetting geological observations of movement, and the other by saying it is too hard to measure, so we can ignore it for the time being and hope that Nature will not one day bite our bums with a big earthquake-type basin movement.
While on sea level, do you have an answer to the allegation that the satellite distance measurement has an accuracy of mm to cm, which some say is larger than the change being sought?
From NASA:
Satellite altimeters can measure the height of the ocean from space to centimeter or millimeter accuracy when averaged over the globe. Both measurement methods capture regional trends in sea-level rise, and tide gauges also can provide an approximation of global trends, helping to calibrate satellite measurements. Geoff S
https://sealevel.nasa.gov/faq/21/which-are-more-accurate-in-measuring-sea-level-rise-tide-gauges-or-satellites/

Reply to  Geoff Sherrington
June 27, 2023 4:31 am

You didn’t really expect a reply did you?

Reply to  Geoff Sherrington
June 27, 2023 12:53 pm

People are leaving out the potentially big factor of the ocean walls and floors changing (we know that they do) by assuming that their motion has no effect on the level measurements.

Indeed, the rates of tectonic plate motion and isostatic rebound are on average about an order of magnitude faster than the claimed average rise in sea level.

Reply to  Nick Stokes
June 27, 2023 11:19 am

The mean is 23.9°C, and the standard deviation is 7.4°C

What exactly is the distribution from which the mean is calculated? All 30 days in a month?

But if I deseasonalise by subtracting the mean for each day of the year, I get a standard deviation of 4.3°C

Again, exactly what distribution was used to calculate “the mean for each day of the year”? 30 years of the same day in the year?

Nick Stokes
Reply to  Jim Gorman
June 28, 2023 6:07 pm

"All 30 days in a month?"

All 365 days in the year.
You don’t calculate the mean from a distribution. You add up and divide by 365.

"30 years of the same day in the year?"
Actually the same day in each year of the record – so 71. Again no distribution is involved.

Reply to  Nick Stokes
June 28, 2023 6:18 pm

Nitpick Nick Strikes Again!

Reply to  Nick Stokes
June 27, 2023 4:32 am

"But as bdgwx says, seasonal variation is not uncertain."

So winters in some years are not colder than other years? So summers in some years are not hotter than other years?

Seasons happen, that’s for sure. But how they vary is *very* uncertain!

Reply to  Tim Gorman
June 27, 2023 5:15 am

So winters in some years are not colder than other years? So summers in some years are not hotter than other years?

Yes, which is why you look at the relative temperatures for that time of year. What’s not uncertain is that summers will generally be hotter than winters. The variation between summer and winter is not random variation.

Reply to  Bellman
June 27, 2023 7:42 am

Your lack of knowledge of the real world is showing again.

The growth in the population of swans in Denmark is correlated with the births of new babies in Denmark. Can I simply average the two values and decrease the uncertainty of the average by subtracting the correlation?

But wait, you say, you can’t average those two things, they are different things!

Well, the temperature in Denver is a different thing from the temperature in Miami. Why wouldn’t you make the same objection – they are different things so you can’t average them and reduce the uncertainty by the seasonal correlation factor?

Reply to  Tim Gorman
June 27, 2023 8:45 am

In my world summers are hotter than winters – it is not a random effect. Nothing in your rambling comment has any relevance to that point.

Reply to  Bellman
June 27, 2023 11:17 am

The VARIANCE of the temperatures in summer is different from the variance of the temperatures in winter. Why do you always want to ignore that fact?

And why didn’t you address the Denmark example? An inconvenient fact perhaps?

Reply to  Tim Gorman
June 27, 2023 12:05 pm

This style of arguing is getting so tedious. Tim will make a statement. I'll correct it. At which point we jump to a completely different argument, and I'll be accused of ignoring the new argument, regardless of whether it had any relevance to the original point.

Then for good measure he throws in some absurd irrelevance, such as his current obsession with Danish swans, and claims some sort of victory because I haven't gotten round to trying to figure out what point he's making this time.

Reply to  Bellman
June 27, 2023 12:58 pm

Malarkey! You never actually address anything. You provide no references. You provide nothing actually showing where I am wrong. You just invoke the argumentative fallacy of Argument by Dismissal.

You can’t even address the correlation between the swans and human births so you just dismiss it by calling it an irrelevance and an obsession. Argument by Dismissal.

Reply to  Tim Gorman
June 27, 2023 1:36 pm

You just invoke the argumentative fallacy of Argument by Dismissal.

Says someone dismissing my argument as Malarkey!

My argument was

What’s not uncertain is that summers will generally be hotter than winters. The variation between summer and winter is not random variation.

I really didn't think I needed a reference for that, but if you do, try reading the head post – there are graphs and everything.

You can’t even address the correlation between the swans and human births so you just dismiss it by calling it an irrelevance and an obsession.

I'm not addressing it here because you have given no reason as to its relevance here. Just what point do you think you are making, and how do you want me to address it? I've said I'll accept it's correct, but I assume it's just a spurious correlation. Do you really think it has the same correlation as summers being hotter than winters?

Still, just for the record could you provide the data or reference that shows a correlation between Danish swan population and birth rates?

Reply to  Bellman
June 27, 2023 12:57 pm

Yes, in general, Summer temperatures are higher than Winter temperatures for the same locality. However, the question is, “How much higher?” How well can one predict what the difference will be next season?

Reply to  Clyde Spencer
June 27, 2023 1:39 pm

Well according to the head post there’s a difference of about 2°C in the UAH data.


Reply to  Bellman
June 27, 2023 8:55 pm

And you think that 2°C difference is good for an El Niño year?

Rick C
June 26, 2023 2:07 pm

Willis: I did take quite a few statistics courses. My main observation on your analysis is that you show the claimed uncertainties in the whisker plots are at 95% confidence. But then in your analysis you use the 1-Sigma values to show the uncertainty range. To be consistent you should use the 95% confidence 2-Sigma range. That would result in the BE and UAH uncertainties not just touching but overlapping considerably.

Rud Istvan
June 26, 2023 3:15 pm

Good post, WE. You may be self taught, but apparently also well taught.

My personal climate uncertainty favorite comes from NASA concerning their sea level rise satellite measurements. They claim mm SLR precision when the technical manuals for Jason 3 and Sentinel 6 (the two most recent satellites for SLR) say the inherent accuracy is about 3.8 and 3.5 cm respectively. That imprecision cannot be fixed statistically by repeated measurement as they claim. Covered the details in three guest posts here about both ‘birds’ a while ago.

June 26, 2023 3:45 pm

“Reports that say that something hasn’t happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns—the ones we don’t know we don’t know. And if one looks throughout the history of our country and other free countries, it is the latter category that tends to be the difficult ones.”
— Donald Rumsfeld, February 12, 2002

June 26, 2023 3:52 pm

Dear Willis,
 
I am confused regarding your thoughts about uncertainty.
 
Firstly, because UAH and Berkeley don't actually measure temperature in a conventional sense, but provide modeled estimates, one cannot actually know the uncertainty surrounding each observation.
 
Here is an example: while according to UAH temperature is increasing across Australia, none of the hundreds of individual weather station datasets that I have examined as part of my http://www.bomwatch.com.au work show any discernible trend. So, troposphere temperature is said to be increasing, whereas temperature at the surface, which is where people live, is not, which is weird.
 
Secondly, the uncertainty of comparing two observations has been and still is the sum of the absolute uncertainties of each. You suggest it is the geometric mean (GM) of the two (Sqrt(u1^2 + u2^2)). I can't find a reference to that and while the number may be the same as (|u1| + |u2|), why confuse the issue by calculating the GM?
 
Thirdly, for linear regression (having removed the seasonal signal, which as a cycle has no uncertainty), you may be confused between the uncertainty of the location of the line about the points which is the 95% confidence interval (CI) for the line; and the 95% prediction interval – the uncertainty associated with predicting a value of y based on a new value of x (y is dependent and x is always the predictor). [I don’t know why you toss 1-sigma (one SD of what? – two times (1.96) times the standard error = 68% uncertainty?) into the mix when you can calculate and plot the 95% CI for the line]. Why do these uncertainties increase with time, while the technology of estimating temperature should have improved? 
 
New values are always predicted with a wider PI than suggested by the CI value. However, statistical tests are widely misused. Many people in their eagerness to show a good estimate, use the CI instead of the PI for that purpose or the wrong test. Most people using linear regression also fail to examine regression residuals. An additional point is that the further the distance from the centroid of the data, the more uncertain becomes a prediction.     
 
Finally, in comparing two regression lines the intercepts may be the same, but the slopes may be different. Also, the slopes may be the same, but the intercepts may differ. While easily resolved statistically, I don’t think it can be done graphically as you have depicted. Note that due to the way anomalies are calculated relative to a ‘reference’, intercepts are probably constrained to be the same. I am also not sure whether satellite data are originally calculated as anomalies relative to the same reference period and the cycle is added later, or whether they are originally calculated with the cycle embedded.     
 
I am personally quite dubious about grabbing off-the-shelf data without understanding fully what they mean, and for satellite data I frankly have no idea what they mean or how 'accurate' they are.

While I am quite comfortable with temperature data measured in Stevenson screens, satellite data could be just a bunch of numbers joined together, for example. It would be useful for somebody to provide a mum-and-dad rundown on how a particular SAT-T value is derived.

 
Yours sincerely,

 
Dr Bill Johnston

http://www.bomwatch.com.au

Reply to  Bill Johnston
June 26, 2023 7:20 pm

Secondly, the uncertainty of comparing two observations has been and still is the sum of the absolute uncertainties of each. You suggest it is the geometric mean (GM) of the two (Sqrt(u1^2 + u2^2)). I can't find a reference to that and while the number may be the same as (|u1| + |u2|), why confuse the issue by calculating the GM?

Your point about "observations" is important. The Guide to the Expression of Uncertainty in Measurement (JCGM 100:2008) and other metrology textbooks allow a geometric mean under certain circumstances, such as when random errors of the individual measurements used to calculate a measurand cancel. In other words, with PV = nRT you can have too high a temperature and too low a mole measurement, so that there is some cancellation.

A mean of a distribution however, is a statistical calculation. It is not a functional relationship where random errors in measurements can cancel. It is more likely that a proper measure of uncertainty is related to the variance of the distribution. NIST TN 1900, Example 2 explains this very well.

bdgwx
Reply to  Bill Johnston
June 26, 2023 7:42 pm

Secondly, the uncertainty of comparing two observations has been and still is the sum of the absolute uncertainties of each.

That’s not right. When y = f(k, j) = k – j and when r(k, j) = 0 then u(y) = sqrt[u(k)^2 + u(j)^2].

You suggest it is the geometric mean (GM) of the two (Sqrt(u1^2 + u2^2)).

Willis is correct when r(k, j) = 0.

 I can’t find a reference to that

JCGM 100:2008 equation 10 for r(k, j) = 0 or equation 16 for r(k, j) > 0.

The correlation between k and j is important. For example, when r(k, j) = 0.5 equation 16 reduces to u(y) = sqrt[u(k)^2 + u(j)^2 – u(k)*u(j)]. In other words the combined uncertainty u(y) is less than the geometric mean.

Reply to  bdgwx
June 26, 2023 9:28 pm

But the uncertainty of an observation (or an estimate) is never zero. It is by definition 1/2 the interval range.

Cheers,

Bill

bdgwx
Reply to  Bill Johnston
June 27, 2023 5:35 am

It doesn’t matter if the uncertainty is zero or not. The formulas work out all the same. Anyway, for observations the uncertainty is a bit more complicated than just 1/2 the interval range.

Reply to  Bill Johnston
June 27, 2023 6:28 am

Plus a whole lot of other factors…

Reply to  bdgwx
June 27, 2023 4:38 am

"That's not right. When y = f(k, j) = k – j and when r(k, j) = 0 then u(y) = sqrt[u(k)^2 + u(j)^2]."

Nit picker. The difference is whether you sum them directly or you sum them in quadrature. Summing vectors is still “summing”!

” In other words the combined uncertainty u(y) is less than the geometric mean.”

Not if the – u(k)*u(j) term is equal to zero. When measuring different things with different devices you simply cannot assume the measurements are correlated.

steve_showmethedata
Reply to  Bill Johnston
June 29, 2023 1:04 am

“Thirdly, for linear regression (having removed the seasonal signal, which as a cycle has no uncertainty), you may be confused between the uncertainty of the location of the line about the points which is the 95% confidence interval (CI) for the line; and the 95% prediction interval – the uncertainty associated with predicting a value of y based on a new value of x (y is dependent and x is always the predictor).
Willis, by using only the residual error variance and a t-statistic to calculate a CI about the regression line for a set of X values, is not using either the correct CI for the fitted regression line or the PI for a new observation. In the context he is applying, it is the CI of the fitted regression line that should be used. Willis states about Figure 6 "Here are those two datasets, with their associated trends and the uncertainties (one standard deviation, also known as one-sigma (1σ) uncertainties) incorrectly calculated via linear regression of the data with the seasonal uncertainties removed." and further on "However, when we calculate the uncertainties correctly, we get a very different picture". "Incorrect vs correct", with no proof of why he is correct and standard methods are wrong! Given how he is applying regression methods in the inferences about uncertainty, which should involve only estimation error and not sampling error as well (see below), he is incorrect in using only the residual error variance and a t-statistic to calculate a CI about the regression line.

The regression is given by Y = Xb + e where the usual conditions apply: independent, constant-variance errors e with variance sig2, and an n-size sample, so that Y is a vector of length n and, for simple linear regression, X is an n x 2 matrix (the first column a vector of 1's). Then (1) the estimate of the prediction error about the fitted line conditional on x uses var_hat(Y_hat) = sig2_hat.x' Inv(X'X) x, where Y_hat = x'b_hat, so that a 95% CL can be calculated as Y_hat +/- t(n-2).sqrt{var_hat(Y_hat)}.

For a new observation, denoted as sample value n+1 from the same population, the prediction is Y_hat(n+1) = x(n+1)'b_hat + e_hat(n+1). Therefore (2) we have Var_hat{Y_hat(n+1)} = sig2_hat.x(n+1)' Inv(X'X) x(n+1) + sig2_hat, with the first term due to estimation error, i.e. Var(b_hat), and the second term the variance of the random error e(n+1) (all conditioned on X and x). If the errors are not independent or not constant-variance, we can use GLS estimation by replacing X'X with X' inv(V) X, where V may be defined as Q + R (where Q = Z'DZ is defined using LMMs combined with different variance structures expressed by R, such as AR(1) etc.).

So neither (1) nor (2) corresponds to using only sig2_hat combined with a t-statistic with appropriate DFs, but (1) is the correct method for the inference he is making. Dealing with seasonality is straightforward. We can fit dummy variables as part of X for each month, then fix predictions to one month in particular, or fix to the average of the monthly parameters and include the estimation error of the associated parameter estimates within var_hat(Y_hat). Or we could fit a sinusoidal trend or cyclical smoothing spline for the seasonal trend and again fix these terms to hold them constant. That is assuming the seasonal term does not interact with the X variable of interest.

Willis is calculating the confidence interval incorrectly in the same way as one saying the standard error of a mean is only the square root of the error variance, without dividing by the square root of the sample size. The above is well established in any textbook on regression analysis, but apparently Willis knows better!
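
The CI-versus-PI distinction drawn above is easy to see with R's predict(); the following is a minimal sketch on simulated data (my own illustration, not steve's code).

```r
# CI of the fitted line (estimation error only) versus PI for a new observation
# (estimation error plus residual variance), on simulated data.
set.seed(1)
x   <- 1:100
y   <- 0.02 * x + rnorm(100, sd = 0.5)
fit <- lm(y ~ x)
new <- data.frame(x = 50)
predict(fit, newdata = new, interval = "confidence")   # narrow: uncertainty of the line itself
predict(fit, newdata = new, interval = "prediction")   # wide: adds the residual error variance
```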
A further common mistake, which Willis has also fallen into, is to carry out a graphical statistical test of the difference between two or more regression lines (with the null hypothesis that the difference in expectations of the underlying true lines is zero, conditional on a given value of x) using the overlap (accept H0) or separation (reject H0) of the lower confidence interval limit for the upper regression line in a paired comparison with the upper limit for the lower regression line. This overlap/no-overlap test is too conservative, substantially falling short of the nominal rejection rate. For the same reason, stats packages that are good at analysing experimental designs (like GenStat, my favourite) present a single standard error of the difference for comparing means, which should be used in such comparisons when combined with the appropriate t-statistic, rather than comparing overlap in confidence bounds (see the article below).
A brief note on overlapping confidence intervals. J Vasc Surg. 2002 Jul;36(1):194-5. doi: 10.1067/mva.2002.125015.
Clinical researchers frequently assess the statistical significance of the difference between two means by examining whether the two 95% confidence intervals overlap. The purpose of this brief communication is to illustrate that the 95% confidence intervals for two means can overlap and yet the two means can be statistically significantly different from one another at the alpha = 0.05 level.

Bill also commented: "Finally, in comparing two regression lines the intercepts may be the same, but the slopes may be different. Also, the slopes may be the same, but the intercepts may differ. While easily resolved statistically, I don't think it can be done graphically as you have depicted."
Agreed that a formal test that the regression lines are identical can be carried out using the extra sums of squares principle and F-tests for normally distributed residuals, or extra deviance for GLMs and an approximate test using scaled deviances and chi-square statistics, when the maximal model has all parameters group-specific vs the minimal model with all parameters common. However, I developed a graphical (i.e. visual) test that generalises the SED approach to regression or curve fitting in general, including regression smoothing splines (see http://dx.doi.org/10.1016/j.fishres.2014.05.002 and DOI: 10.1002/aqc.2373, with more detail given in the former paper). I can send that paper to anyone interested; just request it on ResearchGate: https://www.researchgate.net/publication/263092719_A_nonparametric_model_of_empirical_length_distributions_to_inform_stratification_of_fishing_effort_for_integrated_assessments

Reply to  Willis Eschenbach
June 30, 2023 7:31 am

“The uncertainty of the seasonality trend is ≈ 0.”

Why do you think that? If you are just plugging the data into the standard uncertainty function, it should show a lot of uncertainty, given the size of the seasonal variation.

Of course it isn’t correct to just plug them in because they are not IID, but that’s the problem with using the seasonal data in the first place.

steve_showmethedata
Reply to  Willis Eschenbach
July 4, 2023 6:23 am

Willis. You say “As a result, the uncertainty of the residual trend has to be ~ equal to the uncertainty of the raw data.
I see nothing in your comment that refutes that.”

If by "uncertainty of the raw data" you mean what I think you mean, which is my sig2 above, and the "uncertainty of the residual trend" is my var_hat(Y_hat) above, then saying var_hat(Y_hat) is equal to sig2 is clearly not what I am saying in my point (1), which is that var_hat(Y_hat) = sig2_hat.x' Inv(X'X) x and does not equal sig2. I can see no other way of interpreting your maths-free statements when combined with your Figure 6. Therefore your statement "I see nothing in your comment that refutes that" is clearly denying the obvious.

Editor
June 26, 2023 3:56 pm

w. ==> You are talking about only the mathematical/statistical "uncertainty of the trend" — without any reference to the real-world uncertainty in the real-world data itself.

This is a deeper philosophy of science/measurement issue.

The true issue is not the mathematics or the statistics — it is how much confidence we can place in the values returned by either the climate model trends or the trends claimed by the various temperature groups.

Are we willing to bet our children's economic future well-being on those numbers and trends being correct?

That’s the uncertainty we need to focus on.

June 26, 2023 4:48 pm

Willis;
Having graduated with Honors with a BS in Applied Mathematics and with selection to Pi Mu Epsilon [National Honorary Mathematics Society] I can tell you that you have done a very good job of explaining “Uncertain Uncertainties.” Climate Change Proponents have abused their use of “uncertainty” to the point of ridiculousness.

This reminds me of when I was Project Manager for the Radiation Monitors at a Nuclear Power Plant [50 years ago]. Just a week after finishing the calibration of the Rad Monitors, the Resident Engineer wanted to sample a few. Lo and behold, several of them were not within the Manufacturer's specifications. The problem was that the Manufacturer indicated that they should be within 10% of the value predicted by the "Calibration Test" Radiation Source prediction curve. However, naturally occurring samples of Alpha, Beta and Gamma emitter sources do not decay at an exact, by-the-clock rate. There is randomness to natural source decay (DPS or CPS). Further, solar activity can also provide a trigger "source" to the Source which will trigger a decay. The Source does not emit an exact number of decays per second, like an NIST-calibrated frequency generator, but a Binomial Distribution of DPS [like a pair of dice]. Eventually I concluded that the Rad Monitors needed to be calibrated to fit within the Chi Square Test assessment prediction of the overall stability of each channel analyzer counting system by using a radiation source.

I am sure that there are many similar problems in the process models of Climate Change. The oceanographic data buoys are one. How can you claim the buoy is accurate to three decimal places when it is calibrated at 70°F and used at 30–85°F?

June 26, 2023 5:39 pm

In the JCGM Guide to the Expression of Uncertainty in Measurement (GUM), B.2 Definitions, B.2.17, experimental standard deviation, states:

“NOTE 2 The expression s / sqrt(n) is an estimate of the standard deviation of the distribution of q and is called the experimental standard deviation of the mean.”

“NOTE 3 “Experimental standard deviation of the mean” is sometimes incorrectly called standard error of the mean.”

“Sometimes” is being kind, in climatology it is way over 90% of the time.

Screenshot 2023-06-26 at 6.34.26 PM.png
Reply to  karlomonte
June 26, 2023 6:58 pm

To my dear old friend karlomonte
 
Willis is not talking about the same variate being measured repeatedly; he is discussing variation about a regression line describing the relationship between x and y, x being time and y being temperature anomalies. [s/sqrt(n)] is actually the standard error of a sample mean, where s is the sample standard deviation (estimated sigma) and n is the number of samples. In a regression problem, there is only one sample at each time, thus the example you use does not fit that. There are other reasons for being confused by the definitions (see NOTEs 2 and 3 in your reference).

Duke University present some really useful statistical explanations, overviews and commentary and for regression I particularly recommend: https://people.duke.edu/~rnau/mathreg.htm. Note that they do not recommend Excel for regression analysis but have released a free, more powerful add-in tool here: https://regressit.com/index.html.      
 
The Duke reference points out (abbreviated quote) that “the standard error of the model is the sum of squared errors divided by n-1, rather than n under the square root sign because this adjusts for the fact that a “degree of freedom for error″ has been used up by estimating one model parameter (namely the mean) from the sample of n data points”.
 
Here also from Jim (https://statisticsbyjim.com/regression/standard-error-regression-vs-r-squared/)

  • The standard error of the regression provides the absolute measure of the typical distance that the data points fall from the regression line. S is in the units of the dependent variable.
  • R-squared provides the relative measure of the percentage of the dependent variable variance that the model explains. R-squared can range from 0 to 100%.

 
While time consuming, stats is fun, enjoy!
 

Yours sincerely
 
Dr Bill Johnston

http://www.bomwatch.com.au

Reply to  Bill Johnston
June 26, 2023 7:30 pm

The standard error of the regression provides the absolute measure of the typical distance that the data points fall from the regression line. S is in the units of the dependent variable.

The problem with this is that, like most textbooks in statistics, there is never, ever, a discussion of what occurs when the data points have uncertainty themselves. A monthly average, regressed against other monthly averages, never addresses the uncertainty in each monthly average. Most statistics experts are mathematicians and have never tried to analyze the results of experiments where the results are NEVER the same between experiments. Try getting a statistician to do a regression analysis using data that has data points expressed as x ± y. I'll bet most simply regress x, find the residual errors and leave it at that.

Reply to  Jim Gorman
June 26, 2023 10:23 pm

Well, yes and no, Jim. The metaphor you have chosen appears to be a model of the central tendencies of the data with x (presumably time). However, residuals of raw monthly averages (i.e. the average of ~30 daily values for each month) over time will embed the undeducted cycle which, although not affecting trend, will result in autocorrelation, which will invalidate inferences about the trend. So for inferences to be valid, the cycle has to be removed (or accounted for within the model, as a factor "Month" or using Cos-Sin functions of time).

If you are interested in within-month uncertainty, you should remove day-of-year averages – and there they are – anomalies from the day-of-year cycle; all the uncertainty you could want.

I’m an experimenter Jim. While I have undertaken weather observations my day-job was agronomy/hydrology/soils research. I understand variation in glasshouse and field experiments scaling from individual plants in rows, to small plots, to hectares and between farms. The problem is always to extract a useful signal from the noise. To expose the signal in temperature data, it is vital to deduct the dominant cycles.

It is also the case that if the uncertainty of a mean is excessive, the mean will not be representative of central tendency, with the result that a parametric test will not provide useful information about the process generating the data (i.e., the series will consist of random numbers that bear no relation to the hypothesis under test). Going down to the next level (to the raw data samples) will increase the variance but, unless you devise some way of cleansing the data, will not necessarily address the problem.

All the best,

Dr Bill Johnston

http://www.bomwatch.com.au

bdgwx
Reply to  Bill Johnston
June 28, 2023 11:28 am

I agree, and so does Willis. We both present the uncertainty of the trends with both absolute values and anomaly values. The uncertainty of the trend using absolute values is significantly higher than that for anomalies because the former contains a lot more autocorrelation.
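
A toy R sketch of that difference, with simulated monthly data (seasonal cycle + small trend + noise); it ignores autocorrelation, so it only illustrates the scatter part of the story:

  set.seed(7)
  n <- 40 * 12                                     # 40 years of monthly values
  t <- (1:n) / 12                                  # time in years
  m <- rep(1:12, 40)                               # month index
  y <- 0.018 * t + 5 * sin(2 * pi * (1:n) / 12) + rnorm(n, sd = 0.3)

  se_abs  <- summary(lm(y ~ t))$coefficients["t", "Std. Error"]      # trend SE, absolute values
  anom    <- y - ave(y, m)                                           # subtract each month's mean
  se_anom <- summary(lm(anom ~ t))$coefficients["t", "Std. Error"]   # trend SE, anomalies
  c(absolute = se_abs, anomalies = se_anom)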

Reply to  Bill Johnston
June 26, 2023 9:46 pm

So Duke U. doesn’t understand measurement uncertainty either, big surprise.

Reply to  karlomonte
June 27, 2023 1:12 am

Dear old karlomonte,
 
While I don’t claim to be a statistician (see the disclaimer at https://bomwatch.com.au/wp-content/uploads/2023/06/Bomwatch-Brisbane-AP-overlap-Backstory-22-June-23.pdf), there appears to be a general lack of clarity (and nomenclature confusion) in what this all means, particularly on a case-by-case basis.
 
Terms including accuracy, uncertainty (and its derivations), measurement error, standard error, standard deviation, confidence interval and prediction interval have specific definitions. Of those, accuracy and uncertainty associated with temperature and rainfall observations seem to be the most commented-on by people who have never observed the weather. While I’m sure you don’t fall into that category, from your response I’m uncertain that you don’t.
 
The good people at Duke U have provided an explanatory memorandum relating to linear regression. Galton, it turns out, was quite a character (https://people.duke.edu/~rnau/Notes_on_linear_regression_analysis–Robert_Nau.pdf).
 
However, as Shakespeare might have said, whilst one can take an armchair warrior to water, drink of the knowledge instilled therein he may not. Such is the challenge of dealing with nah-nah warriors in want of a fight.
 
All the work I have done at http://www.bomwatch.com.au has been preceded by an evaluation of methods. The data are there too. In addition to my basic biometry courses, old textbooks, work-related workshops and interacting closely with statisticians and biometricians in understanding the analysis of real-world data, Duke U provided an invaluable service that aided that research. So, in arguing about the stuff you argue about, what don’t you understand?
 
In response to previous posts on WUWT I recently provided comprehensive case studies of using paired t-tests for comparing differences between instruments operating in parallel at Townsville and Brisbane airports (https://www.bomwatch.com.au/bureau-of-meterology/why-statistical-tests-matter/).

Perhaps I also need to examine the use of linear regression for detecting trends over time.

Finally, how can time possibly be causal on temperature?
 
Yours sincerely,
 
Dr Bill Johnston
http://www.bomwatch.com.au
 

Reply to  Bill Johnston
June 27, 2023 6:35 am

Terms including accuracy, uncertainty (and its derivations), measurement error, standard error, standard deviation, confidence interval and prediction interval have specific definitions. 

Yeah, they are all in the GUM, cited above.

That you go to the word “accuracy” is an indication that you don’t understand measurement uncertainty either.

Formal measurement uncertainty uses statistics, but is not a subset thereof. Temperature uncertainty is a lot, lot more than just the smallest scale increment divided by 2 (or whatever).

Reply to  karlomonte
June 27, 2023 7:13 am

You nailed it!

Reply to  Tim Gorman
June 27, 2023 7:52 am

Thnx, Tim.

Reply to  karlomonte
June 28, 2023 2:14 am

Dear karlomonte,

I observed temperature data at an official weather station in rotation with colleagues for almost a decade. Put up your hand if you also have.

Uncertainty is a measurable property of the instrument, while accuracy is the non-measurable property of the eyeball reading the instrument.

Hence I think the factor contributing the greatest uncertainty is the accuracy or precision contributed by the observer, not the instrument itself.

Precision can be measured as the frequency of decimal fractions (x.0, x.1, … x.9) per year, which for precisely observed Celsius thermometers should equal 10% per fraction per year. Further, for a precisely observed instrument the average of those fractions (AvFrac) should equal 0.45 (i.e., (0.0 + 0.1 + … + 0.9)/10). I can do a test of the hypothesis that AvFrac is the same over time (= 0.45). Often (but not always), step-changes in AvFrac reflect changes in observers or in the instrument.
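
A rough R sketch of that check, assuming a vector tmax of readings recorded to 0.1 °C (the object name and data are mine):

  frac   <- round((tmax %% 1) * 10) %% 10          # decimal fraction (0..9) of each reading
  counts <- table(factor(frac, levels = 0:9))
  counts / sum(counts)                             # should be near 10% per fraction if precisely observed
  AvFrac <- mean(frac / 10)                        # should be near 0.45
  chisq.test(counts, p = rep(0.1, 10))             # formal test for digit preference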

While this discussion has gone off in numerous unintended directions, weather station observations embed more information about the data-generating process than simply joining the dots, as many seem to assume.

Cheers,

Bill

Reply to  Bill Johnston
June 28, 2023 5:55 am

??

Reply to  Bill Johnston
June 27, 2023 6:46 am

Finally, how can time possibly be causal on temperature?

Your question is a good one. Temperature vs time plots are time series. The intent is to be able to say, with some probability, what the future holds by extrapolating from the trend line. I dealt with this throughout my career by attempting to budget expenses, capital, people, etc. It is a process filled with uncertainty.

Most time series analysis requires eliminating things that are not causal. I wouldn’t attempt to forecast revenue by averaging the sales of two products whose sale prices are vastly different. Those prices would basically contribute to the uncertainty in each average.

Yet climate science does just this by trying to use a mid-range temperature whose individual (Tmax and Tmin) long-term trends have differing values. Example: if I gave you a series of mid-range temperature averages, could you tell me which part is increasing/decreasing? Question: do your local weather people forecast the average temperature, or do they forecast high temps and low temps? Do they show an average high temp and an average low temp from past years, or do they show the daily average temperature? Do HVAC people design heating and cooling systems based on average daily temperatures?

Climate science tries to convince people that they have good knowledge of what is going to occur based on averages. Why do the models not show forecasts for Tmax and Tmin? Why is the assumption made that Tmax is what is going to increase?

Reply to  Jim Gorman
June 27, 2023 1:12 pm

I would say that, strictly speaking, a time-series is a special case of a spurious correlation. It isn’t time that is actually causing the change in the dependent variable, but it is a convenient proxy because it is measured easily with high precision and can be compared against other variables that can be recorded with similar precision and accuracy. A goal should be to understand the variables that are actually responsible for changes in temperature and humidity with time.

Reply to  Bill Johnston
June 27, 2023 7:52 am

“Perhaps I also need to examine the use of linear regression for detecting trends over time.”

The problem I have with claiming that the data fits within “n” sigma is that the data is not binomial. I fully understand how RMS applies to alternating current, and when it is used where the data fits. But I get the impression they are hammering a square log into a round hole.

RMS, or root-mean-square, current/voltage of an alternating current/voltage represents the DC current/voltage that dissipates the same amount of power as the average power dissipated by the alternating current/voltage. For sinusoidal oscillations, the RMS value equals the peak value divided by the square root of 2.

And: “One standard deviation, or one sigma, plotted above or below the average value on that normal distribution curve, would define a region that includes 68 percent of all data points. Two sigma above or below would include about 95 percent of the data, and three sigma would include 99.7 percent.”

However, the data collected are not binomial; they are essentially random or chaotic. Which makes me believe they are using the wrong tool, and that it is being used as a confusion factor. Would not a chi-square goodness-of-fit test be better?
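
If it helps, a chi-square goodness-of-fit test against a fitted normal distribution can be run along these lines in R (my sketch, with simulated skewed data standing in for the real residuals; strictly, estimating the mean and SD from the data costs degrees of freedom, which this ignores):

  set.seed(3)
  x <- rexp(500) - 1                                       # deliberately non-normal stand-in data
  brks <- quantile(x, probs = seq(0, 1, 0.1))              # 10 bins with roughly equal counts
  obs  <- table(cut(x, brks, include.lowest = TRUE))       # observed counts per bin
  p    <- diff(pnorm(brks, mean = mean(x), sd = sd(x)))    # normal probabilities for the same bins
  chisq.test(obs, p = p / sum(p))                          # small p-value: data inconsistent with normal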

[ I have forgotten a few things as it has been at least 40 years since I have used my math knowledge and that was in developing a Risk Analysis-Acceptance Process.]

Reply to  karlomonte
June 27, 2023 7:12 am

I swear that NONE of these people have ever built a stud wall in a house. I would really hate to see the wavy ceiling drywall that would result.

Reply to  Tim Gorman
June 27, 2023 7:55 am

Or the gaps between the studs and the header…lots of non-support.

Reply to  karlomonte
June 27, 2023 8:37 am

Yep!

June 26, 2023 5:55 pm

Willis,

The largest problem I see is that everyone assumes the data being used is accurate. When using a linear regression and calculating uncertainty, what you are doing is determining how well the trend line fits the data. That is a legitimate uncertainty, but it does not address at all the uncertainty in the data used to calculate the trend line.

Measurement uncertainty deals with the uncertainties in the data points used to calculate the linear regression trend line. That uncertainty is very important also.

Imagine, if you will, that each data point is 1 degree warmer. Now imagine each data point was 1 degree lower. What would be the uncertainty when using the original trend, where the data points have uncertainty? In other words, the “cloud” of the data points would expand.

You can’t just recalculate trend lines using the different points (higher/lower), because all you will have done is change the intercept, and the slope will remain the same. The error in the trend line won’t change at all.

I have spent a lot of time studying NIST TN 1900. It took me a while to understand their definition of a measurand. My background taught me that measuring devices controlled the measurement uncertainty when dealing with what is usually a single item. However, experimental uncertainty is important from the standpoint of conducting multiple experiments to determine a most probable result. That explains my concentration on the uncertainty of each and every measurement. Don’t get me wrong, that is important, but you must also carefully define what the measurand actually is.

NIST has done this by defining the monthly average as the measurand that is being assessed. The variation in the days of data is then processed to arrive at an expanded experimental standard uncertainty. They basically treat a month’s worth of data as the number of experiments used to create the average and its uncertainty.

This will open your eyes to what the uncertainty of the average actually is. It far exceeds what climate science uses as uncertainty. NIST found that, using data from a Stevenson shelter at the NIST campus for the month of May 2012, the Tmax data had an average of 25.6 ±1.8 °C at the 95% confidence level. I should point out that their procedure was done following the procedure defined in the GUM (4.2.3, 4.4.3, G.3.2).
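
For anyone who wants to see the mechanics, here is a sketch of that style of calculation in R, using made-up daily Tmax values rather than NIST’s actual data (so the numbers won’t reproduce 25.6 ±1.8 °C, only the procedure):

  tmax <- c(24.1, 27.3, 22.8, 26.5, 29.0, 25.2, 23.7, 28.4,
            26.1, 24.9, 27.8, 25.5, 22.3, 26.9, 28.1, 25.0,
            23.4, 27.0, 26.2, 24.6, 28.8, 25.7)   # hypothetical daily Tmax readings (n = 22)
  n <- length(tmax)
  m <- mean(tmax)                      # the measurand: the monthly average
  u <- sd(tmax) / sqrt(n)              # standard uncertainty of that average (GUM 4.2.3)
  k <- qt(0.975, df = n - 1)           # coverage factor from Student's t (GUM G.3.2)
  c(average = m, U95 = k * u)          # expanded uncertainty at ~95% coverage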

I must reiterate that it took me a period of time to come to grips with the issue not being how accurately and precisely each individual measurement was made, but instead to recognize that the measurand actually being used was the monthly average. Consequently, the uncertainty of that measurand is what should be used to develop an actual uncertainty in the possible trend lines that could be expected from uncertain data.

Editor
Reply to  Jim Gorman
June 26, 2023 6:05 pm

Jim ==> See mine above.

Reply to  Kip Hansen
June 26, 2023 6:12 pm

I read yours after I posted mine. It took me a while to write and may still be unintelligible!
Difficult issues and climate science has done itself no favors by not addressing uncertainty in a scientific way. The NIST document has been around for going on 8 years, but it doesn’t appear to have filtered its way into climate science. Of course, doing so would ruin their millikelvin assertions.

Reply to  Jim Gorman
June 26, 2023 6:40 pm

everyone assumes the data being used is accurate

Nobody does that. All the major temperature data sets publish uncertainty estimates. And anybody who looks at the different data sets can see that they cannot all be 100% accurate, or they would all show the same values all the time.

When using a linear regression and calculating uncertainty, what you are doing is determining how well the trend line fits the data.

No you are not. The trend calculation is determining the best fit (for specific definitions of “best”), but the uncertainty in the slope is not determining how well the data fits the slope; the uncertainty of the slope is determining how much certainty there is in the slope (with all the usual caveats and assumptions). How well the data fits the slope (that is, the standard deviation of the residuals) is part of that calculation, but it isn’t the purpose of determining the uncertainty of the slope.

Measurement uncertainty deals with the uncertainties in the data points used to calculate the linear regression trend line.

But it is largely irrelevant to the uncertainty of the trend line. This is partly because the measurement uncertainty is usually small compared to the random variation in the values themselves, and partly because any errors in measurement are already present in the values themselves. If measurement errors are large compared to natural variation, they just result in larger variation.

Imagine if you will that each data point is 1 degree warmer. Now imagine if each data point was 1 degree lower. What would be the uncertainty when using the original trend where the data points have uncertainty. In other words, the “cloud” of the data points would expand.

Then you have a major systematic error, but the slope would be unchanged.
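
That part is easy to confirm numerically; a short R toy (my numbers):

  set.seed(5)
  t <- 1:100
  y <- 0.02 * t + rnorm(100, sd = 0.5)
  coef(lm(y ~ t))["t"]              # slope of the original series
  coef(lm((y + 1) ~ t))["t"]        # identical slope; only the intercept shifts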

NIST has done this by defining the monthly average as the measurand that is being assessed.

That is for the example of a single station, using only maximum temperatures. The measurand in a monthly global anomaly is the average global anomaly over that month.

NIST found that using data from a Stevenson shelter at the NIST campus for the month of May, 2012, the Tmax data had an average of 25.6 ±1.8° C 95% confidence level.”

And what do you think they would have found if they were averaging thousands of stations to form a global average? And what do you think they would have found if they had done an actual global anomaly assessment, involving calculating the anomaly for each station, gridding, adjusting for errors etc? How would you estimate the uncertainty in that case?

Reply to  Jim Gorman
June 26, 2023 7:12 pm

Not entirely correct either, Jim. Except for winter and summer, when daily temperatures are more normally distributed around the mean, temperature at the start of a month shows a gradient across the month depending on whether the trajectory is cooling (autumn) or warming (spring). In autumn, average T is systematically warmer in week 1 and cooler in week 4, and the converse for spring.

If you are working with daily temperatures, best deduct day-of-year averages from respective day-of-year data and work with anomalies. Within-month gradients are mostly ignored in the case of monthly averages.

Kind regards,

Bill Johnston

Reply to  Bill Johnston
June 26, 2023 7:38 pm

Hey, I don’t disagree with you. However, from Tmax and Tmin studies I have done, not even summer and winter distributions are normal unless the site is close to a large body of water that evens out temperatures year round. You may have individual months in summer and winter with little variance in given years, but not always. I think this can vary greatly with latitude and other geographical factors.

Your point is why I have asked on occasion why annual values are used. It would seem more scientific to use periods that encompass whole seasons.

Reply to  Jim Gorman
June 26, 2023 10:35 pm

Thank goodness; I’m fresh out of puff!

All the best,

Bill

June 26, 2023 8:53 pm

Hi Willis,
Uncertainty in customary land surface temperatures was covered a year ago on WUWT, in three more posts. Here is a link to the first.
https://wattsupwiththat.com/2022/08/24/uncertainty-estimates-for-routine-temperature-data-sets/
In summary, there were two main schools of thought. One said that you can do most types of uncertainty stats only if the input numbers are IID (Independent, Identically Distributed) or approach that purity. The other camp said that the Law of Large Numbers and/or Central Limit Theorem act to allow less separated data to be used – like combining land and sea data, Tmax with Tmin, surface with lower troposphere, and even including made-up extrapolations etc. – so that their uncertainty estimates allow division by a number related to the number of observations, to attain what seems to be a low number for confidence limits.
IIRC, this split of opinion has not been resolved.
It has much the same problem as you show with your data here.
Geoff S

Reply to  Geoff Sherrington
June 27, 2023 6:43 am

The other camp said that the Law of Large Numbers and/or Central Limit Theorem act to allow less separated data to be used – like combining land and sea data, Tmax with Tmin, surface with lower troposphere, and even including made-up extrapolations etc. – so that their uncertainty estimates allow division by a number related to the number of observations, to attain what seems to be a low number for confidence limits.

That seems like a strange idea. Elephants and tigers become herbivores so long as you have enough of both.
Are you sure you are representing their camp correctly?

Reply to  MCourtney
June 27, 2023 8:28 am

He is. The CLT and LLN *only* serve to measure how closely you have calculated the average. They tell you NOTHING about how accurate the average value is.

Dividing by the number of observations ONLY tells you what the average observation is. Dividing the total uncertainty of the observations by the number of observations only tells you what the average uncertainty is. In both cases you lose the variance associated with the data. Variance is somewhat a measure of how accurate the mean is: a large variance implies the mean has a far lower chance of being the true value. What happens when you add random variables? The variances get added as well. What is the variance from averaging NH temps with SH temps? Anomalies don’t help, because the anomalies inherit the variance of the two factors used to calculate the anomaly.

Every time you do an average you lose data needed to properly judge what the statistical analysis is telling you. Average the daily min/max, then average each of those for a month, then average each of those for a year. You have lost a tremendous amount of data that is truly needed to judge what the final value is telling you! Yet we *never* see true propagation of uncertainty or variance in climate science. How accurately they have calculated all of these averages is substituted instead. Except how accurately you calculate the average has nothing to do with how accurate that average is!
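
The variance-addition point can be checked directly; a toy R sketch with made-up distributions (independence assumed):

  set.seed(11)
  x    <- rnorm(1e5, mean = 25, sd = 3)       # stand-in for one set of temperatures
  y    <- rnorm(1e5, mean = 15, sd = 4)       # stand-in for another
  var(x + y); var(x) + var(y)                 # variances of independent variables add
  base <- rnorm(1e5, mean = 25, sd = 1)       # a baseline with its own spread
  var(x - base); var(x) + var(base)           # an anomaly inherits both variances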

Reply to  Tim Gorman
June 27, 2023 1:18 pm

It shouldn’t be a surprise that when one calculates summary statistics, information is lost in the compression.

bdgwx
Reply to  MCourtney
June 27, 2023 12:28 pm

I suspect Bellman and I are the ones you would put in the “other camp”. That is not a position either of us supports.

What Bellman and I support, based on the law of propagation of uncertainty, is that for a measurement model that computes an average, such as y = Σ[x_i, 1, N] / N, it is necessarily the case that the uncertainty of y is u(y) = u(x) / sqrt(N) when u(x) = u(x_i) for all x_i and when r(x_i, x_j) = 0 for all combinations of x_i and x_j. This can be found in JCGM 100:2008 equation 10 or 16, in NIST TN 1297 equation A-3, and in pretty much all other texts on uncertainty. It can also be verified with the NIST uncertainty machine. The aforementioned equations/tools can be used for correlated inputs, when r(x_i, x_j) > 0, as well.
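
That particular case is easy to verify by simulation; a minimal R check (my numbers, with independence and equal u(x) assumed as stated):

  set.seed(9)
  N    <- 100
  ux   <- 0.5                                                 # common standard uncertainty u(x)
  ybar <- replicate(1e4, mean(rnorm(N, mean = 20, sd = ux)))  # many repeated averages of N readings
  c(simulated = sd(ybar), formula = ux / sqrt(N))             # both come out near 0.05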

Reply to  bdgwx
June 27, 2023 12:52 pm

And you still continue to abuse the GUM Eq. 10.

Averaging is NOT a “measurement model” because n is not specified!

Reply to  bdgwx
June 28, 2023 5:29 am

Quit cherry picking when you don’t know what you are talking about. From TN 1297:

A.4 As an example of a Type A evaluation, consider an input quantity X_i whose value is estimated from n independent observations X_i,k of X_i obtained under the same conditions of measurement. In this case the input estimate x_i is usually the sample mean

x_i = X̄_i = (1/n) Σ_{k=1..n} X_i,k     (A-4)

and the standard uncertainty u(x_i) to be associated with x_i is the estimated standard deviation of the mean

u(x_i) = s(X̄_i) = sqrt[ (1/(n(n−1))) Σ_{k=1..n} (X_i,k − X̄_i)² ]

Look really closely at the formula for the estimated sample mean. It is exactly what you describe as the measurement model, i.e., y = Σ[x_i, 1, N] / N.

However, examine the formula for the estimated standard deviation of the mean. The formula for u(x_i) shows no dividing by N as you assert. If it did, s(X̄_i) would be s(X̄_i)/N, and it is not.

The only way to make everything agree with your assertion is to have

x_i = X̄_i/N = [(1/n) Σ_{k=1..n} X_i,k]/N

which makes no sense whatsoever.

Reply to  Geoff Sherrington
June 27, 2023 7:09 am

“The other camp said that the Law of Large Numbers and/or Central Limit Theorem act to allow less separated data to be used – like combining land and sea data, Tmax with Tmin, surface with lower troposphere, and even including made-up extrapolations etc. – so that their uncertainty estimates allow division by a number related to the number of observations, to attain what seems to be a low number for confidence limits.”

These theorems ONLY tell you how accurately you have calculated the value of the population mean. They do *NOT* tell you whether that population mean is accurate or not. The accuracy of the population mean derives from the uncertainty of the data elements, not from how closely you have calculated the population mean. You may have *all* the population data and can very accurately calculate what the mean is of that data. But if all of your data elements are incorrect then the mean you calculate will be incorrect. If all of your data has uncertainty then the mean you calculate from the data will be uncertain as well.

Large amounts of data do *NOT* decrease uncertainty – UNLESS every one of those data elements is from the same measurand, in the same environment, measured by the same thing. And even then you need to justify that the uncertainty is random and Gaussian. You can’t just assume it.
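
A short R illustration of that distinction (the 0.3 offset is an assumed systematic error, purely for the example):

  set.seed(13)
  true_value <- 20
  bias <- 0.3                                        # assumed shared systematic error
  obs  <- true_value + bias + rnorm(1e5, sd = 0.5)   # many noisy readings, all biased the same way
  mean(obs)                      # converges to ~20.3, not 20: the bias does not average away
  sd(obs) / sqrt(length(obs))    # tiny, yet it says nothing about the 0.3 offset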

Hivemind
June 26, 2023 11:32 pm

Even though you’ve never taken a class in statistics, I think you should be teaching the subject. Especially to the climate ‘scientists’.

Editor
June 27, 2023 2:46 am

Figure 6 strikes me as curious. How can the oldest data have the least uncertainty?

On a different tack, I would argue that a linear trend is only informative to the point that it is understood whether the effect being measured has a linear characteristic. For example, a relatively small section of a sine wave – even a selected x.5 cycles – has a non-zero linear trend. But that linear trend is meaningless because nothing linear is happening. If our current climate is in the upward part of cyclical activity, then its linear trend is meaningless.

June 27, 2023 3:46 am

Dear Willis and my dear old friend karlomonte,

Thinking about this over the washing-up, the underlying question is: how can the same satellites, carrying the same sensing gizmos over the same time periods derive dissimilar temperature trends when analysed by two different groups?

If I gave someone data, say for Charleville, and the same supporting information relating to site changes that affected the data (https://www.bomwatch.com.au/bureau-of-meteorology/charleville-queensland/), how could they come up with a non-zero trend except by ignoring non-climate changes? Likewise for Rockhampton, Meekatharra or Rutherglen. In other words, what is going on that two groups estimating trends in the same base data do not agree?

As it surely cannot be the data, it can only be about the approach. So how do those approaches differ and which is the most believable, if either?

I don’t understand, for example, why UAH does not deduct the non-trending influence of the SOI on Australian temperature anomalies.

Perhaps the good folks at Berkeley Earth or UAH could explain.

Yours sincerely,

Dr Bill Johnston

http://www.bomwatch.com.au

Reply to  Bill Johnston
June 27, 2023 7:11 am

The uncertainty of the UAH process has never been studied, to my knowledge (and yes, I tried to see what was out there).

The microwave sounding units (MSUs) on the satellites are horn antennas that convert microwave radiation from oxygen molecules in the atmosphere to voltage; these voltage data are then converted to temperatures (in K) using some process I’ve not been able to uncover. For the lower troposphere (LT) microwave frequencies, the temperatures range from roughly 180 K to 275 K; they are never more than a few degrees above 0°C. The distribution of the measured temperatures across the globe is highly skewed.

Air temperature in the LT region is of course not constant but decreases roughly linearly with altitude (the lapse rate), and the height of the troposphere varies from the equator to the poles. The satellite temperatures are therefore a convolution of the MSU response function with the lapse rate. High mountain areas are a problem.

The satellite sampling is highly non-uniform between the equator and the poles. This is because the satellites are in polar orbits; above 85° latitude the scan spots overlap, and UAH does not report data for 85–90 degrees. At high latitudes (roughly >70°) grid points are sampled several times each day. Toward the equator, by 30° latitude there can be as many as three days between samples of a grid point.

Averages are not of Tmax and Tmin as is done for historic surface data, due to the non-continuous sampling. All points measured during a single month are assembled and averaged; missing days are estimated and filled in.

What has been published about “uncertainty” is a few comparisons between satellite results and radiosonde data, looking only at the slopes of linear fits. From these they have estimated tiny numbers for the uncertainties of the slopes.

Reply to  karlomonte
June 27, 2023 7:37 am

One more point: the total relative uncertainty of irradiances measured with thermopile instruments (such as pyranometers) is about 5%. If a similar relative uncertainty is assumed for the NOAA satellites, the uncertainty for 250K would be ±12K!

Nick Stokes
Reply to  karlomonte
June 27, 2023 7:22 pm

“the uncertainty for 250K would be ±12K!”

So do you actually believe the uncertainty of the UAH global average is ±12K?

Reply to  Nick Stokes
June 28, 2023 5:58 am

Stokes doesn’t understand what an engineering estimate is.

What a surprise.

And notice that Nitpick Nick deceptively left the “If” out of his “quote” of my statement.

Nick Stokes
Reply to  karlomonte
June 28, 2023 1:30 pm

You didn’t answer.
Or do you believe they don’t have similar relative uncertainty? If so, why bring them up?

Reply to  Nick Stokes
June 28, 2023 1:42 pm

Are you really this daft or is this just part of your usual sophistry?

If (there’s that word again) you read everything I wrote, you might grasp (yeah, I know, big assumption) that I specifically stated that the UAH temperature measurement uncertainty is UNKNOWN, and is affected by a great many factors that are simply ignored.

If you had any real-world metrology experience, you might grasp that a 5% relative uncertainty can be a four-minute mile.

Reply to  karlomonte
June 28, 2023 4:01 pm

A 5% error in irradiance doesn’t have a linear dependence on temperature (it’s T^4).
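
For a blackbody, E = σT^4, so relative uncertainties scale as u(T)/T ≈ (1/4)·u(E)/E; a quick R check of what that would imply (this is just the arithmetic, not a claim about any particular instrument):

  Tk <- 250                    # brightness temperature, K
  rel_E <- 0.05                # assumed 5% relative uncertainty in irradiance
  Tk * rel_E / 4               # ≈ 3.1 K, rather than 0.05 * 250 = 12.5 K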

Reply to  Phil.
June 28, 2023 4:12 pm

Oh.

OK.

What is the expanded uncertainty of a hemispherical pyranometer?

Reply to  karlomonte
June 29, 2023 6:04 am

“Unknown” – unless you are a cult member in the religion of CAGW.

Reply to  karlomonte
June 28, 2023 2:28 am

Dear karlomonte,

I follow what you say but how does that relate to surface temperatures?

How can there be no describable trend at Oodnadatta, Sydney Observatory or Darwin et al., but there is a trend in the troposphere?

Further, how can two groups using precisely the same base-data produce different trends?

Casting everything else aside, those questions are crucial.

Yours sincerely,

Dr Bill Johnston

Reply to  Bill Johnston
June 28, 2023 5:58 am

YOU asked about the UAH, and I gave you the best answer possible.

Reply to  karlomonte
June 28, 2023 1:32 pm

Dear karlomonte,

I get that and thank you for the summary. Putting aside past disagreements and grumpiness, there are still unanswered questions, which I think are crucial.

For instance, I know and have presented unequivocal evidence at http://www.bomwatch.com.au that Blair Trewin and others at the BoM create trends in Australia’s temperature records that roughly mimic UAH. But their trends are BS. (see: https://wattsupwiththat.com/2023/06/26/uncertain-uncertainties/#comment-3739498).

There is no trend, and when you think about it, there can be no trend. Since we started BomWatch there has not been a peep from Trewin on The Conversation etc., so he knows that what he has done, and is doing, represents malfeasance in public office.

The trend story has been made up to support the warming narrative, and in Australia’s case this started in the late 1980s. I know the history of it, and I know the people who were involved. Those same people helped put together the IPCC, and Neville Nicholls was an editor of AR1, when they hardly had any data to report. They needed to ‘prove’ warming, which is where homogenisation came from.

However, there is no actual measurable trend in maximum T data at sites across Australia going back to 1856 or so at some places.

So why is UAH finding a tropospheric trend, when surface data show no trend? I also don’t understand why they don’t remove the obvious SOI signal. The SOI has no trend, but it possibly contributes or amplifies trend in UAH.

I’m not expecting an answer; I am just putting this out there.

Yours sincerely,

Bill Johnston

Reply to  Bill Johnston
June 28, 2023 2:18 pm

The UAH has 10000+ points for the globe (minus the 85° high latitudes), in a 2.5° x 2.5° grid. Australia is approximately 40° latitude by 25° longitude. From this, there are about 10×16 = 160 grid points, many of which overlap the ocean. So only about 2% of the UAH grid points.

The polar-orbit NOAA satellites do not measure Tmax – Tmin at all, not even close. At low latitudes (which is most of Australia) there can be several days between scans by the satellite. Instead they collect whatever points they get for a single month and average them (and don’t report any standard deviations).

They are not measuring the same quantities as the surface stations.

It might be illuminating to see what a single UAH grid point shows, such as the one corresponding to Alice Springs. But it would be difficult to isolate a single point.

Reply to  karlomonte
June 28, 2023 3:18 pm

Thanks karlomonte. So what are they actually measuring, and how does this come back as a °C signal?

Regardless, would you not agree it is a conundrum that the UAH trend is not replicated in surface temperature records?

b.

Reply to  Bill Johnston
June 28, 2023 3:57 pm

Can you not read? I tried to tell you, not going to type it all out again.

bdgwx
Reply to  Bill Johnston
June 28, 2023 8:02 pm

They are measuring microwave emissions from O2 molecules, and through a complex model built on top of other models these are mapped into a meaningful temperature.

Reply to  karlomonte
June 29, 2023 6:33 am

It’s not obvious to me that data samples of a time series where each data point has a different measurement interval (e.g. Kansas City may have a time interval of 8 hours per measurement while Alice Springs may only get measured every 24 hours – if that) can actually be combined into a meaningful data set. There is too much lost data to actually know what is happening in-between time intervals.

A measurement protocol like this becomes much more impacted by weather incidents – increasing the variance of the data and its resulting uncertainty.

If you measure a point out in the South Pacific once a day and a point in Kansas multiple times in a day how do you relate the two? What does the average actually tell you?

It’s why I’ve always been a proponent of measuring temperature at the same GMT times, e.g. once at 0000GMT and once at 1200GMT. You get two snapshots of the complete globe. *That* would give you a far better index of the global conditions over time.

Reply to  Tim Gorman
June 29, 2023 7:38 am

I agree completely; the UAH really isn’t a time-series at all in this respect, the time between samples varies with latitude, the spin of the planet, and the satellite polar orbit. It might get daily Tmax and Tmin above 70-80° latitude, but certainly not toward the equator.

Reply to  Bill Johnston
June 28, 2023 7:09 am

Things you can’t measure directly require a functional relationship with other things related to them in order to determine them, e.g., p = nRT/V.

If your functional relationship is different from someone else’s, then you will get different results for the same data. That is what is happening. Which one is closest to being right? Who knows? But the inherent uncertainty karlomonte described militates against either of them being 100% accurate out to the tenths digit, let alone the hundredths digit. If the inherent uncertainty were actually calculated and stated, you would find that both are probably within the uncertainty interval! Which means you will *never* know which one is the true value!
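
As an aside on propagating uncertainty through a functional relationship like p = nRT/V: for a product/quotient of independent inputs, the relative standard uncertainties combine in quadrature (GUM Eq. 10 applied to a product). A hedged R sketch with made-up values:

  n_mol <- 1.000;  u_n <- 0.010     # amount of substance, mol (made-up value and uncertainty)
  Rgas  <- 8.314                    # J/(mol K), treated as exact here
  Temp  <- 300.0;  u_T <- 1.0       # K
  Vol   <- 0.0250; u_V <- 0.0002    # m^3
  p   <- n_mol * Rgas * Temp / Vol                                   # pressure, Pa
  u_p <- p * sqrt((u_n / n_mol)^2 + (u_T / Temp)^2 + (u_V / Vol)^2)  # quadrature of relative uncertainties
  c(p = p, u_p = u_p)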

Reply to  Tim Gorman
June 28, 2023 3:22 pm

I don’t know what you are talking about, Tim, and I’m not even sure you know at this point. If you have never observed a thermometer, and seem to know little about least-squares regression, why are you an expert?

b.

Reply to  Bill Johnston
June 28, 2023 3:58 pm

So now your act is to accuse people of ignorance, trying to paper over your own.

Reply to  Bill Johnston
June 29, 2023 1:03 pm

Bill,

I have run my own personal weather station for 21 years. It has a +/- 0.5C uncertainty interval. I have run numerous studies on the data it records using both the stated values and the uncertainty intervals. You do *NOT* get the same trend line when considering the uncertainty interval possible values. I simply do not assume the stated values are 100% accurate as you apparently do.

Least squares regression is only as good as the data you put into it. If that data has an uncertainty interval then the trend line has uncertainty as well. That’s an inconvenient truth for many to accept. It is *still* the truth.

Reply to  Bill Johnston
June 28, 2023 7:53 am

And more importantly, they are NOT measuring the same quantities! I haven’t looked, but the number of UAH grid points that cover Australia is not large, probably only about 50; coupled with the intermittent satellite coverage, they should be expected to differ.

bdgwx
Reply to  Bill Johnston
June 28, 2023 11:24 am

BJ: How can there be no describable trend at Oodnadatta, Sydney Observatory or Darwin et al, but there is a trend in the troposphere?.

Temperature changes are not homogenous either horizontally or vertically. Some areas or heights may warm or cool more than others.

BJ: Further, how can two groups using precisely the same base-data produce different trends?

RSS, UAH, and STAR do not use the exact same data. But even if they did they each make methodological choices for how to bias correct the observations and infill their grids.

Reply to  bdgwx
June 28, 2023 12:32 pm

Have you not heard?

40 years of expert failure: New NOAA STAR satellite temperatures only show half the warming that climate models do

An all new reanalysis of the STAR satellite data finds markedly lower temperature trends for the last 40 years. The big deal about this is that this third dataset suddenly supports the original UAH satellite data, not the other RSS system, and not the “surface thermometers” sitting near hot tarmacs and absolutely not the climate models.

Too bad you can no longer claim a difference between STAR and UAH. That leaves RSS as the odd man out in satellite temperatures!

bdgwx
Reply to  Jim Gorman
June 28, 2023 1:40 pm

There are differences. For example, STAR has a lower trend from 1979 to 2002 and a higher trend from 2002 to 2022. The trend over 1979–2022 is similar to UAH because the lower trend in the early part partially offsets the higher trend in the later part. In other words, STAR shows more of an acceleration than does UAH. In fact, STAR shows a higher acceleration in the rate of warming than either UAH or RSS. That is a pretty important difference.

Reply to  karlomonte
June 27, 2023 8:31 am

You totally nailed it. I’m going to save this in a notepad file and use it, if you don’t mind.

Reply to  Tim Gorman
June 27, 2023 12:15 pm

No problem!

Reply to  Bill Johnston
June 27, 2023 1:32 pm

I recently read about a study where different groups were given the same data, asked to analyze it, and form conclusions. They got very different results, not unlike the Rashomon Effect.