Geoff Sherrington
Part One opened with 3 assertions.
“It is generally agreed that the usefulness of measurement results, and thus much of the information that we provide as an institution, is to a large extent determined by the quality of the statements of uncertainty that accompany them.”
“The uncertainty in the result of a measurement generally consists of several components which may be grouped into two categories according to the way in which their numerical value is estimated: A. those which are evaluated by statistical methods, B. those which are evaluated by other means.”
“Dissent. Science benefits from dissent within the scientific community to sharpen ideas and thinking. Scientists’ ability to freely voice the legitimate disagreement that improves science should not be constrained. Transparency in sharing science. Transparency underpins the robust generation of knowledge and promotes accountability to the American public. Federal scientists should be able to speak freely, if they wish, about their unclassified research, including to members of the press.”
They led to a question that Australia’s Bureau of Meteorology, BOM, has been answering in stages for some years.
“If a person seeks to know the separation of two daily temperatures in degrees C that allows a confident claim that the two temperatures are different statistically, by how much would the two values be separated?”
Part Two now addresses the more mathematical topics of the first two assertions.
In short, what is the proper magnitude of the uncertainty associated with such routine daily temperature measurements? (From here, the scope has widened from a single observation at a single station, to multiple years of observations at many stations globally.)
We start with where Part One ceased.
Dr David Jones emailed me on June 9, 2009 with this sentence:
“Your analogy between a 0.1C difference and a 0.1C/decade trend makes no sense either – the law of large numbers or central limit theorem tells you that random errors have a tiny effect on aggregated values.”
The Law of Large Numbers (LOLN) and the Central Limit Theorem (CLT) are often used to justify estimates of small measurement uncertainties. A common summary runs like this: “the uncertainty of a single measurement might be +/-0.5⁰C, but if we take many measurements and average them, the uncertainty can become smaller.”
This thinking has to bear on the BOM table of uncertainty estimates shown in Part One and more below.
If the uncertainty of a single reading is indeed +/-0.5⁰C, then what mechanism is at work to reduce the uncertainty of multiple observations to lower numbers such as +/-0.19⁰C? It has to be almost total reliance on the CLT. If so, is this reliance justified?
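To make that claim concrete, here is a minimal Python sketch with hypothetical numbers (not BOM data) contrasting the two situations at issue: purely independent random reading errors, where the spread of the average shrinks roughly as 1/sqrt(N), and a systematic offset shared by every reading, where it does not.

```python
import numpy as np

rng = np.random.default_rng(42)
true_temp = 20.0   # hypothetical true value, degrees C
u_single = 0.5     # assumed uncertainty of one reading, +/- 0.5 C
N = 1000           # number of readings averaged
trials = 10_000    # repeat the whole exercise many times to see the spread

# Case A: every reading carries its own independent random error.
means_a = [np.mean(true_temp + rng.normal(0.0, u_single, N)) for _ in range(trials)]
print("independent errors:     spread of the mean ~", round(np.std(means_a), 3))
# ~ 0.5 / sqrt(1000) ~ 0.016 C: the CLT/LOLN reduction being claimed

# Case B: each trial's readings also share one common (systematic) offset of the same size.
means_b = [np.mean(true_temp + rng.normal(0.0, u_single) + rng.normal(0.0, u_single, N))
           for _ in range(trials)]
print("shared systematic bias: spread of the mean ~", round(np.std(means_b), 3))
# ~ 0.5 C: averaging does nothing to an error common to every reading
```

Which of the two cases better describes a real network of historic thermometers is exactly the question examined below.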
……………………………………..
Australia’s BOM authors have written a 38-page report that describes some of their relevant procedures. It is named “ITR 716 Temperature Measurement Uncertainty – Version 1.4_E” (Issued 30 March 2022). It might not yet be easily available in public literature.
http://www.geoffstuff.com/bomitr.pdf
The lengthy tables in that report need to be understood before proceeding.
…………………………………………………..
(Start quote).
Sources of Uncertainty Information.
The process of identifying sources of uncertainty for near surface atmospheric temperature measurements was carried out in accordance with the International Vocabulary of Metrology [JCGM 200:2008]. This analysis of the measurement process established seven root causes and numerous contributing sources. These are described in Table 3 below. These sources of uncertainty correlate with categories used in the uncertainty budget provided in Appendix D.
Uncertainty Estimates
The overall uncertainty of the mercury in glass ordinary dry bulb thermometer and PRT probe to measure atmospheric temperature is given in Table 4. This table is a summary of the full measurement uncertainty budget given in Appendix D.
Table 4 – Summary table of uncertainties and degrees of freedom (DoF) [JCGM 100:2008] for ordinary dry bulb thermometer and electronic air temperature probes also referred to as PRT probes.

A detailed assessment of the estimate of least uncertainty for the ordinary dry bulb thermometer and air temperature probes is provided in Appendix D. This details the uncertainty contributors mentioned above in Table 3.
(End quote).
………………………………………..
There are some known sources of uncertainty that are not covered, or perhaps not covered adequately, in these BOM tables. One of the largest is triggered by a change in the site of the screen. The screen has shown itself over time to be sensitive to disturbances such as site moves. The BOM, like many other keepers of temperature records, has engaged in homogenization exercises that the public has seen successively as the “High Quality” data set (since discontinued), then ACORN-SAT versions 1, 2, 2.1 and 2.2.
The homogenization procedures are described in several reports, including:
http://www.bom.gov.au/climate/data/acorn-sat/
The magnitude of changes due to site shifts is large compared to changes from the effects listed in the table above. They are also widespread. Few if any of the 112 or so official ACORN-SAT stations have escaped adjustment for this effect.
This table shows some daily adjustments for Alice Springs Tmin, with differences between raw and ACORN-SAT version 2.2 shown, all in ⁰C. Data are taken from
http://www.waclimate.net/acorn2/index.html
| Date | Min v2.2 | Raw | Raw minus v2.2 |
| --- | --- | --- | --- |
| 1944-07-20 | -6.6 | 1.1 | 7.7 |
| 1943-12-15 | 18.5 | 26.1 | 7.6 |
| 1942-04-05 | 5.8 | 13.3 | 7.5 |
| 1942-08-22 | 8.1 | 15.6 | 7.5 |
| 1942-09-02 | 3.1 | 10.6 | 7.5 |
| 1942-09-18 | 13.4 | 20.6 | 7.2 |
| 1942-07-05 | -2.8 | 4.4 | 7.2 |
| 1942-08-23 | 4.1 | 11.1 | 7.0 |
| 1943-12-12 | 19.8 | 26.7 | 6.9 |
| 1942-05-08 | 7.7 | 14.4 | 6.7 |
| 1942-02-15 | 18.4 | 25.0 | 6.6 |
| 1942-10-18 | 11.7 | 18.3 | 6.6 |
| 1943-07-05 | 2.3 | 8.9 | 6.6 |
| 1942-05-10 | 4.0 | 10.6 | 6.6 |
| 1942-09-24 | 5.1 | 11.7 | 6.6 |
These differences add to the uncertainty estimates being sought. There are different ways to do this, but BOM seems not to include them in the overall uncertainty. They are not measured differences, so they are not part of measurement uncertainty. They are estimates by expert staff, but nevertheless they need to find a place in overall uncertainty. One possible approach is sketched below.
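As a minimal sketch only, and not a BOM procedure, one hedged way to give such adjustments a place in the budget is to treat the spread of the raw-minus-ACORN differences at a station as an extra Type B-style component and combine it in quadrature with the published instrumental uncertainty. The numbers below are hypothetical placeholders, not real station data.

```python
import numpy as np

# Hypothetical day-by-day adjustments (raw minus ACORN-SAT), degrees C.
# In practice this would be the full difference series for a station.
adjustments = np.array([0.2, -0.4, 1.1, 0.0, -0.7, 2.3, 0.5, -1.2])

u_instrument = 0.19                          # example instrumental uncertainty, C
u_adjustment = np.std(adjustments, ddof=1)   # spread of the adjustments, treated Type B-style

# Combine independent components in quadrature, as the GUM does.
u_combined = np.hypot(u_instrument, u_adjustment)
print(f"instrument {u_instrument:.2f} C, adjustments {u_adjustment:.2f} C, "
      f"combined {u_combined:.2f} C")
```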
We question whether all, or even enough, sources of uncertainty have been considered. If the uncertainty were as small as indicated, why would there be a need for adjustments to produce data sets like ACORN-SAT? This was raised with BOM in the following letter.
………………………………………
(Sent to Arla Duncan BOM Monday, 2 May 2022 5:42 PM )
Thank you for your letter and copy of BOM Instrument Test Report 716, “Near Surface Air Temperature Measurement Uncertainty V1.4_E.” via your email of 1st April, 2022.
This reply is in the spirit of seeking further clarification of my question asked some years ago,
“If a person seeks to know the separation of two daily temperatures in degrees C that allows a confident claim that the two temperatures are different statistically, by how much would the two values be separated?”
Your response has led me to the centre of the table in your letter, which suggested that historic daily temperatures typical of many would have uncertainties of the order of ±0.23 °C or ±0.18 °C.
I conducted the following exercise. A station was chosen, here Alice Springs airport BOM 15590 because of its importance to large regions in central Australia. I chose a year, 1953, more or less at random. I chose daily data for granularity. Temperature minima were examined. There is past data on the record for “Raw” from CDO, plus ACORN-SAT versions so far numbered 1, 2, 2.1 and 2.2; there is also the older High Quality BOM data set. With a day-by-day subtraction, I graphed their divergence from RAW. Here are the results.

It can be argued that I have chosen a particular example to show a particular effect, but this is not so. Many stations could be shown to have a similar daily range of temperatures.
The vertical spread of temperatures at any given date can be up to some 4 degrees C. In a rough sense, that can be equated to an uncertainty of +/- 2 deg C or more. This figure is an order of magnitude greater than your uncertainty estimates noted above.
Similarly, I chose another site, this time Bourke NSW, #48245 an AWS site, year 1999.

This example also shows a large daily range of temperatures, here roughly 2 deg C.
In a practical use, one can ask “What was the hottest day recorded in Bourke in 1993?”
The answers are:
- 43.2 +/- 0.13 ⁰C from the High Quality and RAW data sets
- 44.4 +/- 0.13 ⁰C from ACORN-SAT versions 2.1 and 2.2
- or 44.7 +/- 0.13 ⁰C from ACORN-SAT version 1
The results depend on the chosen data set and use the BOM estimates of accuracy for a liquid-in-glass thermometer and a data set of 100 years duration from the BOM Instrument Test Report 716 that was quoted in the table in your email.
This example represents a measurement absurdity.
There appears to be a mismatch of estimates of uncertainty. I have chosen to use past versions of ACORN-SAT and the old High Quality data set because each was created by experts attempting to reduce uncertainty.
How would BOM resolve this difference in uncertainty estimates?
(End of BOM letter)
……………………
BOM replied on 12 July 2022, extract follows:
“In response to your specific queries regarding temperature measurement uncertainty, it appears they arise from a misapplication of the ITR 716 Near Surface Air Temperature Measurement Uncertainty V1.4_E to your analysis. The measurement uncertainties in ITR 716 are a measurement error associated with the raw temperature data. In contrast, your analysis is a comparison between time series of raw temperature data and the different versions of ACORN-SAT data at particular sites in specific years. Irrespective of the station and year chosen, your analysis results from differences in methodology between the different ACORN-SAT datasets and as such cannot be compared with the published measurement errors in ITR 716. We recommend that you consider submitting the results of any further analysis to a scientific journal for peer review.”
(End of extract from BOM’s reply.)
BOM appears not to accept that site move effects should be included as part of uncertainty estimates. Maybe they should be. For example, at the start of the Australian summer there are often news reports claiming that a new record hot temperature has been reached on a particular day at some place. Establishing such a record is no easy task. See this material for the “hottest day ever in Australia.”
The point is that BOM are encouraging use of the ACORN-SAT data set as the official record, while often directing inquiries also to the raw data at one of their web sites. This means that a modern temperature, like one today from an Automatic Weather Station with a Platinum Resistance Thermometer, can be compared with an early temperature after 1910 (when ACORN-SAT commences) taken by a liquid-in-glass thermometer in a screen of different dimensions, shifted from its original site and adjusted by homogenisation.
Thus the comparison can be made between raw (today) and historic (early homogenised and moved).
Surely, that procedure is valid only if the effects of site moves are included in the uncertainty.
………………………………
We return now to the topic of the Central Limit Theorem and the Law of Large Numbers, LOLN.
The CLT is partly described in the BIPM GUM here, a .jpg file to preserve equations:
These words are critical – “even if the distributions of the Xi are not normal, the distribution of Y may often be approximated by a normal distribution because of the Central Limit Theorem.” They might be the main basis for justifying uncertainty reductions statistically. What is the actual distribution of a group of temperatures from a time series?
These four histograms below were constructed to see whether there was a signature of change between the two years before the Alice Springs weather station changed from Mercury-in-Glass to Platinum Resistance thermometry and the two years after the change of instrument. The upper pair (Tmax and Tmin) is from before the change on 1 November 1996; the lower pair is from after it. Both X and Y axes are scaled comparably.
In gross appearance, only one of these histograms visually approaches a normal distribution.
The principal question that arises is: “Does this disqualify the use of the CLT in the way that BOM seems to use it?”
Known mathematics can easily decompose these graphs into sub-sets of normal distributions (or near enough). But that is an academic exercise, one that becomes useful only when the cause of each sub-set is identified and quantified. This quantification step, linked to a sub-distribution, is not the same as the categories of error sources listed in the BOM tables above. I suspect that it is not proper to assume that their sub-distributions will match the sub-distributions derived from the histograms.
In other words, it cannot be assumed carte blanche that the CLT can be used unless measurements are made of the dominant factors contributing to uncertainty. A simple check on the distributions is sketched below.
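As an illustration of the kind of distribution check being discussed, and nothing more, the short Python sketch below bins two years of daily minima either side of the instrument change and reports skewness and excess kurtosis. The file name and column names are hypothetical placeholders, not the actual BOM extract.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical CSV of daily observations with columns "date" and "tmin".
df = pd.read_csv("alice_springs_daily.csv", parse_dates=["date"])

before = df[(df.date >= "1994-11-01") & (df.date < "1996-11-01")]["tmin"].dropna()
after  = df[(df.date >= "1996-11-01") & (df.date < "1998-11-01")]["tmin"].dropna()

for label, series in [("2 yr before change", before), ("2 yr after change", after)]:
    print(label,
          "n =", len(series),
          "skew =", round(stats.skew(series), 2),
          "excess kurtosis =", round(stats.kurtosis(series), 2))
    # Large skew or kurtosis warns that a single normal distribution is a
    # poor description of these daily values.

# np.histogram gives the counts that would be plotted as histograms
# like the four described above.
counts, edges = np.histogram(before, bins=25)
```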




In seeking to conclude and summarise Part Two, my main conclusions are:
- There are doubts about whether the Central Limit Theorem is applicable to temperatures of the type described.
- The BIPM Guide to Uncertainty might not be adequately applicable in the practical sense. It seems to be written more for controlled environments like national standards institutions where attempts are made to minimise and/or measure extraneous variables. It seems to have limits when the vagaries of the natural environment are thrust upon it. However, it is useful for discerning if authors of uncertainty estimates are following best practice. From the GUM:
The term “uncertainty”
The concept of uncertainty is discussed further in Clause 3 and Annex D.
The word “uncertainty” means doubt, and thus in its broadest sense “uncertainty of measurement” means doubt about the validity of the result of a measurement. Because of the lack of different words for this general concept of uncertainty and the specific quantities that provide quantitative measures of the concept, for example, the standard deviation, it is necessary to use the word “uncertainty” in these two different senses.
In this Guide, the word “uncertainty” without adjectives refers both to the general concept of uncertainty and to any or all quantitative measures of that concept.
- There are grounds for adopting an overall uncertainty of at least +/- 0.5 degrees C for all historic temperature measurements when they are being compared with one another. It does not immediately follow that other common uses, such as temperature/time series, should use this uncertainty.
This article is long enough. I now plan a Part Three that will mainly compare traditional statistical estimates of uncertainty with newer methods like “bootstrapping” that might be well suited to the task.
…………………………………..
Geoff Sherrington
Scientist
Melbourne, Australia.
30th August, 2022


Geoff,
Do you accept that the NIST uncertainty machine, which uses the technique specified in the GUM document (JCGM 100:2008) produces the correct result when given the measurement model y = f(x1, x2, …, xn) = Σ[xi, 1, n] / n or in the R syntax required by the site (x1 + x2 + … + xn) / n?
Do you accept that the uncertainty evaluation techniques 1) GUM linear approximation described in JCGM 100:2008 and 2) GUM monte carlo described in JCGM 101:2008 and JCGM 102:2011 produce correct results [1]?
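For readers who have not met the two techniques named in the question, here is a minimal sketch, with a hypothetical model and made-up numbers rather than the NIST tool itself, contrasting the JCGM 100 linear (first-order) propagation with the JCGM 101 Monte Carlo evaluation on a simple product model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical measurement model y = x1 * x2 with made-up best estimates
# and standard uncertainties (illustrative only, not the NIST examples).
x1, u1 = 10.0, 0.2
x2, u2 = 5.0, 0.1

# 1) GUM (JCGM 100) first-order "law of propagation of uncertainty":
#    u^2(y) = (dy/dx1)^2 * u1^2 + (dy/dx2)^2 * u2^2, with dy/dx1 = x2 and dy/dx2 = x1.
u_linear = np.sqrt((x2 * u1) ** 2 + (x1 * u2) ** 2)

# 2) Monte Carlo evaluation (JCGM 101): sample the inputs, push each draw through
#    the model, and take the standard deviation of the simulated outputs.
draws = 1_000_000
samples = rng.normal(x1, u1, draws) * rng.normal(x2, u2, draws)
u_mc = samples.std()

print(f"linear approximation: u(y) = {u_linear:.3f}")
print(f"Monte Carlo:          u(y) = {u_mc:.3f}")
```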
The usual nonsense from bgwxyz…edition 674…
You’re counting?
It’s an automated process…
RTFM on the link and it tells you the answer 🙂
So can you read and you tell us the answer because it is expressly told to you.
I’ve read the manual several times. It does not mention Geoff.
BTW…the question is not posed to Geoff per se, but anyone who wants to give their yay or nay to the GUM procedures and NIST uncertainty machine.
What do you think? Do you think the GUM procedures produce the correct result? Do you think the NIST uncertainty machine produces the correct result?
“I’ve read the manual several times. It does not mention Geoff”
There is a difference between reading and seeing words. Someone who thinks such a childish reply is sagacious ain’t going to comprehend much of it.
There seems to be some confusion here. Let me see if I can clear it up now. I’m not asking if NIST thinks their own calculator and GUM procedures produce the correct results. I’m assuming they do since…ya know…it’s their product and they specifically cite the GUM. I’m asking if Geoff accepts the NIST uncertainty machine and the GUM procedures. And like I said above I’m inviting anyone to answer the question.
Just another verse of your song and dance routine that bypasses the truth. You’ve been told the answer to this question innumerable times yet you continue to pluck the same old out-of-tune banjo.
Why is this?
There is no confusion. You are cut and pasting stuff you don’t understand in place of a good argument. Be specific. You read it. Point out how this addresses Geoff’s concerns.
The BOM Near Surface Air Temperature Measurement Uncertainty report relies heavily on the methods and procedures in the GUM. If Geoff (or anyone else for that matter) does not accept the methods and procedures in the GUM then they aren’t going to accept the uncertainty analysis and resulting combined uncertainties in appendix D, or any answer to the question: “If a person seeks to know the separation of two daily temperatures in degrees C that allows a confident claim that the two temperatures are different statistically, by how much would the two values be separated?”
And no. The NIST uncertainty machine manual does not tell me whether Geoff, a random WUWT commentor, or anyone else for that matter accepts their product. It would be rather bizarre if they mentioned any of us, don’t you think?
What is not being accepted are your faulty interpretations and cherry picking of what you read in the GUM.
HTH
The NIST calculator is for multiple measurements of the SAME THING *only*. It simply doesn’t work for situations where you have multiple measurements that are of different things.
This has been pointed out to you MULTIPLE times and you continue with your delusional view of how to handle uncertainty!
+1000 and they even expressly say that and discuss things you might try if you have problematic measurement or data.
LdB said: “+1000 and they even expressly say that and discuss things you might try if you have problematic measurement or data.”
Where do they say that? And how do you reconcile your belief with their examples where the measurement model combines input quantities of different things?
3.3.2 In practice, there are many possible sources of uncertainty in a measurement, including:
a) incomplete definition of the measurand;
b) imperfect realization of the definition of the measurand;
c) nonrepresentative sampling — the sample measured may not represent the defined measurand;
1.2 This Guide is primarily concerned with the expression of uncertainty in the measurement of a well-defined physical quantity — the measurand — that can be characterized by an essentially unique value. If the phenomenon of interest can be represented only as a distribution of values or is dependent on one or more parameters, such as time, then the measurands required for its description are the set of quantities describing that distribution or that dependence.
2.2.3 The formal definition of the term “uncertainty of measurement” developed for use in this Guide and in the VIM [6] (VIM:1993, definition 3.9) is as follows:
uncertainty (of measurement): parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand
NOTE 1 The parameter may be, for example, a standard deviation (or a given multiple of it), or the half-width of an interval having a stated level of confidence.
NOTE 2 Uncertainty of measurement comprises, in general, many components. Some of these components may be evaluated from the statistical distribution of the results of series of measurements and can be characterized by experimental standard deviations. The other components, which also can be characterized by standard deviations, are evaluated from assumed probability distributions based on experience or other information.
NOTE 3 It is understood that the result of the measurement is the best estimate of the value of the measurand, and that all components of uncertainty, including those arising from systematic effects, such as components associated with corrections and reference standards, contribute to the dispersion.
2.3.4 combined standard uncertainty
standard uncertainty of the result of a measurement when that result is obtained from the values of a number of other quantities, equal to the positive square root of a sum of terms, the terms being the variances or covariances of these other quantities weighted according to how the measurement result varies with changes in these quantities
2.3.5 expanded uncertainty
quantity defining an interval about the result of a measurement that may be expected to encompass a large fraction of the distribution of values that could reasonably be attributed to the measurand
3.1.4 In many cases, the result of a measurement is determined on the basis of series of observations obtained under repeatability conditions (B.2.15, Note 1).
3.1.5 Variations in repeated observations are assumed to arise because influence quantities (B.2.10) that can affect the measurement result are not held completely constant.
3.1.6 The mathematical model of the measurement that transforms the set of repeated observations into the measurement result is of critical importance because, in addition to the observations, it generally includes various influence quantities that are inexactly known. This lack of knowledge contributes to the uncertainty of the measurement result, as do the variations of the repeated observations and any uncertainty associated with the mathematical model itself.
——————————————————————————
Different temperatures taken using different devices are not repeated observations of the same measurand.
I’m not a scientist and yet even I understand that temperature readings taken at two different times or two different places (or a combination thereof) are not the same thing. What a maroon.
I don’t see how it does measurements at all. The manual clearly states that there must be a y = f{X1 . . . Xn} relationship for the data sets. This doesn’t exist in our case.
JS said: “The manual clearly states that there must be a y = f{X1 . . . Xn} relationship for the data sets.”
y = f(X1 . . . Xn) = (X1 + … + Xn) / n is a functional relationship between the inputs X1…Xn and the output y.
You can write the equation but how do you evaluate it?
What is the value of X1?
What is the value of X1 if it is a function of time and space?
What is the value of Xn?
What is the value of Xn if it is a function of time and space?
How do you combine f(X1(t,s)) with f(X2(t,s))
You *must* know this in order to actually define
y = f(X1…Xn)
And if “n” is the number of elements then it is *NOT* a functional relationship, it is a statistical descriptor.
-A functional relationship is defined by measurands, measurands in – measurands out.
-How do you measure “n” physically?
-What units is it measured in?
Right there is the problem .. LEARN TO READ
They don’t claim it produces the right answer; it produces AN ANSWER, and whether or not it is correct depends on the measurement function F and the actual distribution. They even give examples with problems, and how you might address them, if you actually read.
Geoff’s post is about “measurement function F” if you want to follow it in the NIST manual.
They discuss all that because they are not eco activist whack jobs or climate scientists and the actual right answer matters to them not the preferred answer to fit a narrative.
His use of the NIST Calculator is a classic example of garbage-in-garbage-out.
I was thinking of the claim that a million monkeys pounding on a typewriter will eventually output the writings of Shakespeare.
Yes, GIGO!
In budgerigar’s case, getting anything but garbage out is just as impossible as monkeys writing Shakespeare.
LdB said: “They don’t claim it produces the right answer it produces AN ANSWER whether or not it is correct depends on the measurement function F and the actual distribution.”
First, I think you’re misunderstanding what I mean by “correct”. All I mean by that is that it produces u(y) which is the standard uncertainty of the measurement model Y. Does Geoff, you, or anyone else willing give their yay or nay accept that?
Second, are you suggesting that the NIST uncertainty machine does not produce the correct result when the measurement model is Y = Σ[(1/n)*x_i, 1, n] or more concretely Y = (1/2)*x0 + (1/2)*x1, but that it works in all other cases?
Not a measurement model, this is beyond goofy.
“ measurement model is Y = Σ[(1/n)*x_i, 1, n]”
As MC points out, this is *NOT* a measurement model!
It is a statistical description of a population.
Let me see you plug those numbers into the UM. It won’t work, and I’ll tell you why.
The UM is designed to give the uncertainty in your measurements of something, which are then plugged into the measurement model.
Look at the thermal expansion coefficient example.
To measure the coefficient of linear thermal expansion of a cylindrical copper bar, the length L0 = 1.4999m of the bar was measured with the bar at temperature T0 = 288.15K, and then again at temperature T1 = 373.10K, yielding L1 = 1.5021m.
The measurement model is
A= (L1 – L0) / (L0(T1 – T0))
Their setup looks like this:
That can’t be done with your equation, because it is not a measurement model. The measurement model for the temperature data set would look something like
L0 = “03-MAR-2021”
L1 = “ASN00023034”
T = L0 somethingsomething L1
It can’t be done, because there’s no relationship between the date and the station, and the temp.
JS said: “Let me see you plug those numbers into the UM.”
Select 4 inputs for simplicity…
y = 0.25*x0 + 0.25*x1 + 0.25*x2 + 0.25*x3
let x0 = 11±1, x1 = 12±1, x2 = 13±1, x3=13±1
The result is y = 12.25 and u(y) = 0.5
I don’t know how UM got u(y) = 0.5. This is not standard uncertainty.
I used three different uncertainty calculators and they each got 2.
Doing it manually I get sqrt(1+1+1+1) = 2
go here:
https://nicoco007.github.io/Propagation-of-Uncertainty-Calculator/
or here
https://hub-binder.mybinder.ovh/user/sandiapsl-suncal-web-xa6cvj36/voila/render/Uncertainty%20Propagation.ipynb?token=9oioyqxxSI–5MfMvfuVaA
go here:
https://thomaslin2020.github.io/uncertainty/#/
I’ve even attached a screenshot from one of them. NIST is dividing the standard uncertainty by 4.
That is *NOT* standard uncertainty.
This is where you keep going wrong! (as do they)
Not a surprise considering their temperature averaging example uses sigma/root(N).
Does the equation need to be entered as (x1+x2+x3+x4)/4?
I screwed up the formula. I forgot the parenthesis.
But yes, it’s because they divide by four – thinking they are finding the uncertainty of the average when all they are getting is the average uncertainty. They are *NOT* the same thing.
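For anyone trying to reconcile the 0.5 and the 2, here is a minimal Monte Carlo sketch using only the stated inputs and uncertainties. It takes no position on which measurement model is appropriate for temperatures; it simply shows what each model gives when the errors are independent, and what the mean gives when the errors are fully shared.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([11.0, 12.0, 13.0, 13.0])   # the stated inputs
u = 1.0                                  # stated standard uncertainty of each input
trials = 200_000

# Model 1: y = (x0 + x1 + x2 + x3) / 4 with independent errors.
e = rng.normal(0.0, u, size=(trials, 4))
print("mean, independent errors:", round((x + e).mean(axis=1).std(), 3))   # ~ 0.5

# Model 2: y = x0 + x1 + x2 + x3 with independent errors.
print("sum,  independent errors:", round((x + e).sum(axis=1).std(), 3))    # ~ 2.0

# Model 3: y = mean again, but all four inputs share the same error each trial.
e_shared = rng.normal(0.0, u, size=(trials, 1))
print("mean, fully shared error:", round((x + e_shared).mean(axis=1).std(), 3))  # ~ 1.0
```

Which of these three situations best describes a set of historic temperature readings is, of course, the point in dispute in this thread.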
You are misusing the GUM procedures. The GUM does *NOT* say that you lower the uncertainty by dividing by N.
We have been over this before. From the NIST manual:
“”The NIST Uncertainty Machine (https://uncertainty.nist.gov/) is a Web Based software application to evaluate the measurement uncertainty associated with an output quantity defined by a measurement model of the form y = f(x1,…, xn).””
Please notice the terms “measurement uncertainty” and “measurement model”. A measurement model IS NOT calculating a mean of unrelated measurements. A measurement model is “length + width” or “π•r•r” or something similar.
A mean can provide a better estimate of a “true value” if all values and errors are random (i.e., a normal distribution) and if and only if the same thing is being measured multiple times by the same device.
Section 2 goes on to say:
“”Section 12, beginning on Page 27, shows how the NIST Uncertainty Machine may also be used to produce the elements needed for a Monte Carlo evaluation of uncertainty for a multivariate measurand:””
Please note the term “multivariate measurand”! This doesn’t refer to a mean of many independent measurements of different measurands. It is referring to a measurement that has several underlying measurements such as the volume of a cylinder or the volume of a conical device.
Please quit cherry picking formulas without studying and understanding the assumptions that must be met to use them.
These apply to a known distribution.
There is no issue with using this sort of analysis on measurements with purely random errors. Treating lots of systematic errors like random errors is the issue.
Robert B said: “These apply to a known distribution.”
The NIST uncertainty machine certainly does require one. The manual even says so. But the GUM only requires that the standard uncertainty u(xi) be known. There is no requirement that the distribution of xi itself be known.
Robert B said: “There is no issue with using this sort of analysis on measurements with purely random errors. Treating lots of systematic errors like random errors is the issue.”
NIST TN 1297 says that there are components of uncertainty arising from systematic effects and from random effects, and that both can be evaluated by either the type A or type B procedure in the GUM. You would then combine all components of uncertainty, regardless of whether they are systematic or random, using “the law of propagation of uncertainty” in appendix A (which happens to be the same as GUM section 5). If each observation has a different systematic effect, as might be the case if they are from independent instruments, then you can model that as a random variable (GUM C.2.2) since it would have a probability distribution. In that case you can treat the set of systematic effects as if they were one random effect. Don’t hear what isn’t being said. It isn’t being said that all systematic effects can be treated this way. Nor is it being said that these observations don’t also have a systematic effect that is common to them all.
Nice word salad.
Do you understand any of this? Does “the law of propagation of uncertainty” mean that they are speaking from authority?
It’s about calculating the error of a function when the variables have a random error. The scientists and engineers that you want to argue with (belittle?) have been doing it for years in their work. It’s clear that you have read it for the first time this evening.
I’m not sure that I can dumb it down enough for you, but I’ll give it a go.
I can tell what the outside temperature is, sheltered from sun and wind, by the clothes that I need to wear. Under 17°C and I’ll need a light second layer over my shirt. Over 19°C and anything other than a shirt is too warm. Similar for woolly socks, beanie, scarf etc. Quite a lot of uncertainty, but I can assume the errors are a symmetrical distribution (including the many systematic errors from my dubious calibration) and if I do the measurement 10 000 times I’ll get the outside temperature to 0.02°C. I can justify it using the arguments above but it’s still effing stupid.
Robert B said: “Do you understand any of this?”
I would never be so foolish as to claim that I understand all of it. But yes, I do understand some of it.
Robert B said: “Does “the law of propagation of uncertainty” mean that they are speaking from authority?”
No. It means they are speaking from the equation σ_z^2 = Σ[(∂f/∂w_i)^2 * σ_i^2, 1, N] + 2 * ΣΣ[(∂f/∂w_i) * (∂f/∂w_j) * σ_i * σ_j * ρ_ij, i+1, N, 1, N-1]. They call this equation the “law of propagation of uncertainty”. Some texts call it the “law of propagation of error”.
Robert B said: “The scientists and engineers that you want to argue with (belittle?) have been doing it for years in their work.”
I’m not arguing with the scientists and engineers that developed uncertainty. And I’m certainly not belittling them or anyone else. Why would I? I happily accept the GUM and NIST contributions to the field.
Robert B said: “It’s clear that you have read it for the first time this evening.”
I’ve been studying the GUM ever since I was told that I had to use it early last year. I’ve read it “cover-to-cover” multiple times since then. It’s a lot to take in.
Robert B said: “Quite a lot of uncertainty but I can assume errors are a symmetrical distribution (including the many systematic errors from my dubious calibration) and if I do the measurement 10 000 times I’ll get the outside temperature to 0.02°C.”
I’m guessing that was sarcasm. Right?
When are you going to stop telling lies about what you’ve read in the GUM?
You lie a lot. Very doubtful that you understood anything at all that you have read.
I can gather from others that you have used this source to cut and paste sections as if you are arguing a point, many times before. That all you can add is “They call this equation the “law of propagation of uncertainty”. Some texts call it the “law of propagation of error” outs you as not spending even an hour to grasp any of the concepts – or it’s completely beyond your capabilities.
Belittling of Geoff with a childish comment, and of everyone else reading this with the childish stunt of cutting and pasting complex material to pretend that you have put forward an insightful argument. You get caught out and you double down on doing it again. Nearly everyone reading this might not have delved deep into the theory, but they have done some propagation of errors for at least simple functions and so can appreciate the arguments much better than a twit who merely cuts and pastes as if doing a high-school assignment.
“That’s sarcasm right”. That’s how I feel when I see a change in measured average ocean temperatures of a thousandth of a degree being taken seriously because 10 000 measurements automatically reduce the uncertainty by a factor of 100 – or didn’t you realize that is what you are defending?
Robert B said: “Very doubtful that you understood anything at all that you have read.”
I would never be foolish enough to say that I understand everything in the GUM. But I can confidently say that I understood some of it.
Robert B said: “I can gather from others that you have used this source to cut and paste sections as if you are arguing a point, many times before.”
Correct. I use the GUM all of the time. I was told I had to. Geoff cited it in his previous article in the series so I feel like it is a source that Geoff might consider seriously.
Robert B said: “Belittling of Geoff with a childish comment”
Geoff, be honest…brutally honest if you must. Have I offended you in anyway?
Robert B said: ““That’s sarcasm right” That’s how I feel…”
That was a serious question. I’m terrible at sarcasm. I’m so bad, in fact, that my family often calls me Sheldon. I honestly thought it was sarcasm because assessing the temperature by repeated evaluations of the clothing you wear seemed absurd to me. But I’ve learned I should always ask and not assume lest I be accused of putting words in people’s mouths.
You’re terrible at a lot of things.
What’s so hard to understand?
I can also assume that the LAW of large numbers makes it correct. I can also cut and paste equations to support my assertion, when they are merely the theory behind proper error propagation. Arguing that I must disagree with the theory if I disagree with how it’s applied to a specific case is offensive – pigeon on a chess board stuff.
Yes. I am terrible at a lot of things. Picking up on sarcasm is the least of them.
The GUM is hard to understand.
I thought the law of propagation of uncertainty made it correct. Anyway, it seems like you are super knowledgeable when it comes to the GUM. Maybe you could help us go through some examples.
Let y = (1/2)x1 + (1/2)x2, u(x1) = 0.5, and u(x2) = 0.5. Using the methods, procedures, and symbology from the GUM compute u_c(y). I’d be very interested in seeing how you approach the problem if you don’t mind.
Another round of “stump the professor”?
Go play with the NIST web page instead.
Where does the 1/2 come from? What measurement model are you using?
I am having a difficult time coming up with a measurement model where the output would be half of each measurement added together.
If you are trying to justify this as an example of an average then the actual measurements are x1 and x2. The 1/2 is meaningless. The uncertainties then add by root-sum-square.
To use the UM correctly there must be a function f{x} that gives y. In our case, what is the f{“25-March-1996”} that gives 13.4C?
Short answer: there isn’t one. The UM is unsuitable for our purpose.
I’m not sure I’m understanding the intent of using a date as an input. Can you clarify what the goal is here?
Temperatures are time-varying. In order to have a value to evaluate you must specify a point in time where it will exist.
It’s easy to write y = f(X1…XN), it is far more difficult to actually evaluate X1 … XN. If you can’t evaluate them then you can’t calculate “y”.
You nailed it!
You STILL don’t divide each component in the relationship by the number of elements in order to determine overall uncertainty!
“ If each observation has a different systematic effect like might be the case if they are from independent instruments then you can model that as a random variable (GUM C.2.2) since it would have a probability distribution. “
You are STILL trying to say that this applies to all situations! It ONLY applies if you have multiple measurements of the same thing. If you have multiple measurements of different things you don’t even have a true value which the random variable probability distribution can center on!
Nick Stokes has now hopped onto this same train of stupidity, see below.
Have you ever seen Nick Stokes and bdgrx in the same room at the same time? 🤨
Fortunately, no.
bdgwx,
Define “correct results”.
Geoff S
Correct as in given measurement model Y the NIST uncertainty machine will compute u(Y) that represents the uncertainty of Y.
Correct as in given measurement model Y the procedure described in section 5 can be used to compute u(Y) that represents the uncertainty of Y.
Correct as in given measurement model Y the procedure defined in JCGM 101:2008 (monte carlo) can be used to compute u(Y) that represents the uncertainty of Y.
Correct as in given example E2 in NIST TN 1900 the uncertainty of the average of the Tmax observations u(Tmax_avg) = s/sqrt(m) = 0.872 C.
By your repetition of this garbage over and over, at this point you are just a nutter.
No, NIST doesn’t say that is the “right answer”; they expressly tell you that in the sections above. It is “an answer”; whether it is right or not needs proper evaluation of the measurement function F and the distribution.
It isn’t that hard to understand just read it again.
Just to confirm:
You do not accept the NIST uncertainty machine as being capable of computing u(y) of the measurement model Y. Yes/No?
You do not accept that the methods and procedures in the GUM are adequate to compute u(y) of the measurement model Y. Yes/No?
You put garbage into the NIST web page, you get garbage out of the NIST web page.
Clown, do you think a computer program can tell you the measurement uncertainty of a system for which it knows nothing about the nature or distribution of the measurement errors, let alone what it’s supposed to be measuring?
Yes. The NIST uncertainty machine is capable of doing that. I reviewed the source code and I don’t see anything in there specific to a particular measurement model. That is not surprising since it would be rather odd to program it that way, as it would then be limited to only those measurement models programmed. That’s one of the points of the tool. That is, it works for any measurement model as long as it can be defined with an R expression.
The problem here is with the operator, who chooses cluelessness.
It’s been pointed out to you MANY times that the NIST machine has no input available for a data set consisting of different measurements of different things, only multiple measurements of the same thing. It doesn’t even offer a way to enter skewness or kurtosis – primary statistical descriptors for a data set consisting of multiple measurements of different things using different devices.
The NIST machine simply can’t handle data sets generated by multiple measurements of different things. Standard statistical descriptions like mean and standard deviation can’t do it either!
It only works when you have multiple measurements of the same thing.
The average of a quantity of temperatures is *NOT* a measurement model, it is the calculation of a statistical description of a probability distribution.
From the GUM:
Y = f (X1, X2, …, XN ) Eq. (1)
4.1.5 The estimated standard deviation associated with the output estimate or measurement result y, termed combined standard uncertainty and denoted by uc(y), is determined from the estimated standard deviation associated with each input estimate xi, termed standard uncertainty and denoted by u(xi) (bolding mine, tg)
5.1.2 The combined standard uncertainty u_c(y) is the positive square root of the combined variance u_c^2 ( y), which is given by
u_c^2(y) = Σ (∂f/∂x_i)^2 * u^2(x_i), where the sum runs from i = 1 to N. Eq. (10)
You’ve been given this multiple times. When you are doing an average the function Y is:
Y = f(x1, x2, …, xN, N)
The undertainty of each component is considered separately.
Thus the uncertainty formula for temperatures becomes
u_c^2(y) = u^2(x1) + u^2(x2) + … + u^2(xN) + u^2(N)
You continue to want to find the uncertainty of x1/N, x2/N, etc when that is *NOT* what the GUM says. You add the uncertainties of each individual component, including N.
Look up how they handle the relationship between the mass of a spring, gravity, extension, etc.
m = kX/g
They find the uncertainty of each component, “k”, “X”, and “g”. They do *NOT* divide the uncertainty of “k” and the uncertainty of “X” by g when determining the total uncertainty. The constant “g” does *NOT* lower the uncertainty of “k” and “X”! Which is what you are trying to do by trying to find the uncertainty of x1/N, x2/N, etc.
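For concreteness, here is a minimal sketch, with hypothetical values rather than any figures from the GUM or BOM, of how Eq. (10) evaluates the spring model m = kX/g when each of k, X and g carries its own standard uncertainty.

```python
import numpy as np

# Hypothetical values and standard uncertainties for the spring model m = k*X/g.
k, u_k = 150.0, 2.0      # spring constant, N/m
X, u_X = 0.065, 0.001    # extension, m
g, u_g = 9.81, 0.01      # local gravity, m/s^2

m = k * X / g

# GUM Eq. (10): u_c^2(m) = sum over inputs of (partial derivative)^2 * u^2(input).
dm_dk = X / g
dm_dX = k / g
dm_dg = -k * X / g**2

u_m = np.sqrt((dm_dk * u_k)**2 + (dm_dX * u_X)**2 + (dm_dg * u_g)**2)
print(f"m = {m:.4f} kg, u_c(m) = {u_m:.4f} kg")
```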
He doesn’t care, reality doesn’t give him the answer he wants/needs to see.
I don’t know how long he’s been commenting here, but he seems like a basement keyboard warrior and this is how he derives satisfaction.
He is also a dyed-in-the-wool data mannipulator who believes it is possible and ethical to modify old temperature data for “biases”.
This has been an interesting subthread. I was told that I had to use the methods and procedures described in the GUM for assessing uncertainty. Naturally I was expecting an overwhelming yes response to my questions. Yet here we are. Not a single person has said they are willing to accept the evaluation techniques described in the GUM for assessing uncertainty or the NIST uncertainty machine. In fact, there has been a surprising amount of resistance here so far at least.
So this begs the question…since BOM expressly relied heavily on the methods and procedures in the GUM do you accept any of the uncertainty estimates whatsoever from the Near Surface Air Temperature Measurement Uncertainty report? If no, then what methods and procedures do you recommend using?
How about an adjudged uncertainty for the measurement you are taking? Every single measurement has that. They add up. They don’t go away.
Notice how bgwxyz has had exactly zero to say about Geoff’s distribution graphs, instead he endlessly spams this root(N) garbage.
Perhaps you can go into detail. Considering pg. 34 for the PRT uncertainty in the isolated case as starting point what would you do differently? How would you compute the final “Combined Standard Uncertainty”?
Read Pat Frank 2010 and skip down to 2.3.2 Case 3b.
Once you’ve read and understood that, you have permission to speak from all of us.
I’ve read it many times. I’ve even had a few conversations with Dr. Frank regarding it. Thank you for allowing me to discuss this with you.
Considering pg. 34 for the PRT uncertainty in the isolated case as starting point what would you do differently? How would you compute the final “Combined Standard Uncertainty”?
Re-read the paper (again), because somehow you’re asking (again) how to do what the paper tells you to do. Maybe it’s willful helplessness?
Moreover, skip to this part:
“Thus when calculating a measurement mean of temperatures appended with an adjudged constant average uncertainty, the uncertainty does not diminish as 1/sqrt(N).”
CC said: “Re-read the paper (again), because somehow you’re asking (again) how to do what the paper tells you to do.”
Are you saying you would scrap all of the uncertainty analysis on pg. 34 and replace it with σ’u from Frank 2010 pg. 974?
Dr. Frank said: “Thus when calculating a measurement mean of temperatures appended with an adjudged constant average uncertainty, the uncertainty does not diminish as 1/sqrt(N).”
Let’s talk about this. Where did Dr. Frank get the formula σ’u = sqrt[N * σ’_noise_avg^2 / (N-1)]? I’m asking because he cites Bevington 2003 pg. 58 with w_i = 1. Yet that formula is nowhere to be found on pg. 58 or anywhere in Bevington. I assumed that he derived it from equation 4.22, but I don’t see how, since I cannot replicate it, nor does he show his work. And he repeats this formula in multiple places. Bellman and I have asked him multiple times where he gets it and all we get is either deflections and diversions or silence. And as best we can tell it is inconsistent with the formulas in the GUM and his own reference, Bevington, itself.
Again?!?? Let’s not and say we did…
Here are some excerpts from the BOM report.
“The last measurand applies to aggregated data sets across many stations and over extended periods. This aggregation further mitigates random errors and is suitable for use in determining changes in trends and overall climatic effects. These are measurements constructed from aggregating large data sets where there is supporting evidence or experience of the performance of the observation systems. Typically, these will be for groups of stations with several years operation and where there are nearby locations to verify the quality of the observations.”
“The last measurand applies to aggregated data sets across many stations and over extended periods. This aggregation further mitigates random errors “
How can it do that? This is the very same problem *YOU* exhibit. When aggregating “many stations” there is simply no guarantee that you will wind up with the identically distributed distribution necessary for random errors to totally cancel. You may get root-sum-square cancellation, but that still grows the uncertainty with each station added; it just doesn’t add up as fast as direct addition. BUT IT DOES *NOT* REDUCE THE TOTAL UNCERTAINTY.
Concerning inspection of the measurement stations:
“The Inspection Process aims to verify if the sensor and electronics are performing within specification. This involves the comparison of the field instrument with a transfer reference. The Inspector makes a judgement on whether the sensor is in good condition and reporting reliably based on this comparison and the physical condition of the sensor. If the inspection differences is less than or equal to 0.3°C, there is no statistically detectable change in the instrument and it will be left in place. A difference of greater than the tolerance of either 0.4 or 0.5°C (see Table 2) implies a high likelihood the sensor is faulty, and it is replaced. In the case of an observed difference of between 0.3°C and the tolerance, the inspector has the authority and training to decide if the instrument needs replacement. This discretion is to allow for false negatives due to the observing conditions. ” (bolding mine, tg)
This means the actual uncertainty that *must* be assumed for each station is at least +/- 0.3C.
Yet in Table 8 the BOM gives an uncertainty factor ranging from +/- .16C to +/- .23C.
And in Table 9, they show the long term measurement uncertainties as ranging from +/- 0.09C to +/- 0.14C.
As I said, this is no different from what you are trying to push here: that multiple measurements of different things can decrease uncertainty. That is a physical impossibility as well as a violation of the requirements for statistical analysis.
Debunked almost as soon as published….
https://noconsensus.wordpress.com/2011/08/08/a-more-detailed-reply-to-pat-frank-part-1/
Criticized is NOT “debunked.” As of today, Pat Frank’s analysis stands unrefuted.
“unrefuted?”
I spoon-fed you a link that does just that. And FYI, the paper is essentially uncited – citation being the way we use scientific literature to advance science – as his alt.statistics are justifiably rejected in superterranea. Even counting his multiple pocket pool self cites….
Word salad time!
Oh, malarky!
Take just one criticism: “This standard deviation is the difference between true temperatures at different stations so it is again ‘weather noise’.”
Since when is natural variation, i.e. standard deviation, “weather noise”? This is just a meaningless criticism, not a refutation of anything!
Take this one: “Again, ‘s’ is defined as a ‘magnitude’ uncertainty and is incorrectly calculated.”
The correct calculation is never given. So this is just one more unsupported criticism and is not an actual refutation of anything.
Or this one: “This is where Pat makes the claims that stationary noise variances are unjustified. This is key to the conclusions and equations presented throughout the paper. I also believe that has problems but they are more subtle than the more obvious problems of the previous equations. ”
Again, a criticism with no actual backup shown. WHY is the claim unjustified? Just saying it is means nothing.
As usual, you are just spouting religious dogma and claiming it to be the “truth”. This isn’t a religious issue, it’s a scientific issue. Actual refutation must be shown.
BTW, did you read *ANY* of Pat’s rebuttal messages at all?
“BTW, did you read *ANY* of Pat’s rebuttal messages at all?”
Long ago. He, and a very few of you here are in your own alt.world. Complete with claiming “victory” and walking away.
The best metric is the Darwinian lack of response to his papers. In spite of the Dr. Evil conspiracy theories that have all climate scientists in a secret cabal, if he had any valid points, the above grounders would be running over each other to cite him. Instead, he channels the regular Simpson’s scene where Homer says something so ridiculous that 15 seconds of silence ensues, followed by Maggie changing the subject.
Oh, BTW, I just read him going full QAnon in another thread, on COVID. W.r.t. his health, he seems fine from the last pic I saw. A little sun, some crunches, easy peasy…
What is “QAnon”, blob? Sounds like ALAnon…
In other words you have absolutely NOTHING to offer in rebuttal. Just more ad hominems.
ROFL!!
It is unrefuted because climate science fails to understand they are dealing with measurements. They wouldn’t last a minute in a job where tolerances of individual parts are a required part of maintaining employment.
They would measure the tolerance of 16 bearings from all over the machine, average them and work out the uncertainty, and then claim they knew the tolerance of each bearing to 0.001 inches.
I have used several examples like that.
One was a mechanic who ran around the shop measuring all the brake rotors and then told the customer his rotors needed replacing because the average measurement showed too much wear.
Or the quality guy tracking door production who had a mean of 7′ and an SD of +/- 2″ yet divided the SD by the sqrt of the sample size and told the boss that they were putting out doors of 7′ that were accurate to +/-0.01″.
JS said: “They would measure the tolerance of 16 bearings from all over the machine, average them and work out the uncertainty, and then claim they knew the tolerance of each bearing to 0.001 inches.”
No they wouldn’t. No one is saying that u(X_avg) can be used as an estimate of u(X_i). They are completely different things.
The only statement being made by scientists of all disciplines (climate or otherwise) relevant to this discussion is that u(X_avg) < u(X_i) when the sample X has correlations r(X_i, X_j) < 1. And when r(X_i, X_j) = 0 and u(x) = u(X_i) for all X_i then u(X_avg) = u(x)/sqrt(n).
The contrarians here are distorting and misrepresenting that statement to form the erroneous strawman argument that increasing n decreases u(X_i), as a way of undermining the entirety of the theory of uncertainty analysis developed over at least the last 80 years. I will say they’re doing a good job, as not a single commentor including the article author will accept the NIST uncertainty machine results or the methods and procedures defined in the GUM. They even have the guy who posted a real NIST certificate questioning the methods and procedures NIST uses.
Liar, no one is doing anything of the sort.
Looking at the Uncertainty Machine website, there is a description of the purpose of the Machine, quoted in part below.
Now, what function f is there that produces temperature observations based on some other variable? To use the Machine with these numbers must be wrong, as there is no “algorithm that, given vectors of values of the inputs, all of the same length, produces a vector of values of the output”.
JS said: “Now, what function f is there that produces temperature observations based on some other variable?”
There are countless functions that can produce a temperature output from other variables. For example you could do y = h / ((R/g) * ln(p1/p2)) to produce the mean temperature between pressure surfaces p1 and p2 with thickness h. Or you could do y = PV/nR to produce the temperature of an ideal gas given pressure (P), volume (V), gas constant (R), and moles (n). Or you could do y = (Tmin + Tmax) / 2 to produce the daily mean temperature. The possibilities are endless.
JS said: “To use the Machine with these numbers must be wrong, as there is no “algorithm that, given vectors of values of the inputs, all of the same length, produces a vector of values of the output””
This is an R thing. The only requirement for y is that it be a function that generates an output with the same vector length as its inputs. 99% of the time you’ll be plugging in scalar values as inputs and producing a single scalar output. The concept of vectors is really more of an advanced use case. And that’s it. That’s the only requirement. You are free to make y as arbitrarily complex as you want as long as it can be defined with an R expression. It can even be so complicated that it executes an algorithm with loops, decision trees, calls to other R functions, etc. To say the NIST uncertainty machine is powerful is a massive understatement.
Temperature observations are not dependent upon the day on which they are recorded. What is the function that can be applied against 1 May to give 17.3 C and against 3 May to get 19.5 C and 4 May to get 10.6 C and etc.? A computer can take 1, 3, and 4 and calculate a function f that will come arbitrarily close to 17.3, 19.5, and 10.6, but it will have no predictive power.
And I think that’s a big part of the problem.
JS said: “What is the function that can be applied against 1 May to give 17.3 C and against 3 May to get 19.5 C and 4 May to get 10.6 C and etc.?”
Assuming these temperature values are daily means you have a couple of options. The first is the simplest and most widely used.
y = (Tmin + Tmax) / 2
The second can be used when you have a much larger sample of temperature observations throughout the day. w_i is the weight given to each temperature. For example, if you have 24 hourly observations then w_i = 1 h for all T_i.
y = Σ[w_i * T_i] / Σ[w_i]
The methods and procedures defined in the GUM and the NIST uncertainty machine work perfectly fine with both cases.
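To see what those measurement models imply numerically, here is a minimal sketch with hypothetical readings and an assumed standard uncertainty, propagated through y = (Tmin + Tmax)/2 for both uncorrelated and fully correlated input errors.

```python
import numpy as np

tmin, tmax = 12.4, 27.9   # hypothetical daily readings, degrees C
u_t = 0.5                 # assumed standard uncertainty of each reading, C

y = (tmin + tmax) / 2.0

# Uncorrelated errors: u_c^2(y) = (1/2)^2 u^2(Tmin) + (1/2)^2 u^2(Tmax).
u_uncorr = 0.5 * np.sqrt(u_t**2 + u_t**2)   # ~ 0.35 C

# Fully correlated errors (same bias in both readings): contributions add linearly.
u_corr = 0.5 * (u_t + u_t)                  # 0.5 C, no reduction

print(f"y = {y:.1f} C, u (uncorrelated) = {u_uncorr:.2f} C, u (correlated) = {u_corr:.2f} C")
```

Whether the errors in Tmin and Tmax share a common bias (instrument, screen, site) or are independent determines which of the two lines applies, which is much of what is being argued over here.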
Just like Nitpick Nick, you ignore JS’ main point.
Typical of trendology.
NOTHING YOU LISTED IS AN AVERAGE EXCEPT (Tmin+Tmax)/2. And THAT IS NOT A MEASUREMENT!
NOTHING YOU LISTED IS A STANDARD DEVIATION OF THE SAMPLE MEANS!
And, once again, the NIST machine is not powerful enough to handle distributions with skewness and kurtosis. So just how powerful can it be?
If you have a skewed population the standard deviation is pretty much useless. Unless you are a climate scientist I guess.
“No they wouldn’t. No one is saying that u(X_avg) can be used as an estimate of u(X_i). They are completely different things.”
Again, you aren’t calculating the uncertainty of the average, you are calculating the average uncertainty. You get a value that, when multiplied by “N”, gives you back the total uncertainty. You already had to calculate the total uncertainty in order to divide it by “N”, so what is the purpose?
“ u(X_avg) < u(X_i) when the sample X has correlations r(X_i, X_j) < 1. And when r(X_i, X_j) = 0 and u(x) = u(X_i) for all X_i then u(X_avg) = u(x)/sqrt(n).”
Again, you are *NOT* calculating u(X_avg), you are calculating the average uncertainty.
If m = kX/g then does u^2(m) = u^2(k/g) + u^2(X/g) ???
“The contrarians here are distorting and misrepresenting that statement to form the erroneous strawman argument that increasing n decreases u(X_i) as a way of undermining the entirety of the theory of uncertainty analysis developed over at least the last 80 years.”
Stop whining. No one is saying that
They are saying that the average uncertainty is not the same thing as the uncertainty of the average.
You’ve been given example after example showing that uncertainty can’t be decreased by just dividing by a constant. The uncertainty of a constant is zero and it neither adds to nor subtracts from the total uncertainty of the average.
The example that breaks your method is if all individual uncertainties are the same. In that case u(X_avg) will equal u(X_i) no matter how large your sample is. That’s because you are calculating the average uncertainty and not the uncertainty of the average!
“No they wouldn’t. No one is saying that u(X_avg) can be used as an estimate of u(X_i). They are completely different things.”
Then what use is u(X_avg)? How can it tell you if the distribution changed or not?
The uncertainty of X_avg is *NOT* the standard deviation of the sample means. It is not the standard deviation of the population unless you have a normal distribution or something similar.
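For what it is worth, the point being argued in the last few comments can be put into one small simulation; the ±0.5 C figure and the station count are invented for illustration only. With independent random errors the spread of the network average does shrink as 1/sqrt(n); with an error shared by every station it does not.

set.seed(42)
n_stations <- 100
n_trials   <- 10000
u_single   <- 0.5                          # assumed standard uncertainty of one reading, C

# Case A: each station has its own independent random error
err_A <- matrix(rnorm(n_trials * n_stations, 0, u_single), n_trials, n_stations)
sd(rowMeans(err_A))                        # ~0.05 = 0.5/sqrt(100)

# Case B: one error common to every station in a trial; the average across
# stations of a common error is just that error, so averaging does not help
err_B <- rnorm(n_trials, 0, u_single)
sd(err_B)                                  # ~0.5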
“ I will say they’re doing a good job as not a single commentor including the article author will accept the NIST uncertain machine results or the methods and procedures defined in the GUM. “
Stop lying. What we won’t accept is YOUR MISUSE of the GUM.
“And when r(X_i, X_j) = 0 and u(x) = u(X_i) for all X_i then u(X_avg) = u(x)/sqrt(n).”
Again, this appears to be the standard deviation of the sample means. That is *NOT* the accuracy of the mean!
See the attached picture. The target on the right shows the situation where the standard deviation of the sample means is small but the accuracy is terrible. The target on the left shows where the standard deviation of the sample means is wide but the accuracy is good.
You keep trying to convince us that the target on the right is giving us the accuracy of the mean. It doesn’t!
He will either never understand, or refuse to understand.
“They would measure the tolerance of 16 bearings from all over the machine, average them and work out the uncertainty, and then claim they knew the tolerance of each bearing to 0.001 inches.”
That’s EXACTLY what they would do!
JM said: “Criticized is NOT “debunked.” As of today, Pat Frank’s analysis stands unrefuted.”
BOM is refuting it [1].
Lenssen et al. 2019 are refuting it [2].
Rohde et al. 2013 are refuting it [3].
Morice et al. 2020 are refuting it [4].
Brohan et al. 2006 are refuting it [5].
Huang et al. 2020 are refuting it [6].
Vose et al. 2021 are refuting it [7].
And those are only the ones I bothered mentioning. Those alone will cause this post to get moderated and delayed. Everybody is refuting Frank 2010…everybody.
Climastrologers all, who understand uncertainty no better than YOU.
Do any of these refute the assertion that the GCM’s turn into y=mx+b evaluations? If not, then they haven’t refuted anything.
TRYING to refute is not to succeed at refuting. Frank’s work stands.
Janice Moore said: “TRYING to refute is not to succeed at refuting.”
You don’t think that shoe fits on the other foot?
At any rate, where did Frank get the formula σ_avg = sqrt[N * σ^2 / (N-1)]? How do you know it’s right? Why is he combining the Folland 2001 and Hubbard 2002 uncertainties? And why did he not even attempt to propagate the instrumental uncertainty through the gridding, infilling, and spatial averaging steps? And most importantly, do you really think one publication with a mere 6 citations takes down the entirety of the field of uncertainty analysis in one fell swoop?
These questions have been answered again and again, FOOL.
Did you actually bother to read any of Frank’s answers to those criticizing his work? I’m guessing no.
Give me a break.
From just your number 2 link:
“ Since monthly temperature anomalies are strongly correlated in space, spatial interpolation methods can be used to infill sections of missing data.”
This statement alone disqualifies the entire document. Monthly temperature anomalies are *NOT* strongly correlated in space. Temperature correlation is a function of distance, terrain, geography, elevation, barometric pressure, humidity, wind, etc. Even stations as close as 20 miles apart can have different temperatures and anomalies because of intervening bodies of water, being on the east side of a mountain vs the west side, having vastly different elevations, having different humidity because of intervening land use, and even the terrain under the measurement device.
My guess is that the other links suffer from the very same assumption. Every paper I have read on homogenization of temperatures and infilling of data suffers from this. The assumption is made for convenience’s sake and not from any actual scientific fact or observation.
In other words you are just doing your typical cherry picking without actually understanding the scientific concepts being put forth and questioning their validity.
blob is in da house!
You may think it is debunked, but you’ll have to explain why the GCMs all end up with an output of y = mx + b that never ends. What do you think the error bars and uncertainty should be to encompass this kind of prediction?
Nobody will accept the uncertainties because they don’t meet the requirements of using the GUM when you have single measurements of different things with different devices.
The GUM is not a process control design document for Quality Control of manufacturing parts. It is designed to allow you to achieve the most accurate measurement of each individual part with the best uncertainty possible. That allows Quality folks to know when a machine or process is going out of control.
Using averages can and will cover up problems. Especially if the errors are random, a process can go entirely out of control when you only look at averages. Using statistics is fine, but one needs to know the assumptions and limits behind them. That is why you are having a problem convincing folks that have dealt with this of what you believe is correct. Liars figure and figures lie. Take that as your mantra and you will be on the road to being a physical scientist.
You were told to use the GUM for assessing the uncertainty of MULTIPLE MEASUREMENTS OF THE SAME THING!
Yet you insist on trying to apply the GUM to multiple measurements of different things.
EVERYONE here accepts the GUM for assessing uncertainty of MULTIPLE MEASUREMENTS OF THE SAME THING! Same for the NIST machine – as long as the measurements define a probability distribution that is identically distributed.
Stop your whining.
No I do not accept that “the NIST uncertainty machine, which uses the technique specified in the GUM document (JCGM 100:2008) produces the correct result”
You have missed the whole point of the article. All sources of uncertainty must be considered, otherwise it is a pointless exercise. NIST know what they are doing and I am sure the calculator is robust, but the onus is still on the user to use it properly, with robust assumptions and intellectual honesty.
Below are two examples of NIST’s work and attention to detail re UoM, 2 numbers accompanied by 4 paragraphs of explanation.
number 2
Your own certificate says it uses the methods in the JCGM guide. The NIST uncertainty machine uses those exact same methods. How do you reconcile accepting one but not the other? Is there a bug in their software they aren’t telling us about? Is it not really using the methods in the JCGM guide despite advertising that it does? Or is there another reason you don’t accept it?
We are now 80 posts in on these two questions and not a single person accepts the NIST uncertainty machine or the methods and procedures described in JCGM 100:2008, JCGM 101:2008, or JCGM 102:2011. In fact, there is overwhelming resistance to it. That is over 80 years of statistical theory development just in the first order citations alone that is apparently rejected wholesale by the WUWT audience. Considering I was told that I had to use the GUM for uncertainty analysis by authors and commenters on WUWT this is a twist I never saw coming.
LIAR, does your trolling and sophistry know ANY bounds?
The argument is not “Are the equations and results correct?”
The argument is that you are misapplying them by using them on measurements that are NOT of the same thing with the same instrument.
James Schrumpf said: “The argument is that you are misapplying them by using them on measurements that are NOT of the same thing with the same instrument.”
Really? Where does the GUM say that for the measurement model Y?
And why is it that literally every single example provided with the NIST uncertainty machine is of measurements of different things, including one with two different temperatures?
Why does NIST TN 1900 E2 literally average multiple Tmax observations and conclude that the uncertainty of that average is σ/sqrt(n)?
8 hours later, the bgwxyz troll back at it.
What a nutter.
You’re talking about example E2. Very well. First off, there’s this:
Point 1: all of the 22 observations were made at the same site, not 1 observation at 22 different sites.
Point 2: The precision of the tmax was not improved. In fact, the initial observations were made to hundredths of a degree C, but the mean he calculated was only to tenths, as was the uncertainty.
I don’t see how any of that example supports your position.
I would like to see this example extended to determining the standard deviation of an anomaly.
You’ll also notice the lack of uncertainty analysis for the measuring device. I didn’t realize that LIG thermometers were read to one hundredth of a degree. Even then, if you notice, the temps are all rounded to the nearest 0.25 value. That means the uncertainty in measurement alone is +/- 0.25. The author did not even evaluate how this affects the average calculation. It is reminiscent of how climate science is done.
How long would these LIG thermometers have to be in order to resolve 0.01°C? Seems like the answer would have to be “very”.
The very length would actually add to the uncertainty because of gravitational impacts on the liquid, as well as hysteresis from friction between the liquid and the total area of the tube! You would need at least two different sets of graduations – one for when the temp is going up and one for when it is going down, and the graduations would not be linear either.
I’ve never seen an LIG with 0.01°C gradations, it would have to have a very limited temperature range, like 10°C total.
I did that this evening with an Australian GHCN station that had a lovely long unbroken series of observations. Turned out the standard deviation of the anomalies was the same as for the temps.
Makes sense. All an anomaly is, is y in y = x − b, where x is the temperature and b is the mean of the temps between 1981 and 2010.
JS said: “Point 1: all of the 22 observations were made at the same site, not 1 observation at 22 different sites.”
It doesn’t make a difference. Those 22 observations are no more or less different whether they were at the same site or different sites. And the GUM never says the measurement model Y can only ever accept input quantities X1, …, Xn that are of the same thing. In fact, nearly all of the examples out there are of measurement models where Y accepts input quantities that are different.
JS said: “Point 2: The precision of the tmax was not improved. In fact, the initial observations were made to hundredths of a degree C, but the mean he calculated was only to tenths, as was the uncertainty.”
Yep. u(T_i) does not improve with more observations. But u(T_avg) does. Specifically in this example notice that σ_Ti = 4.1 C, but σ_T_avg = 0.872 C. That is the salient point here. And if we happen to use a reasonable type B estimate of say u(T_i) = 0.5 C as opposed to the type A evaluation of 4.1 C then we get u(T_avg) = 0.5/sqrt(22) = 0.1 C assuming the T_i observations are uncorrelated.
BTW…just because the NIST instrument had a resolution of 0.01 C does not mean that u(T_i) = 0.01 C. In fact, because the observations all seem to be in multiples of 0.25 we know that u(T_i) >= 0.25 C.
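The arithmetic quoted above is easy to check in R (4.1 C and 0.5 C are the figures assumed in the comment, not new data):

s_type_A <- 4.1        # type A standard deviation of the 22 Tmax readings
s_type_A / sqrt(22)    # 0.874 C, matching the ~0.872 C quoted (difference is rounding of 4.1)

u_type_B <- 0.5        # assumed type B standard uncertainty of a single reading
u_type_B / sqrt(22)    # 0.107 C, i.e. roughly 0.1 C, under the uncorrelated assumption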
As the droid loops back around and repeats the same old same old.
From the NIST document.
You said:
The item you call σ_Ti is the standard deviation of the population. What you call σ_T_avg is actually the Standard Error, or more accurately, the Standard Deviation of the Sample Means (SEM).
Please note that these only apply when you have the entire population of data. NIST has chosen to make this choice.
The error NIST has made is that once you declare that you have a population, you no longer need a Sample Mean or an SEM. They have no purpose. Sampling is not done when you already know the population. It is not needed. The relation between SEM and σ is:
SEM = SD / √N,
where N = Sample Size.
Remember the SEM only tells you the interval around the estimated mean (sample mean) where the true mean may lie. If you already know the true mean, what is the purpose of calculating an SEM?
This is basic statistics for using sampling. Do you need some references?
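For reference, the relation SEM = SD/√N being discussed here can be checked with a quick simulation (the population values below are invented):

set.seed(1)
pop <- rnorm(1e6, mean = 11.4, sd = 0.8)     # hypothetical "population" of readings
N   <- 25
sample_means <- replicate(1e4, mean(sample(pop, N)))
sd(sample_means)                             # ~0.16: observed spread of the sample means
0.8 / sqrt(N)                                # 0.16: the SEM formula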
“ In fact, nearly all of the examples out there are of measurement models where Y accepts input quantities that are different.”
The input quantities are measurands! The values of those measurands are determined through multiple measurements of those measurands.
X1 .. Xn have to be done the same way! They are determined through multiple measurements of each, and those measurements have a propagated uncertainty.
And these measurands are part of a functional relationship.
y = f(x_i)
You use measurands to determine an output which is itself an estimated measurand.
AN AVERAGE IS NOT A MEASURAND. It is *NOT* measured in any fashion, either directly or by using a functional relationship.
Let me reiterate this as well. The standard deviation of the sample means does *NOT* determine the accuracy of the mean! It only determines the interval within which the population mean might lie. There is nothing in that statistical description that implies any kind of accuracy of the mean. The accuracy of that mean *has* to be determined via propagation of the uncertainty of the individual elements in either the sample or the population.
Trying to say that the standard deviation of the sample means is the accuracy of the calculated mean is a fraud and only shows that you have no basic understanding of metrology.
Nick Stokes is now pushing this fraudulent accounting trick, see below.
JS,
Please note that in Example 2 not a single entry shows an uncertainty.
The example goes on to state:
“This so-called measurement error model (Freedman et al., 2007) may be specialized further by assuming that E1, …, Em are modeled as independent random variables with the same Gaussian distribution with mean 0 and standard deviation σ. In these circumstances, the {ti} will be like a sample from a Gaussian distribution with mean τ and standard deviation σ (both unknown).”
“Assuming that the calibration uncertainty is negligible by comparison with the other uncertainty components, and that no other significant sources of uncertainty are in play, then the common end-point of several alternative analyses is a scaled and shifted Student’s t distribution as full characterization of the uncertainty associated with τ.”
These are assumptions that simply can *NOT* be made when you are using widely separated field measurement stations of unknown calibration. In essence their assumptions are such as to make this into a multiple measurement of the same thing situation. In other words they assumed their conclusion as their premise ==> circular reasoning at its finest!
And, of course, bdgwx just sucks it up as gospel because he simply doesn’t understand the basic concepts of measurement!
I think everyone agrees the Machine produces the correct results, but that’s not the issue here.
The issue is that a series of temperature observations, such as from the GHCN stations, do NOT have a measurement model y = f(x1, x2, …, xn), which is CLEARLY stated to be the sine qua non of using the machine.
IOW, there is no temp = f{station_id, date, time} function in play, which is a requirement of using the Machine.
Global Ave=(ΣwₖTₖ)/(Σwₖ)
where T are anomalies for a month and w a set of (area) weights, prescribed by geometry. k ranges over stations.
JS said: “IOW, there is no temp = f{station_id, date, time} function in play, which is a requirement of using the Machine.”
So do f(T1, …, Tn, w1, …, wn) instead where T1…Tn are temperatures and w1…wn are weights that you have already acquired by other means and enter y = (ΣwₖTₖ)/(Σwₖ) instead.
BTW…you can create your own R functions. There is nothing stopping you from creating the gettemp(station_id, date, time) function and calling it in the R expression for Y (you’ll probably need to run the software on your own machine to actually do this though). The Allende example shows how to create and call your own R functions.
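As a sketch of what such an R expression could look like (the station anomalies, latitudes, and cosine-of-latitude weights below are invented placeholders, not BOM or GHCN data nor the actual weighting scheme):

stations <- data.frame(
  T_anom = c(0.4, -0.1, 0.7, 0.2, -0.3),   # hypothetical monthly anomalies, C
  lat    = c(-35, -27, -12, -42, -31)      # hypothetical station latitudes, deg
)
w <- cos(stations$lat * pi / 180)          # assumed area weights, for illustration only

global_avg <- function(temps, w) sum(w * temps) / sum(w)   # y = (ΣwₖTₖ)/(Σwₖ)
global_avg(stations$T_anom, w)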
“So do f(T1, …, Tn, w1, …, wn) instead where T1…Tn are temperatures and w1…wn are weights that you have already acquired by other means and enter y = (ΣwₖTₖ)/(Σwₖ) instead.”
You punted! Since the weighting would have multiple, interactive factors such as latitude, humidity, elevation, barometer, wind, terrain, geography, etc., of which some are time varying, how do you actually determine the weighting? If some of the terms are time varying, how do you even combine them for different locations? If the barometer reading is f(t) and you have two locations f_1(t1) and f_2(t2), just how do you relate those in a functional relationship?
If it were as simple as you make it, all the GCMs would be accurate and similar.
Paragraph 1 is “no”, because what is your x that you can plug into that equation to get the temp for that station and date? The equation is not a measurement model; it doesn’t give you y for some value of x and any other parameters.
I’m not sure what the concern is here. The x’s are the inputs into the model. The function y = f(x1,…xn) operates on those inputs and produces an output just like any other measurement model. If the x’s are temperature quantities then plug in temperature values as inputs not unlike how the Thermal example handles it for T0 and T1.
I tried it with 4 inputs t1 = 15.1, t2 = 15.5, t3 = 15.6, t4 = 14.9, u(t1) = u(t2) = u(t3) = u(t4) = 0.2 where y = (t1+t2+t3+t4)/4 and t1-t4 were gaussian. The result was y = 15.275 and u(y) = 0.1. It worked perfectly when I did it.
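For comparison, a plain Monte Carlo run in R outside the NIST machine, with the same assumed inputs, gives the same numbers:

set.seed(7)
mu <- c(15.1, 15.5, 15.6, 14.9)            # the four assumed input values
u  <- 0.2                                  # assumed standard uncertainty of each
sims <- replicate(1e5, mean(rnorm(4, mean = mu, sd = u)))
mean(sims)                                 # ~15.275
sd(sims)                                   # ~0.1 = 0.2/sqrt(4), for independent inputs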
“I’m not sure what the concern is here. The x’s are the inputs into the model. The function y = f(x1,…xn) operates on those inputs and produces an output just like any other measurement model.”
You are not calculating standard uncertainty. You are calculating an average uncertainty, not the uncertainty of the average. So is the NIST machine.
If your formula is t1+t2+t3+t4 then the uncertainties add by root-sum-square and come out to be +/- 0.4.
If you want the average uncertainty then it becomes +/- 0.1 (0.4/4)
That is nothing more than mental masturbation.
0.1 x 4 = 0.4 ==> the actual uncertainty you would get if these were four boards laid end-to-end.
The average uncertainty is *NOT* the uncertainty of the average. The uncertainty of the average is +/- 0.4. That’s how unsure you would be of four boards laid end to end — AN ACTUAL FUNCTIONAL RELATIONSHIP of measurands being used to determine another measurand.
“If your formula is t1+t2+t3+t4 then the uncertainties add by root-sum-square and come out to be +/- 0.4.”
The formula isn’t t1+t2+t3+t4. It is (t1+t2+t3+t4)/4, exactly as he said. You keep making elementary errors like this, and it is very hard to make progress.
This happens all of the time. Some may think I’m being overly patronizing here, but I honestly think some of the posters here believe sum(X) and avg(X) are interchangeable. I’m honestly not convinced that they know there is a difference. And we’ve already seen adding terms of different units, calculating ∂f/∂x = 1 when f = avg(X), the conflation of u(ΣXi/N) with Σu(Xi)/N, and omission of crucial parentheses. And that is just this post. I’ve seen many other elementary math mistakes, including the belief that Σa^2 = (Σa)^2. I just don’t even respond most of the time anymore. It’s not that I’m intending to ignore anyone. I just know that my responses to them have no effect and only cause a chain reaction of increasingly off-topic, disinformation-laden content, strawman arguments, twisting and misrepresenting what I say, etc. It’s not fair to Geoff Sherrington, James Schrumpf, etc., who seem genuinely interested in having a respectful discussion.
Are you psychic? This is nonsense. And you are the one who goes around calling people “contrarians” because they refuse to buy into this air temperature averaging garbage.
So in other words, you are not interested in the truth, the cost of acknowledging the truth is too high.
And what a hypocrite you are, the very first comment to Geoff’s article was from YOU, whining about people not buying into your inane nonsense about uncertainty.
You are delusional!
The average uncertainty is *NOT* the uncertainty of the average. The standard deviation of the sample means is not the uncertainty of the mean.
It is truly just that simple!
It was a typo, you twit.
No, it was the basis for saying “The uncertainty of the average is +/- 0.4.”
Just bad arithmetic.
The average uncertainty is meaningless. It tells you nothing. It is mental masturbation.
Thanks for additional information and another great article for WUWT.
I look forward to number three — this may be the best series of articles here in 2022. I have a lot of questions. I realize it’s easier to ask questions than to answer them, but I’m a lazy bum, so I ask a lot of questions.
Could you explain in simpler language how uncertainty can be estimated for infilled (guessed) numbers and adjusted numbers, both guesses of what the temperature would have been if measured correctly in the first place?
I think I understand how raw numbers can have a calculated uncertainty, but the averages use mainly, or entirely, adjusted and infilled numbers.
How can we know the BOM-calculated national average temperature is the actual average of the adjusted and infilled data — maybe there is a thumb on the scale, or a fudge factor? Does anyone ever check this?
How well are repeated BOM adjustments documented, especially the homogenization, pasteurization and whatever other adjustments are made?
How good is documentation of changes of weather station instrumentation and weather station locations?
Are there overlaps of data from old and new instruments after new instruments are installed?
Finally, how can we calculate uncertainty for the BOM employees who compile the data, knowing they were hired because they believe in a coming climate crisis, have predicted a coming climate crisis, and would like to see as much global warming as possible?
Their beliefs must bias their adjustments to increase the warming rate, even years after measurements were made — and I know that is happening — to make their predictions appear to be more accurate.
How can uncertainty be caluclated for the people compiling the data?
“How can uncertainty be caluclated for the people compiling the data?”
What level of uncertainty was there to that spelling error? How could I caluclate it? ; )
I should have wrote composting the data, not compiling!
Caluciated is not the first word I ever spelled wrong.
and will not the last. Can’t see well. Can’t type well.
None of my faults are my fault.
And my dog ate my papers.
I am also pubic school edumacated.
But thanks for reading.
I forgot to ask how fast mercury thermometers reacted to a new daily TMAX and a new daily TMIN, versus Platinum Resistance thermometry that could record 5 second highs and lows, which would increase the TMIN to TMAX daily range versus mercury thermometers.
According to Burt & Podesta 2020 the response times for mercury thermometers is 69 ± 10 s and 211 ± 44 s for ventilated and unventilated configurations respectively vs 35 ± 5 s and 95 ± 17 s for the PRT. Note that PRT instrument packages measure every 10 s with a running 60 s average of the measurements recorded. The Tmin and Tmax are selected from the sample 60 s averages.
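A rough sketch of why the time constant and the 60 s running mean matter for a recorded Tmax. The 2-minute, 1.5 C spike below is invented; the 35 s and 69 s time constants are the ventilated figures quoted above, and a simple first-order lag stands in for the real sensor response:

dt <- 1                                    # 1 s time step
t  <- seq(0, 1800, by = dt)                # 30 minutes
T_air <- 20 + ifelse(t >= 600 & t < 720, 1.5, 0)   # assumed 2-minute, 1.5 C spike

lag1 <- function(x, tau, dt) {             # first-order sensor response
  y <- numeric(length(x)); y[1] <- x[1]
  a <- dt / (tau + dt)
  for (i in 2:length(x)) y[i] <- y[i - 1] + a * (x[i] - y[i - 1])
  y
}

T_prt <- lag1(T_air, 35, dt)               # ventilated PRT
T_lig <- lag1(T_air, 69, dt)               # ventilated mercury

# PRT practice: sample every 10 s, record the running 60 s (6-sample) mean
prt_10s <- T_prt[seq(1, length(T_prt), by = 10)]
prt_60s <- sapply(6:length(prt_10s), function(i) mean(prt_10s[(i - 5):i]))

c(max(T_air), max(T_prt), max(T_lig), max(prt_60s))   # recorded Tmax differs by sensor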
Thank you
“The test results revealed a fourfold difference in response times between different sensors: none of the PRTs tested met the CIMO response time guideline at a ventilation speed of 1 m·s⁻¹ assumed typical of passively ventilated thermometer shields such as Stevenson-type thermometer screens…
Although the relative “sensitivity” of meteorological thermometry was first experimentally examined almost 150 years ago (Symons, 1875), it is perhaps surprising how little recent attention has been paid within the meteorological community to determining and optimising the response times of air temperature sensors. This is despite acknowledged recognition of the importance of sensor response time on meteorological temperature measurements, particularly maximum and minimum air temperatures, and the implications of differing sensor response times within a heterogeneous meteorological network are significant. A study by Lin and Hubbard (2008) noted instrumental biases in daily maximum and minimum air temperatures and diurnal temperature range resulting from variations in sampling rates, averaging algorithms and sensor time constants (implying degradation in between-site comparisons, whether in real-time or within long-term records), and recommended that such variations be reduced as far as possible to minimise resulting uncertainties in climatological datasets. ”
This is from a 2-year-old paper. Did you not realise that it justifies people’s belief that a big problem was ignored, rather than showing that meteorological institutions are on top of things?
Yes. In fact, I’ve known about Lin & Hubbard 2008 and other related publications for quite some time. It was only just recently that I learned of Burt & Podesta 2020.
bdgwx,
Having read a fair amount about response times of electronic thermometers versus in-glass, I have seen no concerns big enough for a mention in these articles. BOM have attempted thermal mass designs to match responses, but there is maybe no best design because conditions change through the day. Overall, it is better to equip with PRT and be aware of small detriments. I do feel that more reports of comparisons would be good. Forensic investigations indicate that something worth a look has happened after about 1995 at many/most stations, with a bias to unexplained higher than expected and erratic temperature patterns. See Part Three in prep. Geoff S
The real problem is trying to “adjust” prior read and recorded data to match readings with new devices. If there is any difference, the old record should be terminated and a new record with a new device ID should be started. I’m sorry the math folks won’t be able to manufacture “long” records by doing this. But it does keep the temperature records in pristine shape, rather than trying to guess what the old readings “should have been”. I can’t think of anything less scientific than that.
Agreed. The differences in response times don’t seem like they would create a big enough effect to matter much. Though I’m certainly open to the possibility if someone knows otherwise.
Assessments of uncertainty in regards to infilling can be done via bootstrapping. For example, Berkeley Earth uses jackknife resampling to assess the uncertainty [Rohde et al. 2013]. In simple terms this is a form of data denial in which the model is intentionally denied data repeatedly and compared with the other iterations and the non-denied form. Bootstrapping is more of a top-down approach to uncertainty assessment, so it should be effective in identifying uncertainty in a broader sense even if individual sources of uncertainty are hard to quantify or are even unknown, as might be the case with human-induced biases. Unfortunately, I’m not sure how ACORN does it specifically since I’ve not studied their technique.
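A minimal sketch of the leave-one-out idea, applied to nothing fancier than the mean of some invented anomaly values; Berkeley Earth’s actual implementation is, of course, far more involved:

set.seed(3)
x <- rnorm(30, mean = 0.5, sd = 0.8)              # hypothetical anomalies
n <- length(x)
theta_hat <- mean(x)                              # full-sample estimate
theta_loo <- sapply(1:n, function(i) mean(x[-i])) # leave-one-out ("data denial") estimates
jack_se   <- sqrt((n - 1) / n * sum((theta_loo - mean(theta_loo))^2))
c(theta_hat, jack_se, sd(x) / sqrt(n))            # for a plain mean, jackknife SE equals s/sqrt(n)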
Bootstrapping sounds like baloney
Geoff says he will discuss bootstrapping in part 3. We’ll have to wait until then to see if he can convince the WUWT audience of its usefulness.
In this context, “the WUWT audience” means “bgwxyz”.
Bootstrapping can not reduce uncertainty. It can be used to create a more normal distribution from a sample mean distribution that is not normal. The best that can do is provide a better estimate of the sampled mean and the uncertainty in how close the sample mean is to the true mean.
It can not reduce the uncertainty of individual measurements or change the combined uncertainty. Your example of Berkeley is a great example of how the SEM (and a recalculated one at that) is used as the uncertainty of measurements. Read the following from NIH.
“The SEM is a measure of precision for an estimated population mean. SD is a measure of data variability around mean of a sample of population. Unlike SD, SEM is not a descriptive statistics and should not be used as such. However, many authors incorrectly use SEM as a descriptive statistics to summarize the variability in their data because it is less than the SD, implying incorrectly that their measurements are more precise. The SEM is correctly used only to indicate the precision of estimated mean of population.”
The first two sentences are the most important. Climate scientists ignore this with a passion! Ask yourself if the temperature measurements are samples or the entire population of all temperatures. That will give you an idea of what you are calculating!
Jim,
Yes, a widespread problem. I do not claim to be an expert metrologist, but some of these errors are school level howlers. Consequences of poor education? Geoff S
Berkeley Earth uses the supposed measurement device resolution as its first estimate of uncertainty for that instrument.
NO ONE that knows anything about metrology believes that fiction!
No, you can’t reduce uncertainty post hoc. It would be a magical world if you could, but you can’t.
Maybe I missed a conversation. Who said bootstrapping reduces uncertainty?
Your silly data torture suggestions do. You can’t “evaluate uncertainty” after the fact with statistical games, you clown, because the problem with uncertainty is *you don’t know what’s going on in the system*.
You can torture the data to make it more continuous and pretty, but you’re only adding uncertainty. And you can pretend you know what the uncertainty is based on statistical gimmicks of how the data is coerced to a specification, but that’s not real either, because it’s circular reasoning. You’re declaring you know what it is because it conforms to your expectations, not what it really is.
Please go take a physical sciences class and argue with your professor on this. Embarrass yourself some more.
I didn’t invent bootstrapping or jackknife resampling. The jackknife technique was developed by Quenouille and Tukey with the bootstrap generalized later but before I was born. And as far as I know they cannot be used to reduce uncertainty; only evaluate it. Finally, I have no intention of challenging these techniques, any professor that is teaching them, or any scientist that is using them. I accept them and find them useful. And based on the fact that Geoff plans on discussing them in his next article I’m assuming Geoff accepts them too.
Everyone knows you didn’t invent anything. You’ve only memorized how to do something you were taught in R or Python or something else and you think you’re gleaning insights when all you’re doing is spouting talking points and doing busy work.
You can’t infill data without INCREASING uncertainty. You can’t create precision by simply assuming errors cancel, because you have to prove they do firstly and secondly you can’t get below the precision of the instrumentation. For many periods that means to the nearest 1°C just because of the gradations, before considering other uncertainties created by parallax errors, drift, etc.
There is no way to get an average to a precision lower than that lower bound.
CC said: “You can’t infill data without INCREASING uncertainty.”
That’s good to hear. I’ve been trying to convince the WUWT audience of this for awhile now. Maybe you can help me convince them?
CC said: “There is no way to get an average to a precision lower than that lower bound.”
The NIST, the GUM, and every other statistics text I’ve seen disagree. They all say that for n uncorrelated quantities sharing the same standard uncertainty u, the uncertainty of the average is u/sqrt(n). In fact NIST even has an example where they average several Tmax observations and assess the uncertainty of that average as s/sqrt(n), meaning that the uncertainty decreases as n increases.
Insert coin, pull string, repeat…
Give a reference for this. Even NIST can make a mistake. This sure isn’t in their calculator manual.
No clown, you simply don’t understand what they’re saying. Your N uncorrelated quantities aren’t measuring the same thing. Once you grasp this, you might be normal.
I shall remain skeptical that this is possible.
CC said: “Your N uncorrelated quantities aren’t measuring the same thing.”
That doesn’t make any difference. The only requirement specified in the GUM is that the measurement model Y be a function that outputs a value. The function can be as simple as Y = X1-X2 or so complicated that it can’t even be stated explicitly. That’s it. There is no requirement that the input quantities be of the same thing. In fact, many of the examples in the GUM and all of the examples in the NIST uncertainty machine manual are of input quantities of different things. And the GUM even says the input quantities are to be considered measurands themselves which can depend upon other quantities. This arbitrary requirement that the input quantities be of the same thing is completely made up by the contrarian posters here. Don’t take my word for it. Read the GUM and prove this for yourself.
You’re a LIAR, your “word” is trash, troll.
I have read the GUM and understand what it is about. It is obvious you have not.
These should tell you that a measurand is something that you can physically measure or calculate by using a functional relationship based on other physical measurements.
An average is neither a functional relationship nor a measurement. It is a statistical calculation describing the central value of a distribution.
An average may be used to find a “true value” from samples of several measurements of the same thing with the same device. This is based on the assumption that random errors can cancel. Generally one must prove that the distribution is normal for the assumption to hold.
If samples of an experiment are taken, the standard deviation can be used as an uncertainty, but it doesn’t replace also finding other uncertainties that may apply. Multiple experiments should use the same equipment, procedures, and conditions in order to minimize systematic errors.
The GUM is principally about a measurand. It does not address how to manipulate multiple measurements of different things into cohesive averages to create “trends” used for assessment. Because of this, the variances used in each determination of a different mean for different lengths of time are extremely important criteria to know.
If you want to spend your time productively, I would recommend working on the distributions used for determining GAT and propagate the variances through each calculation of various daily, monthly, annual, and global calculations.
“That doesn’t make any difference.”
It makes the ENTIRE DIFFERENCE! Y must be a FUNCTIONAL RELATIONSHIP! Other measurements are used to calculate Y.
“N” is not a measurement. Therefore an average is not a FUNCTIONAL RELATIONSHIP OF MEASUREMENTS!
“There is no requirement that the input quantities be of the same thing.”
The UNCERTAINTIES of each measurand used in the functional relationship must be individually determined. If the measurand value is found using repeated measurements of the same thing, then the mean of the measurements and the standard deviation of the measurements are the appropriate estimators. If the measurand value is found using measurements of different things, then the uncertainty of the total is either the direct addition of the individual uncertainties or the root-sum-square addition of them.
Not a single measurement model anywhere in the GUM states that a measurement model is an “average” of measurements of different things.
Prove us wrong. Give us a quote from the GUM!
” And the GUM even says the input quantities are to be considered measurands themselves which can depend upon other quantities.”
You just shot yourself in the foot! They are all measurands. And the value of a measurand is a MEASUREMENT!
Dividing by the sqrt(n) ONLY works when you have identically distributed probability distributions, e.g. a normal distribution. In that case the true value is the average and the uncertainty is the standard deviation of the population, i.e. you divide by sqrt(N).
When you do *NOT* have an identically distributed probability distribution, and temperatures jammed together will *NOT* get you such, you cannot assume the average is the true value and the uncertainty is the standard deviation of the population!
How many times do you have to read this before it sinks in?
The uncertainty of the mean decreases, but the precision of the mean is not increased.
In my Australian GHCN station series there were 165 good observations with a mean of 11.4C and a SD of 0.8. Dividing 0.8 by the square root of 165 gives 0.06, so the uncertainty of the mean is 0.06, but the mean itself hasn’t changed.
JS said: “In my Australian GHCN station series there were 165 good observations with a mean of 11.4C and a SD of 0.8. Dividing 0.8 by the square root of 165 gives 0.06, so the uncertainty of the mean is 0.06, but the mean itself hasn’t changed.”
That is a type A evaluation described in GUM section 4 and NIST TN 1900 E2.
Alternatively you could follow the procedure in section 5 using the measurement model y = Σ[Tx, 1, N] / N and equation 10 and using u(Tx) = 0.1 C from pg. 32 of the BOM uncertainty analysis.
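For concreteness, if one assumed the 165 observations were uncorrelated and each carried that 0.1 C standard uncertainty, the equation (10) arithmetic would be as below; independence is, of course, exactly the assumption under dispute in this thread:

u_Tx <- 0.1                 # assumed standard uncertainty of one observation, C
N    <- 165
u_Tx / sqrt(N)              # ~0.008 C under the uncorrelated assumption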
“uncertainty of the mean”
This is the standard deviation of the mean. It is *NOT* the accuracy of the mean which is what “uncertainty of the mean” *should* imply. It is why the term “uncertainty of the mean” is so misleading. It has led you down the primrose path of actually believing it tells you something about the accuracy of the mean.
Not a single element used in this calculation is the uncertainty associated with the individual elements. It is a perfect example of statisticians who have been trained out of textbooks where the data values have 100% accuracy. No “stated value +/- uncertainty”. Just “stated value”.
Your data set is 165 observations where the standard deviation is 0.8. Where is the propagation of uncertainty from the 165 observations? What *is* the uncertainty of each of the 165 observations? We know the BOM only calibrates to +/- 0.3C so there is an in-built assumption of +/- 0.3C associated with each observation. Where is this propagated?
As usual, the unstated assumption in all of this is the typical climate science assumption of “ALL UNCERTAINTY CANCELS”. Thus you are only left with stated values where you can pretend the data stated values are 100% accurate and the standard deviation is a measure of accuracy.
Just assume whatever you need to assume in order to fit your already determined conclusion!
This is NOT the uncertainty of the mean.
Not only is the precision of the mean not increased, the accuracy of the mean is not increased either.
Keep saying it, maybe, just maybe, it might sink in that averages can in no way, shape or form arrive at a precision that WAS NOT MEASURED. If it could, you could measure to the nearest yard, do it enough times, and get micrometer precision. It isn’t even a logical conclusion. Heck, tell a master machinist that he only needs a $25 sliding caliper and he can add a decimal digit with 10 measurements of different things and add two decimal digits with 100 measurements of different things.
Don’t I wish?
Notice how he comes in spurts? He argues the same BS, then is owned, then ghosts everyone and doesn’t reply. Then he tries a different thread.
Trolling plain and simple.
Bootstrapping and jackknifing are both techniques to decrease the standard error of the sample means.
THEY DO NOTHING ABOUT THE UNCERTAINTY OF THE MEAN YOU CALCULATE FROM THE SAMPLE MEANS!
If we’re talking about calculating a continent-wide average temperature of Australia, I would firstly be asking –
“what the actual fvck is the point of such a useless exercise?”
The AU continent covers (in the north) tropical territory where if the heat doesn’t kill you the humidity will, through vast interior sub-tropical desert areas in the centre and west, where if you get bogged in the sand you’ll die of dehydration in a day or 2 of running out of drinking water, to temperate zone shores of the Southern Ocean in the south, where the Tasmanian locals grow hair all over like Chewbacca to stay warm, and then in the east to the high mountain plains areas that have more area under snow in winter than Switzerland does.
So in all this, ordinary taxpaying punters are supposed to find it useful in their lives to know to hundredths of 1 degree C what the average temperature was calculated for over the whole Great Southern Land for any particular day?
“what the actual fvck is the point of such a useless exercise?”
You used the “F” word you naughty boy!
How else to justify Nut Zero policies unless there is CAGW that Nut Zero is supposed to prevent? There is no CAGW, but you need an average temperature to know that.
+1 degree C. doesn’t mean much:
If it comes in the colder months and mainly in TMIN, +1 degree C is good news?
If it comes in the warmer months and mainly in TMAX, +1 degree C is bad news?
“AUSTRALIA’S COLDER-THAN-AVERAGE WINTER
Australia’s Bureau of Meteorology isn’t a reliable source, their ‘warm-mongering’ forecasts are routinely proven wrong.
This is because the agency appears hellbent on inflating temperatures. They do this by, 1) ignoring the Urban Heat Island effect, and 2) limiting the minimum temperature readings that certain weather stations can reach — artificially boosting the averages.
Still, despite the ‘inflating’, despite the BoM claiming that Australia experienced a warmer-than-average month of August, the continent’s winter ‘officially’ finished -0.03C below the multidecadal norm (below the 1991-2020 base, to be exact).”
SOURCE:
Australia’s Colder-Than-Average Winter; Iceland’s “Historically Cold” Summer; + Antarctica Plunges To -80.5C (-112.9F) – Electroverse
See graph with latest UAH result as well. Geoff S
http://www.geoffstuff.com/uahsep2022.jpg
Not quite. Alarmists demand use of an anomaly. Sort of an average on steroids.
An anomaly is derived by calculating a daily/monthly/annual average for the current period and subtracting a historical 30-year base period average from it.
Of course, adjusting temperatures that comprise the base period and separately adjusting the current period temperatures destroys any possible value an “average” provides.
That is, before including any alleged meter² area “average” beyond individual thermometers.
Various governmental meteorological entities believe rural/urban sites, different altitudes, geography, and leeward/windward exposures are all interchangeable.
Once these vastly different temperature sources are aggregated, governmental meteorological entities believe that individual site/instrument differences average out… even the systematic errors.
The use of anomalies is defended by saying it allows comparing different locations based on temperature differences. The problem is that the anomalies lose data. A 1C difference in Fairbanks, AK is far different, at least as far as climate is concerned, than a 1C difference in Miami. If the average in Fairbanks is 15C and in Miami is 20C then a 1C difference is 7% for Fairbanks and 5% for Miami, a significant difference that is hidden by using anomalies.
If the GAT is meant to give an expectation for what is happening to the global climate then it fails miserably! The uncertainty as to what it implies for any specific location on the globe is so large as to make it meaningless.
Richard, it’s not just the measured uncertainties, it’s the unknown unknowns that also affect the accuracy of the data. This discussion and the Australian BOM cover several sources of measurement uncertainties but the nonrandom measurement biases are not well addressed. These include the performance of the screen, site changes (more buildings or parking lots, change of ground cover, changes in prevailing winds, etc.) and other issues that are called “lab bias” because we didn’t know why we got a different answer from another researcher when our precision and uncertainty were supposedly known. Certainly the author has a valid point that the adjustments to the raw data of up to 4C need to be explained by the BOM and their uncertainty included in the total.
Richard,
Re stats for guesses. IMO, which is worth nothing, there are no valid measurement uncertainties for guesses because guesses are not measurements that can be traced to a primary reference.
Raw numbers. See Part Three in prep re whether raw is raw.
Several colleagues and I try to see if station data are validly compiled to a correct national data set of the type that Goddard and Hadley use for global averages. It is hard to monitor complexities like area weighting factors. See Part Three for how annual data compiled in 1995 agrees with ditto in 2021.
Metadata and adjustment theory are miserable. Strong objections from me to detecting break points in time series by statistics alone, when supporting metadata are absent. Objections to using adjustments based on neighbour stations hundreds of km distant.
Overlaps. BOM reports quote WMO references re desirability of several years of overlap data after a change. The actual data are seldom shown in public documents. No idea what is done.
I have no specific complaints about BOM people re science affected by beliefs. All of us have belief effects. Geoff S
But THEY get to do the infilling, adjusting and compiling the national average temperature and you don’t
Geoff, a few points:
LIG thermometers—giving them ±0.5C total uncertainty is optimistic if they are graduated in integer degrees (not to mention operator reading issues).
It is important to remember that the GUM is not intended to be the final word on uncertainty; instead it documents the standard way to express uncertainty as the title indicates.
Platinum RTDs have additional uncertainty if the standard ASTM or ISO manufacturing tolerance curves are utilized instead of calibrating the temperature coefficient of resistance curves of individual RTDs.
And for RTDs it gets even worse. You aren’t measuring resistance directly either, but a voltage drop across the RTD. This drop is directly related to the resistance AND the source voltage, so you have to take into account the uncertainty of the source voltage as well. On top of that, there’s the resolution of the Analog to Digital Converter (ADC). Depending on how many bits wide it is, you can only resolve down to a certain minimal voltage. For example, a 10-bit ADC (typical) can resolve a 0-5 volt range only in steps of about 4.9 millivolts. You then have to figure out how this translates to temperature, across the entire curve. It is a big mistake to assume that these uncertainties are all random.
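A rough numerical sketch of that point, assuming a bare Pt100 driven by a 1 mA current source straight into a 10-bit, 0-5 V ADC. The circuit is deliberately naive and chosen only to show how the numbers stack up; real designs use bridges, amplification and more ADC bits, but the quantisation step still belongs in the uncertainty budget:

vref  <- 5                       # ADC full scale, volts
nbits <- 10
lsb_v <- vref / 2^nbits          # ~0.00488 V, i.e. about 4.9 mV per count

i_exc    <- 0.001                # assumed 1 mA excitation current
dR_per_C <- 0.385                # nominal Pt100 sensitivity near 0 C, ohms per C
dV_per_C <- i_exc * dR_per_C     # ~0.385 mV per C across the bare RTD

lsb_v / dV_per_C                 # ~12.7 C per ADC count in this naive circuit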
And if it is a 2-, 3-, or 4-wire RTD makes a big difference in the uncertainty.
MC,
I opine personally from reading public material that BOM are aware of technical performance of such instruments and have done a good job. Geoff S
Yes, hopefully they are competent enough to correctly handle basic measurement uncertainty like these.
Climate change is the new religion, given that you have to believe, despite the dearth of evidence, that man is causing it and, what’s more, in a dangerous way.
Ask any devout man or woman of faith whether they have any uncertainty and they will most likely answer no.
Do they [BOM, Met Office, NOAA etc etc] believe they are wrong?
No.
fretslider,
Part Three in prep might stimulate some views on such questions, but be aware that they are not scientific issues. Geoff S
There’s a problem with the “Tmin After AWS” data. This histogram shows bimodal data, which is a red flag that there is a measurement problem needing further investigation. Further, the “Tmin After” histogram is clearly not a normal distribution, and thus will violate the Central Limit Theorem.
A better plot would be a combination of Tmin Before & Tmin After to show the similarity or lack thereof betwixt the Before & After histograms.
Central Limit Theorem??? I’ve poor knowledge of numerical theory, but we’re evaluating a machine’s (+operator’s) uncertainty, which IMHO should be error bars: +/-X, +X/-Y, or even mX+b.
If we were measuring several times a second, and have some large set of measurements for one condition … yes, we could use the Central Limit Theorem to justify reporting the average of that set. However, our data would still be subject to the error bars. I don’t see how CLT can be used across a range of conditions. CLT can justify adding digits but not precision. If your tool measures +/-0.5C, and CLT finds the mean value is X.XX, you can say X.XX +/-0.5C, but only on a set of data for one condition.
Lil-Mike said: “If we were measuring several times a second, and have some large set of measurements for one condition … yes, we could use the Central Limit Theorem to justify reporting the average of that set.”
The CLT is not used to justify any particular measurement model Y, whether it be an average or otherwise. What the CLT says is that even if the input quantities Xi are not normally distributed, the output quantity Y will be approximately normal.
Lil-Mike said: “CLT can justify adding digits but not precision. If your tool measures +/-0.5C, and CLT finds the mean value is X.XX, you can say X.XX +/-0.5C, but only on a set of data for one condition.”
That’s not what the CLT says. What the CLT says is that even if that ±0.5 C figure were not a normal distribution, the measurement model Y computing the average will be approximately normal given enough input quantities Xi. Furthermore, starting with the measurement model Y = Σ[c_i * x_i, 1, N] as stated in section G.2.1 and letting c = c_i = 1/N for all c_i and σ = σ(x_i) for all x_i, we have σ^2(Y) = Σ[c^2*σ^2, 1, N] = Σ[(1/N)^2*σ^2, 1, N] = N * (1/N)^2 * σ^2 = σ^2 / N, therefore σ(Y) = σ/sqrt(N) as noted in G.2.1. Notice that when c = 1/N the measurement model Y is none other than an average of the input quantities Xi.
Alternatively you can use the “general law of error propagation” in section E.3 or one of the idealized forms like equation 10 in section 5 to solve for the uncertainty of any measurement model Y whether it be an average or something significantly more complex.
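A quick simulation of that CLT statement, using a deliberately skewed exponential input purely for illustration:

set.seed(11)
N <- 30
avgs <- replicate(2e4, mean(rexp(N, rate = 1)))   # exponential inputs: sd = 1, heavily skewed
sd(avgs)                                          # ~0.18 = 1/sqrt(30)
hist(avgs, breaks = 60)                           # roughly bell-shaped despite the skewed inputs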
The world’s foremost expert on uncertainty pontificates again, y’all better listen up.
bdgwx aims to learn about many subjects until he knows nothing about everything
An “average” is not a MEASUREMENT MODEL. An average is a statistical calculation of the data in a distribution. Find a reference that says an average is a measurement model and provide it. I want to read it!
Jim,
Quite so. Geoff S
It is when Y = Σ[(1/N)*x_i, 1, N].
Bullshite from a bullshiter.
Show a reference that uses an average of different things and using different devices.
Your “function” general format is pretty. Now tell us what assumptions must be met to use it when calculating uncertainty in multiple temperature stations. Is it a functional MEASUREMENT relationship?
This is *NOT* a measurement model. It is a statistical calculation.
The *MEASUREMENT* model is y = Σx_i.
The uncertainty of that model is u^2(y) = Σ u^2(x_i)
“N” is *NOT* a measurement. Measurements have uncertainty. What uncertainty does “N” have?
Average, standard deviation, variance, kurtosis, median, range, etc are all STATISTICAL DESCRIPTORS of a population, or more specifically a probability distribution. The are *not* measurements!
OK you palookas, I’ve taken this matter into my own hands. I just now wrote an email to the POC for the Machine at NIST.gov, in which I explained both sides of the argument here, and asked him to simply tell us who’s right — or if we’re both right, or both wrong.
As soon as I hear something back, I’ll post it here.
James,
If you have my email address or you ask Mods for it, I would be pleased to join in with this query and answer. Geoff S
Sadly, the email was rejected as “forbidden.” I don’t understand why an email address to a non-classified government email system would be so rudely treated, but maybe Mr. Possolo is no longer with the Institute.
I’ll keep looking for a contact.
What a shame!
According to the UM manual, there must be a function f{X0...Xn} that gives y.
Y = Σ[(1/N)*x_i, 1, N] does not do that.
Given that TMAX on 5 May 2020 was 13.6 at the NIST station, define the function f{date, location} that gives us 13.6.
JS said: “Given that TMAX on 5 May 2020 was 13.6 at the NIST station, define the function f{date, location} that gives ùs 13.6.”
y = f(t1) = t1
let t1 = 13.6
Alternatively you can define your own R function to look the temperature up.
y = f(date, location) = gettemp(date, location)
let date = 2020-05-05 and location = NIST in Washington D.C.
It’s probably easier to just acquire the inputs manually and plug them in directly into your measurement model.
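A sketch of what such a user-defined lookup might look like in R; the obs table and gettemp() below are hypothetical and are not part of the NIST software:

obs <- data.frame(
  date     = as.Date(c("2020-05-04", "2020-05-05", "2020-05-06")),
  location = c("NIST", "NIST", "NIST"),
  tmax     = c(12.9, 13.6, 14.2)            # invented values apart from the quoted 13.6
)

gettemp <- function(d, loc) obs$tmax[obs$date == as.Date(d) & obs$location == loc]

y <- function(date, location) gettemp(date, location)   # measurement model as a function
y("2020-05-05", "NIST")                                  # 13.6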
First, you just violated Significant Digit rules by adding a decimal point.
This was used in a chemistry course at Washington University in St. Louis but is no longer on the web. You can find other similar things on the web.
“Significant Figures: The number of digits used to express a measured or calculated quantity.
By using significant figures, we can show how precise a number is. If we express a number beyond the place to which we have actually measured (and are therefore certain of), we compromise the integrity of what this number is representing. It is important after learning and understanding significant figures to use them properly throughout your scientific career.”
Note that uncertainty limits should not be called error bars because they are not the same.
Jim,
Yes, fundamental to understanding. See the discussion of Pat Frank’s propagation of uncertainty in general climate models. In climate research circles there is much ignorance of basic concepts like accuracy, precision, error, uncertainty. I have no idea how to fix this other than by repeated airing, with examples. Sometimes this feels as futile as kicking treacle. Geoff S
You are correct. What this shows is that mathematicians have been used in lieu of learning the necessary metrological knowledge of how to deal with measurements. I have yet to see a math text used in a math curriculum that discusses measurements as a REAL thing and how to treat them.
When I went to school, one of the civil engineering classes taught us how to survey property lines and road paths. Until you actually do measurements with old analog compasses, measuring sticks, etc., you don’t realize how far off you can be in just one mile. Errors and uncertainty compound and there is nothing you can do to fix it with mathematics. In my electrical engineering curriculum you had to learn about the tolerances/sensitivity/economics of different devices. It was simple to specify 1% resistors, but their cost would ruin a project when 10% resistors and the right design would suffice.
Jim,
Our surveying graduate son has reminded me of closure many times over the years. Surveying errors can cause land boundary disputes involving many $$$. There are documented regulations for reporting surveying measurements. You can be dismissed from the profession for failing to comply. Geoff S
Way back in the day, I used to be the “Tax Mapper” for five counties in West Virginia, in which job I would take the surveyor’s measurements and plot them on the County Assessor’s big plat books to show how the land was subdivided.
After a while I got to recognize certain names and match them with the competence of their work. Some guys could hand in bearings and distances that closed up no matter how complicated a parcel was marked out, while one guy couldn’t plot a square without the lines crossing twice.
It was a very interesting job.
That’s hilarious!
If I may, Mr. Sherrington, I’m inserting here the youtube video of Pat Frank’s discussion of the propagation of uncertainty. This thread (along with your fine article) is an excellent place to learn, but, it is sadly being so polluted by bdgwx that a non-technical person may become confused (and that is, no doubt, precisely, what bdgwx is trying to do). This video might help such a reader understand.
Janice,
Dr Frank and I have occasionally corresponded over the years. Apart from his awareness-raising paper on GCM error propagation, he has a few other papers on uncertainty and error. I asked him before I wrote my essays if he would like to be involved, but have had no response. I hope that he is not in ill health.
A major contribution from Pat is his stressing of the differences between error and uncertainty. It is a distinction that can get complex and arguable, so I was asking him if he could provide a concise few words.
Before I finish Part Three of this essay series, I hope I can contact him.
Geoff S
The before-and-afters have another problem: I do not know whether the data changed from in-glass to PRT on the date I assumed. It is described by BOM as the date on which the Automatic Weather Station with a PRT became the primary instrument. I have queried BOM; no answer yet. More on distributions coming in Part Three. Geoff S
@Geoff says: – ” There are some known sources of uncertainty that are not covered, or perhaps not covered adequately, in these BOM tables. One of the largest sources is triggered by a change in the site of the screen. ”
— A little food for thought.
Over land we (still) have an average of 38W/m² evaporation; over water it’s 100W/m².
During a drought, evaporation tends to zero with every day without rain.
A theoretical feedback of 38-100W/m² due to missing evaporation;
and if that is not enough for you to explain thermal runaway over land areas,
you can consult the clouds.
With less or no evaporation there are fewer or no clouds, keeping an average of -19W/m² away from the surface. A terrible, deadly vicious circle for nature.
Huge amounts of solar energy are used on the earth’s surface but also within the troposphere for the non-temperature-increasing process of evaporation. Even if this energy is released again through condensation in the atmosphere – it is an energy transport in the right direction to space. The production of clouds during condensation also increases the cooling efficiency of water.
The more intensive this dissipating water cycle takes place, the cooler the temperature structure of air and soil will be.
In order to prevent rising temperatures over land areas, there is no alternative to an adequate supply of water, because without available water only increasing (BOM-measured) sensible heat and LW radiation are available for surface cooling.
A humanity that has drained land areas for thousands of years, and thus continues to steadily reduce the evaporation rate over one third of the land area, is surprised today that it has one foot in the desert and the other in hell, and is confronted with increasing record temperatures, droughts and crop failures.
Using the ~1% reduction in relative humidity over land…
I calculated a ~10% deficit in absolute water content and evaporation over land, roughly 6800 km³, accumulated through land-use change and the closed stomata of vegetation within the last 50-75 years alone. This can explain a warming effect of +3.5W/m² at the land surface, and also the spreading global desertification.
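As a rough arithmetic check of that +3.5W/m² figure, assuming a latent heat of vaporization of about 2.45 MJ/kg and a land area of about 1.49e8 km² (values assumed here, not taken from the comment):

```python
# Rough back-of-envelope check of the ~3.5 W/m² figure quoted above.
# Assumed constants: latent heat of vaporization ~2.45 MJ/kg near 20 C,
# land area ~1.49e8 km². The 6800 km³/y figure comes from the comment.
water_km3_per_year = 6800
mass_kg = water_km3_per_year * 1e9 * 1000    # 1 km³ = 1e9 m³, ~1000 kg per m³ of water
latent_heat_j_per_kg = 2.45e6
seconds_per_year = 365.25 * 24 * 3600
land_area_m2 = 1.49e8 * 1e6                  # 1 km² = 1e6 m²

flux_w_per_m2 = mass_kg * latent_heat_j_per_kg / seconds_per_year / land_area_m2
print(round(flux_w_per_m2, 1))               # ~3.5
```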
HUMANITY AND ESPECIALLY THE IPCC IS AS STUPID AS MY BREAD IN THE KITCHEN CUPBOARD.
The stupidity of the IPCC can now even be proven relatively “watertight”.
In their graphs on the cooling and warming causes of climate factors, the cooling factor “land-use reflectance / irrigation” (~ -0.125°C) appears for the first time in the history of the IPCC in 2020/8.
Thus, the albedo is assessed over urban areas such as cities (~1.5 million km²) and global agriculture (~48 million km²).
Global irrigation/y is ~2600 km³, with only ~1000 km³ going to evaporation and 1600 km³ lost to surface or underground runoff.
If the IPCC now, for the first time, ascribes a cooling effect to the “additional evaporation” via irrigation, that is correct…
but WHERE IS THE WARMING EFFECT that people continue to exert through additional drainage, sealing, etc.?
INNER LOGIC OF THE IPCC GRAPH: NOT AVAILABLE!
These 3.5W/m² are attributed “generously” to CO2 emissions by the IPCC,
AND SO THAT THE STUPIDITY WILL NOT DECREASE…
…continuously rising (record) temperatures have been measured (by BOM) over land for decades, and their increase is blindly ascribed to global warming caused by the GHE and climate gases,
when the only thing missing is evaporation over land, water, or intelligent water management.
BTW — With 6800 km³/y of water, plants can absorb about 25-50 Gt CO2 through photosynthesis (1m³ evapotranspiration = 3.7 – 7.4Kg CO2 absorption).
More than mankind produces annually.
There would have been no appreciable climate change since 1750 due to the burning of fossil fuels if mankind had not so mercilessly and stupidly flattened millions of km² of moors, (rain)forests, wetlands and vegetation.
The GEB / land+9L/m² graph shows how we can cool the land surface by evaporating ~1250 km³ of water, and cool the earth by clouds.
https://climateprotectionhardware.wordpress.com/
The IPCC starts with the dangerous manmade global warming conclusion and works backwards to justify it. Conclusion first. Supporting data later. Since 1988.
The CONCLUSION that Earth is a water-cooled planet could have been known 35 years ago. The fact that hardly any water can evaporate from sealed urban areas and fire-cleared forests is hopefully not a secret that the IPCC discovered just yesterday with research worth millions.
Otherwise I could get the idea that theoretically I alone could save the climate and planet much better all by myself and for free if I had the necessary position of power to do so.
It is striking that the most serious problems of global warming are linked to water (drought, floods, sea level rise,…), although I have no doubts about the GHE and the effects of climate gases.
Is it an artist’s job (I`m not a climate scientist) to point out to the IPCC that there are strategic, global concepts that reduce sea level rise while at the same time providing global regions with improved protection against drought and floods?
These comparatively inexpensive measures, in addition to many other advantages, also ensure improved cloud albedo and CO2 absorption in the regions; sea levels, the earth’s temperature and climate gas concentrations are mainly related to the volume of water that is held back over the continents annually to intensify the water cycle and evaporation.
But that doesn’t interest an IPCC. My impression is that they are more concerned with questions like:
How static can the IPCC’s mind be… and how many 5-star chefs does it take to spoil the broth?
macias,
What is more, in simple terms, it was known a long time ago that, in the way these temperatures are measured, rainfall cools. You can regress temperature against rainfall on various scales of aggregation and find a useful, statistically valid relationship. You can arrive at a “rainfall-corrected” temperature that is likely to be more meaningful than the raw temperature for a number of applications.
Colleague Dr Bill Johnstone has numerous examples on his blog bomwatch.com.au
Bill uses such regressions to assist in the discovery of temperature outliers and features like break points. Rainfall as a factor in temperature variation can explain 30% or so of the variation. There are good physics reasons to do this.
Geoff S
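For readers who want to see what such a rainfall regression looks like in practice, here is a minimal sketch with invented annual station data; the residuals stand in for the “rainfall-corrected” temperature, and with real station data the R² would play the role of the roughly 30% of explained variation mentioned above.

```python
# Minimal sketch of a rainfall-vs-temperature regression with invented data.
import numpy as np

rain_mm = np.array([420.0, 610.0, 380.0, 550.0, 700.0, 300.0, 480.0])  # hypothetical
tmax_c  = np.array([24.1, 23.2, 24.6, 23.5, 22.8, 25.0, 23.9])         # hypothetical

slope, intercept = np.polyfit(rain_mm, tmax_c, 1)      # ordinary least squares fit
predicted = slope * rain_mm + intercept
residuals = tmax_c - predicted                         # "rainfall-corrected" signal
r_squared = 1 - residuals.var() / tmax_c.var()         # share of variance explained

print(f"slope = {slope:.4f} C per mm, R^2 = {r_squared:.2f}")
```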
The classic central limit theorem (CLT) requires independently and identically distributed measurements, which is not going to apply in this case. It will be especially problematic when there are changes in siting, instruments, reflectance of the housing unit as a result of aging, changes in the environment surrounding the measurement equipment (growth of vegetation, urbanization, etc.) and many other systematic time-varying changes. There are various generalizations of the CLT, discussed for example here
https://www.datasciencecentral.com/central-limit-theorem-for-non-independent-random-variables/
Again, however, it would seem none of these could handle jumps in distributions over time.
What you have brought up is when to declare data not fit for use. The mathematicians working in climate science want to make up as many long records as they can. What is great for mathematicians is that they can artificially “reduce” spurious trends by having LONG records that cover the majority of the time they are looking at. To do so they generate new information to replace recorded data and to infill places with no data. This isn’t science. If a move or a new instrument gives disjointed data, the old record should be stopped and a new one started. Too bad if that messes with statistical analysis reliability.
I’ve asked several times for someone to mention a scientific endeavor that allows playing fast and loose with carefully measured and recorded data. No answer to date.
Tycho Brahe they ain’t!
Peter Hartley,
it would be good if WUWT had a system where the importance of an author making a comment could be rapidly assessed and taken into account. I thank you for joining in with your experience. The question of whether one can depart from IID and by how far is central to my theme and I do not have an answer yet. It continues to concern me that where there are suitable cases like parallel instruments and system overlaps, the statistical method often shows agreement within the tighter estimates of uncertainty. Something seems to be working ok. That is why I treat bootstrapping in Part Three, now in preparation.
Geoff S
You can’t discuss uncertainty without first discussing Significant Digits. These are explicit rules in science that were created so that the information contained in a measurement is not artificially inflated. Basically the measurement with the least resolution controls calculations. If your measurements are integer, then the final answer must also be shown as integers.
Statistics can not be used to increase the resolution of measurements. Sig Figs follow through to samples, means, standard deviations etc. This confuses many folks when calculating an average. You can’t carry an average out to 9 decimal digits even though your calculator can give you that many. When I have samples that are integers, the sample means must be integers. Variances and standard deviations are the same.
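A minimal sketch of the significant-figures convention described above (report the mean of integer-resolution readings at integer resolution); it simply illustrates the stated convention with invented readings and does not settle the argument that follows.

```python
# Sketch of the convention described above: report the mean of
# integer-resolution readings at the same (integer) resolution.
readings = [12, 14, 13, 15, 13]          # hypothetical integer-resolution observations

raw_mean = sum(readings) / len(readings) # calculator answer: 13.4
reported_mean = round(raw_mean)          # rounded back to the input resolution

print(raw_mean, reported_mean)           # 13.4 13
```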
bgwxyz claims over and over and over that the impossible is possible.
“When I have samples that are integers, the sample means must be integers. Variances and standard deviations are the same.”
So if I toss a coin and count heads then that’s an integer. So for 100 tosses the average number of heads per toss must be an integer? Is that what you’re saying? Seems like rubbish to me.
Coin tossing is not a measurement.
Significant digits is old fogey science
New science is more digits = more believable
Anything less than two digits is malarkey and baloney
Three digits is real science
Get with the program
+1.5 degrees C, is a tipping point?
ha ha ha, don’t make me laugh
+1.495 degrees C. is a tipping point?
Head for the hills — that’s real science!
Amen. Praise Mann! All hail the great tropical heat plume. May the wet bulb always be in your favor. Drum circle is at 7.
” It has to be almost total reliance on the CLT. If so,
is this reliance justified?”
People here have weird ideas about the Central Limit Theorem and what it says. It actually says, with various caveats, that when random variables are linearly combined, the distribution of the result tends to normality. Very interesting, but usually not relevant.
What is relevant here is Bernoulli’s Law of Large Numbers, which is nowadays just basic stats. It says that sample averages converge to the mean, but along the way says that the standard deviation of the sample mean diminishes roughly as 1/sqrt(N), N = number of samples.
But what is important is what this truly means! The statement that you make, “says that sample averages converge to the mean”, only means that the sample mean becomes an ESTIMATE of the population mean. The standard deviation of those sample means only tells the interval within which the population mean may lie, which is not the accuracy nor the precision nor the resolution of the actual temperature values! You also never quote what the population σ or σ² values are.
Nothing about any of this statistical analysis deals with sig figs or uncertainty, only with finding the mean of a bunch of numbers that prior to 1980 only had a resolution of integer values. Even the conversion of °F to °C introduces artificial uncertainty because the converted values often have more resolution than the original measurements did.
NIST TN 1900 (which is based on the procedures defined in the GUM) happens to be consistent with that conclusion. That is the uncertainty of the average diminishes with sqrt(N) for uncorrelated inputs. More relevant to this article they have an example where they assess the uncertainty of the average of several Tmax observations using a type A evaluation as u(T) = s/sqrt(m) where s is the standard deviation of the observations and m is the number of observations. Tangentially related to this article they have an example where they average the results of different infilling strategies (R based quadratic regression via locfit, ordinary kriging via intamap, thin plate regression via mgcv, and gaussian process model via LatticeKrig). The example is for the mass fraction of Uranium (mg/kg), but could obviously be applied to any scalar field like temperature or whatever.
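A minimal sketch of the type A evaluation described above, u = s/sqrt(m), in the style of NIST TN 1900 example E2 but with invented Tmax values rather than the E2 data set:

```python
# Type A evaluation of the uncertainty of an average of Tmax observations,
# as described above: u = s/sqrt(m). Values below are invented, not E2's data.
import statistics, math

tmax = [22.4, 26.1, 19.8, 28.3, 25.0, 30.2, 24.7, 27.5]  # hypothetical daily Tmax, deg C

m = len(tmax)
mean = statistics.mean(tmax)
s = statistics.stdev(tmax)          # sample standard deviation of the observations
u = s / math.sqrt(m)                # standard uncertainty of the average, per that recipe

print(f"average = {mean:.2f} C, s = {s:.2f} C, u(avg) = {u:.2f} C")
```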
Round 256 of the same old clown show.
When do the lions and tigers enter the tent?
NIST TN 1900 gives “an answer” what it means is in the eye of the beholder without proper analysis it means nothing.
Just to confirm:
You disagree with the uncertainty assessment of a sample of Tmax observations being u(Tmax_avg) = s/sqrt(m) = 0.872 C in example E2. Yes/No?
Spammer.
Are those TMAX measurements taken simultaneously at one weather station, or are they single TMAX measurements at multiple stations?
The answer makes the difference as to whether the equation is actually applicable.
No. They are not taken simultaneously. They are observations taken on different days at the NIST site in Washington D.C.
But at the same site.
With the same measurement system (hopefully).
Yep, same site, but different time. It is an example of averaging different things and assessing the uncertainty of that average as σ/sqrt(n). That is directly applicable to what BOM did in their report.
If I can convince you that NIST and the GUM are correct on this point and that their methods and procedures are valid then I can probably convince you that it doesn’t matter what site or what instrument made the measurements. They’re all still measurements of different things regardless.
They’re measurements taken at that site. When you average those measurements you can say “This is the average of the temperatures taken over these days at this site.”
At this site.
When all the temps from around the world are taken and averaged, for what location is an average derived? The Earth?
When we look at daily temperatures and see thousands of temps for “01/09/2022”, how many of them were on opposite sides of the International Date Line and were actually measured on the 10th instead of the 9th, or whatever? Should that be accounted for?
How many were in different hemispheres with different seasons? How many summer/winter temps were split into different years because of using calendar days for easy display instead of using equinox and solstice dates? Try and find a paper that has examined these things.
If TOBS introduced a bias does anyone think these may do the same?
JS said: “When all the temps from around the world are taken and averaged, for what location is an average derived? The Earth?”
I have no idea. But it’s moot because no one does it that way. In other words, no one takes all the observations and does a trivial average. All that would be is the average of the observations which no one cares about and which IMHO has no locality.
What they do is a spatial average of a grid mesh. The measurement model in that case is Y = Σ[w_i*c_i, 1, N] / Σ[w_i, 1, N] where c_i is the value of a grid cell, w_i is the weight of the grid cell, and N is the number of grid cells. Each grid cell has its own uncertainty u(c_i) that contributes to the combined uncertainty u(Y). Anyway, this measurement model computes the temperature of the grid. If the grid covers Australia then Y is the temperature of Australia. If the grid covers the Earth then Y is the temperature of Earth and so on.
JS said: “When we look at daily temperatures and see thousands of temps for “01/09/2022”, how many of them were on opposite sides of the International Date Line and were actually measured on the 10th instead of the 9th, or whatever? Should that be accounted for?”
Short answer…yes. However, that is a very complex topic that I’d rather save for another time. For now understand that in the measurement model Y above each input quantity to the model has its own uncertainty u(c_i) that contributes to the combined uncertainty u(Y) of the measurement model Y, and that each input quantity c_i is itself dependent on its own measurement model Yc with its own input quantities that have their own uncertainties as well. The topic you mention here affects the uncertainties much deeper inside the measurement model hierarchy. The salient point is that given u(c_i) you can compute u(Y). And if each c_i has correlation r(c_i, c_j) = 0 (uncorrelated) and the same uncertainty u(c) then u(Y) = u(c)/sqrt(N). In reality the grid cells will have some correlation and the uncertainty will be different from cell to cell, so u(Y) ends up being more difficult to compute.
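A minimal sketch of that weighted grid-cell average and the uncorrelated propagation just described, with invented cell values, weights and uncertainties:

```python
# Weighted grid-cell average Y = sum(w_i*c_i)/sum(w_i), with per-cell
# uncertainties propagated assuming uncorrelated cells. All inputs invented.
import math

c = [15.2, 18.7, 21.3, 9.8]      # hypothetical grid-cell temperatures, deg C
w = [1.0, 0.8, 0.6, 0.9]         # hypothetical cell weights (e.g. area weighting)
u = [0.5, 0.5, 0.5, 0.5]         # hypothetical per-cell standard uncertainties, deg C

W = sum(w)
Y = sum(wi * ci for wi, ci in zip(w, c)) / W                      # the weighted average
uY = math.sqrt(sum((wi * ui) ** 2 for wi, ui in zip(w, u))) / W   # uncorrelated case

print(f"Y = {Y:.2f} C, u(Y) = {uY:.2f} C")
# With equal weights and equal u(c) this reduces to u(c)/sqrt(N), as stated above.
```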
So you’ve just admitted that a blind average doesn’t represent what you trendologists do to the data!
Then covered it all up with lots of hand-waved word salad.
If you just assume that all uncertainty cancels or that the uncertainty is the standard deviation of the sample means then of what use are the u(c_i)?
That’s the problem with this whole mess! If this were being done by an engineer designing trusses for a bridge and he decided that all of the uncertainty values for the individual beams cancelled or that the standard deviation of the sample means for torsional strength, compression strength, and shear strength was the uncertainty for all of the beams – HE WOULD BE SUED INTO BANKRUPTCY if anyone found out.
But for temperatures? Who gets held accountable for such idiotic practices?
I’ve already given you this once. See the attached graphic. The standard deviation of the sample means can be very precise, I.e. the standard deviation of the sample means is small. But the accuracy can be very, very poor.
The only way to know the accuracy is to actually propagate the uncertainties of the individual elements into the calculated average. That is *NOT* the same thing as the standard deviation of the sample means!
You are ignoring that variances add from the daily averages all the way through. Don’t try to snow people that you are not averaging. You may be doing some funky stuff with weighting, but in the end you are averaging, a lot.
JS gave you the correct answer!
The σ/sqrt(n) factor is a statistical metric showing how close the standard deviation of the sample means is to the actual population average. It is *NOT* the uncertainty of each of the sample means or of the population mean!
Again, the BOM only calibrates their measurement devices to somewhere between +/- 0.3C and +/- 0.5C. Yet these uncertainties are *NOT* propagated into their calculations.
You keep saying: “That is the uncertainty of the average diminishes with sqrt(N) for uncorrelated inputs. “
That is *NOT* the uncertainty of the average! It is the interval in which the population average can lie. It says *NOTHING* about the uncertainty of that average. Do you have to be shown the graphic about precision and accuracy for the umpteenth time in order to differentiate between the two? You can calculate the average with infinite precision and *still* get a totally inaccurate value! You can only determine the accuracy of the average by propagating the uncertainty of the individual elements in the data set. Period. Exclamation Point!
In fact, if you actually read and understand the assumptions laid out in the example they assumed this was *NOT* a measurement of different things. Their assumptions turned this into multiple measurements of the same thing! They had to do that in order to use the GUM procedures!
From the start of the NIST document:
And, let’s look at the end of E3.
The bolded points are significant. Why do we never see these computed and quoted when GAT figures are quoted? They do carry forward through the GAT computations. They will make the numbers in the GAT (one hundredths) look extremely poor.
Even just quoting the standard uncertainty of 0.872 means you would need to have anomalies like 0.02 +/- 0.872 which is a meaningless number.
I also must disagree with one point in this example. The author made the assumption that this is an entire population. This allows the computation of standard error by dividing the population standard deviation by √m. This is probably an ok assumption for the temps at a single location.
It is not ok when computing a global average from samples, i.e. stations. Why? Because each of the stations is a random variable consisting of a sample from the entire temperatures of the earth. That makes them samples. From that, the standard error (SEM) is the standard deviation of the sample means and it is not divided by the √N.
You appear to be the math whiz. Why don’t you calculate the standard deviation of the temperatures used to create anomalies and tell us what that is. Then we can see exactly what these small numbers truly mean.
“That is the uncertainty of the average diminishes with sqrt(N) for uncorrelated inputs.”
This is one more example of the false impression the term “standard error of the mean” can leave. This is *NOT* the uncertainty of the mean. It is the standard deviation of the sample means! It is *NOT* the uncertainty of the mean calculated from the sample means. Each of those sample means will have uncertainty propagated onto it from the members of the sample. When the sample means are averaged those uncertainties *must* be propagated onto the average. THAT gives you the uncertainty of the mean, not the standard deviation of the means. A small standard deviation of the sample means only implies that the sample means converge to a value. It does *NOT* tell you anything about the uncertainty of that calculated value.
Your document includes an example directly related to temperature measurements.
Do you understand the problem with this example? (hint: is there ONE daily maximum temp for a month?)
Example E2:
“Exhibit 2 lists and depicts the values of the daily maximum temperature that were observed on twenty-two (non-consecutive) days of the month of May, 2012, using a traditional mercury-in-glass “maximum” thermometer located in the Stevenson shelter in the NIST campus that lies closest to interstate highway I-270.”
“The average t = 25.59 ◦C of these readings is a commonly used estimate of the daily maximum temperature τ during that month.” (bolding mine, tg)
“If Ei denotes the combined result of such effects, then ti = τ + Ei where Ei denotes a random variable with mean 0, for i = 1,… , m, where m = 22 denotes the number of days in which the thermometer was read. This so-called measurement error model (Freedman et al., 2007) may be specialized further by assuming that E1, . . . , Em are modeled independent random variables with the same Gaussian distribution with mean 0 and standard deviation σ. In these circumstances, the {ti} will be like a sample from a Gaussian distribution with mean τ and standard deviation σ (both unknown).”
I can answer for the troll:
“bu-bu-bu-but the NIST, the GUM!!”
ROFL!
“This is one more example of the false impression the term “standard error of the mean” can leave. This is *NOT* the uncertainty of the mean.”
So what is the uncertainty of the mean?
You don’t know?
If I have a measurement of 25 ±0.5 °C as a data point and sample it, it will go into a random variable with the same measurement and uncertainty. When the mean for that sample is calculated, the uncertainty must be propagated into the sample mean value. Let’s assume the sample mean comes out to be 25 ±0.5 again. Now when you find the estimated mean of all the sample means, it will have a “value ± μ”, which is a measurement uncertainty when μ has been propagated properly. Then you can calculate the SEM (Standard Deviation of the sample Mean), which, by the way, has an uncertainty also, due to the measurement uncertainty of the data used to calculate it.
The uncertainty of the mean is the uncertainty propagated from the individual elements. There are several methods to do this, either direct addition or root-sum-square addition. You can use either individual values or relative values. The uncertainties of divisors do *not* divide the uncertainties of the numerators, they add to the total uncertainty.
This is all standard practice in metrology but apparently not in climate science. The standard deviation of the sample means only tells you the interval in which the mean can lie, it tells you *nothing* about how accurate that calculated mean is, that can only be determined by propagating the uncertainty of the individual elements.
The average uncertainty is *NOT* the uncertainty of the average. The average uncertainty is totally useless.
If you are calculating the average value from the sample or the population, e.g.
T_avg = Σ(T_i) / N
the uncertainty of the average is:
u^2(T_avg) = Σ u^2(T_i) + u^2(N)
The uncertainty of the average is *NOT* Σu^2(T_i/N)
Since u(N) = 0 because it is a constant
u^2(T_avg) = Σu^2(T_i)
Consider the equation m = kX/g, the mass needed to get a specified extension from a spring with a spring constant of k. The uncertainty of m (mass) is *not* u^2(k/g) + u^2(X/g). The uncertainty of the mass is u^2(k) + u^2(X) + u^2(g). Dividing the uncertainty by sqrt(g) (gravity) simply makes no physical sense. The same thing applies to determining the uncertainty of the average.
This becomes a root-sum-square increase in uncertainty. The more elements you add to the data set the greater the uncertainty becomes. Again, this is the only thing that makes physical sense.
The *ONLY* way around this is to have multiple measurements of the same thing which creates an iid and symmetrical distribution where the uncertainty can be assumed to cancel and the average is the true value. Since temperature data is *NOT* multiple measurements of the same thing you cannot assume an iid symmetric distribution and should use the root-sum-square method which assumes *some* cancellation.
This is *all* right out of the GUM. The NIST machine gives no option for using a non-iid, non-symmetric distribution. It provides no input for kurtosis or skewness. If you cannot input those values then you can’t use the NIST machine on a data set of temperatures. Of course if you have a non-iid, non-symmetric distribution then the use of the mean and standard deviation is pretty much meaningless. You should use the 5-number statistical description: minimum, first quartile, median, third quartile, and maximum. Please note the absence of mean and standard deviation in that statistical description. Lacking that, skewness and kurtosis should be specified and appropriate measures used on the distribution.
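For readers trying to follow the argument, here are the two competing formulas computed side by side for invented inputs; this sketch does not adjudicate which one applies, it only shows how far apart they are.

```python
# The two formulas being argued over in this thread, for invented inputs.
# rss:    u^2(T_avg) = sum(u_i^2)          (the root-sum-square advocated above)
# scaled: u^2(T_avg) = sum(u_i^2) / N^2    (the 1/N-scaled propagation cited by others)
import math

u_i = [0.5] * 30                     # hypothetical per-reading uncertainties, deg C
N = len(u_i)

u_rss    = math.sqrt(sum(ui ** 2 for ui in u_i))        # grows as sqrt(N)
u_scaled = math.sqrt(sum(ui ** 2 for ui in u_i)) / N    # shrinks as 1/sqrt(N)

print(f"N = {N}: root-sum-square = {u_rss:.2f} C, 1/N-scaled = {u_scaled:.2f} C")
# 0.5*sqrt(30) ~ 2.74 C versus 0.5/sqrt(30) ~ 0.09 C
```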
Nick,
I have tried to quote extracts that, with various caveats, say much the same as you do about CLT.
For the Law of Large Numbers, I am puzzled why I cannot find a mention of it in the GUM. Any thoughts on why?
As with all readers, you are welcome to give an answer to the question that started the email exchange with BOM. (Separation of two routine daily temperatures that are credibly different.)
All of us develop preferred personal views about what is credible, based on experience and other factors. I have tried to avoid my preferences in this article so far, but I hope that readers will add their particular emphases.
If you were appointed to an investigation into the relevant BOM estimates of uncertainty, what would be your main input? Cheers. Geoff S
Trendologists like Nick have vested interests in declaring temperature uncertainties are as small as possible.
Looks like the trendologists have enlisted their pack of downvoters; all they can do because they are completely BANKRUPT with regard to measurement uncertainty.
CM,
Best to omit personal comments; Nick has contributed excellently to WUWT many times. Sparring is one thing, insults are another, not too good.
Geoff S
bgw has hijacked your excellent article by repeating this lie that averaging reduces measurement uncertainty, more than a dozen times. Nick then shows up and repeats the same nonsense. My patience wears thin.
Steve Mosher of BEST told me once their product is not an “average anomaly” or an “estimated anomaly”, but a predicted anomaly.
Nick just wants to ignore the caveats … nothing to see here move along 🙂
Geoff,
“I cannot find a mention of it in the GUM. Any thoughts on why?”
As I said, the notions behind the Law of Large Numbers are now just basic statistics; in particular, the arithmetic consequences of the additivity of variance. Hence the appearance of sqrt(N) in uncertainty of a mean, which I am sure you will find in GUM.
“Separation of 2 routine daily temperatures that are credibly different”
You need to be careful here in saying what you mean. Your laboring of measurement error is appropriate if they were supposed to be the same, i.e. same time and place. If not, then you need to ask, what are we actually testing for? Is there a reason to expect them to be the same? If you find one day that it was 25C in Sydney and 26C in Perth, are they statistically different? Why would it matter?
Generally it doesn’t; testing of some kind of average anomaly is more likely to matter than testing station readings.
Uncertainty is NOT error!
The sample size is ONE!
Thank you, Nick,
Can we discuss the context of Melbourne Olympic Park, next summer, probably accessible to both of us, when BOM does a Press Release to claim that a (hypothetical) new daily temperature maximum has been set for all Januaries, beating the previous record set after year 2000 by (choose your own, or use 0.05C). Add any caveats you wish, but end up with a number for uncertainty.
Then repeat for an older record, say from the 1930s before metrication, PRT probes, small shelters and the 1970s image shift, but within the range of rounding from all temperatures reported in whole numbers, for the old Melbourne Regional station.
I do not find this an exciting exercise, but it has didactic value; your cooperation appreciated. Geoff S
Geoff,
I think this comes back to the question I asked earlier – why are you looking for statistical significance here? A statistical test basically asks – could number x have been drawn from distribution P? The null hypothesis is that it could; a significant result says that it couldn’t, and you can draw some inference from that. But saying that a temperature is a record doesn’t have that notion attached to it. It is just a hottest day; you are not trying to establish that it is a day that must belong to somewhere else. There isn’t really a significance test that you can usefully apply.
We have this argument every time there is a hottest year ever. People here want to argue that it isn’t really, unless the increment is statistically significant. We had that when 2014 was just a little warmer than 2010. You can’t say it is statistically significant, but what does that mean? 2010 was the hottest? You could have the temperature creeping up year by year, with each increment not “significant”, so you could never say there was a hottest year. In fact the only sensible thing to do is to take the hottest reading for what it is. That is, unless you are prepared to specify a distribution, and an inference that you can make if it is too improbable that the reading could come from that distribution.
Why do you throw away and ignore the distributions? But I guess they are meaningless anyway if the old data has been through the chop shop.
You’re claiming to know what is unknowable. If two readings are 1°C apart but we know, factoring in systematic error, that the readings have an uncertainty of ±2°C, I can’t say they’re different. They’re not hotter or colder or equal. It’s unknown. Claiming they are just proves you failed freshman year.
Statistical descriptions are useful with a probability distribution.
Since when did temperature measurements of different things using different devices from different locations become a probability distribution?
If I jam temperature measurements from Denver, Chicago, St. Louis, Boston, Miami, and Fairbanks together in a data set does that data somehow form a probability distribution? Does the average value of those temperatures give me *any* kind of an expectation of what temperature I will see in San Diego?
If each of those individual temperatures represents a measurement with an uncertainty, and you treat that uncertainty the same way you do the variance of a random variable, then the uncertainties add. How many do you have to add before the uncertainty masks any difference you are trying to identify?
Even using (Tmax + Tmin)/2 as a mid-range value is an attempt to find an average of two temperatures. Does that average provide you with *any* kind of expectation as to what Tmax and Tmin temperature will be tomorrow? A week from now? A month from now? If it doesn’t then how can it be a probability distribution that is described by an average and a standard deviation?
The entire concept of GAT is an edifice built upon sand. It is looked at as an average of a probability distribution that is actually not a probability distribution at all! There is simply no way to say that if it is 80F in Miami then the probability that it is 80F in Chicago is p(x). The two are not connected in any way through a probability distribution.
No probability distribution ==> no statistical description ==> no mean, no standard deviation.
Nick,
You asked inter alia “If not, then you need to ask, what are we actually testing for? Is there a reason to expect them to be the same? If you find one day that it was 25C in Sydney and 26C in Perth, are they statistically different? Why would it matter?”
You should not ask me that. You should ask BOM. They are the people who use this method of informing the public.
Personally, I feel that people who are advanced users of math and stats have a duty, an obligation, to inform BOM that they are using a method of expression that has questionable validity and that they should stop doing it.
If they do not stop doing it, the Person on the Fulham Bus might start to believe that they are doing it as a type of propaganda, when their charter is more the correct spreading of proper science.
“They are the people who use this method of informing the public.”
I think you are complaining that they don’t. BoM just tells you that it was 25C in Sydney, 26C in Perth. They don’t invite you to make any inference beyond that. If you want to talk about statistical significance, you are talking about what inferences that you can or can’t make. And you have to start by saying what they might be.
“If you want to talk about statistical significance, you are talking about what inferences that you can or can’t make”
If you can’t make inferences then the GAT is meaningless. Pick your poison – is the GAT meaningful or not?
GS said: “For the Last of Large Numbers, I am puzzled why I cannot find a mention of it in the GUM. Any thoughts on why?”
I believe it is because they build upon on the law of propagation of uncertainty instead. A derivation is provided in section E.
Not one of you climate clowns has ever had to do a real uncertainty propagation calculation where reputations and money were at stake, this is painfully obvious.
Two points none of you will address:
First, the sample size of a time series measurement is always exactly ONE.
Second, the absurdity of the uncertainty going to ZERO as the number of points in these averages increases.
I think it is unforgivable that no one is capable of giving a standard deviation or variance along with the calculations of how the variances were added together from the daily average to the GAT. Scientists should be jumping up and down screaming for these two things. A mean/average is meaningless unless you know these two items. You tell me the monthly GAT is 0.02; my first question is, what is the variance, so I will know the spread of the data.
My guess is that calculating that would entirely spoil the GAT anomaly because the standard deviation/variance is far beyond these values.
Of course they don’t, it would reveal how the king has no clothes. Especially considering each GAT point has multiple variances that are just dropped.
PSSST, Nitpick, for measurements of time series (i.e.TEMPERATURE), the number of samples is always exactly equal to ONE.
More of your usual sophistry.
The classic version of this theorem also requires a sequence of independent and identically distributed random variables, with the additional condition that they have the same expected value. It will not help you in this situation.
He doesn’t care; he’s cherry picking to get the answer he wants, just like his disciples on WUWT do.
You don’t seem to read. What I said of the CLT was:
“It actually says, with various caveats, that when random variables are linearly combined, the distribution of the result tends to normality. Very interesting, but usually not relevant.
What is relevant here is Bernoulli’s…”
“What is relevant here is Bernoulli’s:”
The law of large numbers requires samples taken from similar populations. Peter Hartley already told you this earlier.
You need identically distributed, randomly generated variables.
MULTIPLE MEASUREMENTS OF DIFFERENT THINGS USING DIFFERENT DEVICES DO NOT MEET THIS REQUIREMENT FOR USING THE LAW OF LARGE NUMBERS!
The entire CAGW crowd uses two incorrect assumptions in building their meme:
You simply cannot jam multiple measurements of different things together and expect them to be statistically similar to multiple measurements of the same thing.
Wikipedia:
“In probability theory, the law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value and tends to become closer to the expected value as more trials are performed.[1]”
Britannica:
“law of large numbers, in statistics, the theorem that, as the number of identically distributed, randomly generated variables increases, their sample mean (average) approaches their theoretical mean.”
Now all you need to do is work through and convince us the caveats don’t apply … always love how you just dismiss those like they are nothing 🙂
I’m not the one claiming to apply the CLT.
No, the claims about CLT came from BOM.
The context I got was that of treating many different systematic effects as a random variable when taken together. However, part of the reason why the long-term measurement statistic uncertainty is lower than the typical single-measurement uncertainty is the LoLN or, using the approach in the GUM, the law of propagation of uncertainty.
Same old clownish garbage, this is a LIE.
The LoLN *ONLY* applies to multiple measurements of the same thing insofar as establishing a true value. It does *NOT* apply to multiple measurements of different things.
Wikipedia: “In probability theory, the law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value and tends to become closer to the expected value as more trials are performed.”
Wolfram: “A “law of large numbers” is one of several theorems expressing the idea that as the number of trials of a random process increases, the percentage difference between the expected and actual values goes to zero.”
These are all talking about measuring the SAME THING. Why you insist on trying to apply this to measurements of different things is just amazing.
You’re not sampling the same thing you insufferable jackass. The temperature is always changing, and each one of those measurements has systematic error.
Variance of the sample mean is not the measurement uncertainty of the mean. This has been explained to you dozens of times. You clowns think there’s some kind of immaculate conception of error if you simply increase N. All you’re doing with each addition of a new measurement (not sampling from fixed set of a known thing btw) is presuming each measurement has no uncertainty to it at all. Then you’re calculating the presumed expected variance at which the sample mean (ignoring all uncertainty) would converge to the population mean (ignoring all uncertainty). Except you clowns aren’t sampling from measurements of the same thing as temperature is constantly changing. There’s not even a population of “average temperature” of any place on the planet, because an average is an aggregation itself, not a measurement.
“All you’re doing with each addition of a new measurement (not sampling from fixed set of a known thing btw) is presuming each measurement has no uncertainty to it at all.”
Not at all. The presumption is that the variance of each new number is added to the cumulative sum of variances, so the result increases approximately proportionally to N. But when you average, you scale the values by 1/N and hence the variances by 1/N², so the combined variance goes as 1/N. The standard error is the square root of that, so 1/sqrt(N).
No need to be snooty about “explained to you”. This is standard, basic statistics, as per GUM or any stats text. You people are the outliers.
Stokes goes for the snark now.
The standard error IS NOT reduced by dividing the standard deviation of the sample means by sqrt(N) as you have stated. That would only be true if you were dealing with an entire population. If you believe the temperatures you have are the entire population, then there is little reason to infill or homogenize.
What you are dealing with are samples. Therefore, the standard deviation of the sample means IS the standard error. You do not divide it by sqrt(N) to get an even smaller number.
By the way, “N” IS NOT the number of samples. It IS the size of the samples. That is, the number of entries in each sample. So for annual averages, the “N” is 12, and NOT the number of stations.
Anyone who believes you can add precision to measurements through averaging has NEVER dealt with real-world contracts that require accurate depictions of measurements. You will never find an engineer or quality control person who would allow this to be done when accepting remuneration for providing a product with a given tolerance. Promoting a measurement that is beyond the precision of what was actually measured is an ethical violation. Statistics CANNOT add decimal places, i.e., precision, to actual measurements.
Your standard basic statistics describe how random sampling helps home in on a population mean. They have nothing to do with the uncertainties of each of the N in the population and in the sample. And they therefore have nothing to do with the uncertainty of the population mean OR the sample mean.
You’re perhaps intentionally still confusing variance/standard error of the sample average with uncertainty of the average and therefore not contributing to the conversation.
You lose. Trust at some point in your life that you can learn from others, or you will continue to be the way you are.
++1000
That’s exactly what the law of propagation of uncertainty as derived in section E of the GUM implies. That is when you scale the input quantities by 1/N you also scale the variance by 1/N as well. And thus the standard uncertainty scales with 1/sqrt(N) for uncorrelated inputs.
So the uncertainty goes to zero as N increases is the official climastrology position?
Still absurd.
Are you a chat bot by any chance? You sound like one.
Better tell the folks at NASA. No reason building a bigger dish telescope. Just take 1,000,000,000,000,000,000 pictures of the same thing and you can reduce uncertainty to however small you want.
+100!
Give us an exact reference in the GUM. Also, give us your interpretation of what is being described.
My reading of Annex E, if that is what you are referring to, is that it only applies to “random effects and from corrections for systematic effects in exactly the same way in the evaluation of the uncertainty of the result of a measurement.”.
I have attached an image of what I found in Annex E.
Look at the note carefully. See where it says the requirement of normality is one reason for the separation of the components of uncertainty derived from repeated observations. Repeated observations are multiple measurements.
Now look at the highlight at the bottom. As we have discussed before, Section 4 deals with a single measurand that may be made up of several measurements. That applies here also.
You really need to learn how to find and understand the restrictions in the GUM. Primarily, it deals with a single measurand and not how to combine measurements of different things. Functional relationships are used to calculate a single measurand and may consist of various components. It deals with things like P=nRT/V where you may take several repeated measurements of the same process to determine a single value. It does not deal with measuring the pressure of a tire on one car then measuring the pressure of another tire on another car and finding an average and calculating the uncertainty of the average.
I keep asking bdgwx but never get an answer.
The mass to get a specific extension on a spring is
m = kX/g
where m is mass, k is the spring constant, X is the extension, and g is gravity.
Is the uncertainty of the mass needed equal to
u^2(m) = u^2(k/g) + u^2(X/g)
Or is u^2(m) = u^2(k) + u^2(X) + u^2(g)
Why can’t I get an answer? Does gravity somehow lessen the uncertainty of “m” since it is a divisor?
“But when you average, you scale the values by 1/N², so trhe combined variance increases by 1/N. The standard error is the square root of that, so 1/sqrt(N).”
Can’t you read? An average is not a measurement! You do *NOT* scale the measurement values!
What you are doing is finding an AVERAGE UNCERTAINTY! I.e. adding up all the uncertainties, which can be different for each measurement, and then dividing by n in order to get an average uncertainty. That is *NOT* the uncertainty of the mean! It is an artificial number that, when multiplied by n gives you back the total uncertainty!
If you have multiple measurements OF THE SAME THING that form a normal distribution, then the true value can be considered to be the average value of the multiple measurements, and the uncertainty of that true value is the standard deviation of that normal distribution, which involves dividing by sqrt(N).
But you have to SHOW that the multiple measurements of the same thing actually *is* a normal distribution for this to apply. There are situations where it doesn’t apply. Situations like systematic bias that changes due to component drift. Situations like where the measuring device faces wear over multiple measurements. Situations where you have significant hysteresis involved.
And when you have multiple measurements of different things, the assumption *has* to be that you will not have a normal distribution of measurement values. I.e. measuring six different journals on a used crankshaft will *NOT* give you a normal distribution of measurements in most cases. Neither will jamming different temperature measurements from different locations into the same data set. If you do not have a normal distribution of measurements then the uncertainties will not cancel and you *must* add them, either directly or by root-sum-square. And it doesn’t matter how many crankshafts and how many journals you measure from a large number of different types of engine (six cylinder, eight cylinder, four cylinder, one cylinder, etc.), you still won’t get a normal distribution of measurements. And without a normal distribution of measurements the standard statistical descriptions of “mean” and “standard deviation” simply don’t apply.
For a non-symmetrical distribution you should be using the 5-number description of the population: minimum, first quartile, median, third quartile, and maximum. Please note that a GAT wouldn’t even be included, just the median!
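A minimal sketch of that 5-number description, computed with numpy percentiles on an invented set of readings:

```python
# The 5-number description recommended above: min, Q1, median, Q3, max.
import numpy as np

x = np.array([11.0, 13.5, 14.0, 14.2, 15.1, 16.8, 17.0, 19.3, 24.6])  # invented data
five_num = np.percentile(x, [0, 25, 50, 75, 100])
print(five_num)
```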
“Can’t you read? An average is not a measurement! You do *NOT* scale the measurement values!”
Can’t you do basic arithmetic? Of course you scale them. The average of N values is
A=(Σxₖ)/N=Σ(xₖ/N)
You scale the numbers by 1/N and add them. And so you scale the variances by 1/N² and add them.
For the umpteenth time, an average is *NOT* a measurement.
You just admitted that for the function m = kX/g that the uncertainties are u^2(k), u^2(X), and u^2(g).
The average is no different.
The uncertainties are u^2(x1), u^2(x2), …, u^2(xN), u^2(N)
If the uncertainties in the mass equation are not scaled by dividing by g then the uncertainties in the average equation are not scaled by N.
Your ability to say whatever you need to say at the time is duly noted. It’s why you can’t be consistent and your credibility on the subject is zero!
The most egregious abuse of uncertainty is NOAA assertion that one year is hotter than others based on 0.01C +/-0.05C difference
A great indicator of what training they have had. They obviously do not understand what an uncertainty interval or even a standard deviation actually is.
Is it any wonder NASA can’t get a rocket off the ground because of bad line fittings?
They both should be audited by professional statisticians
Can that be lobbied for? I think the Republicans would really want to get around that. Have GAO and CBO required to hire independent statisticians to examine all claims and methods.
I might add that this is how we get confirmation bias when scientists do a statistical audit of their own data
I’m pretty much convinced Artemis will blow up.
This is certainly separating the statisticians from the mathematicians…
Old Cocky,
…. and from those who measure objects.
Quite so
The difference is becoming even more pronounced…
Shameful BS.
A) Truly random errors are unusual. Most of what they term “random” are systemic or methodological errors.
B) Truly random errors, like flipping a coin, are expected to average out over extended periods. Systemic or methodological errors are unlikely to average out, instead they aggregate.
Great job Geoff! All very sensible, unlike the various government meteorological boondoggles.
ATheoK,
Kind comments are appreciated, thank you. In my experience of measurement, yes, truly random errors that sum to zero are not common, but they take a long and often complex set of experiments and time to demonstrate. There are some lessons I still have to learn from attempts to measure ocean level change from satellite platforms. I doubt that bodies like NASA would have started on that if they disbelieved the law of large numbers concepts. OTOH, I suspect that the difference between tide gauges and satellite numbers arises from a lack of accurate quantification of a variable or several, leading to a bias. In other words, some of Nick’s various caveats that do not cancel to zero. Geoff S.
Oceans are a low viscosity fluid clinging to the surface of an 8,000 MPH rough sphere of spinning mass with a substantial iron core.
A surface skim affected by Earth and external gravitational forces as the water tries to keep up with different spin rates in every latitude.
This is before accounting for water displacement and winds, including storms, trade winds and huge air masses of low/high barometric pressures?
Yet, the world’s various meteorological entities believe they can average, again, unique highly variable individual tide stations into a reasonable country or global average?
And these meteorological entities insist on combining physical tidal measurements with satellite data?
All to spout ±mm tide measurements, where most satellite data has multiple-centimetre error bounds just from satellite wavelength transmission and reception?
Your suggestion that this multitude of complex influences lead to bias under the careful abuse of meteorological entities is spot on.
I once read about NASA’s isostasy studies. NASA imaged the Earth as a misshapen geoid where Earth’s surface points are all at different radii from Earth’s center. Many of those radii differ by miles…
Keep up the great diligent methodological work!
This all raises the question of whether or not temperature measurements even form a probability distribution amenable to statistical description.
Statistical descriptions are meant to give an expectation for the next data point that will be a part of the population. I.e. the mean and the standard deviation are meant to tell you where the majority of the data lies and imply the expectation that the next data point will fall into an interval around the mean as defined by the standard deviation.
Temperature measurements from different locations simply do not form a probability distribution that can give you an expectation as to what the next data point (i.e. temperature) will be. Jamming temps from Miami, Fairbanks, Charleston, Chicago, Seattle, Lincoln (NE), Cleveland, and Boston together and calculating an average will *NOT* give you any expectation as to what the average temp in San Diego will be. There is no probability function that connects any of this together.
If jamming temps together into a data set does not define a probability distribution applicable to any and all specific locations then the GAT makes no sense at all, either physically or metaphysically!
“Jamming temps from Miami, Fairbanks…”
Indeed. That is why, as I point out endlessly, anomalies are used. They have an expected value of zero, except for the effect of climate change.
You betray your knowledge of measurements. Anomalies are based on measured temperatures that have measurement uncertainties. Those measurement uncertainties propagate into the anomaly values.
Here is where the rubber hits the road. If I have temperatures with a resolution of integers which have a measurement uncertainty of ±0.5° how in the world do you get anomalies of ±0.01? That resolution doesn’t get increased by subtracting a baseline temperature which is based on temperatures with the same resolution.
I’ll include my favorite quote:
Like it or not, this is what you are trying to do. There are many lab notes from physical science courses at universities that say the same thing. Use Google to find them and see why they all say the same.
And, as I keep pointing out, anomalies are a joke. The variance of temperatures in winter is wider than the variance in summer. So the anomalies will be wider in winter than in summer. Where is the weighting to take care of this?
Where is the weighting to take care of the anomaly difference between San Diego and Romana? Between Springfield and Chicago?
What do anomalies tell you about climate anyway? Climate is the totality of the temperature profile. You can have the same anomaly in Fairbanks, AK as in Hays, KS yet the climates are totally different. So what does the anomalies actually tell you? You can’t tell from the anomaly if max temps are going up or if minimum temps are going up. Yet all the CAGW advocates say the Earth is going to become a cinder and the politicians follow right along in their wake.
What about natural variation?
Didn’t you also say the baseline used in your calculations was a global baseline, and not a baseline from each station’s measurements?
In that event, no anomaly could have an expected value of zero.
“Didn’t you also say the baseline used in your calculations was a global baseline, and not a baseline from each station’s measurements?”
No. I have said, over and over, that you form anomalies by calculating a baseline average for each station, and subtracting that from that station’s temperatures. Then you can think about combining with other stations.
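A minimal sketch of the per-station anomaly construction Nick describes, with an invented station record and an assumed 1961-1990 baseline period (the baseline years are my assumption, not a quote of any product's choice):

```python
# Per-station anomalies: subtract each station's own baseline average
# from that station's readings. Station values and years are invented.
baseline_years = range(1961, 1991)   # assumed baseline period

def anomalies(temps_by_year, base_years=baseline_years):
    """temps_by_year: dict {year: temperature}; returns {year: anomaly}."""
    base = [t for y, t in temps_by_year.items() if y in base_years]
    baseline = sum(base) / len(base)
    return {y: t - baseline for y, t in temps_by_year.items()}

station = {1961: 14.2, 1975: 14.6, 1990: 14.4, 2020: 15.3}   # hypothetical annual means
print(anomalies(station))   # anomalies average ~0 over the baseline period, by construction
```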
Who cares except climastrologers? Its all still meaningless.
And do you carry forward the standard deviation? Subtracting a constant should not change the original standard deviation, it should remain.
What is the uncertainty of your baseline? Is your baseline anomaly 100% accurate?
Nick,
Can I assume that you were tired when you wrote this?
If you want your assertion to stick, you have to show that the expectation of ‘climate change’ is not zero. I await a proof of that.
Geoff S
I said the expected value of the anomaly is zero, except where there is climate change.
My goof.
Should be 8,000 miles diameter.
I still love the cover picture.
Michael,
This was me when I was much older, with a 46 lb barramundi, caught by light line and lure in a fresh water billabong near Jabiru, Northern Territory. I did not catch this one. My best was just half that weight.
http://www.geoffstuff.com/barra.jpg
Of course you were much older. Since that photo was taken, BOM has re-calculated your age at that date and you were, it turns out, only 8 years old. My, my, how RAPIDLY your age increased since human CO2 emissions increased. CAWG (Catastrophic Aging and Weight Gain) is REAL!!!!
Thank you, so much, for sharing your careful, expert, analysis and for taking the time to reply to commenters (including some comments which, I think, should not be dignified with a reply).
Janice,
Let us hope that you will also like the coming Part Three. Thank you for your kind words, always appreciated. Geoff S
There’s lots of talk about the statistical analysis of the temperature data sets, but the NOAA GHCN Monthly data set is a train wreck as far as I’m concerned.
NOAA proudly states there are over 27,000 stations in the GHCN Monthly data set, and while that’s true, it’s not the whole truth and nothing but the truth.
Over half of the stations haven’t reported any monthly average temps since before 2022. I have the entire ghcnm.tavg.v4.0.1.20220905.qcu.dat file from 6 September loaded into a database — a total of 17,330,280 records for 27,767 stations. A simple query to get the latest year for which each station has a record, and then to count the stations where that most recent year is 2021 or less returns the startling figure of 15,809 stations.
That’s over half of the GHCN stations reporting no data for the last year. The next question is where are those 11,958 stations that are reporting?
6,879 are in the US.
949 are in Canada.
607 are in Australia.
584 are in Russia.
217 are in Sweden.
214 are in China.
That’s 9,450 stations in six countries, leaving 2,508 stations scattered across the rest of the world. There are more in the US than in the rest of the world combined.
It staggers the imagination to wonder how a global anomaly to three decimal points could ever come out of a network in such a condition.
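For concreteness, the station count described above can be reproduced with a short script. This is a minimal Python sketch only, assuming the fixed-width GHCN-M v4 layout (station ID in columns 1–11, year in columns 12–15); the file name and the 2021 cutoff come from the comment, everything else is illustrative.

# Minimal sketch (not the commenter's actual query): count GHCN-Monthly
# stations whose most recent TAVG record is 2021 or earlier.
# Assumes the fixed-width ghcnm v4 .dat layout: station ID in columns 1-11
# and year in columns 12-15 -- treat that layout as an assumption here.
from collections import defaultdict

def count_stale_stations(path, cutoff_year=2021):
    latest = defaultdict(int)               # station ID -> latest year seen
    with open(path) as f:
        for line in f:
            station, year = line[0:11], int(line[11:15])
            if year > latest[station]:
                latest[station] = year
    stale = sum(1 for y in latest.values() if y <= cutoff_year)
    return len(latest), stale

# total, stale = count_stale_stations("ghcnm.tavg.v4.0.1.20220905.qcu.dat")
# print(total, stale, total - stale)        # stations, not reporting, reporting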
A few days ago when I looked at the Australian stations from a search of what went into GISS, there was a map of Aust with about 100 red dots. Click a dot for details. Long term quality stations like Sydney Observatory and Melbourne were absent and some station names were minor or new to me. Are we seeing a related event? Geoff S
Geoff,
Every day I update the plot of GHCN V4 +ERSST stations on a Google Earth style globe here. You can make it show any month since 1900. It actually shows a triangular mesh with shaded infilling, with the color correct for temperature at each measurement (you can click). Here is a snapshot of Australia in August 2022 (Data is still coming in every night EST)
Nick,
Thanks for the comments and map.
The map troubles me greatly. Compare the picture over land with over sea. Over land, there are a couple of nodes near to or at Nhulunbuy and Katherine that are, in simple but inaccurate terms, showing too hot, while half way down the WA border there is a node that is showing too cold. The colours are too full of features that look artificial, compared to the smooth look over water, as if you might be using anomaly data influenced by the choice of reference period.
Back around 1978 we flew a very large airborne radiometric survey over about 1/3 of Iran. Thousands of maps were generated, perhaps in the same way that you are generating yours using triangles. You get used to the feel of what the maps are telling you and at first blush I would say that you are not using enough data points to approach a picture that shows proper features. Your features look artificial, generated by data shortage and software.
This is, as I said, first blush and I may be unaware of a significant feature that is misleading me. Geoff S
Geoff,
“compared to the smooth look over water”
I think that is just reality. Sea temperatures are more uniform than land, in both space and time. The weather that perturbs them is generated in the air, but the thermal inertia of the water smooths that out. It is certainly a strong effect, if you look at the whole globe.
“I would say that you are not using enough data points to approach a picture that shows proper features”
Well, I use what is available in GHCN V4. And as I showed elsewhere, I can use a lot less and still get almost the same global average. But yes, there are artificial features, caused by doing linear interpolation on each triangle separately. The resulting approximating surface is continuous, but with discontinuous derivative. For better graphics, I now use Poisson equation solution. But the answer for the global average is the same. I have stuck with the triangle method here because it does ensure exact colors at the measurement points and is easier to explain.
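For readers unfamiliar with the method, here is a minimal sketch of linear interpolation within a single triangle using barycentric weights. It is illustrative only; it is not the TempLS mesh code, and the vertex coordinates and anomaly values are invented.

# Sketch of linear interpolation inside one triangle via barycentric
# coordinates: the interpolant is exact at the three vertices and linear
# within the triangle (continuous overall, but with discontinuous derivative
# across triangle edges). Illustrative only -- not the TempLS mesh code.
import numpy as np

def barycentric_interp(p, verts, values):
    """Interpolate vertex values at point p inside triangle verts (3x2)."""
    a, b, c = verts
    T = np.column_stack((b - a, c - a))      # 2x2 matrix of edge vectors
    w1, w2 = np.linalg.solve(T, p - a)       # barycentric weights for b and c
    w0 = 1.0 - w1 - w2                       # weight for a
    return w0 * values[0] + w1 * values[1] + w2 * values[2]

verts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
anoms = np.array([0.4, -0.2, 1.1])            # invented anomalies at the three nodes
print(barycentric_interp(np.array([0.25, 0.25]), verts, anoms))   # 0.425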
Nick,
“Sea temperatures are more uniform than land, in both space and time.”
Should your map then need to show the variability in time and space of each weather effect? Some weather effects that can affect temperature over land are smaller than the separation of your nodes and so they either disappear or are incorporated into a subtle bump in the shading between distant nodes.
From the Iranian radiometric survey maps I mentioned, your method would be excellent at removing the very feature we were paid to find, namely one or several data points that were higher in gamma count than those nearby. (Detecting such points was a hard part of the software to get right, especially as we wrote in machine language or sometimes binary.)
Our stated aim was to find discrete areas with high radiometric count. What is the primary stated purpose for your maps?
Geoff,
“Some weather effects that can affect temperature over land are smaller than the separation of your nodes and so they either disappear…”
They disappear, as intended, through the monthly averaging. The map doesn’t tell you that a cold front passed different points at different times. But it is influenced by the number of cold days in each place.
“What is the primary stated purpose for your maps?”
In this case, to indicate where it was warmer, and where cooler, than usual in a given month. But I included it here to show where the measurements were taken in Aug 22 (with more to come).
“ the thermal inertia of the water smooths that out.”
Does UAH measure sea temperature or atmospheric temperature?
Unbelievable! Two downchecks for merely asking a question.
They don’t like you asking questions.
How much averaging, “infilling”, “homogenization” and massaging was needed to create this pretty graph?
As much as it takes to keep a straight face when lying.
There is no homogenization. These are raw data.
The infilling is just linear interpolation within each triangle. The colors are accurate at the stations.
That is all.
just linear interpolation
of a few data points over tens of thousands of square km.
lol
Just inventing data based on a presumed slope function and pretending that data is any good.
“The infilling is just linear interpolation within each triangle.”
And what weighting was used for terrain, elevation, and geography (e.g. adjacent surface water) while doing your “linear” interpolation?
I can’t quite figure out what the surface area is for each triangle. What is the length of each side?
“And what weighting was used for terrain, elevation, and geography…”
For goodness’ sake, these are anomalies. Terrain, elevation and geography don’t change. They were present in the anomaly base values, which are subtracted out.
“What is the length of each side?”
The distance between the relevant nodes. The mesh is in fact the convex hull of the measurement points on the sphere.
In other words distance is all that matters; geography, terrain, elevation, wind, humidity, land use, etc., are of no consequence.
That’s about what I expected.
“For goodness’ sake, these are anomalies. Terrain, elevation and geography don’t change. They were present in the anomaly base values, which are subtracted out.”
MALARKY!
Being on the west side of a mountain and being on the east side can *certainly* impact anomaly values. So can being near a body of water in one case and not in another (See the temperature anomalies in San Diego vs Ramona in CA).
The anomalies on top of Pikes Peak are *NOT* the same as the anomalies seen in Colorado Springs!
The issue here is comparing anomalies from one place with another! If you try to infill or homogenize Ramona anomalies by using San Diego anomalies or by using Pikes Peak anomalies for Colorado Springs you are contaminating the entire data set!
Anomalies for a downtown measuring site with significant UHI will be different than the anomalies seen in a rural area 50 miles away where the station is surrounded by corn and soybeans. Trying to infill or homogenize one with the other only contaminates the data set.
It’s why the use of anomalies is such a pile of garbage. The intentions are good but the implementation is atrocious!
Absolutely not one word in rebuttal. Just down checks. Why am I not surprised?
Your shading is utter nonsense and represents nothing but a hypothesis.
Here’s a Google Earth image with all of the Australian stations in the GHCN Monthly file for August with continuous data from 1960 to 2022. There are a few gaps here and there, but these 92 stations are good. Note the spacing. The center of the country is a large open area with stations up to 2800 km apart.
Nick,
You wrote, in reply to Carlo, Monte (September 8, 2022, 2:20 pm):
“Because I have done a Monte Carlo.”
That reference you gave leads to these words, in an example for calculation of pi.
“There are two important considerations:
Is there not some parallelism of concept here? Does this not suggest that simplification in generating your map increases the uncertainty?
Geoff S
Geoff,
First, there are various things you can hope to get from a Monte Carlo. One is the mean value, another is the variance, or other measure of variability.
I have focused on the second, although the original application of that culling was to show that the mean was unbiased as stations were culled. Wiki is talking about the first, getting an estimate of pi from how often dots fall within the circle. Since the expected probability there is based on a uniform distribution, then necessarily the modelling should be uniform. There is no such geometric issue with the global temperature averaging under culling. There might be if you restricted to a specific region like Australia, and allowed points to also fall into the ocean.
NOAA’s data is a collection of data from meteorological agencies around the world, so the answer is probably “Yes.”
“It staggers the imagination to wonder how a global anomaly to three decimal points could ever come out of a network in such a condition.”
Well, I do it every month. I did it with GHCN V3, which had only about 1800 stations currently reporting, but V3 selected them to be reasonably uniformly spread over countries (where possible). That was because of bandwidth constraints which now don’t exist. So V4 just put in all the stations available. It doesn’t matter that the US is way over covered; as long as you have enough in scarcer parts. And you don’t need that many. I did a study here where I took random subsets of all stations (sea and land) of diminishing size, to see how much the selection of stations affected the outcome. Here is a graph of the results for Jan 2014:
The spaghetti shows different random subdivision pathways; the temperatures are the difference between what the subsets gave and what the full set gave. You can cull to 500 stations (for the whole globe) and still have results within a 0.1 C range.
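A toy version of that culling experiment can be sketched as follows. The synthetic “station” values, the subset sizes, and the neglect of area weighting are all assumptions for illustration; this is not TempLS.

# Toy version of the station-culling experiment described above (not TempLS):
# repeatedly draw shrinking random subsets of synthetic "station" values and
# compare each subset mean with the full-network mean. Area weighting is
# deliberately ignored here, so this only illustrates the idea.
import numpy as np

rng = np.random.default_rng(0)
full = rng.normal(loc=0.5, scale=2.0, size=10000)    # synthetic station anomalies
full_mean = full.mean()

for n in (5000, 2000, 1000, 500):
    diffs = [rng.choice(full, size=n, replace=False).mean() - full_mean
             for _ in range(100)]                     # 100 random subsets per size
    print(n, round(float(np.ptp(diffs)), 3))          # spread of subset-vs-full differences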
Just because one can apply a calculation to a set of measurements doesn’t mean it’s correct to do so. That’s the real problem here; the calculations show you what you think you should see, so they’re correct.
BTW, where are the other two zip files for TempLS? I only see the one on its page.
^^^1000
“BTW, where are the other two zip files for TempLS?”
Could you be more specific, please? Which page are you looking at?
https://moyhu.blogspot.com/2019/04/description-and-code-of-templs-v4.html
Thanks, for that. Looks like I just forgot to put in the actual link (I’ll fix); the files are there. The library file is here
https://s3-us-west-1.amazonaws.com/www.moyhu.org/TempLS/V4/Moyhupack.zip
For the data file just replace Moyhupack.zip by LSdata.zip
Thanks, Nick.
Bullshite.
And yet, still ignoring physical uncertainty. You can torture the data until it confesses, but it won’t tell you the truth or the unknowable.
That’s pretty cool. That is a data denial experiment where you resample without replacement. I wonder how this graph would look if you used the bootstrapping approach where instead of resampling without replacement you resampled with replacement by selecting stations randomly while keeping N=4759 (stations get duplicated in the resample). You could also try the jackknife technique where you deny TempLS a subset of stations (say 1/8th) and repeat with a different and independent subset on each iteration until all stations are incorporated (8 times for 1/8th denial) like what is described in Rohde et al. 2013.
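A generic sketch of the bootstrap idea described here (resampling with replacement at fixed N), using invented station values and ignoring the area-weighting complication raised in the replies below.

# Generic bootstrap sketch of the resample-with-replacement idea: keep N fixed,
# allow duplicates, and look at the spread of the resampled means. Station
# values are synthetic; the area-weighting objection raised below is ignored.
import numpy as np

rng = np.random.default_rng(1)
stations = rng.normal(0.5, 2.0, size=4759)            # synthetic anomalies, N = 4759
boot_means = [rng.choice(stations, size=stations.size, replace=True).mean()
              for _ in range(1000)]
print(round(float(np.std(boot_means)), 4))             # bootstrap spread of the mean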
You do realize that N = sample size, not the total number of random variables you are sampling, right?
If you are going to use a sample size of 4759, why bother “sampling”?
You also recognize that each station annual average is a random variable, i.e., a sample of the entire population of temperatures from the entire earth with a sample size of 12, I hope!
Chew on that for awhile!
I don’t think sampling with replacement can work. The effect of duplication is deliberately countered by the area weighting. If you include a station twice, it doesn’t get any more weight as a result, because the duplicate stations share the same area.
On jackknifing, the culling I use is designed to optimise coverage at each stage. At each stage, I remove stations preferentially from well covered areas, randomising somewhat within that intent. I’m not sure how that would go with jackknifing.
Incredible, climastrologists actual believe dry-labbing data has meaning.
Oh, yeah, I didn’t think about the area weighting. That would definitely be problematic.
“0.1 C range”
So what does this mean when the uncertainty of the measurements is +/- 0.5C?
If you included the uncertainty limits on your graph you couldn’t see most of it!
All that clearly shows is that you have the uncertainty wrong. These are 100 random repetitions. They just don’t have anything like the uncertainty you ascribe to them. In fact, the point of the method is to demonstrate the uncertainty: how much they can actually vary.
How do you know?
Because I have done a Monte Carlo. 100 sequences with random variation. That tells you the variability. That is what useful Monte Carlos are for.
Only if your model is a reflection of reality, otherwise it is a useless waste of floating-point number cycles.
Stupid ad hominem noted, Nitpick Nick.
All the algorithm does is randomly select subsets of stations with reasonable coverage properties. Focusing on subsets of 500, say, it tells you that all those different subsets, with different locations and different instances of measurement error, give answers within a range of 0.1C. You can’t get a better quantification of uncertainty.
Um, error is not uncertainty. No mention of propagation calculation.
You are big on saying what uncertainty is not. But you never seem to be able to say what it is. Or how you think it should be calculated.
Monte Carlo is the direct way of calculating. You let the inputs vary over their range of uncertainty, and look at the range of outputs.
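As a sketch of that procedure: draw each input from an assumed distribution (Gaussian here, which is itself one of the disputed assumptions in this thread), run the draws through the measurement model, and look at the spread of the outputs. All numbers are illustrative.

# Generic Monte Carlo propagation sketch: draw each input from an assumed
# distribution (Gaussian here -- an assumption that is itself disputed in this
# thread), push the draws through the measurement model, and inspect the
# spread of the outputs.
import numpy as np

rng = np.random.default_rng(2)

def model(x):                       # any measurement model y = f(x1..xN)
    return x.mean(axis=1)           # here: a plain average of the inputs

N, trials = 100, 20000
x_true = np.full(N, 20.0)                               # nominal input values
u_x = 0.5                                               # assumed standard uncertainty per input
draws = x_true + rng.normal(0.0, u_x, size=(trials, N))
y = model(draws)
print(round(float(y.std()), 4), round(u_x / np.sqrt(N), 4))   # MC spread vs u/sqrt(N)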
To borrow a quote from NIST upthread (h/t Jim Gorman)
“Section 12, beginning on Page 27, shows how the NIST Uncertainty Machine may also be used to produce the elements needed for a Monte Carlo evaluation of uncertainty for a multivariate measurand:”
All you have to do is read—uncertainty is what you don’t know, and can’t know; error is the difference between a measurement and the true value.
True values cannot be known.
“You let the inputs vary over their range of uncertainty, and look at the range of outputs.”
This requires you to know the probability density function for the uncertainty – and since the uncertainty is an interval where the true value is unknown how do you assign a pdf to it?
Cherry picking things which you know nothing about.
BTW, where are those standard deviation calculations from daily averages right on thru to a GAT. Have you forgotten how to do them?
No, you don’t get an UNCERTAINTY of 0.1C. You get a standard deviation of the sample means of 0.1C. You simply do not know what the accuracy (i.e. the uncertainty) is until you propagate all the uncertainties of the individual elements.
Monte Carlo sequences are really only applicable when the input quantities can have their probability density functions defined.
Since each individual input in a data set is its own independent input you have to be able to define a probability density function for each individual temperature.
Since the uncertainty interval for each measurement is *NOT* a probability density function the use of a Monte Carlo simulation is not truly applicable.
From JCGM 101:2008 E:
“NOTE 2 The only joint PDFs considered explicitly in this Supplement are multivariate Gaussian distributions, commonly used in practice.”
Temperature measurements simply do not have Gaussian uncertainty distributions.
This has been pointed out to you before in other threads. Why you keep ignoring this is hard to understand. Do you have memory problems?
And before you throw it out, a uniform PDF is *not* an acceptable pdf for uncertainty. Not every value in an uncertainty interval has the same probability of being the true value. Only one value has a probability of 1, the rest have a probability of 0. You just don’t know which value has a probability of 1. It is unknown. This is an impulse pdf which appears nowhere in the GUM.
Tell us how averaging measurements increases the RESOLUTION of measurements beyond the resolution at which they were actually measured.
If resolution can be increased by averaging, why do we spend more and more on better measuring devices? We just need to average more and more readings together, don’t we?
If the resolution is not increased, how can the uncertainty be decreased?
He won’t because he can’t. He is as much a fraud as all the rest.
And this is supposed to give us a GLOBAL metric?
I get the importance of knowing or controlling uncertainty to very small and very strict tolerances in many scientific and technological endeavors. Quantum events tend to operate on such short time scales, relative to human consciousness, that exacting methods are needed to identify, study, or utilize them, and indeed this seems true for many aspects in many domains.
However, being concerned about hundredths or tenths of a degree for environmental temperature measurements seems akin to medieval philosophical arguments about how many demons can escape from hell in a minute of evil distraction from the true faith. It is an ideological smoke screen that does not matter a whit for any purpose except confusing people’s minds.
So, in such and such a place, there was a temperature measurement 1 degree higher, or lower, than ever before recorded by humans. So what? It means nothing unless you assign a meaning to it, or more importantly, accept an assignment of meaning put forth by someone who wants to control you.
What’s truly bad about the old arguing on demons escaping from hell, or angels dancing on the head of a pin, is that both sides have bought into the event actually occurring, and now they’re just arguing over the magnitude.
It’s the same place we are in the climate wars: the unproven hypothesis of CO2 warming the Earth has been accepted, and now we’re arguing about how much damage one or two more molecules per million will do.
The angels aren’t just dancing, they’re breaking windows, painting graffiti, and slicing tires!
The following is a series of determinations of the Astronomical Unit – mean distance from the Sun to Earth in millions of miles – taken from published research papers. It is not surprising that over time the reported MU became smaller as methods and instruments improved. However, note that each new paper reports a value outside the MU limits reported in the previous paper. Indeed, it is probably this fact that led to the publication of these papers.
The point is simply that it’s pretty common for scientists (even good ones) to be a bit overly optimistic about the Measurement Uncertainty of their results.
YEAR  RESEARCHER      A.U. VALUE  MIN.     MAX.     MU +/-
1895  Newcomb         93.28       93.20    93.35    0.075
1901  Hinks           92.83       92.79    92.87    0.040
1921  Noteboom        92.91       92.90    92.92    0.010
1928  Spencer Jones   92.87       92.82    92.91    0.045
1931  Spencer Jones   93.00       92.99    93.01    0.010
1933  Witt            92.91       92.90    92.92    0.010
1941  Adams           92.84       92.77    92.92    0.075
1950  Brouwer         92.977      92.945   93.008   0.032
1950  Rabe            92.9148     92.9107  92.9190  0.004
1958  Millstone Hill  92.874      92.873   92.875   0.001
1959  Jodrell Bank    92.876      92.871   92.882   0.006
1960  S.T.L.          92.9251     92.9166  92.9335  0.008
1961  Jodrell Bank    92.960      92.958   92.962   0.002
1961  Cal. Tech.      92.956      92.955   92.957   0.001
1961  Soviets         92.813      92.810   92.816   0.003
Currently accepted value = 92,955,807.273
Rick C,
Thank you for that neat example of the real world.
Previously, from my younger days in analytical chemistry, I had a favourite example. It was a couple of papers by George Morrison reviewing the accuracy of analysis of elements in Apollo 11 moon rocks and soils. Before this exercise the many esteemed labs gave estimates of their proficiency that were sometimes shot to pieces in this careful interlaboratory comparison. History does not record in convenient form whether successful remedies were applied thereafter.
One remedy should have involved less hubris.
Geoff S
Geoff==>You say: “There are grounds for adopting an overall uncertainty of at least +/- 0.5 degrees C for all historic temperature measurements when they are being used for comparison of each to another. It does not immediately follow that other common uses, such as in temperature/time series should use this uncertainty.”
I totally agree that the minimum uncertainty for any historic temperature measurement should be at least 0.5C.
If those measured temperatures are used in a time series, then the uncertainty remains unchanged. If those individual temperature measurements are then averaged into a Daily Average (a not-quite-proper use) then the uncertainty remains unchanged.
I do not agree that any uncertain measurements of a different thing (different time, different place, different circumstance) can be averaged to produce a less uncertain result — which reduction is just a trick of mathematics, not a change in the Real World Uncertainty.
I and many others have argued this point for many, many years… no evidence has been produced to refute my opinion.
Another reason that climatology is pseudoscience.
+1000
KH said: “If those individual temperature measurements are then averaged into a Daily Average (a not-quite-proper use) then the uncertainty remains unchanged.”
KH said: “I do not agree that any uncertain measurements of a different thing (different time, different place, different circumstance) can be averaged to produced a less uncertain result”
On what set of methods and procedures for uncertainty assessment are you basing these statements?
Is it correct to assume that you reject the methods, procedures, and examples defined in GUM JCGM 100:2008, NIST TN 1297, NIST TN 1900, and the results produced by the NIST uncertainty machine?
Someone mute this droid, PLEASE. It is stuck in a loop.
In case you forgot to read the whole NIST document, it spells out what we have been trying to get you to think about.
(bold by me)
Look at the term measurand: it describes a singular item, not an average. An average is not a measurand, it is a statistical calculation to show the central tendency and is only accurate for mostly normal distributions. Let me say it again. AN AVERAGE IS NOT A MEASURAND, IT IS A STATISTICAL CALCULATION TO SHOW THE CENTRAL TENDENCY OF A DISTRIBUTION!
Let’s go on.
I don’t see anything in the functional relationship or the uncertainty formula about dividing by the sqrt N.
And here is another kicker.
“A.4 As an example of a Type A evaluation, consider an input quantity Xi whose value is estimated from n independent observations Xi,k of Xi obtained under the same conditions of measurement. In this case the input estimate xi is usually the sample mean … , and the standard uncertainty u(xi ) to be associated with xi is the estimated standard deviation of the mean ”
Read this carefully. It says Xi,k observations of Xi under the same conditions of measurement. That means the same temperature, humidity, pressure, wind, enclosure, measuring device, etc. In other words, the same thing, multiple times, with the same device.
This whole document is exactly what I described to you earlier. It deals with the measure of one measurand, exactly like the GUM. This is not a document telling you how to find a global temperature and assess its uncertainty. You are barking up the wrong tree.
BTW, I never did find where NIST showed how to average temperatures and find the combined uncertainty.
“I don’t see anything in the functional relationship or the uncertainty formula about dividing by the sqrt N.”
There are a bunch of people here who thump the table about the GUM and NIST, without any understanding of what it means or how to apply it. The application of A-3 goes thus. The averaging formula is
A=f(x)=(Σxₖ)/N=Σ(xₖ/N)
So the derivative ∂f/∂xₖ=1/N. The cross-derivatives in the second term of A-3 are zero, as the text envisages, so the result is
u²(A)=Σu(xₖ)²/N²
If the u(xₖ) are equal, then, since N of them are summed
u²(A)=Σu(x)²/N²=u(x)²/N
or u(A)=u(x)/sqrt(N)
If the uncertainties are not equal, the arithmetic is more complicated, but the reduction with N is similar.
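The arithmetic of that formula can be checked numerically. The sketch below simply evaluates u²(A) = Σ u(xₖ)²/N² for equal and unequal input uncertainties; it takes no position on whether the formula applies to field temperatures, which is what this thread disputes.

# Numeric sketch of the propagation formula quoted above, u²(A) = Σ u(xₖ)²/N²,
# for both equal and unequal input uncertainties. This just does the arithmetic;
# whether it applies to field temperatures is exactly what is being argued here.
import numpy as np

def u_of_average(u_inputs):
    u = np.asarray(u_inputs, dtype=float)
    N = u.size
    return np.sqrt(np.sum((u / N) ** 2))       # quadrature sum with ∂f/∂xₖ = 1/N

print(u_of_average([0.5] * 100))               # equal case: 0.5/sqrt(100) = 0.05
print(u_of_average([0.5] * 50 + [1.0] * 50))   # unequal case: about 0.079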
So u goes to zero by increasing the number of averaged points?
NO!
And you do? Just like bgwxyz and bellcurveman, you are blindly stuffing numbers into an equation because it gives the answer you NEED.
This is just one of the reasons why trendology is pseudoscience.
That’s what I get too. In the previous article I did this derivation starting from GUM equation 10 which is an approximation of GUM E.3 (or NIST A.3 above) that works when the measurement model Y is linear and the input quantities are uncorrelated. In most cases the second term on the right-hand side of GUM E.3 (or NIST A.3) is zero.
GUM equation 10
u_c^2(y) = Σ[(∂f/∂x_i)^2 * u^2(x_i), 1, N]
Let
y = Σ[x_i, 1, N] / N
u(x) = u(x_i) for all x_i
Therefore
∂f/∂x = ∂f/∂x_i = 1/N for all x_i
So
u_c^2(y) = Σ[(∂f/∂x_i)^2 * u^2(x_i), 1, N]
u_c^2(y) = Σ[(1/N)^2 * u^2(x), 1, N]
u_c^2(y) = N * [(1/N)^2 * u^2(x)]
u_c^2(y) = 1/N * u^2(x)
u_c^2(y) = u^2(x) / N
u_c(y) = u(x) / sqrt(N)
For the lurkers… One of the most common math mistakes I’ve been seeing here is the partial derivative ∂f/∂x_i. The symbology makes this more complicated to understand than it really is. All it is is the change in the output of the function y given a change in the input x_i. There are several ways it can be computed mathematically. I prefer the finite differencing technique and especially the central difference variation of the first order, f′(x) = [f(x+h) – f(x-h)] / 2h. In this case setting h=1 is sufficient. But to be frank this specific function y is so trivial even a middle schooler can compute ∂f/∂x_i in their head. It’s obviously 1/N.
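A minimal numeric illustration of that central-difference estimate, applied to the averaging function, where it comes out exactly 1/N. The input values are arbitrary.

# Central-difference estimate of ∂f/∂x_i for f = mean(x), as described above.
# For this linear f the estimate is exact and equals 1/N for every input.
import numpy as np

def partial_central(f, x, i, h=1.0):
    xp, xm = x.copy(), x.copy()
    xp[i] += h
    xm[i] -= h
    return (f(xp) - f(xm)) / (2 * h)

x = np.array([12.0, 15.0, 9.0, 20.0])        # N = 4 arbitrary inputs
print(partial_central(np.mean, x, i=2))       # 0.25 = 1/N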
And you are still WRONG, this is just a fraudulent accounting trick that does not reflect reality.
And if you don’t divide N through, you get a different answer! Why is this?
bdgwx said: “One of the most common math mistakes I’ve been seeing here is the partial derivative ∂f/∂x_i”
Let’s perform the central difference method on y = f = Σ[x_i, 1, N] / N to see how ∂f/∂x_i is computed analytically.
Let
f(x_1, …, x_N) = Σ[x_i, 1, N] / N
Partition f in preparation for central differencing
f(x_i, z_1, …, z_N-1) = (x_i + Σ[z_j, 1, N – 1]) / N
Starting with the central differencing formula
∂f/∂x_i = [f(x_i+h, z_j…) – f(x_i-h, z_j…)] / 2h
We have
∂f/∂x_i = [f(x_i+h, z_j…) – f(x_i-h, z_j…)] / 2h
∂f/∂x_i = [((x_i + h + Σ[z_j, 1, N – 1]) / N) – ((x_i – h + Σ[z_j, 1, N – 1]) / N)] / 2h
∂f/∂x_i = [(x_i + h + Σ[z_j, 1, N – 1]) – (x_i – h + Σ[z_j, 1, N – 1])] / 2hN
∂f/∂x_i = [x_i + h + Σ[z_j, 1, N – 1] – x_i + h – Σ[z_j, 1, N – 1]] / 2hN
∂f/∂x_i = 2h / 2hN
∂f/∂x_i = 1/N
Notice how all of the terms cancel and we are left with 1/N. That is the partial derivative ∂f/∂x_i which is just a statement of how much y = f changes when the input x_i changes. ∂f/∂x_i is used in GUM equation 10 or the more general GUM equation E.3 (or NIST A.3).
What is the uncertainty of the sum, droid-troll, and what is the uncertainty of N?
“Is it correct to assume that you reject the methods, procedures, and examples “
Have you stopped beating your wife?
Exactly. The pinnacle of trendology logic.
It’s the equivalent of measuring thousands of different size boards and then using those measurements to produce an average length with a three-decimal point precision.
You have the measurements, you know the sample size, the use of an anomaly rather than the actual measurement has reduced the range of the values, and so you apply the equations. You get an answer and work it out to three decimal points, et voila!
However, that final answer tells you nothing about the individual boards that were measured. If you measure the same board 1200 times, at least each measurement is dependent on the length of the board. The length of board 1 is completely independent of the length of board 1200.
It sounds like you’re getting it. The uncertainty of the average u(X_avg) is different than the uncertainty of the individual elements u(Xi). Computing X_avg and u(X_avg) does not change Xi or u(Xi). Increasing the sample size does not decrease u(Xi). It only decreases u(X_avg) and only when the sample X has correlation less than 1.
Liar, clown, troll, nutter—did I miss any?
I understand what you mean, I just think your methodology is misapplied. The CLT and Law of Large Numbers only apply if there IS an actual “correct” value. The single board has a definite length. The tool we use to measure it determines the accuracy and precision of the measurement.
The “average global anomaly” does not actually exist; it is derived, calculated. There is no “correct” value. It’s a best guess.
It’s not my methodology. It is an inevitability from the law of propagation of uncertainty.
It makes no difference if the measurand is derived or calculated. In fact, that’s a major element of the methods and procedures described in the GUM and employed by the NIST uncertainty machine. That is you can define a measurement model Y that derives or calculates the measurand and compute the uncertainty u(Y). The measurement model can be as simple as Y = X1-X2 or so complex it cannot even be written down explicitly.
“I hate droids” — Mandalorian
Your use of the methodology is fatally flawed.
Why can’t you answer the question that if
m = kX/g
then is the uncertainty of m
u^2(m) = u^2(k/g) + u^2(X/g) ????? (1)
Does gravity somehow lessen the uncertainty of the mass required for a spring constant “k” and an extension “X”?
Or is the uncertainty of m
u^2(m) = u^2(k) + u^2(X) + u^2(g) ???? (2)
Pick one. Either (1) or (2). The one you pick will also apply to your claims about how you calculate the uncertainty of an average.
“Pick one. Either (1) or (2).”

Both are completely wrong. Even the units are hopelessly wrong. Jim G upthread gave the NIST formula here
u^2(y) is the sum of the u^2(x_i) multiplied by (∂f/∂x_i)². The answer is
u^2(m) = (X/g)²u^2(k) + (k/g)²u^2(X) + (kX/g²)²u^2(g)
“The answer is
u^2(m) = (X/g)²u^2(k) + (k/g)²u^2(X) + (kX/g²)²u^2(g)”
ROFL!!!
u^2(k), u^2(X), and u^2(g)!
Where is your sqrt(g)? Where is your u^2(k/g) and u^2(X/g) uncertainties?
You just admitted that the uncertainty of the average is *NOT*
u^2(x1/N) + u^2(x2/N) + … + u^2(xN/N)!
But that *is* what is required in order to get the sqrt(N) term as a divisor of total uncertainty in order to make it appear smaller.
You are as bad at using the GUM as bdgwx and bellman! And you can’t even understand your own math!
^^^^1000!!!!
“uncertainty of the average is *NOT*”
You posed an entirely different question,
“Why can’t you answer the question that if
m = kX/g
then is the uncertainty of m…”
That isn’t an average, and there is no N. But it has an answer in terms of A-3, which I gave. You gave proposed answers too, which involved adding things of different units.
Stokes runs away from the main point (again) with a Nitpick:
“That isn’t an average, and there is no N”
I didn’t say it was an average. But it is a DIVISOR, just like “N” is a divisor in an average. Both are considered constants!
You treat divisors the same.
The GUM doesn’t require consistent units when calculating uncertainties. If you want consistent units then convert the uncertainties to relative ones. The units will all be “%”. They will still add in the same manner.
He really has no uncertainty experience at all, just shooting from the hip now.
“The GUM doesn’t require consistent units when calculating uncertainties.”
Everyone requires consistent units. You were adding a spring constant to an extension (length) to an acceleration. Not even GUM can make sense of that. u(x) has the units of x.
Convert the uncertainties to relative uncertainties so the units are all %. QED.
That’s why Taylor says to use relative uncertainties when multiplying or dividing.
When you calculate the average uncertainty what is the unit for N?
Funny, I see you didn’t divide this by “sqrt N” to get the average uncertainty!
Here is what I said:
That was the point I was trying to get across and you just inadvertently clarified that.
NS said: “The answer is
u^2(m) = (X/g)²u^2(k) + (k/g)²u^2(X) + (kX/g²)²u^2(g)”
Confirmed. The trickiest part of this by far was ∂f/∂g which is kX/g^2 and totally non-intuitive (at least to me). I had to solve for it.
Let
y = m = kX/g
Therefore
∂f/∂k = X/g
∂f/∂g = kX/g^2
∂f/∂X = k/g
Using GUM equation 10
u_c^2(y) = Σ[(∂f/∂x_i)^2 * u^2(x_i), 1, N]
So
u_c^2(y) = Σ[(∂f/∂x_i)^2 * u^2(x_i), 1, N]
u_c^2(y) = (X/g)^2*u^2(k) + (kX/g^2)^2*u^2(g) + (k/g)^2 * u^2(X)
bdgwx said: “The trickiest part of this by far was ∂f/∂g which is kX/g^2 and totally non-intuitive (at least to me). I had to solve for it.”
Here is the analytical solution to ∂f/∂g.
Let
f(k, X, g) = kX/g
Starting with the central differencing formula
∂f/∂g = [f(k, X, g + h) – f(k, X, g – h)] / 2h
We have
∂f/∂g = [f(k, X, g + h) – f(k, X, g – h)] / 2h
∂f/∂g = [(kX/(g+h)) – (kX/(g-h))] / 2h
∂f/∂g = [2hkX/((g+h)(g-h))] / 2h
∂f/∂g = kX/((g+h)(g-h))
Take the limit h => 0.
lim[kX/[(g+h)(g-h)], h=>0] = kX/g^2
Therefore
∂f/∂g = kX/g^2
I have to be honest here. I had to use a computer algebra system to verify one of the simplification steps above. I guess it’s just one of those days.
“∂f/∂g = kX/g^2”
It’s actually ∂f/∂g = -kX/g^2, but you can ignore the sign when it is squared. You don’t really need to go to first principles; more usual calculus would be to use the formula d(gⁿ)/dg = ngⁿ⁻¹, with n=-1.
NS said: “It’s actually ∂f/∂g = -kX/g^2”
Doh…I see the mistake. I left the minus sign off one of my steps. The ironic part is that it was the step I cheated and used a CAS (https://www.symbolab.com/ or https://www.wolframalpha.com/ work well). The CAS got it right. I just copied it wrong.
BTW…I was wondering how you got that so quickly. The other partial derivatives were easily done in my head, but ∂f/∂g had me stumped. Kudos to you. I’ll have to keep that simpler formula in my back pocket next time.
BTW #2…I tested the final u_c^2(y) formula against realistic input values and compared them to the NIST uncertainty machine and confirmed that the results of both match. The NIST uncertainty machine also provides the numeric evaluations of the partial derivatives as well which is pretty nice.
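The kind of cross-check described can be sketched as follows: the analytic formula quoted above for m = kX/g against a brute-force Monte Carlo, with invented nominal values and uncertainties. This is not the NIST Uncertainty Machine itself, just an illustration of the comparison.

# Cross-check sketch for m = kX/g: analytic propagation (formula given above)
# versus a brute-force Monte Carlo. Input values and uncertainties are invented
# for illustration only; this is not the NIST Uncertainty Machine.
import numpy as np

k, X, g = 50.0, 0.20, 9.81           # N/m, m, m/s² (made-up nominal values)
u_k, u_X, u_g = 0.5, 0.002, 0.01     # made-up standard uncertainties

u_m_analytic = np.sqrt((X / g) ** 2 * u_k ** 2 +
                       (k / g) ** 2 * u_X ** 2 +
                       (k * X / g ** 2) ** 2 * u_g ** 2)

rng = np.random.default_rng(3)
n = 200000
m = (rng.normal(k, u_k, n) * rng.normal(X, u_X, n)) / rng.normal(g, u_g, n)
print(round(float(u_m_analytic), 5), round(float(m.std()), 5))   # should agree closely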
I feel like I’ve suddenly been plunged into one of those Facebook “what’s the right answer” posts, but in the right-hand side of the equation up there the negative sign certainly does matter because the order of execution would be to square g first, then divide X by that result, then multiply -k times the whole thing.
No, it doesn’t matter. The issue is the sign of ∂f/∂g.
(-∂f/∂g)²=(∂f/∂g)²
So you just wrote the equation incorrectly. You meant to square the entire right side, not just g?
That should have been written
(-kX/g)^2. Then the sign of k wouldn’t matter.
No, I squared both
(kX/g²)²
The sign of the derivative (kX/g², not k) does not matter in this expression.
I see you found the missing minus sign. You can’t just ignore the minus sign because it tells you the slope of the derivative.
When you take the square root of the square you still wind up with both a positive and negative value. That’s why the uncertainty interval is not just positive but +/-!
“When you take the square root of the square”
Where do you do that?
It’s far simpler than this.
kX/g ==> kXg^-1. The derivative of g^-1 is (-1)g^(-1-1) = (-1)g^-2.
So ∂f/∂g of kX/g = –kX/g^2
You missed the minus sign.
“It makes no difference if the measurand is derived or calculated.”
It *does* make a difference if the measurand is a measurement or a statistical descriptor. Something you have a hard time discerning!
The average is *ONLY* useful with a symmetrical probability distribution. The average is meant to give an expectation of what the next value can be expected to be (along with the standard deviation of course). Basically if the mean and the median are not the same the mean and the standard deviation are not good descriptors of the distribution.
If you do not have a symmetrical distribution the statistical descriptors of “mean” and “standard deviation” tell you nothing. You should use at least the 5-number statistical description: minimum, first quartile, median, third quartile, maximum.
When the CAGW advocates start giving us the 5-number statistical description of their data sets then I might start paying attention to what they come up with.
The average length and the average uncertainty only applies to a theoretical board. None of the MEASURED boards will likely meet those specifications. That makes the mean and average uncertainty meaningless.
You keep calling it the uncertainty of the average. It is *NOT* that at all. It is the average uncertainty of the population. It is the value that when multiplied by “N” gives the same total uncertainty as if you added each individual uncertainty together! The uncertainty of the average is the propagated uncertainty of the individual components.
That’s why Eq 10 in the GUM says nothing about dividing by N. The uncertainty u^2(y) is the sum of the u^2(x_i).
Increasing the sample size only increases N. If all individual components have the exact same uncertainty then u(total)/N doesn’t change with increased sample size. The average uncertainty remains exactly the same. That fact alone should clue you in to the problem with what you are doing!
“The uncertainty u^2(y) is the sum of the u^2(x_i). “

Just not true. Jim G upthread gave the NIST formula here
u^2(y) is the sum of the u^2(x_i) multiplied by (∂f/∂x_i)². And for the average,
A=f(x)=(Σxₖ)/N=Σ(xₖ/N)
then (∂f/∂x_i)²=1/N²
bdgwx is, as usual, right.
Stokes is now the world’s leading expert on both trendology and uncertainty.
*snort*
And you are a mindless heckler. You have nothing useful to say.
You cannot sweep uncertainty away under the rug with accounting tricks like root(N).
As I’ve told your disciples many times to no avail, forcing measurement uncertainty to zero as N increases is absurd. Tim tried to show you up above on your random walk graph but you pooh-poohed him.
Judging by the downvote, these trendologists DO believe that uncertainty goes to zero.
“As I’ve told your disciples many times to no avail, forcing measurement uncertainty to zero as N increases is absurd”

Here it is written out explicitly by GUM. They have allowed extra uncertainty about shift α and scale β. But if you set α=0, β=1, you get
u²(z)=s²(qₖ)/n
or u(z)=s(qₖ)/sqrt(n)
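A quick simulation of that expression for n repeated observations of a single random variable. Whether field temperatures count as repeated observations of the same quantity is exactly what is argued below; this only illustrates the formula, and all numbers are invented.

# Simulation sketch of u(z) = s(qₖ)/sqrt(n) for n repeated observations of one
# random variable. Whether field temperatures qualify as "repeated observations
# of the same quantity" is the point argued below; this only shows the formula.
import numpy as np

rng = np.random.default_rng(4)
n, trials = 30, 5000
true_value, sigma = 20.0, 0.5
means = rng.normal(true_value, sigma, size=(trials, n)).mean(axis=1)
print(round(float(means.std()), 4), round(sigma / np.sqrt(n), 4))   # both near 0.091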
You skipped over the vital point, Nitpick:
“…these n values are obtained from n independent repeated observations qk of a random variable…”
In a time series (like temperature measurements), you get exactly one chance to observe the variable before it is gone forever.
n is by definition always equal to ONE!
So divide away by root(N) to your hearts content, it changes nothing and uncertainty is NOT reduced.
Not one of the Stokes lot has ever tried to debate this point, all they do is go for the downvote button.
No, if you have n repeated observations, they constitute a time series with n consecutive terms. You could have n spins of a roulette wheel, or indeed n successive temperatures on different days.
The reason you can’t understand what I’m saying is because you have zero real world metrology experience.
Try again slowly: T_i is not the same variable as T_i+1, therefore “repeated observations” is a fiction.
They will *NEVER* understand it! It’s religious dogma for them!
If Nick Stokes is representative of the technical rigor of climate “science” in general, it is in far sadder shape than most realize. In typical Stokesian fashion, he went rooting through the GUM for anything he could use as a nitpick that would justify this nonsense and stumbled onto the radioactive decay example (H.4) in which they use—wait for it—averaging! Success!
I then pointed out to him, after a 30-second skim of H.4, the lack of anyplace where 1/root(n) is used to reach tiny uncertainty values. He responded by pulling out a place where they talked about sigma*root(n) with Poisson distributions. Only small problem — they multiplied instead of dividing!
The guy is a dunderhead who people should not be buffaloed by at all.
Did you bother to read any of the assumptions made with this?
Look at 4.2.2 and 4.2.3 closely.
You will find these are multiple measurements of the same thing! Multiple measurements of the same thing.
You are as bad as the others. You can’t just cherrypick formulas without knowing the underlying assumptions.
It has been mentioned so many times it is no longer funny. The GUM and NIST are based upon physical measurements of a single measurand, ONE MEASURAND. I’ll repeat, ONE MEASURAND. A single measurand may be made up of multiple MEASURANDS. Volumes come to mind. The Ideal Gas Law is another. PV = nRT. Each measurand has its own uncertainty. That is why you see these kinds of formulas for combining multiple physical measurands into a single Measurand. Each of those measurements that go into a calculated measurand may have experimental uncertainties from multiple measurements of lengths, pressures, chemical reactions, etc. But THEY ARE ALL OF THE SAME THING.
Averages are not a MEASURAND. AVERAGES ARE A STATISTICAL DESCRIPTOR OF A DISTRIBUTION. The appropriate descriptor for a mean is a standard deviation. You can not get around that.
If you believe that averages are worthwhile then tell us what your result is when you propagate daily variances all the way to an annual average. I suspect you can not because you haven’t done the work to characterize your distributions!
“A” is not a functional relationship that is used to calculate a measurand. Have you read nothing I’ve posted before on this thread?
The propagation of uncertainty formulas are used to evaluate a measurand. An average is not a relationship that provides another measurand, that is, the measurement of a physical quantity.
An average is a statistical calculation to obtain the central tendency of a probability distribution. The appropriate uncertainty measurement for an average is the Standard Deviation of that distribution.
Funny how Standard Deviation or Variance is never quoted when discussing a Global Average Temperature isn’t it?
A few comments back I showed E3.4 of GUM, in which they explicitly used the propagation of uncertainty formula E3 to calculate an average, and got exactly the 1/sqrt(n) factor for uncertainty. You are just making up rules.
And then was oblivious to the fact that n=1 in a time series.
“oblivious to the fact that n=1 in a time series.”
In fact, GUM does a time series example H.4. It’s actually for radon undergoing radioactive decay. They take regularly spaced readings, then the exponential curves are fitted, which is effectively an averaging process.
Here is the table of the samples they measure:
And yes, averaging is part of this:
“The arithmetic means RS, Rx , and R, and their experimental standard deviations s(RS), s(Rx), and s(R), are calculated in the usual way [Equations (3) and (5) in 4.2].”
Another fail, Nitpick, nowhere in ex. H.4 do they blindly divide by root(n) to get an impossibly small uncertainty.
Next…
In fact they do. There are 6 numbers making up the means of R_x and R_S referred to in the previous extract, and here they say the uncertainties of R_x and R_S are sqrt(6) times those of the means.
Are you blind?
No divisor there!
The uncertainty of the mean is uncertainty of R divided by √6, They write that the uncertainty of R is uncertainty of the mean multiplied by √6. Same thing.
More gaslighting, Stokes—neither H.22a nor H.23a, the expressions for u_c(A), have root(n).
Unless you believe that s(R)/R = s(R)/root(N).
Fail.
You do realize that the uncertainty of the mean is really the Standard Error of the sample Mean, right? The formula that relates the SD and SEM is:
SD = √N * SEM
where N is the sample size.
Again, where is the SD for the GAT?
Again, “their experimental standard deviations s(RS), s(Rx), and s(R), are calculated in the usual way”, shows that they use standard deviations. Do you know why? Because the measurement uncertainties are miniscule compared to the experimental standard deviations.
Where are your standard deviations for the GAT? You could end this argument by showing that the SD’s far outweigh the measurement uncertainties.
You are cherrypicking. Tell everyone how “n” is determined.
Do you think it might be multiple measurements of the same thing.
It simply says n observations of a random variable. It could be n successive dice throws, or n observations of temperature. It is you who make up all this stuff about measurements. They are simply describing the mathematics of probability.
Inside you know you are wrong about time series measurements, but the cost of acknowledging the truth is too high.
So instead you fall back on gaslighting and ridiculous roulette wheel analogies.
With successive dice rolls you are measuring the same thing, the output of the dice. With temperature you are *not* measuring the same thing. No one is making anything up. We are trying to explain to you how the physical world works.
Do you *honestly* think that a civil engineer that uses the values of uncertainty/sqrt(n) for the torsional/compression/shear strengths of the beams in a bridge truss will remain employed for long let alone financially sound?
Do you truly believe that is how bridge trusses are designed?
They will not answer this question. Not now, not ever.
Good news, everybody! I re-sent my email to Dr. Possolo, PhD — NIST Fellow & Chief Statistician, and this time I got a very nice reply. Must have been a glitch on the first attempt. For full disclosure I’ll post my email, and then his response, so that we can all agree on the body of our response. Everybody is welcome to provide feedback for our email.
First, my email:
Kind of embarrassing to read it in full awakeness — I wrote this at 0100 — but I guess he figured it out. Now for his response.
And we’re off.
I think we’re all in agreement of the two scenarios I was attempting to describe.
1) We measure one thing 100 times. We have one board; we measure the same board 100 times. Or, we have one room; we take observations from 100 identical thermometers as close to simultaneously as possible.
2) We have 100 different boards of varying lengths. We take one measurement of each board. Or, we have thermometers in 100 different rooms and take one observation from each room.
For the answers to his questions, I propose the following:
1) The 100 measurements of the length of a board would be used to calculate the mean, standard deviation, and the uncertainty of the mean.
2) This is a hypothetical, so we don’t actually have instruments. If needed, we would attach reasonable instrument uncertainties to our measurements, e.g. +/- 0.5 mm or +/- 0.5 deg. C.
3) One measurement of 100 boards means we have 100 boards of varying lengths and take one measurement of each of them vs. taking 100 measurements of ONE board.
4) I guess for his question about our case 3, we would just agree on a representative instrument uncertainty like I said above?
This is exciting. If the PhD Chief Statistician of NIST can’t help us straighten this out, we’re hopeless.
“we have 100 boards of varying lengths”
He’ll probably ask for some definition of the population of boards. A common situation would be where a process aims to cut 2m boards, but there is variability. A common quality control situation, where you could indeed use his machine.
The population of boards is 100 boards of random lengths. Think of the boards as stand-ins for GHCN station temperature observations.
This is the argument we’ve been having, that the LLN and CLT are or aren’t applicable to measurement series that aren’t of the same thing.
I’m keeping this simple, because it is.
“measurement series that aren’t of the same thing”
You don’t need the same thing. You need a defined population from which the 100 boards are drawn, which generally means a probability distribution.
That’s one scenario. The other is 100 measurements of a single board.
But it doesn’t matter. The UM can’t be used for this kind of calculation anyway. It needs a function for x that gives y. This is not that.
Sure it can. Let y = f(x1, …, x100) and then have the function f operate on all 100 inputs. Now, I will say the web version is limited to only 15 inputs (understandable), but you could download the NIST UM software, run it on your own machine, and not be limited in this manner. And notice that the procedure in GUM section 5 (which the NIST UM uses) does not put a limit on the number of input quantities to the function f either.
If you go to a scrap yard and pick boards at random what kind of probability distribution would you expect?
It might help if you had some real-world experience to draw from!
By definition there is likely no probability distribution for the population of boards. There is no relationship between them so it would be difficult to assign a probability distribution for them.
The only limit on their lengths is zero for a minimum and probably 50 feet for a maximum (the max length for a freight trailer on US highways). So I guess you can define a range for the population.
If you go to a local lumber yard and pull out 100 boards at random you will find lengths ranging from 2″ to 20′ (longer lengths would be a special order). If you just go down the row and pick sequentially from each bin you’ll not have any of the same length. Most will be of an even length (2′, 4′, 6′, 8′, …., etc) but not all. 3′ boards can be found as well.
If you just pick up random boards at the scrap yard you *might* find that 8′ 2×4 boards are the most populous but the variance is going to be huge since you will find boards that are cut to all different lengths and you will find longer lengths used for roof truss support beams. You will likely see a severely skewed histogram.
————————-
Since you will be using field instruments uncertainty would seem to be an appropriate factor. For length measurement these are typically not calibrated closely like in a laboratory nor are they marked with a high resolution capability. I would have picked at least a +/- 1mm uncertainty and more likely a +/- 1.5mm (about 1/16″). Depending on the measuring tape an uncertainty of +/- 3mm (1/8″) wouldn’t be outrageous for a combined random+systematic uncertainty.
+/- 0.5C would seem to be a pretty standard choice. It’s what is listed in the US Federal Meteorology Handbook No. 1 for the standard. And the BOM apparently only calibrates to +/- 0.3C which would drift to a higher uncertainty in the field for many instruments.
——————————
Just like with the boards, it would be difficult if not impossible to define a probability distribution for temperatures since there is not an easily calculated relationship between them. Since the winter and summer temps have a different variance and since the distributions, even at the same location, have different tails according to the histograms I have seen, it would be impossible to say the distributions are normal or symmetric, either locally or globally. In other words the distributions, if you can actually define one, will be skewed.
There are probably other factors that need to be considered. I’ll have to think on it. For instance, boards would have to be defined by moisture content since new, wet boards change length as they dry.
I would just ask him straight up if you can define the measurement model as y = Σ[L_x, 1, N] / N where L_x is the length of board x and N is the number of boards and where each L_x is a different board possibly measured by a different instrument.
You could present a more relevant scenario. If we have a rectangle that is subdivided into 4 areas of equal size and we have temperatures t1, t2, t3, and t4 for those areas, can the measurement model be y = (t1 + t2 + t3 + t4) / 4 such that y is the average temperature of the rectangle and u(y) is the uncertainty of that average?
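A minimal sketch of that proposed model, propagated with the quadrature rule quoted earlier in the thread (uncorrelated inputs assumed). The temperatures and uncertainties are invented, and the sketch takes no side on whether such a y is a legitimate measurand.

# Sketch of the proposed model y = (t1 + t2 + t3 + t4)/4 propagated with the
# quadrature rule quoted earlier (uncorrelated inputs assumed). Numbers are
# invented; whether such a y is a legitimate measurand is the dispute above.
import numpy as np

t = np.array([18.2, 21.7, 19.4, 25.1])       # hypothetical sub-area temperatures, °C
u_t = np.full(4, 0.5)                        # assumed standard uncertainty of each, °C

y = t.mean()
u_y = np.sqrt(np.sum((u_t / t.size) ** 2))   # = 0.5/2 = 0.25 °C for equal u_t
print(round(float(y), 2), round(float(u_y), 3))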
I like your scenario 1 better, as it’s closer to our point of contention. It could be presented as scenario 1a, where l_1 through l_N represent the length of a different board for each measurement, vs scenario 1b where l_1 through l_N represent the length of one board measured N times.
I like scenario 2 because it is more relevant to this discussion since it has a spatial component, but they are functionally equivalent so it probably doesn’t matter all that much. I will say NIST TN 1900 E35 also has a spatial component, but it is a lot more complex since it uses kriging like techniques to form the spatial field.
Look at what you just wrote. You separate a rectangle into 4 squares, each with its own temperature. You then average the four temperatures and expect to find that temperature within the square?
Remember, to be a measurand built from an average, it must also be measurand of a physical quantity.
Unless you carefully define the temperatures so that the average comes out to one of the 4 divisions, you’ll never get that whole square average anywhere in the square. In other words, it isn’t a general physical measurement model.
I’m not expecting to find Tavg in one of the subdivisions of the square. That would be very unlikely unless the square is in equilibrium. What I’m expecting is to obtain a value for the measurand. In this case the measurand y is the average temperature. That is the measurement we are making. And the measurement model y = (t1 + t2 + t3 + t4) / 4 does just that using other measurands t1, t2, t3, and t4 as inputs in accordance with GUM section 5, the NIST UM manual, and JCGM 6:2020.
“I’m not expecting to find Tavg in one of the subdivisions of the square. That would be very unlikely unless the square is in equilibrium. What I’m expecting is to obtain a value for the measurand.”
ROFL!!! I thought you have been trying to say that the average *IS* the measurand. Here you are trying to have it both ways! If you can’t find the average anywhere then how can it also be a measurand?
It’s not obvious that you understand what the word “measurand” actually means.
from the GUM:
“D.1.2 Commonly, the definition of a measurand specifies certain physical states and conditions.”(bolding mine, tg)
An average is typically a statistical descriptor, not a measurand.
Unless, that is, the average is considered a “true value”, i.e. the average of multiple measurements of the same thing with no systematic uncertainty involved.
Better yet. Send him a link to this article and ask him to comment here!
I’ll offer it, but I don’t know if he’d want to take time either at the office or at home to read all of this. I just want to offer two agreed-upon scenarios on which to judge, but the more I read of the manual, the less I think the UM can be used for our purpose.
Send both agreed upon scenarios and a link to the lengthy back and forth. The only bad idea is to back out of your good idea. I’m noting that the alt.statisticates here aren’t cool with it, which doesn’t surprise me…
No way backing out. More concerned with looking like an idiot for my writing.
Just make sure he understands that (Tmax + Tmin)/2 is not the only problem. It is then averaging those monthly and annually, and then finding an average of any number of far-separated stations.
I suspect his response will be like his E2 example. These are distributions and must be represented by a mean/average and a combined standard deviation/variance. I also expect that the standard deviation will be so large that it far outweighs any uncertainty in measurement.
Predictions.
When, oh when, will they grab their ears and heed the Rule of Raylan?
Hey blob, what is “QAnon”?
Seriously? Yes, rhetorical. Or am I giving you too much credit?
https://en.wikipedia.org/wiki/QAnon
You believe what you see in wikipedia?
Not all. In fact, I have edited petroleum engineering articles. But here, yea. Feel free to rebut – but specifically. I’m playing catch up after 3 months on the Cal central coast, but I’ll check back later today….
Since you worked in the petrochemical industry you might be interested in NIST TN 1900 E35.
Thanks.
And E2 and E34 obviously. I’m not a geoscientist. Petroleum engineers do assess reserves, both from production and volumetric analyses, rolled up into stochastic economic analyses. And we construct the wells and facilities and operate them. But the geoscientists tell us where to drill….
Did you read and understand all the assumptions he put into the example of temperature? They are necessary to using the calculator. You can not do this with multiple locations.
Yes, I did. This is why, if you are correct, Dr. Possolo will back you up. Or RU crawfishing from Mr. Schrumpf’s good suggestion? In fairness, it seems that Mr. Schrumpf has an idea of Dr. Possolo’s response, and is already prebutting…
No backing off at all. Why do you think those assumptions were put in there? I am sure he understands that his assumptions were to simplify the problem. I’m not sure you do though. Why didn’t he use the NIST calculator? Did he use a functional relationship to obtain the proper equation to use? No, he did not. He used normal probability and statistics to calculate the mean and standard deviation. Why did he do this?
He was brave enough to show the standard deviation from that distribution. Neither you nor your peeps are willing to propagate the SD from the daily mid-range temps all the way up to the GAT. Why don’t you be the first?
I’ll disagree with him that the SEM is an appropriate calculation even though, when you push out the confidence interval, you arrive at a value close to the SD. When you divide the SD by the “sqrt N”, you are indicating that you are treating the data as a population and not a sample. This means the calculated mean IS the mean and the SEM should be zero. The fact that the SEM isn’t zero should immediately tell you that the distribution is not normal.
Just posted the first cut of a response at the top.
The thing I’m backing away from is being able to use the Machine to get a bunch of statistics for a series of measurements. I thought you could, say, plug in the August monthly averages for a GHCN station from 1900 to 2022, and get all the uncertainties and such.
I don’t think it does that now. I think that you’re supposed to take a bunch of x measurements, plug in a y = f{x0…xn} function, and it will give you the uncertainties in your measurements from the function.
But the only way to do that is to assume that the measurements are of the same thing, multiple times, with the same device. In other words something running an experiment 100 times, keeping everything the same. That means your experimental uncertainty is the standard deviation of the distribution. It is not the measurement uncertainty of each monthly value.
JS said: “I don’t think it does that now. I think that you’re supposed to take a bunch of x measurements, plug in a y = f{x0…xn} function, and it will give you the uncertainties in your measurements from the function.”
The point of GUM section 5 and the NIST UM is to compute u(y). Not only does it not compute u(Xi) it is actually required that u(Xi) be supplied to the tool.
For example, let’s say you want to know the uncertainty in the cooling degree days observation for a station. The measurement model would be y = (Tmin + Tmax) / 2 – 65. You must input Tmin, Tmax, u(Tmin), and u(Tmax) into the tool. It will then report u(y).
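As a rough sketch of what such a tool does under the hood, here is a Monte Carlo propagation of that model, assuming Gaussian inputs; the temperatures, the 0.5 F uncertainties and the 65 F base are all invented for illustration:

    import numpy as np

    rng = np.random.default_rng(42)
    N = 100_000  # Monte Carlo draws

    # Hypothetical inputs (deg F) with assumed standard uncertainties.
    Tmin, u_Tmin = 68.0, 0.5
    Tmax, u_Tmax = 90.0, 0.5

    # Treat each input as a Gaussian random variable (an assumption).
    tmin = rng.normal(Tmin, u_Tmin, N)
    tmax = rng.normal(Tmax, u_Tmax, N)

    # Measurement model: one day of cooling degree days, base 65 F.
    y = (tmin + tmax) / 2 - 65.0

    print(f"y = {y.mean():.2f}, u(y) = {y.std(ddof=1):.3f}")
    # Analytically u(y) = 0.5*sqrt(u_Tmin**2 + u_Tmax**2) ~ 0.35 for these inputs.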
NIST finds the average uncertainty, not the uncertainty of the average. Two totally different things!
I truly don’t think they are capable of grasping this truth.
Looking at the UM’s manual, I don’t see how it can be used at all for a temperature series. The manual clearly states there must be a y = f(x1. . . xN) relationship, which is clearly not the case in a series of temperature observations.
Global Ave=(ΣwₖTₖ)/(Σwₖ)
where T are anomalies for a month and w a set of (area) weights, prescribed by geometry.
That is not what they mean. That does not give you y for some x.
Have you looked at the UM’s manual?
I’ve not seen anywhere in either the GUM or the NIST manual where it is said that certain measurement models (formulations of Y) are disallowed. In fact, the GUM says Y can be so complex that you might not even be able to write it down. And the NIST manual and examples say you can use any R expression including calls to other R functions. And the JCGM 6:2020 document is entirely devoted to the development of complex measurement models and even says that when working with time series data you can average them to “abate the extent uncertainty”.
It said moving averages could be so used.
Your cherry picking has risen to an unacceptable level. Why didn’t you include the assumptions that go along with this section.
Please read this out loud several times and then consider what it says.
Here is another section from this version of the GUM.
And from JCGM 200:2012(E/F)
You are a glorified cherry picker that doesn’t understand what he is reading. A MEASURAND is a physical measurement that can also be calculated using other measurands. An average IS NOT a measurand, it is a statistical descriptor of a distribution. If I was your teacher, I would insist you write the previous sentence 1000 times.
BTW, I have asked several times about the propagated variance of temperature averages from the daily mid-range all the way through to the GAT.
Are you going to provide those or not? If not, why not?
I agree. A functional measurement relationship must give you a measurand, not an average value. An average has no fixed relationship that predicts an output of a physical value.
From the GUM:
“experimental standard deviations”! Why do we never see any standard deviations in these proposed uncertainties?
I have asked numerous times for this, and crickets! I can only assume that propagating SD’s from daily temps all the way up to a GAT would give the lie to the uncertainty. Variances add, they never decrease when you add random variables together.
These folks are grasping at straws!
OK, I went to the UM and did it. I asked for 4 variables, it promptly called them x0 to x3 and gave them unit normal distributions (I could have asked for anything there). It asked for an expression. I said (x0+x1+x2+x3)/4. I could have put in a weighted mean expression, which is what a global average is (too much typing though). Then I said RUN, and it told me the result was 0.002 (I asked for 10000 steps), and the sd was 0.5 (=1/sqrt(4))
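That kind of run is easy to reproduce outside the UM. A minimal sketch (10,000 draws; the four inputs are independent with mean 0 and sd 1, deliberately given different shapes, all of which are assumptions chosen to mirror the runs described here):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    # Four independent inputs, each scaled to mean 0 and sd 1, different shapes.
    x0 = rng.triangular(-6**0.5, 0, 6**0.5, n)   # triangular, sd 1
    x1 = rng.normal(0, 1, n)                     # Gaussian
    x2 = rng.uniform(-3**0.5, 3**0.5, n)         # rectangular, sd 1
    x3 = rng.normal(0, 1, n)

    y = (x0 + x1 + x2 + x3) / 4

    print(f"mean = {y.mean():.3f}, sd = {y.std(ddof=1):.3f}")  # sd ~ 0.5 = 1/sqrt(4)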
Ha, ha, ha. 4 unit normal distributions and you got an SD of 0.5?
What if you had 4 distributions of temperatures without normal distributions? What would the SD be for 4 stations with temperatures from Southern Peru in the winter, Colombia in the winter, Washington D.C. in the summer, and Greenland in the summer? Do you reckon the SD will be somewhat larger than 0.5? Hummmm. Now subtract a baseline, which doesn’t change the SD at all. I’ll bet it makes an anomaly of 0.02 +/- 0.5 look small.
OK, I did 4 different distributions from the choices offered. These clearly are not repeated measures of the same thing. All had mean zero, sd 1. They were; x0:Triangular, x1:gaussian, x2:rectangular, x3:uniform. Result, exactly the same – sd=0.5=1/sqrt(4) for (x0+x1+x2+x3)/4
There wasn’t anywhere I could input something about Peru. The UM just does what it says, combines uncertainties.
In other words you used your conclusion as the premise!
You listed not one single skewed distribution, NOT ONE!
Each of the distributions you list are symmetrical distributions. Pick a distribution where the median and the mean are *NOT* the same! I.e. the kind of distribution you would get from temperatures. Provide a box-whisker plot of your distribution showing the 5-number statistical descriptors of min, max, 1st quartile, 3rd quartile, and median.
-Is the global temperature distribution Gaussian?
-Are the anomalies calculated from the baseline average and the current temp a Gaussian distribution?
-Since winter and summer temp (i.e. northern hemisphere and southern hemisphere) anomalies have different variances how does their combination become symmetrical and not bi-modal?
The only answer the trendology uncertainty “experts” can give to these questions is to push the downvote button.
This is desperation.
“You listed not one single skewed distribution, NOT ONE!”
The goal post shifting gets tiresome. First they had to be repeated measurements of the same thing. Well, clearly that can’t be, if they have different distributions. Now there is supposed to be something special about symmetry.
Well, the UM does offer unsymmetric distributions. student t, asymmetric (their word) triangular, etc. But they are specified by parameters other than sd, and you have to work out sd. I could do that, but then the goal posts will move again – it is a waste of time.
Isn’t that funny! The NIST calculator won’t let you input random variables, each with their own non-normal distributions. Just exactly what you unknowingly do when calculating the GAT, i.e., all stations have normal distributions with the same standard deviations.
All you have proven is your version of what you do gives the same answers as the NIST calculator.
Do you understand why this calculator only allows one distribution for all random variables? Apparently not.
Think about measuring one thing, multiple times, with the same device. Do you reckon that might give you random variables with the same distribution? Ho, Ho, Ho!
No wonder you can’t quote a propagated variance for the GAT, I’ll bet you’ve never done one.
I’m pretty sure the allowed distributions are all symmetrical as well. So you (they) can assume all uncertainty cancels.
“Isn’t that funny! The NIST calculator won’t let you input random variables, each with their own non-normal distributions.”
Isn’t that funny? Yes, they do. In fact, that is just what I did.
What’s even funnier is that the NIST UM lets you upload a custom distribution. Click More Choices in the drop down and then select Sample Values to upload the file with a custom distribution.
Have *YOU* done that for temperature data to see what you get?
I’ve attached a graph of what I get out of the NIST UM for my 10 year minimum temp record.
As you can see it is a bi-modal distribution which the UM uses to create a Gaussian distribution with a mean of 47 and an SD of 20 where the median of the data (58) is not equal to the average found by NIST.
If this is how you think a non-Gaussian distribution should be treated then we have an even deeper disagreement than I thought.
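For anyone who wants to see the point without the station file, here is a sketch with synthetic data standing in for a multi-season minimum-temperature record (the two modes and sample sizes are invented, not the actual data):

    import numpy as np

    rng = np.random.default_rng(1)

    # Invented stand-in for a min-temp record: a cold-season mode and a larger
    # warm-season mode, producing a bi-modal, non-Gaussian sample.
    cold = rng.normal(30, 8, 1200)
    warm = rng.normal(62, 8, 2400)
    tmin = np.concatenate([cold, warm])

    print(f"mean   = {tmin.mean():.1f}")
    print(f"median = {np.median(tmin):.1f}")
    print(f"sd     = {tmin.std(ddof=1):.1f}")
    # The single mean/SD pair describes a unimodal hump; it says nothing about
    # the two modes, and the mean sits well below the median.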
Well pin a bright shiny star on your dunce cap, BFD.
y = f(Tmin, Tmax) = (Tmin + Tmax) / 2 is a functional relationship.
y = f(t1, …, tN) = (t1 + … + tN) / N is a functional relationship.
y = f(L0, L1, T0, T1) = (L1-L0) / (L0*(T1-T0)) is a functional relationship.
All of these are functional relationships that take different temperatures as inputs to produce a single output. BTW…that last one is one of the examples provided with the software.
And you get a different answer if you don’t factor 1/N to the inside!
This should tell you something, but it won’t sink in methinks.
Hey look, I got another downvote for pointing out the obvious.
Success!
Let’s do #1 on the Machine. It works. You get a result. The result is TAVG. TAVG is not a known value, like the thermal expansion coefficient of copper. TAVG is just the mean of TMAX and TMIN.
Look at #2. Same thing. I plug in some values for Xavg -Xn. It gave a result. I don’t doubt the answer is correct, for as far as it goes. But what does the answer mean? It’s not a known value like the thermal expansion coefficient of copper.
The last one IS the thermal expansion coefficient example. According to the periodic_table.org site, the thermal expansion coefficient is 16.5 µm/(m·K). If their measurements are used in the measurement model, we get
(1.5021 – 1.4999) / (1.4999*(373.10-288.15)) = 1.727e-5 m/m-K or whatever units it’s in, but it’s the correct answer, or I should say, within the bounds of correct answers.
In your equations 1 and 2 above, there is no “correct” answer. There is only a result.
Do you see it now? The UM is not for getting the mean and standard deviation from a series of numbers, it’s for getting the uncertainty in measurements as applied to a known equation for getting some known value.
JS said: “TAVG is not a known value”
Of course it is known. It is defined exactly as Tavg = (Tmin + Tmax) / 2. It is a measurand no different than any other. And it happens to be used as a measurand in many applications including the daily average, heating degree days, cooling degree days, etc. More generally, average temperatures are used ubiquitously in atmospheric science. The books on my shelf, Mesoscale Meteorology by Markowski and Richardson and Dynamic Meteorology by Holton and Hakim, use average temperatures. And it’s not just temperature. It’s other intensive properties like density, pressure, vorticity, etc. The uses are prolific and even encompass spatial extents of the atmosphere. Those averages all have an uncertainty.
Tavg IS NOT A PHYSICAL MEASUREMENT OF A PHYSICAL QUANTITY. You are barking up the wrong tree. The animal is in a totally different one. If you can prove otherwise, show a reference that says an average of different measurements of different things can provide a physical measurement.
Even Dr. Possolo did not use uncertainties in his E2 example of a monthly temperature average. He used the mean and standard deviation as the indicator of uncertainty. Just like the GUM says.
When are you going to start quoting a standard deviation for GAT and the calculations used to calculate it.
In other words he assumed all uncertainty cancels. Typical for climate science. And statisticians.
And Nick Stokes.
“Of course it is known. It is defined exactly as Tavg = (Tmin + Tmax) / 2. It is a measurand no different than any other. And it happens to be used as a measurand in many applications including the daily average, heating degree days, cooling degree days, etc. “
First, this isn’t an average. It’s a mid-range value, nothing more. Multiple different combinations of Tmax and Tmin give the same Tavg in this case thus the value loses any ability to reference the actual climate at a location.
For example, daytime temps are very close to a sinusoid. In that case .63Tmax gives a MUCH better representation of daytime average temp and is completely reversible. If you know the average value then Tmax is Tavg/0.63. You can’t do that with (Tmax+Tmin)/2.
Second, the fact that things have always been done one way is not an argument for continued use of that methodology. Degree-day values today are done using integration of the temperature curve and not (Tmax+Tmin)/2, thus the resulting values are far better at maximizing the efficiency and cost of HVAC systems. It’s something the climate scientists should consider! It’s a downright shame that engineers are leading the scientific community in adopting better methodology.
It’s really a condemnation of meteorology if they are using “average” for things like density, pressure, etc. Thermodynamics is a field in which gradients are very important, and that includes material intensive properties. That was true clear back in 1942, which is the copyright date of one of my thermodynamics texts!
Methinks you’re seeing it clearly, and are conveniently backing away from:
“This is exciting. If the PhD Chief Statistician of NIST can’t help us straighten this out, we’re hopeless.”
Not at all. First cut response just below.
Glad to hear it, and I stand corrected on my presumption that you would discount inconvenient responses.
The first two are mathematical functions only. THEY ARE NOT FUNCTIONAL RELATIONSHIPS THAT PROVIDE A MEASURAND. They are statistical descriptors of a distribution. None of the temperatures in the first two control the value of a measurand, i.e., another temperature.
You still haven’t quoted a statistical descriptor of standard deviation or variance for any of the means you calculate using the first two equations! Is there a reason for that?
If you cannot give an SD or σ^2, tell us why not. Have you never tried to calculate one? Doing so for y = (Tmin + Tmax) / 2 is not terribly hard. There are calculators all over the internet to do the work.
You won’t get an answer.
This is the first cut of the response to Dr. Possolo at NIST. I want to keep this simple because I think our discussion/argument is basically very simple. These were his questions back to us.
1) How do you propose to combine the 100 measurements of the length of a board?
One side of our discussion believes that taking 100 measurements of something — in this scenario, a board — allows us to use the Law of Large Numbers to reduce the uncertainty in the mean.
If we measure the board five times, and have an SD of 0.12, and I’m doing this right, the uncertainty of the mean is sqrt(0.12/5-1) = 0.17
If we measure the board 100 times, and have an SD of 0.11, the uncertainty of the mean is sqrt(0.11/99) = 0.03
The other side of the argument says we can do this even if the measurements aren’t of the same board, but of 100 separate boards of different lengths. The same method that reduced the uncertainty of the mean for one board can be used to reduce the uncertainty in the mean of a hundred boards.
2) Is each measured value qualified with its own uncertainty, and do you have the uncertainty for each one of them?
No. We are just using unitless values with no uncertainties, for simplicity’s sake.
3) What do you mean by 1 measurement of 100 boards? Do you mean the total length once you have laid them down in a long straight line, board after board, end to end, or do you mean something else?
We mean there’s 100 boards of different lengths in the lab, and we measure each of them once.
4) For case (3), how has the uncertainty been quantified, and what is this the uncertainty of?
We aren’t quantifying the uncertainties. Right now we’re arguing about whether the method is applicable or not.
Another question was asked regarding the Uncertainty Machine: is this a measurement model?
OK, feedback time.
I should have said both sides agree on scenario1 of question 1. The argument is over scenario 2.
“One side of our discussion believes that taking 100 measurements of something — in this scenario, a board — allows us to use the Law of Large Numbers to reduce the uncertainty in the mean.”
You need to clarify that this has the following assumptions.
a) Multiple measurements;
b) of the same thing;
c) with the same device.
This will allow the average to remove random errors and obtain a “true value”. However, it won’t remove uncertainty in each measurement generated by Type A and Type B uncertainties.
“If we measure the board 100 times, and have an SD of 0.11, the uncertainty of the mean is sqrt(0.11/99) = 0.03”
Since one side will not address Standard Deviations and Variance, it should not be included in the question.
I don’t go along with this at all. We are talking about 100 boards, 100 measurements, one for each board, and each of the measurements being done with 100 different devices each of which has their own uncertainty.
My problem is that the GUM requires a measurand (singular) derived from one measurement or a defined functional relationship that provides a singular output for each combination of other measurands. In other words, PV = nRT or A = πr^2 or V = (1/3)πr^2h. These are derived measurands with a physical basis.
The other side says the GUM allows:
a) defining an average (as you show in the image);
b) and allows dividing a combined sum of uncertainties by N (the number of data points, here 100) to obtain an average uncertainty.
The point I would make is that the average probably won’t be a real physical measurand describing any of the boards and the average uncertainty probably won’t be met by any individual board. The numbers are meaningless from a physical sense. You can’t go grab a board and have any expectation of what you have. As such they do not meet the definitions the GUM uses for a measurand.
My main point is that Dr. Possolo did not even deal with uncertainties in his E2 example. He used traditional statistics to describe the distribution of temperatures in his examples. This allows one (in the 100 board example) to have an expectation that 68% of the boards will be within 1 standard deviation of the mean. In other words 68 boards of the 100 boards will be within 1 standard deviation. This provides the ability to make decisions based upon the statistical descriptors of mean and standard deviation.
If anything else is asked, Dr. Possolo might want to respond about the standard deviation of temperature averaging for the GAT.
Absolutely, this is the lesson of the USAF “average pilot”, no person matched it.
I forgot that one. It might be good to mention that.
I think I also forgot to mention that one of the reasons for quoting a standard deviation, or a deviation with a confidence level, is so that someone purchasing the lot of boards would have an expectation of how many boards would be within one standard deviation. In other words, one could assume 68% of the boards would be within one standard deviation interval. That would allow a buyer to assess how well the boards might meet his needs.
I’m sorry, I didn’t realize these were responses you had received back. Let me deal with that.
1) The formula is σ/sqrt(n). Note that sqrt(σ/n) != σ/sqrt(n); see the quick numeric check after this list.
2) Yes. All inputs Xi, into the measurement model y must be accompanied with a u(Xi) value. That’s a requirement of GUM section 5 and the NIST UM.
3) I believe he’s asking what you do with them. In other words, what is the measurement model. Do you want it to be y = avg(x1…xN) or do you want it to be y = sum(x1…XN) or something else.
4) I don’t think anyone here has asked about the measurement model y = sqrt(Σ[(xi − xm)^2, 1, N] / (N − 1)) yet. That measurement model produces the sample standard deviation and thus u(y) would be the uncertainty of the sample standard deviation. I’m not sure what meaning that has. The crux of this discussion is the measurement model y = Σ[xi, 1, N] / N, which is obviously a different thing.
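A quick numeric check of point 1, using the numbers from the board scenario above (SD 0.11, n = 100), just to show the two formulas are not interchangeable:

    import math

    s, n = 0.11, 100

    print(math.sqrt(s / (n - 1)))  # sqrt(0.11/99)  ~ 0.033 (the formula used earlier)
    print(s / math.sqrt(n))        # 0.11/sqrt(100) = 0.011 (sigma/sqrt(n))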
Averaging does not reduce uncertainty, regardless of how many times you post that it does.
I still think you should ask him about the average temperature of a region subdivided into 4 equal areas in which the temperatures for the subregions t1, t2, t3, and t4 and uncertainties u(t1), u(t2), u(t3), and u(t4) are known and where the measurement model is y = (t1 + t2 + t3 + t4) / 4 and ask whether u(y) is the uncertainty of the spatial average temperature of the region. That would be the most relevant to this discussion and the general approach used by global average temperature datasets.
The Law of Large Numbers can be applied if the 100 measurements are of the same thing with the same device. This will allow an average to remove random errors and obtain a “true value”. However, it won’t remove uncertainty in each measurement generated by Type A and Type B uncertainties.
And, if you wish to use the SEM (Standard Error of the sample Mean) you must use a coverage calculation as in the E2 example in the NIST TN1900.
However, measurements of 100 different things with different devices may not use a simple average to remove “random errors”, because the errors can no longer be considered random.
In this case, one must assume that “q” is a function of a number of random variables. This equation comes from An Introduction To Error Analysis (2nd Ed) by Dr. John R. Taylor.
δq = √[{(∂q/∂x)δx}^2 + … + {(∂q/∂z)δz}^2]
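For reference, Taylor’s formula can be evaluated numerically for any differentiable q. A minimal sketch using central-difference partial derivatives; the example function and the +/- values are arbitrary, chosen only to exercise the formula:

    import math

    def propagate(q, x, dx, h=1e-6):
        """delta_q = sqrt(sum((dq/dx_i * dx_i)**2)), with the partial
        derivatives estimated by central differences."""
        total = 0.0
        for i in range(len(x)):
            xp, xm = list(x), list(x)
            xp[i] += h
            xm[i] -= h
            dqdxi = (q(*xp) - q(*xm)) / (2 * h)
            total += (dqdxi * dx[i]) ** 2
        return math.sqrt(total)

    # Arbitrary example: q = x*z with x = 2.0 +/- 0.1 and z = 3.0 +/- 0.2
    q = lambda x, z: x * z
    print(propagate(q, [2.0, 3.0], [0.1, 0.2]))  # sqrt((3*0.1)**2 + (2*0.2)**2) = 0.5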
I would add that they “reduce the uncertainty in the mean of a hundred boards” by dividing by √100.
I would say the uncertainty in measurement of each board is 1/8″ = 0.125″
We want to characterize them for sale and need to let buyers know what to expect.
I would say the uncertainty in measurement of each board is 1/8″ = 0.125″ and that the distribution is not normal.
I came rather late to this thread (though there was plenty left) and I missed bdgwx’ invocation of E2 from the NIST set of examples. I don’t know why the discussion didn’t end there. The example is exactly what is done in surface temperature. E2 works out a May average TMAX for the NIST location, using measurements on 22 separate non-consecutive days. That is not 22 measures of the same thing. And so how do they assign an uncertainty to that average?
Exactly as we have been saying; divide the uncertainty for each day by sqrt(m) to get the uncertainty of the monthly average. Then you can use the Gaussian properties to get a 97.5% confidence interval. If you don’t assume Gaussian, the interval is a tiny bit wider.
This has been quite an interesting and informative discussion, but is everybody being caught out by an overloaded term, “uncertainty”?
The example Nick quotes looks very much like the Standard Error of the Mean, but I think the other blokes are concentrating on the measurement uncertainty.
As Evelyn Waugh said of the US and UK, two great nations separated by a common language.
SEM has a very specific meaning, and these trendologists are abusing the meaning for their own ends.
That’s one possibility, but it looks more like different interpretations of the word “uncertainty” from practitioners in different fields and/or the NIST documentation using a rather broad definition.
Phrases such as “standard uncertainty” will have a very specific meaning, just as SEM, sample mean and population mean have very specific meanings.
This is why the GUM exists, which has precise definitions for all these terms.
It has to do with accuracy and precision. You can calculate a mean very precisely, i.e. a very small standard deviation of the mean (known by statisticians as standard error of the mean – a very misleading description) but it has nothing to do with the actual accuracy of the mean.
See the attached graphic. The standard deviation of the mean can be *very* precise while also being *very* inaccurate. The term “standard error of the mean” was coined by statisticians whose only experience with data never included any uncertainty. They were trained with data sets like (1, 2, 3, 4, 5, …., 100) instead of (1+/-0.5, 2+/-0.7, 3+/-0.4, 4+/-0.8, 5+/-0.9, …, 100+/-0.3). So they are trained to expect the sample means drawn from the first data set to define how accurate the mean is – never realizing that it has nothing to do with actual physical reality where you don’t get just “stated value” but “stated value +/- uncertainty”.
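The precision-versus-accuracy distinction can be shown with a toy simulation: give every reading the same fixed (unknown) bias, then watch the SEM shrink with sample size while the offset from the true value does not. The true value, bias and noise level below are all invented:

    import numpy as np

    rng = np.random.default_rng(7)

    true_value = 20.0  # the "true" temperature, unknown to the observer
    bias = 0.4         # fixed systematic offset shared by every reading
    noise_sd = 0.5     # random scatter of each reading

    for n in (10, 100, 10_000):
        readings = true_value + bias + rng.normal(0, noise_sd, n)
        mean = readings.mean()
        sem = readings.std(ddof=1) / np.sqrt(n)
        print(f"n={n:6d}  mean={mean:6.3f}  SEM={sem:.4f}  offset={mean - true_value:+.3f}")
    # The SEM keeps shrinking, but the mean stays about 0.4 away from the true
    # value: averaging tightened the precision, not the accuracy.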
There are certainly differences between discrete and continuous distributions, which may be contributing to the differences of opinion between the practitioners from different fields.
“the Standard Error of the Mean”
GUM is a bit sniffy about that:
The problem is the other blokes insist that uncertainty of measurements can’t be distinguished from uncertainty of the mean. But see how Note 2 identifies it.
And to put that in its place, just a bit further down
“B.2.18 uncertainty (of measurement)
parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand
NOTE 1 The parameter may be, for example, a standard deviation (or a given multiple of it), or the half-width of an interval having a stated level of confidence”
That sample s.d. definition looks a bit off. There must be something I’m missing.
Note 3 is indeed a bit snotty. but it would be interesting to see how they define SEM.
That you think it is wrong is exactly why it is in there, it is not an “error” except in very limited cases.
Error is the difference between the true value of a quantity and a measurement of same. The problem is the true value cannot be known—this is why the GUM deprecate usage of the term.
But trendology (people like Stokes) desperately need it to keep their impossibly small “error bars”.
No, I’m concerned about the sigma (q sub k). Sample s.d. applies to the entire sample, not to any particular element of that sample.
Sample variance of a sample of size 1 is by definition undefined; the n − 1 divisor is zero.
You are confused by the statistics concepts of sampling a fixed population.
Measuring an air temperature time series is NOT a fixed population, therefore all the statistics do not apply.
You have to propagate the uncertainty of each individual temperature measurement, not assume they all cancel like Stokes & Co.
Quite possibly, but 8.7.12 seemed to be more general than just Tmax.
I don’t know what 8.7.12 refers to.
D’oh! 8.2.17.
Not that I’m dyslexic, mind.
You missed the important part. “the same measurand“.
This is basically describing something like a chemical reaction where there is difficulty in translating individual component uncertainties into a final measurand through a functional relationship. In this case, an experimental uncertainty can be defined by running the same experiment using the same measuring devices, temperature, humidity, pressure, etc. a number of times to get values to average.
Note: These are not considered proper IID samples from a population, so the GUM uses the term “experimental” to designate the statistical descriptors as a different thing than what correct sampling would provide.
Consequently the “normal” standard deviation becomes “experimental standard deviation”. In other words, it is considered similar to the standard deviation of a population.
Like the normal SEM, it is considered an “experimental” standard deviation of the (sample) mean. Like an SEM, it is not an uncertainty per se; it is an interval within which the sample mean may lie.
You folks are still cherrypicking things that appear to support your conclusions without adequate knowledge of what the GUM is actually about.
THE MAIN PURPOSE OF THE GUM IS TO OBTAIN THE UNCERTAINTY OF A MEASURAND (singular) FOR USE IN EVALUATING WHAT IS NOT KNOWN ABOUT THAT SINGLE MEASUREMENT.
It is not designed to measure the uncertainty of a series of measurements of different things made by different devices. That is what means, standard deviations, and variances are for, just like in the NIST TN1900 E2 example.
You forgot:
“”for a series of n measurements of the same measurand””
This would typically generate a Gaussian distribution around a true value (not guaranteed, but typical) where the standard deviation around the mean can be considered to be the uncertainty.
In other words, not an example really useful with temperatures.
“The problem is the other blokes insist that uncertainty of measurements can’t be distinguished from uncertainty of the mean. But see how Note 2 identifies it.”
Standard deviation is *ONLY* a measure of uncertainty when used with a symmetrical distribution. If there is any skewness in the distribution the mean and the standard deviation are not useful statistical descriptions.
I’ve attached a box-whisker plot for my own weather station data over the past 10 years. You can readily see that the distributions are not symmetrical or Gaussian. If you want I can provide histograms showing the same thing. How do you combine these with a simple (Tmax+Tmin)/2 equation? And that’s not even considering the fact that daytime temps are sinusoidal and the nighttime temps are an exponential decay. The difference between the daytime average and the nighttime average is not well represented by (Tmax+Tmin)/2 if you are truly looking for a daily average.
I understand that Tmax and Tmin may have been all that was available long ago. That has *not* been the case for at least forty years. And it is possible to use Tmax and Tmin to develop better figures; daytime avg = .63Tmax and nighttime avg = ln(2)/λ would give a much better representation. We do *not* have to stick to tradition. Just watch Tevye in Fiddler on the Roof sometime!
“You forgot:
“”for a series of n measurements of the same measurand”””
You are telling me that NIST forgot it? In one of their published examples (E2, in fact)? Because that is exactly what they are doing. Combining measurements over different days. This is not measuring the same temperature n times.
And sigma/root(n) does not apply as an estimate of uncertainty, regardless of what NIST wrote.
Guess what, Nitpick, NIST is wrong. They did not average 22 measurements of the same quantity!
Here is what the GUM E.4 says:
The ONLY case when standard deviations are divided is:
A TIME SERIES DOES NOT QUALIFY!
“Consider s(q), the experimental standard deviation of the mean of n independent observations qk of a normally distributed random variable q “
You never got a reply, did you?
Of course not.
“NIST is wrong”
The author is none other than Dr Possolo. And the example is deliberately chosen and developed at length. It wasn’t a slip.
You don’t understand this nor does the NIST author guy. I’m not buffaloed by big names or lots of letters after the big name, this is a lame appeal to authority.
I dunno. 22 measurements from the same station, averaged for the month seems like a different animal from taking 1 measurement from 22 stations and averaging those.
In the first case, you can say you have the average temp for this point on the Earth’s surface for that month. For what location on Earth do you have an average temp in the second case?
Still, this is one of the items I’m asking about in the response.
Actually, they are not different—the point is that during a time series of measurements, you get exactly one try to get a particular reading before it is gone forever. This is true regardless of location for air temperature.
N is always equal to exactly one, regardless of how many readings are averaged.
Yep.
Yes. There is no way to calculate an experimental standard deviation from a sample of one. That means you must assume at least the uncertainty of the resolution of the device used to measure the measurand.
“I dunno. 22 measurements from the same station, averaged for the month seems like a different animal from taking 1 measurement from 22 stations and averaging those.”
Watch the goal post shifting in real time! First it had to be repeated measurement of the same thing. Now it just has to be in the same place.
But their E35 takes samples containing uranium from all over Colorado.
All these rules that people here make up are nonsense. GUM/NIST are just talking about uncertainty propagation from random variables. Randomness can arise from many different causes, but the mathematics is the same, as illustrated by the UM. The UM is just a function, as defined by an R expression, with inputs having randomness prescribed by a distribution.
The only difference between averaging readings over different days and over different places is that you should use area-weighted averaging. That just means that the R expression is not (ΣTₖ)/(N) but (ΣwₖTₖ)/(Σwₖ), for some set of weights w that you work out from the geometry. Actually, the former is just a special case of the latter with wₖ=1.
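Written out, with cos(latitude) weights standing in for the geometry (the grid cells and anomaly values are invented for illustration), the weighted mean looks like this, and the unweighted mean is just the special case wₖ = 1:

    import numpy as np

    # Invented anomalies (deg C) for a handful of grid cells at various latitudes.
    lat = np.array([-60.0, -30.0, 0.0, 30.0, 60.0])
    T = np.array([0.3, 0.1, 0.4, 0.2, 0.8])

    # Area weights from geometry: proportional to cos(latitude) for equal-angle cells.
    w = np.cos(np.radians(lat))

    weighted = np.sum(w * T) / np.sum(w)  # (sum of w_k*T_k) / (sum of w_k)
    unweighted = T.mean()                 # the special case w_k = 1

    print(f"area-weighted mean = {weighted:.3f}")
    print(f"unweighted mean    = {unweighted:.3f}")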
Big deal, what does it tell you?
FALSE! They are not “made up”, they are straight out of the GUM.
More Stokes gaslighting, the GUM goes WAY past random error-only.
I see Nick beat me to it. But E35 in NIST TN 1900 is a nearly perfect embodiment of the discussion. I say nearly perfect because the only way it could be any better is if it were dealing with a spatial field of temperatures (in K) as opposed to the mass fraction of uranium (in mg/kg). This is a great example because 1) the measurand forms a two-dimensional scalar field and 2) because they present 4 measurement models that are then incorporated into a 5th measurement model that is the average of the original 4. And the original 4 are each incredibly complex themselves, involving special spatial processing functions in R. All of the content discussed in E35 is equally applicable with temperatures.
Um, bgwxyz, do you understand that uranium deposits in Colorado are not a time-varying field, UNLIKE temperature?
Of course you don’t.
And so it goes on and on…
But the LLoN says … MUST BE REPEATED MEASUREMENTS OF THE SAME THING!!
But E2 measures different days … MUST BE ALL AT THE SAME PLACE!!
But E35 is distributed in space .. MUST BE ALL AT THE SAME TIME!!
There is a new rule for every circumstance.
Who dug out this stupid uranium example? Not me.
Try again slowly: air temperatures vary with time, uranium ore concentrations do not!
Do you see the difference? Of course you don’t.
Did you read this example with ANY understanding?
“independent random variables” — yet you have said the temperatures are correlated out to 1500 Km.
“with the same Gaussian distribution” — do you really believe that monthly temperature records ALL have a Gaussian distribution? This needs proof. The author made this assumption knowing that the proof was not available.
The author warns that:
He goes on to say that other criteria allows him to make that decision. Have you made tests to see if those criteria are met?
He goes on to say:
He says the few measurements wouldn’t justify adopting such a model. Are the temperature readings you are using too few to justify using a better model?
Have you proven that calibration uncertainty is negligible when compared to other uncertainties? With the uncertainty levels I have seen quoted out to the one hundredths place and even one thousandths place, I’ll bet calibration errors are NOT negligible.
Have you tried this procedure to see what the monthly intervals are for the stations you are using for GAT?
You can’t just cherrypick some formulas and calculators without recognizing all the assumptions that go along with using them. Have you asked yourself why the author didn’t just use the NIST calculator for such a small number of data points? Hmmmm. Maybe he recognized that it wasn’t appropriate?
With all this desperate and idiotic cherry picking, Stokes then has the gall to accuse anyone who doesn’t bow down to his great visage of “moving goalposts”.
After seeing first-hand the level of professional ethics and rigor exhibited by these so-called climate scientists, I am reminded of the weasels in Dilbert.
“Have you made tests to see if those criteria are met?”
NIST says the criteria has been met. This is NIST’s official guide to uncertainty calculation. It would have been extensively reviewed within NIST.
Despite all your huffing, the fact is that NIST did exactly as climate scientists do. They took 22 readings of temperatures on different days in May, averaged them, and calculated the standard uncertainty (their words) of the mean as σ/sqrt(22). So much for all the noisy but unsupported claims above that this can only be done for repeated measurements of the same thing.
This is all getting a bit heated for a technical discussion, so I should have enough sense to keep my head down.
Not singling you out, Nick, just seeing the different approaches of those with similar domain expertise* in closely related domains.
aiui, E2 is an example of calculating the experimental standard deviation for the mean Tmax for May 2012 at a particular site.
Leave aside the resolution bounds of the measurements, which just reduce clarity here.
The population is the Tmax for each day in May at that site (a population can be whatever we define it to be, but the calculations are explicitly covering the sample Tmax figures for May)
If we had the Tmax for each of the 31 days, it would be the population mean, so no need for an estimator.
A sample of 22 days is a fair proportion of the 31 days in the population, so should give a reasonable estimate of the population mean.
Try the same calculations with sample sizes of 5, 10 and 15 and see how well they converge on the population mean.
Even better, use a sample size of 30 and see how closely the mean matches the population mean.
Unfortunately, we don’t have the actual figures from that station for that period.
A similar exercise can be conducted for any station where we do have the full Tmax series for a chosen month.
Calculate the population statistics, then calculate the experimental standard deviation for each possible random sample of chosen sizes (5, 17, 22 and 30, for example) and see how well they match the population mean.
Similar exercise could be conducted for each season, and a full year. The number of random samples blows out rather quickly, though.
[*] and, yes, that domain expertise is stronger than mine.
I get what you’re saying with only 22 observations. Even if the example was working with all 31 days you can still do a section 4.2 evaluation. For example, if we use the data from the Reagan Airport in 2012/05 (can be downloaded here) we get s = 3.0 C so u(Tmax_avg) = 3.0 / sqrt(31) = 0.54 C. If we do a section 5 evaluation and the combined uncertainty estimate of 0.2 C from pg. 34 of the BOM report it would be u(Tmax_avg) = 0.2 / sqrt(31) = 0.04 C.
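A sketch of the two calculations being compared there, with a made-up vector of daily Tmax values standing in for the airport record (the 0.2 C figure is the per-observation combined uncertainty quoted above; everything else is invented):

    import numpy as np

    rng = np.random.default_rng(3)

    # Made-up daily Tmax values for a 31-day month (deg C), just to run the arithmetic.
    tmax = rng.normal(27.0, 3.0, 31)
    n = len(tmax)

    # Section 4.2 style (Type A): experimental standard deviation of the mean.
    s = tmax.std(ddof=1)
    u_type_a = s / np.sqrt(n)

    # Section 5 style: propagate an assumed 0.2 C per-observation uncertainty,
    # treating the 31 daily uncertainties as independent.
    u_obs = 0.2
    u_section5 = u_obs / np.sqrt(n)

    print(f"s = {s:.2f} C -> Type A u = {u_type_a:.2f} C")            # ~0.5 C when s ~ 3 C
    print(f"u_obs = {u_obs} C -> propagated u = {u_section5:.2f} C")  # ~0.04 C
    # Which of these (if either) is the uncertainty of the monthly mean is the
    # question this whole thread is arguing about.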
40 milli-Kelvins! Total nonsense!
Again you demonstrate your lack of any real world knowledge or experience.
You forgot to include the coverage factor that the author calculated according to G3.2. Read the entire section to figure out why the Student’s T was used.
You can’t escape the conclusions that the author reached.
Plus you can’t escape the need to calculate the uncertainty for each month and then combine them properly. Even if you use the 0.8 number for a month you are going to end up with a much larger uncertainty than you are currently quoting. I estimate between 10 and 100 times.
Once again you are confusing standard deviation of the stated values with the uncertainty of the mean calculated from those stated values.
Even though the measurement devices are only calibrated to +/- 0.3C, you think you can dismiss that uncertainty (and so does the author of the example).
Accuracy and precision ARE NOT THE SAME. You might be able to calculate a mean with a standard deviation of 0.04C but that is *NOT* the accuracy of that mean!
If you have all 31 observations, you have the population, so the E2 calculation isn’t applicable.
As the sample size approaches the population size, the sample mean approaches the population mean.
The intent of using full subsets of samples at various sizes is to show the reduction of spread of the sample means as the sample size increases.
I’m sorry, but this is wrong—the sample size is always equal to the population size, which is exactly one. It is not physically possible to sample a real air temperature more than once.
Each daily Tmax measurement is indeed a one-off (hence its own population), so it isn’t possible to repeat it or improve on it.
However, the population I am specifying is the population of those 31 daily Tmax populations for that month at that site. It is possible to calculate population statistics for that particular population, and sample statistics for samples drawn from that population.
The monthly populations can be further incorporated into larger populations by time, geography, site type, or some other arbitrary method. All airports named after Heads of State, for example.
A population is just something for which we have all of the constituent elements. The classic example used for high school probability lessons is blue and red marbles in a jar.
There is a whole lot of questions about combining data from different days. It is certain you are not measuring the same thing so the data is heterogeneous. This can have serious impacts on measures of standard deviation. If the data has different variance (e.g. uncertainty) for whatever reason then the data doesn’t combine cleanly and estimates of uncertainty and significance are complicated.
Yes, you can combine them and calculate the typical statistical descriptors, but whether those are valid must be determined first. They should always include a histogram or box plot to ensure they fit a Gaussian distribution (or some other defined symmetrical distribution). If the distribution is skewed then a lot of the typical analysis gets a lot more complicated.
See my attached graph of the NIST UM handling of my ten year minimum temp distribution. It is obviously a bi-modal distribution since it covers all four seasons. But the NIST machine just turns it into a Gaussian and goes ahead and calculates the typical mean/sd descriptors. Not good, at least in my opinion.
When I get a chance I’ll see what I get for some monthly data. It probably won’t be bi-modal but it could very well be skewed in at least some of the months.
Or, like Stokes, you can just hand-wave and declare that all distributions become Gaussian as you add more and more points.
Problem solved, real data be damned!
Adding the additional complications of skewed and multi-modal distributions, etc, tends to detract from the fundamental conceptual differences between the participants.
Hence, trying the 2 very clean approaches of
a) working back from a relatively small population with full sample combination coverage at various sizes for convergence of the sample means to the population mean
b) assuming error-free observations with a specified resolution level.
From what I read in E2, the author was using similar simplifying stipulations.
If you are going to depend on the NIST UM then you need to know what the UM is doing for all situations. There is nothing that says a small, clean sample can’t be skewed.
You should really use the very same assumptions for small samples, large samples, and populations.
The sample means converging to the population mean is not really the issue here. No one denies that the standard deviation of the sample means can be small. The issue is the accuracy of those sample means and of the mean calculated from them.
Even if you have the total population and can calculate the mean of the population directly that mean will still have uncertainty propagated onto it from the measurement elements of the population. The uncertainty is not the standard deviation of the stated values but the propagated uncertainties of the individual elements.
Simplifying assumptions such as all uncertainty cancels, all measurements are 100% accurate, that the average uncertainty of the population elements is the uncertainty of the average, or that the standard deviation of the population distribution is the uncertainty of the mean only lead to false impressions as to how uncertainty of measurements should be handled. You can see that proven in spades in this sub-thread.
+100
The only problem with your suggestion is that ALL GLOBAL temperatures become the population. You can then sample them randomly and easily show that, per the LLN, the sample means should converge to the population mean, and if the sample size is large enough the SEM should be fairly small.
As it is now, stations are the random variables and they all have different distributions. That is, they are not IID. In addition, the calculation of standard deviation is done on anomalies as if they were the actual data distribution. They should carry the same variance as each month has. Subtracting a constant from the values does not change the variance (see the quick check after this comment).
Look at it this way. The very best standard deviation, looking at example E2, is probably about +/-4. That means the uncertainty of the total would at best be 4. That is a far cry from the often quoted +/- 0.02.
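The point about subtracting a baseline is easy to check directly (the temperatures and the baseline below are invented):

    import numpy as np

    rng = np.random.default_rng(5)

    temps = rng.normal(15.0, 4.0, 360)  # invented monthly temperatures
    baseline = 14.2                     # arbitrary baseline constant

    anomalies = temps - baseline

    print(f"var(temps)     = {temps.var(ddof=1):.3f}")
    print(f"var(anomalies) = {anomalies.var(ddof=1):.3f}")
    # Identical: shifting every value by a constant moves the mean but leaves
    # the variance (and SD) untouched.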
ALL GLOBAL temperatures are A population, if one consolidates all of the smaller populations. The variance wouldn’t be pleasant, though 🙂
Old Cocky said: “ALL GLOBAL temperatures are A population”
I know what you’re trying to say and I agree with you in principle that a global average temperature is computed from a population of grid cells. However, not all of them are computed from the population. UAH is a good counter example. Their grid mesh has 10368 cells, but only 9504 of them are used in the averaging step. Many of the other datasets like BEST and HadCRUTv5 do compute a population average though.
I was trying very hard to avoid scope creep.
Just trying to make the point that a population can be pretty much anything provided it meets the criterion of having all elements of interest. Anything less is a sample.
According to the trendologists, the variance disappears in a puff of greasy green smoke as more and more points are averaged.
But, if you have a population, why are you sampling at all? Sampling is only done if you can’t obtain the full population of data. In your example, you don’t need either a sample mean as an estimate of the population mean or an SEM to tell you how accurate the sample mean is; you already know the entire population, so just calculate the mean and standard deviation.
This is why climate scientists don’t want to declare their temperature database a population. Then they couldn’t calculate an SEM by dividing the SD by sqrt N to get an even smaller number.
It’s done specifically to show that the sample means converge to the population mean and that the experimental standard deviation decreases as the sample sizes increase.
The standard deviation of the sample means decreases, not the uncertainty of the mean that is calculated from the sample means. Precision is not accuracy.
Like Jim asked, if you have the whole population then taking samples to “estimate” the population mean is nothing more than mental masturbation. Just calculate the mean – you’ll have a zero standard deviation of the sample means.
There seems to be a difference in mental maps between the mathematicians and the metrologists, so trying to find some common ground by restating in different terms.
The experimental standard deviation decreases as the sample size converges on the population, and the sample mean also converges on the population mean. Using the full set of possible samples of various sizes helps illustrate this.
fwiw, I don’t think anybody here is being deliberately obtuse, but the different backgrounds have led to non-congruent sets of what “everybody knows”
The “mathematicians” here have openly declared their faith that air temperature uncertainties are as low as 40 mK.
This is absurd.
You talk about standard deviations, but what happens to measurement uncertainty?
Actually, they are, Stokes is the perfect example.
That is what I was trying to say, but must not have been clear.
As the sample size increases, the sample mean should converge on the population mean given a well-behaved data set.
“As the sample size increases, the sample mean should converge on the population mean given a well-behaved data set.”
But that says nothing about how accurate either the sample mean or the population mean is. Only propagation of measurement uncertainty can clarify that, not the standard deviation of the stated values.
Measurements should always be given as “stated value +/- uncertainty”. Standard deviation only depends on the stated values, not the uncertainty values. This is true even for Gaussian distributions of the stated values. The average of the stated values only becomes the true value for Gaussian distributions if the uncertainty can be considered symmetric and has no significant systematic uncertainty contribution.
Metrologists should understand:
Those are confounding factors which reduce clarity – scope creep, if you will. For argument’s sake, assume that E2 used the number of customers of a corner store in a small town during the month of May, 2012. Being a discrete distribution, all measurement errors and uncertainties disappear.
For a small population, Normality isn’t an unreasonable assumption. As the author of E2 noted, it is more trouble than it’s worth to do anything else in such cases.
On that basis, my interpretation is that the E2 calculation of experimental standard deviation of the sample is effectively calculating how well the sample mean estimates the population mean, and only how well the sample mean estimates the population mean.
Examples usually build in complexity, so one would assume the other factors to be introduced later.
you nailed it.
Old Cocky said: “If you have all 31 observations, you have the population, so the E2 calculation isn’t applicable.”
GUM section 4.2 does not discriminate between sample and population.
Old Cocky said: “As the sample size approaches the population size, the sample mean approaches the population mean.”
True, but moot. Remember, the goal is to quantify the uncertainty of the estimate of the measurand. The measurand is the monthly average. Both a sample mean and the population mean are estimates of the measurand. But they aren’t perfect estimates. They both have an uncertainty.
bdgwx said: “GUM section 4.2 does not discriminate between sample and population.”
But neither does the GUM deal with “averages” of measurements of different things. You keep trying to stuff averages into the GUM procedures and that is misusing the document.
Read Annex B with careful understanding.
From Oxford Languages:
Is an average temperature a phenomenon, body, or substance that can be determined quantitatively?
Does an average have a true value obtained from a measurement?
Is an average of measurements subject to a measurement itself?
I’m not trying to be obtuse here, but this whole argument of using the GUM and UM is totally off base. Both of these documents are written to evaluate MEASUREMENTS of a single physical phenomenon, body, or substance.
Averages, means, deviations are used in measuring a single item because there should be MULTIPLE measurements of the SAME THING (measurand) that can provide a probability distribution that can identify variance in those measurements.
Averages of independent measurands is a whole different subject. I submit the Example E2 as proof that normal statistical analysis should be done. That is why the author used a standard deviation and expanded coverage to categorize the distribution of independent measurements of different things.
And time and again they totally ignore applying the GUM and NIST to the individual air temperature measurements, which do fall into the scopes.
The problem is they keep cherrypicking. They don’t even recognize the assumptions being made to allow for “manufacturing” the data into something that can look like data from measurements of a single measurand.
<pedantry>The measurand in the example is the monthly average Tmax rather than the monthly average temperature</pedantry>
Even I am not feeling sufficiently foolhardy to weigh into calculating daily, monthly and annual averages 🙂
Therein lies part of the problem. By doing Tmax you are not starting with an average that has a mean and standard deviation. When you do Tmax it turns into a random variable with an average and variance. You can’t just average two random variables without also combining the variances. For a month you will have 30 or 31 random variables whose variances need to be added in the proper manner.
“If you have all 31 observations, you have the population, so the E2 calculation isn’t applicable.”
That isn’t true. If you look through the calculation, the number 31 does not appear anywhere. They just take the experimental sd of 22 days, and divide by sqrt(22). If there had been 31 days measured, they would have done the same, dividing by sqrt(31). The population is May days. The uncertainty includes the possibility that the weather might have worked out differently. That is the most important component.
I’m surprised, Nick. The example uses a sample of the daily Tmax for 22 May days to calculate the experimental standard deviation (or standard uncertainty in the example), which is a measure of how closely the sample mean approximates the population mean.
The 31 May Tmax figures by definition constitute the population of May Tmax observations for that site. It can’t be anything else, because there are 31 days in May.
As formulated, the sample is a subset of the population of the daily May Tmax observations.
“which is a measure of how closely the sample mean approximates the population mean.”
It isn’t that. It is the uncertainty of the sample mean. If you had 31 days, it would be the uncertainty of the population mean. Missing values are not the only (or main) source of uncertainty.
When you use experimental sd for uncertainty, you include all the things that caused the variation that produced the sd. These include measurement error but, much more significantly, weather variation. None of that goes away if you have all 31 days.
To use an “experimental standard deviation” you must make very, very explicit assumptions as the author of E2 did. In essence he set the problem up such that he could assume the data was from one measurand. He also made assumptions about the distribution being uniform. These assumptions are ok for setting up an example but must be proven for field measurements.
If you read the GUM G.1 Introduction:
This is why E2 calculated a coverage from the combined standard uncertainty to obtain the expanded uncertainty.
Show us the assumptions you need to make when averaging Tmax and Tmin to get a variance and standard deviation of a single measurand. Remember, E2 only used Tmax.
Show us the assumptions you must make to make a monthly average into a measurand.
Show us the assumptions you must make to turn annual averages into a measurand.
Then show us the assumptions you must make to calculate a global average into a measurand.
You are not making a textbook example. You are creating an actual analysis that will be used by all.
“It is the uncertainty of the sample mean. If you had 31 days, it would be the uncertainty of the population mean. Missing values are not the only (or main) source of uncertainty.”
The standard deviation of the sample means is *NOT* the uncertainty of the mean thus calculated. It *is* how closely that mean approaches the population mean.
The calculation of the population mean is based on the stated values, not on the uncertainty. The population mean has no standard deviation, it *is* just the mean.
Measurement error is not part of the sd variation, it is part of the uncertainty of the mean. The sd variation comes from the spread of the stated values.
Weather variation *is* part of the standard deviation. In E2 it is the only contributor to the standard deviation, since measurement uncertainty was assumed to be zero.
I think this is the crux of the differences between the people from different backgrounds, and perhaps somewhat ambiguous terminology.
From my background, the experimental standard deviation may as well be the SEM (with slightly larger variance), and that is purely used for determining how well the sample mean estimates the population mean. The population mean is the population mean, so it estimates itself perfectly. Sample equations just don’t apply to the population.
Old Cocky said: “which is a measure of how closely the sample mean approximates the population mean.”
Yes and no. It’s complicated. I know what you’re saying and I agree in principle, especially if we were discussing pure statistics. But this is more than pure statistics. This is uncertainty analysis, which adds another layer of complexity. The section 4.2 procedure is actually about determining the uncertainty of the measurand. It’s not about how closely a sample mean approximates the population mean. The confusing part is that the formula used is the same, but the context and meaning behind it is subtly different.
The section 4.2 procedure (type A evaluation) is an experimentally based technique to assess uncertainty. In contrast with the section 5 propagation of uncertainty, which is a bottom-up approach, the section 4.2 procedure is more of a top-down approach. Section 5 evaluations force you to identify, quantify, and propagate uncertainty sources individually. Section 4.2 evaluations are done statistically by letting the data describe its own uncertainty. As a result the meaning of the section 4.2 uncertainty can be dramatically different.
Example E2 uses the section 4.2 procedure which includes all sources of random effect uncertainty including resolution, noise, and anything else that contributes to the variation in the data. One not so obvious contributor is weather. Weather is an uncertainty source in this evaluation because it determines what the true average is. The fact that we don’t know what the other 9 observations are necessarily means that there will be uncertainty due to the variation caused by weather.
This is why the GUM does not prefer one type of uncertainty (type A or type B) over the other. In this particular example the experimental setup forces the type A evaluation to include an uncertainty component that in most cases isn’t really intended and produces an unnecessarily large uncertainty. That doesn’t mean type A evaluations are bad. In fact, we could reframe the experiment to produce a type A uncertainty that would be more meaningful but it would require independent measurements for comparison.
About which you know NOTHING.
This is true only for a single measurand. You must make a number of assumptions to do this. Are you saying the field measurements at all stations meet the assumptions necessary?
<"Example E2 uses the section 4.2 procedure which includes all sources of random effect uncertainty including resolution, noise, and anything else that contributes to the variation in the data."
You should reread E2 before making such broad statement in a word salad.
You’ll notice the author ended up using a Students T distribution rather than a Gaussian. Have made the same decision and if so based on what.
It is time for you to start showing some analysis of individual stations and all the averages used.
I think this is the key to the different interpretations.
From my background, with my unexaminable assumptions, if it swims like a SEM, waddles like a SEM and quacks like a SEM, it may as well be a SEM. The extra confounding factors extend its range a bit, but it’s still a measure of how well the sample mean estimates the population mean. It’s not an estimator of the uncertainty of the population mean, which is furry and burrows in the ground.
“ which is furry and burrows in the ground.”
ROFL!!
“which is furry and burrows in the ground.”
I think you pulled that one out of a hat. Suppose you did have a sample of all 31 days. Then you could say that the sample mean is certainly equal to the population mean. All that means is that you are equally uncertain of either. And the measure of your uncertainty is sd/sqrt(31).
I was thinking of a wombat, but it works equally well for a rabbit 🙂
This seems to be another of those mental domain map differences.
If you have all 31 May daily Tmax observations, by definition you have the population of May daily Tmax observations. A population can’t be a sample from itself.
Yes, it would be a sample of another population, but E2 specifies May daily Tmax observations, implicitly defining the population.
Yes, but the mean of 31 has uncertainty, no? How would you quantify it? The uncertainty is actually independent of whether you call the 31 a sample or a population.
Alas, this seems to be where we run into terminology differences and different mental maps.
Going back to my number of customers at the corner store for the 31 days of May, we have no measurement uncertainty or resolution bounds. We can assume that the distribution approximates to Gaussian. It probably isn’t, but let’s assume it is. Let’s also assume the data set contains no large excursions.
In this case, there isn’t any uncertainty in the population mean or variance.
In this case, the “standard uncertainty”/”experimental standard deviation” is identical to Standard Error of the Mean.
So, we expect the SEM to decrease as the sample size increases, and we also expect the sample means to more closely estimate the population mean.
Calculating and charting the full set of samples for each of the selected sample sizes would be an instructive exercise. No, I haven’t volunteered to do that 🙂
How do you quantify it? By propagating the uncertainty of the measurements. According to the BOM the measurement uncertainty is based on a calibration tolerance of +/- 0.3C. According to the Federal Meteorology Handbook No. 1 it should be +/- 0.5C.
For a series of 31 data points, each with a +/- 0.5C uncertainty and an unknown systematic component, the propagated uncertainty would be +/- 0.5*sqrt(31) ≈ +/- 2.8C.
Even if you assume all measurement uncertainty cancels, it is the sd that is the uncertainty. sd/sqrt(31) is the average uncertainty distributed across all individual elements. It is *not* the total uncertainty. The average uncertainty per element is not the uncertainty of the average.
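For what it is worth, a quick arithmetic sketch of the two quantities being argued over in this sub-thread, using the numbers quoted above; whether the per-reading uncertainties may be treated as independent and random (the condition under which the division by n is justified) is precisely the point in dispute:

import math

n = 31          # days in the month
u_single = 0.5  # per-reading uncertainty (deg C) quoted above

# Root-sum-square of 31 independent +/-0.5 C uncertainties: uncertainty of the SUM
u_sum = u_single * math.sqrt(n)      # ~2.8 C

# If (and only if) the uncertainties are independent and random, dividing the sum
# by the constant n scales its uncertainty by 1/n, i.e. u_single / sqrt(n)
u_mean_if_independent = u_sum / n    # ~0.09 C

print(f"u(sum) = {u_sum:.2f} C, u(mean, if independent) = {u_mean_if_independent:.3f} C")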
This is a good explanation.
Assume a quality person samples the output of a machine making rods. They can measure the rods and obtain a mean.
Nick and his cronies would then sum the uncertainties and divide by the number of samples thus obtaining the average uncertainty of the samples.
Now does this average uncertainty tell one the range of variance in the rods?
Will any individual rod looked at have exactly that average uncertainty?
The more rods you sample the smaller and smaller the average uncertainty becomes. Is this what happens in a machine?
What percent of the rods will have that average uncertainty?
All of a sudden the standard deviation of the entire distribution becomes important, doesn’t it?
That is why it is necessary to take the uncertainty in the mean and use a coverage factor. From that you can begin to know if the machine is heading toward needed maintenance.
It is what the GUM was designed to assist in.
The standard deviation is only the determination of uncertainty if you ignore the individual elements propagating uncertainty to the mean. Which is what everyone in climate science does – just ignore measurement uncertainty totally or claim it all cancels.
“Section 4.2 evaluations are done statistically by letting the data describe its own uncertainty.”
That’s only true if you assume all measurement uncertainty cancels and the standard deviation of the stated values becomes the uncertainty.
That’s what we keep on trying to tell you. The GUM assumes multiple measurements of the same thing, where you get a Gaussian distribution of measurements and the uncertainties of the individual elements in the data set cancel.
After reading through E2 several times, I still believe the end result was using simple statistical analysis to create statistical descriptors of the distribution. Regardless of the GUM references, simple standard deviation, error of the sample mean, and a coverage factor was all that was used.
+100
And it is still wrong wrong wrong, you get an F, mr. trendology expert.
Stepping into the lion’s den once again, it’s time to look at the measurement resolution effects on the population mean.
The May 2012 observations for Reagan Airport as kindly referenced by bdgwx show the daily maxima in whole degrees F.
Therefore, the resolution limit is implicitly +/- 1/2 degree F.
The sum of the Tmax is shown as 2482 and average (their term) is 80.1
Now, doing the same for the lower bounds, we subtract 15 (or 16 if you prefer) from the total. That gives a sum of 2467 and a mean of 79.6.
Repeating for the upper bounds, we get a sum of 2497 and mean of 80.5.
So, the population mean Tmax to tenths of a degree F is somewhere in the range of 79.6 to 80.5 degrees F.
A similar result obtains for Tmin.
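A sketch of that bounding arithmetic, using the exact half-degree bound of 15.5 rather than the rounded 15 or 16 (figures as quoted above):

n = 31          # days in May 2012
total = 2482    # sum of the reported whole-degree Tmax values (deg F)
res = 0.5       # implicit resolution bound per reading (deg F)

mean = total / n                      # ~80.1
mean_low = (total - n * res) / n      # ~79.6, every reading at its lower bound
mean_high = (total + n * res) / n     # ~80.6, every reading at its upper bound

print(f"reported mean = {mean:.2f} F")
print(f"edge-case bounds: {mean_low:.2f} to {mean_high:.2f} F")

These are the extreme cases in which every reading sits at the same edge of its resolution interval; the replies below dispute whether that is the right measure of uncertainty.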
“Now, doing the same for the lower bounds, we subtract 15 (or 16 if you prefer) from the total. That gives a sum of 2467 and a mean of 79.6.”
That would make sense if every error had been in one direction. That is like getting 31 heads in a row. It can happen, but is very unlikely. Error ranges quoted are usually more in the range of 95% probability.
It’s not error, it’s the limit of resolution. To obtain full coverage, the calculations have to be made at the upper and lower bounds.
Yes, errors can reasonably be assumed to cancel, but the true measure can’t be known below the resolution limits.
Sorry, I should have added that with a Normal distribution, the mean can be assumed to lie midway between the upper and lower bounds, but it can’t exceed those bounds.
Old Cocky said: “It’s not error, it’s the limit of resolution.”
The limit of resolution causes error. It is unavoidable. Fortunately resolution errors are random. And since there are 31 measurements the CLT tells us those rectangular distributions tend to normal when convolved.
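A quick Monte Carlo sketch of that claim, assuming each reading’s resolution error is uniform on ±0.5 and independent from day to day (the independence assumption is what others in the thread dispute):

import numpy as np

rng = np.random.default_rng(42)
n_days, n_trials = 31, 100_000

# Resolution (rounding) error for each reading: rectangular on [-0.5, +0.5]
errors = rng.uniform(-0.5, 0.5, size=(n_trials, n_days))

# Error contributed to each trial's 31-day mean
mean_errors = errors.mean(axis=1)

print(f"sd of a single resolution error: {errors.std():.3f}")       # ~0.289 = 0.5/sqrt(3)
print(f"sd of the error of the mean:     {mean_errors.std():.3f}")  # ~0.052 = 0.289/sqrt(31)
# A histogram of mean_errors is close to Gaussian, even though each
# individual error has a rectangular distribution.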
Wave your hands again and declare all errors to be random, Stokes.
QED
Once again, the master trendologist declares that all errors are random and all cancel in their Houdini air averages.
This took a while to occur to me, so sorry for the delayed reply.
For measurements taken with any individual instrument, there generally will be a bias within its calibration interval.
They are calibrated and certified to read within a specified interval at a number of specified points. (e.g. -30 degrees, 0 degrees, 30 degrees, 60 degrees, 90 degrees and 120 degrees) As long as these criteria are met, they pass.
So, if the thermometer was calibrated at the high end of the interval for the observations, readings will continue to be at the high end of the interval. If it was calibrated at the low end, it will continue to read at the low end.
The reference will read to a greater precision and accuracy, but during calibration the lesser instrument just has to read within its specified interval.
So, yes, the errors will indeed be in 1 direction, because the errors are due to bias, not random.
Without the calibration report (or performing our own checks of the instrument to a greater precision and accuracy), we don’t know where the instrument’s readings really sit within the calibration interval.
In addition, it is not unusual to read low at one end of the instrument’s measurement range and high at the other.
It is for reasons like these that it is incumbent upon the organization/laboratory performing the measurements to do a complete and honest uncertainty analysis of its own measurement system that takes such factors into account.
A laboratory that has received third-party accreditation under ISO 17025 for whatever measurements it might perform is required to do so and report the uncertainty as X ± k*u_c(X).
Yes, I’m sure calibration can have its own errors, and these would then pass through the averaging process for one station over time. But for global temperature, there are over 9000 stations reporting every month. What is the chance that their calibration errors will all line up?
I’m still trying to keep this focused, or we run the risk of going all over the shop.
The distribution of calibration differences between stations is interesting in and of itself, but perhaps later.
At this stage, I’m trying to limit this to the resolution bounds for any one specific station.
Keeping it specific, then expanding to the more general.
<pedantry>the calibration doesn’t have errors, but it has limits of resolution</pedantry>
I know what you mean, but it’s a definition thing.
Well, to quote your post to which I responded:
“So, yes, the errors will indeed be in 1 direction, because the errors are due to bias, not random.”
touche
Perhaps you should read what he said.
OC: “At this stage, I’m trying to limit this to the resolution bounds for any one specific station.”
It was a fair cop. I was sloppy with the wording, and Nick spotted it.
It’s always good to have somebody else proof-read, because we tend to miss our own errors.
Nick likes to nitpick.
It’s useful to have somebody point out the errors, or they propagate.
Actually, I think both your and my use of “error” was no error. Try saying it with “uncertainty”.
Yes, we know you don’t believe uncertainty even exists, this is obvious.
Ahem, Stokes, uncertainty is still not error.
This must be beyond your comprehension skills.
I don’t think it’s conceptual skills so much as the different fields having slightly different definitions/concepts.
As an outsider to both (or possibly a dabbler in both), it’s possibly easier to pick up than for the practitioners who have decades of “everybody knows”.
The BOM only calibrates to +/- 0.3C. Drift after calibration will increase this interval.
What is the chance that the calibration errors will all totally cancel?
That’s why root-sum-square of the measurement uncertainties is done as opposed to direct addition, it assumes partial cancellation of uncertainty.
This is a study by Hubbard and Lin about MMTS uncertainty.
https://journals.ametsoc.org/configurable/content/journals$002fatot$002f21$002f10$002f1520-0426_2004_021_1590_atcbtm_2_0_co_2.xml?t:ac=journals%24002fatot%24002f21%24002f10%24002f1520-0426_2004_021_1590_atcbtm_2_0_co_2.xml&t:ac=journals%24002fatot%24002f21%24002f10%24002f1520-0426_2004_021_159
I hope this link opens.
I wonder what the calibration range was, and the bounds.
Reading 0.3K low at lower temps and 0.4K high at higher temps would have failed calibration for 1 degree F resolution.
Oops – reading high at lower temps and low at higher temps.
Engage brain before clicking “Post Comment”
This is really two questions. 1. How often are the stations calibrated and 2. are they actually replaced or adjusted if they fail calibration?
Even with the newer stations it’s not just a tweak of a variable resistor to calibrate it. The calibration is not always linear over the entire reading range so tweaking at one temperature can actually make the uncertainty worse overall.
Systematic bias, non-linearity of the sensor, and hysteresis (reading differently coming down and going up).
Even the best thermistor temperature sensors suffer from these let alone the entire measuring device.
It’s why measurement uncertainty simply can’t be ignored and the standard deviation of the stated values used as the uncertainty of the mean.
Your example violates significant digit rules for measurements. The rule I learned was that measurements, even after averaging, can never exceed the resolution to which the measurements were taken. Statistics cannot add information to the resolution; it is impossible.
Yes, I know. The initial figures were taken from the link bdgwx provided, and I followed their lead on the use of significant figures.
Even on that basis, the mean Tmax comes close to 80 +/- 0.5.
That is what I have been saying for months. After seeing Example E2, my asking for variance and standard deviations is not unreasonable either.
An average is a statistical descriptor of the central tendency of a distribution. It is meaningless without a variance and/or a standard deviation. If the distribution differs from a Gaussian, the variance and SD are basically meaningless, since you can no longer know the percent of data in the intervals quoted.
My point. If I tell you I got an average of 50C, what was the range of temperatures used to calculate it? Even if there was no error, can you tell me the range?
“since you can no longer know the percent of data in the intervals quoted.”
That’s what the 5-number descriptors can tell you – or use a box plot to show it graphically.
But then, like you say, you can no longer use the standard deviations of the sample means as an uncertainty measure!
Old Cocky said: “So, the population mean Tmax to tenths of a degree F is somewhere in the range of 79.6 to 80.5 degrees F.”
Not according to the GUM. Following the procedure in section 5 you must convert the ±0.5 F resolution uncertainty into a standard uncertainty and then plug it into equation 10. Resolution uncertainties are rectangular so that means u = 0.5 / sqrt(3) = 0.29. Then using equation 10 we have u(Tmax_avg) = 0.29 / sqrt(31) = 0.05.
Old Cocky said: “Now, doing the same for the lower bounds, we subtract 15 (or 16 if you prefer) from the total. That gives a sum of 2467 and a mean of 79.6.
Repeating for the upper bounds, we get a sum of 2497 and mean of 80.5.”
That would only occur if the resolution error were the same for all 31 measurements. Resolution errors are random. 0.0 is just as likely as -0.5, -0.1, +0.1, or +0.5. It is a rectangular distribution. It is insanely unlikely that all 31 measurements would result in either -0.5 or +0.5 of resolution error.
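As a sketch, the section 5 arithmetic described above (a ±0.5 F rectangular resolution uncertainty converted to a standard uncertainty and propagated through the mean of 31 readings, assuming the readings’ resolution errors are independent):

import math

half_width = 0.5   # +/- resolution bound (deg F)
n = 31

# Rectangular distribution: standard uncertainty = half-width / sqrt(3)
u_resolution = half_width / math.sqrt(3)   # ~0.29 F

# Propagated through y = (x1 + ... + xn) / n with independent inputs
u_mean = u_resolution / math.sqrt(n)       # ~0.05 F

print(f"u(resolution) = {u_resolution:.2f} F, u(mean) = {u_mean:.2f} F")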
Another goofy declaration that highlights your lack of real knowledge.
And you STILL don’t understand that uncertainty is NOT error!
Oh look, over the space of a few hours, the air temperature uncertainty has increased from 40 mK to 50 mK.
Keep going, eventually you’ll start to get some reasonable numbers.
I’ll have to give that further consideration, and I had too many meetings today to be able to think.
I’m not avoiding this, but the rest of this week looks like being too full-on to give it due consideration. It will probably be this weekend before that’s possible, but I will post something. The thread will probably be dead by then, but such is life.
No problem. These discussions will pop up in other articles so it’s not like there won’t be other opportunities to continue. It sounds like Geoff is going to do an article on the bootstrapping method which should be another interesting one.
Yes, the bootstrapping article and ensuing discussions should be quite interesting. It’s not something I have experience with, so a great learning opportunity.
I must be feeling a bit maudlin this evening, because I can’t help thinking that despite the differences of opinion and occasional testiness, everybody who is still involved in this discussion could sit around the barbecue on a summer afternoon with beer in hand and swap war stories for hours.
Maybe steer clear of politics, religion and uncertainty once we’d had a couple, but that’s just how it rolls.
Absolutely. It is a sentiment I’ve shared on here a few times as well myself.
My goodness, that publication is one for the Geek’s Geek. I don’t know whether to thank or curse this discussion. You could get lost in there for ages 🙂
I’m not sure that section 5.1 applies in the case of E2. It seems to be for the case of repeated measurements of the same thing, and
E2 covers how well a sample mean estimates the population mean. In the case of E2 the mean daily Tmax for the month is a calculation rather than a measurand. Tmax for the month is a measurand, sure, but not the average daily Tmax.
Each day’s Tmax in E2 is a one-shot (and measurement uncertainties were assumed away). If we had 5 thermometers at the same site, then yeah, the combined uncertainty of the 5 would be applicable to each of the daily Tmax figures.
I’m pretty sure the resolution bounds of the mean are the same as the resolution bounds of the individual elements (the significant digits rule) because those are an envelope within which the mean lies. Statisticians are sometimes a bit naughty in regard to introducing spurious precision, just like with writing 1/4 as 0.25. 1/4 has a resolution of +/- 1/8, whereas 0.25 implicitly has a resolution of +/- 0.005.
“E2 covers how well a sample mean estimates the population mean. “
Herein lies the main problem with uncertainty of the GAT.
from one of the uncertainty documents I have:
“If repeated measurements are made of the same quantity, statistical procedures can be used to determine the uncertainties in the measurement process. This type of statistical analysis provides uncertainties which are determined from the data themselves without requiring further estimates. The important variables in such analyses are the mean, the standard deviation and the standard uncertainty of the mean (also referred to as the standard deviation of the mean or the standard error of the mean).”
The climate alarmists want to treat the entire temperature record as repeated measurements of the same quantity. This lets them use the standard deviation of the mean (i.e. the standard error of the mean) as the smallest possible uncertainty even though it actually has nothing whatsoever to do with the actual accuracy of the calculated mean.
Old school metrology users tend to follow John Taylor’s prescription for calculating uncertainty:
y = Σx(i) / n
u(y)/y = sqrt[ (u(x1)/x1)^2 + … + (u(xn)/xn)^2 + (u(n)/n)^2 ] ==>
u(y)/y = sqrt[ (u(x1)/x1)^2 + … + (u(xn)/xn)^2 ]
This is the only way that uncertainty makes physical sense. Even in a situation where you have multiple measurements of the same thing, where there exist both random and systematic components to the uncertainty, then
(1) δx_ran = σ_x̄, where σ_x̄ is the standard deviation of the mean
sdom = σ_x̄ = σ_x/sqrt(N)
(2) δx_sys = systematic uncertainty
then total uncertainty is
δx_total = sqrt[ (δx_ran)^2 + (δx_sys)^2 ]
The climate alarmists always want to assume that δx_sys = 0 so they can just use the sdom as the uncertainty.
I’ll leave you with what John Taylor says in his book on uncertainty:
“The average uncertainty of the individual measurements x1, x2, …, xn is given by the standard deviation or SD:
σ_x = sqrt[ (1/(N-1)) Σ(x_i – x̄)^2 ]” (bolding mine, tg)
This is the *sample* standard deviation. Replace (N-1) with N for the population standard deviation.
I can’t emphasize enough that this is the *AVERAGE UNCERTAINTY OF THE INDIVIDUAL MEASUREMENTS*. It is *not* the uncertainty of the mean, as so many want to believe.
The total uncertainty is defined as above (see John Taylor) as the sum of the uncertainties of the individual components.
Total uncertainty is what must be considered when designing a bridge made up of individual component beams making up trusses, when designing a beam to span a foundation in a house, or when ordering crankshaft journal bushings in an engine.
Total uncertainty is what should be specified for the GAT, not the average uncertainty of the individual stated values of the temperatures. That’s the big takeaway from E2 – the assumption that there are no individual element uncertainties to consider, so the stated values themselves determine the accuracy of the calculated mean. That’s an impossible assumption to meet in the field; it can only happen in a hypothetical situation.
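To keep the three quantities named above distinct, a small sketch with made-up numbers (placeholders, not station data), following the formulas quoted from Taylor:

import math

# Hypothetical stated values; placeholders only
x = [20.1, 21.4, 19.8, 22.0, 20.7, 21.1, 19.5, 20.9]
n = len(x)
xbar = sum(x) / n

# Taylor's "average uncertainty of the individual measurements": the sample sd
sd = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))

# Standard deviation of the mean (sdom)
sdom = sd / math.sqrt(n)

# Total uncertainty with an assumed systematic component, per the formula above
d_sys = 0.3
d_total = math.sqrt(sdom ** 2 + d_sys ** 2)

print(f"sd = {sd:.2f}, sdom = {sdom:.2f}, total = {d_total:.2f}")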
There seems to be a pure/applied conceptual gap. In this case mathematicians / metrologists.
One of the pitfalls of gaining deep expertise in a field is that the foundational bases of that field (what “everybody knows”) become so deeply embedded that it isn’t possible to even realise they exist, let alone question them.
One of the strengths of cross-disciplinary teams is the ability to cross (or work around) those boundaries.
I don’t agree that this is a pure/applied conceptual gap, i.e. mathematicians/metrologists. It’s not even a theoretical/practical gap.
Math exists to describe the real world. Even the old E=mc^2 describes the real world. So does quantum mechanics, right down to describing how semi-conductor junctions work.
Statistics are just one more form of math used to describe the real world. Insofar as that math is misused it no longer describes the real world.
I see the problem as so many math “people’, including climate scientists, who no longer have any grasp of the real world. There are no repercussions to “mathematicians” that make simplifying assumptions that make no sense in the real world. Assumptions like “all measurement uncertainty cancels” or “the average uncertainty is the uncertainty of the average” or “multiple measurements of different things can be handled the same way as multiple measurements of the same thing”.
None of those defending the GAT as an accurate metric seem to have ever had to order new journal bushings for an internal combustion engine, built a foundation-spanning beam from a number of shorter boards, or even done something so simple as building a stud wall that wouldn’t leave ripples in the ceiling dry wall. Pete forbid they should ever design something used by the public such as a bridge!
I often wonder how many of them have even heard of the π theorem or have actually studied variable heat flow (a whole chapter in my 1942 textbook “Intro to Heat Transfer”).
You can’t get much more of a Pure vs. Applied gap than that 🙂
Statistics is somewhere between the two worlds, and has its own sins.
This may be one of those foundational bases of the metrologists – something which just is.
Some fields of pure mathematics, way above my pay grade, are purely theoretical, with no apparent connection to the real world.
It may turn out later that they do relate to, and can describe, the real world, but the practitioners of the field are involved for the pure intellectual challenge.
Statistics, as a field, certainly exists to describe the real world, but I would argue that it’s a field which uses mathematics rather than a field of mathematics. And perhaps that’s one of my foundational assumptions.
“There seems to be a pure/applied conceptual gap. In this case mathematicians / metrologists.”
So the guru now is John R Taylor, practical man and metrologist? Or is he? True, he was a professor in Physics at Colorado, publishing on stuff like
“A Note on Integrals Involving Pairs of Confluent Hypergeometric Functions,”
He has a Mathematics degree from Cambridge, and his Cambridge PhD was on “Aspects of S matrix theory”.
Where did I say that?
Not you but from the comment you responded to:
“Old school metrology users tend to follow John Taylor’s prescription for calculating uncertainty:”
Fair enough, but we should keep the attributions correct or it becomes even more confusing.
I make no claim to having worked in that field, so will have to leave your point about Taylor to Tim.
Even Possolo agrees with Taylor:
“The measurement model, V = πR^2H, expresses the output quantity V as a function of the two input quantities, R and H, whose values are surrounded by uncertainty. If, for the purposes of uncertainty evaluation, both R and H are modeled as random variables, then V will also be a random variable and the problem of evaluating its uncertainty can be solved either by characterizing its probability distribution fully, or, at a minimum, by computing its standard deviation.
We’ll do both under the assumption that R and H are independent random variables, and that both have Gaussian distributions centered at their measured values, with standard deviations equal to their standard uncertainties. (bolding mine, tg -> Point 1)
Gauss’s formula [Possolo and Iyer, 2017, VII.A.2], which is used in the Guide to the expression of uncertainty in measurement (GUM) [JCGM 100:2008], provides a practicable alternative that will produce a particularly simple approximation to the standard deviation of the output quantity because it is a product of powers of the input quantities: V = πR^2H. The approximation is this
(u(V)/V)^2 ≈ (2 x u(R)/R)^2 + (1 x u(H)/H)^2
Note that π does not figure in this formula because it has no uncertainty, (bolding mine, tg -> Point 2) and that the “2” and the “1” that appear as multipliers on the right-hand side are the exponents of R and H in the formula for the volume. The approximation is likely to be good when the relative uncertainties, u(R)/R and u(H)/H, are small — say, less than 10% —, as they are in this case. Therefore
u(V) ≈ 7204m^3 * sqrt[ (2 x 0.03m/8.40m)^2 + (0.07m/32.50m)^2 ] = 54m^3”
Point 1: Even Possolo has to make the assumption that these are multiple measurements of the same thing whose stated values form a Gaussian distribution and that measurement uncertainty cancels and the standard deviation of the stated values is the total uncertainty. These assumptions simply do not apply to the case of multiple measurements of different things.
Point 2: Constants do not figure in the uncertainty formula. So when figuring the uncertainty of an average the constant “N” does not figure in the uncertainty of the average.
if y = ΣX_i / N then the uncertainty of y is
[u(y)/y]^2 = [u(X1)/X1]^2 + … + [ u(Xn)/Xn]^2
Average uncertainty is not uncertainty of the average. Standard deviation of the sample means is not the uncertainty of the average.
They *ALL* say this, Taylor, Bevington, Possolo, etc.
Now tell me how they are *all* wrong and you are right!
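A quick numerical check of the V = πR^2H example quoted above (all figures taken from the quote):

import math

R, u_R = 8.40, 0.03    # radius and its standard uncertainty (m)
H, u_H = 32.50, 0.07   # height and its standard uncertainty (m)

V = math.pi * R**2 * H                                  # ~7204 m^3

# Relative uncertainties add in quadrature, weighted by the exponents (2 for R, 1 for H);
# pi carries no uncertainty
rel_u_V = math.sqrt((2 * u_R / R) ** 2 + (1 * u_H / H) ** 2)
u_V = V * rel_u_V                                       # ~54 m^3

print(f"V = {V:.0f} m^3, u(V) = {u_V:.0f} m^3")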
“So the guru now is John R Taylor, practical man and metrologist?”
Everything in Taylor’s book comports with Bevington’s book, Hoel’s book, and the GUM. Taylor is the only one, however, that focuses on both situations, multiple measurements of the same thing and multiple measurements of different things.
Both Taylor and Bevington differentiate between average uncertainty and uncertainty of the average.
Bevington states: “It is important to realize that the standard deviation of the data does not decrease with repeated measurement; it just becomes better determined. On the other hand, the standard deviation of the mean decreases as the square root of the number of measurements, indicating the improvement in our ability to estimate the mean of the distribution.”
Hoel states: “This theorem shows how the precision of a sample mean for estimating the population mean increases as the sample size is increased.”
It is the standard deviation (i.e. the total uncertainty) of the DATA that determines the accuracy of the mean, not the standard deviation of the mean. Everyone seems to agree with this, not just Taylor. And it is the *accuracy* of the mean that is important, especially for practical purposes. It *should* be as important for theoretical purposes as well, but apparently it isn’t for climate science.
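A small simulation of the Bevington and Hoel passages quoted above: as the number of readings grows, the sample standard deviation settles on a fixed value while the standard deviation of the mean shrinks as 1/sqrt(N). (Purely illustrative; Gaussian noise with an arbitrary sd of 2.0. Which of the two quantities deserves to be called “the uncertainty” is the argument of this thread, not of this sketch.)

import numpy as np

rng = np.random.default_rng(0)
true_value, noise_sd = 25.0, 2.0    # arbitrary illustrative values

for n in (10, 100, 1000, 10000):
    x = rng.normal(true_value, noise_sd, size=n)
    sd = x.std(ddof=1)              # does not shrink with n, just becomes better determined
    sdom = sd / np.sqrt(n)          # shrinks as 1/sqrt(n)
    print(f"n = {n:5d}: sd = {sd:.2f}, sd of mean = {sdom:.3f}")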
Dr. Taylor has a very good book that is an introduction to uncertainty. He is not the only writer about uncertainty in MEASUREMENTS. Like it or not, Dr. Frank has a good grasp of this subject. Lastly, Dr. Possolo has good documents issued under NIST’s umbrella.
I refer you to TN1900, E2.
Why do you think Dr. Possolo used an expanded uncertainty calculation in E2?
Why do climate scientists and folks like you do not do this for GAT?
What is the expanded uncertainty for the GAT? Show your calculations from Tavg to the final average.
Better yet, have you any idea why E2 reduced the precision of the average of measurements from two decimal places to one? I’ll bet you do not.
You haven’t learned anything over the last several months have you? What do you think RSS is used for? What assumptions must you make to use it?
doesn’t understand the uncertainty of a time series.
“Despite all your huffing, the fact is that NIST did exactly as climate scientists do”
As I show in a different post you are 100% correct in this. They take a non-Gaussian distribution and convert it into one. Then they can assume all individual measurement uncertainty cancels and the standard deviation of the stated values is the uncertainty.
“They took 22 readings of temperatures on different days in May, averaged them, and calculated the standard uncertainty (their words) of the mean as σ/sqrt(22).”
Yep, and every assumption they made was to turn the measurements into measurements of the same thing!
This still doesn’t improve the precision of the mean, it just says “the mean of the sample is probably within this much of the mean of the population.”
IOW, you can’t take a series measured to tenths of a degree and use the LLN to claim a mean to hundredths or thousandths of a degree of precision.
You are exactly correct.
The uncertainty of the mean is an interval around the sample mean (the standard deviation of the sample mean, or SEM) within which the true population mean may lie.
The sample data has no better resolution than the original data. The sample mean should have no more resolution than the data used to calculate it. The SEM is not a definition of how precise, i.e. how many digits of resolution, the mean actually is.
In addition it is not ethical to promote a number beyond what resolution you have actually measured. My lab professors would have failed any work in which I did something like that. Think about how following standard lab practices would affect the use of anomalies.
Here is a document from the Government describing the use of Significant Digits.
“This still doesn’t improve the precision of the mean”
The precision of the mean of 1 reading, and the uncertainty, is determined by s, the experimental standard deviation of the readings. It is just a multiple depending on whether you want to talk about 95%, 97.5% etc. And as they say here, the esd of N readings is s/sqrt(N). Now their N=22 reduces uncertainty of mean by a factor of about 5, which isn’t huge. But you don’t have to stop at 22.
The mean of 1 reading is the reading itself 🙂
Sorry, I couldn’t help myself.
Exactly, but it has a precision, which is also the precision of the reading. As you add more readings, the standard uncertainty of the mean, in the words of E2, reduces.
Not when you apply a coverage factor!
From E2:
23.6 °C to 27.6 °C, a width of 4 °C.
Note the last sentence. You reckon the monthly distribution is normal?
The last paragraph says:
“A coverage interval may also be built that does not depend on the assumption that the data are like a sample from a Gaussian distribution. The procedure developed by Frank Wilcoxon in 1945 produces an interval ranging from 23.6 °C to 27.6 °C (Wilcoxon, 1945; Hollander and Wolfe, 1999). The wider interval is the price one pays for no longer relying on any specific assumption about the distribution of the data.”
Assuming normal, the interval is 23.8 °C to 27.4 °C, width 3.6°C. The wider interval is 4°C.
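Those figures can be reproduced from the summary numbers quoted elsewhere in the thread (s ≈ 4.1 °C, n = 22, and a mean of about 25.6 °C implied by the quoted interval); a sketch using a Student’s t coverage factor:

import math
from scipy import stats

n = 22        # days with observations in E2
s = 4.1       # experimental standard deviation quoted in the thread (deg C)
mean = 25.6   # midpoint implied by the quoted 23.8 C to 27.4 C interval

u = s / math.sqrt(n)              # standard uncertainty of the mean, ~0.87 C
k = stats.t.ppf(0.975, df=n - 1)  # 95 % coverage factor, ~2.08
low, high = mean - k * u, mean + k * u

print(f"u = {u:.2f} C, k = {k:.2f}, interval = {low:.1f} C to {high:.1f} C")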
FALSE!
Air temperatures DO NOT qualify, regardless of how many times you gaslight that they do.
The sample size is ONE!
NONSENSE!
The uncertainty is to be propagated from a proper analysis of the measurement system that generated the reading!
And how can ONE reading also be multiple readings?
Are you really this dense, or is it all a trolling act?
1 reading, 1 reading! You want to explain how you get an experimental standard deviation from a distribution of 1 thing? How do you calculate “s”?
(X – Xbar) = 0 → s = 0. In other words, Xbar = X.
You are showing that you have never had a physical lab class and maybe no calculus based physical science class like physics.
Come on dude, quit looking for things that might support your position without any knowledge of the assumptions behind it.
If you had read the example E2 with understanding, you would know that the 0.8 needs to be expanded by a factor to give a coverage. Funny it’s not close to the uncertainty usually quoted by climate science.
I also thought you calculated anomalies by month. Dividing by 30, 31, or 28 won’t change much, will it? Wonder what 12 random variables, each with an SD of 4, will give for an overall SD?
“You are showing that you have never had a physical lab class and maybe no calculus based physical science class like physics.”
Well, I have but we are talking about what Dr Possolo says. And he probably could list those experiences too.
How do you calculate “s”?
He lays it out. It is the experimental sd – ie of the 22 day readings.
“Funny it’s not close to the uncertainty usually quoted by climate science.”
No, it is for 22 days. A global monthly average has many thousands.
Oh look, mr. “you moved the goalposts” ran away from answering all the hard questions again.
What a surprise.
E2 doesn’t list the uncertainties of the daily Tmax figures. In fact, they appear to have been measured to 1/4 of a degree and have spurious precision as presented. That’s one of the pitfalls of measuring by halving intervals and recording the values as decimals.
The implicit resolution bounds are +/- 0.125.
It looks very much like an example of calculating the experimental standard deviation, leaving out confounding factors. It would have been useful to have also used the full 31 daily maxima to calculate the population mean and s.d. to see just how good the fit was.
E2 is a section 4.2 evaluation. Had the uncertainties been known then section 5.1 could have been used instead. The result is the same either way; that is, the uncertainty scales as 1/sqrt(n).
The resolution limit of readings taken to 1/4 of a degree is implicitly +/- 1/8
Which is insignificant compared to the experimentally evaluated (type A, section 4.2) numerator of 4.1 C. If you wanted to go the section 5 route, the numerator would have been significantly less, including that 1/8 C rectangular resolution uncertainty plus all of the other components BOM considered.
That wasn’t claiming it has a major effect, just that it exists and is a known unknown.
Ambiguity is a major contributor to misunderstandings.
The point is that section 4.2 evaluations are experimental and performed without any a priori knowledge of the uncertainty of the individual measurements within the experiment. It is the section 5 evaluations that you would use that 1/8 C rectangular uncertainty in addition to the other sources of uncertainty to compute the final total uncertainty.
More hypocrisy from the person who believes that non-random errors magically cancel. You don’t do real uncertainty propagation, and have never done one in the past. This is glaringly obvious.
Why do you keep leaving out the G.3 section of the GUM? Do you think the NIST author didn’t know what he was doing?
“It looks very much like an example of calculating the experimental standard deviation, leaving out confounding factors”
They aren’t confounding. He uses the observed σ as an estimate for the sum of effects causing daily temperatures to differ, and this includes measurement error.
The resolution limits extend the range of the differences from the mean and hence increase the sample variance and standard deviation, but that makes the example more messy.
Nick,
You quote with approval “… the { ti } will be like a sample from a Gaussian distribution …”
What place does “like” have in this discussion?
Geoff S
Geoff,
“You quote with approval”
I am quoting the NIST Guide. NIST and GUM were supposed to be the metrology authorities that would put theoretical folks in their place.
They have modelled the t_i process with a Gaussian, so I suppose they say “like”, acknowledging the modelling. Another question is what place does “Gaussian” have? It really doesn’t matter.
“e_1 … en are modeled independent random variables with the same Gaussian distribution with mean 0 and a standard deviation σ”
In other words assume all the measurement error cancels!
“In these circumstances, the [t_i] will be like a sample from a Gaussian distribution with mean τ and a standard deviation σ (both unknown)”
Yep, just assume all uncertainty cancels and use the standard deviation of the stated values as the uncertainty!
Why would we expect anything else? That’s how climate science works!
It’s Saturday night here and I’d like to get a final version set by tomorrow evening, so I can have it to him for Monday morning. Might just be the thing to do while he has morning coffee or whatever.
First the answers to his questions.
1) How do you propose to combine the 100 measurements of the length of a board?
The Law of Large Numbers can be applied if the 100 measurements are of the same thing with the same device. This will allow an average to remove random errors and obtain a “true value”.
However, measurements of 100 different things with different devices may not use a simple average to remove “random errors”, because the errors can no longer be considered random.
2) Is each measured value qualified with its own uncertainty, and do you have the uncertainty for each one of them?
Does this matter? We’re not so interested in getting the correct answer as we are in determining what applications of the Law of Large Numbers are appropriate. Would it be sufficient to say we’re using a meter stick with mm gradations, and we are only recording a +/-0.5 mm measurement error for each measurement?
3) What do you mean by 1 measurement of 100 boards? Do you mean the total length once you have laid them down in a long straight line, board after board, end to end, or do you mean something else?
One hundred separate boards of varying lengths, each measured one time with a different meter stick, and the length and a +/-0.5mm measurement error for that board recorded.
4) For case (3), how has the uncertainty been quantified, and what is this the uncertainty of?
A +/-0.5mm measurement error for each board?
And our own questions:
Is this a measurement model for the UM? y = sqrt(Σ[(xi-xm)^2, 1, N] / (N-1))
If we have a rectangle that is subdivided into 4 equal areas and we have temperatures t1, t2, t3, and t4 for those areas, can the measurement model be y = (t1 + t2 + t3 + t4) / 4, such that y is the average temperature of the rectangle and u(y) is the uncertainty of that average?
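For the rectangle question, a sketch of how a GUM-style propagation would treat the model y = (t1 + t2 + t3 + t4) / 4 with independent inputs (made-up temperatures and uncertainties; whether this model is appropriate at all is what the correspondence is meant to settle):

import math

# Hypothetical sub-area temperatures and their standard uncertainties (deg C)
t = [21.3, 22.8, 20.5, 23.1]
u = [0.5, 0.5, 0.5, 0.5]

y = sum(t) / len(t)                                   # average over the four areas
# Each sensitivity coefficient in y = (t1+t2+t3+t4)/4 is 1/4; independent inputs add in quadrature
u_y = math.sqrt(sum((ui / len(t)) ** 2 for ui in u))  # = 0.25 here

print(f"y = {y:.2f} C, u(y) = {u_y:.2f} C")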
JS said: “Is this a measurement model for the UM? y = sqrt(Σ[(xi-xm)^2, 1, N] / (N-1))”
I’m curious…where did you see that as an example measurement model?
It was in one of your posts.
4) I don’t think anyone here has asked about the measurement model y = sqrt(Σ[(xi-xm)^2, 1, N] / (N-1)) yet. That measurement model produces the sample standard deviation and thus u(y) would be the uncertainty of the sample standard deviation. I’m not sure what meaning that has. The crux of this discussion is the measurement model y = Σ[xi, 1, N] / N which is obviously a different thing.
I guess I misunderstood what you were going for with it. Do you have something else for me to put in there?
I wasn’t going for it. I’ve never seen anyone mention it as a possible measurement model before. I’m not saying it isn’t useful; just something I’ve never seen or thought of before. I can say that I can see that measurand having utility in the quantification of the homogeneity of a spatial field. In other words the bigger y is the further from equilibrium the spatial field is. And I see no reason why it couldn’t have an uncertainty u(y) just like any other measurand. So I’m not saying it isn’t useful. It just wasn’t something I’ve ever seen discussed on WUWT in this context before.
The only thing I’m not sure about is 2). I think he may be asking what functional relationship this question presupposes. This is a difficult question without making a choice of what purpose the boards will be put to. It shows why uncertainty requires a functional relationship to adequately express the uncertainty in a calculated measurand.
To make it most similar to temperature I would approach it from the standpoint of creating a sales presentation describing the conglomeration of boards in board feet. That is, the average length, the variation of lengths to be expected, and the variance attributable to measurement uncertainty.
But isn’t our question “Is it appropriate to use the Law of Large Numbers in this case?”
We’ve got the “100 separate boards” scenario and the “100 measurements of one single board” scenario. “Is it appropriate to use the LLN in both cases?” is a question I definitely want to ask.
Asking “Is TAVG = (TMIN+TMAX) / 2 a measurand?”, or a similar question, was, I thought, the other basic question we had.
CAUTION—If NIST TN 1900 is any indication, the author (Possolo) does not understand that a time series of measurements does not qualify for employing sigma/root(n) as an uncertainty estimation.
Forgive me if I don’t tell the PhD in statistics, Chief Statistician of NIST he’s bollocks.
The meta-argument has always been that climate scientists don’t know their statistics and are applying them wrong. Are we now going to claim that statistics PhDs don’t know their statistics either?
As long as he understands what I’m asking about, I’ll accept his answer.
I’m not asking you to tell him such, only to realize that he may not have thought this through. This is an easy trap to fall into, even for people with lots of education. Generally, statistics education does not include uncertainty, which came from the realm of physical sciences and engineering.
“Generally, statistics education does not include uncertainty”
Dr Possolo, Chief Statistician, NIST, was commissioned by them to write the guide to evaluating uncertainty. He wrote the textbook.
NS said: “Dr Possolo, Chief Statistician, NIST, was commissioned by them to write the guide to evaluating uncertainty. He wrote the textbook.“
Interesting. Yet more averaging of different things and 1/sqrt(n) examples. Arsenic in Kudzu is obvious, but the other examples have relevancy as well. Anyway, I’ve downloaded the pdf and put it in my library for future reference. Thanks.
You are going to be surprised after a fast read of his book. You’ll note he did not divide the standard uncertainty by sqrt(N); he actually multiplied by a coverage factor. The figures are surprisingly similar, in the range of 3.6 to 4, not 0.02.
Climatology use a coverage factor? Not a chance.
Note that GUM eq. E.6 makes it clear that even with n observations of the same variable (not the case for air temperature), non-random factors do not cancel and are not divided by root(n).
Here is more background to the E2 example, with lots of info about the weather stations.
The very first example he gives in his book has to do with measuring ONE thing, the volume of a tank.
The book says: “But which probability distribution? The answer depends on what is known about the sources of uncertainty listed above, and on how their contributions will have been combined into the reported margin of uncertainty. A common choice (but by no means the best in all cases) is to use a Gaussian distribution as the model that lends meaning to the margin of uncertainty.”
After a quick read it seems the examples in the book are associated with ONE thing: a sample of coal, a lab weight, the height of Mt. Everest, surveying a piece of property, the Hubble-Lemaître constant, etc.
His book is very much like Bevington’s book – it focuses on measurement uncertainty associated with multiple measurements of the same thing that can be defined with some limited, symmetric probability distributions. There is nothing in his book that covers how to combine multiple measurements of different things. As Bevington states: “If we make a measurement x_i of a quantity x, we expect our observation to approximate the quantity, but we do not expect the experimental data point to be exactly equal to the quantity. …… “As we make more and more measurements, a pattern will emerge from the data. Some of the measurements will be too large, some will be too small. On the average, however, we expect them to be distributed around the correct value, assuming we can neglect or correct for systematic errors.” (bolding mine, tg)
It might behoove you to actually read the subject matter you link to instead of just cherry-picking something you think will snow everyone into believing you.
Nothing in Possolo or Bevington supports how multiple measurements of different things are handled in climate science.
Especially the last sentence here!
I realize you know this already. But I should point out for everyone else’s benefit that Dr. Possolo is the author of NIST TN 1900 so he’s already effectively told us what he thinks. Specifically averaging different temperatures is okay (E2) and working with spatial fields of an intensive property (E35) is okay as well. Several of the other examples have relevancy as well. I seriously doubt you’re going to get a fundamentally different response regarding these topics. It might be worth mentioning again…let him see this whole article so he gets the full context.
I don’t think anyone has a problem with averaging one station’s tmax and tmin for a tavg. It’s the same site and the same instrument.
It’s the taking of multiple stations’ observations and combining them for an average that represents… what?
Besides, the LLN only improves the accuracy of the mean. Not the precision, correct?
“Besides, the LLN only improves the accuracy of the mean. Not the precision, correct?”
Sloganeers love thumping the table about this. But no, they usually shout it the other way around.
Stokes is now even more desperate to keep his jive alive.
JS said: “It’s the taking of multiple stations’ observations and combining them for an average that represents… what?”
The spatial domain. It is similar in principle to example E35, which applies kriging-like models to a spatial domain. That is, measurands that differ in the spatial domain can be aggregated, with the uncertainty evaluated through the measurement model performing the aggregation.
The problem is that kriging basically assumes that measurement points are somehow related to other measurement points. Thus a homogeneity is assumed. This works well when you are trying to interpolate an underground mineral field, which does not vary in time but only in space, and even in space it varies slowly.
It works terribly for temperatures where the relationship between measurement stations is not homogenous and is time varying. It requires falling back on the old climate science assumptions that all temperatures everywhere are the same.
He cannot understand the difference because it would interfere with his cherished milli-Kelvin uncertainties.
Tim,
The way we used the Krige method was prefaced by the semivariogram that explored some possible influences of one value on another through the ‘range’. Geoff S
But you still had a time invariant field to work with. The spatial autocorrelation could be defined as to where it ended.
How do you do that spatially with temperature measurement stations?
More garbage piled on top of the same old garbage.
Tell us again how a temperature uncertainty can be 40mK!
JS said: “I don’t think anyone has a problem with averaging one station’s tmax and tmin for a tavg. It’s the same site and the same instrument.”
Yes, there are people that definitely have a problem with this. Not only do they think the methods and procedures defined in the GUM and implemented by the NIST UM are invalid for (Tmax + Tmin) / 2 at the same site/station, but they also think the whole concept of an average temperature is meaningless and useless at its core. There was a whole article dedicated to delegitimizing the very act of averaging temperatures and other intensive properties on WUWT just a few weeks ago. That’s how deep the rejection goes.
This is now beyond Monty Python crazy.
bdgwx,
This matter of calculating Tavg from Tmax and Tmin is a bit of a sideline. IMHO, one loses traceability to a standard reference by that process and so changes the status of the measurand. Geoff S
I don’t think it is a sideline at all. In fact, I think it is spot on relevant to this article. A daily average temperature is not fundamentally different from a monthly or annual average temperature. That is the primary reason why the long-term measurement statistic in the BOM report has lower uncertainty than the isolated and typical uncertainties.
Did you read example E2 at all? With understanding?
Why do you think the author applied a coverage factor to the standard uncertainty of the mean?
Is there much difference between 4.1C and 3.6C from the standpoint of standard deviation?
From:
https://physics.nist.gov/cuu/Uncertainty/coverage.html
When are you going to publish the calculation of standard deviation for the GAT?
You have an example here of one month. That should assist you in determining the values to be used in any given month. Remember subtracting a constant from the average temperature won’t change the original variance and standard deviations.
Of course not, he just picked out the parts he thinks adhere to his incorrect thinking.
Schist not granite!
“Yes, there are people that definitely have a problem with this.”
I DEFINITELY have a problem with this. Mid-range values are *NOT* an average temperature. Not when the daytime temp is a sinusoid and the nighttime temp is a decaying exponential.
This may have been the best that could be done in the past. It is *NOT* appropriate today when far better data is available.
The fact that you continue to defend it based on tradition just shows how out of touch you are with reality!
And he is WRONG!
He’s wrong, just like you and Nitpick Nick Stokes are wrong.
40 mK?? Do you seriously believe this is reality??
The Tavg = (Tmax + Tmin)/2 is a start, but not the whole thing. This may be an appropriate equation in some circumstances. The problem needs to be set up similar to the E2 example but using temperatures from 100 different weather stations on the same days as his example, such that the question becomes: what is the standard deviation of “averaging” 100 random variables?
That’s why I put in one of my answers that the problem with boards needs to have a reason for finding the average of all the boards, the standard deviation, and the combined uncertainty. In essence you are combining 100 random variables in an average. Each board should probably have a different uncertainty to make it more relatable to temperatures.
It may be worthwhile to simply say let’s not use boards; let’s extend the E2 example to 100 unrelated stations around the globe and see what he thinks the answer should be when averaging them all together. The data for those stations on the dates he used may even be available. We could probably provide them to him in a text file.
Jim,
The uncertainty in Tmax differs from the uncertainty of Tmin because they are triggered as observations by two different sets of processes, each of which has its own errors and uncertainties. For example, a puff of hot air can trigger a Tmax recording, but not a Tmin recording. Geoff S
The sum of two random variables is just another random variable, with distribution the convolution of the two components.
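As a purely numerical check of that statement, here is a short sketch that convolves two skewed (exponential) densities on a grid and confirms that the variance of the sum is the sum of the variances. The exponential shapes are illustrative only, not a model of temperature.

```python
# Sketch: the density of X + Y is the convolution of the two component
# densities, and Var(X + Y) = Var(X) + Var(Y) for independent X and Y.
# Exponential densities are used purely as an example of skewed distributions.
import numpy as np

dx = 0.01
x = np.arange(0.0, 40.0, dx)
f = 1.0 * np.exp(-1.0 * x)        # Exp(rate = 1.0): mean 1, variance 1
g = 0.5 * np.exp(-0.5 * x)        # Exp(rate = 0.5): mean 2, variance 4

h = np.convolve(f, g) * dx        # density of the sum, on the grid 0 .. ~80
z = np.arange(len(h)) * dx

mean_h = np.sum(z * h) * dx
var_h = np.sum((z - mean_h) ** 2 * h) * dx
print(f"variance of sum ~ {var_h:.2f} (expected 1 + 4 = 5)")
```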
Why don’t you convolve the temperature profiles then instead of just doing an inappropriate mid-range value?
How do you combine non-normal random variables? You can add the variances of random variables that are normally distributed, but what do you do when they are skewed the way temperature data is?
How do you convolve summer temps in the NH with winter temps in the SH? Anomalies don’t help because the variance is different in the summer and winter. Neither distribution is Gaussian.
“Why don’t you convolve the temperature profiles”
What data is being referred to here?
TAVG is used because TMAX and TMIN are the data we have,
But for all these variables the statistics are determined empirically. You can do that just as well for TAVG directly, rather than building from its components.
“How do you combine non-normal random variables?”
Again, you have a fanciful notion of the significance of normality (or symmetry). These are general properties of distributions.
“Anomalies don’t help because the variance is different in the summer and winter. Neither distribution is Gaussian.”
None of those things matter.
Stokes has waved his hand, so mote it be.
“TAVG is used because TMAX and TMIN are the data we have,”
Not since 1980 if not before!
“But for all these variables the statistics are determined empirically. You can do that just as well for TAVG directly, rather than building from its components.”
The average temp for a sinusoidal daytime temp is 0.63Tmax. Why isn’t that combined with the average temp for an exponential decay at night?
If you don’t think the daytime temp is a sinusoid I can give you lots of empirical measurements showing differently.
Mid-range temps are a crutch based on tradition. Why doesn’t climate science switch to using degree-days based on integration? There is more than 30 years worth of data available to do that.
“Again, you have a fanciful notion of the significance of normality (or symmetry). These are general properties of distributions.”
I’ve shown you that temperatures are *NOT* normal or symmetrical. Both are requirements in order to assume that measurement uncertainty cancels and that the standard deviation of the stated measurement values is the appropriate measure of uncertainty.
“None of those things matter.”
Thus speaks the religious dogma!
” Both are requirements in order to assume that measurement uncertainty cancels and that the standard deviation of the stated measurement values is the appropriate measure of uncertainty.”
Just not true, and you never give anything to support it. The basic fact here is that if you add two random variables, you get a new rv whose distribution is the convolution of the components, and whose variance is the sum. There is no requirement for normality or symmetry there. The only time you make use of Gaussian properties is in converting statements about variance to p-values or, as they are called here, coverage factors. But, as the finale to E2 shows, the dependence on Gaussian is small.
And, as the CLT says, and your UM experiments show, as you add many variables, you approach Gaussian anyway.
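A small Monte Carlo sketch of both claims in that comment, using skewed distributions chosen only for illustration: the variance of a sum of independent variables equals the sum of their variances, and the spread of a mean of n skewed draws shrinks roughly as 1/√n.

```python
# Monte Carlo sketch: (1) Var(X + Y) ~ Var(X) + Var(Y) for independent skewed
# X and Y; (2) the spread of a mean of n skewed draws shrinks ~ 1/sqrt(n).
# The exponential and gamma shapes are illustrative only.
import numpy as np

rng = np.random.default_rng(42)
N = 100_000

x = rng.exponential(scale=2.0, size=N)        # skewed, variance 4
y = rng.gamma(shape=2.0, scale=1.0, size=N)   # skewed, variance 2

print(np.var(x + y), np.var(x) + np.var(y))   # both close to 6

for n in (1, 10, 100):
    means = rng.exponential(scale=2.0, size=(N, n)).mean(axis=1)
    print(n, round(means.std(), 3), round(2.0 / np.sqrt(n), 3))  # empirical vs sigma/sqrt(n)
```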
OPEN YOUR EYES and go read the GUM! Like E.3!
Oh the irony.
There is no requirement of symmetry or Gaussian in E.3.
From E3:
Exactly how does this mean “no requirement of symmetry or Gaussian”?
If you are claiming that the distributions are not Gaussian then the following from E2 is appropriate.
“A coverage interval may also be built that does not depend on the assumption that the data are like a sample from a Gaussian distribution. The procedure developed by Frank Wilcoxon in 1945 produces an interval ranging from 23.6 °C to 27.6 °C”
That’s even higher than using Gaussian distributions.
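For what it is worth, a coverage interval that makes no Gaussian assumption can also be sketched with a plain bootstrap. This is not the Wilcoxon procedure quoted from E2, just another distribution-free route, and the 22 values below are hypothetical stand-ins rather than the E2 record.

```python
# Sketch: a distribution-free (bootstrap percentile) interval for a mean.
# This is NOT the Wilcoxon procedure cited in E2; it is one common way to
# get a coverage interval without assuming a Gaussian sample.
# The 22 TMAX values below are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)
tmax = rng.normal(25.6, 4.1, size=22)     # stand-in data, not the E2 record

boot_means = np.array([
    rng.choice(tmax, size=tmax.size, replace=True).mean()
    for _ in range(10_000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"95 % bootstrap interval: {lo:.1f} C to {hi:.1f} C")
```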
“From E3:”
Weirder and weirder. That is not a quote from GUM E3. It is a quote from the UM documentation, setting up an example problem using inputs available on the machine. The complete para is
It’s an assumption for a particular example problem they are setting up. What GUM E3 does say, explicitly, is
E.3 is the much quoted propagation equation.
“If you are claiming that the distributions are not Gaussian”
It is your quote that says they need not be. As I said, you can work out the confidence intervals with or without an assumption of normality; you get answers either way, and very little different.
From E2:
You and your peeps were the ones that brought this document to the foreground. Now all you can do is either misquote it or simply say the assumptions aren’t correct.
This is a perfect example of what needs to be done with monthly temperatures. I would suggest convolving monthly values of Tmax and Tmin along with adding their variances.
Let’s not forget that an SD for E2 translates to a variance of 16+. Not exactly a small value. I see no reason that all other global stations shouldn’t have similar values, which would upset the supposed “uncertainty” claimed by many CAGW adherents.
And what precedes that E2 quote?
You may assume Gaussian, but it isn’t required for the statements about variance. And continuing your part of the quote:
You can’t test it, but they proceed anyway. That is because the property isn’t required.
From E2:
“Assuming that the calibration uncertainty is negligible by comparison with the other uncertainty components, and that no other significant sources of uncertainty are in play, then the common end-point of several alternative analyses is a scaled and shifted Student’s t distribution as full characterization of the uncertainty associated with r.”
“Assuming that the calibration uncertainty is negligible by comparison with the other uncertainty components, and that no other significant sources of uncertainty are in play,”
In other words all measurement uncertainty disappears and the standard deviation of the stated values becomes the uncertainty.
Such a simple little assumption. Such a significant impact!
“Just not true, and you never give anything to support it. “
I gave you the quotes from the BOM report. You’ve been given the quotes from Ex E2. The fact that you are willfully blind as to what these quotes say is *your* problem and no one elses.
“The basic fact here is that if you add two random variables, you get a new rv whose distribution is the convolution of the components, and whose variance is the sum. There is no requirement for normality or symmetry there. “
ROFL! If the random variables are not Gaussian then there is *no* defined variance! How do you add variances if they don’t exist?
Why do you think that, if you have a skewed distribution, the rule is to use the 5-number statistical descriptors and not mean/standard deviation?
“The only time you make use of Gaussian properties is in converting statements about variance to p-values, or as called here coverage factors. But, as the finale to E2 shows, the dependence on Gaussian is small.”
Again, if you don’t have a Gaussian distribution then how do you get a variance?
You are a typical climate science defender. All distributions are Gaussian and all measurement uncertainty cancels.
“I gave you the quotes from the BOM report. You’ve been given the quotes from Ex E2.”
There is nothing in the BOM report quotes that requires the distribution to be normal or symmetric. Nor is there in E2, except right at the end, where they use Student t (based on normal) to deduce coverage factor (p-value) from the variances, which had been reduced by averaging. But then they immediately go on to say that you can get a very similar coverage factor without assuming normality.
Nitpick Nick Stokes strikes again!
Ignores the main points about his false assumptions.
You were given the quotes from the BOM. There is nothing *requiring* the distributions to be normal or symmetric BUT THE ASSUMPTIONS IN THE REPORT ARE THAT THEY ARE!
There is a reason for that! Without those assumptions uncertainty doesn’t cancel, propagation of uncertainty from the individual elements onto the mean can’t be ignored, and the standard deviation of the stated values is *NOT* the uncertainty of the mean.
Again, if the distributions are not normal then there is *NO* variance! The statistical descriptors of mean, standard deviation, and variance apply to Gaussian (and a few other symmetric distributions) but not to skewed or otherwise non-Gaussian distributions. The 5-number statistical description is what should be used with non-normal distributions.
Why do you think that non-normal distributions have a variance? Do you have any support for that you can link to? Why does the 5-number statistical description not include a variance factor?
“You were given the quotes from the BOM. There is nothing *requiring* the distributions to be normal or symmetric BUT THE ASSUMPTIONS IN THE REPORT ARE THAT THEY ARE!”
That is why it is so hard to make progress here. You said
“I gave you the quotes from the BOM report. You’ve been given the quotes from Ex E2. The fact that you are willfully blind as to what these quotes say is *your* problem and no one elses.”
Now it turns out that they didn’t say it, but you somehow perceive it, and if I can’t see it I must be blind.
“Again, if the distributions are not normal then there is *NO* variance! The statistical descriptors of mean, standard deviation, and variance apply to Gaussian (and a few other symmetric distributions) but not to skewed or otherwise non-Gaussian distributions.”
Doubling down on that idiocy, even though I linked a table with all kinds of distributions and their variances. Variance of a distribution is just
∫ (x-m)²P(x) dx, where P(x) is the probability density function and m is the mean. If that integral exists, it is the variance. Nothing about Gaussian or symmetry.
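As a quick numerical illustration of that definition for a plainly non-Gaussian, skewed density, the sketch below evaluates ∫ (x-m)²P(x) dx for an exponential distribution and recovers its textbook variance 1/λ².

```python
# Sketch: Var = integral of (x - m)^2 * P(x) dx, evaluated numerically for an
# exponential density (skewed, non-Gaussian), versus its textbook value 1/lambda^2.
import numpy as np

lam = 0.5
dx = 0.001
x = np.arange(0.0, 80.0, dx)
p = lam * np.exp(-lam * x)            # exponential PDF

m = np.sum(x * p) * dx                # mean ~ 1/lam = 2
var = np.sum((x - m) ** 2 * p) * dx   # variance ~ 1/lam^2 = 4
print(m, var)
```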
Variance is a measure of the uncertainty of the stated values. The whole purpose of statistical descriptors is to provide an expected value for the next element in the data set. The wider the variance the more possible values the next element can take on and the higher the uncertainty is about what value could be expected for it.
Look at your table. The Poisson and exponential are associated with the time between events. So is the geometric. The binomial is associated with a discrete probability distribution.
None of these have anything to do with something like temperature which consists of different things being measured.
The Gaussian and uniform distributions are symmetric distributions that basically allow the assumption that all measurement uncertainty disappears.
While all of these distributions may have a variance they simply aren’t useful for handling measurement uncertainty. Which is why we see the assumption that all data sets are Gaussian in climate science.
“ROFL! If the random variables are not Gaussian then there is *no* defined variance! How do you add variances if they don’t exist?”
Absolute nonsense. Here is a list of variances for a variety of distributions, not all even symmetric.
Yet you still wave your hands and declare everything to be Gaussian, and assume all uncertainty cancels inside your magic averaging.
And which one of the distributions does your statistics software package use when calculating the variance?
Do you calculate it manually? What distribution do you use if it doesn’t fit one of these?
Here is a question/answer about non-normal samples. Do you do this to calculate variance in the GAT distributions?
https://stats.stackexchange.com/questions/316714/sampling-distribution-of-sample-variance-of-non-normal-iid-r-v-s
And which of those distributions can be applied to temperatures?
You made the idiotic claim that
“If the random variables are not Gaussian then there is *no* defined variance!”
In fact, variance is a general property of distributions. I showed a list of non-normal distributions with stated variances.
And I answered you about that.
Nowhere in that list did I see a bi-modal or multi-modal distribution. Nowhere did I see a kurtosis or skewness.
There *IS* a reason for the use of the 5-number statistical descriptors. The 5-number statistical descriptors do *not* include mean, standard deviation, or variance. The 5-number descriptors along with box plots are the robust and most useful descriptors of skewed distributions.
Why are they never used in climate science? Is it because it’s too hard?
“Nowhere in that list did I see a bi-modal or multi-modal distribution. Nowhere did I see a kurtosis or skewness.”
So it goes here. Thump the table with one rule. When that fails, just make up another one.
And full of ignorance. Even the normal distribution has kurtosis (3). Here are the moments of the exponential distribution (which is in that list):
I could chase up bimodal, but what’s the point? You’ll just make up a new rule.
Yes, the normal distribution has tails but they are symmetric. And don’t think anyone missed you not commenting about skewness.
You are trying to defend climate science’s drive toward simplification, to the point where its statistical descriptors are truly useless in determining what is actually going on with the climate. You can’t even tell if it is minimum temps driving the car or maximum temps.
In my book, Stokes and his minions are part of the gigantic fraud that is being perpetrated on the world, to spend hundreds of trillions of dollars that don’t exist, to solve a “problem” that doesn’t exist.
Nick,
Who defined them as random variables? They cannot be, because they fit within an observed range of values, if you like, something like minus 90 C and plus 60 C. One cannot ask a program for a list of random variables that mean anything in relation to these temperatures without specifying a range, which constrains the values outside the category of random variables. Geoff S
Geoff,
“Who defined them as random variables? They cannot be, because they fit within an observed range of values, if you like, something like minus 90 C and plus 60 C”
There is a lot of really basic stats wrong in this thread. I’m not sure whether your objection is that there is or isn’t a range, but neither is relevant. A dice throw is random, range 1-6. A normal distribution is quintessentially random, but has no limiting range.
Not only is Stokes the world’s leading expert on trendology, he’s also right up at the top on statistics and measurement uncertainty.
But he likes (or needs) these milli-Kelvin air temperature uncertainties, something doesn’t jibe here…
A dice throw is *NOT* the same as the universe of temperatures. The probability distribution of a dice throw is limited and uniform. The probability distribution of temperatures is typically unknown because you and the climate scientists refuse to characterize the distribution, just assuming it is always Gaussian!
If I call up a bunch of computer-generated random numbers to simulate an exercise with temperature data like these, I routinely specify a range and often a distribution of choice. Are these numbers truly random?
GS said: “If I call up a bunch of computer-generated random numbers to simulate an exercise with temperature data like these, I routinely specify a range and often a distribution of choice. Are these numbers truly random?”
Yes. BTW…the NIST UM will do the simulation for you.
No, they are what is called pseudorandom.
For our purposes, they are. We aren’t writing CIA codes. And unlike TRNG’s, they can be seeded to pinpoint the actual output change resulting from a change to 1 or more inputs.
Who are “our” and “we”, blob?
Looks like you believe the outputs of the spaghetti machines have some kind of real meaning.
“spaghetti machines”
About the level of thoughtful, data based response that you have become known for…
blob strikes a deep irony vein…
So that’s what climate scientists have all been paid to do for the last 50 years? Trying to find the right seeding of random number series to find the correct combination that explains climate?
Lots of mindless deflection, even for you. I already told you one reason for seeding PRNG’s.
Another common reason is corporate politics. We often ran stochastic economic evaluations on a common app with our non-operated partners. Some of them wanted to remake our runs themselves, to see if we did them correctly. It was not a problem and kept us all on the same page.
They’re pseudo-random because they’re generated using an algorithm and will give the same result for the same seed.
I think bdgwx’s answer was intended to cover the question of the bounded range.
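A minimal illustration of that point, using numpy’s generator simply as a convenient example of a PRNG: the same seed reproduces exactly the same “random” stream.

```python
# Sketch: a seeded pseudo-random generator is reproducible.
import numpy as np

a = np.random.default_rng(123).normal(size=5)
b = np.random.default_rng(123).normal(size=5)   # same seed, same stream
c = np.random.default_rng(124).normal(size=5)   # different seed, different stream

print(np.array_equal(a, b))   # True
print(np.array_equal(a, c))   # False
```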
GS said: “Who defined them as random variables?”
The GUM.
Tmin and Tmax are random variables because their values are not known for certain. They have uncertainties u(Tmin) and u(Tmax) that are > 0. There is a dispersion of values that can be attributed to each measurand. That makes them random variables. All temperature observations are random variables.
You forgot the part that says “with which is associated a probability distribution”.
Everyone keeps asking you to define the probability distribution of temperatures by at least defining a variance and range but you refuse! So do most of the climate scientists. Instead you just assume it’s Gaussian without any actual proof that it is.
I’ve given you a picture (box plot and value graph) of what my minimum temps look like. They are *NOT* Gaussian. Even the NIST UM has to morph them into a Gaussian in order to calculate a mean and standard deviation – meaning the *actual* distribution must be ignored.
” Instead you just assume it’s Gaussian without any actual proof that it is. “
Nobody assumes that it is Gaussian, and there is no need to do so. That is your hangup.
Of course there is a need to do so. It’s what allows you to assume that all measurement uncertainty disappears and the standard deviation of the stated values is the uncertainty.
Of course in the real world the measurement uncertainty never disappears.
u(Tmax) may be different from u(Tmin) but u(Tavg) is still calculated the same way. GUM section 5 and the NIST UM still require u(Tmax) and u(Tmin) to be input separately.
Idiocy.
The NIST UM *still* just calculates what you tell it. If your formula is (Tmax+Tmin)/2, the NIST UM calculates the AVERAGE UNCERTAINTY, not the uncertainty of the average. If u(Tmin) = .5 and u(Tmax) = .5 then the NIST UM comes up with .358 for the standard deviation. That is .707/2.
As you have already confirmed in another post
u(y) = sqrt[ u^2(Tmax) + u^2(Tmin) + u^2(N) ] = .707
The components of the average relationship are Tmax, Tmin, and N=2. You add the uncertainty of the components.
y = f(x_1, x_2, …, x_N)
where x_1 = Tmax, x_2 = Tmin, and x_N = 2
“u(y) = sqrt[ u^2(Tmax) + u^2(Tmin) + u^2(N) ] = .707”
Again, for at least the third time in this thread, you are using the formula for a sum, not an average.
No. “N” is not part of the sum. It is a component of the equation you use for average. The uncertainties of the components add. They may add directly or they may add by root-sum-square but they still add.
You keep trying to say the average uncertainty is the uncertainty of the average. It is *not*.
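For readers trying to follow the arithmetic being argued over, here are the two calculations side by side for y = (Tmax + Tmin)/2 with u(Tmax) = u(Tmin) = 0.5: the plain root-sum-square of the component uncertainties, and the GUM Section 5 propagation form with sensitivity coefficients of 1/2. The sketch reproduces the arithmetic only; it does not settle which formula is appropriate here.

```python
# Sketch of the two calculations being argued over, for y = (Tmax + Tmin)/2
# with u(Tmax) = u(Tmin) = 0.5 C.  This shows the arithmetic only.
import math

u_tmax = 0.5
u_tmin = 0.5

# Plain root-sum-square of the component uncertainties (the 0.707 figure):
u_rss = math.sqrt(u_tmax**2 + u_tmin**2)

# GUM Section 5 propagation with sensitivity coefficients dy/dTmax = dy/dTmin = 1/2
# (the ~0.35 figure reported from the NIST UM):
u_prop = math.sqrt((0.5 * u_tmax)**2 + (0.5 * u_tmin)**2)

print(u_rss, u_prop)   # ~0.707 and ~0.354
```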
Final cut. It’s late here on the East Coast, but there’s still time for a final review after I submit this.
Here we go:
1) How do you propose to combine the 100 measurements of the length of a board?
Rather than boards, let’s consider an extension of Example 2, the surface temperatures example, from your “Simple Guide for Evaluating and Expressing the Uncertainty of NIST Measurement Results”.
In that example, the maximum temperature (TMAX) was recorded for 22 non-consecutive days of May 2012. The average TMAX for the month was calculated, as were the standard deviation s and the standard uncertainty associated with the average, sem. The values were 25.6C, 4.1C, and 0.782C. (A quick sketch of this arithmetic appears just after this comment.)
If only every other TMAX measurement had been used, starting with the first one, the average would have been a little lower, while s and sem would have been somewhat larger. The values were 25.0C, 4.3C and 1.30C.
The Law of Large Numbers can be applied if the measurements are at the same place with the same device. It allows an average to remove random errors and obtain a “true value”.
However, measurements at different sites with different devices may not be able to use a simple average to remove “random errors” because the errors can no longer be considered random.
Is it still appropriate to use a large set of measurements, but from different sites, to reduce the standard uncertainty associated with that average?
2) Is each measured value qualified with its own uncertainty, and do you have the uncertainty for each one of them?
Since the station temperatures seem to be in increments of 0.25, we were taking +/-0.125C for the measurement uncertainty.
3/4) What do you mean by 1 measurement of 100 boards? Do you mean the total length once you have laid them down in a long straight line, board after board, end to end, or do you mean something else? For case (3), how has the uncertainty been quantified, and what is this the uncertainty of?
Since we changed our example to using temperatures, we’ll consider this to be OBE.
4) For case (3), how has the uncertainty been quantified, and what is this the uncertainty of?
A +/-0.5mm measurement error for each board?
And our own questions:
Is this a measurement model for the UM?
y = (TMAX + TMIN) /2
If we have a rectangle that is subdivided into 4 equal areas and we have temperatures t1, t2, t3, and t4 for those areas, can the measurement model be y = (t1 + t2 + t3 + t4) / 4, such that y is the average temperature of the rectangle and u(y) is the uncertainty of that average?
That’s it, I’m beat. I’ll look at this again in the morning for any further suggestions before I send it out.
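As flagged in question 1 above, here is the E2-style arithmetic (mean, sample standard deviation s, and sem = s/√n) with 22 hypothetical placeholder values; they are not the actual May 2012 record and will not reproduce the exact figures quoted.

```python
# Sketch of the E2-style arithmetic: mean, sample standard deviation s, and
# standard uncertainty of the mean sem = s / sqrt(n).
# The 22 values below are hypothetical placeholders, not the May 2012 record.
import numpy as np

tmax = np.array([21.5, 24.0, 27.2, 30.1, 25.6, 19.8, 23.4, 28.9, 31.0, 26.3,
                 22.7, 24.9, 29.4, 27.8, 20.6, 25.1, 30.7, 23.9, 26.8, 28.2,
                 22.1, 24.5])

n = tmax.size
mean = tmax.mean()
s = tmax.std(ddof=1)          # sample standard deviation
sem = s / np.sqrt(n)          # standard uncertainty associated with the mean

print(f"n = {n}, mean = {mean:.1f} C, s = {s:.1f} C, sem = {sem:.2f} C")
```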
“Is it still appropriate to use a large set of measurements, but from different sites, to reduce the standard uncertainty associated with that average?”
Very unclear. Do you mean improve the average at the original site, or do you mean to create a new average for a region ?
“2) Is each measured value qualified with its own uncertainty, and do you have the uncertainty for each one of them?”
What is done in E2 is that the empirical sd (for the set of 22) is used as the uncertainty for each reading. It includes the measurement uncertainty.
“OBE”
Needs explaining.
“And our own questions:”
He’ll have trouble making sense of them
“Is it still appropriate to use a large set of measurements, but from different sites, to reduce the standard uncertainty associated with those measurements’ average?”
Is it still appropriate to use a larger set of measurements, but from thermometers at different geographic locations hundreds of miles apart, to reduce the standard uncertainty associated with the average of the measurements from all those locations?
“OBE”
Needs explaining.
Sorry, let some jargon sneak in. Means “Overcome By Events.” I’ll change it.
“And our own questions:”
He’ll have trouble making sense of them.
I’ll add some wordage to make them clearer.
You are mischaracterising what the author has done. He would have the average temperature for the MONTH characterized as:
25.6 ± 4.1 (SD)
25.6 ± 3.6 (Student’s T)
25.6 ± 4 (Wilcoxon, 1945; Hollander and Wolfe, 1999)
I’ve been asking you and your peeps to provide this kind of info for months and have been ignored. Now, thanks to whoever pointed out this document, it is laid bare in front of everyone’s eyes.
If individual stations are going to be treated as independent random variables then their standard deviations are going to be used and their variances added.
If anyone is still following this thread, I sent a response to Dr. Possolo’s email back on Monday, and never heard back. Oh well.
c’est la vie!
Probably doesn’t want to get mixed up with “deniers” with appropriate questions about what is happening.
I’m glad someone posted the NIST analysis of temperature. It is obvious that NIST expects one to use either the standard deviation or expanded uncertainty by using a coverage factor. If monthly averages are what is used to calculate anomalies then that deviation should be carried through to annual averages.
That will far exceed the uncertainty that is currently being quoted for anomalies and will make the quotes much more believable.
Perhaps. With some trepidation I posted the link to our discussion, and it’s possible the site name turned off his interest in responding.
I shoulda stuck with the boards instead of moving to temps.
Still, reducing the uncertainty in the mean doesn’t improve the precision of the mean.
The mathematicians have been trained that with regular counting numbers, adding precision is not a problem. Rounding is done for convenience, not for purposes of indicating the amount of information inherent in a measurement.
If I measure voltage at three locations and get 110, 110, 100, I can’t find the average and say that the mean is 106.666. That number is not a measurement. I can’t find the sqrt(.5^2+.5^2+.5^2)/3 and say the uncertainty in measurement is 0.289.
I think the E2 example in the NIST document and Section 5 in the GUM pretty much covers why it is important to use the Standard Deviation or expanded SEM to convey the uncertainty in sample data.
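The arithmetic quoted in the voltage example above, laid out explicitly; whether those extra digits carry physical meaning is exactly what is being argued, and the sketch takes no position on that.

```python
# Sketch: the arithmetic quoted in the voltage example above.
import math

v = [110, 110, 100]
mean = sum(v) / len(v)               # 106.666...

u = 0.5                              # quoted per-reading uncertainty
u_mean = math.sqrt(u**2 + u**2 + u**2) / 3   # sqrt(0.75)/3 ~ 0.289

print(mean, u_mean)
```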
“Still, reducing the uncertainty in the mean doesn’t improve the precision of the mean.”
Improving the precision of the mean doesn’t reduce the uncertainty of the mean either.