Guest Post by Willis Eschenbach (see Arctic/Antarctic update at the end)
Over in the comments at a post on a totally different subject, you’ll find a debate about interpolation for areas where you have no data. Let me give a few examples, names left off.
•
Kriging is nothing more than a spatially weighted averaging process. Interpolated data will therefore show lower variance than the observations.
The idea that interpolation could be better than observation is absurd. You only know things that you measure.
•
I’m not saying that interpolation is better than observation. I’m saying interpolation using locality based approach is better than one that uses a global approach. Do you disagree?
•
I disagree, generally interpolation in the context of global temperature does not make things better. For surface datasets I have always preferred HadCRUT4 over others because it’s not interpolated.
Once you interpolate you are analysing a hybrid of data+model, not data. What you are analysing then takes on characteristics of the model as much as the data. Bad.
•
How do you estimate the value of empty grid cells without doing some kind of interpolation?
•
YOU DON’T! You tell the people what you *know*. You don’t make up what you don’t know and try to pass it off as the truth.
If you only know the temp for 85% of the globe then just say “our metric for 85% of the earth is such and such. We don’t have good data for the other 15% and can only guess at its metric value.”.
•
If you don’t have the measurements, then you cannot assume anything about the missing data. If you do, then you’re making things up.
Hmmm … folks who know me know that I prefer experiment to theory. So I thought I’d see if I could fill in empty data and get a better answer than leaving the empty data untouched. Here’s my experiment. I start with the CERES estimate of the average temperature 2000 – 2020.

Figure 1. CERES surface temperature average, 2000-2020
Note that the average temperature of the globe is 15.2°C, the land is 8.7°C, and the ocean is 17.7°C. Note also that the Andes mountains on the left side of upper South America are much cooler than the rest of the South American land.
Next, I punch out a chunk of the data. Figure 2 shows that result.

Figure 2. CERES surface temperature average with removed data, 2000-2020
Note that average global temperatures are now cooler with the missing data, with the globe at 14.6°C versus 15.2°C for the full data, a significant error of about 0.6°C. Land and sea temperatures are too low as well, by 1.3°C and 0.4°C respectively.
Next, I use a mathematical analysis to fill in the hole. Here’s that result:

Figure 3. CERES surface temperature average with patched data, 2000-2020
Note that the errors for land temperature, sea temperature, and global temperature have all gotten smaller. In particular, the land error has gone from 1.3°C to 0.1°C. The estimate for the ocean is warm in some areas, as can be seen in Figure 3. However, the global average ocean temperature is still better than just leaving the data out (0.1°C error rather than 0.4°C error).
My point here is simple. There are often times when you can use knowledge about the overall parameters of the system to improve the situation when you are missing data.
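As a toy illustration of this point, here is a one-dimensional sketch with a made-up zonal temperature field. This is NOT the author's patching method (which he leaves unrevealed below); it simply shows that patching a hole by interpolating from its edges can beat averaging only the cells you kept.

```python
# Toy illustration: filling a "punched out" warm band by interpolation
# gives a better global mean than ignoring the hole entirely.
# NOT the author's actual method -- a 1-D sketch with synthetic data.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic field: warm equator, cool poles, plus a little noise.
lat = np.linspace(-89.5, 89.5, 180)
field = 30.0 * np.cos(np.radians(lat)) - 10.0 + rng.normal(0, 0.5, lat.size)
w = np.cos(np.radians(lat))                 # area weighting on a sphere
true_mean = np.average(field, weights=w)

# "Punch out" an equatorial chunk, as in Figure 2.
hole = (lat > -15) & (lat < 15)

# (a) Ignore the hole: average only the observed cells.
mean_ignore = np.average(field[~hole], weights=w[~hole])

# (b) Patch the hole by linear interpolation from its edges.
patched = field.copy()
patched[hole] = np.interp(lat[hole], lat[~hole], field[~hole])
mean_patch = np.average(patched, weights=w)

print(f"truth {true_mean:.2f}C  ignored {mean_ignore:.2f}C  patched {mean_patch:.2f}C")
```

With the warm band removed, the ignore-the-hole average comes out roughly 2°C too cold, while the patched average lands within a few tenths of the truth, mirroring the Figure 2 versus Figure 3 comparison.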
And how did I create the patch to fill in the missing data?
Well … I think I’ll leave that unspecified at this time, to be revealed later. Although I’m sure that the readers of WUWT will suss it out soon enough …
My best wishes to all,
w.
PS—To avoid the misunderstandings that are the bane of the intarwebs, PLEASE quote the exact words that you are discussing.
[UPDATE] A commenter below said:
The obvious problem that WE creates with his “punch out” is he chose the big chunk of equatorial solar heated region.
If he did that extraction to Antarctica (or Greenland interior), where we really do have very few spatial measurements, the opposite would occur: the average global temperature would be dramatically warmer, the SH (NH) would warm even more, and the NH (SH) would be unaffected.
Here you go …



You can see the outlines of the patches, but overall, I’d say my method is working well.
Why does the kriging algorithm produce discontinuities at the patch boundaries?
Either (1) the kriging parameters need tweaking, or (2) a better interpolation algorithm should be used.
NOAA says the temperature sensors are accurate to +/- 0.5 F or about +/- 0.28 C. You can’t improve on that by increasing the sample size. Why is this never mentioned, or seemingly accounted for?
It’s worse than that for non-digital and pre-WW2 observations. And you’re right that you can’t improve the uncertainty on individual measurements by increasing the sample size. But you can improve the uncertainty on the global mean temperature by increasing the sample size (with caveats). The uncertainty on post-WW2 estimates of the global mean temperature is around ±0.05C (give or take a bit). See Lenssen 2019 and Rhode 2013 for details on two completely different uncertainty analyses that arrive at the same conclusion.
No, you can’t do this. The uncertainty of the GAT is strictly dependent on the uncertainty of the data used and interpreted as root-sum-square.
No amount of averaging will ever eliminate this uncertainty. You can calculate the mean as precisely as you want, it will still have the root-sum-square uncertainty as the sum you use to calculate the mean.
Quite right. That’s why if you take the average of a million temperature readings, each with an uncertainty of 0.5°C, you could well end up with a figure that is out by ±500°C.
Uncertainty is NOT error.
Never said it was.
Uncertainty is a measure of the expected range of errors. If you are saying that the uncertainty of a measurement could be ±500°C even though the actual error could never be more than ±0.5°C, then what exactly do you mean by “uncertainty”?
The GUM defines uncertainty of measurement as a “parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand”.
Do you think the 500°C range could reasonably be attributed to the average of a million readings, given that the uncertainty of each reading is only 0.5°C and, say, the range of the individual measurements was between -100 and +100°C?
Completely wrong and contradicted by your quote from the GUM. Uncertainty is a measure of the state of your knowledge about a measured quantity.
The GUM also tells you how to calculate uncertainty from Eq. 1; if you haven’t done this for this grand averaging of averages, then you don’t know what you don’t know. There is a whole lot more involved beyond your mindless division by the square root of n.
The definition I quoted defines measurement uncertainty in terms of the state of your knowledge. Uncertainty will always be about the state of your knowledge. If you knew the size of the error you would have perfect knowledge and no uncertainty.
My point was the use of the word “reasonably”. Do you think it reasonable that an average of a million thermometers, each with a measurement uncertainty of 0.5°C could be out by 500°C?
Regarding the GUM, it goes on to say that their definition is “not inconsistent” with other concepts of uncertainty, such as “a measure of the possible error in the estimated value of the measurand as provided by the result of a measurement” and “an estimate characterizing the range of values within which the true value of a measurand lies”.
And my point is that your question is ill-formed (actually it is a loaded question). You need to do a formal uncertainty analysis before coming up with this absurd 500°C number.
That 500C number comes from Gorman’s insistence that the uncertainty of the mean follows RSS such that for an individual measurement uncertainty of ±0.5C and given 1 million measurements the final uncertainty of the mean is sqrt(0.5^2 * 1000000) = 500C. Does that sound like a plausible uncertainty to you?
Still no uncertainty analysis, I wonder why.
I did the analysis. The simulated uncertainty of the computed mean landed almost exactly on the 0.0005C value predicted by the formula σ̂ = σ/sqrt(N), and miles away from the 500C that Gorman claims.
And I again ask…is 500C even remotely plausible? Does it even pass the sniff test? What if the mean were 50C…do you really think -450C would occur 16% of the time?
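For anyone who wants to check this themselves, here is a sketch of that kind of simulation (my reconstruction in Python, not the commenter's actual code): a million readings, each with independent ±0.5C measurement error, repeated over many trials to see how far the computed mean strays.

```python
# Monte Carlo: N independent readings, each with +/-0.5C random measurement
# error, of a million DIFFERENT true temperatures in the -100..+100 range.
# How far does the computed mean stray from the true mean, over many trials?
import numpy as np

rng = np.random.default_rng(42)
N = 1_000_000      # readings per trial
trials = 100
sigma = 0.5        # per-reading measurement uncertainty, deg C

true_values = rng.uniform(-100, 100, N)   # the range mentioned above
errors = []
for _ in range(trials):
    measured = true_values + rng.normal(0, sigma, N)
    errors.append(measured.mean() - true_values.mean())

print(f"spread of mean error: {np.std(errors):.5f} C")   # ~0.0005, i.e. sigma/sqrt(N)
print(f"largest error seen  : {np.max(np.abs(errors)):.5f} C")  # nowhere near 500
```

The spread of the mean's error comes out three orders of magnitude below the per-reading uncertainty, and about six orders of magnitude below the 500C figure.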
Again, once your uncertainty exceeds what you are trying to measure then why would you continue?
Think about the boards again. If your result is 10 feet +/- 10 feet then why add any more boards? You can’t have negative lengths! And your uncertainty is only going to continue growing!
Have you *ever* done any physical science or only academic exercises?
You just demonstrated your abject ignorance of what uncertainty is.
Attempting to educate you is quite pointless.
It is totally reasonable. Once your uncertainty exceeds what you are measuring it is time to stop. Once the uncertainty is more than what you are measuring then why would you continue to grow your data set?
It’s not my absurd claim, it’s what Tim Gorman claims. I’m saying he’s wrong and using the absurdity to refute the claim that uncertainty of a mean increases by the square root of the sample size.
Go argue with statisticians.
Because statisticians agree with me. So, as far as I can tell, do metrologists. It’s a small clique here who don’t seem to understand what either is saying.
Now you are just lying.
Point me to a statistician who thinks the uncertainty of a mean increases as the sample size increases, and we can discuss it.
I gave you a site from the EPA that says just that. I guess you didn’t bother to go look at it, did you?
Standard propagation of uncertainty for random, independent measurements:
u_total^2 = u_x^2 + u_y^2 + correlation factor. Section 19.4.3, Table 2.
“I gave you a site from the EPA that says just that. I guess you didn’t bother to go look at it, did you?”
Sorry, must have missed that one. I’ve just noticed your link below. I’ll check it out.
Brief look through and the table you gave me shows that the EPA agree with me, as I explain below.
Because most statisticians are physical scientists. Metrologists agree with me. Are you confusing metrology with meteorology?
No, I’m referring to all the metrology documents you keep chucking at me, who you think agree with you, only because you are incapable of following any of their formulae or examples.
Why should we? We agree with them.
I’m not wrong. Think about the climate models. Once their uncertainty exceeds +/- 2C (a little more than the anomaly they are trying to predict) then why continue? When the uncertainty exceeds what you are trying to analyze, what’s the purpose of continuing?
Again, trying to disprove that you are wrong by changing the subject. This has nothing to do with climate models, it’s entirely about your claim that uncertainty increases with sample size. That’s what I’m saying is wrong.
They’ve been told this again and again, yet refuse to see reality.
I gave bellman this site: https://www.epa.gov/sites/default/files/2015-05/documents/402-b-04-001c-19-final.pdf
Table 19.2 gives the first order uncertainty propagation formula.
It has nothing in it about dividing by N or the sqrt(N).
Probably because Table 19.2 isn’t talking about means. But you can derive the concept from the formula for the product, just as Taylor does, by treating one of the measures as a constant.
For a product q = xy, the table gives u(q)^2 = y^2 u(x)^2 + x^2 u(y)^2 + a covariance term. If y is a constant, B, then u(y) = 0 and u(x,y) = 0, and that becomes u(q) = |B| u(x).
You can also derive this from the Sums and Differences formula, u(q)^2 = a^2 u(x)^2 + b^2 u(y)^2, where a and b are constants.
If b = 0, then this becomes u(q) = |a| u(x).
And if you want to see an example of dividing by the sqrt(N) look at equation (19.3), where they show how to calculate the standard error of the mean.
Then you could look at Example 19.1, where they, guess what, take an average of 10 values and given to 3 decimal places, and calculate the standard uncertainty as 0.0011.
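The chain of reasoning above (root-sum-square over the sum, then scaling by the constant 1/N) can be written out numerically. The 0.5°C and N = 10 here are just the running example's values:

```python
# The mean is (1/N) * sum.  Propagate per-reading uncertainty u through
# the sum by root-sum-square, then scale by the constant 1/N.
import math

u = 0.5     # per-reading standard uncertainty, deg C
N = 10      # number of independent readings

u_sum = math.sqrt(N * u**2)   # RSS over N independent readings
u_mean = u_sum / N            # scaling by a constant scales the uncertainty

print(f"u_sum  = {u_sum:.3f}")               # 1.581
print(f"u_mean = {u_mean:.3f}")              # 0.158
print(f"u/sqrt(N) = {u / math.sqrt(N):.3f}") # same number, the usual SEM form
```

The last two lines are the same quantity written two ways: dividing the RSS of the sum by N is algebraically identical to dividing the per-reading uncertainty by sqrt(N).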
I am very glad that I did not have to take any university courses with you as the instructor.
Feel free to explain where I’m wrong.
(Thanks for the compliment, by the way. I feel honoured that someone mistakes me for a university lecturer.)
You’ve been told again and again yet refuse to even consider that you might be wrong. The reason is obviously that if you were to perform an honest analysis, it would show how these averages of averages have very little useful information.
What I’m told over and over again are assertions which are never backed up by evidence. The EPA document was offered as a statistical authority for the claim that you never reduce uncertainty. I pointed out that it doesn’t make that claim, and that the standard formula it uses implies that you do indeed have to multiply the uncertainty by a constant when the measure is multiplied by a constant. As always, nobody explains why they think my reasoning is wrong; they just downvote and move on, only to repeat their mistakes again and again.
I do consider that I may be wrong. I’m not an expert, most of my statistical learning has faded into obscurity, and many here who disagree seem to come with some experience of applying statistical methods to engineering. Which is why I keep asking for evidence that I might be wrong.
I keep pointing the Gormans to passages in their preferred sources that, as far as I can tell, directly contradict what they are saying. They never consider that they might be wrong, but never constructively explain why my interpretation is wrong. There are only so many times you can have Tim Gorman insist that you never divide the uncertainty, or that you can only average dependent measures, or state a baseless equation as if it were a fact, before you begin to suspect that he is wrong and just refuses to accept it.
So again, if you have any evidence of any kind that
a) uncertainties in a mean increase as sample size increases
b) you never scale an uncertainty when you scale the measure
c) that the calculation for the standard error of the mean, cannot be used with independent data
please let me know.
A total waste of time, request DENIED.
As I suspected, you won’t provide the evidence. Let me know if you change your mind.
110 pages, that is a ton of work.
Has anyone ever attempted a real uncertainty analysis of these GAT calculations? (Not that Bellman & Co would believe the results.)
Yes. I posted links to Lenssen 2019 and Rhode 2013. The former uses a bottom-up approach while the latter uses a top-down approach via jackknife resampling. Despite wildly different techniques they both come to the same conclusion: the uncertainty is about ±0.05C. I then tested how much disagreement there was between monthly global mean temperature anomalies from HadCRUTv5, GISTEMP, BEST, and ERA. The disagreement is in line with expectations that the uncertainty is ±0.05C. This is remarkable considering that they all use wildly different methodologies and subsets of available data.
“You can calculate the mean as precisely as you want, it will still have the root-sum-square uncertainty as the sum you use to calculate the mean.”
You really think the uncertainty of the mean increases each time you add a new data point? Really?
Yes, for non-stationary data! If there is a trend, then every new data point changes the mean and standard deviation. Only for stationary data, where the changes are random, and tend to cancel, can the precision be improved by taking more measurements of the same thing with the same instrument!
Did you test that hypothesis with a monte carlo simulation?
Where is the fixed population from which you are sampling?
I’m asking if you tested your hypothesis that the uncertainty of the mean increases as you increase the sample size when the samples represent different measurements, using different instruments, and/or when the data is non-stationary. You can test this quite easily with a monte carlo simulation. Have you done that?
I gave you the example of boards laid end-to-end. Why do you refuse to address that example? You don’t need a monte carlo simulation.
Nor is generating random numbers sufficient. They have to be independent. Something you don’t seem to understand.
If there is no fixed population, you don’t divide by the square root of N. This is Stats 101.
You don’t need a Monte Carlo simulation to demonstrate my claim!
If your claim is right then the results of the monte carlo simulation would confirm it. They don’t. I know because I actually did it. It doesn’t matter whether the measurement is of the exact same thing, done by the same instrument, or whether the data is stationary. I even simulated both precision and accuracy uncertainty on each measurement. It just doesn’t matter. The uncertainty of the mean is always less than the uncertainty of the individual measurements whenever the measurement error is randomly distributed (it doesn’t even need to be normally distributed).
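Here is a sketch of that kind of simulation (again my reconstruction, with made-up noise levels, not the commenter's actual code): every reading is of a different quantity, taken by a different instrument with its own precision and its own small calibration offset, and the quantities trend upward, i.e. non-stationary data.

```python
# Each of N readings measures a DIFFERENT quantity with a DIFFERENT
# instrument: its own noise level (precision) and its own fixed offset
# (accuracy).  The true values trend upward, i.e. non-stationary data.
import numpy as np

rng = np.random.default_rng(1)
N = 10_000
trials = 200

true_values = np.linspace(0.0, 5.0, N)   # trending "temperatures"
sigmas = rng.uniform(0.2, 0.5, N)        # per-instrument precision, deg C
offsets = rng.normal(0.0, 0.1, N)        # per-instrument calibration bias

errors = []
for _ in range(trials):
    measured = true_values + offsets + rng.normal(0.0, sigmas)
    errors.append(measured.mean() - true_values.mean())

print(f"typical single-reading noise: {sigmas.mean():.2f} C")
print(f"spread of the mean's error  : {np.std(errors):.4f} C")
```

The spread of the mean's error comes out roughly two orders of magnitude below the single-reading noise; none of the complications (different things, different instruments, trend) change that, which is the point being made.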
You and Bellman consistently misunderstand error and uncertainty. You CAN NOT reduce uncertainty through statistics. You can remove random error.
Take my word for it. Draw a normal distribution, only make the line width half the distance between the horizontal graduations. That is uncertainty. You cannot tell what the actual value is because it could be any value inside the line. It is something you don’t know and can never know. There is no statistical analysis you can do to reduce the width of that line.
“You CAN NOT reduce uncertainty through statistics. You can remove random error.”
I’ve written about this in more detail elsewhere, but you seem to have a definition of uncertainty that is very limited and not in keeping with things like the GUM.
Uncertainty of measurement can come from random error, it can come from systematic error. You can reduce the uncertainty caused by random error using statistical techniques. You can also try to reduce the effects of systematic errors, by adjusting the measurement to correct the error.
“Draw a normal distribution, only make the line width half the distance between the horizontal graduations. That is uncertainty. You cannot tell what the actual value is because it could be any value inside the line.”
Sorry, I’m not sure what you’re describing in that experiment.
But I don’t know why you keep repeating things like “you cannot tell what the actual value is”. Of course you cannot and nobody is saying you can. What you can do is try to reduce the size of the uncertainty.
Are you intentionally not understanding what he’s saying?
I know what is being said and I’ve done the monte carlo simulation of it.
Yes, absolutely.
You refuse to answer my example using boards of random, independent measurements.
If the uncertainty in the overall length grows as you add boards then why do you think the uncertainty in the mean won’t do the same?
Again, if q=Bx then delta-q = delta-B + delta_x
Since B is a constant (be it 1/N or 1/sqrt(N)) then delta-B is zero.
delta-q = delta-x.
What is so hard about this? Why won’t you address how this contradicts your assertion concerning the mean? If delta-q grows then why doesn’t the uncertainty of the mean grow as well, since it is dependent on the uncertainty of q?
You simply cannot decrease the uncertainty by increasing the uncertainty.
Random, independent measurements do not meet the requirements for being a Gaussian distribution. Therefore you can’t find a true value that is the mean. In fact, none of the units making up the universe may actually equal the mean.
Neither is uncertainty a probability distribution. Uncertainty is, therefore, not amenable to statistical analysis.
You are laboring under the misconception that statistics is a hammer and everything in the world is a nail. Independent, random data is not a nail, it is a screw. Learn it, love it, live it.
“Again, if q=Bx then delta-q = delta-B + delta_x”
Again, are you going to provide a source for this formula, preferably one that actually matches it?
“If the uncertainty in the overall length grows as you add boards then why do you think the uncertainty in the mean won’t do the same?”
Because you use RSS when you place boards together end-to-end and want to know the final uncertainty of the combined length. You use the SEM when you want to know the final uncertainty of the mean of the boards. That’s what every statistics text, including your own source (the GUM), says to do.
But not being satisfied with these texts alone, I double-checked this with a monte carlo simulation. Unsurprisingly the texts are right and you are wrong. This is why I keep asking you to test your claims with monte carlo simulations. You’d see within a few minutes that your claims have no merit.
Honestly though, your claim that the uncertainty of the mean follows RSS is so absurd you shouldn’t even need to consult a statistics text or run a monte carlo simulation to reject it. It should be intuitive that it cannot be right.
If you use uncertain values to calculate a mean, then the mean will also be uncertain. Please show a reference that says otherwise.
Nobody is saying the mean won’t be uncertain. Just that it will be less uncertain the larger the sample size, and certainly not as you claim more uncertain.
The example in 4.4.3 shows how the mean reduces uncertainty. 20 temperature readings ranging from 96.90°C to 102.72°C, with a standard deviation of 1.49°C. But the standard uncertainty of the mean is 0.33°C.
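The arithmetic in that quoted example is quick to verify (just the quoted numbers, not the GUM's full worked table):

```python
# GUM 4.4.3: n = 20 temperature readings, sample standard deviation
# s = 1.49 C.  The experimental standard deviation of the mean is
# s / sqrt(n).
import math

s = 1.49   # sample standard deviation of the 20 readings, deg C
n = 20     # number of readings

sem = s / math.sqrt(n)
print(f"standard uncertainty of the mean: {sem:.2f} C")   # 0.33
```

So the 0.33°C quoted above is exactly the 1.49°C spread of the individual readings divided by sqrt(20).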
Isn’t it funny to you how the temperatures can be precise out to two decimal places, which would normally give an uncertainty of +/- 0.005, yet end up with an uncertainty of +/- 0.33? How do you think that happened?
Are you criticizing the GUM? They actually calculate to 3dp then round to 2, and point out that “for further calculations, it is likely that all of the digits would be retained.”
GUM 7.2.6 says that you shouldn’t show uncertainty to an excessive number of digits, and at most 2 significant figures are generally sufficient, but says you may want more to avoid rounding errors in subsequent calculations.
“But you can improve the uncertainty on the global mean temperature by increasing the sample size (with caveats).”
A truth not yet grasped by a number of posters here who just don’t think it “feels” right. Hence the nonsense about how distributed data, with known distributions, expected values, and known correlation, can’t be evaluated together because it “Comes from different sources”.
FYI, most petroleum reservoir simulations use rock and fluid properties, each from a dozen or more “sources” – most of which have much larger error bands than the data under discussion here. And yet, somehow, the tool has become indispensable to hydrocarbon extraction.
Being useful is not the same thing as being high-precision. “All models are wrong. Some are useful.”
I agree, if you define “wrong” as having any error band at all. Got a point in there anywhere?
Yes I do and apparently you missed it. I don’t think it is worth my time to try to explain something you missed when it should be evident.
“I don’t think it is worth my time to try to explain something you missed when it should be evident.”
Or rather, you have no salient point, or one you would deign to defend, so you are covering it up with your pearl clutching “It’s SO obvious”
The statistical uncertainty can be improved on, but the instrument uncertainty never goes away, and that has to be included in the result.
Consider a very simple example of using ten samples vs. ten thousand samples. If each sample has a measurement error of +/- 0.5C, then the measurement error on the mean of those samples is sqrt(((0.5^2) X 10) / 10) = 0.5C (the square root of the sum of the squares of the measurement error of each sample divided by the number of samples). If there are ten thousand samples, the 10s in the equation are replaced by 10,000 and the resulting instrument uncertainty is the same.
It’s the same with the standard deviation. Each measurement has instrument uncertainty 0.5C. The instrument uncertainty in the mean is 0.5C. The instrument uncertainty of each measurement plus the instrument uncertainty of the mean is sqrt(0.5^2 + 0.5^2) = sqrt(0.5) ≈ 0.7. Then it’s multiplied by N and then divided by N, and finally the square root of 0.7 ≈ 0.84C ≈ 0.8C.
Whatever standard deviation worked out for any number of temperature samples with instrument uncertainty of +/- 0.5C is going to have an instrument uncertainty of 0.8C. Since the temperatures have a precision of 0.1C, then the instrument uncertainty can’t be any more precise than that, and neither can any calculations using those measurements.
This might be true if you have random, dependent data. I.e. the ten samples and 10,000 samples are based on measuring the same thing. This results in a probability distribution of readings around the true value. Hopefully the distribution is Gaussian. In this case the uncertainty gets less by sample size.
When you have thousands of random, INDEPENDENT samples then the uncertainty of the mean is the root-sum-square of the uncertainty of the samples. No dividing by sample size to determine uncertainty.
The uncertainty propagation formula for random, independent data is
u_total^2 = u1^2 + … + un^2 + correlation factor (for random, independent measurements the correlation factor is zero).
The more samples you have the wider the uncertainty interval becomes. The uncertainty interval of the mean can become wider than the absolute value of the mean when enough random, independent samples are involved. In other words, you have no idea what the mean actually is.
Averages of these temperature measurements in some form is the sine qua non of temperature anomalies. If the root-sum-square of the instrument measurement error is all that’s used, that is only the error in the sum of the measurements. If the mean is taken, then just as the sum of the measurements was divided by N to get the mean, the sum-square has to be divided by N as well, and then the root taken.
I don’t agree. If q = Bx then the uncertainty of q is:
delta-q^2 = delta-B^2 + delta-x^2
Since delta-B is 0 (a constant has no uncertainty),
delta-q^2 = delta-x^2
It doesn’t matter if B is an integer or a fraction (i.e. 1/N), it contributes nothing to the uncertainty of q.
We keep going over this. Where do you get the equation delta-q^2 = delta-B^2 + delta-x^2?
As your authority, Taylor, shows, the correct equation for when q = Bx (with B an exact constant) is
delta-q = |B| × delta-x
which implies, for B = 1/N,
delta-q = delta-x / N
And he spells out what this means with examples. You seem to have a strong motivation not to be able to read or understand Taylor at this point.
Keep in mind, when you sum the uncertainties the uncertainty of the total increases. It is normal to assume that the errors are orthogonal, i.e. like a right triangle. To find the actual error you sum the squares then take the square root. The length of the hypotenuse is not reduced further in orthogonal calculations.
“…then the measurement error on the mean of those samples is sqrt(((0.5^2) X 10) / 10) = 0.5C”
No, it’s sqrt(0.5^2 X 10) / 10 = 0.5 / sqrt(10) ≈ 0.16°C, assuming the measurement errors are random and independent.
Where is your analysis starting with equation 1 of the GUM? I don’t see it.
These are instrument measurement errors. They are not random and independent. It’s the uncertainty in the accuracy of the measuring device. Instrument measurement error can’t be improved on. Your version indicates that you were able to use an instrument with known uncertainty of +/- 0.5C to suddenly measure with a +/- 0.16C accuracy.
If the mean of the measurements is the sum/N, then the uncertainty in the mean of the measurements has to be the sum of the squares of the uncertainties / N, and so the square root of the mean’s uncertainty has to be the square root of the sum of uncertainties / N.
If they’re not random then they’re biased. What is the bias on these measurements?
A bold prediction. This question will remain unanswered.
Miscalibration/drift of the thermometers, poor siting of the Stevenson screen, which may change with the season, dust removed from the Stevenson screen by rain, nearby urbanization or fluctuating reservoir levels, upwind clear-cut logging, etc.
Let me be more clear…what is the magnitude of the bias in units of degrees C?
It depends on which and how many of the error sources are active for a particular station, and how large they are. However, I would guess that it could often be as much as one or two degrees in aggregate.
Irrelevant; a time series is not a fixed population.
If you can’t quantify the bias then how do you know it is there to begin with?
How many more non sequiturs can you jump to?
It’s a fair question. If the bias is 0C is there really a bias?
Why do you think a bias is definable except by calibration? A simple bias of 0.5 degrees could look like major cooling or could be a large percent of any supposed warming. That is a small value to judge from other stations when temps can vary by degrees among them.
Measurement bias refers to systematic or non-random error that occurs in the collection of data. It is not amenable to statistical analysis since it occurs in every measurement by the same amount.
In order to find an average, you add all the numbers together. When you add measurements the uncertainty of the total will accumulate by RSS.
“These are instrument measurement errors. They are not random and independent.”
If they aren’t independent your formula makes no sense. Where does the square root come from? The uncertainties coming from dependent errors can be propagated by adding the uncertainties, so the correct equation would be
(0.5 X 10) / 10 = 0.5
It’s the same result but without the needless squaring and unsquaring.
“Your version indicates that you were able to use an instrument with known uncertainty of +/- 0.5C to suddenly measure with a +/- 0.16C accuracy.”
Yes, that’s the advantage of averaging when you have independent observations.
“If the mean of the measurements is the sum/N, then the uncertainty in the mean of the measurements has to be the sum of the squares of the uncertainties / N”
No. If the measurements are independent, then the uncertainty of the sum of the measurements can use the root-sum-square technique: add the squares of the uncertainties and take the square root of the sum, which in your example comes to sqrt(0.5^2 X 10) = sqrt(10) X 0.5. Then you divide by 10 to get the uncertainty of the mean.
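That split (RSS for the total length, SEM for the mean length) can be demonstrated with the boards themselves. This is a sketch under the stated assumptions: ten boards of arbitrary lengths, each measured with an independent random error of standard deviation 0.5.

```python
# Ten boards, each measured with independent random error of s.d. 0.5.
# Compare the spread of the error in the TOTAL length with the spread
# of the error in the MEAN length, over many simulated measurement runs.
import numpy as np

rng = np.random.default_rng(7)
lengths = rng.uniform(4.0, 8.0, 10)   # true board lengths, feet
trials = 100_000

meas = lengths + rng.normal(0.0, 0.5, (trials, 10))
err_total = meas.sum(axis=1) - lengths.sum()
err_mean = meas.mean(axis=1) - lengths.mean()

print(f"spread of total-length error: {np.std(err_total):.2f}")  # ~ sqrt(10)*0.5 = 1.58
print(f"spread of mean-length error : {np.std(err_mean):.3f}")   # ~ 0.5/sqrt(10) = 0.158
```

The total's error really does grow like sqrt(N) × 0.5, exactly as the RSS rule says, while the mean's error shrinks like 0.5 / sqrt(N); both come from the same simulated measurements.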
Nope. We’ve been over this.
if q = Bx then delta-q = delta-B + delta-x
Since B is a constant delta-B = 0. ((1/N) is a constant)
delta-q = delta-x
No dividing the uncertainty in x (i.e. delta-x) by a constant to lower the uncertainty of the mean. The uncertainty of the mean is the same uncertainty as the final result – root-sum-square.
If you have two random, independent boards laid end-to-end their possible lengths range from b1 + b2 + (u_b1 + u_b2) to b1+b2 – (u_b1 + u_b2).
So the uncertainty becomes +/- (u_b1 + u_b2).
You don’t divide by 2 to lower the uncertainty of the mean.
Based on the above, where q = Bx and the uncertainty is delta-q = delta-x, the constant B = (1/N) has zero uncertainty and neither increases nor decreases the uncertainty.
In fact, the mean of independent, random measurements simply tells you nothing because it doesn’t create a probability distribution. Not one single sample is guaranteed to be the mean – which implies that the Gaussian distribution cannot be guaranteed. Therefore standard statistical analysis which assumes a Gaussian distribution simply doesn’t apply. *Assuming* a Gaussian distribution is nothing more than bias on the part of the mathematician or statistician.
“Nope. We’ve been over this”
Yes we keep going over this multiple times and you are still wrong. I gave you a snip from Taylor where he says you are wrong. You offer no justification for why you might be right, and what you say makes no sense. But as I like bashing my head against a brick wall I’ll try again.
You say:
“if q = Bx then delta-q = delta-B + delta-x”
This is wrong.
“Since B is a constant delta-B = 0. (1/N) is a constant”
This is correct.
“delta-q = delta-x”
This is wrong.
The correct statement should be: if q = Bx, then
delta-q/|q| = delta-B/|B| + delta-x/|x|
If B is a constant, then delta-B = 0, so this becomes
delta-q/|q| = delta-x/|x|
which implies
delta-q = |B| × delta-x
All of this is explained by your Taylor, in section 3.4.
Even for much older measurement methods, whatever tiny “uncertainty in the accuracy of the measuring device” exists is well known. And it was taken care of long ago. The “issue” is almost exclusively that of independent measurement uncertainties, from parallax error, to typos, to sloppy measurement by workers wanting to get home, and so on. Between spotting and vetting obviously bad data, and then applying the time-honored, correct statistical techniques described patiently and repeatedly by bdgwx, Bellman, and others, you have no complaint.
Not true. Even the newest Argo floats have an uncertainty of +/- 0.6C. The federal handbook on metrology only requires temperature measurement devices to meet a +/- 0.6C standard. That is *NOT* a tiny uncertainty. And it is one reason why reported temperatures are usually given to the nearest degree, even today.
“That is *NOT* a tiny uncertainty.”
In hundreds of floats, multiple observations/time period, over many decades, and with NO data showing that these uncertainties have any dependence*, it is indeed “tiny” uncertainty.
Why do you deny the N denominator, taught to you (I suppose) in statistics 101? Not rhetorical, why?
*FYI, positive dependencies might make a data group less accurate, which is why they are spotted and vetted. But, they actually tighten up the error bands. Negative dependencies are not part of this discussion, because the conditions that would cause them do not functionally exist.
Sampling a non-fixed population. Do you understand what this means? Apparently not.
You went one step too far. You are trying to find the uncertainty of the mean.
The uncertainty of the mean only tells you the interval in which the true mean may lie. It does not tell you the uncertainty (precision) of the measured mean.
Think of uncertainty as how precise a measurement is. You simply can not increase the precision of any given measurement by averaging any number of samples. The uncertainty will remain or in other words the “precision” can not be decreased.
You need to display a logical explanation of why significant digit rules were developed. The logic needs to encompass both uncertainty and precision. You then need to show how overall precision can be increased through averaging a number of independent measurements of different things.
Keep in mind a “mean” and the statistics surrounding it only deals with where the mean lies within a given distribution and how closely you can calculate that mean. The data in a distribution is not affected by statistical analysis. That means any uncertainty and/or precision of the measurements remains unaffected.
What we are telling you is that the mean can carry no more precision than the measurements used to calculate it. Same with uncertainty.
“The uncertainty of the mean only tells you the interval in which the true mean may lie. It does not tell you the uncertainty (precision) of the measured mean.”
You are going to have to run that by me again. If you know that the true mean probably lies within an interval, how is that not synonymous with the uncertainty?
“Think of uncertainty as how precise a measurement is. You simply can not increase the precision of any given measurement by averaging any number of samples.”
Except that all the books you keep pointing me to say you can as long as the measurements are random and independent.
“The uncertainty will remain or in other words the “precision” can not be decreased.”
See the “Graphical illustration of evaluating standard uncertainty” in the GUM, section 4.4.3, where they average 20 temperature readings. The standard deviation of the sample (i.e. the precision of each measurement) is 1.489°C, the standard uncertainty of the mean is 0.333°C. The uncertainty has been decreased.
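For what it’s worth, the arithmetic in that GUM example is easy to reproduce: with n = 20 readings and a sample standard deviation of 1.489°C, the experimental standard deviation of the mean is s/√n (GUM 4.2.3). A one-liner check:

```python
import math

n = 20      # number of temperature readings in the GUM 4.4.3 example
s = 1.489   # sample standard deviation of those readings, deg C

# GUM 4.2.3: experimental standard deviation of the mean = s / sqrt(n)
sem = s / math.sqrt(n)
print(round(sem, 3))  # 0.333, matching the figure quoted above
```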
You’ve got it backwards. You can decrease the uncertainty of the mean if the samples are DEPENDENT and random.
E.g. multiple measurements of the same thing which creates a typical Gaussian distribution around the true value.
You really don’t understand what independent means.
What you are describing are independent samples, measuring the same thing but getting random independent errors.
The assumption in all the talk of using root sum squared is based on the assumption of independent samples.
Not at all. As you have been told many times, this “reduction in uncertainty” is only applicable for random sampling a fixed population.
A time series of anything is not a fixed population.
We are sampling a fixed population, e.g. the temperature across the earth on a single day. But you can still take an average of a time series, e.g. daily values across a month, or annual values across multiple decades. The fixed population is the population across the time period.
No, you are not. As soon as a single nanosecond lapses, all the temperatures everywhere change.
This is what you think is a fixed population?
How do you think thermometers work? They are always averaging the recent past over more than a single nanosecond.
Doesn’t matter how thermometers “work”, temperature is continually changing, which is the issue you ran away from answering.
The issue you raised was that every nanosecond all thermometers change and therefore it’s never possible to take an average with any thermometer anywhere. I addressed that comment with the seriousness it deserved.
Arguing from Reductio ad absurdum, one might ask what the average taste of 10 apples and 10 oranges is. That is, the more dissimilar two things are, the less meaningful any statistical characterization is.
The ideal situation is where something has an unchanging value, such as the density of a cube of gold. If you attempt to determine what that is using different methods of measurement, or different samples of unknown purity, there is going to be greater uncertainty in the estimate for the density of ‘gold’ than if one relied on a single sample and the same technique multiple times.
That’s only a Reductio ad absurdum if anyone is suggesting all averages are equally meaningful. Otherwise it’s like arguing that the rules of addition are wrong because you cannot add two dogs to three hours.
In your gold example you will get a great deal of certainty by measuring a single sample multiple times and taking the average – but it will only be accurate for that one sample and technique. If you have reason to believe that different samples or different techniques will give different results, how do you know that your one sample is the true density of gold?
“how do you know that your one sample is the true density of gold”
You DON’T!
That’s the very definition of uncertainty! You also don’t know if other samples represent the true density of gold either!
Again, it’s the very definition of uncertainty!
So why limit yourself to one sample rather than multiple ones?
Yes!!! You are smarter than I have been giving you credit for. By extension, station temperatures are only as good as the confounding factors allow, for that station. Averaging many doesn’t improve them.
Bullsh!t, again.
And yet everyone who does a rigorous uncertainty analysis whether it be bottom-up or top-down comes to the same conclusion. That conclusion being that the post WW2 monthly global mean temperature uncertainty is about ±0.05C. And as always if you can provide a publication documenting a global mean temperature dataset with an accompanying rigorous uncertainty analysis that comes to a significantly different conclusion then now is the time to post it. Otherwise we have no choice but to form our position around what is available today.
Cite? Who is “everyone”?
Those that publish a global mean temperature dataset with accompanying uncertainty analysis.
Still no references.
Note: still no references.
You mean an uncertainty analysis by those who think independent, random measurements give a more accurate mean by dividing by N. ROFL!!!
The mistake many make is interpreting the “uncertainty of the sample mean” as also describing the precision of the mean. The uncertainty of the sample mean only describes the standard deviation of the sample distribution around the true mean. It does not affect the precision of the true mean. Significant digit rules determine the precision of the true mean!
What mistakes did Lenssen 2019 and Rhode 2013 make?
Awesome examples
It’s all in the assumptions. It’s painfully obvious in these things, such as Lenssen 2019 that every single choice in the assumptions has the cumulative effect of minimising or reducing reported uncertainty to nil. It’s a chronic illness in many scientific disciplines, particularly climatology, to vastly under-report uncertainty. It’s a race to the bottom with absolutely ludicrous abuse of statistical methods.
Various judgements where they cite cherry-picked references –
Judgement 1:
“We proceed on the assumption that the land and ocean uncertainties are independent. However, there is potentially correlation between the uncertainty due to the land calculation and the uncertainty due to the ocean calculation.” They just ignore that anyway and hide it in the text.
Judgement 2:
“Our assumption in this calculation is that the ERSST large ensemble is symmetric about the median for global and hemispheric means and that ERSSTv5 is the median value of the ensemble. Both of these assumptions are not perfect, but reasonable for these large-scale means.” Issue ignored with a judgement of reasonableness hidden in the text. It washes out in the mean, right?
Judgement 3:
“we are making the assumption that the arctic temperature is changing at a fixed multiple of the global average. This assumption is reasonable as model studies have shown that modeling the amplification trend linearly is a reasonable choice over recent decades”. Empirical analysis relying on a crude model for a critical factor. Judgement of reasonableness hidden in the text.
Judgement 4:
Sampling uncertainty “Note that this method assumes that our method of calculating the global mean does not have any systematic bias.” Oh that’s convenient. “the additive bias αk=0 for all decades as all of our grid box time series are mean zero”. Go figure. “We expect the limiting uncertainty to be greater than 0 as the smoothing arising from interpolation increases the uncertainty in the global mean.” VERY Interesting!
How does one then account for the cumulative effect of all these assumptions, and more, on the resulting uncertainty? What about all the things that are not mentioned? It’s a total judgement call in a race to be the one with the lowest reported uncertainty, at the cost of any semblance of valid scientific method, truth, and honesty. On the flip side, maybe it’s just more convenient to make all these assumptions for simplicity of analysis – this is very common – but the result is the same – a total misrepresentation of the state of knowledge.
Take a step back, look out the window, forget about the number tricks and think to yourself, does an uncertainty of 0.05C for a global average make any meaningful physical sense? Or is it just an illusion borne out of the funny way that averages work out? It’s for you to judge, and depends a lot on your own philosophies.
respectfully,
JCM
That’s great. Can you demonstrate that these assumptions are false and that, when handled to your satisfaction, they will significantly change the final uncertainty? Can you publish your findings so that they can be reviewed for egregious mistakes?
yawn. https://www.youtube.com/watch?v=dGDbpg1nG8Y
Hurrah! Someone with some common sense!
A variation on Appeal to Authority logical fallacy. Look, it isn’t up to someone else to disprove an assertion/assumption. It is up to the person making the assertion/assumption to prove that it is reasonable, necessary, and true.
Look at the following image. The population is not normal, it is very skewed. That makes it hard to determine the mean. Sampling can be used to obtain a normal distribution with an easily determined mean.
The uncertainty of the mean can be made smaller by using a large number for the sample size and by taking a large number of samples. For example, a sample size of 50 and 1 million samples. The uncertainty of the sample mean IS the standard deviation of the sample distribution. That means you can make it more and more narrow and get a very small SD. By doing so you can get closer and closer to the true mean of the population.
However, that is not the precision of the value of the population mean. That precision is determined by analyzing the original data and using significant digit rules to determine the precision of the actual mean of the population.
If you do not understand this, you have no business dealing with measurement data. This is just one part of metrology. Uncertainty is another and obviously one you do not understand. Think about how measurement of multiple things with multiple devices all with different uncertainties can be combined to reach a lower uncertainty. You’ll find the uncertainty can not be lowered.
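The construction being described here (sample a skewed population many times, plot the means of the samples) is the standard sampling-distribution demonstration, and it can be run rather than argued about. A hedged sketch — the exponential population and the sizes are made-up stand-ins for the missing image:

```python
import random
import statistics

rng = random.Random(42)

# A strongly skewed population (exponential, mean 10) stands in for
# the skewed distribution shown in the (missing) image.
population = [rng.expovariate(1 / 10) for _ in range(100_000)]
pop_sd = statistics.pstdev(population)

# Draw many samples of size 50 and record each sample's mean.
n, trials = 50, 5_000
sample_means = [
    statistics.fmean(rng.sample(population, n)) for _ in range(trials)
]

# The spread of those sample means comes out close to pop_sd / sqrt(n),
# even though the population itself is far from normal.
sd_of_means = statistics.stdev(sample_means)
print(pop_sd / n ** 0.5, sd_of_means)
```

Whether one calls that spread the “precision of the mean” is the semantic dispute in this thread; the numeric behavior itself is reproducible.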
Forgot the image.
Your image is of 2 data sources, distributed differently. Yes, the square root N rule applies to normal distributions. But if you took increasing samples of both of these distributions, and averaged them, the standard deviation of those averages would ultimately fall, as the number of samples increased. No matter what combo of distributions you picked. Always.
It is NOT 2 data sources, can you read the captions? There is an original population that is very skewed. There is a second distribution called a sample distribution that is a normal distribution. The sample distribution is generated by sampling the population, calculating the mean of each sample, and plotting the frequency of each value of the mean.
The standard deviation of the sample distribution is called the “error of the sample mean”, or “error of the mean” for short. It is the interval in which the population mean lies.
I haven’t even gotten into why you are dealing with “sampling” as a useful tool for determining a Global Average Temperature (GAT) or its uncertainty. The population of temperatures is all you have and all you will ever have. Taking samples of the entire population and finding a sample mean will give you nothing more than a simple calculation of the mean of the population data set. Think significant figure rules when doing this.
“It is NOT 2 data sources, can you read the captions? There is an original population that is very skewed. There is a second distribution called a sample distribution that is a normal distribution.”
Your first sentence is contradicted by your next 2. Would you like for me to call the data sources “distributions”? OK.
“I haven’t even gotten into why you are dealing with “sampling” as a useful tool for determining a Global Average Temperature (GAT) or its uncertainty.”
Because you can’t.
“The population of temperatures is all you have and all you will ever have.”
That, and the standard error of each data point.
“Taking samples of the entire population and finding a sample mean will give you nothing more than a simple calculation of the mean of the population data set. Think significant figure rules when doing this.”
Correct spatial interpolation of that population will not only give you an estimate of its mean, but also the uncertainty of that estimate. An uncertainty that INVARIABLY drops the more data you have. You continue with your hysterical blindness w.r.t. this tenet of statistics 101.
Again, where does the standard error of a data point originate from? How is it calculated?
“The uncertainty of the mean can be made smaller by using a large number for the sample size and by taking a large number of samples. For example, a sample size of 50 and 1 million samples.”
I’ve tried to explain this to you before, but you don’t normally take multiple samples, unless say you are running a Monte Carlo simulation. It would be very expensive to take a million samples, and if you did, why wouldn’t you just use the 50 million individual measurements to make one enormous sample? In any event, taking multiple samples does not mean you get closer and closer to the average; what does tend to move you closer is using a larger sample size.
“However, that is not the precision of the value of the population mean.”
Yes it is, though I don’t think the term is usually used in statistics. What you are describing, the standard error of the mean, is exactly the same as measurement precision. The smaller the SEM is, the closer one sample of a set size would be to another. That determines precision, not necessarily trueness.
“That precision is determined by analyzing the original data and using significant digit rules to determine the precision of the actual mean of the population.”
I don’t understand what you are saying here. You don’t need to determine the actual precision of the population; it is fixed and will have zero uncertainty. Significant digit rules don’t determine the precision.
“If you do not understand this, you have no business dealing with measurement data.”
As I keep trying to point out, when you are dealing with the mean of a broad population, the uncertainties in the measurements are largely irrelevant. The uncertainty in the mean is mostly down to the randomness of the sample, even if you can assume a random sample (which is definitely not the case with global temperature estimates). After that a random half a degree error on each thermometer is vanishingly irrelevant.
“the standard error of the mean is exactly the same as measurement precision.”
NO!
If I lay three boards of random, independent length end-to-end, how closely can you calculate their mean? You can calculate the mean no closer than the precision with which they are measured. If they are measured to +/- 1″ then your mean can be no more precise than +/- 1″. Trying to say that you can calculate it more precisely is only fooling yourself. First, that mean may or may not be an actual mean; none of the three boards may be of mean length. For random, independent measurements that’s true no matter how many measurements of independent, random items you make. You can measure 1000 people, average their height, and find that not one single person in that group is of the same height as your calculated mean. Does that mean that the population does not meet the requirement for being a Gaussian distribution? If it isn’t, then what does that imply for your statistical analysis?
Second, uncertainty *always* grows with independent, random measurements. You simply cannot calculate that away. The more random, independent measurements you include in the data set the more uncertain your final result will be. In that case, trying to say you can calculate the mean to the 10^-6 decimal place is foolish. It should be common sense that if you don’t know the final result with any certainty then how can you know the mean with arbitrary certainty?
” If they are measured to +/- 1″ then your mean can be no more precise that +/- 1″.”
Sure it can. Consider the lengths of these three boards are 0.5m, 2.5m, and 3.0m. The true mean is 2.0m. If I could measure the three boards with zero uncertainty, that’s what I’d get. But there are uncertainties in each measurement, so each measurement has an error, within the range of ±1cm. Assume these errors are random and independent, and to keep things simple say each board is measured either 1cm too short or 1cm too long, with a 50% chance of either. If I’m unlucky all three measurements go the same way, say 1cm too long. In that case my average is 2.01m, with the same error as the uncertainty in the original measurements. But there’s only a 25% chance that all three errors go the same way. Anything else gives me an error less than the individual errors. Hence there’s a 75% chance that the actual sum of errors will only be 1cm, which will then be divided by 3 when I make the average, making the error ±0.33cm.
Just by taking the average of 3 boards of different lengths, I’ve already greatly increased the chance that the error will be smaller than the ±1cm of each individual measurement.
Now if we increase the sample size, these probabilities multiply, and it becomes vanishingly small that say 100 random errors could all be +1cm, and much more likely that the average of the errors will be close to zero.
Of course, in this example I’m trying to follow your example, so we are only interested in the actual average of the 3 boards. In reality we are usually interested in the average of the population of the boards, and 3 boards will be a poor sample, especially if they differ by as much as those in my example.
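The 75%/25% claim in this coin-flip version can be checked by exhaustive enumeration rather than argument, since three ±1cm errors give only eight equally likely outcomes. A minimal sketch:

```python
from itertools import product
from statistics import fmean

# Each board's measurement error is exactly +1 or -1 cm, equally likely,
# as in the simplified example above.
outcomes = list(product([-1, 1], repeat=3))  # 8 equally likely cases

# Magnitude of the error in the MEAN for each outcome: |e1+e2+e3| / 3
mean_errors = [abs(sum(e)) / 3 for e in outcomes]

# In 2 of 8 cases (all +1 or all -1) the mean's error is the full 1 cm;
# in the other 6 cases it is only 1/3 cm.
worst = sum(1 for m in mean_errors if m == 1.0)
print(worst / len(outcomes))  # 0.25
print(fmean(mean_errors))     # expected |error| of the mean: 0.5 cm
```

So the expected error magnitude of the mean (0.5cm) is already half the ±1cm of a single measurement, which is the point being argued.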
“First, that mean may or may not be an actual mean, none of the three boards may be of mean length.”
There is absolutely no requirement that any of the boards be of average length. The fact you think so suggests you don’t understand the idea of averaging. Though of course if we were talking about average global temperatures it would be inevitable that somewhere on the earth was the average temperature, by the intermediate value theorem.
“Second, uncertainty *always* grows with independent, random measurements.”
A perfect example of a precise but inaccurate statement.
You continue to mix up the criteria for errors versus uncertainty. I’ll say it again, uncertainty describes what you don’t know and can never know. Statistical analysis will not reduce it.
You can measure a board and say it is 3 feet long. However, if the uncertainty of your measuring device is +/- 0.5 feet, then you will never know what the actual length truly is. You can measure a million other boards but you can not reduce the uncertainty of that very first board you measured. Its real length is something you don’t know and can never know regardless of all the averages and statistical analysis you do with it and other boards. The uncertainty remains.
What’s sad is that you can not recognize that a mean that uses that measurement is also carrying that uncertainty. If you can not reduce the uncertainty of a single measurement, then you can not reduce the uncertainty of any other measurement included in that mean.
“uncertainty describes what you don’t know and can never know.”
Of course you can never know, it wouldn’t be uncertain if you could. But there are rigidly defined areas of the limits of uncertainty. If you say the uncertainty is 0.5cm, you are saying it’s unlikely or highly unlikely that the error could be more than that – depending on exactly how you’ve defined the uncertainty.
“Statistical analysis will not reduce it.”
The claim was that statistical analysis can only make it less certain.
I’m still puzzled about what either of you are saying in this regard. Half the time I’m told that uncertainty can never be reduced, but the rest of the time it’s possible to reduce uncertainty in measurements of a single object, and the “you can never reduce uncertainty” rule only applies to averaging different things.
“You can measure a million other boards but you can not reduce the uncertainty of that very first board you measured.”
Which is not a problem as I’m not trying to find the exact length of the first board I measure. What I’m trying to estimate is the more interesting question of what the average length of a board is.
“What’s sad is that you can not recognize that a mean that uses that measurement is also carrying that uncertainty.”
More appeals to some mystical purity of a single uncertainty. You and Tim can say this sort of thing all you like, but unless you can demonstrate how this one-in-a-million uncertainty somehow remains in effect to its full extent in the final mean, I cannot take your claims seriously.
All I ask is that you provide one piece of evidence that some text book says the uncertainty of a mean behaves like that, or one experiment, or one example, then we could discuss it.
No. Get this in your head — Error and Uncertainty are two different things!
Error can be systematic or human error. Different people can read a meniscus differently. Systematic error, i.e., a built in problem can not be resolved through statistics. Human error can be random and if enough measurements are taken an average will remove some of the difference.
Uncertainty is what you don’t know. If it is +/- 0.5 then a measurement of 1 can vary between 0.5 and 1.5 and you can never know what the actual measurement truly is. The uncertainty defines an interval within which any “value” is as likely as any other and you have no way to discern what is the correct one. You can’t eliminate this kind of uncertainty through statistics since there is no distribution to analyze. Can you analyze the possible effects of uncertainty? Surely.
Give me a break. If you don’t know the measurement of the first board, then you don’t know the measurement of any other board either. The uncertainty will then propagate through any calculations you make and the average will have a very large uncertainty. Remember, this is not error. It can not be reduced through averaging like error can be.
This tells me you have never studied metrology or never understood what it is trying to accomplish.
“No. Get this in your head — Error and Uncertainty are two different things!“
I know they are not the same thing. But I think they are related by the fact that errors are what make up the uncertainty, whereas you seem to think they are completely unrelated. I’m not sure if this is because we have different understanding of what an error is or what an uncertainty is.
Rather than simply shout at each other maybe it would be an idea to define terms.
When I use the word error I mean it in the statistical sense of the difference between a measure and the thing it is measuring – which includes a measure of a mean and the actual mean. Errors can be both random and systematic, and the two have different effects.
When talking about averaging different sized things to estimate a population mean, the error is the difference between the specific value and the true mean. Some of this may be due to measurement error, but most is likely to be due to the random variation in the population. (Note, error does not mean mistake.) If you are taking a single measurement from a single thing, the error is the difference between the measurement and the actual value. These errors are going to come from a variety of causes, some of which might be random and others systematic.
I’m trying to stick to the GUM definitions of uncertainty, which I thought you wanted, but your definition seems different.
The GUM defines uncertainty of measurement as a “parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand”.
With notes saying the parameter may be, for example, a standard deviation (or a given multiple of it), or the half-width of an interval having a stated level of confidence.
It goes on to say that this is not inconsistent with other definitions such as “a measure of the possible error in the estimated value of the measurand as provided by the result of a measurement” or “an estimate characterizing the range of values within which the true value of a measurand lies”.
Then it gives definitions for standard uncertainty (“uncertainty of the result of a measurement expressed as a standard deviation”)
and Type A evaluation of uncertainty (“method of evaluation of uncertainty by the statistical analysis of series of observations”).
All of this leads me to believe that metrology regards uncertainty as being defined as the range of possible errors on a measurement, with range being allowed to have different definitions including a standard deviation.
Meanwhile you say
“The uncertainty defines an interval within which any “value” is as likely as any other and you have no way to discern what is the correct one. You can’t eliminate this kind of uncertainty through statistics since there is no distribution to analyze.”
Which doesn’t resemble any of the definitions used above. Why do you assume all values are equally likely? Why can’t you determine the distribution simply by taking multiple readings?
Even if you don’t know the distribution you still reduce the uncertainty by averaging, and contrary to what you say there has to be a distribution. Your claim that all values are equally likely implies a uniform distribution.
I think this post demonstrates how we’re talking past each other. You’re talking about random errors and how they might “even out.”
There’s the inaccuracy of the measuring instrument that’s being ignored. If each measurement has an uncertainty, that has to be propagated.
0.5m +/- 0.1m + 2.5m +/- 0.1m + 3.0m +/- 0.1m =
0.5m + 2.5m + 3.0m = 6.0m
sqrt((0.1^2) + (0.1^2) + (0.1^2)) = sqrt(0.03) = 0.17
The answer is 6.0m +/- 0.17m
The mean would be
1/3(0.5 + 2.5 + 3.0) = 1/3(6.0) = 2.0m
sqrt((1/3)((0.1^2) + (0.1^2) + (0.1^2))) = sqrt((1/3)(0.03))
= sqrt(0.01) = 0.1m.
mean = 2.0m +/- 0.1m
The uncertainty in your calculation of the mean is +/- 0.1m.
The uncertainty of the mean is +/- 0.17m.
The mean simply cannot have less uncertainty than the sum of the elements making up that mean. Not if the elements are random and independent, which do not define a probability distribution amenable to statistical analysis.
If I have ten boards, five of which are 2′ +/- 1″ and five of which are 10′ +/- 1″ what is the uncertainty of the mean?
The sum can range from
(5)2′ + (5)10′ + 10(.1″) = 60′ + 1″ to
(5)2′ + (5)10′ – 10(.1″) = 60′ – 1″
So the uncertainty in the sum is now +/- 1″
The mean will vary from 60.08’/10 to 59.9/10 or
6.008′ and 5.99′, (72.1″ and 71.9″)
a difference of .2′ . NOT .1″/10 = .01″
The uncertainty of the mean grows just like the uncertainty of the sum grows. There is no dividing the uncertainty by N or sqrt(N)
You cannot decrease the uncertainty of the mean unless the measurements represent a random distribution around a true value. Since there is no true value associated with a set of random, independent measurements of different things you cannot calculate a true value, thus there is no true value.
A great example of why not to use imperial measurements.
But I’m not sure I follow your argument. You say there is an uncertainty of ±0.1″ (I think that’s what you meant, not the 1″ you say) in each measurement and therefore the worse case will be that the sum of the ten boards will be out by ±1.0″. This of course assumes the errors are not independent, so don’t benefit from RSS.
So the mean may vary between ±0.1″ (which for some reason you claim is 0.2′, rather than ±0.0083′). This is what you expect if the errors are not independent. The uncertainty of the mean is the same as the uncertainty of each measure.
But then you claim this shows the uncertainty grows, when in fact it’s stayed the same. You also point out the uncertainty is not 0.01″, but nobody says it should be. If the errors were random the uncertainty should be 0.1 / sqrt(10), about 0.03″.
Finally, all this is only true if we are only interested in the average of these specific 10 boards. If they are meant to be a random sample from a population the actual standard error of the mean would depend on the standard deviation of your boards. My estimate for this is about 0.4′ or 5″.
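Both bookkeeping conventions in this exchange fit in a few lines, and the disagreement is only about which one applies: worst-case, fully dependent errors versus independent errors combined in root-sum-square. A sketch using the 10-board figures, taking ±0.1″ as the per-board uncertainty as the reply suggests:

```python
import math

n = 10   # number of boards
u = 0.1  # assumed per-board uncertainty, inches (the reply's reading)

# Convention 1: worst case / fully dependent errors.
# Uncertainties add linearly, so the mean's uncertainty stays at u.
u_sum_worst = n * u
u_mean_worst = u_sum_worst / n

# Convention 2: independent random errors (root sum square, GUM/Taylor).
# The mean's uncertainty shrinks by a factor of sqrt(n).
u_sum_rss = math.sqrt(n * u ** 2)
u_mean_rss = u_sum_rss / n          # = u / sqrt(n)

print(u_mean_worst)          # 0.1 inch: same as one board, not larger
print(round(u_mean_rss, 3))  # 0.032 inch, matching the 0.1/sqrt(10) above
```

Note that under neither convention does the uncertainty of the mean grow with n; whether the sqrt(n) reduction is legitimate hinges entirely on the independence assumption, which is the crux of the whole thread.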
“As I keep trying to point out, when you are dealing with the mean of a broad population, the uncertainties in the measurements are largely irrelevant”
What you keep trying to point out is pure bullshite.
Where are you getting this stuff?
From https://www.indeed.com/career-advice/career-development/standard-error-mean
From: Standard Error of The Mean (Formula & Example) (byjus.com)
Need I go on? Standard error of the (sample) mean is not the standard error of the population mean. It DOES NOT specify a new precision that can be used for the mean. The precision of the population mean is determined by using significant digit rules, not by finding out how closely a sample mean estimates the population mean.
You need to go to a metrology course to learn what some of this truly means. Statistics is not the be all and end all, especially if you don’t understand what statistical parameters are defined as. Don’t worry, there are many, many folks in climate science that don’t understand simple sample theory and what it can or can’t tell you.
You are making an assertion here that is incorrect. I need you to find a scholarly reference that says the standard error of the mean (really standard error of the sample mean), calculated from a group of independent measurements, also defines the measurement precision of the mean.
I have given you two refuting references and I can give you more if you need them. It is your turn to find a reference describing the SEM as the precision of a population mean.
There seems to be great confusion about accuracy and precision. Let me see if I can be of assistance. First, a picture to clarify:
[Figure: target-shooting diagram illustrating accuracy vs. precision]
In short, precision is the scatter of your shots. Accuracy is the distance of the shots from the bullseye.
Note that precision cannot be improved by increasing the sample size—the scatter of your shots doesn’t change over time.
Note also that the accuracy cannot be improved by increasing the sample size—if you’re aiming at the wrong point, you won’t hit the right point no matter how many times you shoot.
Next, the standard error of the mean (SEM) is an estimate of the likely value of the mean of the population from which the sample is drawn, assuming that the distribution is symmetrical, that the accuracy is perfect, and that the data is stationary.
Next, in general we can’t determine accuracy because accuracy is the distance from the bullseye, and in general the bullseye is unknown … because that’s exactly what we’re trying to measure.
Hope this is of use.
w.
As some here are very insistent that correct metrological terms be used, it should be noted that what you call accuracy is now called trueness.
“Note that precision cannot be improved by increasing the sample size—the scatter of your shots doesn’t change over time.”
That’s true if you are talking about individual shots, but the precision of the average shot does increase with sample size.
“Next, the standard error of the mean (SEM) is an estimate of the likely value of the mean of the population from which the sample is drawn, assuming that the distribution is symmetrical, that the accuracy is perfect, and that the data is stationary.”
Those assumptions are not all necessary. The population distribution does not need to be symmetrical; it can be any shape you like, and as long as the sample size is large enough, the sample mean will converge to the population mean.
Of course, if the samples are not true then the sample mean will not be true. That’s why I liken the SEM to precision: you can get a precise mean, but it may not be true if there’s a systematic error in your readings. You can also get a true but imprecise estimate if your SEM is large.
My understanding would be that stationarity is only an issue if your sample is biased, which is why the assumption is that the samples are random and independent.
Yep. That is my understanding too. Non-stationary data opens the possibility of sampling bias, but it doesn’t invalidate the SEM rule in general. In fact, I simulated non-stationary data and confirmed that the SEM still worked as long as my sampling was random and unbiased.
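The simulation referred to above isn’t shown, but a check along those lines is easy to sketch. This is my own construction, not the original: a linear trend stands in for the non-stationarity, and the scatter of random-sample means is compared against the textbook SEM.

```python
import random
import statistics

random.seed(42)

# Hypothetical non-stationary population: a linear trend plus noise.
population = [0.01 * i + random.gauss(0, 1.0) for i in range(100_000)]
pop_sd = statistics.pstdev(population)

n = 100                               # sample size
sem_predicted = pop_sd / n ** 0.5     # textbook SEM for samples of size n

# Draw many random, unbiased samples and measure how their means scatter.
sample_means = [statistics.mean(random.sample(population, n))
                for _ in range(2_000)]
sem_observed = statistics.pstdev(sample_means)

print(f"predicted SEM: {sem_predicted:.2f}")
print(f"observed SEM:  {sem_observed:.2f}")
```

With random sampling the observed scatter of the means matches the predicted SEM despite the trend; a biased sample (say, only early indices) would not behave this way.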
Bull pucky!!! If I have a rifle whose precision is 1 Minute Of Angle (MOA), I can never get more precision than that. If I shoot 10 times my shots will be within a circle of that size; if I shoot 10,000 times, the shots will still be within a circle that size, 100,000 shots will not decrease the precision, i.e., the size of that circle. That is the precision of the instrument. The only way to increase the precision is to use a 1/2 MOA rifle.
Measuring devices are the same way. If I have a yardstick marked in 1/16ths of an inch, I can’t increase the precision of measurements by taking more and more measurements of the same thing with the same yardstick. Otherwise, there would be no need for micrometers would there?
No, you have eyes but cannot see. The SEM is an interval within which the population mean lies. Each sample mean in a sample distribution can have a different value. The resulting interval (SEM) has no relationship to the absolute value of the population mean.
As I’ve stated elsewhere, when the mean of a sample is calculated, significant figure rules should be used.
If I have two samples:
10, 21, 33, 42, 57 –> 33
13, 25, 33, 48, 54 –> 35
The means of each sample are 33 (not 32.6) and 35 (not 34.6). This maintains the precision of the original measurements.
Willis is exactly correct in his descriptions and your interpretations are not.
Bellman,
By the way, where are your scholarly references that prove your assertions?
You have simply used your own interpretations to generalize that the references I showed are incorrect. That just won’t suffice.
Which assertions are these, what references are you talking about? There are so many arguments going on it’s difficult to keep track. I don’t think I’ve suggested any of your references are incorrect, I just think you are misunderstanding them.
“If I shoot 10,000 times, the shots will still be within a circle that size, 100,000 shots will not decrease the precision, i.e., the size of that circle. That is the precision of the instrument.”
Again, I was talking about the precision of the mean, not of individual shots. If you don’t think the precision of the mean will improve by increasing sample size, how do you justify the VIM’s definition of trueness, that says it’s how close an average of an infinite number of measurements is to the true value?
“If I have a yardstick marked in 1/16ths of an inch, I can’t increase the precision of measurements by taking more and more measurements of the same thing with the same yardstick”
Using the VIM definition of measurement precision you might well get a more precise measurement using a coarse resolution, simply because it’s more likely that all measurements will be the same.
“Each sample mean in a sample distribution can have a different value.”
Yes of course it can, that’s the whole point of the SEM, to estimate how close different samples are likely to be to each other, and assuming no bias, the population mean.
“The resulting interval (SEM) has no relationship to the absolute value of the population mean.”
Of course it does. Are you sure you actually understand what the SEM is?
Come on dude, with your assertion we could get precision to the 1/100th of an inch from a yardstick. Do you really believe that?
You go around and around the bush with uncertainty and error. Of course you can reduce the random error of a measure of one thing with the same device and multiple measurements. No one disputes that. That doesn’t change the uncertainty in each measurement.
“Come on dude, with your assertion we could get precision to the 1/100th of an inch from a yardstick.”
I was making a statement about the VIM definition of measurement precision. It doesn’t necessarily agree with definitions used outside of metrology. If you define precision as the closeness of indications of measurements on the same or similar object, then a consequence is that an imprecise instrument (in the sense of one using coarse units) might yield a very precise result if all readings are the same.
“That doesn’t change the uncertainty in each measurement.”
And you keep going round the bush with the difference between individual measurements and the mean of those measurements.
Measure one thing repeatedly using multiple measurements. Reduce the random independent errors in the average result. Then take that average as the measurement of that one thing. Is that average measurement more uncertain, less uncertain, or the same level of uncertainty as any of the individual measurements?
As an example if I have a voltmeter with a two digit integer display and I read a voltage of 36 V and I get that same voltage each time I measure that is the precision you get. The uncertainty resolves to the fact that the voltage could have been 36.9999 V or 35.0001 V. In essence, an uncertainty of +/- 1 V.
Now, I want better resolution/precision and so I buy a voltmeter that reads to 1 decimal place. I get numerous readings of 36.2 V. Do I know if the voltage is 36.2499 or 36.1501? Now what is my uncertainty? How about +/- 0.05?
Here is the key. Can numerous readings be used to reduce/eliminate the uncertainty by increasing the precision? Nope! It is what you don’t know and can never know!
“Here is the key. Can numerous readings be used to reduce/eliminate the uncertainty by increasing the precision?”
Not with those readings you can’t. That’s a problem with rounding. If all your readings are to the nearest integer, and are all showing the same value, your average will be the same. The readings and your average will be precise, but not necessarily true. The rounding in this case adds a bias.
Say the true value was 35.7V, and your voltmeter gets it right to within 0.1V each time, but then always rounds the value up to 36V. It’s a precise measurement that is off by 0.3V.
You would be able to make a better estimate of the true value by using a less precise instrument. If the instrument had a random independent error in each reading of ±1V, some of the rounded readings would be 35V, and with enough samples the mean would tend closer to the true value.
The bias caused by the rounding error would also be a lot less of a problem if you were measuring multiple different voltages. The sample mean is going to show less of a bias, simply because a lot of the roundings will cancel out.
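That cancellation claim can be illustrated with a quick numerical sketch. The voltages and their 30–40 V range are my assumptions, not anything stated above:

```python
import random
import statistics

random.seed(3)

# Hypothetical true voltages, read on a meter that displays whole volts.
true_voltages = [random.uniform(30.0, 40.0) for _ in range(10_000)]
readings = [round(v) for v in true_voltages]

# Each reading is off by up to 0.5 V, but the rounding errors largely
# cancel in the mean because the measurands themselves vary.
bias_varied = statistics.mean(readings) - statistics.mean(true_voltages)

# Contrast: a fixed 35.7 V source always reads 36 V, so the same rounding
# error repeats and no amount of averaging removes it.
bias_fixed = round(35.7) - 35.7
```

Here bias_varied comes out near zero, while bias_fixed stays at 0.3 V however many readings are averaged.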
“That’s a problem with rounding. If all your readings are to the nearest integer, and are all showing the same value, your average will be the same.”
Will be? Or should be? What if you have 3 readings that add to 4?
“Say the true value was 35.7V, and your voltmeter gets it right to within 0.1V each time, but then always rounds the value up to 36V. It’s a precise measurement that is off by 0.3V.”
The meter rounds the value up to 36v? How does it do this when it is reading to the tenth digit?
“The sample mean is going to show less of a bias, simply because a lot of the roundings will cancel out.”
When you are measuring different voltages then how does the mean represent a true value with less bias? The mean won’t be a true value! Each individual voltage will have its own true value. The only way to determine the true value of each voltage is to take multiple readings of each voltage!
“The meter rounds the value up to 36v? How does it do this when it is reading to the tenth digit?”
I was talking about Jim’s first example.
Thought I’d test this for myself, so I produced a sample of measurements each with a normal distribution around 35.7V, with an sd of 0.05, then rounded all results to the nearest integer. With a sample of 100 the average was almost always 36V, all out by 0.3V.
Increasing the standard deviation of the measurements to 0.2 and I’m now seeing averages ranging from about 35.75 to 35.9V.
Increase the sd further and the averages range from around 35.5 to 36.0V. Taking 10000 of these samples the mean was 35.70V, with a standard error of ±0.06V.
I’m rather impressed by that. Sample size is only 100, all values are rounded to the nearest unit, so mostly just 35V or 36V, yet I’m able to get results that are mostly within 0.1V of the true value. Yet Gorman would insist I have to round each result to 36V.
Here’s a histogram of my 10000 trials.
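The histogram isn’t reproduced here, but the simulation described is easy to sketch. The standard deviation for the final case was not stated; 0.5 V is assumed here, which reproduces the quoted ±0.06 V scatter:

```python
import random
import statistics

random.seed(0)
TRUE_V = 35.7    # hypothetical true voltage
SD = 0.5         # assumed measurement noise (value not stated above)
N = 100          # readings per sample
TRIALS = 10_000

sample_means = []
for _ in range(TRIALS):
    # Each reading: true value plus Gaussian noise, rounded to whole volts.
    readings = [round(random.gauss(TRUE_V, SD)) for _ in range(N)]
    sample_means.append(statistics.mean(readings))

grand_mean = statistics.mean(sample_means)
scatter = statistics.pstdev(sample_means)
print(f"mean of trial means: {grand_mean:.2f} V, scatter: {scatter:.2f} V")
```

Even though almost every individual reading is 35 or 36 V, the trial means center close to 35.70 V with a scatter near 0.06 V.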
You just created a population with a probability distribution made to prove your assertion.
What if the measurements don’t have the mean as the most numerous reading? What if there is *NO* measurement that matches the mean? How then can the mean be the true value?
It was the example Jim presented. The readings don’t have the mean as the most numerous reading. As I said, all measurements were rounded to the nearest integer. No reading had the average value, as that was 35.7V. The closest readings were 36V.
The mean is the true value because that’s what I said it was in the simulation.
I didn’t say it defined the measurement precision, I said they were the same thing. Measurement precision is defined as the closeness of measurements to each other. If you think of an estimate of the mean as a measurement of the mean, then the SEM defines how close repeated measurements will be to each other.
I don’t see anything in the two quotes you gave that contradicts the idea that the SEM is a measure of the precision of the mean.
Here are a couple of not very academic references I found after a brief search
https://www.investopedia.com/ask/answers/042415/what-difference-between-standard-error-means-and-standard-deviation.asp
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1255808/
Again you do not understand what is being said.
The standard error of the mean is not a precision estimator. It is the interval within which the population mean should lie. This interval can be decreased by using large sample sizes and performing a large number of samples.
The sample mean distribution is a measure of each sample’s mean. In other words, if you used a sample size of 10 and had one hundred samples you would see an approximate normal distribution with the center being the estimated population mean.
This has no bearing on the precision of the data used. In fact, each sample mean should use significant figure rules to calculate the value. This is where many, including climate scientists, go awry. Not using significant digit rules at this point artificially adds digits of precision that are not warranted.
“The sample mean distribution is a measure of each sample’s mean. In other words, if you used a sample size of 10 and had one hundred samples you would see an approximate normal distribution with the center being the estimated population mean”
And I still don’t understand how you think that differs from a definition of precision. The precision we are talking about here is the precision of the sample mean.
“This has no bearing on the precision of the data used.”
And maybe that’s why you are getting confused. As I said, I’m talking about the SEM as the value of the precision of the mean, not of the data used. The failure to understand this distinction seems to be at the heart of much of these endless discussions.
“In fact, each sample mean should use significant figure rules to calculate the value.”
You keep using the word rule as if it’s a mathematical concept rather than a useful guide to presenting data. Rounding measurements that are going into the calculation of the average will not help it at all. At best it will probably have next to zero effect on the mean, because as I keep saying, measurement errors are normally trivial compared with the sampling errors. All rounding the figures does is increase the uncertainty of the measurements a little, and risks adding a bias.
“Measurement precision is defined as the closeness of measurements to each other.”
That is only true IF YOU ARE MEASURING THE SAME THING EACH TIME!
If you are measuring different things then how do you know how close they should be?
You still haven’t defined what you mean by “THE SAME THING EACH TIME”. If I’m measuring the mean of a population, is that the same thing each time?
If I’m measuring different things with different sizes I can estimate how close they are by taking the sample standard deviation. But I’m not interested in that; what I’m interested in is how close each sample mean is to each other. Which I can estimate by calculating the standard error of the mean.
Go to Tractor Supply and buy a 1/4″ round rod.
Measure it 100 times with the best measurement tool you have.
Those 100 measurements will cluster around the true value, hopefully with a Gaussian distribution, and their average (mean) will estimate it.
Now, go to several different Tractor Supply stores and buy a rod of random length at each one. Buy a tape measure at each store to use in measuring the rod you buy there.
Now, average all those random, independent measures together to get a mean. Do you *truly* believe that mean will describe a “true value” that actually tells you something useful?
It will give me an estimate of the average length of the rods, and if the stores are selling random independent rods it will give me an estimate of the mean of all rods sold by the stores. Whether that’s useful depends on the point of this experiment.
It’s a bit difficult to know how to do this experiment properly though. Am I supposed to be asking the store for a random length rod, or am I asking for a rod of a specific length that I’ve randomly selected?
From your first link:
You still don’t understand what that says, do you?
It says that if you use a large enough sample and a large number of samples, your sample mean distribution will become tighter and tighter until it approximates a vertical line. That line is where the population mean is. Notice that nowhere does it say that is the precision of the population mean, only that the interval where the mean lies is very small.
That leads to the next problem you and climate scientists make. When computing the mean of each sample, you do it normally. However you SHOULD ALSO USE SIGNIFICANT FIGURE RULES when computing the mean of each sample. If the temps available, such as prior to 1980, are integers then the mean of each sample should be an integer also. (Please note that some use a number to one decimal place as a guard digit to ensure proper rounding, but the final answer after rounding is still an integer.)
I know this is never taught in statistics because the mathematicians only deal in numbers, not scientific measurements. Heck, mathematicians would just as soon use numbers with decimal places out to the limit of any given computer when computing averages.
“It says that if you use a large enough sample and a large number of samples, your sample mean distribution will become tighter and tighter until it approximates a vertical line.”
You still don’t get that increasing the number of samples, as opposed to the sample size, does not affect the error of the mean. But if you increase sample size the distribution of the potential sample means will get tighter. This means the larger the sample size, the closer the means from separate random samplings will be to each other. Hence my comparison to measurement precision, which is defined as the closeness of each independent measurement to each other.
If you could take an infinite sample size it would converge to a single point, but this would not necessarily be the population mean, as there may be bias in the sampling. That is the same as measurement trueness.
“Notice that nowhere does it say that is the precision of the population mean, only that the interval where the mean lies is very small.”
Notice that the definition of measurement precision says nothing about the precision of the true value of the measurement, it’s about the closeness of individual measurements.
“If the temps available, such as prior to 1980, are integers then the mean of each sample should be an integer also.”
Again, a claim you make with no evidence. My understanding of the “rules” for number of digits is that the uncertainty should be stated to 1 or 2 sf, and the measure to the same number of decimal places. If the uncertainty of a mean is expressed in hundredths of a degree, so should the mean. But as I’ve also said before, if figures are going to be used as the basis of future calculations you can, probably should, retain more digits.
Your example of integers is obvious nonsense. What is the average number of children in a family? You can only count them using natural numbers, but rounding the average to an integer would be meaningless.
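The distinction between taking more samples and enlarging each sample is easy to check numerically. A sketch with an arbitrary Gaussian population (all the numbers here are my assumptions):

```python
import random
import statistics

random.seed(7)
population = [random.gauss(10.0, 2.0) for _ in range(100_000)]

def scatter_of_means(sample_size, n_trials):
    """Spread (sd) of the sample means across repeated random samples."""
    means = [statistics.mean(random.sample(population, sample_size))
             for _ in range(n_trials)]
    return statistics.pstdev(means)

# Ten times as many samples of the same size: the scatter barely moves.
few_trials = scatter_of_means(25, 500)
many_trials = scatter_of_means(25, 5_000)

# A 16-fold larger sample: the scatter shrinks by about sqrt(16) = 4.
big_sample = scatter_of_means(400, 500)
```

Both few_trials and many_trials come out near 2/√25 = 0.4, while big_sample comes out near 2/√400 = 0.1: sample size tightens the distribution of means, the number of samples does not.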
”Again, a claim you make with no evidence. My understanding of the “rules” for number of digits is that the uncertainty should be stated to 1 or 2 sf, and the measure to the same number of decimal places. If the uncertainty of a mean is expressed in hundredths of a degree, so should the mean. But as I’ve also said before, if figures are going to be used as the basis of future calculations you can, probably should, retain more digits.”
Nope, the mean should have the same number of significant digits as the measure, not the other way around. Nor does uncertainty determine the number of significant digits. The number of significant digits in the uncertainty is, again, determined by the number of significant digits in the measure.
“Nope, the mean should have the same number of significant digits as the measure,…”
Then why keep recommending I read Taylor. His rule is
And there are plenty of examples of the uncertainty being less than the measurement uncertainty. For example, the 200 sheet example, where the stack is measured to 1/10 of an inch, but the thickness of a single sheet of paper is given as 0.0065 ± 0.0005 inches.
Why do you always *insist* on misquoting Taylor?
“Experimental uncertainties should almost always be rounded to one significant figure.”
“The rule (2.5) has only one significant exception. If the leading digit in the uncertainty delta-x is a 1, then keeping two significant figures in delta-x may be better. For example, suppose that some calculation gave the uncertainty delta-x = 0.14. Rounding this number to delta-x = 0.1 would be a substantial proportionate reduction, so we could argue that retaining two figures might be less misleading, and quote delta-x = 0.14.”
I quoted him exactly. I copied and pasted directly from the pdf. The passage you quote says to give uncertainty to 1 sf. The rule I quoted is just below that (2.9).
There’s nothing incompatible between the two. Quote the uncertainty to 1 sf, quote the value to the same order of magnitude. If uncertainty is 28, quote it as 30, if the answer is 2468.9, quote it as 2470. If uncertainty is 0.0026, quote it as 0.003 and the answer to 3 decimal places.
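The rule just described (round the uncertainty to 1 significant figure, then quote the value to the matching place) is mechanical enough to sketch as a small helper. The function names are mine, and Taylor’s exception for a leading digit of 1 is ignored:

```python
import math

def round_sig(x, sig=1):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0
    return round(x, -int(math.floor(math.log10(abs(x)))) + (sig - 1))

def report(value, uncertainty):
    """Quote the uncertainty to 1 sig fig, the value to the same place."""
    u = round_sig(uncertainty, 1)
    ndigits = -int(math.floor(math.log10(abs(u))))
    return round(value, ndigits), u

# Uncertainty 28 -> 30, so 2468.9 is quoted to the nearest ten: 2470.
print(report(2468.9, 28))
# The sheet-of-paper numbers quoted above: 0.0065 +/- 0.0005 inches.
print(report(0.006512, 0.0005))
```

The number of digits in the reported value is set by the magnitude of the rounded uncertainty, not by the significant figures of the raw measurement.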
You did what you have accused me of – quoting out of context.
You can’t get uncertainty to four decimals unless you can measure to four decimal points. Since when do we have temperature stations that can measure to +/- 0.0005C?
“You did what you have accused me of – quoting out of context.“.
I’m not sure what context you want me to put on it. It is the general rule stated by Taylor, and demonstrates why your claim that you can only quote a calculated answer to the same number of significant figures as you measured is wrong. Nothing in any of the rest of the book contradicts that. And I gave you an example that shows how you can have more certainty in the answer than in the initial measurement.
“You can’t get uncertainty to four decimals unless you can measure to four decimal points. Since when do we have temperature stations that can measure to +/- 0.0005C?”
This wasn’t about temperature but about the thickness of paper, and Taylor shows how you can do it – by dividing the measurement by 200. If you understood that the whole point of that section is to show that uncertainties scale when you scale the measurement you would understand what he is doing.
You tried to imply something Taylor didn’t say. That’s quoting out of context.
“why your claim that you can only quote a calculated answer to the same number of significant figures as you measured is wrong”
If you only measure to 1 significant digit then your uncertainty should have no more significant digits. How do you get an uncertainty less than what you can measure?
You didn’t understand what Taylor was doing. I asked you to go back and reread that section. Apparently you didn’t.
Suppose you measure the paper stack of 200 pages to be 10″ +/- .1″.
When you divide this up he said each paper would be .05″ +/- .0005″
The point is that the uncertainties of each piece of paper ADD when you stack them together! Using your logic the uncertainty of the stack would be (.0005)/sqrt(200) or .000035″ (or maybe 0.1/sqrt(200) = .007″) instead of .1″. Taylor is showing that the uncertainty is not divided by either N or sqrt(N).
You have to *understand* what Taylor writes. You just try to cherry pick things that you hope prove your point. But you never really try to work out the math!
“You simply don’t know! Even if the uncertainty interval is +/- 10mm the stated value may actually be the true value!”
Tell me what I tried to imply. All I was saying is that the rule Taylor lays down is to quote the answer to the number of significant figures dictated by the magnitude of the uncertainty, not dictated by the significant figures in the measurements. I don’t accept that this is quoting him out of context, because I don’t think he intends the latter. If you can find him saying that, it’s up to you to provide a quote. I’ve not only given the quote, I’ve given you an example that quotes significant figures beyond those measured.
“How do you get an uncertainty less than what you can measure?”
By calculation, specifically by taking a mean.
“The point is that the uncertainties of each piece of paper ADD when you stack them together!”
He does not measure any individual piece of paper, he measures the whole stack. He derives the uncertainty of each piece of paper from the uncertainty of the measurement of the stack.
” Taylor is showing that the uncertainty is not divided by either N or sqrt(N).”
He’s literally shown that you divide the uncertainty by N. He says it in many places. I really cannot help you with your reading comprehension any further, but let’s try to spell it out.
Section 3.4 Two Special cases, part one
“MEASURED QUANTITY TIMES EXACT NUMBER”
First sentence spells out what he is talking about. We have a measured quantity x, then multiply it by a constant B to derive a new value q.
He then goes on to give two examples. One is measuring the diameter of a circle and calculating its circumference by multiplying the measured diameter by pi. The second is our example, measuring a stack of 200 identical sheets of paper and multiplying that by 1/200 to get the thickness of a single sheet of paper.
Then he goes on to explain, via the previous result, the fact that a constant has no uncertainty and the fact that you are left with equal fractional uncertainties, that this implies you multiply the uncertainty (not the fractional uncertainty) of x by B to get the uncertainty (again not fractional) of q. This he gives as equation (3.9), which he describes as a “useful” rule.
Now the fun bit. He goes on to explain why the rule is useful. Pay attention.
This is followed by the example you seem not to understand, where he measures the stack of paper and, guess what, divides both the total thickness and the uncertainty by 200 to get a much smaller uncertainty for an individual sheet.
In case some are not following this he goes on (my bolding)
Hopefully, that’s clear. He does divide the uncertainty by 200, and he does end up with an uncertainty that is small compared to the measurement uncertainty.
But you have to *understand* what Taylor writes. You just try to cherry pick things that you hope prove your point. But you never really try to work out the math!
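For reference, the rule both sides are quoting is Taylor’s “measured quantity times exact number” case: q = Bx with B exact gives delta-q = |B| × delta-x. A sketch using the book’s stack-of-paper numbers (stack measured as 1.3 ± 0.1 inches, 200 sheets, matching the 0.0065 ± 0.0005 quoted earlier):

```python
# Taylor's rule: q = B * x with B exact implies delta_q = abs(B) * delta_x.
stack, d_stack = 1.3, 0.1    # measured stack thickness, inches
B = 1 / 200                  # exact constant: 200 identical sheets

sheet = B * stack            # single-sheet thickness: 0.0065 in
d_sheet = B * d_stack        # single-sheet uncertainty: 0.0005 in

# Reversing the calculation, 200 * d_sheet recovers the full 0.1 in:
# the scaling is by N exactly, not by sqrt(N), in either direction.
recovered = 200 * d_sheet
```

The rule itself says nothing about averaging repeated measurements; it only scales a single measured uncertainty by an exact constant, which is why both directions of the argument above can point at the same equation.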
“By calculation, specifically by taking a mean.”
Once again, you cannot reduce uncertainty of independent, random measurands by taking a mean. Since there is no guarantee that the mean actually describes any specific measurand (i.e. Gaussian distribution not guaranteed) calculating the mean more and more precisely does not lessen the uncertainty of the mean, that uncertainty is defined by the uncertainty of the sum of the measurands.
“He does not measure any individual piece of paper, he measures the whole stack. He derives the uncertainty of each piece of paper from the uncertainty of the measurement of the stack.”
delta-q/200 = delta-x, and (delta-x) × 200 = delta-q.
Why is this math so hard for you to understand?
“He’s literally shown that you divide the uncertainty by N. He says it in many places. I really cannot help you with your reading comprehension any further, but let’s try to spell it out.”
That is *NOT* what he is showing! You have 200 measurands. He divides the uncertainty of the total stack by 200 to get the individual uncertainty. He is *NOT* reducing the uncertainty of the stack in any way, shape, or form.
“this implies you multiply the uncertainty (not the fractional uncertainty) of x by B to get the uncertainty (again not fractional) of q.”
Unbelievable Bell. q = Bx. x is the value for each individual sheet as well as its uncertainty. To get the uncertainty for a stack of 200 sheets (i.e. for q) you then multiply by B (i.e. 200). Wow, a second grader could figure this one out!
” divides both the total thickness and the uncertainty by 200 to get a much smaller uncertainty for an individual sheet.”
But he does not divide by 200 to reduce the uncertainty associated with the stack! He divides by 200 to get the uncertainty of each individual sheet!
delta-x1 + delta-x2 + ….. + delta-x200 = delta-q
It seems that climatologists routinely disregard the standards of metrology developed in other disciplines such as chemistry and physics. That is why even simple error bars are so routinely left off graphics.
What many of us here are attempting to do is point out that “The king has no clothes!”
100%. They presume all of their numbers are pristine, with any errors canceling, and then think the standard deviation of a sample average is the precision.
Wrong. You have thousands of temperature stations whose noise functions are not known. You cannot assume the errors cancel. In fact, they most certainly do not. Uncertainty does not reduce to ±0.05C. Look at the damn math.
What does the uncertainty reduce to? Can you provide a specific value and a rigorous analysis justifying it?
Irony alert.
No, you can’t do that. It completely depends on how the error terms on the different measurement sites correlate. Since you are usually enormously ignorant of this crucial fact, the errors can’t be assumed to cancel.
It is quite astounding the magical properties increasing N is supposed to have on measurement error. Pat Frank proved very clearly that errors in temperature readings are not proven to cancel. He estimated a representative lower bound of uncertainty of 0.46, if I recall correctly. 2 sigma for that is 0.92C, which is right around the IPCC’s presumed current temperature anomaly compared with the 20th century average.
Can you point me to Pat Frank’s global mean temperature dataset with accompanying uncertainty analysis showing that the monthly global mean temperatures anomalies have an uncertainty of ±0.46C? I’d like to review it.
He doesn’t need a data set to prove it. All he has to do is demonstrate that without proof that errors cancel (which has not been provided) that you can’t get down to 0.05C. If you can provide the proof that errors cancel for temperature stations whose noise functions aren’t known, you’ll help a lot of people go from magical thinking to doing science.
I found it here. ±0.46C is 1σ. That means given two independent timeseries we expect a disagreement between the two at σ=0.65C. The actual disagreement between HadCRUTv5 and ERA was σ=0.06C, HadCRUTv5 and BEST was σ=0.04C, and HadCRUTv5 and GISTEMP was σ=0.04C from 1979-2021. Why is the actual disagreement so much lower than Frank predicts? Why does the actual disagreement corroborate an uncertainty on the order of ±0.05C?
You’re mistaking variance for uncertainty. Until you understand the difference there really isn’t a point to going back and forth. And as for the ±0.05C, people can and do share the same flawed statistical reasoning of just wishing the error terms away. This entire field is a joke when it comes to statistics. They make up their own rules, keep subject matter experts out, and then spend decades publishing garbage based off the junk. Judith Curry just showed that the optimal fingerprinting method was based off junk stats. Will the field change? Of course not. Publish or perish.
If two measurements both have ±0.46C (1σ) uncertainty what is the probability that they would be different by only 0.04C?
Only in the alternate universe that you inhabit.
You don’t have a clue, do you? “σ” is the standard deviation, a measure of spread. It describes the +/- interval within which a measurement has about a 68% chance of lying; that also means there is a 32% chance it lies outside that interval.
The GUM allows stating uncertainty for a single measurand as a standard deviation (SD) as long as it is stated as being the SD.
As to your assertion that there is little difference between the series and that translates to a small uncertainty, that is entirely untrue.
Uncertainty is “inside” each time series and is not amenable to reduction by comparing one series to another. In other words, one series could have a high uncertainty and the other a low uncertainty or vice versa. Comparing them will not tell you anything about the individual uncertainty of each.
You’re basically doing what happens when an “average” of model outputs is taken as the correct projection. You can’t average wrong answers and get the right answer except by pure luck. What Dr. Frank has done is show that each model has a very large uncertainty, so that whatever value they arrive at is just as likely as any other answer within that interval.
I’ll ask you the same question…If two measurements both have ±0.46C (1σ) uncertainty what is the probability that they would be different by only 0.04C?
They would have a ~5% chance of being different by 0.04C, or less.
Ding. Ding. Ding. That is exactly right. Well technically I think it is closer to 4.7%, but that’s just being unnecessarily pedantic.
What that tells us is that given an uncertainty of ±0.46C two independent measurements of the global mean temperature should follow abs(Ta-Tb) <= 0.04C about 5% of the time and abs(Ta-Tb) > 0.04C about 95% of the time.
In reality the disagreement between two independent measurements is no more than 0.04C a whopping 70% of the time. I did this for each combination of GISTEMP, BEST, HadCRUTv5, and ERA, resulting in about 5000 comparisons. Note that each makes its own independent measurement of the global mean temperature using different techniques and subsets of the available data. This is consistent with the rigorously assessed uncertainty of about ±0.05C (see Lenssen 2019 and Rohde 2013).
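For readers following the arithmetic: the ~5% figure in this exchange can be checked directly. This is a minimal sketch assuming Gaussian errors, with the ±0.46C and 0.04C values taken from the comments above (the function name is mine):

```python
import math

def p_within(delta, sigma_each):
    """P(|Ta - Tb| <= delta) for two independent measurements Ta, Tb,
    each with standard uncertainty sigma_each. The difference Ta - Tb
    then has standard deviation sigma_each * sqrt(2)."""
    sigma_d = sigma_each * math.sqrt(2)
    z = delta / sigma_d
    # standard normal CDF via the error function
    phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return 2 * phi - 1

p = p_within(0.04, 0.46)
print(round(p, 3))  # roughly 0.05, i.e. the ~5% quoted above
```

With σ = 0.46C for each series, the difference has σ ≈ 0.65C, and the probability of agreement within 0.04C comes out near 5%, matching the figure both commenters arrive at.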
You don’t know that, because you have no way to know where inside the uncertainty interval the measurement actually lies.
It doesn’t matter. We are assessing the probability that two independent measurements would agree/disagree by a certain amount. The true value is of little relevance here.
You simply can’t answer that question. Uncertainty is not a statistical parameter. You can’t define what other temps might be within the interval in terms of a probability distribution.
What uncertainty means is that any measured value within that range can occur. Why is that so hard to understand? When you take a measurement, even with the accepted standard, there will always be an uncertainty interval. This is real-world physical stuff, not just dealing in abstract numbers.
To answer your question, there is simply no way to know. One measurement could really be x+0.46 and the other x-0.46, and you have no way to analyze it statistically to compute what the real values are.
It’s what you don’t know and can never know. Why is that so hard to understand?
It is obvious from your unwillingness to deal with real physical measurements that you have no experience working with your hands on machinery. Have you ever used a lathe to make a press fit shaft? How about using a micrometer on a crankshaft main bearing journal or a rod bearing journal?
“Uncertainty is not a statistical parameter.”
If that uncertainty can be assessed (as in this case), then yes, it can become a statistical parameter.
You seem to be confusing the outcome from a single sample with the outcome from a statistically significant number of them.
“It is obvious from your unwillingness to deal with real physical measurements that you have no experience working with your hands on machinery.”
These same statistical laws are part of modern numerically controlled machinery. They are part of the operation of those machines and are even taught in NCM apprenticeships….
Nope. Uncertainty by definition has no probability distribution, not even a rectangular one. There is one and only one true value existing in that uncertainty interval and it has a probability of 1. All the other values have a probability of 0. The issue is that you don’t know which value has the probability of 1, if you did then there would be no uncertainty!
Therefore there are no statistical tools available to you to help you determine the true value.
“You seem to be confusing the outcome from a single sample with the outcome from a statistically significant number of them.”
That “significant number” has to be from the same measurand. If it isn’t, then all you have is a single measurement from each of a number of different measurands. There will not be a true value that can be calculated by determining the mean. Not a single measurand may match the value calculated for the mean, so how can the mean be a true value?
“These same statistical laws are part of modern numerically controlled machinery. They are part of the operation of those machines and are even taught in NCM apprenticeships….”
Even a numerically controlled machine can turn out product that does not match the specified “true value”. As water-jet nozzles, lasers, or cutting bits wear down, the product will vary in size. That implies that you will wind up with a non-Gaussian, skewed distribution where the mean and the true value don’t match, and measurements of the product will not be randomly distributed around a true value.
Folks, I tried to help. But I take heart in the fact that these Lost In Spacers only talk to each other, here. In both actual tech blogs and in peer reviewed lit, they have been laughed out of superterranea.
Only inside your head.
“temperature sensors are accurate to +/- 0.5 F or about +/- 0.28 C”
Hmmm, this may be true if temperature sensors are accurate to +/- 0.50 degrees with two significant digits. It may be more appropriate to say “or about +/- 0.3 C” with one significant digit.
To be honest, a scale that has 180 graduations between the freezing and boiling of water is more precise than a scale that has only 100 graduations. That’s why a meter stick gives a closer measurement than a yardstick!
The smaller the gradations, the greater the chance of parallax error.
Which is why you make multiple, repeated measurements of the same thing in order to create a probability distribution.
We don’t make multiple, repeated measurements of the same thing with temperature. We take thousands of independent temperature measurements, with unknown noise functions, at different times, and the warmist idiots pretend those are all repeat measurements of the same thing.
You are correct. I would only note that sensor accuracy is not “station” accuracy. The sensor in a land-based station may be very accurate, but if mud dauber wasps plug up the station air flow, or if ants leave grime on the sensor, then the accuracy of the station is compromised, regardless of the accuracy of the sensor. That’s why the government uses +/- 0.6C as the standard for the station!
As an old-timer geophysicist (like Andy Middleton), it looks very much like you hand-contoured the data using your knowledge of trends from the surrounding recorded data. We did that all the time, and the product was much better than a mindlessly applied gridding algorithm. This is called interpretation and is fine, provided that you make it clear that you are not contouring real data. If the interpreted region looks good, then you had better get some real data there in the future to confirm your hunch.
You might also have extracted from the good data a latitude specific look up table relating temperature to surface elevation, and then applied that to the sampled elevation grid to get the temperature grid. I’d accept that as a better product than the gridding algorithm.
Great minds, as they say.
As an old-timer, thank you. Good gosh! Let’s leave interpolation over non-existent data to those using a slide rule and interpreting plots on semi-log paper.
However you did it, it is less than optimal. In particular, there is a large warm area in the ocean that is essentially ‘pixelated’ and has an unnatural straight N-S boundary. Also, there is a region in the Andes that is colder than any of the temperatures in the data block removed. The improvement in ‘error’ is a balancing act between generating more warm areas and more cold areas. While the average may be an improvement over no data, it looks to me as if the variance has probably increased with this approach.
The improvement of the average with such an approach may depend on the ratio of warm to cold pixels. That is to say, the ‘accuracy’ may be highly variable, depending on where the cut-out block is located.
What is needed is to randomly remove stations with measured temperatures and see if the interpolation approaches can reproduce the missing known values of the stations. The average error for reproduced (missing) stations will tell us about the precision and reproducibility of the interpolation.
That is actually the method Berkeley Earth uses for their official dataset. Their methods paper says they use jack-knife resampling with 12% data denial on each iteration. It is computationally expensive but provides a unique insight into the combined uncertainty of the spatial sampling and kriging of the temperature field.
“What is needed is to randomly remove stations with measured temperatures and see if the interpolation approaches can reproduce the missing known values of the stations.”
Already been done, years ago, by Nick Stokes, for GAT. What would be the point of reproducing the data for single stations? Yes, the data for that station will always be preferable to its spatially interpolated alternative, but so what? The goal is to find TRENDS in GAT. Your aimless suggestion is a useless strawman….
This global average is meaningless, it is certainly not climate.
It may not be meaningful to you, but it is meaningful for those of us who wish to track it and see if it is increasing or decreasing.
That is a worthwhile goal. However, I have serious doubts that the NOAA claim that “the July 2021 average is the warmest on record by 0.01 deg C” is accurate, let alone contributes to that goal. Most of us are basically complaining that more precision is being claimed than is justified, and therefore giving the appearance of greater certainty than justified.
What NOAA has claimed is unknowable given the temperature record. That’s what’s so ridiculous about it. They’re claiming knowledge of something that’s impossible to know, given the instrumentation.
Funny, bdgwx said “That is actually the method Berkeley Earth uses for their official dataset.” Which one of you is right? Maybe you should confer before posting to that you present consistent criticisms.
There is no inconsistency. ALL “methods” are primarily used to find GAT trends. RUOK?
Willis
If your chunk was always left out, anomalies should not be impacted.
My understanding is that the chunk is always changing location.
The adjacent weather stations to any missing chunk are also changing.
Additional question.
If a vast wilderness was bordered by towns with weather stations at post offices, would the wilderness get hotter as the weather stations were slowly moved to airports?
I show four example anomaly grids at 50 year intervals upthread.
Nothing like the example used by Willis.
Hi Willis,
If I were going to infill that missing data, I would start with the rest of the Ceres data, and at each latitude where data is missing calculate an “altitude adjusted” temperature (that is, go around the world at each latitude, and adjust the Ceres temperature for the local altitude with the moist adiabatic lapse rate… 6.5C per km) generating a “sea level equivalent” average temperature for each latitude band. I would then calculate an expected temperature at each point in the missing data block by adjusting the average temperature for that point’s latitude using the local elevation and the moist lapse rate. This procedure would generate “too cold” temperatures in the Andes, and “too warm” ocean temperatures in the upwelling area to the west of central South America. Like your infilled data.
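The lapse-rate procedure described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the commenter's actual code: the function name and array layout are mine, and the 6.5 C/km figure is the lapse rate quoted in the comment:

```python
import numpy as np

LAPSE = 6.5 / 1000.0  # C per metre: the 6.5 C/km moist lapse rate quoted above

def lapse_rate_infill(temp, elev, missing):
    """Infill missing cells of a (lat, lon) temperature grid:
    reduce known cells to sea-level equivalents, average per latitude
    band, then lift the band average back up to each missing cell's
    elevation using the lapse rate."""
    filled = temp.copy()
    # sea-level-equivalent temperature at every known cell
    sea_level = np.where(missing, np.nan, temp + LAPSE * elev)
    # average sea-level temperature for each latitude band
    band_mean = np.nanmean(sea_level, axis=1, keepdims=True)
    # expected temperature at each cell given its own elevation
    estimate = band_mean - LAPSE * elev   # broadcasts to the full grid shape
    filled[missing] = estimate[missing]
    return filled
```

As the commenter notes, this scheme will run "too cold" in terrain like the Andes and "too warm" over cold upwelling ocean, because it only knows about latitude and elevation, not local circulation.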
We just need to do the Tighten Up:
https://youtu.be/uN7vm-k-AaA
The areas you are measuring now should be the same areas you are comparing with the past. The problem occurs when either/both now & past contain large volumes of changing areas, missing data & COMPUTED values making up the numbers. You start comparing an unvalidated model with another unvalidated model where the real temperatures don’t matter and the results are mostly made up.
Temperatures at sites near each other may be correlated, but not by a fixed function. Site A & Site B may typically be within 2C of each other; sometimes A is higher, but sometimes B is higher. You start making assumptions about temperatures, but the range of error is significant.
Temperature & climate can vary significantly every 20km in Sydney & surrounding areas. Sydney, the South West, North West, Nepean valley, Blue Mountains vary greatly. What is the current grid resolution of the models & datasets? How do they represent reality?
We have multiple weather stations because we know that simplifying it down to just 1 doesn’t represent the average.
Willis,
Thank you for raising these issues of interpolation etc.
Here are 5 responses in short pieces.
First, rely upon comments from those who have actually performed the mathematical work and rely even more on those who have applied it at an advanced level.
Recommendation – concentrate on what Thinking Scientist has written.
Geoff S
What a number of articles, some probably published here, have revealed, using the weather data downloadable from the weather stations, is that various sizeable regions in the US have flat or cooling temperature trends, as shown by each of the multiple stations in that region. However, the official NOAA reported temperatures show a warming trend. While this has been attributed to homogenization, that isn’t so different from the demonstration here: some favored temperature is applied to an extended area surrounding the favored point.
Comment two.
Be careful when comparing methods used in geology/resource estimates with methods used in climate research. They are not easily compared.
For example, in mineral work, an ‘anomaly’ is an observation that is significantly different from the bulk of similar observations. In geochemical exploration, a stream sediment sample with 0.1 ppm of gold would be of high interest as an anomaly when other samples reported below 0.01 ppm. In geology, an anomaly is often seized upon as important because it has historically been the first indicator of a new mine.
Whereas, in climate studies, ‘anomaly’ seems to have become associated with the ‘anomaly method’, a mathematical derivation, commonly applied to temperatures, in which a baseline average is subtracted from all temperatures in an effort to remove confounding variables such as the known variation of temperature with altitude.
In climate research, much effort has gone into homogenization and other methods to downplay the influence of occasional very high and very low values, the very opposite to that is done in mineral work.
There is a different mindset. Geoff S
Comment 3.
In strict theory, the debate over interpolation versus extrapolation on a spherical surface is moot, because it is all interpolation in the sense that all extensions of data will eventually meet more data.
It does not matter which term is used.
What does matter is the assumptions that underlie the method. Geoff S
Nope Geoff, there is a big difference between interpolation and extrapolation. With interpolation, infill data point values lie between those of control points (known values). With extrapolation, infill points can take much higher or lower values than the control points.
Using extrapolation, one can easily infill to give any answer for a global average. Some people think this is exactly what alarmists are doing, estimating global temperature first (using a model), then infilling unmeasured points using extrapolation techniques to match the model average.
This is really a question of your stationarity assumption. Your view is somewhat coloured by the oil industry I expect.
With “extrapolation” under, say, simple kriging (SK), which carries a strict stationarity assumption, the extrapolated grid values will tend towards the declustered mean of the available samples. Under a stationarity assumption, kriging will not produce grid node estimates that exceed the min/max range of the input observations in any normal circumstances (it might with strange, unstable problems that induce large negative weights, though).
Kriging with a non-stationary assumption eg linear regional trend will of course get larger and larger values as you estimate outside the area of the observations – but its the assumption/inference of the linear trend that is causing it, not kriging per se.
However, “extrapolation” using a gridding algorithm based on something such as local slope projection will go totally crazy as you get outside the area of the observations, as I think you are alluding to.
You are assuming that your control points are boundary limits on the interpolated data. That doesn’t mean that the interpolated data is correct, however. Try to interpolate the temp on top of Pikes Peak from Boulder and Colorado Springs. I guarantee you that no simple form of interpolation will give you the right answer. The real world is stranger than simple math can comprehend.
Point 4.
While there is some interest in how the interpolation is done, there needs to be critical examination of how the result is used. Since I cut some teeth on the Ranger uranium deposit, I’ll use examples from that type of setting. We adopted the newly-emerging field of geostatistics in the early 1970s, working closely with its French originators.
An early stage of practical geostatistics aims to determine how irregular a mineral prospect is. For example, using analyses for uranium from samples every meter down a drill hole, you examine the difference between adjacent sample values, then samples spaced two meters apart, then those spaced three apart, and so on. This leads to a semivariogram plot, which can indicate that samples more than X meters apart have no predictive links; that is, you cannot help to estimate a missing value by using samples from more than X meters away. This can lead to the generation of a search ellipse or 3-D body which is conceptually moved, under computer program control, to infill volumes of missing data, such as exist between drill holes.
(There is much more to the methodology than this.)
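The down-hole comparison Geoff describes (pairs one meter apart, then two, then three, and so on) is the experimental semivariogram. A minimal sketch for equally spaced 1-D samples, with hypothetical values; real practice adds much more, as the comment says:

```python
import numpy as np

def semivariogram(values, max_lag):
    """Experimental semivariogram for equally spaced 1-D samples
    (e.g. uranium assays every meter down a drill hole).
    gamma(h) = half the mean squared difference of all pairs h apart."""
    gammas = []
    for h in range(1, max_lag + 1):
        diffs = values[h:] - values[:-h]      # all pairs separated by lag h
        gammas.append(0.5 * np.mean(diffs ** 2))
    return np.array(gammas)
```

Where gamma(h) flattens out (the "sill"), samples separated by more than that range no longer help predict one another — which is exactly the "no predictive links beyond X meters" point made above.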
The geostatistical method, like all methods of infilling, involves an estimate of uncertainty. This is not the same as an estimate of error. You determine error in this example by drilling more holes and comparing your estimates with the new drill hole analysis results.
IIRC, the prime use of the infilling, interpolation, kriging or whatever name, is to form a view of uncertainty that can be related to the risk of starting a mine. These are essentially financial and economic considerations and they require the tightest estimates of uncertainty that you can afford to pay for.
The methods can provide numerous imaginary pictures of ore grades between drill holes, depending on subjective assumptions. They can (and do) indicate, for example, if a barren intrusive dyke interrupts the ore grades and should be rejected during mining, after its position is refined by more drilling. They can indicate changes in the fundamental geology, such as where one rock type has more or less grade than another. Features like this lead to the design of the pit or underground mine, especially the block size, which is the fundamental volume of rock that is used in ore calculations.
The main point is that interpolations are used to estimate uncertainty. Once mining starts, it is usual for each block grade to be estimated so it can be trucked to the ore heap or the waste heap. In the case of uranium, this is made easier by radiometric ore sorters. You drive the trucks under the sorters, which direct you to the appropriate heap. But, fundamentally, one relies on chemical or radiometric analysis of blocks, not on the value obtained by interpolation.
(At the conclusion of mining, we compared the interpolated grade and tonnes at Ranger One, Number One orebody, with what was actually mined and recovered. Both grade and tonnes reconciled to within +/- 5% of pre-mining estimates.)
Now, for climate work, the take-home point here is that you need to use care in the use of interpolated values. They are an estimate with an uncertainty. Uncertainty is not the same as error. This author at least has strong reservations about using interpolated climate guesses as if they were actual data. Sadly, if one makes errors in mine resource work, the penalty can be bankruptcy and relegation to taxi driving. Similar failures in climate work seem to attract no penalty; perversely, the more spectacular cases have sometimes led to fame among colleagues, which is hard to comprehend. Geoff S
Comment 5.
Following from Point 4 above, the interpolation method requires some form of confirmation that an observation at one location is related to that at another and so has some power to predict the missing value.
Some years ago, I tried to construct semivariograms from weather station temperature data, but it is hard to find data showing a useful separation of sites. I’d like to study sites that were separated by 1, 5, 10, 50, 100, 500, 1,000 …. meters and construct curves showing their relationship in temperatures. I failed to find a good enough set of stations that were there already, so I stopped.
Then I looked at parameters related to separation of sites, getting into autocorrelation concepts, by looking at separation in time rather than in distance. Using annual temperature data from the main Melbourne station, I did find some texture indicating that T in some years could be better estimated from T in previous years, some better than others, with various lags. Then it became too semantic and I dropped it.
Most of the temperature infilling work that we see uses, to some extent, plots of temperature correlation between stations at increasing distances of separation. Most of us have seen correlation coefficients fall from about 0.8 for closely separated stations to about 0.5 at separations of 1,000 km or so, and so it has become common to match stations separated that far apart. I have long had doubts about this. On a graph, two straight lines with the same slope have an ideal correlation coefficient of 1, but that tells us nothing about their physical properties. There is an element of this “background” correlation in the methods commonly used, but I have failed to find an alternative.
In summary, all this means that one essential part of interpolation in mineral work, such as is derived from semivariograms, is missing or questionable in climate work. This is the demonstration that after a certain separation of stations, the power to estimate values at one station from the other is lost.
Willis, I think that your example does not involve this parameter. It would be better if it did.
Thank you for listening to me. Geoff S
Seems like there is a fundamental assumption in this post about this comment –
“If you only know the temp for 85% of the globe then just say “our metric for 85% of the earth is such and such. We don’t have good data for the other 15% and can only guess at its metric value” –
that may or may not be valid.
The exercise by Willis shows that when you produce a global average without interpolating the missing 15% it is less accurate than producing a global average by interpolation.
But using interpolation of the 15% and also leaving it out to produce and compare two global averages does not address the point raised in the comment, which seems to be that this is precisely what scientists shouldn’t do. Nothing in the comment implies that an average using only 85% of a sample will produce a global average more accurate than interpolating the 15%.
Now, it may be useful to interpolate globally because by doing so the inaccuracies are insignificant [though understanding their magnitude in different instances might be tougher than this post allows] or simply because one must have a global average regardless.
And if one can establish through repeated examples that the resulting inaccuracies from interpolation are always insignificant for whatever purpose the numbers are being used, fine.
But taking one random square chunk of data out and demonstrating it is more accurate to average through interpolation than average by leaving the data out in that particular case altogether tells us only how accurate it is relative to leaving the data out, not whether the resulting average is useful or its error significant or not for whatever purpose the average is being used. And it doesn’t seem to address the fundamental point in the comment.
How can you be certain of what the ‘true’ average temperature is without having a twin Earth to serve as a standard to reference the accuracy? I believe what WE has shown is that leaving out a lot of data gives a different average, not that the interpolated average is necessarily correct. It may have inherent biases that are related to how the interpolation is performed and where the data are missing.
It is a different game to remove a large area and not use it, compared to having measurements randomly missing across the entire globe. What we are talking about in the first instance is a biased sampling protocol.
WE conducted a data denial experiment. In the experiment you declare a data field be true. Then you deny your analysis some of the data in the field and see how well your analysis performs. The error is the difference between the original field and the analyzed field. You repeat the experiment for each analysis strategy and rank the strategies in order of skill.
Which still doesn’t mean that your original data field was accurate to start with. Error from an incorrect data field most likely is incorrect as well.
The original field is declared to be true and accurate for the experiment.
In other words, you create a model to give you the answer you want.
Put three sticks separated by 6 feet on the floor. Measure the left one and get 12 in. Measure the right one and get 11 in. Assume you can’t get to the middle stick to measure it, just like no data for temperature.
Now interpolate what the middle stick length should be. Is it longer than that interpolated value? Is it shorter? Is it exactly what you interpolated? How do you ever know what the value should be? How does that uncertainty affect the end result?
That, my friend, is uncertainty. It is something you don’t know and can never know. Stating that somehow you can reduce uncertainty through statistics is extremely misleading if you don’t also state the associated uncertainty.
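The three-stick example is just a midpoint (linear) interpolation; a minimal sketch of the point being made, with the measurements from the comment above:

```python
# linear interpolation of the middle stick from the two measured ones
left, right = 12.0, 11.0        # inches, as measured in the example above
middle_est = (left + right) / 2  # midpoint estimate for the unmeasured stick
print(middle_est)                # 11.5

# The interpolation reports one number, but nothing constrains the real
# middle stick to that value: it could be longer, shorter, or (by luck)
# exactly 11.5. That gap between the estimate and the unknowable true
# length is the uncertainty being argued about.
```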
Exactly, he constructed the desired universe with a computer.
What Willis did was create a control grid, a data denial grid, and a model for reconstructing the control grid by creating an analyzed grid from the data denial grid. He then assessed the skill of his model by comparing the average of the analyzed grid to the average of the control grid.
The point of a data denial experiment is to assess the skill of an interpolation model. Step 1 is to create a control grid. Step 2 is to create a data denial grid. Step 3 is to run the model on the data denial grid to create an analyzed grid. Step 4 is to compare the analyzed grid to the control grid and objectively score the results. You don’t create the model to give the answer you want. You create the model to give the answer that is closest to the control.
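The four steps can be illustrated on a toy grid. This is a minimal sketch, not Willis's actual code: the synthetic field, the block position, and the two stand-in infill models (known-cell average versus inverse-distance weighting) are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: control grid -- a synthetic field with spatial structure plus noise
lat, lon = np.mgrid[0:20, 0:20]
control = 15 + 0.3 * lat - 0.2 * lon + rng.normal(0, 0.2, lat.shape)

# Step 2: data denial grid -- blank out a block of cells
mask = np.zeros(control.shape, dtype=bool)
mask[5:12, 5:12] = True
denied = np.where(mask, np.nan, control)

# Step 3: two analysis strategies create analyzed grids
analyzed_global = denied.copy()
analyzed_global[mask] = np.nanmean(denied)      # inherit the known-cell average

analyzed_idw = denied.copy()
ky, kx = np.where(~mask)                        # coordinates of known cells
known = control[~mask]
for y, x in zip(*np.where(mask)):
    w = 1.0 / ((ky - y) ** 2 + (kx - x) ** 2)   # inverse-distance-squared weights
    analyzed_idw[y, x] = np.sum(w * known) / np.sum(w)

# Step 4: objectively score each strategy against the control
rmse = lambda a: np.sqrt(np.mean((a[mask] - control[mask]) ** 2))
print(rmse(analyzed_global), rmse(analyzed_idw))  # the local scheme scores better here
```

On this toy field the locally weighted scheme reconstructs the denied block with a smaller error than the grid-average fill, which mirrors the 0.6C-versus-0.1C comparison discussed elsewhere in the thread.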
“If you only know the temp for 85% of the globe then just say “our metric for 85% of the earth is such and such.”
This statement is 100% accurate! Does it imply that it includes the whole earth? No. Does it let one know that more needs to be determined thru additional measurements? Absolutely!
Look at this image and tell me how an interpolation routine would give a smaller value than the two end points have. You would need prior knowledge of what was happening. The real problem is that temps vary from moment to moment. Geostatistics never anticipated solving for a fluid in constant motion, as temperatures are.
Bullshit. You’re aiming at fake completeness by faking data. Being honest about what we actually know may be harder for politicians to communicate, but it’d be accurate.
There are toxic amounts of this substance in this thread.
While infilling large areas of missing data with some modeling approach may improve the estimate of the global mean, it probably does so at the cost of greater uncertainty for the global mean, and almost certainly reduces the accuracy of temperature within the infilled block.
How do the mean and standard deviation of the temperatures of the extracted areas compare with the mean and SD of the infilled temperatures?
Willis
A question that has puzzled me for a while is why fill the blanks in the first place?
We are relying on guesses to get the answer, so why not just use the known world, which gives a better apples-with-apples month-on-month comparison. As far as people are concerned, it’s their local climate that’s of primary concern, and 99.999% don’t know how the guesses work.
There are two issues with that approach. 1) The set of unpopulated grid cells changes from month-to-month. 2) The goal is to estimate the mean temperature of the whole Earth and the set of populated grid cells does not represent the whole Earth.
The fictional “mean temperature” you describe also does not “represent the whole Earth,” because you don’t have measurements of the “whole Earth.”
Actual data compared with actual data is better than a bunch of guesswork masquerading as “measurements” you don’t have.
“The idea that interpolation could be better than observation is absurd. You only know things that you measure.”
Ex falso quodlibet: from a falsehood, anything follows.
Interpolated data can be shown to be false, both empirically and theoretically. Sometimes they are not even ‘approximately true’ enough to be useful.
The idea that you can infer correctly (not by accident, but scientifically sound) true things from false things is hilarious.
That was a strawman comment. No one is saying that interpolation is better than observation. What is being said is that a local weighted interpolation scheme is better than a non-local scheme for gridded temperature data. Both of Willis’ data denial experiments above demonstrate this.
In regards to the second talking point understand that interpolated data is neither 100% right nor 100% wrong. It’s not a binary concept. There is a spectrum of possible error magnitudes. Willis’ least-effort interpolation had an error of 0.6C while the more robust interpolation was 0.1C.
‘Better’ for what? For example, the method used above managed to pseudo-warm the Earth compared with the original data (which I highly suspect is not the actual temperature field for the Earth, but largely invented for the surface).
So, from mostly false data a chunk was extracted, interpolation/extrapolation was used to guess other wrong data that pseudo-warmed Earth even more, but there is the claim that the new data is better than the one without false data extracted out. Why? Because it’s closer to the originally false result, which is hilarious.
Of course you can get similarly false results by using similarly false models.
But in reality the things are worse than in this example. You are not missing only data over a single patch of the Earth’s surface.
There were two methods of interpolation used. The first assumed the unpopulated cells behave like and inherit the average of the populated cells. This results in 0.6C of error. The second used a more robust strategy and yielded 0.1C of error. The experiment was repeated a second time with a different set of unpopulated cells and the results were the same. The more robust approach resulted in a smaller error.
Pffft, the mining industry has been interpolating between data points for ages.
Data points cost a lot to obtain. At some point you decide that your data set is good enough to draw conclusions from. More data does not tell you anything significant.
There are a whole bunch of codes which have been developed to explicitly explain how data has turned into models, what the confidence levels are and what assumptions have been made during the analysis.
A large difference between mining and climate science is that the people signing off on the models take on a personal liability for investment decisions made using those models. You say there was X in that project and it was only 0.7X then you are in big trouble. Imagine if that was applied to climate science!!!!!
Mining decisions are based on geostatistical conditional simulation models rather than kriging, in general.
Some maybe but not the majority. Then which conditional simulation run do you choose? Often the one the estimator prefers 😉
The conditional simulation runs are equi-probable when they come out of the geostatistics. You may be able to reject some post-geostats by comparison to known data. That is less likely in mining, but possible in petroleum scenarios, e.g. closure/spillpoint related to a known hydrocarbon contact, or connectivity (or not) between two compartments.
But that info should have been used in the initial conditional simulation
It may not be possible to incorporate some types of information in a prior for conditional simulation. For example, in petroleum, it would be almost impossible to include the oil water contact/spillpoint as a constraint in the geostatistical conditional simulation.
However, it is a trivial matter to generate, say, 1000 conditional simulations of a depth surface and then throw away any realisations that don’t show closure within some uncertainty range of a known oil water contact. It’s much simpler and more efficient to do it that way, and all kinds of unusual knowledge is easy to incorporate.
Btw, I do this stuff for a living, including designing, developing and selling the software and services commercially. 😉
“The mining industry has been interpolating between data points for ages” — true, but they also get it wrong on occasion. If you don’t understand the data you are dealing with, i.e. the geology and grade etc., you can only get it right by luck, and the “whole bunch of codes” is of no use. I commented on your reference to the “nugget effect” earlier in this thread, and that is a classic case of where things can go wrong: ignoring the geology and using fancy codes is of little use.
Willis,
Congalton and Green (1999) have written a book, Assessing the Accuracy of Remotely Sensed Data: Principles and Practices, that recommends an approach of comparing thematically classified imagery with reference ‘ground truth,’ using an “error matrix.” The matrix contains two error metrics, essentially false-positives and false-negatives.
If one were to treat your temperature maps as thematically classified images, where your colored temperature ranges are treated as themes or thematic categories, then the above approach could be applied to samples of your interpolation. I suspect that the accuracy would be disappointing.
I was involved in a study in 1997 for the city of Scottsdale (AZ), attempting to classify the percentage of impervious surface of the hydrologic basins within the city, which were used to estimate runoff. The city was randomly sampled and classified into several different themes, using high-resolution color aerial photography, which were then aggregated into pervious and impervious categories. The entire city was similarly classified into the same themes using moderate-resolution multispectral Landsat imagery.
Now, the interesting thing is that while the average (to three significant figures) impervious surface was the same for both approaches, the accuracy of the satellite classification was only about 65%, based on the error matrix! What happened is that errors tended to cancel across the entire city when averaged. However, what that meant is that, for individual hydrologic basins, the impervious percentage was often excessively high or excessively low to get accurate runoff predictions. Thus, there might be unpredicted local flooding based on the Landsat satellite classification, or money might be wasted by the city on improving drainage/retention on basins that didn’t need it. It was a difficult test area as, for example, undeveloped areas with heavy creosote growth tended to classify as asphalt because of the low IR reflectance and large shadow component.
In summary, I’m not a fan of using averages to assess situations. Averages should be used very carefully with appropriate concern for the ultimate application.
Coincidentally I just solved the follow clue from my book of Jumbo crosswords: “To investigate endless variation in climate could be tricky (11)”
The answer describes many of the comments here.
It seems that a lot of the discussion here about uncertainty isn’t separating the instrument accuracy from the statistical analysis of the measurements. I took 914 temperature measurements and calculated the standard deviation for the whole set, arriving at 12.6C (one decimal place in the measurements, one decimal place in the result). Then I figured the SD for the first half of the measurements and got 12.7C; with the other half I got 12.6C again. In any event, the propagation of the +/- 0.5C instrument uncertainty remains the same. So how do these “global average anomalies” with a precision in the hundredths of degrees come about, except by totally ignoring significant digits and instrument uncertainty?
Seems to me we should never see estimates better than something like +0.1C +/- 0.5C. Anything else is made-up numbers.
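As a sketch of the calculation described above: since the original 914 readings aren’t given, the data here are simulated stand-ins (the seed, mean, and spread are arbitrary assumptions), recorded to one decimal place as the comment describes.

```python
import random
import statistics

random.seed(42)

# Simulated stand-in for the 914 readings described above (the real
# data set isn't given; these values are made up), recorded to one
# decimal place.
readings = [round(random.gauss(15.0, 12.6), 1) for _ in range(914)]

sd_all = round(statistics.stdev(readings), 1)
sd_first_half = round(statistics.stdev(readings[:457]), 1)
sd_second_half = round(statistics.stdev(readings[457:]), 1)

# The SD barely moves when the set is split, while the +/-0.5 C
# instrument uncertainty attached to every reading is unaffected.
print(sd_all, sd_first_half, sd_second_half)
```

The point of the sketch is only that splitting the set changes the SD in the last reported digit at most; it says nothing about the instrument uncertainty, which rides along with every reading.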
100%. Mixing standard deviation with uncertainty all of the damn time. My faith that scientists are intelligent and honest people has been destroyed by this field.
The core concept of why a global mean temperature has lower uncertainty than individual temperature measurements is because of the standard error of the mean which states SEM = σ/sqrt(N) where σ is the uncertainty on individual measurements and N is the number of measurements. For example, if you have a grid mesh with 2592 cells and each cell has ±1.0C of uncertainty then the uncertainty of the mean of the grid is 1.0/sqrt(2592) = 0.02C.
You are illustrating the ignorance. The standard error of the mean is not the same thing as the uncertainty of the mean. Each of your N is an INDEPENDENT measurement of different places at different times with different noise functions. The central limit theorem does not make uncertainty disappear.
I’m going to tell you what I’ve told everyone else who said the same thing. Prove this for yourself by actually doing the experiment. You will conclude that all of the statistics text, expert statisticians, etc. were right. Don’t take my word for it; actually do it. It’s not that hard. The experiment can be conducted in Excel in just a few minutes. You could do a full blown monte carlo simulation in R, python, or your favorite language to put the final nail in the coffin on any lingering doubt if it still exists.
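The suggested experiment is easy to sketch in Python. Note that it builds in exactly the assumption under dispute in this thread: independent, zero-mean random errors on each cell. The N and σ values follow the 2592-cell example upthread; TRIALS is an arbitrary choice.

```python
import random
import statistics

random.seed(0)

N = 2592     # grid cells, matching the example upthread
SIGMA = 1.0  # assumed +/-1.0 C random error per cell
TRIALS = 2000

# Each trial perturbs every cell with an independent random error and
# records the resulting error in the grid mean.
mean_errors = []
for _ in range(TRIALS):
    errors = [random.gauss(0.0, SIGMA) for _ in range(N)]
    mean_errors.append(sum(errors) / N)

observed = statistics.stdev(mean_errors)  # spread of the mean's error
predicted = SIGMA / N ** 0.5              # SEM = sigma/sqrt(N), ~0.02

print(round(observed, 3), round(predicted, 3))
```

Under correlated or systematic errors the observed spread would not shrink like this; the simulation only tests the independent-error case.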
All a Monte Carlo simulation is going to do is lower your standard deviation as N goes to infinity. That has NOTHING to do with the propagation of error.
Just because your sample mean converges on the mean of the population doesn’t mean the population as a whole is more accurate than the sum of the parts. Read Taylor’s “Error Analysis.” Do the work.
The monte carlo simulation is propagating the error. That’s the point of doing monte carlo simulations. When there is doubt or disagreement on the uncertainty you can simulate the problem and let the errors propagate naturally and observe the result.
Nobody is saying “the population as a whole is more accurate the sum of the parts”. What is being said is that the uncertainty of the sample mean is less than the uncertainty of the individual measurements within the sample. This is true as long as the error of the individual measurements is randomly distributed.
Your presumption of random distribution of errors is the problem. That’s an assumption, not proven. We don’t know the noise functions of the stations. Every single measurement has an adjudged uncertainty, which propagates as the root mean square.
The uncertainty of the sample mean IS NOT a measurement! It is an interval within which the sample mean lays. The smaller the interval, the closer the sample mean is to the population mean. It is the standard deviation (SD) of the sample mean distribution.
The stationarity and normality of the error terms in air temperature measurements is not known. Every single measurement carries this uncertainty, and for any average, the uncertainty in the average is the root mean square of the uncertainties in the individual measurements.
That’s not a figure that collapses with N like the standard deviation of the sample mean. To put more color on this, quoting Frank:
“Field-calibrations reveal that the traditional Cotton Region Shelter (Stevenson screen) and the modern Maximum-Minimum Temperature Sensor (MMTS) shield suffer daily average 1σ systematic measurement errors of ±0.44ºC or ±0.32ºC, respectively, stemming chiefly from solar and albedo irradiance and insufficient windspeed. Marine field calibrations of bucket or engine cooling-water intake thermometers revealed typical SST measurement errors of 1σ = ±0.6ºC, with some data sets exhibiting ±1ºC errors. These systematic measurement errors are not normally distributed, are not known to be reduced by averaging, and must thus enter into the global average of surface air temperatures. Modern floating buoys exhibit proximate SST error differences of ±0.16ºC. These known systematic errors combine to produce an estimated lower limit uncertainty of 1σ = ±0.5ºC in the global average of surface air temperatures prior to 1980, descending to about ±0.36ºC by 2010 with the gradual introduction of modern instrumentation.”
What is the probability that two monthly global mean temperature anomaly values with uncertainty of ±0.5C (1σ) would differ by no more than 0.04C?
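As an aside, if one takes the ±0.5C (1σ) at face value as an independent, normally distributed error on each monthly value (assumptions this very thread disputes), the probability asked for works out to roughly 4.5%:

```python
from math import erf, sqrt

SIGMA = 0.5    # assumed 1-sigma uncertainty on each monthly value
DELTA = 0.04   # difference in question

# The difference of two independent normal errors is itself normal
# with sigma_d = SIGMA * sqrt(2).
sigma_d = SIGMA * sqrt(2)

# P(|D| <= DELTA) for D ~ Normal(0, sigma_d)
p = erf(DELTA / (sigma_d * sqrt(2)))

print(round(p, 3))  # about 0.045
```

Drop the independence or normality assumptions and the calculation no longer applies, which is the objection raised below.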
You don’t know! That *is* the definition of uncertainty. Uncertainty is not a probability distribution.
I asked Jim, but what is your definition of uncertainty?
Uncertainty is what you don’t know and can never know.
If I tell you I measured a bug at 1 in +/- 0.5 before it flew away, what is the possible range of the actual value?
I now measure a 2nd bug at 2 in +/- 0.5 in. What is the possible range of the actual value?
What is uncertainty in an average?
“Uncertainty is what you don’t know and can never know.”
That’s not a very useful definition. It could just as well apply to error, which you keep insisting is a completely different concept.
“If I tell you I measured a bug at 1 in +/- 0.5 before it flew away, what is the possible range of the actual value?”
That depends on how you are calculating the uncertainty. For example, is the 0.5 one standard deviation, or two, or is it a general best guess of all possible errors? [sarc] Where is your analysis starting with equation 1 of the GUM? I don’t see it.
I’d say it was probably between 0.5 and 1.5 inches, but the more pedantic answer is that the actual value doesn’t have a range. I’d also wonder how bad you were at estimating if you couldn’t tell whether the bug was bigger or smaller than 1″, and also wonder why you are using imperial measurements.
“What is uncertainty in an average?”
I’m using the standard error of the mean, or a multiple thereof, as the uncertainty. I don’t know or care whether you would call that “measurement uncertainty”, but it seems to be the same concept. Of course, this is only the uncertainty caused by random independent samples. If there are systematic errors they have to be treated differently.
So, what do you think uncertainty of an average should be?
It’s not a trick question. There is a calculation for it. An alternative to the PDF calculation is to do a monte carlo simulation. Either way the value is obtainable and it’s not that difficult. I’m asking the question because we’re all going to test Pat Frank’s hypothesis together.
Who are “we”?
Anyone who is open to testing Pat Frank’s hypothesis that the uncertainty on global mean temperatures is ±0.46C. You in?
Embedded in your Monte Carlo simulation is going to be the assumption that the error terms are known. That’s precisely the mistake you are making. You don’t know. You can’t simulate what you don’t know. Each and every measurement will have an uncertainty, and in an average those uncertainties combine via the root mean square.
Nope. The averaging routine has no knowledge of the measurement error at all.
If the averaging routine has no knowledge of the measurement error, or do you mean uncertainty, how can it possibly reduce either?
It’s funny that you somehow think consensus is proof.
I never said that. BTW…are you curious to see how Pat Frank’s hypothesis tests out?
Have you read Taylor?? You are out of your depth.
But go ahead and share the excel doc or python code that you think demonstrates uncertainty going away. We’ll look at it and explain why you are wrong, or what assumptions you made which aren’t proven.
Each measurement has an independent uncertainty. These aggregate as the Root sum of squares. In an average, they aggregate as the root mean square.
Disprove this.
https://ibb.co/bFCbTYb
Why would I want to disprove that? I agree with it. That is when you sum measurements you use RSS to determine the final uncertainty.
And since each station has an uncertainty because the error functions are not known, the uncertainty does not reduce with N.
Take 100 stations all with an uncertainty of ±0.5C. You will have 100 terms of ±0.5C squared, summing, and then square rooted. And what do you get?
SQRT(( 0.5^2 * 100)/100) = ±0.5
Do you understand now?
Nope. You’ll have 100 terms of ±0.5/sqrt(100) that you then square, sum, and square root. Dividing a value by a constant C causes the constituent pieces of that value to inherit an uncertainty that is multiplied by 1/sqrt(C).
This is easy to demonstrate with an example. Assume you have 3 boards of length 1.00±0.100. When you lay the boards end-to-end you get length 3.00±0.170. If I want to then divide this value by a constant 7 then I also have to divide the uncertainty by sqrt(7). My result of the division is (3.00/7)±(0.170/sqrt(7)) = 0.43±0.064. Note that sqrt(0.064^2 * 7) = 0.170…our original uncertainty on the combined length, so that checks out. Now, let’s say we want to divide 3.00±0.170 by 3 instead. The result is now (3.00/3)±(0.170/sqrt(3)) = 1.00±0.100. And do you notice anything special about our choice of constant 3 here? Yep, that is the constant we would use to compute the average if we knew there were 3 boards! In other words, the 3 is a special constant insofar as it is the one that yields an average. Though no constant is treated any differently than any other in terms of its effect on the uncertainties.
Keep in mind that I derived the σ/sqrt(C) formula in a post down below where σ is the individualized uncertainty of the values in the sample and C is the constant by which you want to divide each value prior to summing them. If you choose C such that C=N then you’ve actually computed the mean of the sample when you do the summing.
BTW…I have no idea where you got the formula sqrt(σ^2*N)/N. Do you mind explaining where you found that and how it was derived?
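A quick numeric check of the board arithmetic above. Note that sqrt(3)·0.1 comes out nearer 0.173 than the quoted 0.170, and that the divide-by-sqrt(C) rule encoded here is exactly the step contested in the replies; the code only restates the claim, it does not settle it.

```python
from math import sqrt

u_board = 0.100
u_stack = sqrt(3 * u_board ** 2)  # RSS for three boards end to end, ~0.173

# The comment's rule: dividing by a constant C divides the
# uncertainty by sqrt(C); here C = 7.
u_divided = u_stack / sqrt(7)     # ~0.065

# Round-trip check from the comment: RSS-recombining 7 such pieces
# recovers the stack uncertainty.
u_recombined = sqrt(u_divided ** 2 * 7)

print(round(u_stack, 3), round(u_divided, 3), round(u_recombined, 3))
```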
“Dividing a value by a constant C causes the constituent pieces of that value to inherit an uncertainty that is multiplied by 1/sqrt(C).”
That’s not how uncertainty propagates. If q = Bx where B is a constant then the resultant uncertainty equation is:
u_q^2 = u_B^2 + u_x^2 where u is the uncertainty.
Since the uncertainty of a constant is zero this becomes:
u_q^2 = u_x^2
“This is easy to demonstrate with an example. Assume you have 3 boards of length 1.00±0.100. When you lay the boards end-to-end you get length 3.00±0.170.”
Nope. When you lay down first two the overall length lies somewhere between 3 + u1 + u2 and 3 – u1 – u2.
When you add the third board your overall length will lie between:
3 + u1 + u2 + u3 and 3 – u1 – u2 – u3
If u = .1 then your final result is 3 +/- (u1+u2+u3) = 3 +/- 0.3
If you don’t believe me ask any framing carpenter. +/- .17 is a *possible* length but you don’t know if it is the true length or not. The maximum and minimum uncertainty has to be considered or you wind up with wavy ceilings, roofs, etc.
You’ve surely heard the old adage “measure twice, cut once”? If you want to lower your uncertainty then you MEASURE the actual length, you do *not* assume the uncertainty has somehow been lessened by adding boards.
How long can you avoid opening your copy of Taylor to page 54, box (3.9), to see you are wrong? It says that if q = Bx, where B is known exactly, then u_q = |B| u_x,
or in Taylor’s notation, δq = |B|δx,
or if you can’t understand the equation he spells it out: the uncertainty in q is just |B| times the uncertainty in x.
You still don’t get fractional uncertainty, do you?
You multiply by B when you are calculating fractional uncertainty! Fractional uncertainty is uncertainty/value. The value is calculated with B. The uncertainty is still delta-x!
Except that’s literally the opposite of what Taylor says. The uncertainty is delta_x times the modulus of B. He specifically points to the equations where fractional uncertainty remains the same after multiplying x by B, and they are written δq/|q| = δx/|x|.
The fractional uncertainty is uncertainty/value. That’s what the above equation is using. It is saying the fractional uncertainty in q is the same as in x, and then he goes on to show how this means the non-fractional uncertainty of x has to be multiplied by B to get the non-fractional uncertainty of q.
And in case anyone doesn’t follow he immediately gives an example where he divides a measurement by 200 and does the same to the uncertainty of that measurement. Nothing whatsoever about this being the fractional uncertainty.
If you were right, then you would not divide the uncertainty by 200, the uncertainty of a single sheet of paper would be the same as the uncertainty of the stack of paper, 0.1 inch, and the whole exercise would be pointless.
“ It is saying the fractional uncertainty in q is the same as in x, and then he goes on to show how this means the non-fractional uncertainty of x has to be multiplied by B to get the non-fractional uncertainty of q.”
Taylor says first that delta-q/q = delta-x/x. Fractional uncertainty.
Taylor: “That is, the fractional uncertainty in q = Bx (with B known exactly) is the same as that in x.”
Taylor then follows this with his example of 200 sheets.
If the stated value is 1.3 +/- .1 inches then the stated value for one sheet would be 1.3 +/- .1 inches divided by 200, giving
.0065 +/- .0005 inches.
In this example, the constant is 200. q = Bx where B = 200.
Thus the uncertainty in q becomes delta-q = (200)delta-x = .1 inches.
What is so difficult about this math? It’s the very issue we are discussing! Uncertainty ADDS. That’s all the (200)delta-x implies. Delta-x being added to itself 200 times!
You keep wanting to believe you are somehow minimizing uncertainty in some manner with this math. You aren’t.
When you have q = Bx then you have to know delta-x. In the 200 sheet example you divide a stack of 200 sheets by 200 to get delta-x. You then multiply it times 200 to get delta-q.
If each sheet were truly 0.0065±0.0005 then when you combine the sheets into a stack of 200, RSS says the combined uncertainty is (0.0065*200)±(sqrt(0.0005^2 * 200)) = 1.3±0.007.
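Checking that arithmetic as stated (values taken from the 200-sheet example; RSS over 200 sheets gives ≈0.0071, quoted as 0.007 above):

```python
from math import sqrt

n_sheets = 200
t_sheet = 0.0065   # inches, per-sheet thickness from the example
u_sheet = 0.0005   # inches, per-sheet uncertainty from the example

stack_thickness = t_sheet * n_sheets          # 1.3 inches
u_stack_rss = sqrt(u_sheet ** 2 * n_sheets)   # ~0.0071 inches

print(round(stack_thickness, 3), round(u_stack_rss, 4))
```

Notice the RSS result (±0.007) is much smaller than the ±0.1 Taylor actually measured on the stack; RSS recombination assumes the per-sheet errors are independent, which is one of the points of disagreement here.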
Good, we seem to be making progress. From this comment I take it that you now accept that the correct equation for uncertainty when q = Bx is
δq = |B|δx,
and not δq = δx, as you were previously saying. So we both accept that scaling a quantity also scales the uncertainty.
This still leaves a couple of issues. You say, I think because you still cannot accept that B can be less than 1, that B = 200, and that Taylor is saying in the case of 200 sheets of paper, that δq = 200δx.
The problem is that x is meant to be the quantity that has been measured and has a known uncertainty, whilst q is the quantity derived from x by multiplying by B. As previously gone over, Taylor’s example is measuring the whole stack of paper, not an individual sheet, so x is the stack of paper, δx is the uncertainty in that measurement (0.1″), and q is therefore the thickness of a single sheet of paper, with uncertainty δq = δx/200 = 0.0005″.
If it were the other way round, Taylor would have to be able to measure a single sheet of paper with uncertainty 0.0005″, something he insists would require a very accurate measure. The whole point of the exercise is that it’s possible to derive the single-sheet thickness with remarkable precision without expensive equipment.
“If you were right, then you would not divide the uncertainty by 200, the uncertainty of a single sheet of paper would be the same as the uncertainty of the stack of paper, 0.1 inch, and the whole exercise would be pointless.”
You *still* don’t understand uncertainty! The uncertainty in the stack of 200 is the sum of the uncertainty in each of the 200 sheets.
All Taylor is doing is working it backwards! Dividing the uncertainty of 200 sheets by 200 in order to get the uncertainty of each individual sheet! You then multiply by 200 to get back the uncertainty of the entire stack.
Please note carefully that Taylor also assumes each sheet is identical. In this case each sheet becomes, in essence, the same measurand. If each sheet has a different uncertainty then those uncertainties will still add, but it is no longer a simple multiplication by the number of sheets.
Taylor’s example is a perfect example of what I have been asserting. You keep on seeing what you want to see and not what is actually there!
“All Taylor is doing is working it backwards! Dividing the uncertainty of 200 sheets by 200 in order to get the uncertainty of each individual sheet!”
Er, yes, that’s been my point all along.
“You then multiply by 200 to get back the uncertainty of the entire stack.”
Why would you do that? We already know the uncertainty of the entire stack.
“Please note carefully that Taylor also assumes each sheet is identical.”
I wondered how long it would take for you to notice that. Yes, in fact he doesn’t just assume it, he states it as a necessary condition. That’s because he isn’t interested in the thickness of an average sheet of paper, but the thickness of a single sheet of paper.
This gets back to the second problem we keep having. You think that if a metrologist says you can use averaging to get a more precise measure of a thing, and calculates it using the statistical laws, that means that those laws can only be used to measure a single thing. I disagree. The equations for things like the standard error of the mean existed long before Taylor. All the metrology text books are using these rules for a specific application, but that does not mean you cannot use them for different applications. In particular, they don’t change just because you are measuring different sized things.
Personally, I think it’s a bit foolish to demand you know all sheets are equally thick without defining what equal means. How could you know unless you measure each one? And given there will still be uncertainty, you still couldn’t be certain. In any event, can any two sheets ever be exactly the same?
By the way, you may remember a few days ago I asked you to define what you meant by the same and different objects. In particular I asked if you considered 200 sheets of paper to be 200 different objects, or if the thing you were measuring is the thickness, which is just one thing. I take it that you do consider them to be the same thing, that is the same measurand, so I would still appreciate it if you could define the distinction between the same and different.
And what is the probability that u1, u2, and u3 would all present as +0.1 simultaneously?
You sum measurements to get an average. That’s how an average works.
No. You sum measurements that have been divided by N to get an average. THAT is how an average works. Simply summing the measurements gives you something entirely different.
LOL. It’s equivalent. 1/N is the common term. But you have to SUM the measurements AND THE UNCERTAINTIES.
No. sum(Si, 1, N) is not equivalent to sum(Si/N, 1, N). Those produce different results. Note that the latter is equivalent to average(Si, 1, N), though.
Anyway, let me do the derivation in a completely different way than what I did down below using the variance expectation properties. This time I’ll start from RSS since you want to approach the problem from a summing angle. That’s totally fine; we can do it that way too. Just know that the uncertainty of the original values Si in the sample is not the same as the uncertainty of the divided values Si/N in the resample. Here’s how this works.
When you combine measurements you use the RSS equation
[1] U = sqrt(σ^2 * N)
where σ is the common uncertainty of the constituents and N is the number of constituents.
The opposite of combining uncertainties into a final uncertainty is partitioning it back into its constituents. To find the uncertainty of the constituents we solve the RSS equation [1] for σ to yield [2].
[2] σ = sqrt(U^2/N) = U/sqrt(N).
That is the uncertainty when you partition a combined uncertainty back into the common uncertainty of the constituents.
So if an element of a sample Si has uncertainty Ui then the element Si/N would have uncertainty given by [3] after swapping U for Ui in [2].
[3] σ = Ui/sqrt(N)
Then if you want to sum the elements Si/N you can use the RSS equation [1] again but swapping σ for [3] to yield [4].
[4] Us = sqrt((Ui/sqrt(N))^2*N) = sqrt(Ui^2 * N) = Ui/sqrt(N)
where Us is the combined uncertainty after summing all elements Si/N and Ui is the common uncertainty of Si and N is the number of Si elements.
And because N is the number of elements in the sample and because we summed Si/N then we necessarily computed the uncertainty of the mean. It is worth repeating [4] in more familiar notation as [5].
[5] Umean = Ui/sqrt(N)
And here is another derivation that is even simpler and drives the point home that the uncertainty of the mean of the sample must be lower than the uncertainty of the individual elements within the sample.
Start with the RSS equation [1].
[1] U = sqrt(σ^2 * C)
where C is a constant by which you want to multiply each measurement.
We can simplify [1] into [2].
[2] U = σ * sqrt(C)
And if we choose 1/N for our constant we get [3].
[3] U = σ/sqrt(N)
And because N is the number of elements in the sample we declare [3] the uncertainty of the mean.
Let me make this perfectly clear: I’ve presented 3 different derivations of this fact, two of which start with the RSS equation. If you accept the RSS equation then you have to accept the SEM equation as well.
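For comparison, here is a minimal numeric check using the standard propagation rule from Taylor’s box (3.9), in which dividing by the exact constant N divides each element’s uncertainty by N rather than by sqrt(N); applying RSS to the N scaled terms then lands on the σ/sqrt(N) form of [5]. The N, Ui, and sample values are arbitrary.

```python
from math import sqrt, isclose

N = 100
Ui = 0.5                          # common per-element uncertainty
S = [float(i) for i in range(N)]  # arbitrary sample values

# Identity behind [4]/[5]: summing the scaled elements IS the mean.
assert isclose(sum(x / N for x in S), sum(S) / N)

# Taylor box (3.9): each Si/N carries uncertainty Ui/N.
u_element = Ui / N

# RSS over the N scaled terms recovers the SEM form of [5].
u_mean = sqrt(u_element ** 2 * N)  # = Ui/sqrt(N) = 0.05

print(round(u_mean, 3))
```

This is only a sketch under the stated independence assumption; whether that assumption holds for field temperature measurements is the question the rest of the thread argues about.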
Your random equations have nothing to do with averaging.
Are you challenging the RSS equation or the fact that average(Si, 1, n) = sum(Si/n, 1, n) = (S1/n + S2/n + … + Sn/n)?
U is U_total. sigma_i is the individual uncertainty, not the uncertainty of the mean.
That is correct. In equation [1] U = sqrt(σ^2 * C) the variable U is the total uncertainty and σ is the individual uncertainty. You’ll recognize the equation as the combined root sum squared uncertainty. Now plug in 1/N for the constant C so that the sum yields the average; note that you get equation [3] U = σ/sqrt(N). All we’re doing here is applying RSS. BTW…yes, I did the monte carlo simulation and proved this is correct. Not that I needed to, since this equation is in every statistics text and universally accepted by all statisticians, but whatever, I did it anyway because it was fun.
NO! This is not how uncertainty is calculated!
How is combined uncertainty when adding measurements calculated if not by root sum square?
You aren’t doing root-sum-squared. You are doing RSS divided by sqrt(N). Not the same things.
I’m definitely using RSS. You can see that quite clearly in equation [1] in my post.
It is only the root-sum-square of individual elements with exactly the same uncertainty. If each element has different uncertainties then you don’t wind up with a common factor equal to the number of elements.
You aren’t applying root-sum-square except for a specific, tailored example.
Once again, the average is not the uncertainty. The average is calculated from the stated values, not from the uncertainties.
What I am saying is that you pull equations out of anywhere and use them without understanding what they mean.
I thought we all agreed that root sum squared was the method for combining uncertainty when adding measurements?
You are correct. All he is doing is calculating the uncertainty of each individual member, not of the average.
It is obvious to me that he likes to buffalo people with lots of big words.
U_total^2 = (sigma_i^2) * N
U_total^2/N = sigma_i^2
U_total^2/N is the individual sigma for each member squared
sigma_i = sqrt(U_total^2/N) = U_total/sqrt(N)
This has nothing to do with the mean or its uncertainty. It only calculates the uncertainty of each individual member.
“U_total^2/N is the individual sigma for each member squared”
That’s right. We also call that the variance.
“sigma_i = sqrt(U_total^2/N) = U_total/sqrt(N)
This has nothing to do with the mean or its uncertainty.”
Your sigma_i is the combined uncertainty of adding individual members after they have all been divided by 1/N. What happens when you add up values that have been divided by N, where N is the number of members? You end up calculating the average! Test this out for yourself with an example if you don’t believe me, the statistics texts, and the expert statisticians.
“Your sigma_i is the combined uncertainty of adding individual members after they have all been divided by 1/N.”
Nope. They are *not* divided by 1/N. The sum of the stated values is multiplied by 1/N in order to determine the average of the values. That has nothing to do with the uncertainty. If the uncertainties of the individual elements are not common, then what do you divide by?
“ U = sqrt(σ^2 * N)”
This is not the generalized formula.
The generalized formula is:
U_total^2 = u_1^2 + u_2^2 + … + u_n^2.
You only get U_total = sqrt( u^2 *N) if u_1 = u_2 = u_3 = …
If the individual elements have different uncertainties then there is no N.
You cannot average away instrumental uncertainty. Say I want to determine the voltage on the output of a power supply. I have two meters, a 3.5-digit DMM and an 8.5-digit DMM. You are arguing that if I take enough readings using the 3.5-digit DMM, my average would eventually be more accurate than a single reading using the 8.5-digit DMM. Taking multiple readings may improve my accuracy due to random noise etc., but for each 3.5-digit DMM reading the uncertainty has to be included, and the final result is a mean ±uncertainty1. This uncertainty has two parts: one due to the readings and one due to the instrument uncertainty itself.
Taking readings with the 8.5-digit DMM gives a different voltage level ±uncertainty2, where uncertainty2 is much less than uncertainty1. Again, averaging can reduce random effects such as noise. But I always have a mean (best estimate) plus an uncertainty.
Try reading Doubt-Free Uncertainty in Measurement by Ratcliffe, a practical approach to uncertainty.
Before discussing this we need to be pedantic and agree on terms. I use the accepted definitions described most succinctly by the graphics below. Note that in ISO 5725 “accuracy” is synonymous with “trueness”. Informally “bias” is also used to mean the same thing which I liberally employ all of the time.
When it comes to measuring the Earth’s temperature you are missing the fundamental point: there is no true value. All we have is a set of measurements. As Ratcliffe says on page 1: “If we are weighing an object, then the ‘True Value’ is the actual value of that measurand (the weight). Sadly, we do not know the true values of any measurands, which is the reason we measure them. For example, if you are checking the electrical resistance of a resistor, the only reason you are measuring the resistance is because you do not know it. If you knew exactly the electrical resistance you would not have to measure it, and there would be no need for this monograph!”
Try to imagine the target above but with no cross-hairs. The mean of the values is simply the best estimate of the values. Since you don’t have a target point to aim at how do you know if you are accurate? You might be a precise shot but very inaccurate. But without the cross-hairs you will never know. And there is no correct location (or temperature) for you to compare your shots (measurements) against.
What you have to do is consider all possible uncertainties, maybe eliminate ones which you can confidently say are negligible, and generate an uncertainty budget. Arm shaking might be one such uncertainty, wind speed another, etc. If you say “I can measure wind speed” then wind speed measurement uncertainty must be included, and so on. The GUM provides a method for doing this.
IMO all weather stations should have a published uncertainty (to 95% CI) with traceable calibration over time. The uncertainty budgets should be calculated and published. The next step is then to show how the uncertainty from each station is combined with others to give an overall uncertainty budget. Not an easy task IMO.
Please help this guy understand. Since we don’t know that about weather stations, and since we know the instruments were less precise prior to the 80s, every single measurement has an uncertainty associated with it that does not go away, and that uncertainty was larger in the past. Every single station, measuring a different location at a different time has an uncertainty with each measurement. They do not reduce with N. Berkeley Earth and the other groups have committed a scientific travesty by pretending that errors are random and that each measurement is like a repeat measurement of some concrete thing called “average temperature.” And intelligent people have fallen for it.
“every single measurement has an uncertainty associated with it that does not go away”
I never said otherwise.
“and that uncertainty was larger in the past”
I never said otherwise.
“Every single station, measuring a different location at a different time has an uncertainty with each measurement.”
Agreed.
“They do not reduce with N.”
Agreed. But note what is being implied here. You are trying to imply that I said the uncertainty on individual measurements decreases as N increases. I never said that. I don’t think that. And I don’t want other people to think that.
What I said was that the uncertainty on the sample mean decreases as the sample size increases. That is not the same thing as saying or implying that the uncertainty on the measurements within the sample decreases as the sample size increases. The key word here is the mean. I can’t emphasize that enough.
“Berkeley Earth and the other groups have committed a scientific travesty by pretending that errors are random and that each measurement is like a repeat measurement of some concrete thing called “average temperature.”
No, they don’t. They do not pretend that individual temperature measurements are free of bias. In fact, they and other groups work really hard to identify and compensate for biases of both known and even unknown origin. They also don’t pretend that the gridding method does not inject artificial biases, or that the spatial averaging routine does not inject artificial biases either. BEST actually has a clever and unique (at least among the surface station datasets) method for quantifying and dealing with this issue. Then we have other groups (like Copernicus) which use reanalysis datasets that handle the identification and quantification of biases in a completely different way. Yet all of these groups who produce full-sphere mean temperatures provide estimates that agree to about σ = 0.05 C on a monthly value.
“What I said was that the uncertainty on the sample mean decreases as the sample size increases.”
If your sample mean isn’t getting you closer and closer to a true value then of what use is it?
If you buy 10 different 1/4″ metal rods of random length at ten different Tractor Supply stores and measure them with ten different measuring tapes you also buy, will the mean of those measurements give you a “true value” for any of the rods? What will be the irreducible uncertainty of the mean of those ten rods of random length as measured by ten different devices? If you buy an additional ten rods of random length at ten different Lowe’s and add them to your universe, will the mean change? Will it remain the same? Will it give you a better “true value” for the length of the rods? Will the uncertainty of the mean get any smaller?
“If you buy 10 different 1/4″ metal rods of random length at ten different Tractor Supply stores and measure them with ten different measuring tapes you also buy, will the mean of those measurements give you a “true value” for any of the rods?”
No. But what it will do for me is provide an estimate of the mean of those rods with an uncertainty that is σ/sqrt(10) where σ is the standard deviation of the lengths of the 10 rods.
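A minimal Monte Carlo sketch of that σ/sqrt(10) claim. All the numbers here (a nominal 100 mm rod length with a 5 mm spread) are invented for illustration; the point is only how the batch means scatter.

```python
import random
import statistics

random.seed(42)

# Hypothetical rod population: lengths around 100 mm with a
# standard deviation of 5 mm (numbers invented for illustration).
def sample_mean_of_rods(n=10, mu=100.0, sigma=5.0):
    rods = [random.gauss(mu, sigma) for _ in range(n)]
    return statistics.mean(rods)

# Repeat the "buy 10 rods and average them" experiment many times
# and look at how the sample means themselves scatter.
means = [sample_mean_of_rods() for _ in range(20000)]
spread = statistics.stdev(means)

print(round(spread, 2))           # scatter of the sample means
print(round(5.0 / 10 ** 0.5, 2))  # sigma/sqrt(10)
```

The observed scatter of the sample means lands close to sigma/sqrt(10), which is the claim being made; it says nothing about any individual rod.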
I think the issue here is that you think we’re saying that the mean provides information about specific elements (rods in this case) and that having a bigger sample of elements means you reduce the uncertainty regarding specific elements. That’s not what we are saying at all. What we’re saying is that the uncertainty of the mean decreases as the sample size increases. I’ll repeat the mean is the crucial context here.
“What will be the irreducible uncertainty of the mean of those ten rods of random length as measured by ten different devices?”
If each of those devices produce measurements with an uncertainty σ then the uncertainty of the mean is σ/sqrt(N). Note that σ is assumed to be the final combined uncertainty of each measurement including any truncation uncertainty, precision uncertainty, and random accuracy uncertainty.
“If you buy an additional ten rods of random length at ten different Lowe’s will and add them to your universe will the mean change?”
Yes. It probably will.
“Will it give you a better “true value” for the length of the rods?”
No. But it will provide a better estimate of the mean of the 20 rods as compared to the mean of the original 10 rods. I cannot emphasize the importance of the mean enough here. That’s why it is bold.
“Will the uncertainty of the mean get any smaller?”
YES!
“No. But what it will do for me is provide an estimate of the mean of those rods with an uncertainty that is σ/sqrt(10) where σ is the standard deviation of the lengths of the 10 rods.”
You are right, the answer is no. But the uncertainty of the mean is the RSS of the individual uncertainties. You do not divide by N like you do when calculating the average. It’s just the RSS.
You keep confusing how precisely you can calculate the mean with the uncertainty of the mean. They are *not* the same thing.
“That’s not what we are saying at all. What we’re saying is that the uncertainty of the mean decreases as the sample size increases. I’ll repeat the mean is the crucial context here.”
The interval within which the mean exists is not the same thing as how uncertain the mean *is*. I know that is a difficult concept to understand for someone not versed in physical science but it *is* a truism. The uncertainty of the mean *is* based strictly on the combined uncertainty of the elements, not on how precisely you calculate the mean.
And, once again, if the mean doesn’t match any single element then of what use is it? Calculating that mean more and more precisely doesn’t tell you anything you can physically use regarding the elements.

That’s the problem with the GAT. It tells you nothing. You don’t know if minimum temps are going up, maximum temps are going up, minimum temps are going down, maximum temps are going down, or any combination of the above. The GAT would only be useful if it told you something about the temperatures, i.e. the individual elements. Even the daily mid-range value calculated for a single location from a single station tells you nothing about what actually, physically happened at that location that day. That mid-range value can be the result of all kinds of combinations of minimum and maximum temps.

And if the individual station taking those temps has an uncertainty of +/- 0.5C then the mid-range value has an uncertainty of +/- 0.7C. Uncertainty grows, it doesn’t decrease. You don’t get +/- 0.35C by dividing by 2. Average mid-range values from two locations and the uncertainty of the average grows even more, to about +/- 1C, not +/- 0.5C.
It’s the board example all over again.
“If each of those devices produce measurements with an uncertainty σ then the uncertainty of the mean is σ/sqrt(N).”
Why are you stuck on this? If you have ten different measuring devices you most likely don’t have a common uncertainty for each of the ten.
How do you handle that? You won’t have an N.
You are still stuck on conflating how precisely you can calculate the mean with how uncertain the mean is.
“ Note that σ is assumed to be the final combined uncertainty of each measurement including any truncation uncertainty, precision uncertainty, and random accuracy uncertainty.”
What makes you think there is a common σ?
“No. But it will provide a better estimate of the mean of the 20 rods as compared to the mean of the original 10 rods.”
Will that mean help you build a frame for a box that is square in all dimensions? If not, then how does an ever more precisely calculated mean help you in any way physically?
How does a mean temperature that doesn’t match any of the constituent elements help you in any way physically? Does that mean going up imply the earth is going to turn into a cinder? How do you know that from the GAT?
I don’t disagree with anything you said here. But nothing you said challenges the fact that the uncertainty of the mean of the sample is always less than the uncertainty of the individual measurements within the sample. That is the broad concept that explains why the uncertainty on a global mean temperature is lower than the uncertainty of the individual measurements upon which it is based.
Incorrect. It completely depends on the stationarity and normality of the error terms. You’re assuming a lot in your statement without realizing it.
All I’m assuming is that the error on the individual measurements is randomly distributed and that the sampling itself is random.
How do you KNOW what probability distribution the random measurements provide? How do you know the errors are randomly distributed? Are you assuming a normal distribution for both? If so, why?
When you are measuring different things using different things (i.e. different temps measured with different measuring stations) you don’t know what the errors might be. Or at least I don’t. You can easily have more stations reading high instead of low or reading low instead of high. What is the assumption of random distribution based on?
And how do you know the population from which you are sampling a bazillion times is not changing over time?
You don’t know! That just adds to the uncertainty!
“ All I’m assuming is that the error on the individual measurements is randomly distributed and that the sampling itself is random.”
Exactly. You are assuming things that are known to be false. Error isn’t randomly distributed; it’s systematic. That’s the crippling error you’ve made. Each and every station has an uncertainty, and that propagates into your GAT. Your confidence interval on the sample mean, excluding their uncertainties, will narrow as N goes up, but the uncertainty doesn’t go away. All you get is a narrow confidence interval of the sample mean ignoring uncertainty, presuming every element of N is a perfectly accurate measurement.
I never said error in the context of global mean temperatures was completely random. In fact, if you follow my posts closely you will have seen that I pointed out several times that the uncertainty on global mean temperatures is higher than the uncertainty of the mean would suggest precisely because of measurement and sampling bias. And I especially never said the uncertainty goes away. I didn’t claim it did. I don’t believe it does. And I don’t want other people to think it either.
My only claims in this blog post are:
1) Interpolation of unpopulated cells in a gridded temperature field using a local weighted strategy is better than a non-local strategy.
2) The uncertainty of the sample mean is lower than the uncertainty of the individual elements within the sample.
“2) The uncertainty of the sample mean is lower than the uncertainty of the individual elements within the sample.”
Wrong, as has been shown. You think the standard deviation of the sample mean is uncertainty, and it’s not. Uncertainty propagates throughout the entire calculation, and it’s impossible to get a GAT to within +-0.05C with thermometers rated to +-0.6C, which aren’t regularly calibrated. A giant band of uncertainty overwhelms whatever arbitrarily small confidence interval you get for your so-called sample mean.
“Try to imagine the target above but with no cross-hairs. The mean of the values is simply the best estimate of the values. Since you don’t have a target point to aim at how do you know if you are accurate? You might be a precise shot but very inaccurate. But without the cross-hairs you will never know. And there is no correct location (or temperature) for you to compare your shots (measurements) against.”
Very good explanation! Hope you don’t mind if I use it.
I would only add the fact that each of those shots could be from a different firearm if you are trying to simulate measuring different things. So does the mean value tell you anything about the true value of each of the rifles?
BTW, the federal standard for land-based measuring stations is +/- 0.6C (taken right out of their handbook). If you don’t know the exact uncertainty of each station then using this value will at least get you in the ballpark.
You are dealing with error again, not uncertainty.
Uncertainty is basically a square distribution (not a unit square) where the only possible values all have a probability of 1. IOW, any value is just as possible as another and you can’t know what the real value is inside the interval.
“…where the only possible values all have a probability of 1”
That’s nonsense, how can all values have a probability of 1?
The distribution in an uncertainty can be uniform, or it can be normal, or it can be any shape. If you have no information about the source of the uncertainty you have to assume a rectangular distribution, but if you have more information you can use a more accurate distribution. This is illustrated in the GUM 4.4.4 and 4.4.5.
Actually all but one of the values in an uncertainty interval have a probability of zero. Only one has a probability of 1. The problem is that you don’t know which value has the probability of 1. And there is no way to determine it.
Uncertainty is *not* a probability distribution so there is no “mean” that can give you a true value.
True, and I probably should have used more statistically correct language, but I also find this sort of argument a bit pedantic. The fact is, if you don’t know what a value is, it can be described as having an uncertain value given what you know. If I ask you to pick a card, don’t look at it, then ask you what the probability of it being a heart is, you would be entitled to say 1 in 4, rather than it might be 1 or it might be 0.
You can never know the true value, but you can estimate how close your measurement is likely to be to it, that’s the uncertainty and it has a probability distribution.
Your example doesn’t work. It has nothing to do with calculating a mean.
If the question was “What are the odds that the card you picked is the mean of the deck? That it is the true value for the deck?”
You wouldn’t have any idea and couldn’t have any idea.
If you don’t know what the true value is then how can you estimate how close you are to it? The uncertainty interval only tells you an interval in which the true value might lie, it can’t tell you how close you are to the true value.
Again, uncertainty has *NO* probability distribution – NONE. If it did then you *could* estimate how close your stated value is to the true value. Again, only one value in the uncertainty interval has probability of 1 as being the true value. All the rest have zero probability. The problem is that you DO NOT KNOW which value has a probability of one and you can never know! If you did then the uncertainty interval would be +/- 0.0!
My example had nothing to do with calculating a mean; it was about how you describe probability when talking about uncertainty. If you follow the comments backwards you will see this originated with the example from Jim talking about the uncertainty of a single measurement.
“If you don’t know what the true value is then how can you estimate how close you are to it?”
What do you think Taylor, the GUM and all the other authorities on measurement uncertainty are doing? They are saying that even though you can never know the true value, you can define and estimate the uncertainty. A measure of uncertainty is an indication of how far you are likely to be from the true value. If you define the uncertainty interval as an interval in which the true value might lie, you are indicating how far any measurement might be from the true value.
If one uncertainty interval is ±10mm, and another is ±0.01mm, is a measure with the second uncertainty more or less likely to be closer to the true value than a measure from the first?
Taylor doesn’t say you can estimate how close you are to the true value. He says the true value can lie anywhere in the uncertainty interval. If the true value can be anywhere in the uncertainty interval, then how does it indicate how far off you are from the true value?
The uncertainty interval is an interval around the stated value not around the true value. The true value may be at the negative boundary, it may be at the positive boundary, or anywhere in between.
“If one uncertainty interval is ±10mm, and another is ±0.01mm, is a measure with the second uncertainty more or less likely to be closer to the true value than a measure from the first?”
You simply don’t know! Even if the uncertainty interval is +/- 10mm the stated value may actually be the true value! You don’t know! It may be far more accurate than a stated value with +/- 0.01mm. YOU DON’T KNOW! That’s the whole point of uncertainty!
You are confusing precision with uncertainty. The measured value with an uncertainty interval of +/- 0.01mm is going to be far more precise and repeatable than one with a +/- 10mm. But precision is not uncertainty!
“Taylor doesn’t say you can estimate how close you are to the true value. He says the true value can lay anywhere in the uncertainty interval. If the true value can be anywhere in the uncertainty interval then how does it indicate how far off you are from the true value.”
Common sense. What’s the point of trying to reduce the uncertainty if not to get a more accurate measurement? Why average multiple measures if the average will not be more likely to be closer to the true value?
You said yourself that the true value might be at the edge of the interval. If the edges are closer together, that reduces how far apart the measurement and the true value can possibly be.
“You simply don’t know! Even if the uncertainty interval is +/- 10mm the stated value may actually be the true value!”
You missed the all important word “likely”. I’m not asking for certainty, I’m asking is one thing more likely than the other.
“Common sense. What’s the point of trying to reduce the uncertainty is not to get a more accurate measurement? Why average multiple measures if the average will not be more likely to be closer to the true value?”
You improve uncertainty from independent, random measurands by using better measurement techniques and equipment, not by trying to pretend that the measurements are all from the same measurand.
Even after doing this you still face the problem that the mean of random, independent measurands is meaningless because they may not define a probability distribution that can be analyzed using statistics. The mean may not even match any of the members of the data set.
“You said yourself that the true value might be at the edge of the interval. If the edges are closer together that reduces the chance that they will be further apart.”
But you can’t arbitrarily reduce uncertainty of the mean by dividing it by N or sqrt(N), not with random, independent measurands. You can only do that if you have a known probability distribution around a single measurand.
“You missed the all important word “likely”. I’m not asking for certainty, I’m asking is one thing more likely than the other.”
You are still hell-bent on confusing precision with accuracy, aren’t you? If you do not know the true value then the stated value with the widest uncertainty might very well be the true value; you simply don’t know. Same with a stated value with a smaller uncertainty interval. And you can’t decrease the uncertainty interval for random, independent measurands by pretending you have a Gaussian distribution and dividing by N or sqrt(N). The uncertainty of the mean in this case is the uncertainty associated with the final result of your sum of the member values. That uncertainty grows without bound as you add more random, independent measurements.
Again, I simply do not understand why this is such a hard thing to understand. I had it drummed into me in all of my engineering labs back in the 60s and 70s. You simply can’t find a “true value” using single measurements of multiple circuits, multiple different motors, etc. You can only find a true value for each individual, independent, random measurand by doing multiple measurements on that measurand. The average of those true values won’t give you a separate true value for the universe of measurands. The average you reach will have an uncertainty that grows with each independent, random measurand you add to the bunch.
“Even after doing this you still face the problem that the mean of random, independent measurands is meaningless because they may not define a probability distribution that can be analyzed using statistics. The mean may not even match any of the members of the data set”
The measurand is the mean we are trying to measure by averaging random independent measurements of random samples. Those samples are from a probability distribution – the population. Any random errors in the measurements just alter the distribution, but not usually by much. You can of course analyse all of this statistically – statisticians have been doing it for over a century.
There is absolutely no requirement for the mean to match any of the members of the population. Why on earth do you keep thinking that is a problem? If the population is the roll of a 6-sided die, the mean will be 3.5; no individual roll will match that.
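For what it’s worth, the die example is easy to verify: the exact expected value is 3.5, and simulated rolls converge on it even though no single roll can equal it. Purely illustrative.

```python
import random
from fractions import Fraction

# Exact expected value of a fair six-sided die.
faces = [1, 2, 3, 4, 5, 6]
exact_mean = Fraction(sum(faces), len(faces))
print(exact_mean)  # 7/2, i.e. 3.5

# Simulated mean of many rolls converges on 3.5 even though
# no individual roll can ever equal it.
random.seed(1)
rolls = [random.choice(faces) for _ in range(100000)]
print(sum(rolls) / len(rolls))  # close to 3.5
```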
“But you can’t arbitrarily reduce uncertainty of the mean by dividing it by N or sqrt(N), not with random, independent measurands.”
Oh, yes you can. This argument will keep going on until you provide some evidence for your claim.
“If you do not know the true value then the stated value with the widest uncertainty might very well be the true value…”
It may be, but it’s less likely. And I’m not interested in finding the “true value” as that’s something that’s very unlikely to happen, and you can never know if it does. What I want to do is reduce the uncertainty so that the sample mean is more likely to be closer to the true mean.
“The uncertainty of the mean in this case is the uncertainty associated with the final result of your sum of the member values.”
And you are still claiming this, despite it defying all logic, experimentation and every textbook I can find. So again, produce some evidence to support your claim. I mean, it shouldn’t be that difficult if it were true. Find a quote in one of your books that says that, or run a simulation, or any sort of experiment.
“Again, I simply do not understand why this is such a hard thing to understand.”
Because it defies all sense, and you haven’t produced a single sensible argument to support it. For one thing it implies that the uncertainty of a mean of measurements can be bigger than the uncertainty of any individual measurement. It implies that if you measure 100 men with an uncertainty of 1cm, the uncertainty in the mean could be 10cm. It implies that if you take the measure of 1,000,000 men the true mean could be 10 meters different from your sample mean. That’s what I find hard to understand.
“I had it drummed into me in all of my engineering labs back in the 60’s and 70’s.”
Maybe engineering labs in the 60s and 70s should have spent less time drumming rules into you and more time getting you to understand what they meant.
Wouldn’t the mean actually be 4, given the one significant digit on the face of a die?
Of course not. The actual mean is 3.5. This isn’t even anything to do with uncertainty; it’s the expected value of a fair die.
Claiming it should be 4 just illustrates the wrongness of people here claiming you should state calculated measures to the same number of digits as the individual measures. I don’t know why they think that; it makes no sense and is not what any rule for reporting digits says, at least none I’ve seen. But for some reason it seems to have become an article of faith, mainly I assume so they can claim global anomalies should only be reported to the nearest integer, which they think will mean global warming doesn’t exist.
Holy Christ, a mean is not a measurand. It’s a descriptive statistic.
Granted I know little about metrology, but as far as I can see a measurand can be any thing or quantity you want to measure. I’ve found nothing to suggest you cannot treat a population mean as something to be measured. But if you can’t that’s fine, it just means that none of the documents I keep being pointed to on metrology have anything to say about the statistics of taking means.
If that is the rule then no station temperature value can be considered a measurement since even the “raw” values are actually the average of 6 samples taken 10 seconds apart. And the Tmin and Tmax are actually the min/max of all 5-minute averages. Obviously this is for modern era digital instrumentation.
That is also true. The average of the high and low is about as arbitrary as the GAT, but at least you’re taking it from the same location each day.
With temperature, you aren’t taking multiple measures of the same thing. You’re taking measures at thousands of different places, at different times. You’re measuring millions of different things and presuming it represents a single thing.
I keep asking for a definition of “the same thing” versus “different things”, and never get an answer. Let alone an explanation of why the laws of probability change between the two.
As far as I’m concerned if you take millions of measurements of different temperatures, you are measuring the single thing the mean temperature. If I want to estimate what the global temperature is on a given day, or over a specified period, how do I do that by measuring just one place at one time?
You type in circles.
The card deck is fixed (unless you are cheating).
“You cannot average away instrumental uncertainty.”
I never said you could. What I said is that the uncertainty of the mean is lower than the uncertainty of the individual measurements.
“You are arguing that if I take enough readings using the 3.5 digit DMM that my average would eventually be more accurate than a single reading using the 8.5 digit DMM.”
No. That is definitely not what I’m arguing. Any accuracy error or bias will show up in the average as well. What averaging will do is result in a lower precision error for the mean than for individual measurements.
“Taking multiple readings may improve my accuracy due to random noise etc.”
No. It won’t. Any accuracy error or bias will show up in the average as well. What averaging will do is result in a lower precision error for the mean than for individual measurements.
“This uncertainty has 2 parts, one due to the readings and one due to the instrument uncertainty itself.”
Agreed. There is uncertainty that arises from digit truncation, rounding, etc. and there is uncertainty that arises from noise. These uncertainties combine via RSS as sqrt(u1^2 + u2^2) to form the final combined uncertainty u3. Any use of uncertainty in the standard error of the mean formula should be using the combined uncertainty. That is not being challenged by me.
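A tiny helper illustrating the RSS combination just described. The component values u1 and u2 are invented for illustration; nothing here depends on any particular instrument.

```python
import math

def combine_rss(*uncertainties):
    """Root-sum-square combination of independent uncertainty components."""
    return math.sqrt(sum(u * u for u in uncertainties))

# Hypothetical components: truncation/rounding u1 and noise u2.
u1, u2 = 0.05, 0.12
u3 = combine_rss(u1, u2)
print(round(u3, 3))  # 0.13
```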
I will say that when someone tells me the measurement uncertainty is ±X, I’m assuming X is the final combined uncertainty. And when working with large measurement sets from different instruments I’m assuming X also includes any calibration/accuracy/trueness/bias uncertainty as well.

Though in the context of global mean temperature datasets, any calibration/accuracy/trueness/bias at the station level is mitigated (not eliminated) by transforming the measurements into anomalies. The transformation to anomalies does not, however, mitigate sampling error, which is a point of accuracy/trueness/bias injection itself. This is why in practice the final uncertainty of global mean temperatures ends up being higher than the standard error of the mean formula suggests.

And if there is a systematic bias on the global mean temperature anomaly (there usually is) it won’t matter when you do linear regression trends, because it only affects the y-intercept without affecting the slope, assuming of course the systematic bias is time invariant. The real issue for trends is that the systematic bias is often time evolving. A lot of work goes into assessing and handling these time-evolving biases so that trend estimates are closer to truth.
“Try reading Doubt Free Uncertainty by Ratcliffe. A practical approach to uncertainty.”
Will do.
Why would the uncertainty of the average be lower than the uncertainty of individual measurements?
And don’t say “try a monte carlo simulation.”
What mathematical reason?
Let
[1] X be some variable
[2] Y = X/n
and given the property of expectation
[3] var(a) = E(a^2) – E(a)^2
therefore
Y^2 = (X/n)^2 = (1/n^2) * X^2 using [2]
E(Y^2) = E((1/n^2) * X^2) = (1/n^2) * E(X^2)
and then
var(Y) = E(Y^2) – E(Y)^2 using [3]
var(Y) = E((X/n)^2) – E(X/n)^2
var(Y) = (1/n^2)*E(X^2) – ((1/n)*E(X))^2
var(Y) = (1/n^2)*(E(X^2) – E(X)^2)
var(Y) = (1/n^2)*var(X) using [3]
sqrt(var(Y)) = sqrt((1/n^2) * var(X))
sqrt(var(Y)) = sqrt(1/n^2) * sqrt(var(X))
σ(Y) = σ(X/n) = σ(X)/n
Now take X to be the sum of n independent measurements, each with standard deviation σ, so that var(X) = n*σ^2 and σ(X) = σ*sqrt(n). Then
σ(Y) = σ*sqrt(n)/n = σ/sqrt(n), i.e. SEM = σ/sqrt(N)
The conceptual reasoning is that when you divide the values in a sample by a constant you divide the variance by the square of the constant. Variance is the square of the standard deviation. The choice of the constant ‘n’ in the derivation above should be obvious from the context.
The other interesting concept invoked is the CLT. That means that if you sample a population randomly, then all you need are 30 or so values in each sample for your sample means to fall into a normal distribution, even if the population itself is not normal. This is important because in the context of a global mean temperature we cannot sample the temperature at every infinitesimal point at every infinitesimal moment in time. But in most cases we can get close enough.
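Both ideas, the SEM formula and the CLT, can be illustrated with a quick simulation. The population here is deliberately non-normal (exponential), and all parameters are invented for illustration.

```python
import random
import statistics

random.seed(7)

# Deliberately non-normal population: exponential with mean 2.0.
# For an exponential distribution, sigma equals the mean.
SIGMA = 2.0

def sample_mean(n=30):
    return statistics.mean(random.expovariate(1 / SIGMA) for _ in range(n))

means = [sample_mean() for _ in range(20000)]

# The sample means cluster around the population mean with a spread
# close to sigma/sqrt(30), even though the population is skewed.
print(round(statistics.mean(means), 2))   # near 2.0
print(round(statistics.stdev(means), 2))  # near sigma/sqrt(30)
print(round(SIGMA / 30 ** 0.5, 2))        # 0.37
```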
BTW…it is the sampling methodology that primarily causes global mean temperature estimates to have uncertainties higher than what the SEM derived above would imply. Sampling methodology includes the sparseness of observations in some regions and the interpolation scheme used to fill in grid cells when that happens.
You can’t “SAMPLE” global average temperature. It doesn’t exist as a measurable thing. It is calculated based off the data you collect, the data you make up from thin air in the case of interpolation, and every data point has uncertainty. That uncertainty does not go away. Even if you are just taking one temperature station, repeated measurements aren’t measuring the average temperature for that station.
T(t) = Tmeasured(t) + ErrorSyst(t) + ErrorRand(t)
The error functions are unknown. Taking more measurements does not decrease your error because with each measurement, the actual temperature has changed and the error functions give different noise. Since you don’t know what those functions are, your best case is an adjudged value for the uncertainty.
T(t) = Tmeasured(t) ± Uncertainty
Taking additional measurements does not make that uncertainty go away. You are not doing repeated measurements of the same thing.
The math you have described above simply shows that the sample average of any data set converges on the population average as N goes to infinity. It says NOTHING about uncertainty. Consider a situation in which systematic error of unknown origin and non-stationarity contaminate your measurements. Adding more N does NOTHING to reduce uncertainty.
You didn’t start off correctly.
Y ± u_Y = (X ± u_X)/(n ± u_n)
You left uncertainty out of your starting equation and never propagated it through correctly.
Nope. If ±u_Y is the uncertainty on Y, then ±u_Y/sqrt(n) is the uncertainty on Y/n. I derive this via the RSS equation in a post above. You can also test this for yourself with a Monte Carlo simulation if you don’t believe me. And no, I didn’t leave the propagation of uncertainty out of my derivation. Note also that I derived an equation that appears in all statistics texts, is universally accepted by expert statisticians, and has been confirmed ad nauseam by Monte Carlo simulations. So yeah…it’s right.
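The Monte Carlo check suggested above is a few lines of code. A sketch (Python assumed; note that it only demonstrates the case of independent, zero-mean errors, which is exactly the assumption the other commenters dispute):

```python
import random
import statistics

random.seed(1)

n = 100            # measurements averaged per trial
u = 0.5            # standard uncertainty of each individual measurement
TRUE_VALUE = 20.0

# Repeat the experiment 10,000 times: average n noisy measurements each time.
trial_means = [
    statistics.fmean(TRUE_VALUE + random.gauss(0, u) for _ in range(n))
    for _ in range(10_000)
]

# With independent zero-mean errors, the scatter of the trial means comes out
# close to u / sqrt(n) = 0.05. A shared (systematic) error would not shrink.
print(round(statistics.stdev(trial_means), 3))
```

Change `random.gauss(0, u)` to a fixed offset shared by all measurements in a trial and the 1/sqrt(n) shrinkage disappears, which is the counter-argument made elsewhere in the thread.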
Nope. If you divide by N then you get the individual uncertainty for each member. Remember the 200 sheets of paper.
U_total = 200(u_i).
u_i = U_total/200
I simply don’t know what you think you are getting by dividing by sqrt(N).
Yeah, let’s take a look at that sheet scenario.
The stack with 200 sheets is 1.3±0.1.
When you divide the stack into individual sheets with trivial division of the uncertainty by 200 you get 0.0065±0.0005.
Now let’s recombine the individual sheets into a stack again. We all agree that you use RSS in this case which results in (0.0065*200)±(sqrt(0.0005^2 * 200)) = 1.3±0.007.
Hmm…the original stack was 1.3±0.1. We divided it into individual sheets at 0.0065±0.0005. Finally we recombined the sheets back into a stack with 1.3±0.007. See the problem?
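The stack arithmetic above can be reproduced in a few lines (a sketch in Python, my choice of language; units left abstract as in the example):

```python
import math

n = 200
stack_height, stack_u = 1.3, 0.1   # one measurement of the whole stack

# Per-sheet value from trivial division of both value and uncertainty:
sheet_height = stack_height / n    # 0.0065
sheet_u = stack_u / n              # 0.0005

# Recombine 200 sheets treating per-sheet uncertainties as independent (RSS):
recombined = sheet_height * n
recombined_u = math.sqrt(n * sheet_u ** 2)

# The RSS recombination gives about ±0.007, not the original ±0.1 --
# the mismatch the comment is pointing at.
print(round(recombined, 4), round(recombined_u, 4))
```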
Right on.
Let Tact, i(τ) be the actual temperature at location i at time τ.
Let Tmeasured, i(τ) be the measured temperature at location i at time τ
Let W i(τa, τb ) be a weather function of unknown nature which gives the temperature difference between times a and b, where a < b.
In other words:
Tact, i(τb) = Tact, i(τa) + W i(τa, τb )
Which means that there is always a chance that temperature has changed between time a and b.
Let εsys, i(τ) be a systematic error function for location i at time τ which can offset the measured temperature from the actual temperature. Nothing is known about this function, its normality, or its stationarity.
Let εrand, i(τ) be a random error function for location i at time τ which can offset the measured temperature from the actual temperature.
Accordingly, here is how actual and measured temperature relate.
Tmeasured, i(τ) = Tact, i(τ) + εsys, i(τ) + εrand, i(τ)
Note this also means that:
Tmeasured, i(τ) = Tact, i(τ-1) + W i(τ-1, τ) + εsys, i(τ) + εrand, i(τ)
Looking at the change in temperature, if any, between times τ-1 and τ, we do not know what part of the change is due to weather (actual change) and what part is due to the systematic and random error functions.
Let Si be a set of measurements Tmeasured, i for location i.
The average temperature for location i will be:
Sum(Si)/n
where n is the number of measurements.
This can be expanded:
Sum(Tact, i(τ) + εsys, i(τ) + εrand, i(τ))/n
And it follows that:
Sum(Tact, i(τ-1) + W i(τ-1, τ) + εsys, i(τ) + εrand, i(τ))/n
Note, even if the function term W i(τa, τb ) is very small so as to be ignored, a measurement at time τ-1 cannot be used as a proxy for a measurement at τ because of the error terms.
If W i(τa, τb ) is very small over large time periods so as to be ignored and εrand is N ( 0 , σ ), the random errors will cancel, but the εsys, i(τ) terms will not.
If there is no systemic error, and the random errors cancel, and W i(τa, τb ) is very small so as to be ignored, then as n increases, the average error term will approach zero.
The problem with temperature measurements is that W i(τa, τb ) is not negligible. Temperatures vary day to day at the same time of day and also vary within the day. In addition, systematic error is known to exist in stations. We know that errors are not normally distributed. Even assuming random errors that cancel, we cannot tell the difference between a change in Tmeasured that is caused by W i(τ-1, τ) and a change that is caused by εsys, i(τ).
This is a cone of ignorance that persists. Each measurement stubbornly comes with this uncertainty, and we cannot dispense with it by increasing n.
Just because the standard deviation of your presumed sample mean is predicted to decrease with n is completely and utterly irrelevant. That presumed sample mean will have an uncertainty associated with it related to the RSS of the measurement uncertainties.
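The εrand versus εsys distinction in the derivation above can be illustrated numerically: averaging shrinks the random term but leaves a systematic offset untouched. A sketch (Python assumed; the 0.3° bias and 0.5° noise are invented purely for illustration):

```python
import random
import statistics

random.seed(7)

TRUE_TEMP = 15.0
BIAS = 0.3       # eps_sys: a constant systematic offset, unknown to the observer
SIGMA = 0.5      # eps_rand: N(0, sigma) random error

for n in (10, 100, 10_000):
    readings = [TRUE_TEMP + BIAS + random.gauss(0, SIGMA) for _ in range(n)]
    error_of_mean = statistics.fmean(readings) - TRUE_TEMP
    print(n, round(error_of_mean, 3))

# As n grows, the error of the mean converges to the bias (0.3), not to zero:
# more N wipes out eps_rand but does nothing about eps_sys.
```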
Suppose a digital voltmeter is used to read the temperature sensor: the manufacturer’s data sheets will include detailed information about the errors that will be encountered and must be included in the temperature measurement uncertainty. These include:
1) temperature coefficient error caused by the instrument’s temperature not being 25°C
2) digitization error
3) calibration drift, the bounds of which increase with time since its last calibration
Note that none of these can be assumed to be random that will disappear in an average, each one has to be accumulated into the uncertainty of the average.
#1 would certainly cause a bias. But that should be easily corrected with adjustments in the unlikely event that it actually occurs in practice. You typically only place instruments into service that are calibrated for the expected temperature range they will be exposed to.
#2 might be a problem if the instrument has a transmitter that produces a 4-20 mA signal that must then be scaled, and if the scaling was configured improperly. But again, that should be easily corrected with adjustments.
#3 could introduce a bias especially if all measurements use the same type of instrument. But like #1 preferential drift directions should be easily corrected with adjustments. If the drifts are random then this would not be an issue.
There are other biases as well.
4) time of observation bias
5) station moves
6) instrument changes
7) non climatic environmental changes (UHI)
8) improper commissioning
and many more.
BTW…A lot of these biases become moot when transforming absolute temperatures into anomalies as well.
You can’t adjust for biases you don’t know about. At this point I have to ask what you do for a living? You’ve gotten this whole thing so completely wrong. If global average temperature had lower uncertainty by just adding N, we wouldn’t worry about better instrumentation. We would just put cheaper and less precise instruments everywhere.
And your comment about biases becoming moot when you transform to an anomaly is also insane. The problem is not only are you ignorant of the biases, you are ignorant about their duration, whether they are intermittent, etc. Transforming to an anomaly removes static biases. It does nothing to remove non-stationary, systemic errors of unknown distribution. If albedo from snow causes too high temperatures, many of your winter highs will be too high, but you won’t know that. If insufficient wind causes poor ventilation, your highs will be too high, but you also won’t know that.
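The static-versus-drifting distinction can be demonstrated directly: subtracting a baseline removes a constant offset but passes a drifting bias straight through. A sketch (Python assumed; the temperatures, the 0.5° static bias, and the 0.02°/step drift are all invented numbers):

```python
import statistics

true_temps = [10.0, 10.1, 10.2, 10.1, 10.0, 10.2] * 5   # 30 "months" of truth

STATIC_BIAS = 0.5
drift = [0.02 * t for t in range(len(true_temps))]       # non-stationary bias

with_static = [T + STATIC_BIAS for T in true_temps]
with_drift = [T + STATIC_BIAS + d for T, d in zip(true_temps, drift)]

def anomalies(series):
    # Subtract the series' own baseline mean, as anomaly products do.
    base = statistics.fmean(series)
    return [x - base for x in series]

truth = anomalies(true_temps)

# Constant offset: anomalies of the biased series match the truth exactly.
static_gap = max(abs(a - b) for a, b in zip(anomalies(with_static), truth))

# Drifting bias: the anomalies still disagree, because the drift survives.
drift_gap = max(abs(a - b) for a, b in zip(anomalies(with_drift), truth))

print(static_gap, drift_gap)
```

`static_gap` is zero to rounding, while `drift_gap` stays well away from zero: anomaly transformation removes the time-invariant part of a bias and nothing else.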
You folks have taken an unproven shortcut by just assuming that either errors are random and canceling or that they are persistent biases which are always the same and can be removed with differencing. None of that is true.
This entire field is replete with scientific fraud.
I agree with your comment about systematic biases only canceling in anomaly analysis if those biases are static (time invariant). Most of us think it’s the time-dependent biases that are most problematic, especially with datasets like UAH and RSS that are especially prone to them due to the nature of how they work.
I disagree with your comment that you can’t adjust for biases you don’t know about. The pairwise homogenization algorithm is specifically designed to do just that (Menne 2009). And transforming absolute temperatures into anomalies does it too.
I agree with your comment about factors like albedo, ventilation, etc. potentially biasing temperature observations. Just remember that biases are a double edged sword that cuts both ways. Also consider that there are techniques like BEST’s jackknife resampling and ERA5’s 4D-VAR method that guard against these kind of biases.
Don’t hear what I’m not saying. I’m not saying that biases aren’t a problem. They are. I’m not saying that global mean temperature datasets are perfect. They aren’t. What I will say is that the presumptions by contrarians that all of the worlds scientists are either stupid or committing fraud does not have merit.
Have you submitted your analysis to a metrology journal yet?
Pairwise homogenization does not correct for anything. All it does is apply a hypothesis that A is a proxy for B when they aren’t measuring the same thing. And then inevitably, it ends up spreading the Urban Heat Island effect across the entire record.
https://wattsupwiththat.com/2014/06/10/why-automatic-temperature-adjustments-dont-work/
Your statistical games aren’t science.
To this very second you still are confusing uncertainty with standard deviation of the “sample mean” (not that you can actually sample average temperature).
“Just remember that biases are a double edged sword that cuts both ways.”
Which ways is both ways? The entire point is that YOU DON’T KNOW WHAT THE BIASES ARE. HENCE THE UNCERTAINTY LINGERS. YOU CAN’T CORRECT FOR IT. IT PROPAGATES THROUGH EVERY SINGLE CALCULATION YOU MAKE, AND ANY STATISTICAL GAMES OR ALGORITHMS CANNOT REDUCE THIS. THE ACTUAL AVERAGE TEMPERATURE OF ALL OF THAT DATA YOU COLLECTED COULD BE HIGHER OR LOWER. YOU DON’T KNOW. TREATING EACH DATA POINT IN THE AVERAGE AS IF IT’S A PRISTINE NUMBER WITH NO UNCERTAINTY AND THEN JUST HANDWAVING IT AWAY WITH THE CENTRAL LIMIT THEOREM IS WRONG.
Error propagation is completely and utterly ignored by this field. It’s not that they’re all stupid. Some of them are just utterly uneducated in statistics and metrology. Years and years of papers have been published based on false premises and garbage statistics. They can’t about-face now. Hence, the corruption.
This guy knows zero about real data acquisition, it is obvious he has never made a single measurement, ever.
Totally, but yet, arrogant enough to think he has. Either a student or a data “scientist.” There are so many geeks out there that think learning to use pandas means they are scientists.
I’ll be the first to admit that I make more than my fair share of mistakes. I also want to make it known that if I have offended you, Carlo Monte, or the Gormans in any way, or caused undue frustration, I am genuinely sorry. That is never my intention. My intentions are only to learn as much as possible myself and hopefully educate others.
You should read Taylor and understand it. Then help others to understand it. Uncertainty is a critical concept that has been neglected in many fields, especially climatology. You have Berkeley saying that they think they can discriminate between two average temperature measurements that are within 0.05C of each other. Even with today’s thermistors, this is impossible given various systemic errors. The uncertainty of those averages is greater than the difference. What these people are claiming is impossible and fraudulent. Only by completely ignoring uncertainty, as if you are drawing numbers out of a bag, can you presume to apply the central limit theorem to these averages and suggest that N helps anything with respect to the precision of the sample mean estimating the actual population mean. And even in that situation, there would probably be a systemic error involved that would bias that experiment. Maybe in the way someone was biased in how they randomly pulled numbers out of a bag. Who knows. But that’s not what we’re dealing with.
“#1 would certainty cause a bias. But that should be easily corrected with adjustments in the unlikely event that it actually occurs in practice. You typically only place instruments into service that are calibrated for the expected temperature range they will be exposed to.”
And when a mud dauber wasp builds a nest in the air intake of a measurement station exactly how do you correct for that with adjustments? When a barnacle attaches to the water intake of a float and affects the water flow past the sensor exactly how do you correct for that?
Captain asked you the right question. What do you do for a living? Have you *ever* been associated with a project carrying personal liability for you? A project where uncertainty *has* to be properly accounted for and propagated throughout the entire project?
“#3 could introduce a bias especially if all measurements use the same type of instrument. But like #1 preferential drift directions should be easily corrected with adjustments. If the drifts are random then this would not be an issue.”
Drift is based on local environmental conditions unique to each location. How do you adjust for local environmental conditions you don’t know about? If drifts are random then you don’t know if they cancel or not. How do you then account for them?
If the wasp nest is significantly altering temperature observations then PHA would detect that as a change point. If not then jackknife resampling would catch it, compensate for it, and nudge the uncertainty up a bit. If that failed then VAR data assimilation based datasets would compensate for it via the other observations and fields and the ensembled uncertainty for the impacted grid cell would nudge up a bit.
If the drift you’re talking about is environmental then we probably want to keep it. Even if it is caused by urban expansion we want to keep it in the record because regardless of the cause of the change in temperature it is real. What we don’t want to do is overweight/underweight it in the spatial averaging process because that is what causes positive/negative UHI biases. Most datasets handle the UHI bias as a special step. Reanalysis datasets are largely immune because their grid meshes have sufficiently high resolution and other sources of observations that prevent an urban/rural station’s UHI/non-UHI trend from bleeding over into and contaminating a rural/urban grid cell. Anyway, the drift I was thinking of was instrument drift due to wear and tear that may have a preferential direction. That should be addressable as an adjustment based on the HOMR database records should an instrument bias like that be identified. Reanalysis datasets should compensate for it automatically.
Don’t hear what I’m not saying. I’m not saying all biases are identifiable and correctable. We’ll never be able to perfectly estimate the global mean temperature. I seriously doubt we’ll ever be able to drive the uncertainty down to ±0.01C even.
“If the wasp nest is significantly altering temperature observations then PHA would detect that as a change point. If not then jackknife resampling would catch it, compensate for it, and nudge the uncertainty up a bit. If that failed then VAR data assimilation based datasets would compensate for it via the other observations and fields and the ensembled uncertainty for the impacted grid cell would nudge up a bit.”
You live in a fantasy world where algorithms can scrub data and compensate for errors you don’t know about. You fundamentally do not understand what uncertainty is, or why it is important.
Bull pucky! Are you saying that you can take a month’s worth of temp readings from a station in 1910 and analyze it for environmental changes that may only cause some readings to move from 50 to 51 because someone thought the thermometer had “tipped” over the halfway point and rounded up instead of down?
And, even if you could, why multiple changes to the same stations from early last century?
Every time you comment, you solidify the fact that you are a mathematician interested only in numbers with no idea what they mean. Tell me, did a chemistry or physics teacher in a lab you took ever have all the students measure independent things and then have the class find an average with more significant digits than what the measuring device could provide? If they didn’t allow it, how do you justify it now with temperatures?
“If the wasp nest is significantly altering temperature observations then PHA would detect that as a change point.”
How would it do this? Can it magically tell from minute to minute, hour to hour, day to day what is real change and what is not? The only way for this to happen is if it can tell the future!
“If not then jackknife resampling would catch it, compensate for it,”
How do you do jackknife resampling when you have exactly two points in your dataset, minimum temp and maximum temp. What are you going to leave out in order to recalculate averages?
“If that failed then VAR data assimilation based datasets would compensate for it via the other observations and fields and the ensembled uncertainty for the impacted grid cell would nudge up a bit.”
The uncertainty would go up? Oh, that’s a good way to run a business I guess.
“Anyway, the drift I was thinking of was instrument drift due to wear and tear that may have a preferential direction.” (bolding mine, tpg)
And how do you know what the preferential direction is? If the drift is part of the measurement how do you separate out the drift from the real temp? Field measuring stations don’t get calibrated every hour, day, month, or even year! The word “MAY” is the operative word in your sentence. “May” means you don’t know.
You’ll never get the uncertainty in the GAT down to anything because it doesn’t exist, isn’t measurable, isn’t physical, and its uncertainty interval is wider than the absolute temperatures being measured. The GAT is a phantom created by climate scientists because they don’t want to be held to actually evaluating absolute physical parameters such as enthalpy.
NO!
Go read a DVM spec sheet, they do NOT tell you if a calibration drift is plus or minus, this is something you CANNOT know. And you cannot just hand-wave calibration drift away by assuming identical instruments, there is nothing to indicate they will all behave in the same manner.
It is quite obvious that you are oblivious about digitization errors and know nothing about analog-to-digital conversion.
“in the unlikely event that it actually occurs in practice”
You just confirmed that you are completely ignorant of instrumentation. Effects of temperature coefficients in electronic data acquisition IS an issue, and one you cannot simply wave away with your magic “adjustments”.
And you want people to think you know something, anything, about uncertainty. You are failing.
These are perfectly fine points. But here’s the thing. ERA5 addresses these concerns with its 4D-VAR assimilation of observations. And when we compare ERA5 with HadCRUTv5 we see their agreement fits a normal distribution (mostly) with a mean mismatch of 0.00±0.06C and a range of -0.20 to +0.17C, with mismatches of < 0.06C, < 0.10C, and < 0.12C occurring at rates of 68%, 90%, and 95.5% respectively over the period of record. This is inconsistent with Frank’s claim that the uncertainty is ±0.46C, and astonishingly inconsistent with estimates using RSS, which, employed strictly on the ERA5 grid, would yield a mind-numbingly unrealistic value of ±200C. I will say the agreement analysis between HadCRUTv5 and ERA5 implies a slightly higher uncertainty of each at ±0.07C as compared to the typically cited ±0.05C. The match between ERA5 and BEST, and between ERA5 and GISTEMP, is better and consistent with that ±0.05C uncertainty, suggesting that maybe HadCRUTv5, despite the improvements, has higher uncertainty than BEST or GISTEMP. I actually haven’t seen the uncertainty analysis for HadCRUTv5 yet.
You cannot address systemic error after the fact if you don’t know about it. Multiple groups making the same erroneous assumptions as a shortcut doesn’t make the problem go away. You’re using the central limit theorem as a crutch and then pretending it waves a magic wand on uncertainty so that you can make impossibly precise statements about something you can’t measure.
It’s best that this scientific fraud is kept in climate and not in places where it makes planes crash and pacemakers fail.
HadCRUTv5, BEST, GISTEMP, and ERA5 are not making the same assumptions. They don’t even use the same techniques. And ERA5’s method is so wildly different it is in a league of its own. How can these different groups, using wildly different techniques and subsets of the available data, all accidentally make different mistakes that result in agreement at σ = ±0.05?
How? Because they all treat uncertainty the same way – incorrectly!
Exactly. They all neglect uncertainty. The fact they all come to close to the same figure has nothing to do with speaking to uncertainty.
That’s not a true statement at all. Several datasets publish rigorous uncertainty analysis. You can argue that they may be doing it incorrectly, but I don’t think arguing that they don’t do it at all has any merit.
I also disagree with the statement that the agreement on the global mean temperature has nothing to do with speaking to uncertainty. I disagree because if the uncertainty of the measurements was higher we would expect the agreement to be less.
That could conceivably affect how much they agree on the uncertainty, but I don’t see how it would affect how much they agree on the global mean temperature.
You kind of answered your own question. How do you think they all ended up at the same point? Basically by ignoring significant digit rules for calculating measurements.
You’ll never convince anyone here who has dealt with measurements and tolerances that you can exceed the precision of any single measurement simply by averaging more independent measurements of different things together.
You do realize that what you are propounding would have a dramatic effect on all certified laboratories. They could all give up buying more and more precise equipment since it would only take averaging a few dozen readings from much less precise devices to get an increase of precision by several orders of magnitude!
I’ve told the story before about a manager storming to the quality control office and asking why there are so many doors being returned. The QC person pulls out a spreadsheet and shows him how each door is measured and the length is entered into the spreadsheet. We then calculate the AVERAGE length and the error of the sample mean is 0.0001 inches. The big guy takes him to the warehouse and shows him that there are doors that are 1 inch too short and 1 inch too long and everything in between.
Do you think the QC guy kept his job? Why or why not?
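The door anecdote is easy to reproduce: the standard error of the mean can look superb while individual doors are an inch out of spec. A sketch (Python assumed; the 80-inch target and uniform spread are made-up numbers standing in for the story’s doors):

```python
import random
import statistics

random.seed(3)

TARGET = 80.0   # inches
# 10,000 doors, each off by up to an inch either way.
doors = [TARGET + random.uniform(-1.0, 1.0) for _ in range(10_000)]

mean = statistics.fmean(doors)
sem = statistics.stdev(doors) / len(doors) ** 0.5

print(round(mean, 3), round(sem, 4))               # the mean looks very "precise"...
print(round(min(doors), 2), round(max(doors), 2))  # ...individual doors do not fit
```

The SEM here says something about where the batch mean sits, not about whether any particular door fits its frame, which is the distinction the two commenters go on to argue about.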
“You kind of answered your own question. How do you think they all ended up at the same point? Basically by ignoring significant digit rules for calculating measurements.”
I’m not following your logic here. If anything wouldn’t that decrease their agreement on the global mean temperature?
“I’ve told the story before about a manager storming to the quality control office and asking why there are so many doors being returned. The QC person pulls out a spreadsheet and shows him how each door is measured and the length is entered into the spreadsheet. We then calculate the AVERAGE length and the error of the sample mean is 0.0001 inches. The big guy takes him to the warehouse and shows him that there are doors that are 1 inch too short and 1 inch too long and everything in between.”
The problem here is that the QC person used the mean of all doors to erroneously conclude something about individual doors.
Nobody is using the global mean temperature to conclude something about individual temperatures.
Ya know…the fact that you present an anecdote that has little relevance to the discussion makes me wonder if you understand how a global mean temperature is estimated and what it is used for. If you think it is used to infer something about individual temperatures then there is a misunderstanding somewhere.
“Nobody is using the global mean temperature to conclude something about individual temperatures.”
Really? Then why is every one of the climate alarmists claiming the GAT going up means the earth is going to turn into a cinder if we don’t eliminate man-made CO2?
Everyone from the IPCC to the US Democrat party to the heads of government of the UK, France, and Germany believes we are headed toward high temps causing food shortages, floods, droughts, tornadoes, hurricanes, etc.
It’s individual temperatures that have to cause these because there is no GAT that exists anywhere!
And the GAT is used to say that individual temperatures are going to increase dramatically.
You can’t say the earth’s temperature is increasing without also recognizing that means individual temperatures will also be increasing.
You can’t also say that averaging and using the standard error of the mean results in a more precise temperature measurement and then turn around and say that the standard error of the mean when applied to doors is an error. They are both measurements of real physical things. If the standard error of the mean applies to one, then it applies to all. You can’t simply assert it only applies to your calculation and not others.
We’re not even discussing the GAT increase or the reasons for the increase.
And I never said the standard error of the mean when applied to the sample of doors is an error.
What I said was that it is an error to use the standard error of the mean of the sample to make a conclusion on the quality of a specific door in the sample.
I don’t have time to read through all the ERA5 documentation. But it appears it uses a 60km grid and a 12 hour window. That is hardly sufficient to provide for accurate interpolation. Terrain and weather can vary *significantly* in 60km and 12 hours. It is better than a flat 2-d, 24hour window but still hardly what I would call accurate.
The ERA5 grid is 30×30 km with 137 levels up to 80 km. It also has a few land and ocean layers. It definitely lacks the resolution to replicate the gradients that exist in and around mountain peaks, but it does produce realistic values in these areas. Its 500,000+ cells are certainly more than the 2592 cells provided by HadCRUTv5, the 8000 provided by GISTEMP, and the 15984 provided by BEST. And as opposed to monthly means only, it provides Tmin and Tmax on 1-hour intervals based on 12-minute step processing. Is it perfect? Not even close. Is it better than GISTEMP, HadCRUTv5, and BEST? For sure. The assessed uncertainty on the near-surface temperature field is about ±0.2-0.3C, increasing to ±0.5C at the south pole, so it’s pretty good. Note that the uncertainty of the global averages is not as low as the SEM suggests due to sampling bias.
Hey, Hey, right on. Your math is impeccable and you obviously have some familiarity with measurements and their use. Glad to see you aboard Captain!
Why do we need to have a mean temperature of the Earth that does not measure the entire Earth? What is the purpose of that number? Is it critical or just sort of informational? I used to work in a manufacturing environment where we used statistical process control to monitor how our manufacturing processes were doing. When we had a critical safety measurement, we measured each and every instance of the operation and kept records by serial number. I would think global temperature also has some criticality characteristic.
Actually it doesn’t. It is portrayed as a metric that shows how the earth’s temperature has changed.
In your process control scenario, let’s say you were making saucers, plates, and glasses. If you follow the Global Average Temperature (GAT) approach, you would measure the diameters of all three and create a single average. That average is intended to tell you if things are out of control. And remember, you will only have the aggregate number.
The problem you will have is saucers could be getting larger and plates getting smaller. You will have no control over your process until everything goes to he!!. You will be forced to shut down all three lines.
GAT is similar. Temps from poles averaged with temps from the equator. Temps from summer in one hemisphere with winter temps in the other. The variances are large as is the uncertainty.
The Global Average Temperature (GAT) is meaningless. In your manufacturing process you were concerned with true values. Each of the processes were designed to produce a product around a true value. Multiple measurements would create a probability distribution you could analyze statistically and determine a true value.
The GAT is not a “true value”. There is no guarantee that anywhere on the globe matches that mean. It is not even guaranteed that the GAT is the most probable temperature you will find on the earth. Which means you are calculating the mean from a distribution that is not Gaussian.
You can’t even tell what is changing from the value of the mean. If you can’t tell what is changing then how can you know the earth is going to turn into a cinder?
Yes! And if these averagers were honest, they would include the standard deviations that arise from the daily ranges of temperatures that are easily about 100°C.
Anomalies smooth that away don’t you know!
I keep forgetting…
bdgwx August 27, 2021 5:26 am
When all of the world’s scientists incorrectly agreed that ulcers were caused by stress, were they “either stupid or committing fraud”? Nope. They were simply wrong.
When all of the world’s scientists incorrectly agreed that Newton’s Laws of Motion were true everywhere under all conditions, were they “either stupid or committing fraud”? Nope. They were simply wrong.
When all of the world’s scientists incorrectly agreed that the continental plates couldn’t move, were they “either stupid or committing fraud”? Nope. They were simply wrong.
I know almost no one who thinks that the climate alarmists moving in intellectual lockstep are “either stupid or committing fraud”. On the other hand, many people, including me, think that most of them are simply wrong …
However, it is indisputable that some of them, like Phil Jones, Caspar Amman, and Peter Gleick, were absolutely committing fraud.
w.
The fraud, if any, starts early. The biggest “fraud” is being taught in university math courses in statistics.
They teach that the “standard error of the mean” can define the precision of a measurement. It doesn’t. It gives the standard deviation of the sample mean. It is not a statistical parameter that defines the precision of the mean.
Significant figures were designed to maintain the precision of measurement. Replicating science requires this or results would never be comparable. Climate science is the worst offender, probably out of ignorance more than intentional fraud.
“They teach that the “standard error of the mean” can define the precision of a measurement.”
Not the precision of individual measurements, but the precision of the mean.
For example,
https://web.chem.ox.ac.uk/teaching/Physics%20for%20CHemists/Errors/Statistics.html
If you don’t think it’s a parameter that defines the precision of a mean, what would you use to determine the precision of the mean?
If you have 1,000 measurements to one decimal point, the standard deviation can’t have more than one decimal point. How could it, if significant digits were paid attention to?
I took 914 temperature measurements (no reason for that number, just had them handy), and calculated a standard deviation of 12.6. I took the first half of the measurements and got a standard deviation of 12.5. I took the second half and got a standard deviation of 12.6 again.
The first calculation with all the measurements is presumed to be more accurate than those with half the data, but it can’t be presented with any more than one decimal point’s precision, nor can the uncertainty.
Of course it can. Just look at the formula for calculating standard deviation.
If I have 10,000 numbers precise to one decimal place, the mean can’t be any more precise than that.
When the difference between x and xmean is taken, it will also have one decimal place. When the difference is squared, two decimal places might be allowed in the case where the square has a leading zero.
When the whole mass is divided by N−1, it is still only a one-decimal-place number, unless again there is a leading zero.
The final square root will still only have one decimal place precision.
What is there in all of that which would make the mean of several thousand samples any more precise than the individual measurements?
I simulated it. I generated 10,000 64-bit floating point numbers and recorded the mean of these values. I then truncated the values to one decimal place and recorded the mean of those. I then compared the mean of the original values and the truncated values and logged the error. I repeated the procedure 1000 times. The error of the mean formed into a normal distribution with σ = 0.003.
You calculated the error of the mean between the means of the one-decimal-place numbers and … how many decimal places did you use in the original 64-bit floating numbers? How large were the numbers? 64 bits doesn’t tell me anything except how large the memory block was that held the number.
The original values were 15 decimal places. The truncated values were only 1 decimal place.
BTW…I just noticed a typo on my part. Above I said the error was σ = 0.003 above. I missed a zero. It is actually σ = 0.0003.
I’m still not following your method. You averaged the 15-decimal-place numbers — what was to the left of the decimal point? Does it matter? — and then averaged the truncated (or rounded?) 1-decimal-place numbers.
Now you’ve got something like 0.123456789012345 and 0.1. Then you “logged the errors”; what do you mean by that? I’m losing you on that last step.
The original numbers were fully randomized from approximately 10^-300 to 10^+300, each with 15 decimal places to the right of the decimal point. It doesn’t really matter what the range is though.
The truncated numbers are straight up truncated; not rounded. So 2.123456789012345 becomes 2.1 and 2.987654321098765 becomes 2.9.
I generated a list of 10000 original numbers and took those 10000 to produce a list of truncated numbers. I then computed the average for each list and compared the difference. I did this 1000 times. That gave me 1000 averages for both the original and truncated lists that I could compare. The average difference (or error) between the original numbers and truncated numbers fell into a normal distribution with σ = 0.0003.
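The procedure bdgwx describes is easy to sketch. The following is my own reconstruction, not his actual code; I assume a uniform value range of [0, 100) rather than the full 10^±300 span, since as he notes the range doesn’t matter. One wrinkle worth seeing in the output: truncation always discards a positive remainder, so the trial errors center near +0.05 (the truncation bias), and the quoted σ ≈ 0.0003 describes their spread.

```python
import random
import statistics

def trial(n=10_000):
    """One trial: mean of full-precision values minus the mean of
    the same values truncated (not rounded) to one decimal place."""
    originals = [random.uniform(0, 100) for _ in range(n)]
    truncated = [int(x * 10) / 10 for x in originals]
    return statistics.fmean(originals) - statistics.fmean(truncated)

errors = [trial() for _ in range(1000)]

# Truncation discards a uniform [0, 0.1) remainder from each value, so every
# trial's error sits near +0.05; what the sample size controls is the
# *spread* of the errors: sd ≈ (0.1 / sqrt(12)) / sqrt(10_000) ≈ 0.0003.
print(f"mean error:   {statistics.fmean(errors):.4f}")
print(f"sd of errors: {statistics.stdev(errors):.5f}")
```

The σ ≈ 0.0003 spread matching 1/√10000 of the per-value truncation noise is the point at issue: the scatter of the mean shrinks with sample size even though every individual value carries only one decimal place.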
“If I have 10,000 numbers precise to one decimal place, the mean can’t be any more precise than that.”
What definition of precise are you using?
Of course the mean will have more than one decimal place, you’ve just got a sum of 10,000 values each to 1 decimal, and then you are going to divide it by 10,000. What’s 12345.6 divided by 10000? I’m pretty sure it isn’t 1.2.
“What is there in it all of that which would make the mean of several thousand samples any more precise than the individual measurements?”
Everything that’s been discussed ad nauseam for the last few months, and every website and book I’ve been asked to look at saying it. Experimentation. Experience. Common sense.
How does one get a value of 12345.6 by summing 10,000 numbers each having no more than one decimal place?
Easily. If each figure has a single decimal place, the sum will have a single decimal place.
Maybe you are confusing decimal places with significant figures.
What I think is being confused is numbers vs. measurements. The arithmetic mean 12345.6 / 10000 is 1.23456. However, if the sum 12345.6 is the result of 10,000 measurements with one decimal place, then the mean is 1.2, because the mean of a measurement can’t be more accurate than a measurement.
Your question was about summing, not averaging. The rule for adding, I think, is to round the final answer to the fewest decimal places appearing in any of the values. If all 10000 values are measured to 1dp the sum should also be to 1dp.
As far as averaging, I don’t agree that the final mean should necessarily be reported to the same number of dps as the individual values. I know a lot disagree here, and I can find a number of sources that repeat that mantra. But I can also find many sources that disagree, and I still cannot find any logic behind it.
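The disagreement here is testable numerically. Below is a minimal sketch, with a hypothetical true value and Gaussian measurement error (both invented for illustration), showing that the mean of many 1-dp readings can land far closer to the underlying value than any single reading’s resolution:

```python
import random
import statistics

random.seed(42)
true_value = 1.23456   # hypothetical quantity being measured
n = 10_000

# Each measurement carries random error and is then recorded to 1 dp only.
recorded = [round(true_value + random.gauss(0, 0.3), 1) for _ in range(n)]

mean = statistics.fmean(recorded)
print(f"mean of 1-dp readings: {mean:.4f}")
print(f"error vs true value:   {abs(mean - true_value):.4f}")
# The mean typically lands within a few thousandths of 1.23456, well inside
# the ±0.05 resolution of any single recorded value, so truncating the
# reported mean back to 1 dp (1.2) would discard real information.
```

The caveat, which both sides raise later in the thread, is that this only removes random scatter; any systematic bias shared by all the readings survives the averaging untouched.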
This guy is so out of whack it isn’t funny. I am sad that a university such as Oxford allows such drivel.
First –
The author’s first table has a wrong calculation.
[(5.012 – 4.9794) = 0.0326] and not 0.326 as shown.
When you redo the table the deviation = 0.0000
Second –
The author specifies:
The author apparently has no conception of what a “sample distribution” consists of. To create a sample distribution one must define a “sample size”, i.e., the N in the error of the sample mean, and then proceed to take samples of the population.
If you include the entire population in the single “sample”, there is no sample distribution. You cannot calculate an error in the sample mean. All you have is the population mean and the population standard deviation.
Third –
The author specifies:
The author has no conception of what a “sample mean” vs a “true mean” actually means. The entire population of measurements is the 5 values that come from some experiment. If the errors in measurement follow a Gaussian (normal) distribution, averaging the values will tend to cancel the errors, and the resulting mean will approach what is called the “true value” when the same thing is being measured by the same device.
In the author’s example, the deviations seem to cancel although the distribution is far from normal.
Please watch the attached video on youtube. Sampling distribution of the sample mean | Probability and Statistics | Khan Academy – YouTube
This site:
https://onlinestatbook.com/2/sampling_distributions/samp_dist_mean.html
says:
First, you are correct about the typo in one of the calculations, but the mean of the absolute values is correct. I’m guessing you are averaging the signed values, which will inevitably sum to zero since they are deviations from the mean.
Go to this site and hit “begin” to run examples of obtaining a sample mean. It will mean more if you watch the youtube video I referenced.
Sampling Distributions (onlinestatbook.com)
Please note that “N” is the sample size, and that each sample is obtained from the entire population. One of the requirements for sampling is that samples are random and representative of the population. Otherwise the Central Limit Theorem won’t work correctly to give you a normal distribution centered on the sample mean.
You keep on showing me elementary statistical texts, without ever explaining in what way they contradict what I’m saying. You may remember I and others have been suggesting you run simulations to test your arguments for some time.
N is the sample size – yes.
Samples need to be random and independent – ideally yes
Sampling gives a distribution centered on the mean – yes
Increasing sample size gives you a narrower, hence more precise, less uncertain, distribution – yes
Now if you can find a simulation that demonstrates how uncertainty of the mean increases as sample size increases, that would go some way to providing evidence for your claims.
“Increasing sample size gives you a narrower, hence more precise, less uncertain, distribution – yes”
You are making an unwarranted assumption here. A more narrow distribution DOES NOT mean a more precise distribution. It is less uncertain as to the location of the mean because the standard deviation of the sample mean distribution is reduced. However, precise is controlled by the data used, not by the value of the standard deviation.
You misunderstand what dividing by √N does. Look at what happens as N goes to infinity. Do you really think this lets you say that the mean is now known, for example, as 15.0000000000000000000000000…000000039? This is what you are asserting. I know this is a reductio ad absurdum example, but you need to prove it wrong mathematically if you want to assert it.
Another example is this. I have thirteen measurements of temperature.
21, 22, 24, 25, 27, 28, 30, 33, 38, 42, 45, 48, 52
The mean is
435/13 = 33.461538461538 -> 33
using significant digit rules applicable to measurements.
I take 3 samples of 5 each and get the following.
Sample 1 -> 21, 22, 27, 28, 33
Sample 2 -> 22, 27, 38, 45, 52
Sample 3 -> 24, 27, 33, 42, 48
What are the sample averages using significant digit rules? Remember, these are measurements and averages of measurements must use significant digit rules, and besides that we have not yet calculated the error of the mean.
Sample 1 -> 26
Sample 2 -> 37
sample 3 -> 35
Now according to the image.
μ = 98/3 = 32.66666666… -> 33
s = sqrt{[(26-33)^2+(37-33)^2+(35-33)^2]/(3-1)} = sqrt(69/2) = 5.8736700
SE = 5.8736700/sqrt(3) = 2.423565555
So the sample mean is 33 +/- 2.4
“A more narrow distribution DOES NOT mean a more precise distribution.”
I keep asking what definition of precision you are using. I’m using the definition regarding how close measurements are to one another. I find it difficult to see how a more narrow distribution DOES NOT mean a more precise distribution. Think of all those graphics showing tight patterns of shots is representing precision.
This definition of precision is distinct from how precise a result is reported in terms of number of digits. You can report a result to 100 decimal places but it doesn’t make it more precise if every result is different.
“However, precise is controlled by the data used, not by the value of the standard deviation.”
From the VIM
“Look at what happens as N goes to infinity. Do you really think this lets you say that the mean is now known, for example, as 15.0000000000000000000000000…000000039?”
As N tends to infinity the standard error of the mean tends to zero. This does not mean you “know” the true mean, as there can still be systematic errors. Look at the VIM’s definition of trueness
The average of an infinite number of measurements is assumed to eliminate all random errors, after that any difference between that and the true value is down to systematic errors.
“The mean is
435/13 = 33.461538461538 -> 33
using significant digit rules applicable to measurements.”
I don’t know what “rules” of digits you are using, but rounding to 33 is wrong by all I’ve seen. Firstly, when you divide one number by another you should report the result to the same number of significant figures as the one with the least number of significant figures. In this case you have 435, which is 3sf, and you have 13, which is an exact number and so has an unlimited number of significant figures. Therefore the result should be given to 3sf, i.e. 33.5. Secondly, if a result is going to be used in a calculation, you should include more digits in order to reduce rounding errors. I think Taylor says you need at least 2 more digits, but really I cannot think of a reason to round off at all, especially if you are not calculating by hand.
If this was a math problem, you’d just leave it as 435/13. What’s (435/13) X 13? It’s not 429.
Bellman August 29, 2021 12:52 pm
Here are the rules for significant digits, the same ones taught by Mr. Hedji, my high school math teacher.
So as you can see … rounding to 33 is correct.
Regards,
w.
Did your teacher not have a rule for exact numbers?
http://www.astro.yale.edu/astro120/SigFig.pdf
Sure. And there’s a good definition in your link:
However, in the example above of 435/13, those are not exact numbers.
w.
13 is. It’s the exact number of samples.
Thanks, Bellman. You are 100% correct. As a result, the average is good to three significant figures, that is to say, 33.5.
w.
I’m not sure what you are trying to do in the final part. Why take three different samples each of 5? Are those 13 figures the population or were they a sample? It doesn’t really matter though, because it’s the reporting of the result that is wrong.
“So the sample mean is 33 +/- 2.4”
The “rule” is that you should report the result with digits to the same position as that of the uncertainty. Here you are reporting the uncertainty to 1dp, so the result should also be reported to 1dp. You should have said 32.7 ± 2.4, or if you followed Taylor’s advice to only use 1sf in the uncertainty it would be 33 ± 2.
Of course, if you took a much larger sample size than 3, the SEM would be much smaller and you’d be able to report the average to more decimal places.
If you want a better example of how to calculate and report a mean derived from multiple measurements, see the example in Taylor pages 102-103. He takes 10 measures of a spring (in N/m) each given to an integer.
86, 85, 84, 89, 85, 89, 87, 85, 82, 85
He takes the average as 85.7 N/m (note he doesn’t round this to 86), and uses this 1dp average to calculate the standard deviation as 2.16 N/m. Only then does he round this to 2 N/m, using his 1sf rule for uncertainty values, and uses this as an estimate of the uncertainty of measurements of other springs.
In the next section he returns to this example to illustrate the standard error of the mean. He uses the standard deviation rounded to 2sf (2.2 N/m) to give 0.7 N/m for the standard deviation of the mean. This has been rounded to 1sf again as it is the uncertainty. Which allows him to give a final answer, based on these 10 measurements, of
85.7 ± 0.7 N/m.
Note this measure is given to 1dp despite all the measurements only being to the nearest integer. This is in keeping with the rule he gives that the answer should be given to the same number of places as the uncertainty.
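Taylor’s numbers in this example are easy to check; here is a short sketch (the ten measurements are quoted from the comment above; the code itself is mine):

```python
import math
import statistics

# The ten spring-constant measurements quoted above, in N/m.
k = [86, 85, 84, 89, 85, 89, 87, 85, 82, 85]

mean = statistics.fmean(k)            # 85.7
s = statistics.stdev(k)               # sample standard deviation, ~2.16
sdom = s / math.sqrt(len(k))          # standard deviation of the mean, ~0.68

print(f"mean = {mean:.1f} N/m")                # 85.7
print(f"s    = {s:.2f} N/m")                   # 2.16
print(f"k    = {mean:.1f} ± {sdom:.1f} N/m")   # 85.7 ± 0.7 N/m
```

The arithmetic confirms the quoted values: a 1dp mean and a 1sf uncertainty, even though every raw reading is an integer.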
Second point. Sorry, no idea what you are getting at there. The quote is talking about estimating the measurement precision for individual readings by taking the sample standard deviation. The sample size in the example is 5, and there’s no suggestion this is the entire population. Obviously it isn’t because the population is all possible measurements.
Go learn what precision is.
If I have a digital device with a 3-digit display I cannot use statistics to increase the precision. I can take an infinite number of readings, but I can’t get anything but 3 digits. You can’t make 158 into 158.023 using math. It is as simple as that.
“Go learn what precision is.”
It would help if you specified what type of precision you mean. In particular, are you talking about precision as the number of digits reported, or do you mean it in the sense of how close repeated measurements are to each other?
“If I have a digital device that has a 3 digit digit display I can not use statistics to increase the precision. I can take an infinite number of readings, but I can”t get anything but 3 digits.”
Yes you can. I demonstrated how to do it elsewhere, using your voltmeter example. All it requires is for there to be sufficient independent randomness in the results so that you get different readings.
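This is the standard dithering/oversampling effect. A minimal sketch, reusing the 158 vs 158.023 figure from the earlier comment, with an assumed 1 V noise level (my invention for illustration):

```python
import random
import statistics

random.seed(1)
true_volts = 158.023   # a value the 3-digit display cannot show

# Each reading carries independent noise larger than the display step,
# so the display flickers between 157, 158, 159, ...; the *distribution*
# of displayed values encodes the hidden digits.
readings = [round(true_volts + random.gauss(0, 1.0)) for _ in range(100_000)]

estimate = statistics.fmean(readings)
print(f"any single reading: {readings[0]} V (1 V resolution)")
print(f"mean of readings:   {estimate:.3f} V")

# Caveat: with no noise, every reading is exactly 158 and averaging gains
# nothing; the extra precision comes from noise straddling the display step.
```

The caveat in the final comment is the honest boundary of the claim: averaging only beats the display resolution when the readings actually vary.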
Third point – still not sure what you think the problem is. The quote is just explaining the difference between calculating a sample standard deviation and one for the population. It’s very standard statistics. Here’s the Khan Academy’s explanation
https://www.khanacademy.org/math/statistics-probability/summarizing-quantitative-data/variance-standard-deviation-sample/a/population-and-sample-standard-deviation-review
You again seem to think that the 5 measures are the population. They are not, they are sample measures and the intention is to estimate the population mean (the true value) and use the standard error of the mean to estimate how precisely the sample mean does that.
Not once have you mentioned how significant digit rules are applied in obtaining averages of measurements. Regardless if you are calculating the mean of a population or samples from a population.
Did you not take any physical science classes where you had to follow these rules? Did the professors let you get away with increasing the precision of the measurement because 10 of you in your lab averaged all your readings? Why do you think that is?
I’ve explained elsewhere how significant digits should be used in averaging. You seem to think there is some rule that says averages must only be reported to the same number of decimal places as each measurement. I think this is wrong but I’ll ask you to provide some source for these alleged rules.
As I’m sure I’ve mentioned before, I’m not a scientist or engineer. I don’t care what your teachers told you or whether you understood what they were trying to tell you – the idea that an average does not get more precise with sample size is just wrong.
I believe it’s correct that an exact number has an infinite number of significant figures, and so if 12345.6 is divided by 10000, the mean is 1.23456. However, that’s just an intermediate step. To get the standard deviation, you have to take x − xµ, which is going to be affected by sig figs again.
Given x = 1.9 and xµ = 1.23456, the difference 1.9 − 1.23456 = 0.66544. But since 1.9 has the fewest decimal places, the difference becomes 0.7.
So we take those 10,000 differences and sum the squares to get 3175.7. Divide by N and take the root and you have 0.3 for the standard deviation, and 0.003 for the error in the mean.
But I think the critical aspect is that these are measurements with a known inaccuracy of 0.05cm. That inaccuracy can’t be made to go away, so in the end the mean of the measurements has to be 1.2cm +/- 0.05cm. Even the standard deviation and the error in the mean carry this instrument inaccuracy, which can’t be made to go away, so even they have some +/- instrument error at the end.
If the inaccuracy (bias) is known to be +0.05cm or -0.05cm then you just subtract or add the 0.05cm inaccuracy (bias) from any measurement or average of measurements.
It is the inaccuracy (bias) that you don’t know that is problematic. That is why you should be using different instruments to form the sample. That way you don’t contaminate the sample with the same inaccuracy (bias) on every measurement.
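The shared-versus-independent-bias point can be illustrated with a short simulation (all numbers here are assumed for illustration):

```python
import random
import statistics

random.seed(7)
true_value = 20.0
n = 10_000

# Case 1: every reading from ONE instrument with an unknown +0.5 bias.
bias = 0.5
one_instrument = [true_value + bias + random.gauss(0, 0.2) for _ in range(n)]

# Case 2: each reading from a different instrument, with biases drawn
# independently from N(0, 0.5) across the fleet.
many_instruments = [true_value + random.gauss(0, 0.5) + random.gauss(0, 0.2)
                    for _ in range(n)]

# Averaging shrinks the random scatter in both cases, but the shared bias
# survives: case 1 converges to ~20.5, case 2 to ~20.0.
print(f"one instrument:   {statistics.fmean(one_instrument):.3f}")
print(f"many instruments: {statistics.fmean(many_instruments):.3f}")
```

This is the crux of the exchange: the standard error of the mean describes only the random component, while a bias common to every reading passes through the average untouched.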
I think you are just illustrating why you shouldn’t round intermediate results, or at the very least why you need to include extra significant figures (Taylor suggests at least 1 additional figure for any intermediate result), otherwise you risk rounding errors. Personally, assuming you are not doing the calculations by hand, I can think of no reason to ever truncate numbers before the final result. It’s just adding work to get a potentially less accurate result.
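A small sketch of the rounding-intermediates question, using invented 1-dp data, comparing a standard deviation computed at full precision with one where each deviation is forced back to 1 dp before squaring:

```python
import math
import statistics

data = [1.2, 1.3, 1.1, 1.2, 1.4, 1.2, 1.3, 1.1, 1.2, 1.3]  # 1-dp readings
mean = statistics.fmean(data)  # 1.23

# Full-precision route: keep every digit of every intermediate.
s_full = math.sqrt(sum((x - mean) ** 2 for x in data) / (len(data) - 1))

# Sig-fig route: force each deviation back to 1 dp before squaring,
# as the strict significant-digit argument would require.
s_rounded = math.sqrt(sum(round(x - mean, 1) ** 2 for x in data)
                      / (len(data) - 1))

print(f"s (full precision):        {s_full:.4f}")
print(f"s (rounded intermediates): {s_rounded:.4f}")
```

Even on this tiny sample the two routes disagree (roughly 0.095 versus 0.100); with noisier data or more calculation steps the distortion from rounding intermediates grows.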
When you say the measurements have a known inaccuracy, I assume you mean a known uncertainty. You cannot make those uncertainties go away, but you can reduce them by taking several measurements.