Limitations of the Central Limit Theorem

Guest Essay by Kip Hansen — 17 December 2022

The Central Limit Theorem is particularly valuable when you have many measurements of the same thing that give slightly different results.  Say, for instance, you wanted to know very precisely the length of a particular stainless-steel rod.  You measure it and get 502 mm.  You expected 500 mm.  So you measure it again:  498 mm.  And again and again: 499, 501. You check the conditions:  was the temperature the same each time?  You get a better, more precise ruler.  Measure again: 499.5, and again 500.2, and again 499.9. One hundred times you measure, and you can't seem to get exactly the same result. Now you can use the Central Limit Theorem (hereafter CLT) to good effect.  Throw your hundred-odd measurements into a distribution chart or a CLT calculator and you'll see your central value very darned close to 500 mm, and you'll have an idea of the variation in the measurements.
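The repeated-measurement setup is easy to simulate. Here is a minimal Python sketch (a hypothetical simulation, not the author's data; the "true" length and the size of the reading error are assumed values chosen for illustration):

```python
import random
import statistics

random.seed(1)

TRUE_LENGTH = 500.0   # mm; hypothetical "real" rod length (an assumption)
READING_SD = 0.5      # mm; assumed spread of the random reading error

# One hundred simulated measurements, each off by a small random error.
measurements = [TRUE_LENGTH + random.gauss(0, READING_SD) for _ in range(100)]

mean = statistics.mean(measurements)
sd = statistics.stdev(measurements)
print(f"mean = {mean:.2f} mm, sd = {sd:.2f} mm")
```

The computed mean lands very close to 500 mm even though no single measurement was exactly 500, which is the "good effect" described above.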

The Law of Large Numbers is based on repeating the same experiment, or measurement, many times, and so could be depended on in this exact instance; the CLT, by contrast, only requires a largish population (overall data set) and the taking of the means of many samples of that data set.

It would take another post (possibly a book) to explain all the benefits and limitations of the Central Limit Theorem (CLT), but I will use a few examples to introduce the topic.

Example 1:  

You take 100 measurements of the diameter of ball bearings produced by a machine on the same day.  You can calculate the mean and can estimate the variance in the data.  But you want a better idea, and you realize that you have 100 measurements from each Friday for the past year: 50 data sets of 100 measurements each.  These would give you fifty samples out of the 306 possible daily samples, drawn from the 30,600 total measurements you would have if you had 100 measurements for every work day (six days a week, 51 weeks).

The Central Limit Theorem is about probability.  It will tell you what the most likely (probable) mean diameter is of all the ball bearings produced on that machine.  But if you are presented with only the mean and the SD, and not the full distribution, it will tell you very little about how many ball bearings are within specification and thus have value to the company.   The CLT cannot tell you how many, or what percentage, of the ball bearings would have been within the specifications (if measured when produced) and how many were outside spec (and thus useless).  The Standard Deviation will not tell you either; it is not a measurement or a quantity, it is a creature of probability.
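The point that a mean and SD alone cannot reveal the in-spec fraction can be illustrated with a small sketch: two hypothetical batches with (nearly) identical means and SDs but wildly different yields. All of the numbers (spec limits, diameters, spreads) are invented for illustration:

```python
import random
import statistics

random.seed(2)

SPEC_LO, SPEC_HI = 9.98, 10.02  # hypothetical spec limits, mm

# Batch A: normally spread around the nominal diameter.
batch_a = [random.gauss(10.0, 0.025) for _ in range(10_000)]
# Batch B: every bearing is exactly 0.025 mm over or under nominal.
batch_b = [10.0 + random.choice((-0.025, 0.025)) for _ in range(10_000)]

def in_spec(batch):
    # Fraction of bearings inside the spec limits.
    return sum(SPEC_LO <= d <= SPEC_HI for d in batch) / len(batch)

for name, batch in (("A", batch_a), ("B", batch_b)):
    print(name, round(statistics.mean(batch), 4),
          round(statistics.stdev(batch), 4), f"{in_spec(batch):.0%} in spec")
```

Both batches report a mean of about 10.0 mm and an SD of about 0.025 mm, yet batch A is over half in spec while batch B is entirely scrap. Only the full distribution tells you which is which.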

Example 2:

The Khan Academy gives a fine example of the limitations of the Central Limit Theorem (albeit not intentionally) in the following example (watch the YouTube video if you like; it runs about ten minutes):

The image is the distribution diagram for our oddly loaded die (one of a pair of dice).  It is loaded to come up 1, 3, 4, or 6, but never 2 or 5, and it is twice as likely to come up 1 or 6 as 3 or 4. The image shows a diagram of the expected distribution of the results of many rolls, with the ratios of two 1s, one 3, one 4, and two 6s. Taking the means of random samples of this distribution out of 1000 rolls (technically, "the sampling distribution for the sample mean"), say samples of twenty rolls taken repeatedly, will eventually lead to a "normal distribution" with a fairly clearly visible (calculable) mean and SD.

Here, relying on the Central Limit Theorem, we return a mean of 3.5 (with some standard deviation). (We take "the mean of this sampling distribution", the mean of means, an average of averages.)

Now, if we take a fair die (one not loaded) and do the same thing, we will get the same mean of 3.5 (with some standard deviation).

Note:  These distributions of frequencies of the sampled means are from 1000 random rolls (in Excel, using fx=RANDBETWEEN(1,6); the formula for the loaded die was modified as required), sampled every 25 rolls.  Had we sampled a data set of 10,000 random rolls, the distribution of the sampled means would narrow and the mean of the sampled means, 3.5, would become more distinct.
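For readers without Excel, a rough Python equivalent of the loaded-die simulation might look like the following (the sample size of 25 and the 400 samples are illustrative choices, not the essay's exact setup):

```python
import random
import statistics

random.seed(3)

def loaded_roll():
    # Loaded die: never 2 or 5; 1 and 6 twice as likely as 3 or 4.
    return random.choices([1, 3, 4, 6], weights=[2, 1, 1, 2])[0]

rolls = [loaded_roll() for _ in range(1000)]   # the 1000-roll "population"

# 400 samples of 25 rolls each; their means form
# "the sampling distribution for the sample mean".
sample_means = [statistics.mean(random.choices(rolls, k=25)) for _ in range(400)]

print("mean of the sampled means:", round(statistics.mean(sample_means), 2))
```

Plotting `sample_means` as a histogram would show the characteristic near-normal hump centered close to 3.5, exactly as described.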

The Central Limit Theorem works exactly as claimed.  If one collects enough samples (randomly selected data) from a population (or data set) and finds the means of those samples, the means will tend towards a standard or "normal" distribution, as we see in the charts above, and the values of the means tend towards the (in this case known) true mean.  In man-on-the-street language, the means clump in the center around the value of the mean at 3.5, making the characteristic "hump" of a Normal Distribution.  Remember, this resulting mean is really the "mean of the sampled means".

So, our fair die and our loaded die both produce approximately normal distributions when testing a 1000-random-roll data set and sampling means.  The distribution of the mean would improve (get closer to the known mean) if we had ten or one hundred times more random rolls and an equally larger number of samples. Both the fair and the loaded die have the same mean (though slightly different variance or deviation). I say "known mean" because we can, in this case, know the mean by straightforward calculation; we have all the data points of the population and know the mean of the real-world distribution of the dice themselves.

In this setting, this is a true but almost totally useless result.   Any high school math nerd could have just looked at the dice, maybe made a few rolls with each, and told you the same:  the range of values is 1 through 6;  the width of the range is 5; the mean of the range is 2.5 + 1 = 3.5.  There is nothing more to discover by using the Central Limit Theorem on a data set of 1000 rolls of the one die, though it will also tell you the approximate Standard Deviation, which is also almost entirely useless.

Why do I say useless?  Because context is important.  Dice are used for games involving chance (well, more properly, probability) in which it is assumed that the sides of the dice that land facing up do so randomly.  Further, each roll of a die or pair of dice is totally independent of any previous rolls.

Impermissible Values

As with all averages of every type, the means are just numbers. They may or may not have physically sensible meanings.

One simple example is that a single die will never ever come up at the mean value of 3.5.  The mean is correct but is not a possible (permissible) value for the roll of one die – never in a million rolls.

Our loaded die can only roll:  1, 3, 4 or 6.  Our fair die can only roll 1, 2, 3, 4, 5 or 6.  There just is no 3.5. 

This is so basic and so universal that many will object to it as nonsense.  But there are many physical metrics that have impermissible values. The classic and tired old cliché is the average number of children being 2.4.  And we all know why: there are no ".4" children in any family; children come in whole numbers only.

However, if for some reason you want or need an approximate, statistically-derived mean for your intended purpose, then using the principles of the CLT is your ticket.  Remember, to get a true mean of a set of values, one must add all the values together and divide by the number of values.

The Central Limit Theorem method does not reduce uncertainty:

There is a common pretense (def: "Something imagined or pretended") often used in science today, which treats a data set (all the measurements) as a sample, then takes samples of that sample, uses a CLT calculator, and calls the result a truer mean than the mean of the actual measurements.  Not only "truer", but more precise.  However, while the CLT value achieved may have a small standard deviation, that fact is not the same as more accuracy in the measurements or less uncertainty regarding what the actual mean of the data set would be.  If the data set is made up of uncertain measurements, then the true mean will be uncertain to the same degree.
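A small simulation makes the point concrete. Suppose every measurement carries a shared systematic error (a hypothetical +0.3 bias below; all numbers are invented). Taking ever more samples shrinks the spread of the sample means, but does nothing to remove the bias:

```python
import random
import statistics

random.seed(4)

TRUE_VALUE = 20.0
BIAS = 0.3            # hypothetical systematic error shared by every reading

# 10,000 measurements: random noise PLUS the common systematic bias.
data = [TRUE_VALUE + BIAS + random.gauss(0, 0.5) for _ in range(10_000)]

# Many sample means: their spread (the "uncertainty of the mean") gets tiny...
means = [statistics.mean(random.choices(data, k=100)) for _ in range(1000)]

print("spread of sample means:", round(statistics.stdev(means), 3))
print("mean of means:", round(statistics.mean(means), 2), "(true value is 20.0)")
```

The spread of the sampled means is small (looking very "precise"), yet the mean of means sits near 20.3, not 20.0: the sampling exercise never learned anything about the systematic error in the measurements.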

Distribution of Values May be More Important

The Central Limit Theorem-provided mean would be of no use whatever when considering the use of this loaded die in gambling.   Why?  Because the gambler wants to know how many times in a dozen die-rolls he can expect to get a "6", or, if rolling a pair of loaded dice, maybe a "7" or "11".  How much of an edge over the other gamblers does he gain if he introduces the loaded dice into the game when it's his roll?

(BTW: I was once a semi-professional stage magician, and I assure you, introducing a pair of loaded dice is easy on stage or in a street game with all its distractions, but nearly impossible in a casino.)

Let’s see this in frequency distributions of rolls of our dice, rolling just one die, fair and loaded (1000 simulated random rolls in Excel):

And if we are using a pair of fair or loaded dice (many games use two dice):

On the left, fair dice return more sevens than any other value.  You can see this is tending towards the mean (of two dice) as expected.  Two 1’s or two 6’s are rare for fair dice … as there is only a single unique combination each for the combined values of 2 and 12.  Lots of ways to get a 7. 

Our loaded dice return even more 7's.  In fact, over twice as many 7's as any other number, almost 1-in-3 rolls.   Also, the loaded dice have a much better chance of rolling 2 or 12, four times better than with fair dice.   The loaded dice never return 3 or 11.
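These distributions are easy to reproduce. Here is a rough Python sketch of the two-dice comparison (10,000 simulated throws per pair rather than the essay's 1000 rolls, purely for smoother frequencies):

```python
import random
from collections import Counter

random.seed(5)

def fair_roll():
    return random.randint(1, 6)

def loaded_roll():
    # Loaded die: never 2 or 5; 1 and 6 twice as likely as 3 or 4.
    return random.choices([1, 3, 4, 6], weights=[2, 1, 1, 2])[0]

N = 10_000
fair_sums = Counter(fair_roll() + fair_roll() for _ in range(N))
loaded_sums = Counter(loaded_roll() + loaded_roll() for _ in range(N))

print("P(7)  fair:", fair_sums[7] / N, " loaded:", loaded_sums[7] / N)
print("P(2)  fair:", fair_sums[2] / N, " loaded:", loaded_sums[2] / N)
print("loaded counts of 3 and 11:", loaded_sums[3], loaded_sums[11])
```

The fair pair lands on 7 about one throw in six, the loaded pair nearly one in three; snake eyes come up several times more often with the loaded pair; and the loaded pair never produces a 3 or an 11, exactly as the frequency charts show.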

Now here we see that if we depended on the statistical (CLT) central value of the means of rolls to prove the dice were fair (which, remember, is 3.5 for both fair and loaded dice), we would have made a fatal error.  The house (the casino itself) expects the distribution on the left from a pair of fair dice and thus sets the rules to give the house a small percentage in its favor.

The gambler needs the actual distribution probability of the values of the rolls to make betting decisions. 

If there are any dicing gamblers reading, please explain to non-gamblers in comments what an advantage this would be. 

Finding and Using Means Isn’t Always What You Want

This insistence on using means approximated via the Central Limit Theorem (and its returned Standard Deviations) can create non-physical and useless results when misapplied.  The CLT means could have misled us into believing that the loaded dice were fair, since they share a common mean with fair dice. But the CLT is a tool of probability, not a pragmatic tool that we can use to predict values of measurements in the real world. The CLT does not predict or provide values; it only provides estimated means and estimated deviations from those means, and these are just numbers.

Our Khan Academy teacher, almost in the hushed tones of a description of an extra-normal phenomenon, points out that taking random same-sized samples from a data set (a population of collected measurements, for instance) will also produce a Normal Distribution of the sampled sums!  The triviality of this fact should be apparent: if the means of the samples (the sums divided by the same number of components) are normally distributed, then the sums of the samples must also be normally distributed (basic algebra).
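The algebra can be checked directly: each sample sum is exactly the sample size times its sample mean, so the two distributions differ only by a constant scale factor. A minimal sketch (the population and sample sizes are arbitrary illustrative choices):

```python
import random
import statistics

random.seed(7)

# A made-up population; 1000 samples of size 20.
population = [random.uniform(0, 10) for _ in range(10_000)]
samples = [random.choices(population, k=20) for _ in range(1000)]

means = [statistics.mean(s) for s in samples]
sums = [sum(s) for s in samples]

# Each sum is (to floating-point precision) 20 times its sample mean,
# so the two distributions have identical shape; only the scale differs.
assert all(abs(su - 20 * m) < 1e-9 for su, m in zip(sums, means))
print("SD(sums) / SD(means):", round(statistics.stdev(sums) / statistics.stdev(means), 3))
```

The ratio of the two standard deviations comes out as exactly the sample size (20 here), confirming that the "normality of the sums" carries no information beyond the normality of the means.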

In the Real World

Whether considering gambling with dice, loaded and fair, or evaluating the usability of ball bearings from the machinery we are examining, we may well find that the estimated means and deviations obtained by applying the CLT are not always what we need and might even mislead us.

If we need to know which, and how many, of our ball bearings will fit the bearing races of a tractor manufacturing customer, we will need some analysis system and quality assurance tool closer to reality. 

If our gambler is going to bet his money on the throw of a pair of specially-prepared loaded dice, he needs the full potential distribution, not of the means, but the probability distribution of the throws. 

Averages or Means:  One number to rule them all

Averages seem to be the sweetheart of data analysts of all stripes.  Oddly enough, even when they have a complete data set like daily high tides for the year, which they could just look at visually, they want to find the mean.

The mean water level, which happens to be 27.15 ft (rounded), does not tell us much.  The Mean High Water tells us more, but not nearly as much as the simple graph of the data points.  For those unfamiliar with astronomic tides, most tides are on a roughly 12.5-hour cycle, with a Higher High Tide (MHHW) and a less-high High Tide (MHW).  That explains what appear to be two traces above.

Note: the data points are actually a time series; in a graph like this we are pulling out the set of the two higher points and the two lower points of each cycle.  One can see the usefulness of the different plots above, each visually revealing data that the other does not.

When launching my sailboat at a boat ramp near the station, the graph of the actual high tide data points shows me that I need to catch the higher of the two high tides (Higher High Water), which sometimes gives me more than an extra two feet of water (over the mean) under the keel.  If I used the mean and attempted to launch on the lower of the two high tides (High Water), I could find myself with a whole foot less water than I expected.  And if I arrived with the boat expecting to pull it out with the boat trailer at the wrong point of the tide cycle, I could find five feet less water than at the MHHW.  Far easier to put the boat in, or take it out, at the highest of the tides.

With this view of the tides for a month, we can see that each of the two higher tides themselves have a little harmonic cycle, up and down.

Here we have the distribution of values of the high tides.  It doesn't tell us very much, almost nothing about the tides that is numerically useful, unless of course one only wants the mean, which could just as easily be eyeball-guessed from the charts above or this chart; we would get a vaguely useful "around 29 feet."

In this case, we have all the data points for the high tides at this station for the month and could just calculate the mean directly and exactly (within the limits of the measurements) if we needed it, which I doubt would be the case.   At least we would then have a true, precise mean (plus the measurement uncertainty, of course), but I think we would find that in many practical senses it is useless; in practice, we need the whole cycle, its values, and its timing.

Why One Number?

Finding means (averages) gives a one-number result.  Which is oh-so-much easier to look at and easier to understand than all that messy, confusing data!

In a previous post on a related topic, one commenter suggested we could use the CLT to find "the 2021 average maximum daily temperature at some fixed spot."  When asked why one would want to do so, the commenter replied "To tell if it is warmer regarding max temps than say 2020 or 1920, obviously."  [I particularly liked the 'obviously'.] Now, any physicists reading here?  Why does the requested single number, the "2021 average maximum daily temperature", not tell us much of anything that resembles "if it is warmer regarding max temps than say 2020 or 1920"?   If we also had a similar single number for the "1920 average maximum daily temperature" at the same fixed spot, we would only know if our number for 2021 was higher or lower than the number for 1920.  We would not know if "it was warmer" (in regard to anything).

At the most basic level, the “average maximum daily temperature” is not a measurement of temperature or warmness at all, but rather, as the same commenter admitted, is “just a number”.

If that isn’t clear to you (and, admittedly, the relationship between temperature and “warmness” and “heat content of the air” can be tricky), you’ll have to wait for a future essay on the topic. 

It might be possible to tell if there is some temperature gradient at the fixed place using a fuller temperature record for that place…but comparing one single number with another single number does not do that.

And that is the major limitation of the Central Limit Theorem

The CLT is terrific at producing an approximate mean value of some population of data/measurements without having to calculate it directly from the full set of measurements.   It gives one a SINGLE NUMBER from a messy collection of hundreds, thousands, millions of data points. It allows one to pretend that the single number (and its variation, as SDs) faithfully represents the whole data set/population-of-measurements. However, that is not true.  It only gives the approximate mean, which is an average, and because it is an average (an estimated mean) it carries all of the limitations and disadvantages of all other types of averages.

The CLT is a model, a method, that will produce a Mean Value from ANY large enough set of numbers; the numbers do not need to be about anything real, they can be entirely random with no validity about anything.  The CLT method pops out the estimated mean, closer and closer to a single value, as more and more samples from the larger population are supplied to it.  Even when dealing with scientific measurements, the CLT will discover a mean (one that looks very precise when "the uncertainty of the mean" is attached) just as easily from sloppy measurements, from fraudulent measurements, from copy-and-pasted findings, from "just-plain-made-up" findings, from "I generated my finding using a random number generator" findings, and from findings with so much uncertainty that they can hardly be called measurements at all.
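That claim is easy to demonstrate: feed the method pure noise and it will still dutifully report a tight-looking mean. A hypothetical sketch (the noise range, sample size, and sample count are all arbitrary):

```python
import random
import statistics

random.seed(6)

# A "population" of pure noise; these numbers are about nothing at all.
noise = [random.uniform(0, 100) for _ in range(100_000)]

# Sample repeatedly and take the means, CLT-style.
sample_means = [statistics.mean(random.choices(noise, k=50)) for _ in range(2000)]

print("mean of sampled means:", round(statistics.mean(sample_means), 1))
print("SD of sampled means:  ", round(statistics.stdev(sample_means), 1))
```

Out pops a crisp central value near 50 with a tidy standard deviation, looking every bit as authoritative as a mean computed from careful measurements, yet describing nothing whatsoever.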

Bottom Lines:

1.   The CLT is useful if one has a large data set (many data points) and wishes, for some reason, to find an approximate mean of that data set.  Using the principles of the Central Limit Theorem (finding the means of multiple samples from the data set, making a distribution diagram, and, with enough samples, finding the mean of the means), the CLT will point to the approximate mean and give an idea of the variance in the data.

2.  Since the result will be a mean, an average, and an approximate one at that, all the caveats and cautions that apply to the use of averages apply to the result.

3.  The mean found through use of the CLT cannot and will not be less uncertain than the actual mean of the original uncertain measurements themselves.  However, it is almost universally claimed that "the uncertainty of the mean" (really the SD or some such) thus found is many times smaller than the uncertainty of the actual mean of the original measurements (or data points) of the data set.

This claim is so generally accepted, and so firmly held as a Statisticians' Article of Faith, that many commenting below will deride the idea of its falseness and present voluminous "proofs" from their statistical manuals to show that such methods do reduce uncertainty.

4.  When doing science and evaluating data sets, the urge to seek a “single number” to represent the large, messy, complex and complicated data sets is irresistible to many – and can lead to serious misunderstandings and even comical errors. 

5.  It is almost always better to do a much more nuanced evaluation of a data set than simply finding and substituting a single number, such as a mean, and then pretending that that single number can stand in for the real data.

# # # # #

Author’s Comment:

One Number to Rule Them All as a principal, go-to-first approach in science has been disastrous for the reliability and trustworthiness of scientific research.

Substituting statistically-derived single numbers for actual data, even when the data itself is available and easily accessible, has been and is an endemic malpractice of today’s science. 

I blame the ease of "computation without prior thought"; we are all too often looking for The Easy Way.  We throw data sets at our computers, filled with analysis models and statistical software that are often barely understood, way, way too often without real thought as to the caveats, limitations and consequences of the varying methodologies.

I am not the first or only one to recognize this, maybe one of the last, but the poor practices continue, and doubting the validity of these practices draws criticism and attacks.

I could be wrong now, but I don’t think so! (h/t Randy Newman)

# # # # #

December 17, 2022 1:27 pm

Any high school math nerd could have just looked at the dice, maybe made a few rolls with each, and told you the same: the range of values is 1 through 6; the width of the range is 5; the mean of the range is 2.5 + 1 = 3.5.

And what if the dice were loaded so that 6 came up more often and 1 less often? How would your high school nerd figure what the average would be then?

Reply to  Kip Hansen
December 17, 2022 2:41 pm

How many times does he have to throw it to determine there is a bias? How do you establish it is biased without using the dreaded statistics?

What would you have done in high school?

Never been to high school.

Rud Istvan
December 17, 2022 2:34 pm

Late to comment, but thought I would let other comments play out. The CLT is a theorem. In mathematics that means it is rigorously proven true. Now the rigor part means the underlying 'axiomatic assumptions' hold. For probabilistic statistics, the CLT has exactly four, and in practice one or more are often not met. People mistakenly rely on the CLT because they don't check the rigor:

  1. Random sampling. But stuff like convenience sampling isn’t random. In climate, sampling long record weather stations or tide gauges because they exist is convenient but not truly random geographically if one asserts some global mean.
  2. Independent sample data. But in climate, time series partial autocorrelation means the data is usually NOT fully independent. The Hurst exponent is but one way to show this. Red noise, not independent white noise. This tripped up Mann's hockey stick big time.
  3. If w/o replacement (replacement => put the colored marble back in the urn after sampling it, then shake the urn before drawing the next sample), then the sample size for estimating the mean must be <10% of the population. Most sampling is NOT with replacement.
  4. Each sample size must be N>30, which means via (3) that the sampled data population must be N > 300 for the CLT to hold. Rules out Arctic ice and polar bears.
Reply to  Kip Hansen
December 17, 2022 4:46 pm

One only gets an estimate of the mean and the SD of the estimate. One gets no information about systematic measurement uncertainty nor is that uncertainty diminished.

Many seem to think employing the CLT normalizes a set of measurements, allowing use of the 1/sqrtN rule to diminish measurement uncertainty. The CLT does no such thing.

bdgwx
Reply to  Pat Frank
December 17, 2022 7:55 pm

Pat Frank said: “Many seem to think employing the CLT normalizes a set of measurements, allowing use of the 1/sqrtN rule to diminish measurement uncertainty. The CLT does no such thing.”

This statement is inconsistent with JCGM 100:2008.

Reply to  bdgwx
December 18, 2022 6:10 am

You are mistaken. The CLT’s use is in taking samples of a population. That is when you can make multiple measurements of the same thing.

I believe Pat is referencing the idea that you can reduce the uncertainty of an average of multiple single measurements by dividing by √n where n is the number of single measurements.

That just doesn’t work, even by rules in the JCGM 100:2008.

Look at the attached image from WIKI:

Central limit theorem – Wikipedia

See the little qualifier that says, “a sequence of i.i.d. random variables“.

Single measurements of temperature, while they may be independent (there is some question about that), do not come from "identical distributions"; each consists of only one value, and those values are not equal, so they are not identically distributed.

bdgwx
Reply to  Rud Istvan
December 18, 2022 5:40 am

Rud Istvan: "Each sample size must be N>30, which means via (3) that the sampled data population must be N > 300 for the CLT to hold."

Using the NIST uncertainty machine you can see that the measurement model Y = a + b + c where a, b, and c are all rectangular inputs yields an output that is close to normal. And with 4 inputs it is almost perfectly normal.

4 of these summed equals this.

Here is the configuration I used.

version=1.5
seed=55
nbVar=4
nbReal=1000000
variable0=a;10;-1;1
variable1=b;10;-1;1
variable2=c;10;-1;1
variable3=d;10;-1;1
expression=a+b+c+d
symmetrical=false
correlation=false
Reply to  bdgwx
December 18, 2022 6:43 am

You have proven nothing about temperatures and the CLT.

Set each variable to ONE value, just like a temperature reading. Assign each random variable a different uncertainty value. Run the program and see what you get for a combined uncertainty.

Richard S J Tol
Reply to  Rud Istvan
December 18, 2022 6:11 am

Lyapunov’s Central Limit Theorem indeed assumes 1-3 (but not 4). There are, however, many later extensions that relax these assumptions and uphold the original result.

Reply to  Richard S J Tol
December 18, 2022 8:03 am

That may be true. But, show us how daily midrange (Tavg) temps meet the restrictions of the other CLT variants. Show how monthly averages at a station meet the restrictions of other CLT variants. Then show how monthly averages of different stations meet the restrictions of other CLT variants.

Have you ever seen a paper that dealt with these issues? I sure haven’t. Just blind and blithe arithmetic averages with afterward justification of the “CLT”.

Reply to  Jim Gorman
December 18, 2022 8:46 am

Taking the mean of max and min values has nothing to do with the CLT. They are not random samples. It is just doing what Kip says in this essay, where he estimates the mean of a die by averaging the range of values.

As I've said before, I'm not sure if it makes any sense to treat a set of daily measurements as if they are a sample of the month. The average monthly temperature is that month's average daily temperature. It's an exact average, with the only uncertainty being from measurements and missing data.

Monthly values of different stations are closer to a random sample, but still not that much. That's why you don't just take an average of all stations and calculate the SEM. If that's all the papers you've read are doing for the uncertainty of the monthly global anomaly estimates, I'd agree that would be a poor analysis.

But that does not mean you can just assume the uncertainty of the global average is equal to the uncertainty of any individual measurement. Let alone claim it’s equal to the sum of all the uncertainties.

Reply to  Bellman
December 18, 2022 9:02 am

equal to the sum of all the uncertainties

It’s the root-mean-square of all the systematic errors and the instrumental resolution.

Reply to  Pat Frank
December 18, 2022 9:57 am

My question remains: how can the uncertainty of the average be the same as that of the sum, either directly or through root-mean-square?

The logic, which I’ve been arguing against for the past two years, is that an average based on 10,000 temperature readings can have an uncertainty of ±50°C, and increasing to 1,000,000 increases the uncertainty to ±500°C.

Reply to  Bellman
December 18, 2022 1:08 pm

No, what you’ve been running away from is that your 1-million readings will have a vanishingly small uncertainty.

Nonphysical nonsense!

Reply to  karlomonte
December 18, 2022 1:47 pm

They won't. I've repeatedly told you why this isn't something I agree with; I have not run away from it. You just ignore all the times I explain this to you, because you want to avoid answering the question I put to you: do you agree with those who say the uncertainty of the average is the same as the uncertainty of the sum?

Considering how many times you accuse me of “Jedi mind tricks” when I ask you a question, or refuse to answer on the grounds that “you cannot be educated”, I think you’re in no position to accuse me of running away.

Reply to  Bellman
December 18, 2022 4:18 pm

those who say uncertainty of the average is the same as uncertainty of the sum

No one here says that.

Reply to  Pat Frank
December 18, 2022 5:03 pm

Tim Gorman says it all the time. It’s the main reason we’ve been arguing for almost 2 years.

See for example this from him:.

q = x + y
u(q) = u(x) + u(y)
q_avg = (x + y) /2
u(q_avg) = u(x) + u(y) + u(2) = u(x) + u(y)

https://wattsupwiththat.com/2022/12/09/plus-or-minus-isnt-a-question/#comment-3650683

karlomonte says it in the same comment section

So…
u(q_avg) = u(q_sum)
Oh my! How did this happen?!??

https://wattsupwiththat.com/2022/12/09/plus-or-minus-isnt-a-question/#comment-3650778

Reply to  Bellman
December 18, 2022 9:14 pm

My mistake.

Reply to  Pat Frank
December 19, 2022 4:20 am

What bellman and bgw are not stating is that they use sigma/root(N) to justify two things:

1) Increasing the resolution of 1-degree air temperature averages by a factor of 100 (or more).

2) Reducing/removing realistic “error bars” on air temperature trend graphs that would otherwise make the tiny changes invisible.

As Jim Gorman wrote:

Again there are a lot of deflections going on here.

The real issue that originated this discussion is whether the Standard Error allows the addition of more resolution to the calculation of a mean. It does not. Without this, much of the anomaly resolution would disappear.

Reply to  karlomonte
December 19, 2022 7:20 am

Lying and deflecting again.

1) I have little understanding of how to properly assess the uncertainty in a global temperature anomaly index. All I've been saying is that you don't understand how your own equations work, and the logic of them is that measurement and sampling uncertainties reduce when you take larger samples, under all the normal assumptions. This might be a start for understanding how global temperature uncertainties can be more accurate than any one individual measurement. But it is not simply a question of dividing anything by √N.

1b) No quoted uncertainty interval for monthly anomalies suggests anything like an uncertainty of less than 0.01°C. It's usually more like 0.05 for recent values, and gets a lot larger for 19th and early 20th century measurements.

2) I've explained this to you many times before: error bars have little if any impact on the linear trend. You don't understand this, so you just call it "trendology".

2b) You never get the irony of you lapping up Monckton's trend, presented with zero uncertainty on a trend of 8 years. Irrespective of the measurement uncertainties, which you claim are at least 1.4°C for monthly data, the real uncertainty in that trend is still vast because of the internal variability.

Reply to  Bellman
December 18, 2022 1:19 pm

The question is why you think averaging 10000 single measurements would give you an uncertainty of 0.5/100=0.005, or worse, 0.5/10000=0.00005 as some claim.

Reply to  Jim Gorman
December 18, 2022 1:41 pm

I don’t.

That might be the measurement uncertainty in an idealized world where the only uncertainties came from absolutely random measurement error, but in the real world there will always be systematic errors.

Also, as I keep saying, I’m not really interested in the measurement uncertainty, it’s tiny compared to the sampling uncertainty.

Finally, I've no idea where you got "0.5/10000=0.00005". It's not something I've ever said, and makes no sense. You're basically assuming the sum of 10000 instruments could have an uncertainty of 0.5, which makes no sense. The measurement uncertainty of the average of 10000 thermometers, each with an uncertainty of ±0.5°C, can either be 0.5 / √10000 = 0.005, assuming all uncertainties are independent, or 0.5, assuming completely dependent measurement errors. And almost certainly the truth would be somewhere in between.

And none of this is “the question”. You are just trying to deflect from why some here think uncertainties increase with sampling.

Reply to  Bellman
December 18, 2022 1:51 pm

And you still can’t understand that error and uncertainty are different!

Reply to  karlomonte
December 18, 2022 1:58 pm

And you still don’t understand that I don’t care. I’ll use whichever word is most appropriate to the situation, especially if it annoys you.

Reply to  Bellman
December 18, 2022 4:38 pm

Then don’t be shocked when the stuff you write isn’t taken seriously by people who do understand the difference.

Reply to  karlomonte
December 18, 2022 5:32 pm

Do you say the same about Kip Hansen? His last post kept mentioning error and uncertainty in the same breath. E.g.

To be absolutely correct, the global annual mean temperatures have far more uncertainty than is shown or admitted by Gavin Schmidt, but at least he included the known original measurement error (uncertainty) of the thermometer-based temperature record.

Reply to  Bellman
December 19, 2022 6:34 am

Also, as I keep saying, I’m not really interested in the measurement uncertainty, it’s tiny compared to the sampling uncertainty.

Without numbers, this assertion is nothing but hand-waved word salad. But it is gratifying to see you admit that, despite all your quotings and whinings, you really don’t care about UA, as long as it doesn’t hinder the watermelon party line(s).

Reply to  karlomonte
December 19, 2022 7:35 am

You’re an idiot. You really don’t get that I’m saying that the sampling uncertainty is bigger than the measurement uncertainty.

Let’s give you some hypothetical figures, using Tim’s original example: 100 independent measurements from 100 different thermometers, each with an entirely random measurement uncertainty of ±0.5°C.

Looking just at the measurement uncertainty then the uncertainty of the average is 0.5 / √100 = ±0.05°C.

Now assume that these 100 readings were from a range of different places; maybe the standard deviation is 5°C. It would probably be more, but let’s keep things simple. The SEM, based on the assumption that these are random iid values, is 5 / √100 = ±0.5°C.

If we want to combine these two uncertainties we could just add them and get 0.55, but assuming the measurement uncertainties are independent of the temperature we could combine them in quadrature: sqrt(0.5^2 + 0.05^2) = 0.50 to two decimal places.
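The arithmetic above can be sketched in a few lines (the figures are the hypothetical ones from the comment, not real station data):

```python
import math

# Hypothetical figures from the comment above: 100 readings from 100
# thermometers, each with a random measurement uncertainty of +/-0.5 C,
# and a spread (standard deviation) of 5 C across locations.
n = 100
u_meas = 0.5
sd_spatial = 5.0

u_meas_avg = u_meas / math.sqrt(n)       # 0.5 / 10 = 0.05 (independent errors)
u_sampling = sd_spatial / math.sqrt(n)   # 5 / 10 = 0.5 (the SEM)

# Combine in quadrature, assuming the two components are independent:
u_combined = math.sqrt(u_meas_avg**2 + u_sampling**2)
print(round(u_combined, 2))  # 0.5
```

The sampling term dominates: adding the measurement term in quadrature barely moves the combined value.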

Reply to  Bellman
December 19, 2022 7:51 am

Now, if you can assume that there is a substantial systematic error in all your readings, then that might become important. The measurement uncertainty doesn’t reduce and ends up becoming the dominant component of the overall uncertainty. If every thermometer were reading 0.5°C too warm or too cold, the errors won’t cancel, and the combined uncertainty would be more like ±0.7°C.

And this would mean that even if you could take an infinite number of random measurements and get the sampling uncertainty down to zero, the uncertainty would still be 0.5.

But this feels to me unlikely, and a problem with your experimental design rather than uncertainty. It’s very improbable that a range of measurements made with different instruments will all have the same bias.

Moreover, if we are looking at change, such as a rate of warming, having all readings share the same bias would just mean the bias cancels out.

Now as I’ve said before, if you want to look at possible bias in the trend you don’t really need to worry about the uncertainty in individual months, but in a systematic bias that might be changing over time. And yes, I’m sure that must happen in at least some of the data sets, because it’s the only way to explain the differences in the trends between different data sets.

Reply to  Bellman
December 21, 2022 7:10 am

If every thermometer would be reading 0.5°C too warm… etc.

Constant offset error is not the problem field measurements face.

The problem is uncontrolled environmental variables. These put errors of unknown sign and magnitude into every measurement.

The only way to deal with that is by field calibration experiments. These provide an estimated systematic measurement uncertainty that conditions every single field measurement.

That uncertainty never averages away.

This qualifier has been repeatedly provided here, and the same group of people invariably portray ignorance and behave as though the idea is a novelty.

It’s not.

Currie & Devoe (1977) Validation of the Measurement Process

Page 119: “If the systematic error is not constant, it becomes impossible to generate meaningful uncertainty bounds for experimental data.”

This is invariably the case for field air temperature measurements in unaspirated sensors, which are impacted by uncontrolled environmental variables (variable wind speed and variable irradiance) of unknown sign, magnitude, and duration.

Page 129: Among recommended information to report is included: “The estimated bounds for systematic error (not necessarily symmetric), … Because of lack of knowledge concerning error distributions and because of the somewhat subjective nature of inferred systematic error bounds, the conservative approach is preferred: simple summation of the random and systematic error bounds, …”

Reply to  Pat Frank
December 21, 2022 8:28 am

Here is another study that shows varying systematic bias in MMTS weather stations.

“4. Conclusions

Although the MMTS temperature records have been officially adjusted for cooler maxima and warmer minima in the USHCN dataset, the MMTS dataset in the United States will require further adjustment. In general, our study infers that the MMTS dataset has warmer maxima and cooler minima compared to the current USCRN air temperature system. Likewise, our conclusion suggests that the LIG temperature records prior to the MMTS also need further investigation because most climate researchers considered the MMTS more accurate than the LIG records in the cotton-region shelter due to possible better ventilation and better solar radiation shielding afforded by the MMTS (Quayle et al. 1991; Wendland and Armstrong 1993).”

Air Temperature Comparison between the MMTS and the USCRN Temperature Systems in: Journal of Atmospheric and Oceanic Technology Volume 21 Issue 10 (2004) (ametsoc.org)

There are also several studies about the UHI infection of the land temperature data. These are systematic errors that are never, ever corrected in the various temperature databases.

And another.

“Based on the evidence presented in this note, we recommend that the USCRN program move to one of two proposed configurations to make USCRN air temperature measurements. Although the input channels are doubled for these two configurations the measurement errors inherent in the temperature sensor and datalogger system are significantly decreased. For fixed resistor(s) employed in the USCRN sensor, ±0.01% tolerance is applicable, but the TCR of ±10 ppm °C -1 is not sufficient to provide accurate long-term temperature observation.”

On the USCRN Temperature System (unl.edu)

Reply to  Pat Frank
December 21, 2022 2:29 pm

The problem is uncontrolled environmental variables. These put errors of unknown sign and magnitude into every measurement.

But aren’t they then random? Really, I can only see two possibilities: either the signs and magnitudes are variable and cancel to some extent, or they are all the same, in which case they are systematic.

Because of lack of knowledge concerning error distributions and because of the somewhat subjective nature of inferred systematic error bounds, the conservative approach is preferred: simple summation of the random and systematic error bounds…

Difficult to comment without more context. The source seems to be talking about analytical chemistry. Do they apply this logic to the average of large samples?

By “conservative approach”, I assume they mean a precautionary approach, allowing for the worst case. I’m sure that’s the correct approach for some fields, but in others concentrating on the most plausible range would seem more appropriate.

Reply to  Bellman
December 19, 2022 9:00 am

Have fun in bellcurveman-world; you must have an annual pass.

Reply to  Bellman
December 19, 2022 7:11 am

Then you also must believe that “anomaly” temperatures to the hundredths and thousandths decimal places cannot be derived either!

Reply to  Jim Gorman
December 19, 2022 8:03 am

Indeed I don’t, but as I keep trying to explain, I don’t have your obsession with how many decimal places results are given to; the more the merrier as far as I’m concerned.

bdgwx
Reply to  Jim Gorman
December 19, 2022 7:49 am

JG said: “The question is why you think averaging 10000 single measurements would give you an uncertainty of 0.5/100=0.005, or worse, 0.5/10000=0.00005 as some claim.”

Nobody thinks that. And it doesn’t make any sense. According to JCGM 100:2008 equation 13, the uncertainty would be between 0.005 and 0.5 depending on the correlation r(x_i, x_j): for r(x_i, x_j) = 0 it is 0.005, and for r(x_i, x_j) = 1 it is 0.5. You can also verify this with the NIST uncertainty machine.
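For the special case of n inputs with equal standard uncertainty u and a uniform pairwise correlation r, equation 13 collapses to a one-line closed form, which reproduces the two bounds quoted above (the equal-u, equal-r simplification is an assumption made here for illustration):

```python
import math

def u_mean(u, n, r):
    # Standard uncertainty of the mean of n inputs that all have
    # standard uncertainty u, with the same correlation coefficient r
    # between every pair: a special case of JCGM 100:2008 equation 13.
    # Variance of the mean = (1/n^2) * [n*u^2 + n*(n-1)*r*u^2]
    return u * math.sqrt((1 + (n - 1) * r) / n)

print(round(u_mean(0.5, 10000, 0.0), 6))  # 0.005 (fully independent)
print(round(u_mean(0.5, 10000, 1.0), 6))  # 0.5   (fully correlated)
```

Any partial correlation gives a value between those two limits, which is the "somewhere in between" point made earlier in the thread.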

Reply to  Bellman
December 18, 2022 4:08 pm

RMS uncertainty is sqrt{[sum over (i = 1→N) of σ²_i]/N}. How do you get ±500°C out of that?

Reply to  Pat Frank
December 18, 2022 4:38 pm

Sorry. My mistake. I’d read that as root-sum-square for some reason.

bdgwx
Reply to  Pat Frank
December 18, 2022 5:54 pm

PF said: “RMS uncertainty is sqrt{ [sum over (i = 1→N) of σ²_i]/N }”

Using JCGM 100:2008 notation that appears to be:

u(Y) = sqrt[ Σ[u(X_i)^2, 1, N] / N ]

Correct? Where are you getting that formula?

Reply to  bdgwx
December 18, 2022 9:13 pm

Your u = my σ. The equations are identical.

bdgwx
Reply to  Pat Frank
December 19, 2022 5:33 am

Where are you getting that formula?

Reply to  bdgwx
December 19, 2022 5:52 pm

It’s standard in any text on data reduction.

bdgwx
Reply to  Pat Frank
December 19, 2022 7:13 pm

Then it should be easy to point me to the reference where you got it.

Reply to  bdgwx
December 21, 2022 7:13 am

It is, and you’ve been pointed to it many times. Studied ignorance, bdgwx. It’s your stock-in-trade.

bdgwx
Reply to  Pat Frank
December 21, 2022 7:59 am

The last time I asked you pointed me to Bevington. I can’t find that formula anywhere in Bevington or any other text on uncertainty.

bdgwx
Reply to  Pat Frank
December 18, 2022 11:37 am

PF said: “It’s the root-mean-square of all the systematic errors and the instrumental resolution.”

If we can measure the height of males 18-24 years in the US to within 0.01 meters and find the average to be 1.75 meters then are you saying that the uncertainty of that average of all adult males is sqrt(30000000 * 0.01^2) = 54 meters? Do you really think the average could be as low as -52.25 m or as high as 55.75 m with coverage k=1?
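The three formulas being argued past each other in this subthread give wildly different answers for the same hypothetical height example (the figures are bdgwx’s, not real survey data):

```python
import math

# bdgwx's hypothetical: 30,000,000 heights, each measured to within 0.01 m.
n = 30_000_000
u = 0.01

rss = math.sqrt(n * u**2)      # root-SUM-square: grows with n
rms = math.sqrt(n * u**2 / n)  # root-MEAN-square: stays equal to u
sem = u / math.sqrt(n)         # independent random errors: shrinks with n

print(round(rss, 2))  # 54.77, the value computed in the comment above
print(round(rms, 4))  # 0.01
print(f"{sem:.1e}")   # about 1.8e-06
```

Note that the root-mean-square (dividing the summed variances by N before the square root) never grows with N and never shrinks either, which is the crux of the disagreement over Pat Frank’s formula.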

Reply to  bdgwx
December 18, 2022 12:19 pm

You just won’t understand what a measurand is, will you? You’re as bad as Mosher. Single measurements of 300,000,000 people have nothing to do with the uncertainty of a (singular) single measurement. The appropriate statistical parameter you are looking for is the Standard Deviation. What you should “predict” is that the next measurement has a 68% chance of being within one σ.

You specified an uncertainty of 0.01 m; each measurement has that uncertainty. As much as you like using an average as a functional relationship describing a measurand, it is not one! Do I need to list what the GUM defines as a measurand again?

Reply to  bdgwx
December 18, 2022 4:14 pm

No.

See the definition of RMS uncertainty here. If you want the standard deviation instead, divide by N-1.

Same old ground, bdgwx.

old cocky
Reply to  Pat Frank
December 18, 2022 2:18 pm

Root mean square or root sum square?

</pedantry>

Richard S J Tol
Reply to  Bellman
December 18, 2022 9:57 am

Min and max are random variables. The Central Limit Theorem does not apply — it is about the centre — but other limit theorems do.

Reply to  Bellman
December 19, 2022 10:25 am

It isn’t the SEM. Try reading the TN1900 Example E2 again. Why did the author state the average as ±1.8°C (2.44°F) and mention that another variation would make the 95% confidence level ±2.0°C (3.6°F)?

I wanted to show all these in order to put the conversation back on track as to the ability of an average of SINGLE measurements to have uncertainties that allow the determination of values 3 orders of magnitude smaller than the original measurements.

As much as some folks here want to deal with an average as a measurement, IT IS NOT A MEASURAND with a true value.

An arithmetic mean of a set of numbers is nothing more than a statistical parameter describing the central tendency of the distribution of discrete values used to create the distribution.

Look at Note 4.3(i), in NIST TN1900.

Each observation x = g(y)+ E is the sum of a known function g of the true value y of the measurand and of a random variable E that represents measurement error

Guess what “E” actually is.

In Example E2, “E” is the expanded standard uncertainty for this distribution and is treated as encompassing the significant uncertainties.

How funny that his calculations came up with an expanded standard uncertainty of 1.8° C (2.44° F)! Even better, since the standard uncertainty is 0.872° C, the author had to reduce the resolution of the average temperature to the tenths of a degree!

This is one reason why I have been asking for the variances of the GAT anomaly and other temperature trends. No one seems to want to quote the monthly variance when months are used to determine anomalies.

Worse, they never describe how adding random variables in order to find a mean value affects the total variance. When you add/subtract random variables then divide by “n” to get an average, you must add the variances and divide by “n” also. You can’t just say, lets throw all the numbers together and find the result of the ensuing distribution. Each random variable has its own variance which must be preserved.

Read the following.

Why Variances Add—And Why It Matters – AP Central | College Board

old cocky
Reply to  Jim Gorman
December 19, 2022 12:39 pm

An arithmetic mean of a set of numbers is nothing more than a statistical parameter describing the central tendency of the distribution of discrete values used to create the distribution.

Apparently neither of us understands what an average is 🙂

Reply to  Jim Gorman
December 19, 2022 1:24 pm

Try reading the TN1900 Example E2 again.

How many more times do you want me to read it for you?

“Why did the author state the average as +/- 1.8° C (2.44°F) and mentioned that another variation would make the 95% confidence level +/- 2.0° C (3.6° F).”

I’ve explained where the 1.8 figure comes from numerous times. (And I see you are happy to add an extra significant figure to it when converting to an antiquated measurement system. Are you claiming the uncertainty is known to the hundredth of a °F but only to a tenth of a °C?)

The standard uncertainty is calculated from the standard deviation of the 22 values, divided by the square root of 22, i.e. the SEM. A coverage factor is derived from the Student-t distribution with 21 degrees of freedom, to give the 95% confidence interval. The coverage factor k is 2.08. Multiplying 2.08 by the standard uncertainty of 0.872°C gives 1.81376, which is rounded to the conventional 2 significant figures. So the 95% interval is ±1.8°C.

I really don’t know what more you need explaining.

As is explained in the document the slightly wider uncertainty of ±2.0°C is obtained if you don’t assume the 22 values came from a Gaussian distribution. I couldn’t tell you the details of that model, you would have to look up the supplied sources.
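The arithmetic just described is short enough to check directly, using the values quoted from TN1900 Example E2 in this thread:

```python
# Values as quoted above for NIST TN1900 Example E2:
sem = 0.872  # standard uncertainty of the monthly mean (deg C), s / sqrt(22)
k = 2.08     # Student-t coverage factor, 21 degrees of freedom, 95% level

expanded = k * sem
print(round(expanded, 2))  # 1.81, reported to 2 significant figures as +/-1.8
```

(A statistics library would give the t quantile directly; the 2.08 here is taken from the comment above rather than recomputed.)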

I wanted to show all these in order to put the conversation back on track as to the ability of an average of SINGLE measurements to have uncertainties that allow the determination of values 3 orders of magnitude smaller than the original measurements.

Whatever. Even assuming all errors are entirely random, you would need 1,000,000 samples to get a value three orders of magnitude smaller than the original measurements. It’s a lot of work for little gain. I think Bevington has a section explaining this.

Reply to  Bellman
December 19, 2022 1:40 pm

As much as some folks here want to deal with an average as a measurement, IT IS NOT A MEASURAND with a true value.

And again, I’ll ask you, if you don’t think an average is a measurand, why do you keep talking about the measurement uncertainty of an average?

However, look at this TN1900 document. It specifically says of Example E2 that it is calculating a measurand, that is, the monthly average.

Measurand & Measurement Model. Define the measurand (property intended to be measured, §2), and formulate the measurement model (§4) that relates the value of the measurand (output) to the values of inputs (quantitative or qualitative) that determine or influence its value. Measurement models may be

Observation equations (§7) that express the measurand as a function of the parameters of the probability distributions of the inputs (Examples E2 and E14).

In Example E2, “E” is the expanded standard uncertainty for this distribution and is treated as encompassing the significant uncertainties.

E, or better ε, is the error term. Error is not uncertainty.

How funny that his calculations came up with an expanded standard uncertainty of 1.8° C (2.44° F)! Even better, since the standard uncertainty is 0.872° C, the author had to reduce the resolution of the average temperature to the tenths of a degree!

It would be a lot easier if you explained the point you were making rather than talking in riddles. What’s funny about 1.8? How is 2.44°F reducing something to tenths of a degree? How is 1.8 a reduction when the initial measurements were in quarter degrees?

Reply to  Bellman
December 19, 2022 1:54 pm

This is one reason why I have been asking for the variances of the GAT anomaly and other temperature trends. No one seems to want to quote the monthly variance when months are used to determine anomalies.

I expect no one tells you because they don’t have a clue what you are talking about. If you want to know the monthly variances, why don’t you work them out yourself?

Worse, they never describe how adding random variables in order to find a mean value affects the total variance.

I’ve tried to explain it to the pair of you many times, you just don’t like the answer.

When you add/subtract random variables then divide by “n” to get an average, you must add the variances and divide by “n” also.

No, you divide by n^2. It should be pretty obvious if you just think what a variance is.

You can’t just say, lets throw all the numbers together and find the result of the ensuing distribution.

Strange, people have been “throwing all the numbers together” as you call it, or “performing the correct calculations” as I’d call it for decades.

Each random variable has its own variance which must be preserved.

Not if you are talking about a sample mean. Then each item is the same random variable and has the same variance. That’s how you get the standard error of the mean: var(avg) = N*var(item) / N^2 = var(item) / N, so sd(avg) = sd(item) / √N.
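The more general rule, variances add and the sum is divided by n squared, can be checked numerically with made-up unequal variances:

```python
import random

# Numerical check of var(avg) = sum(var_i) / n^2 using three independent
# variables with different (made-up) variances.
random.seed(42)
sds = [1.0, 2.0, 5.0]
n = len(sds)
trials = 100_000

means = [sum(random.gauss(0.0, sd) for sd in sds) / n for _ in range(trials)]
mu = sum(means) / trials
var_avg = sum((m - mu) ** 2 for m in means) / trials

predicted = sum(sd**2 for sd in sds) / n**2  # (1 + 4 + 25) / 9
print(round(predicted, 3))  # 3.333
print(round(var_avg, 3))    # close to the predicted value
```

When all the variances are equal this reduces to the familiar σ²/n; when they are not, only the general sum-over-n² form applies.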

Read the following.
Why Variances Add—And Why It Matters – AP Central | College Board

I take it you didn’t read as far as the section on the CLT

Screenshot 2022-12-19 215407.png
Reply to  Bellman
December 20, 2022 6:08 am

“I expect no one tells you because they don’t have a clue what you are talking about. If you want to know the monthly variances, why don’t you work them out yourself?”

“Don’t have a clue what you are talking about” certainly describes you. Get this through your thick skull: if you calculate a mean, you have a distribution. That distribution surrounds the mean and has a variance/standard deviation.

The point is that to be scientific, one should report the statistical parameters that pertain to the distribution used to develop the mean.

“No, you divide by n^2. It should be pretty obvious if you just think what a variance is.”

Did you not look at the image you posted? It says the variance of the random variables add, and then is divided by “n”!

Var(x̄) = σ^2 / n.

What do you think the equation says? Just how do you think what I said is different? Please note that if the variances of each random variable are different, you won’t end up with

nσ^2 / n^2, where the n’s cancel.

“Not if you are talking about a sample mean. Then each item is the same random variable and has the same variance.”

You just described IID (Independent and Identically Distributed) variables, if they all have the same variance.

Exactly how do you assume that each item is the same random variable? How do you get a distribution at all if they are all the same?

A sample mean is calculated from the means of the samples. If all the samples have the same distribution as the population, then yes, the sample mean will have the same variance as the means of the samples.

Go to this website and read the instructions AND DO THE EXERCISES. It will explain better what sampling does.

As a check to see if the SD of the sample means is the SEM (Standard Error of the Mean), do the following. Multiply the SD of the sample means by the square root of the sample size and see if you don’t get the Standard Deviation of the population you have drawn from. That is:

σ = SEM • √n,

where “n” is the sample size. The form normally seen is:

SEM = σ / √n
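The check described above can be run as a small simulation (the population values below are made up for illustration):

```python
import random

# Draw many samples of size n from a known population, then verify that
# SD(sample means) * sqrt(n) recovers the population SD.
random.seed(1)
pop_mean, pop_sd = 20.0, 5.0
n = 25            # sample size
n_samples = 100_000

means = []
for _ in range(n_samples):
    sample = [random.gauss(pop_mean, pop_sd) for _ in range(n)]
    means.append(sum(sample) / n)

grand = sum(means) / n_samples
sd_of_means = (sum((m - grand) ** 2 for m in means) / n_samples) ** 0.5

print(round(sd_of_means * n ** 0.5, 1))  # ~5.0, i.e. sigma = SEM * sqrt(n)
```

This is not in dispute in the thread: it is just the definition of the SEM, numerically confirmed.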

Reply to  Jim Gorman
December 20, 2022 7:42 am

“Did you not look at the image you posted? It says the variance of the random variables add, and then is divided by “n”!”

Did you? Lines 2 and 3 are showing the sum of the variances, and they are divided by n^2. You are getting confused with what happens when all variances are the same and so it becomes n σ^2 / n^2 = σ^2 / n.

σ^2 is not the sum of the variances, it’s the individual variance.

If the variances are different you can’t make that simplification. Then the equation is just the one in line 3. You can look at that as the variance of the sum divided by n^2, or the average variance divided by n. This is the situation that Kip describes in the opening paragraph. The CLT can still apply, but only under certain conditions, which I wouldn’t claim to understand.

“Exactly how do you assume that each item is the same random variable?”

Fair enough, I think I misspoke there. By the same variable, I meant different variables with the same distribution.

Reply to  Jim Gorman
December 20, 2022 7:54 am

“A sample mean is calculated from the means of the samples.”

No it isn’t. By definition a sample mean is the mean of a (single) sample.

“Go to this website..”

You didn’t post a link, but even if you had, what’s the point? All you ever do is point me to trivial sites that never say what you think they say.

“As a check to see if the SD of the sample means is the SEM (Standard Error of the sample Mean) do the following”

Why do you keep doing this? The standard error of the mean is the standard deviation of the sampling distribution. Nobody argues it isn’t. It’s the definition of the SEM. Why do you keep trying to prove something nobody is disputing?

Reply to  Bellman
December 20, 2022 7:05 am

“And again, I’ll ask you, if you don’t think an average is a measurand, why do you keep talking about the measurement uncertainty of an average?”

Go to this link:

Decimals of Precision – Watts Up With That?

Search for “E.M.Smith”. This has been going on for a long time. That comment, on a 2012 article by Willis Eschenbach, shows how long the issue has been around, and it lays out the problem pretty well.

It all boils down to folks like you claiming that the CLT and the ensuing SEM from it allows one to add digits of precision to averages of measurements with a given resolution.

When it was shown that recorded temperatures prior to about 1980 were all integers with a ±0.5 uncertainty, the argument degenerated into folks wanting to ignore the uncertainty and claim that the CLT obviated the need for recognizing the measurement uncertainty.

If you want to settle the argument, then do the following.

1) Show us the math that allows recorded measurements in 1920 to have an average that is not an integer, the same as the recorded temperatures. Cite where the Significant Digit rules have been waived for climate calculations that all other physical sciences obey.

2) Then show the math that allows one to subtract a 30 year baseline from that average and obtain a result with 2 or 3 decimal places. Cite where the Significant Digit rules have been waived for these climate calculations that all other physical sciences obey.

3) Show why the author of NIST TN1900 reduced the resolution of the measurements when quoting the average. Tell us why he was wrong to do so.

Reply to  Jim Gorman
December 20, 2022 1:58 pm

You’re avoiding the question I asked. If a mean is not a measurand how can it have a measurement uncertainty?

Reply to  Bellman
December 20, 2022 3:46 pm

A mean itself is not a measurand any more than a median or mode. Would you call those measurands? If so, why can they all vary amongst themselves?

A mean CAN be a true value under certain assumptions. Yet it is not a measurement taken. If you believe that a “true value = m +/- error”, then the true value can never be measured, can it? It can only be derived if all “errors” cancel, but never directly measured. Even then, it has uncertainty, because each measurement “m” always has uncertainty. That is why we say error is different than uncertainty.

Reply to  Jim Gorman
December 20, 2022 4:11 pm

A mean itself is not a measurand any more than a median or mode.

So, the question remains: if you believe that, how can you talk about measurement uncertainty? Taking the GUM definition:

parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand

What measurand are you attributing the dispersion to if not the mean?

Would you call those measurands?

I don’t see why not. That NIST document says (my emphasis)

Measurement is an experimental or computational process that, by comparison with a standard, produces an estimate of the true value of a property of a material or virtual object or collection of objects, or of a process, event, or series of events, together with an evaluation of the uncertainty associated with that estimate, and intended for use in support of decision-making.

If so, why can they all vary amongst themselves?

I’m not sure what the “they” here are. Do you mean, means modes and medians can all vary, or different sample means?

If you believe that a “true value = m +/- error”, then the true value can never be measured, can it?

You can measure it, but you never normally know what the exact value is. All measurements have an error, hence uncertainty.

old cocky
Reply to  Jim Gorman
December 20, 2022 3:24 pm

Here is a reference from Rice University. I especially like the story in the document. Do you think it has any applicability to anomalies?

Significant Figure Rules (rice.edu)

I has me doubts about the “Rounding Off” section.

The story about the cost of precision was nice.

Oops, sorry. That was a reply to the wrong comment. My bad 🙁

Reply to  old cocky
December 20, 2022 3:34 pm

I found that story a long time ago and didn’t save it. I just re-found it today. I have it saved now. To me it is less a story about rounding than it is about measurements in general and the information that is contained in them!

Reply to  Jim Gorman
December 21, 2022 2:54 pm

If you want to settle the argument, then do the following.

I doubt that the argument will ever end.

Cite where the Significant Digit rules have been waived for climate calculations that all other physical sciences obey.

The “rules” are not scientific laws or mathematical theorems. They are just a rule of thumb, or a style guide. They don’t need to be waived, just not taken to override actual analysis. If you think they are theorems, you have to demonstrate the proof that these are the only possible way of deciding on the number of digits to report.

Show us the math that allows recorded measurements in 1920 to have an average that is not an integer, the same as the recorded temperatures.

(1 + 2 + 5 + 6) / 4 = 3.5

The average of integers is not necessarily an integer.

Now show the maths that allows the correct average to be 4, and show that this is a better estimate of the average of the four numbers than 3.5.

“Then show the math that allows one to subtract a 30 year baseline from that average and obtain a result with 2 or 3 decimal places.”

If the average is calculated to 2 or 3 decimal places and the base period is calculated to 2 or 3 decimal places, then the SF rules for subtraction require the result to be 2 or 3 decimal places.

“Show why the author of NIST TN1900 reduced the resolution of the measurements when quoting the average. Tell us why he was wrong to do so.”

Why would I tell you he was wrong? I’ve shown you several times in the course of these comments how it was obtained and why it’s correct. I’ve also explained why it is not reducing the resolution of the measurements – the resolution of the measurements is 1/4°C, or 0.25°C. The result is given to 0.1°C. The rules of significant figures have allowed the average to be written with an increased resolution.

Reply to  Bellman
December 20, 2022 7:15 am

“I’ve explained where the 1.8 figure comes from numerous times. (and I see you are happy to add an extra significant figure to it when converting to an antiquated measurement system. Are you claiming the uncertainty is known to the hundredth of a hundredth of a °F but only to a tenth of a °C?)”

Because °C has 100 divisions between the same end points while °F has 180. In other words, °F has a higher resolution.

“Whatever. Even assuming all errors are entirely random you would need 1,000,000 samples to get value three orders of magnitude smaller than the original measurements. It’s a lot of work for little gain. I think Bevington has a section explaining this.”

You just hit the crux of the problem. Congratulations. Now explain how anomalies from 1920 have been computed to at least 2 decimal places and in some graphs, 3 decimal places.

bdgwx
Reply to  Jim Gorman
December 20, 2022 8:08 am

JG said: “Now explain how anomalies from 1920 have been computed to at least 2 decimal places and in some graphs, 3 decimal places.”

They are almost certainly using IEEE 754 arithmetic, which means the anomalies and the uncertainties of the anomalies are actually computed to about 15 significant digits. We are only seeing 2 or 3 of those digits after the decimal place though.
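A minimal illustration of the binary64 point (the anomaly value below is contrived, not from any dataset):

```python
import sys

# A double carries about 15-17 significant decimal digits internally;
# rounding for display is a separate, later choice.
print(sys.float_info.dig)  # 15: decimal digits a double reliably holds

anomaly = 1.0 / 3.0 + 0.7246  # contrived value with full internal precision
print(repr(anomaly))           # all stored digits
print(f"{anomaly:.3f}")        # 1.058: what a published table would show
```

The number of digits stored says nothing about the uncertainty; as noted above, the uncertainty has to be stated separately.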

Reply to  bdgwx
December 20, 2022 2:28 pm

You are spouting computer floating point arithmetic dude, why am I not surprised.

What does the IEEE 754 specification have to do with reporting measurements to an appropriate resolution? Show us a reference from that standard that deals with measurements at all.

Here is a reference from Rice University. I especially like the story in the document. Do you think it has any applicability to anomalies?

Significant Figure Rules (rice.edu)

And here are several more references. Note, they all talk about measurements. Your IEEE reference does not.

https://www.me.ua.edu/me360/spring05/Misc/Rules_for_Significant_Digits.pdf

https://www.physics.uoguelph.ca/significant-digits-tutorial

https://sites.middlebury.edu/chem103lab/2018/01/05/significant-figures-lab/

https://ndep.nv.gov/uploads/water-wpc-permitting-forms-docs/guide-signifcant-figure-rounding-2017.pdf

bdgwx
Reply to  Jim Gorman
December 20, 2022 7:06 pm

JG said: “What does the IEEE 754 specification have to do with reporting measurements to an appropriate resolution?”

I’m pointing out that calculations on modern computing equipment will spit out 15 digits both for the value of interest and the uncertainty of that value. IEEE 754 doesn’t care about significant figure rules. That doesn’t mean that when you see 15 digits you should assume the uncertainty is ±1e-15 especially when the uncertainty itself is also provided. For example, Berkeley Earth reported 1.058 ± 0.055 C for 2022/10 and both of those values are almost certainly calculated/stored with several more digits than what is in the public file. Just because you see 1.058 C does not mean the uncertainty is ±0.001 C. We know it isn’t because they tell us it is actually ±0.055 C.

Reply to  bdgwx
December 21, 2022 7:39 am

“For example, Berkeley Earth reported 1.058 ± 0.055 C for 2022/10 and both of those values are almost certainly calculated/stored with several more digits than what is in the public file.”

You make my point better. How does Berkeley justify those numbers? First, following most recommendations for reporting, that should be 1.06 ± 0.06° C.

How does Berkeley reconcile that minimal uncertainty with ASOS having a ±1.8° F error? This is just another example of stating temperatures far exceeding the resolution of the instruments used to measure them.

You appear to be unable to accept the fact that a measurement resolution conveys a given amount of information. Adding additional information that wasn’t measured to that resolution is an act of fantasy fiction. It really doesn’t matter how many digits a computer can store; the only digits that count are the measured ones, as determined by the measuring device’s resolution.

You haven’t answered my question about what high-level lab courses you have had. I suspect that is part of the problem you have with instrument resolution. It is not the same as dealing with counting numbers, which are exact and can be divided into smaller and smaller chunks to fit your fancy.

bdgwx
Reply to  Jim Gorman
December 21, 2022 7:58 am

JG said: “How does Berkeley justify those numbers?”

I’m going to give you the same answer this time as I did the countless other times you asked it. Rohde et al. 2013.

JG said: “First, following most recommendations for reporting, that should be 1.06 ± 0.06° C.”

It is my understanding that they were originally doing it that way, but people complained about having too few digits.

JG said: “How does Berkeley reconcile that minimal uncertainty with ASOS having a ±1.8° F error?”

Can you post a link to the publication saying the uncertainty on ASOS measurements is ±1.8° F?

Reply to  bdgwx
December 20, 2022 3:04 pm

You are joking, I hope. Take the computation out as far as the computer will take it, then round to two or three decimal places just because you can.

You have read the GUM, NIST documents, metrology textbooks, and numerous posts, but you insist on falling back on plain old calculator display digits as representative of actual physical measurements.

You are a mathematician who was taught to use absolute accuracy in any number you were given. You still have that mindset.

Tell everyone: have you ever had an advanced lab class in physics, chemistry, electrical engineering, or research biology that required making measurements and resolving them to an answer? I will bet you have not. Have you ever had a job where measurements were the primary driver of what you did?

bdgwx
Reply to  Jim Gorman
December 21, 2022 6:26 am

No. I’m not joking. karlomonte has told me repeatedly that when you see a published value x you are supposed to ignore the u(x) value published alongside it and focus only on how many digits x has to infer its uncertainty. And it was either you or Tim who told me that if you aren’t using the proper sf rules then the whole calculation is wrong.

Reply to  bdgwx
December 21, 2022 6:56 am

Idiot.

Reply to  Jim Gorman
December 20, 2022 9:48 am

Think about what you are saying. A Fahrenheit degree is roughly half the size of a Celsius degree, so F has nearly twice the resolution of C, and each digit represents about half the width. If it’s correct to only report the mean to the nearest 0.1 C, how can it be OK to report it to 0.01 F? 0.01 F is about 0.005 C, so you are claiming that merely converting to a different measurement scale can reduce your uncertainty by a factor of 20.

Reply to  Jim Gorman
December 20, 2022 9:56 am

Continued.

You can compute anomalies to any number of digits. The question is how meaningful or useful all the digits are. The next question is what difference you think it would make to use 2, 3 or 4 digits in your calculations or graphs. Personally I prefer it if they give more digits than are useful and let me round the final result. If there is no difference in the result, you haven’t lost anything by using more digits than are necessary; and if there is a difference, why would you assume the one based on rounded numbers would be more correct?

Reply to  Jim Gorman
December 20, 2022 1:43 pm

By the way, if the uncertainty range is ±1.8°C, that would be ±3.24°F (not 2.44), and using Taylor’s rules would have to be written as ±3°F, given the first digit isn’t 1.

Reply to  Bellman
December 20, 2022 2:33 pm

Good for you. Now address the real question instead of deflecting and dancing around it.

“Whatever. Even assuming all errors are entirely random you would need 1,000,000 samples to get value three orders of magnitude smaller than the original measurements. It’s a lot of work for little gain. I think Bevington has a section explaining this.”

You just hit the crux of the problem. Congratulations. Now explain how anomalies from 1920 have been computed to at least 2 decimal places and in some graphs, 3 decimal places.
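The arithmetic behind that 1,000,000-sample remark follows from the 1/sqrt(N) scaling of the standard error of the mean (assuming, generously, purely random and independent errors); a quick sketch:

```python
import math

sigma = 1.0  # uncertainty of a single measurement (arbitrary units)

# The standard error of the mean scales as sigma / sqrt(N), so
# shrinking it by three orders of magnitude needs N = 1000**2.
for n in (100, 10_000, 1_000_000):
    print(f"N = {n:>9,}  standard error = {sigma / math.sqrt(n):g}")
```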

Reply to  Bellman
December 21, 2022 5:44 am

https://wattsupwiththat.com/2022/12/16/limitations-of-the-central-limit-theorem/#comment-3653232

Has Dr. Frank set the Gormans and Glee Clubber karlo straight yet? I’m getting a little cyanotic waiting….

bdgwx
Reply to  bigoilbob
December 21, 2022 10:08 am

Here is the summary of positions people hold for the measurement model Y = Σ[X_i, 1, N] / N. If you are listed and think I’ve made a mistake in your position, post back and declare your position on u(Y).

Tim Gorman thinks u(Y) = sqrt[ Σ[u(X_i)^2, 1, N] ]

Jim Gorman thinks u(Y) = sqrt[ Σ[u(X_i)^2, 1, N] ]

karlomonte thinks u(Y) = sqrt[ Σ[u(X_i)^2, 1, N] ]

Pat thinks u(Y) = sqrt[ Σ[u(X_i)^2, 1, N] / N ]

Bevington, Taylor, JCGM, NIST, UKAS, etc. say it is u(Y) = sqrt[ Σ[u(X_i)^2, 1, N] ] / N

bigoilbob, bdgwx, Bellman, kb, and Nick accept the u(Y) = sqrt[ Σ[u(X_i)^2, 1, N] ] / N result from Bevington, Taylor, JCGM, NIST, UKAS, etc.
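Whichever position is right (that is exactly what the thread disputes), the competing expressions give very different numbers. A sketch with N = 100 identical per-element uncertainties u(X_i) = 0.5, purely to show the spread:

```python
import math

N = 100
u = [0.5] * N  # identical per-element uncertainties, for illustration

rss = math.sqrt(sum(ui**2 for ui in u))          # sqrt(sum u^2): grows with N
inside = math.sqrt(sum(ui**2 for ui in u) / N)   # sqrt(sum u^2 / N): stays at u
outside = math.sqrt(sum(ui**2 for ui in u)) / N  # sqrt(sum u^2) / N: u/sqrt(N)

print(rss, inside, outside)  # 5.0  0.5  0.05
```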

old cocky
Reply to  bdgwx
December 21, 2022 7:50 pm

There was a distinction made between the uncertainty of the average and average uncertainty, so I took those to be terms of art in a foreign field.

Dividing the total uncertainty by N seems to be consistent with the treatment of dimensional scaling in the provided references.

That still leaves open the question of straight addition vs. quadrature, which seems to be context sensitive.

Reply to  old cocky
December 22, 2022 9:00 am

Straight addition is an upper bound on the uncertainty. It is appropriate under certain circumstances, as you say, context sensitive.

Quadrature also has assumptions that may or may not be met. It assumes partial cancellation when combining measurements in a functional relationship. You may get more or less cancellation than quadrature predicts.

From my perspective, an experimental standard uncertainty can give a better picture of how the individual pieces actually combine to arrive at a real value. It can take into account things you don’t think of and things whose uncertainty is hard to determine.

Reply to  Rud Istvan
December 18, 2022 10:33 am

I’m not sure how relevant the last two points are.

The N > 30 is only a rule of thumb. It isn’t an axiomatic requirement of the CLT, just an indication of the sample size needed to get something roughly normal.

The without-replacement requirement is correct, but I’m not sure of its relevance to discussing measurement uncertainties. Any sample of measurements is effectively with replacement as far as the errors are concerned. The same applies to samples of global temperatures, where the population is effectively infinite.

Even if you are taking large samples with respect to a finite population without replacement, I would assume that this just makes the average more certain. If your sample is 100% of the population the average will be 100% correct.
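That last intuition matches the finite population correction: sampling without replacement from a population of size P multiplies the standard error by sqrt((P - n)/(P - 1)), which vanishes as the sample approaches the whole population. A sketch (the function name and numbers are illustrative):

```python
import math

def sem_without_replacement(sigma, n, P):
    """Standard error of the mean for n draws taken without
    replacement from a finite population of size P."""
    fpc = math.sqrt((P - n) / (P - 1))  # finite population correction
    return (sigma / math.sqrt(n)) * fpc

P, sigma = 1000, 2.0
for n in (10, 100, 500, 1000):
    print(n, sem_without_replacement(sigma, n, P))
# At n = P the correction is zero: the "sample" is the whole
# population, so the mean is exact.
```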

December 17, 2022 7:47 pm

… by finding the mean of the means, the CLT will point to the approximate mean, and give an idea of the variance in the data.

One might do as well using the Empirical Rule or Chebyshev’s inequality.

December 17, 2022 7:56 pm

4. When doing science and evaluating data sets, the urge to seek a “single number” to represent the large, messy, complex and complicated data sets is irresistible to many – and can lead to serious misunderstandings and even comical errors.

I’m reminded of an example from many years ago after the Three Mile Island nuclear reactor event. The NRC claimed that the average radiation within a given radius did not exceed the allowable dosage for human exposure. However, there was a narrow downwind plume that did significantly exceed the threshold. The point is that averages inevitably reduce the information content and can be used to mislead.

Reply to  Clyde Spencer
December 18, 2022 7:38 am

“However, there was a narrow downwind plume that did significantly exceed the threshold.”

If they had enough spatial data, and used it properly, that would have been avoided. In oil and gas we seek and cherish outliers. Particularly positive outliers. That’s where the $ are.

Understanding the value of gathering more and better data and properly evaluating it is also why the IPCC is trying to improve the identification and detailed description of global extreme weather events. They recognize that the well identified trends in several US extremes are only available because of our best in show reporting. So, they want more of that in the rest of the world.

Reply to  bigoilbob
December 18, 2022 1:12 pm

Behold, blob, shilling for the IPCC and the garbage-in-garbage-out climate models.

Not a pretty picture.

Reply to  karlomonte
December 19, 2022 8:56 am

“…shilling for the IPCC and the garbage-in-garbage-out climate models”

Read again please. Carefully this time. I did not refer to “models” at all. Rather, I referred to the IPCC initiative to improve the identification, quantification, and reporting of worldwide extreme weather events. If the alt.world claim that they are not trending up, unlike many in the CONUS, is true, then you should be cheering them on…

Reply to  bigoilbob
December 20, 2022 3:54 pm

Why? They’ll “find” only what they’re looking for, any information that goes against their narrative will be memory holed.

December 17, 2022 9:38 pm

One Number to Rule Them All as a principal, go-to-first approach in science has been disastrous for reliability and trustworthiness of scientific research. 

1. there’s no evidence it’s the go-to-first approach

2. no evidence it’s disastrous

most people can’t remember more than 7 numbers.

we start with all the data, let’s say billions of measurements.

then we want to know numbers of interest, relatable numbers: highest, lowest, most likely, most repeated, etc.

the data never disappears, it’s there. except YOU WON’T LOOK AT IT

how do i know? i see the download stats

Reply to  Steven Mosher
December 18, 2022 7:09 am

we start with all the data

And studiedly ignore the calibration uncertainty. Everything else you do is then specious.

Reply to  Steven Mosher
December 18, 2022 1:13 pm

Where is “there”?

December 17, 2022 9:42 pm

In the 1970s, we in a mineral exploration company became very involved with the growing field of geostatistics as in the Fontainebleu school, Matheron etc. We sent colleagues to France for months and hosted French mathematicians here in Australia.
I have not kept up with the art for a couple of decades now. I often see words like “krige” and wonder if modern authors have really studied and understood its applicability, strengths and weaknesses.
Because geostatistics has many elements in common with what Kip and I and others have been discussing here at WUWT this year, I would love to see some current, expert geostatisticians join in and present articles to WUWT. Particularly since geostatistics is a practical application that might balance theoretical mathematical inputs.
Kip, Charles, Anthony, I hope you do not mind this suggestion. Merry Christmas to all.
Geoff S

Robert B
December 17, 2022 11:32 pm

You get a more precise estimate with many measurements. That deals with purely random errors: the majority of them will be matched by an error of the same magnitude but opposite sign. It will not be perfect, but the greater the number of measurements, the smaller the chance of the average differing by a given amount from the true value PLUS THE SYSTEMATIC ERROR.

For some reason, a collection of many poorly characterised systematic errors is treated as if it were just like random error.

I used to use shooting as an analogy (students will be triggered now). A good shot will get a small spread around where a fixed rifle shooting perfectly would hit, and the average of many shots is much more likely to be close to that value than any single shot (though one could always land perfectly). Even the average of a poorer shot shooting many times will likely be closer to the value of a perfect shot, as long as the spread is purely random. It might be a much bigger spread, but it will still be centred around the perfect shot.

None will be particularly accurate if lined up wrong because of an askew sight. Lining up the cross hairs on the bullseye and shooting a million times will not help. This is a systematic error and why a precise measurement might still be inaccurate.

Other reasons might be a technique that pulls the shot to the left, or a crosswind to the right, or not accounting for the fall. Treating these as random errors and expecting a million shots to make the average be on the bullseye is not logical.

Except in climate science.
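The sight-misalignment point can be checked with a toy simulation: add a fixed offset to purely random scatter, and the mean of many shots converges to the offset, not to the bullseye (all numbers arbitrary):

```python
import random

random.seed(42)
bias = 1.5     # fixed sight misalignment (systematic error)
spread = 3.0   # shooter's random scatter (random error)

n = 100_000
mean = sum(bias + random.gauss(0.0, spread) for _ in range(n)) / n

# The random component averages toward zero; the bias does not.
print(mean)  # close to 1.5, however large n gets
```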

Reply to  Robert B
December 18, 2022 8:09 am

“Lining up the cross hairs on the bullseye and shooting a million times will not help.”

If we had multiple targets lined up behind each other, and we wanted to calculate the trajectory (i.e. the trend) of that bullet, it would help.

bdgwx
Reply to  bigoilbob
December 18, 2022 2:35 pm

That’s a cool analogy. I may steal it sometime. In the meantime I may be more active on our local weather forum as we try to figure out the details of the snowstorm.

Reply to  bdgwx
December 18, 2022 6:52 pm

I’ll open it up. But TWC says that in the city, in addition to the Thursday storm, we’re up for a high of 6, low of -2, and wind speeds of 27 mph on Friday. Tell me they’re wrong. If not, the anti-gel/cloud fuel additive will go in my Colorado diesel tank tomorrow AM.

Robert B
Reply to  bigoilbob
December 20, 2022 1:11 pm

Not even for the analogy does that make sense.
You are assuming that all systematic errors, let alone any single one, have the exact same offset effect on all targets.
Climate scientists assume that this error can be reduced by the square root of the number of measurements over the whole series, as you would with random errors.

Reply to  Robert B
December 20, 2022 1:27 pm

“You are assuming that all systematic errors, let alone any single one, have the exact same offset effect on all targets.”

Guess again.

  1. I was expanding on a specific comment.
  2. Even if some of the targets were a little high/low/left/right, in groups – your hiccup – the accuracy estimate for that trajectory would improve with the number of shots fired.

But it’s all moot anyhow. Since even the most ridiculously stretched inaccuracies and imprecisions of the older GAT and sea level data result in trend standard errors qualitatively identical to those of expected-value-only evaluations, BFD….

Robert B
Reply to  bigoilbob
December 20, 2022 6:25 pm

The point is that you can’t pretend systematic errors are like random errors. You need a symmetrical distribution of measurements around the true value for an average of many measurements to be useful. If you have systematic errors, only fortuitously will the measurements be symmetrical around the true value.

I’m not guessing. You’re flapping about.

Reply to  Robert B
December 20, 2022 7:21 pm

“The point is that you can’t pretend systematic errors are like random errors.”

I’m not. Please find where I did.

“You need a symmetrical distribution of measurements around the true value for an average of many measurements to be useful.”

They are useful for trending. That’s the point of my previous post.

“If you have systematic errors, only fortuitously will the measurements be symmetrical around the true value.”

If they’re small enough compared to the change exhibited from the trend, they mean nada, qualitatively. And in reading many thousands of WUWT posts, no one has provided data for physically and/or statistically significant time periods that show otherwise. You could be the first. Or you could just talk, fact free, like the rest.

Robert B
Reply to  bigoilbob
December 23, 2022 11:08 pm

“They are useful for trending. That’s the point of my previous post.” That is my evidence of you pretending.

Reply to  Robert B
December 24, 2022 11:38 am

You can’t average measurements of different things with various errors and reduce the overall error. Systematic error in the same measurand is not reducible by statistics.

All you are doing is averaging wrong things, hoping that with luck you’ll somehow get a correct answer.

The modelers do this. Their ensemble average is no better than any individual prediction.

You can’t even determine the probability of an average of wrong things being correct.

bdgwx
Reply to  Robert B
December 20, 2022 7:31 pm

Robert B said: “The point is that you can’t pretend systematic errors are like random errors.”

If all stations had the same systematic error (astronomically unlikely) then it would cancel out when you convert the observations to anomalies. If they all had different systematic errors (very likely) then there would be a probability distribution representing the dispersion of those errors. In other words, when viewed in aggregate, the multitude of individual station systematic errors acts like a random variable with a distribution.
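A toy version of that claim (a sketch only, with each station's offset held constant in time, which is itself part of what is disputed in this thread): give every station its own fixed offset; since the offset appears in both the baseline and the current reading, it cancels in that station's anomaly regardless of how the offsets are distributed:

```python
import random

random.seed(0)
n_stations = 1000
true_baseline, true_now = 10.0, 11.0  # true temperatures, arbitrary units

# Each station gets its own fixed systematic offset.
offsets = [random.uniform(-2.0, 2.0) for _ in range(n_stations)]

baseline = [true_baseline + b for b in offsets]  # offset in the baseline...
current = [true_now + b for b in offsets]        # ...and in the new reading

# The offset cancels station-by-station in the anomaly.
anomalies = [c - b for c, b in zip(current, baseline)]
print(sum(anomalies) / n_stations)  # ~1.0, whatever the offset distribution
```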

Reply to  bdgwx
December 21, 2022 5:27 am

Glad you skipped discussing purely random errors. Even Dr. Frank implicitly admits that the resulting standard errors for averages and trends diminish with more data.

https://wattsupwiththat.com/2022/12/16/limitations-of-the-central-limit-theorem/#comment-3653232

I try to imagine the set of systematic errors that would qualitatively change physically/statistically significant GAT and/or sea level trends. It would have to be a very carefully curated series of errors: Trumpian YUGE in the negative direction decades ago, regularly shrinking to ~zero in the middle of the time series, and finally ending Trumpian YUGE in the positive direction near the present.

The chance of that goes from beyond slim to next to none. Which is why no one has presented any evidence of such convenient systematic errors.

Hunkered down. Stocking stuffers purchased. Visited DeGregorios on the Hill for Christmas Eve party snacks. Anti gel in the diesel tank. Tickets bought to take the blessed California grandkids to see Elf at the Symphony, ice skating at Steinberg, bowling, City Museum, lunch at Crown Candy Kitchen, and so on, post storm. Now, all that’s left is to click on your link and see if the Perfect Storm still on for tomorrow…

Robert B
Reply to  bdgwx
December 23, 2022 11:22 pm

I’ll use the error estimates of global heat content as the example. It corresponds to a few ten-thousandths of a degree in average temperature. It doesn’t seem to matter that no measurement was made with a resolution better than 0.1 degree; every error is treated as if it were perfectly random.

No systematic error will be exactly the same for every reading over time. There will be many small systematic errors forming a distribution, but it is unlikely to be a perfectly symmetrical one, so a million measurements with a resolution of 0.1 degree cannot simply be assumed to yield an average a factor of a thousand better.

It’s not enough that they form a distribution; the distribution has to be perfectly symmetrical, which is about as likely as all the errors being equal.

Reply to  Robert B
December 24, 2022 11:03 am

It is not a good analogy to begin with. More appropriate is a million shots from a million different rifles by a million different people. The average would be meaningless. Worse, the “error” of the average would be meaningless too.

In any case, manipulating the resolution of the measurements is fiction. You simply cannot use statistics to add information beyond what was measured.

KB
Reply to  Robert B
December 19, 2022 5:16 am

Yes bias and precision are two different things. Agreed.
In your example, there is bias (systematic error) due to the same shooting equipment being used for each shot.
Each shot is not fully independent. To achieve that, you would need a different gun and a different shooter for each shot. What’s more, the gun sight would need to be independently calibrated on each gun.

Reply to  Robert B
December 19, 2022 6:52 am

The only problem with this analogy is that you are using the same starting point. With temperature, it is more like 100 guys each taking one shot and trying to find the average. What does that tell you?

December 18, 2022 12:29 am

At the most basic level, the “average maximum daily temperature” is not a measurement of temperature or warmness at all, but rather, as the same commenter admitted, is “just a number”.

a number is just a word, or rather pixels on the screen, or rather an image on your retina

it is always possible to reductively remove meaning from common-sense expressions

big hint: we can never measure temperature. heck, we can’t even measure time

Reply to  Steven Mosher
December 18, 2022 11:18 am

Lots of things can’t be measured with a physical device like a ruler. Yet the passage of time can be broken into defined segments such that other units can be derived. The energy contained in a unit of mass can’t be directly measured by any physical device. Yet, just like time, temperature can be DEFINED as an interval between phase changes of water and then broken down into uniform segments.

You might want to read up on SI units and their definitions. There are 7 base units, and time is one of them.

Richard S J Tol
December 18, 2022 1:55 am

After Kellyanne Conway came up with her “alternative facts”, it was only a matter of time before someone started “alternative maths”.

KB
December 18, 2022 10:46 am

(1) Like others on here I note that Kip does not even know what the Central Limit Theorem is about. It is saying that combining several different distribution shapes gives a combined distribution that is likely to be close to the Normal distribution. I mean, look it up on Wikipedia if you don’t believe me. How can we have any confidence in the rest of the article if he cannot even get the basics right? You need to know what you are criticising before you can criticise it.

(2) Regarding the loaded dice example. Nature does not set out to intentionally trick us. It might seem like that sometimes, but the kind of chicanery mentioned in the article requires a conscious being to intervene in the randomness of it all. Statistical methods set out to deal with small random errors, not tricks, not fraud, and not “gross errors”, i.e. transcription errors and experimental mistakes. The example is thus not applicable.

(3) Notice however what a nice illustration of the CLT this is. The two loaded dice distributions are almost U-shaped distributions. Nevertheless, when combined, you can see the resultant distribution is beginning to look similar to a Normal distribution. It’s not difficult to see that combining further distributions to this, of any shape, will make the result more and more Normal.
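That tendency is easy to reproduce with a quick simulation (a sketch; the loaded-die weights are made up): draw from a U-shaped die loaded toward 1 and 6, and the means of modest samples still pile up in a bell shape around the die's expected value:

```python
import random
import statistics

random.seed(1)
# A crude U-shaped die: faces 1 and 6 are four times as likely as the rest.
faces = [1, 6] * 4 + [2, 3, 4, 5]

def sample_mean(k):
    return statistics.mean(random.choice(faces) for _ in range(k))

means = [sample_mean(30) for _ in range(10_000)]
print(statistics.mean(means), statistics.stdev(means))
# The sample means cluster around the die's expected value (3.5)
# in a roughly normal shape, despite the U-shaped parent distribution.
```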

Reply to  KB
December 18, 2022 4:35 pm

combining several different distribution shapes gives a combined distribution that is likely to be close to the Normal distribution

When all the x_i are iid.

Kip wrote, “the CLT only requires a largish population (overall data set) and the taking of the means of many samples of that data set.”

Wikipedia: “For example, suppose that a sample is obtained containing many observations, each observation being randomly generated in a way that does not depend on the values of the other observations, and that the arithmetic mean of the observed values is computed. If this procedure is performed many times, the central limit theorem says that the probability distribution of the average will closely approximate a normal distribution.”

Kip is exactly correct.

KB
Reply to  Pat Frank
December 18, 2022 6:15 pm

Not exactly correct.
Notice the Wikipedia article is using the CLT to justify that the probability distribution will tend to be Normal. It is not saying this is what the CLT is.

Reply to  KB
December 18, 2022 9:16 pm

Wiki is using the CLT under the assumption that the probability distribution will tend to be Normal. Kip described using the CLT in that fashion.

KB
Reply to  Pat Frank
December 19, 2022 5:09 am

I’m sorry but he has not.

In his first paragraph he has used the Law of Large Numbers, whilst describing it as the CLT.

He is obviously confused, even about the basic terminology.

Reply to  KB
December 19, 2022 11:08 am

You have not described how the conclusions were mistaken or incorrect. Remember, neither Kip nor many of us believe that the CLT can justify reduced uncertainty or increased resolution of a mean value that allows anomalies to be quoted to a thousandth of a degree.

If you believe it can, then it is up to you to show that. So far all you have done is make ad hominem attacks about how no one but you understands statistics. Get off your pedestal, do the dirty work, and provide some references that show how it is possible. If I were in a high school debate, stood up, and accused the other party of being wrong without presenting evidence, I would lose in a heartbeat. So far, that is all you have done.

KB
Reply to  Jim Gorman
December 19, 2022 5:49 pm

I have said how they are incorrect several times.
You can find it all explained well enough on Wikipedia.
I don’t pretend to be an expert, but even I can tell you and Kip are hopelessly confused.

Reply to  KB
December 20, 2022 4:44 am

“I don’t pretend to be an expert, but even I can tell you and Kip are hopelessly confused.”

Somehow the illogic of this escapes you!

Reply to  KB
December 19, 2022 5:57 pm

No, Kip didn’t.

The LLN is not mentioned until the second paragraph, and there described correctly.

The first paragraph describes the CLT, and there correctly.

old cocky
December 18, 2022 4:42 pm

Mosh and ThinkingScientist, I would rather get it right than be right. If there is an error in my characterisation of averages, please explain where I’ve gone wrong.

old cocky
Reply to  old cocky
December 18, 2022 10:27 pm

Waiting, waiting

old cocky
Reply to  old cocky
December 19, 2022 11:58 am

Hellooo. Mosh? ThinkingScientist?

Where ARE you?

Hello…

old cocky
Reply to  old cocky
December 19, 2022 5:00 pm

Peekaboo! Where ARE you?

It was a simple enough question: “Why is an average an expectation, and what is it an expectation of?”

Could somebody who is still here enlighten me?

old cocky
Reply to  old cocky
December 19, 2022 10:32 pm

Gee, these crickets are loud. Perhaps they’re cicadas, drowning out the erudite explanations.

old cocky
Reply to  old cocky
December 20, 2022 9:39 pm

These bloody cicadas are giving me a headache.

bdgwx
Reply to  old cocky
December 19, 2022 8:02 am

I don’t know what the topic is here, but I can appreciate the sentiment. I hate being wrong. But I hate continuing to be wrong more.

old cocky
Reply to  bdgwx
December 19, 2022 11:31 am

Some of the most important practices in software engineering are code review and unit testing. It’s better to catch errors before they are let loose outside your own little play pen.

And you don’t want your friends spotting your silly mistakes 🙁