**Part One**

Geoff Sherrington

Modern climate research commonly fails to give adequate recognition to three guiding principles about uncertainty.

1. Uncertainty estimation is essential to understanding.

“*It is generally agreed that the usefulness of measurement results, and thus much of the information that we provide as an institution, is to a large extent determined by the quality of the statements of uncertainty that accompany them.”*

2. Uncertainty estimation has two dominant parts.

*“The uncertainty in the result of a measurement generally consists of several components which may be grouped into two categories according to the way in which their numerical value is estimated:*

* A. those which are evaluated by statistical methods,*

* B. those which are evaluated by other means.”*

3. Uncertainty estimation needs to value diverse views.

“In 2009, the Obama Administration identified six principles of scientific integrity.” (Including these two) –

“Dissent. Science benefits from dissent within the scientific community to sharpen ideas and thinking. Scientists’ ability to freely voice the legitimate disagreement that improves science should not be constrained.

“Transparency in sharing science. Transparency underpins the robust generation of knowledge and promotes accountability to the American public. Federal scientists should be able to speak freely, if they wish, about their unclassified research, including to members of the press.”

…………………………

This article examines how well the Australian Bureau of Meteorology, BOM, satisfies these requirements in respect of the uncertainty estimated for routine daily temperatures.

Part One deals more with the social aspects like transparency. Part Two addresses mathematics and statistics.

This article uses Australian practice and examples, predominantly involving BOM. Importantly, the conclusions apply worldwide, for there is much to repair.

The prominent, practical guide to uncertainty is by the France-based Bureau International des Poids et Mesures, BIPM, with their Guide to the Expression of Uncertainty in Measurement (GUM).

Several years ago, in email correspondence with BOM, I started to ask this question:

“If a person seeks to know the separation of two daily temperatures in degrees C that allows a confident claim that the two temperatures are different statistically, by how much would the two values be separated?”

BOM has been trying to answer this question with several attempts. They have permitted me to quote from their correspondence on the condition that I reference the full quote, which I do here.

http://www.geoffstuff.com/bompastemails.docx

On 31st March 2022, BOM sent the most recent attempt to answer the question. Here is a table with some of their text.

(Start quote) “The uncertainties, with a 95% confidence interval for each measurement technology and data usage, are listed below. Sources that have been considered in contributing to this uncertainty include, but are not limited to, field and inspection instruments, calibration traceability, measurement electronics or observer error, comparison methods, screen size and aging.

| Measurement Technology | Ordinary Dry Bulb Thermometer | PRT Probe and Electronics |
| --- | --- | --- |
| Isolated single measurement – No nearby station or supporting evidence | ±0.45 °C | ±0.51 °C |
| Typical measurement – Station with 5+ years of operation | ±0.23 °C | ±0.23 °C |
| Typical measurement – Station with 10+ years of operation with at least 5 verification checks | ±0.18 °C | ±0.16 °C |
| Long-term measurement – Station with 30+ years of aggregated records | ±0.14 °C | ±0.11 °C |
| Long-term measurement – Station with 100+ years of aggregated record | ±0.13 °C | ±0.09 °C |

I would stress that in answer to your specific question of “*If a person seeks to know the separation of two daily temperatures in degrees C that allows a confident claim that the two temperatures are different statistically by how much would the two values be separated”*, the ‘Typical measurement’ Uncertainty for the appropriate measurement technology would be the most suitable value. This value is not appropriate for wider application to assess long-term climate trends, given typical measurements are more prone to measurement, random, and calibration error than verified long-term datasets.” (End quote)

These confidence intervals essentially cover only one of the two categories that make up a complete estimate of uncertainty. They are mostly of the Type A kind, derived from statistical methods. They are incomplete and unfit for routine use without more attention to Type B, the components that are evaluated by other means.

There is a significant difference in the interpretation of temperature data, especially in time series, if the uncertainty is ±0.51 °C or ±0.09 °C, to use extreme estimates from the BOM table. It is vital to understand how the uncertainty of a single observation becomes much smaller when there are multiple observations combined in some way. Is that combination a valid scientific act?
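To put rough numbers on the question put to BOM, here is a minimal Python sketch. It assumes, purely for illustration, that the errors of the two readings are independent and that the quoted 95% figures can be combined in quadrature. It takes the PRT values from the BOM table above and says nothing about Type B components that the table may not include. The function name `min_separation` is my own label, not a BOM or GUM term.

```python
import math


def min_separation(u95_a: float, u95_b: float) -> float:
    """95% uncertainty of the difference between two readings.

    Assumption: the two errors are uncorrelated, so the two
    uncertainties add in quadrature (root-sum-square).
    """
    return math.hypot(u95_a, u95_b)


# 95% figures quoted in the BOM table (PRT column), in degrees C.
bom_prt = {
    "isolated single measurement": 0.51,
    "typical, 5+ years": 0.23,
    "long-term, 100+ years": 0.09,
}

for label, u95 in bom_prt.items():
    gap = min_separation(u95, u95)
    print(f"{label}: ±{u95:.2f} °C per reading, "
          f"difference must exceed about {gap:.2f} °C")
```

Under these assumptions the required gap is simply the single-reading figure multiplied by the square root of two, so the choice of table row changes the answer by a factor of more than five.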

In the case of routine temperature measurements (the test subject of this article), Type B might include, but not be limited to, all of those effects adjusted by homogenization of time series of temperatures. In this article, we use the BOM adjustment procedures for creating the Australian Climate Observations Reference Network – Surface Air Temperature (ACORN-SAT).

ACORN-SAT commences with “raw” temperature data as input. This is then examined visually and/or statistically for breaks in an expected (smooth) pattern. Sometimes a pattern at one station is compared with performance of other stations up to 1,200 km distant. Temperatures are adjusted, singly or in blocks or patterns, to give a smoother-looking output, more in agreement with other stations, more pleasing to the eye perhaps, but often inadequately supported by metadata documenting actual changes made in the past. Sometimes, there is personal selection of when to adjust and by how much, that is, guesswork.

The BIPM guidelines have no advice on how to create uncertainty bounds for “guesses” – for good scientific reasons.

http://www.bom.gov.au/climate/change/acorn-sat/documents/About_ACORN-SAT.pdf

http://www.bom.gov.au/climate/data/acorn-sat/#:~:text=ACORN%2DSAT

Some other relevant factors affecting BOM raw data include:

- Data began with the Fahrenheit scale, then moved to Celsius scale.
- There were periods of years when a thermometer observation was reported in whole degrees, with no places after the decimal. (“Rounding effects.”)
- Almost every ACORN-SAT station of the 112 or so was moved to a different location some time in its life.
- Some stations have had new buildings and ground surfaces like asphalt put close to them, potentially affecting their measurements. (“Urban Heat Island” effects, UHI.)
- Thermometers changed from liquid-in-glass to platinum resistance.
- Screen volumes changed over the decades, generally becoming smaller.
- Screens have been shown to be affected by cleaning and type of exterior finish.
- The recording of station metadata, noting effects with potential to affect measurements, was initially sparse and is still inadequate.
- Some manual observations were not taken on Sundays, The Sabbath, at some stations.
- And so on.

……………………………….

In mid-2017, BOM and New Zealand officials met and emailed to produce a report that touched on the variables just listed but concentrated on the performance of the Automatic Weather Station, AWS, lately dominant, with mostly PRT sensors.

Review_of_Bureau_of_Meteorology_Automatic_Weather_Stations.pdf (bom.gov.au)

Some email correspondence within BOM and New Zealand about this review became public through a Freedom of Information request. Relevant FOI material is here.

http://www.bom.gov.au/foi/release/FOI30-6150_DocumentSet_ForRelease_Redacted_r_Part2.pdf

Here are some extracts from those emails. (Some names have been redacted. My bolds).

“While none of the temperature measurements resident in the climate database have an explicit uncertainty of measurement, the traceability chain back to the national temperature standards, and the processes used both in the Regional Instrument centre (the current name of the metrology laboratory in the Bureau) and the field inspection process suggest that the likely 95% uncertainty of a single temperature measurement is of the order of 0.5 °C. This is estimated from a combination of the field tolerance and test process uncertainties over a temperature range from -10 to +55 °C.”

“(We) should deal with the discrepancy between the BOM’s current 0.4 °C uncertainty and the 0.1 °C WMO aspirational goal.”

By reference to the table above, the PRT column offers a similar uncertainty of ±0.51 °C for “Isolated single measurement – No nearby station or supporting evidence”; also ±0.37 °C for “Typical measurement – Station with 5+ or 10+ years of operation”; also ±0.11 °C for “Long-term measurement – Station with 30+ years of aggregated records.”

One does not know why there is a further offering for AWS of ±0.09 °C for “records with 100+ years of aggregated record.” Hugh Callendar developed the first commercially successful platinum RTD in 1885, but its use in automatic weather stations seems to have started about the time of the 1957-8 International Geophysical Year. There might be no examples of 100+ years.

Recall that for some 5 years before mid-2022, I had been asking BOM for estimates of uncertainty, a question that had not been answered by mid-2022. This has to be considered against the knowledge revealed in this 2017 email exchange, that *“the likely 95% uncertainty of a single temperature measurement is of the order of 0.5 °C.”* It is reasonable to consider that this estimate was concealed from me. One of the BOM staff who has been answering my recent emails was present and named among the email writers of the 2017 exchange.

Why did BOM fail to mention this estimate? They were encouraged to provide an answer by my main question, but they did not.

This brings us to the start of this article and its three governing principles, one of which is *“Transparency in sharing science. Transparency underpins the robust generation of knowledge and promotes accountability to the American public. Federal scientists should be able to speak freely, if they wish, about their unclassified research, including to members of the press.”*

As for America, also for Australia.

It happens that I have kept some past writings by officers of the BOM over the years. Here are some.

Recall that in Climategate, the BOM’s Dr David Jones emailed to colleagues on 7th September 2007.

*Fortunately in Australia our sceptics are rather scientifically incompetent. It is also easier for us in that we have a policy of providing any complainer with every single station observation when they question our data (this usually snows them) and the Australian data is in pretty good order anyway.*

David Jones had not reached an apologetic mood by June 16, 2009, when he emailed me his response to a technical question:

Geoff, your name appears very widely in letters to editors, on blogs and your repeatedly email people in BoM asking the same questions. I am well aquatinted with letters such as this one –http://www.jennifermarohasy.com/blog/archives/001281.html. You also have a long track record of putting private correspondence on public blogs. I won’t be baited.

Further, there is an email involving BOM Media and Big Boss Andrew Johnson and others in the AWS review, 24 August 2017 9:58 AM:

*“I expect we will reply to this one with: The Bureau does not comment on any third-party research.”*

Continuing the theme is this 2017 BOM email in response to mine asserting with data that Australian heatwaves are not becoming longer, hotter or more frequent.

The Bureau is unable to comment on unpublished scientific hypotheses or studies, and we encourage you to publish your work in a suitable journal. Through the peer reviewed literature, you can take up any criticism you have of existing methodologies and have these published in a format and forum that is accessible to other scientists. Regards, Climate Monitoring and Prediction.

This fortress BOM mood might have started from the top. One redacted name at that 2017 email session revealed that –

“I am essentially ‘external’ as an emeritus researcher, but was head of the infrastructure/procurement/engineering/science measurement area when I retired from the Bureau in March 2016 last year.”

This person might or might not have been former BOM Director Dr Rob Vertessy. Newspapers in 2017 reported resignation comments from him. They send a message.

“Vertessy’s agency was under consistent attack from climate science denialists who would claim, often through the news and opinion pages of the Australian, that the weather bureau was deliberately manipulating its climate records to make recent warming seem worse than it really was.

“From my perspective, people like this, running interference on the national weather agency, are unproductive and it’s actually dangerous,” Vertessy told me. “Every minute a BoM executive spends on this nonsense is a minute lost to managing risk and protecting the community. It is a real problem.”

Note the common media spin methods in this press article. The BOM saw a problem, framed it in their own way and expressed emotion without denying the accusations.

At this stage of this article, I submit some words by others to indicate the scale of the problems that are emerging.

“An irreproducibility crisis afflicts a wide range of scientific and social-scientific disciplines, from public health to social psychology. Far too frequently, scientists cannot replicate claims made in published research. Many improper scientific practices contribute to this crisis, including poor applied statistical methodology, bias in data reporting, fitting the hypotheses to the data, and endemic groupthink. Far too many scientists use improper scientific practices, including outright fraud.”

National Association of Scholars (USA).

Shifting Sands. Unsound Science and Unsafe Regulation

Report #1: Keeping Count of Government Science: P-Value Plotting, P-Hacking, and PM2.5 Regulation

https://www.nas.org/reports/shifting-sands-report-i

From that report, there is a view about the Central Limit Theorem on page 36.

*“The Bell Curve and the P-Value: The Mathematical Background*

“All “classical” statistical methods rely on the Central Limit Theorem, proved by Pierre-Simon Laplace in 1810.

“The theorem states that if a series of random trials are conducted, and if the results of the trials are independent and identically distributed, the resulting normalized distribution of actual results, when compared to the average, will approach an idealized bell-shaped curve as the number of trials increases without limit.

“By the early twentieth century, as the industrial landscape came to be dominated by methods of mass production, the theorem found application in methods of industrial quality control. Specifically, the p-test naturally arose in connection with the question “how likely is it that a manufactured part will depart so much from specifications that it won’t fit well enough to be used in the final assemblage of parts?” The p-test, and similar statistics, became standard components of industrial quality control.

“It is noteworthy that during the first century or so after the Central Limit Theorem had been proved by Laplace, its application was restricted to actual physical measurements of inanimate objects. While philosophical grounds for questioning the assumption of independent and identically distributed errors existed (i.e., we can never know for certain that two random variables are identically distributed), the assumption seemed plausible enough when discussing measurements of length, or temperatures, or barometric pressures.

“Later in the twentieth century, to make their fields of inquiry appear more “scientific”, the Central Limit Theorem began to be applied to human data, even though nobody can possibly believe that any two human beings—the things now being measured—are truly independent and identical. The entire statistical basis of “observational social science” rests on shaky supports, because it assumes the truth of a theorem that cannot be proved applicable to the observations that social scientists make.”

Dr David Jones emailed me on June 9, 2009 with this sentence:

“Your analogy between a 0.1C difference and a 0.1C/decade trend makes no sense either – the law of large numbers or central limit theorem tells you that random errors have a tiny effect on aggregated values.”
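Whether that statement holds depends on the errors being random and independent. The following Python sketch uses made-up numbers (a 0.25 °C random scatter and a 0.3 °C bias shared by every reading, neither taken from BOM) to show the distinction. It is only an illustration under those assumptions, not a model of any real station.

```python
import random
import statistics


def mean_error(n_readings, random_sd, shared_bias, seed=0):
    """Error of the average of n readings.

    Each reading carries an independent Gaussian error plus a bias
    that is identical for every reading (hypothetical values).
    """
    rng = random.Random(seed)
    errors = [rng.gauss(0.0, random_sd) + shared_bias
              for _ in range(n_readings)]
    return statistics.fmean(errors)


def spread_of_mean(n_readings, random_sd, shared_bias, trials=2000):
    """Repeat the experiment many times.

    Returns (centre, spread) of the error of the average.
    """
    means = [mean_error(n_readings, random_sd, shared_bias, seed=t)
             for t in range(trials)]
    return statistics.fmean(means), statistics.pstdev(means)


centre, spread = spread_of_mean(100, random_sd=0.25, shared_bias=0.3)
print(f"random part of the error shrinks to about ±{spread:.3f} °C")
print(f"shared bias stays at about {centre:.2f} °C")
```

In this toy case the scatter of the average falls roughly as 1/√N, while the shared bias passes through the averaging untouched. Which of the two dominates in real temperature records is the matter in dispute.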

………………………………………….

Part Two of this article takes up the mathematics and statistics relevant to CLT and LOLN.

END.

Geoff Sherrington

Scientist.

Melbourne, Australia.

20th August 2022.

………………………………………..

“They have permitted me to quote from their correspondence on the condition that I reference the full quote…”

Sounds like they’ve already called Saul.

“An irreproducibility crisis afflicts a wide range of scientific and social-scientific disciplines, from public health to social psychology.”

Usually involving models and questionnaires.

Not really.

The drug companies scan the literature for research findings that might profitably be used to produce new drugs. The first thing they do is to try to reproduce those findings. The drug companies have found that as much as 90% of experiments can’t be reproduced. A dismaying number of scientists can’t even reproduce their own experimental results. link

‘Publish or perish’ has been a thing ever since I was a pup. The trick is to produce findings that are novel enough to be interesting to editors, and not upset apple carts that would prevent them from getting through peer review. There’s no punishment for being wrong. The implications should be obvious to a novice in a nunnery.

It’s got so bad that a former editor of the British Medical Journal has said we should assume that health research is fraudulent until proven otherwise. link That’s fraudulent, as in the purported experiments didn’t even happen, or something like that. So, around 20% of published research findings are fraudulent, and then the majority of the rest are merely wrong.

Science is seriously broken.

RETRACTED ARTICLE: Mouse models… https://link.springer.com/article/10.1007/s00204-016-1747-2

Science is seriously corrupted, and the silence of the scientists is deafening.

Wow. So they published the same paper verbatim 6 years apart?

Even more concerning…

“RETRACTED ARTICLE: Mouse models of intestinal inflammation and cancer. Aya M. Westbrook, Akos Szakmary & Robert H. Schiestl. Archives of Toxicology, volume 90, pages 2109–2130 (2016). Cite this article”

Would you cite it?

You just did… as a bad example.


(Mom would say some people were put on this Earth to serve as a bad example and cautionary tale for others. She never said that to my brothers, though. Hmmmm…)

There is a serious lack of moral fortitude in several areas of modern society.

JCU has runs on the board, I read.

Go get them, Geoff, you’re doing all scientists a favor by confronting government dogma which has an agenda. I can’t help but wonder when we will know if the planet Earth is undergoing something abnormal or only the normal complex and chaotic variation. Anyway, humans are extremely adaptable, and that would be the basis for their survival of any doomsday predictions.

Something like 209 changes for maximum temperature and 244 for minimum temperature to ACORN-SAT for “statistical reasons”. Statistics can’t tell us what the temperature was yesterday let alone one day 100 years ago.

http://www.bom.gov.au/climate/change/acorn-sat/documents/BRR-032.pdf

I have great difficulty believing that errors in “adjusted” temperature anomalies are random.

I believe Mr. Heller has demonstrated on numerous occasions that the adjustments consistently support the warming narrative by cooling the past and warming the present.

They are not random and lead to cooling periods that were warm and warming periods that were cool, as long as they are not in recent years. Recent years are always warmed more and in this case it was to hide the decline during the pause that caused them to panic.

This link just below was created on 17th November 2009 with all datasets cooling.

That situation looks very similar to the current situation since 2016. Just one difference this time, and that is the Arctic has been cooling for the first time since the 1970’s.

Recent years warmed more being around 14/15 years in this HADCRUT case below and show a huge amount of warming data.

“Long-term measurement – Station with 30+ years of aggregated records with 100+ years of aggregated record ±0.14 °C ±0.13 °C ±0.11 °C ±0.09 °C”

They have broken many rules regarding long-term measurement. How are changes of 0.3c or 0.4c even possible in the entire Northern Hemisphere without serious tampering of data for the cause?

Regarding the main areas of change in this data.

1) Around 1938 to 1942 cooled when it was a time of peak global temperatures for many decades.

2) Around 1948 to 1958 warmed over a period that was particularly cool.

3) Around 1990 to 1996 warmed by up to nearly 0.3c (one month) over a period that was particularly cool.

4) Around 1998 cooled slightly for the peak El Nino that was a record warm global temperature.

5) Since 2000 mostly warmed by near 0.1c up to very nearly 0.4c (one month).

I’m a little confused. If I have a machine that can mill to 10 micron accuracy, does it mean if I produce 100 parts then they will be accurate to 2 microns?

That’s what the BOM believes. Insane, right?

It’s okay Nick will blend the average for you using neighbor parts … that is even better accuracy 🙂

There must be a tolerance on that 10 microns.

No. That 10 mill accuracy is for each part. What you can say, however, is that the average of the 100 parts is accurate to 10/sqrt(100) = 1 mill if the accuracy from part to part is uncorrelated. Refer to JCGM 100:2008, JCGM 101:2008, and JCGM 102:2011 here and the NIST uncertainty machine here. Note, I’m using the term “accuracy” loosely here because you did and that it actually has a specific meaning (ISO 5725) that may be incompatible with what you were trying to communicate.

Oh please, not this BS again.

It’s the only BS he’s got.

No kidding.

The man has never once run a milling machine or a lathe. He’s never used a gauge block or a micrometer.

But that won’t help at all because the _accuracy_of_each_part_ is the only thing that counts (in climate terms: The actual temperature at the spot you’re trying to live and work in). “Averages” – accurate or not, precise or not – never say anything about actual conditions in the real world. The average of all letters in a book isn’t the story but just one letter, and the average of an image just one featureless colour. All the _information_ in the original data is lost by averaging, because that’s what averaging does. If you need details and clarity, don’t average. Ever.

It’s referring to the precision. The accuracy will be the difference between the desired size and the average of many samples. You can decrease that difference by adjusting settings – reducing systematic errors.

You need to improve your procedure to increase the precision or the spread in differences due to random errors. Each part will still be as likely to be 10 microns off the average if you just improve accuracy.

“Your analogy between a 0.1C difference and a 0.1C/decade trend makes no sense either – the law of large numbers or central limit theorem tells you that random errors have a tiny effect on aggregated values.”

The law of large numbers assumes perfect randomness and no systematic error – or perfectly constant ones over decades. These clowns then assume that it works with poor resolution, not just precision, measurements over decades of thermometers influenced by their changing environment (as if a measurement at the same latitude and longitude is like measuring a beaker of water), and after corrections.

He’s been told this many many many times but refuses to accept the truth.

He’s a one-trick pony that found his one trick gets him applause with a certain crowd that don’t know any better and so uses it for everything, whether it’s applicable or not.

Thank you for the cogent post on the subject. Far too many statisticians today think you can eliminate uncertainty via averaging single measurements of multiple things.

If you mill out 100 things then somehow magically the uncertainty associated with each will be reduced. A little handwaving and a milling machine with an uncertainty of 10 mills will transform into one with an uncertainty of 1 mill.

Jeesh! When I was working as a machinist I could only dream of such a machine that got more accurate after each use!

The reality is that gears and ways change dimensions with ambient temperature, and the accuracy and precision may change with the time of day and weather. The machine wears over time, and tolerances decrease, meaning that over time the accuracy and precision decrease. Different machinists will grind their tool bits differently, and set the shank at different lengths in the tool holder, and take cuts of different depths, all contributing to different results. The longer samples are taken, the more likely an extreme outlier will be observed, and that drift in product dimensions will result. The real world tends to operate differently from the theoretical world.

“ The real world tends to operate differently from the theoretical world.”

You always have such an expressive way of describing BS!

I’m not sure I’m following how this relates to Alexy Scherbakoff’s post. Was this post meant as a response to someone else? If it was meant for me would you mind clarifying what you are saying and how it relates?

Red(tard) alert!

Not true. The standard uncertainty is a combination of the standard uncertainty of the 100 measurements and the standard uncertainty of the machine. You then calculate the Expanded uncertainty, usually to 95%. See sections 5 and 6 of JCGM 100:2008.

You are assuming he can read for meaning instead of cherry picking.

What part do you feel is not true?

A meaningless number that no one uses except climastrologers.

Hi bdgwx. This bit:

“What you can say, however, is that the average of the 100 parts is accurate to 10/sqrt(100) = 1 mill if the accuracy from part to part is uncorrelated.”

You haven’t accounted for the instrument uncertainty. Each of the 100 parts was measured using an instrument with a certain accuracy. You must include this in your uncertainty budget. If the same instrument was used for all 100 and no other uncertainties (operator error, environment change..) then the instrument uncertainty (call it uIn) will be equal for all measurements, will propagate through, be averaged and the final uncertainty will be uIn. You cannot reduce it unless you use an instrument with a better spec.
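The two positions in this exchange can be written as one small uncertainty budget. The Python sketch below assumes the ±10 figure is a standard uncertainty for the part-to-part scatter, and that the instrument term uIn is fully shared (correlated) across all 100 measurements, so it does not average down. The 3-unit value for uIn and the coverage factor k = 2 (a common choice for roughly 95%) are illustrative assumptions, not numbers from the thread.

```python
import math


def uncertainty_of_mean(u_random, u_instrument, n, k=2.0):
    """Standard and expanded uncertainty of the mean of n readings.

    u_random     : standard uncertainty of the independent scatter
    u_instrument : standard uncertainty shared by every reading
                   (assumed fully correlated, so not divided by n)
    k            : coverage factor for the expanded uncertainty
    """
    u_combined = math.sqrt(u_random ** 2 / n + u_instrument ** 2)
    return u_combined, k * u_combined


# Scatter only: reproduces the 10/sqrt(100) = 1 figure quoted above.
print(uncertainty_of_mean(10.0, 0.0, 100))

# Same scatter plus a hypothetical shared instrument term of 3 units.
print(uncertainty_of_mean(10.0, 3.0, 100))
```

With the shared term included, the result can never fall below uIn however many parts are measured. This is still a statement about the average of the parts, not about the tolerance of any individual part.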

Alexy Scherbakoff did not say anything specific about instrument or measurement uncertainty. Only that they were milled with an “accuracy” of ±10 micron. If you want to add other components of uncertainty to the scenario then great. You just need to state them. We go through the procedure in the GUM together to evaluate the combined uncertainty or just let the NIST uncertainty machine do it for us.

You said: “What you can say, however, is that the average of the 100 parts is accurate to 10/sqrt(100) = 1 mill if the accuracy from part to part is uncorrelated.”

If each part is machined to +/- 10mil, then the average uncertainty cannot be 1mill.

That’s the type of figuring that Jim Gorman pointed out would get a quality engineer fired.

Boss: How well is the production line doing? Engineer: Well, we are running an average tolerance of 1mill for all our parts. Boss: Then why in tarnation are we getting so many returned parts? You’re FIRED!

That’s the piece that never seems to get through! You make those pieces individually. You sell those pieces individually. They get used individually. The average is meaningless.

It’s the same with temperatures from different locations. They get set individually based on their micro-climate. Temperature at one location doesn’t act at a distance to create a temperature somewhere else. The average of those temperatures is meaningless. Climate is local, not global.

If I have to drive a pin into a hole, do you really think I care about the “average” true value of those parts? If that part is used in a nuclear device do you think an “average” is the best thing to use?

I’ve told this before. A plant makes 7 foot doors. A boss comes to the quality guy and says, “Why do we have so many returned doors?” The quality person says, “I don’t know, here is what a 50% sample of our doors shows.” He then shows a chart that indicates the average door is 7 feet with an error of 1/8 of an inch. The boss asks about the standard deviation. “Oh, we don’t use that other than to calculate the standard error.” When they go look the doors vary in height from 6’6″ to 7’6″. The boss fires the quality guy.

When are you going to learn that the SEM, that you just calculated, only tells you how closely an estimated mean is to the true mean. It has nothing to do with the standard deviation of the parts.

And if the 100 parts is a sample, the standard deviation of the sample means is the SEM. You don’t divide that by sqrt N to get the SEM. IT IS THE SEM.
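The door story can be put into numbers with a short Python sketch. The heights below are invented (500 doors drawn uniformly between 6’6″ and 7’6″, in inches) just to show how the standard deviation of the doors and the standard error of their mean can differ by a large factor; they are not data from any real plant.

```python
import math
import random
import statistics


def summarize(heights):
    """Return (mean, standard deviation, standard error of the mean)."""
    mean = statistics.fmean(heights)
    sd = statistics.stdev(heights)          # spread of individual doors
    sem = sd / math.sqrt(len(heights))      # uncertainty of the mean only
    return mean, sd, sem


# Hypothetical sample: 500 doors between 78 and 90 inches.
rng = random.Random(42)
heights = [rng.uniform(78.0, 90.0) for _ in range(500)]

mean, sd, sem = summarize(heights)
print(f"mean height  : {mean:.2f} in")
print(f"SD of doors  : {sd:.2f} in")
print(f"SEM of mean  : {sem:.3f} in")
```

In this made-up sample the mean is pinned down to a fraction of an inch while individual doors still differ by several inches, which is the point the quality guy missed.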

He’ll never get it. He has no real world experience upon which to rely in figuring out the difference.

Nor does he *want* to “get it.” This is about defending the malfeasance and lack of suitability for the purpose of measuring changes to ‘climate’ of the “instrument temperature record” and its deliberate manipulations, not about admitting anything could possibly be “wrong” with it.

Amazing how repeated sampling of different times somehow reduces uncertainty . This drives me nuts. Pat Frank should work with you on this.

How in the world can they legitimately claim this? It should go up, not down.

Not one of them would pass the test to become a journeyman carpenter. Pick 100 2’x4′ studs out of the pile to build walls with and voila! The uncertainty in each individual stud goes from 1/8″ to 1/80″. You will have *NO* ripple in the ceiling drywall at all!

This is off topic, but you never can be certain where Bill Nye is concerned.

The End is Nye

BILL NYE HAS BEEN on a mission to save the world for some time. However, it’s been five years since his last series, Netflix’s Bill Nye Saves The World, aired, and some things have changed, arguably not for the better. The world is in a darker place than it was in 2017, it seems, and with Peacock’s upcoming docuseries, The End is Nye, everyone’s favorite “Science Guy” has pivoted into new territory to keep up with the times.

“We’re living in anxious times,” Bill Nye tells Inverse at San Diego Comi-Con. “People are anxious about all sorts of things. Apparently, human nature is such that when things are happy and good, we watch romantic comedies and fun movies. When things are scary, we watch disaster movies. So we made six disaster movies.”

https://www.inverse.com/entertainment/the-end-is-nye-bill-apocalypse

“Apparently, human nature is such that when things are happy and good, we watch romantic comedies and fun movies. When things are scary, we watch disaster movies. So we made six disaster movies.”

In these woke – ‘lived experience’ trumps all facts – times Nye is totally wrong. When I’m happy I don’t do soppy rom-coms etc – I get down with a guitar. Afterwards I might take in a film... like Alien.

If I’m not happy I still take it out on a guitar.

The guy’s an absolute lying patsy.

He’s irritating in the – how do they put it? – in the extreme.

Or in the extremities.

Appearing at a comic-con is his level, he’s never been a serious science presenter, nor should he ever be seen as such. An entertainer, possibly, if you find his shows entertaining but nothing more.

He was being interviewed at Comi-Con, a place to which I look for science and intellectual guidance.

Thanks for keeping the heat on as part of the “fever swamp of climate denial”. The more uncomfortable bureaucrats feel, the more comfortable I feel!

One of the first things that made me skeptical of global warming claims was that no one ever discussed the quality of the data underlying their claims, either accuracy or statistical significance, two important aspects of data that are drilled into your head from high school physics onward. They also never mentioned how well the data covered the globe, as I had been to areas that obviously had little, if any, data. The whole thing sounded too much like politics & not enough like science. I needed to know data quality & coverage before I’d accept GW as being valid.

The fix was decidedly in for diddling of temperature records by the time “The Team” sent out this gem of their science policy –

“Why should I give you my data, when you’ll only try to find mistakes with it”

Says it all, really.

A *real* scientist would be open to finding, and fixing, mistakes.

BINGO! But then when you talk about “global warming” as “The Cause,” you can’t seriously be considered a “scientist.”

These idiots lost their objectivity long ago, and with it any claim that they represent the mantle of “science.”

The NIST and BIPM websites are a must read for those interested in error handling with measurements.

But it all falls down when you come across the phrase with the killer word in it: error budget ESTIMATES.

All good quality measurement sites state that you are meant to exercise professional judgement in estimating error budgets.

If you are so biased that you are bent double then there is no hope that your calculated values have any real world use.

The NIST uncertainty machine is particularly useful as well especially for those wanting to use the BIPM techniques defined in JCGM 100:2008, JCGM 101:2008, and JCGM 102:2011.

Except all those documents assume you are working with ONE OBJECT, one single measurand.

They have *NO* procedures for working with multiple objects to get one uncertainty estimate. There *is* a reason for that! Single measurements of different things do not generate a true value. Multiple measurements of a single thing can, it’s not guaranteed and you must prove it each time, but it is possible. If you don’t have a true value with an identically distributed distribution then you can’t assume the individual uncertainties cancel. And the standard deviation of the stated values is only a good measure of uncertainty *if* the individual uncertainties of the stated values can cancel

Very good essay, Geoff.

This is a great principle followed by Federal scientists and Federally funded scientists only occasionally. Fauci, Collins, and Birx serve as excellent examples of scientists saying one thing to the public and another to selected colleagues and others. The history of James Hansen’s 1988 congressional testimony and contemporaneous publications is another.

Here is another statement made generally on faith that I quarrel with all the time.

Plausible or not, how does one justify mixing earth temperatures, even in one location, to achieve averages over long periods of time by appeal to the central limit theorem? There are distributions that don’t follow the central limit theorem even in theory let alone in actual measurement.

And, the distributions and standard deviations that came out of the averages are never reported.

I find the whole concept of homogenization of temperatures based on “near by stations” to be ludicrous. Does anyone really think that the temperatures at two stations separated by 5, 10, 40 or 100 miles must really be the same or have the same deviation from their average (anomaly)? Certainly the process does nothing to improve the MU and only serves to obscure the variability inherent in weather.

When you deal with statistics, divorced from what those numbers actually represent, any idiocy is acceptable. Might as well get an accountant to do ‘the science’.

Take Pikes Peak and Denver. Or the north side of the Kansas River valley vs the south side of the valley. Or one station on the east side of a mountain and one on the west side of the mountain! Take one station situated over Bermuda grass and one over sand.

How can you average these combinations and get anything meaningful? And this doesn’t even address the bi-modality created from mixing northern hemisphere temps with southern hemisphere temps. Or the different variances of summer and winter temps at the very same location: how is that adjusted for in developing monthly and annual base averages?

All you get when asking these questions is “all uncertainty cancels out in the averages”.

More like “all CERTAINTY cancels out in the averages.”

But it is so much easier than using all those different measurements, if there even are different measurements. And, without enough measurements, or assumed measurements, they can’t make sweeping statements about life, the universe, and everything, except they often do it anyway.

Indeed! I see this all the time looking at various “weather stations” separated by distances far smaller than that, or comparing weather data reported with what the thermometer in my car says.

Seems to me they do speak freely when they wish, saying whatever the hell they wish to make the general public believe.

That makes me laugh – let’s see a “federal scientist” be openly skeptical about human induced ‘climate change’ and watch the shitstorm that follows!

That statement was just to provide cover for pseudo-scientists to spew alarmist nonsense as if it were “science.”

Thank you, KK.

Do join in when Part 2, the main part, appears.

Geoff S

How thermometers accurate to 0.5 degrees somehow become more accurate when measuring a different temperature series over time escapes me. One is making multiple readings of different temperatures, not multiple readings of the same temperature.

You are exactly right. When measuring a time series such as air temperature, there is exactly one chance to get a number before that temperature is gone forever. The sample size can never be greater than one.

Climastrology has a number of false beliefs about what they do:

#1: subtracting a baseline removes uncertainty

#2: averaging reduces uncertainty

+100

Geoff,

Do you accept that the NIST uncertainty machine, which uses the technique specified in the GUM document (JCGM 100:2008) you linked to, produces the correct result when given the function y = f(x1, x2, …, xn) = Σ[xi, 1, n] / n or in the R syntax required by the site (x0 + x1 + … + xn-1) / n?

Do you accept that the uncertainty evaluation techniques 1) GUM linear approximation described in JCGM 100:2008 and 2) GUM monte carlo described in JCGM 101:2008 and JCGM 102:2011 produce correct results [1]?

bdgwx,

Part 2 addresses math and statistics. Wait a few days please.

You are missing the point, big way, if you imagine that what I accept matters.

Personal beliefs are not part of science, particularly in metrology.

Geoff S

No problem. I can wait.

This is only true when X1, X2, …, Xn are measurements of the SAME MEASURANDS! It doesn’t apply when X1, X2, …, Xn are DIFFERENT MEASURANDS.

From your JCGM 100:2008:

uncertainty

“parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the **measurand**” (bolding mine, tg)

“3.1.1 The objective of a measurement (B.2.5) is to determine the value (B.2.2) of the **measurand**” (bolding mine, tg)

“3.1.4 In many cases, the result of a measurement is determined on the basis of series of observations obtained under repeatability conditions”

EQ 1:

In most cases, a measurand Y is not measured directly, but is determined from N other **quantities** X1, X2, …, XN through a functional relationship f :

Y = f (X1, X2, …, XN ) (bolding mine, tg)

tg: X1,X2, …, Xn are different quantities. The equation says nothing about averaging different quantities to find Y, only using a functional relationship between the quantities and the measurand, e.g. finding current in amperes using voltage and resistance.

“NOTE Nonetheless, Equation (1) may be as elementary as Y = X1 − X2. This expression models, for example, the comparison of two determinations of the **same quantity** X.” (bolding mine, tg)

Here is the real kicker:

———————————————————————–

NOTE In some cases, the estimate y may be obtained from

y = Ȳ = (1/n) Σ Y_k (from k=1 to n) = (1/n) Σ **f**(X_1,k, X_2,k, …, X_N,k) (from k=1 to n)

I bolded “f” above because it represents a functional relationship for Y using the various input quantities X1, X2, …, XN and *NOT* an average of X1, X2, …, XN.

Let’s see what you know about using this machine. From the manual.

What did you use for a probability distribution? Did you verify that the actual data had that distribution?

You do realize that an average is not a functional model of a measurement, right? An average is a statistical descriptor, not a measurement.

Quit cherry-picking things that validate what you think. That is known as confirmation bias. Read and understand the requirements associated with using a mathematical tool.

1000x yes!

Geoff ==> As for the USA, our esteemed Gavin Schmidt, now director of NASA’s GISS, puts even the long-term Global Annual Averages as having an uncertainty of +/- 0.5K at his blog:

https://www.realclimate.org/index.php/archives/2017/08/observations-reanalyses-and-the-elusive-absolute-global-mean-temperature/

“So our estimate for the absolute value is (using the first rule shown above) is 287.96±0.502K, and then using the second, that reduces to 288.0±0.5K. The same approach for 2015 gives 287.8±0.5K, and for 2014 it is 287.7±0.5K.

All of which appear to be the same within the uncertainty. Thus we lose the ability to judge which year was the warmest if we only look at the absolute numbers.”

For the mathematically normal: +/- 0.5K means there is a whole degree (C or K) uncertainty range around GMST from GISS, which comprises MOST of the claimed temperature increase since the mid 1900s.

That is one reason why anomaly values are preferred over absolute values. While the absolute value has an uncertainty of around ±0.5 C the anomaly values are around ±0.05 C for recent decades [Lenssen et al. 2019] (note that Schmidt co-authored that publication).

bdgwx,

What is the precise mechanism that governs this change?

Some people claim that your golf game uncertainty, as in “practise makes perfect”, can be reduced by repetition.

The temperatures of which I write are not sentient beings with brains capable of learning.

One temperature is not aware of another temperature, so not aware that it has a best value or fits a particular statistical distribution.

Geoff S

Geoff Sherrington said: “What is the precise mechanism that governs this change?”

ΔT = (Tx + B) – (Tavg + B) = (Tx – Tavg) + (B – B) = Tx – Tavg

B is the time invariant systematic bias. For example, BEST has two methods of computing the average temperature. One uses the air temperature above sea ice and one uses the water temperature below sea ice. The bias B for the 1st method with respect to the 2nd is -0.6 C whereas for the 2nd method with respect to the 1st it is +0.6 C. In both cases the bias B cancels out when converting to anomalies, yielding the exact same annual ΔT values for each method [1].

Don’t hear what isn’t being said. Converting to anomalies has advantages but it isn’t a panacea.

OMG I can’t believe you did this. If B is an uncertainty you DO NOT subtract them to determine the combined uncertainty!

Have you learned nothing over the last several months? Quote a reference from Taylor or the GUM that allows you to combine uncertainties in this fashion!

First, an uncertainty is stated as:

(Tx ± B) and

(Tavg ± B)

Uncertainties are intervals and simply can’t be combined by subtracting them. What if you had four measurements to combine?

Based on past experience I can easily believe it.

What about a simple subtraction operation lowers the uncertainty?

Show your math.

I’m not good enough at math to break this down, but how can you compile absolute temp records with 0.5 uncertainty, then use those same records to derive an anomaly, and have that reduce the uncertainty from the same underlying data? Sounds like CliSci math.

The global average absolute temperature has a systematic bias B which contributes significantly to the 0.5 C uncertainty (1st graph). Given this bias we know that Tobs = Ttrue + B. And when you convert to anomalies you get ΔT = (Tobs_x + B) – (Tobs_avg + B) = (Tobs_x – Tobs_avg) + (B – B) = Tobs_x – Tobs_avg. Notice that since Tobs is biased by B then Tobs_avg must also be biased by B, which means the bias cancels out when converting to anomalies. Notice the removal of B (2nd graph) brings the datasets much closer together.

Don’t hear what isn’t being said. Converting to anomalies has advantages but it isn’t a panacea.

Nonsense. Uncertainty is not simply bias that cancels out in subtraction. That’s why it’s stated as a +/- #. If the uncertainty of the mean baseline is +/- 0.5 and the uncertainty of the individual measurement is also +/- 0.5, the uncertainty of the anomaly is +/- 0.7.
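The ±0.7 figure quoted above is what root-sum-square combination gives for two equal ±0.5 contributions. A minimal sketch of that arithmetic, assuming the two uncertainties are independent (uncorrelated); the function name `combine_independent` is only illustrative:

```python
import math

def combine_independent(*uncertainties):
    """Root-sum-square of uncertainties assumed to be independent."""
    return math.sqrt(sum(u * u for u in uncertainties))

u_reading = 0.5   # uncertainty of the individual measurement
u_baseline = 0.5  # uncertainty of the baseline mean

u_anomaly = combine_independent(u_reading, u_baseline)
print(f"u(anomaly) = {u_anomaly:.2f}")  # prints u(anomaly) = 0.71
```

If the two terms share a common component, the independence assumption no longer holds and this simple sum of squares does not apply as written.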

Bias (aka systematic error) may be part of the MU, but its magnitude and sign are unknown by definition since if known it would be removed in calibration. In any case whatever bias may exist in the uncertainly of the mean does not need to be the same magnitude or sign as the bias of an individual measurement.

I never said uncertainty is simply bias that cancels out in subtraction. What I said is that a significant contribution to the uncertainty of the absolute values is a systematic bias. If you remove the bias, whatever it happens to be, you remove that component of the uncertainty. You’re still left with the noise component of the uncertainty however. Notice how the agreement between datasets is significantly improved (2nd graph) when the systematic bias is removed as compared to the agreement with the systematic bias left in place (1st graph). You can perform a rigorous type A evaluation of the uncertainties with and without the bias removed and prove this out for yourself. Though, a simple glance at the graphs should be enough to convince you without the tedium of doing the analysis mathematically.

The usual bgwxyz hand-waving word salad nonsense.

You need a climate gibberish decoder ring

I’m selling them for $1 plus shipping.

bgwxyz MUST have brown eyes because he is full of IT!

BTW…to be more consistent with NIST TN 1297 I should have said anomalization removes the component of uncertainty arising from a systematic effect, but leaves the component of uncertainty arising from random effect. My original wording for the former was already mostly consistent with NIST TN 1297, but I’m certainly guilty here of using the term “noise” synonymously with the term “random” for the latter. Hopefully that was rather obvious. I just wanted to address that now in the interest of being transparent.

These numbers are UNKNOWN, how can you remove them?

You can’t remove them. They add!

The uncertainty of an average of different things is not the uncertainty of a single component of the average. It is the sum of the uncertainties of *all* of the components.

It’s not Tavg + B, instead it is Tavg + nB, where n is the combined uncertainty of all of the components.

How does anomalization remove an unknown systematic effect?

Captain climate said: “How does anomalization remove an unknown systematic effect?”

Here is how the math works out. Using the terminology from NIST TN 1297 which Geoff linked to in the article the uncertainty is composed of a systematic effect component and a random effect component. Let B be the systematic component and R be the random component. Because each observation has the systematic component B then the anomaly baseline Tobs_avg will also have systematic component B. Notice that the B terms cancel out leaving only the R terms. We don’t even need to know what B is.

ΔT = (Tobs_x + B + R_x) – (Tobs_avg + B + R_avg)

ΔT = (Tobs_x – Tobs_avg) + (B – B) + (R_x – R_avg)

ΔT = (Tobs_x – Tobs_avg) + R_x – R_avg

10 GOTO 10

“ΔT = (Tobs_x + B + R_x) – (Tobs_avg + B + R_avg)”

It’s actually ΔT = (Tobs_x + B + R_x) – (Tobs_avg + **nB** + R_avg)

Fixed it for you. The uncertainty of the average is not the uncertainty of a single component of the average. It is the propagated uncertainty of *all* of the individual components.

bdgwx

I think you are wrong because the X terms are not the same magnitude in each equation, so they cancel only if they are the same size.

Geoff S

The X terms (Tobs_x and R_x) do not cancel. It is the time invariant systematic effect B that cancels.

A fantasy that exists only inside your mind.

They also only cancel if you are measuring the same thing using the same device. There is no guarantee that B_observed will be the same as B_avg when you are measuring different things.

It doesn’t. If you have 100 boards with a systematic bias of 1″ in their measurement, then the sum of their lengths (which is needed for the average) will have an uncertainty of 100 x 1″ = 100 inches. When you divide the sum by “n” to get the average, “n” is a constant and contributes nothing to the total uncertainty.

The uncertainty of Tavg/n is u(Tavg) + u(n). It is *NOT* u(Tavg)/n

bdgwx==> You apparently do not understand what bias is when referring to uncertainty. I’ll try to explain. Suppose we run a calibration on a thermometer by taking a series of measurements versus “known” references. If the instrument reads 1 degree higher than the reference each time, that’s a bias and we can simply adjust the instrument or subtract 1 from the readings to correct it.

OK, so far so good. But when we look at the calibration data for the references, we find that they have an uncertainty of +/- 1.0 degrees. There might be an actual deviation from the stated value that is +0.7 or -0.4 or some different value for each temperature used in the calibration. We don’t know. All we know is there’s < a 5% chance it’s larger than 1.0 degrees in either direction. What we do know is whatever the true deviation is in the reference is now embodied as an uncertainty component in the calibrated instrument. We can’t eliminate it, adjust for it or correct it because we don’t know what it is. If our calibration reference has an MU of +/- 1.0 degree, our calibrated instrument must include that MU in its MU determination.

NB: In practice we generally require calibration references to have MU’s less than 1/4th of the resolution of the item being calibrated. Less than 1/10th is commonly used. This keeps systematic bias due to references small enough to not be significant.

In your example, and using the NIST TN 1297 terminology, is that ±1.0 figure a random component of uncertainty, a systematic component of uncertainty, or total uncertainty?

I assume you mean the +/-1.0 MU of the references. It is the total uncertainty as determined by the supplier of the reference and includes both systematic and random sources. Many such references are primary standards calibrated directly by NIST. The calibration certificate may well include the detailed uncertainty budget applied.

Yes, the 1.0 MU is what I meant. So if that is the total uncertainty then it has both the random effect component and the systematic effect component included.

What I am referring to is the systematic effect component only which I’m guilty of calling bias despite NIST’s recommendation to the contrary. Anyway, to clarify the point I’m making I’ll include the random effect component as R in addition to the systematic component B.

ΔT = (Tobs_x + B + R_x) – (Tobs_avg + B + R_avg)

ΔT = (Tobs_x – Tobs_avg) + (B – B) + (R_x – R_avg)

ΔT = (Tobs_x – Tobs_avg) + R_x – R_avg

Same as before. The systematic effect component cancels out. But now you can see that the random effect components remain.

And then following the procedures discussed in NIST TN 1297 and JCGM 100:2008 we know that u(ΔT) = sqrt(R_x^2 + R_avg^2).
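The algebra above can be checked numerically. The sketch below is a toy Monte Carlo, not anyone's actual processing chain: it assumes Gaussian random effects, and the bias and sigma values are invented for illustration. The first call uses the same B in both terms (the premise of the derivation above); the second uses different, hypothetical biases to show what happens when that premise does not hold.

```python
import random
import statistics

def anomaly_errors(bias_x, bias_avg, sigma_x, sigma_avg, trials=50_000, seed=0):
    """Simulate the error in an anomaly: (B_x + R_x) - (B_avg + R_avg).

    Returns the mean and standard deviation of the simulated errors.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        err_x = bias_x + rng.gauss(0.0, sigma_x)        # error in the single value
        err_avg = bias_avg + rng.gauss(0.0, sigma_avg)  # error in the baseline
        errors.append(err_x - err_avg)
    return statistics.fmean(errors), statistics.stdev(errors)

# Same systematic effect in both terms: the mean error is close to zero and
# the spread is close to sqrt(0.3**2 + 0.1**2).
same_mean, same_sd = anomaly_errors(0.6, 0.6, sigma_x=0.3, sigma_avg=0.1)

# Different systematic effects: the mean error is close to 0.7 - (-0.5) = 1.2.
diff_mean, diff_sd = anomaly_errors(0.7, -0.5, sigma_x=0.3, sigma_avg=0.1)

print(same_mean, same_sd)
print(diff_mean, diff_sd)
```

The simulation only shows the consequence of each assumption; whether B really is common to the observation and the baseline is a question about the measurement system, which code cannot settle.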

This is why I say converting to anomalies is not a panacea. You trade the elimination of the systematic effect for an increased random effect. So it only makes sense to do this when you believe the systematic effect is larger than the random effect.

No, you cannot claim that your B for an individual reading is the same as your B for the average. They are unrelated and the systematic uncertainty for the average is the combined systematic uncertainty of all the values used in the average, in theory traceable to each instrument and the references used to calibrate them. But again it’s a stated range of probable uncertainty in every case. So it might be something like +0.7 for the single reading and -0.5 for the average and thus 1.2 for the result. But since no one can know the “true” value of either, you cannot know the true magnitude of the error. So you must state the uncertainty of your result in terms of how large the error might be at a given level of confidence.

I have observed that “scientists” seem to try to minimize their MU determinations to make their conclusions appear more certain, while engineers tend to make very conservative MU statements to avoid over confidence.

A good summary. The uncertainty of the average is not “B”, it is “nB” where n is a factor representing the propagated uncertainty of the individual components of the average.

bdgwx suffers from the delusion that multiple measurements of the same thing resulting in an identically distributed set of readings in which the uncertainty cancels thus giving a “true value” is the same as multiple measurements of different things in which an identically distributed distribution of readings are not guaranteed. Thus there is no guaranteed “true value”.

Rich C: “No, you cannot claim that your B for an individual reading is the same as your B for the average.”

Algebra says otherwise. Given Tx_obs = Tx + B + Rx then:

Tavg_obs = Σ[Tx + B + Rx, 1, N] / N

Tavg_obs = Σ[Tx + Rx, 1, N] / N + B

Remember, B is the systematic component; not the random component. All observations are biased by the same systematic effect component. If you know the value of B you just apply the correction per JCGM 100:2008 section 3.2.3. If you don’t know what B is then it remains as an uncertainty component for the observation. Your only choices at this point are to live with it or convert your values to anomalies so that it cancels out.

And again, don’t hear what isn’t being said. Anomalization has advantages. But it has disadvantages too. It is not a panacea. And for timeseries data the anomalization procedure only works to cancel the systematic effect if the systematic effect is time invariant. In reality there are many systematic effects, only some of which are time invariant.

More crappola nonsense BS: you cannot know that B is a constant!

Rich C, btw…the “uncertainty of the average” is neither B nor nB. It is still u(avg) which is composed of a systematic component and a random component. The error of the average has the systematic effect B and the random effect Ravg. Furthermore, the systematic component of the average is B; not nB. This is easily proven mathematically because of the identity Σ[Xi + B, 1, N] / N = Σ[Xi, 1, N] / N + B.
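The identity quoted above is plain arithmetic and can be checked in a few lines. The readings and the offset below are made-up numbers, and the check assumes exactly what the identity assumes: one constant B added to every reading.

```python
from statistics import fmean

readings = [14.2, 15.1, 13.8, 16.0]  # hypothetical values
B = 0.6                              # hypothetical constant offset

mean_of_shifted = fmean([x + B for x in readings])  # Σ[Xi + B] / N
shifted_mean = fmean(readings) + B                  # Σ[Xi] / N + B

print(abs(mean_of_shifted - shifted_mean) < 1e-12)  # prints True
```

This says nothing about whether a real set of instruments shares a single constant offset; it only confirms what happens to the mean if they do.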

You do realize you have numerous professional people telling you that your assertion is incorrect. Wake up.

“Furthermore, the systematic component of the average is B; not nB.”

Pure, unadulterated malarky!

Variances add when combining random variables.

Since each independent measurement of a different thing is an independent, uncorrelated, random variable with a population size of one and a variance of +/- u, the variances add when you combine them.

Simple, plain statistics which you are trying to avoid by treating multiple measurements of different things in the exact same manner that you treat multiple measurements of the same thing!

If instrumental uncertainty is quoted as +/-1 dgC it must be considered as a Type B uncertainty with a rectangular distribution assumed (see 4.3.7 of the GUM). In this case it is converted to a standard uncertainty of 1/sqrt(3) and may then be combined with other uncertainties (such as a sequence of random measurements with standard uncertainty t·s/sqrt(N) for low N (Student’s t) or s/sqrt(N) for large N) in order to arrive at a combined standard uncertainty. This is then expanded to 95% by multiplying by k=2, or to 99% with k=3.
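The steps in the paragraph above can be sketched as follows. This is an illustration under stated assumptions: the five readings are invented, the Type A term uses s/sqrt(N) without the Student's t factor mentioned for low N, and the two components are treated as uncorrelated.

```python
import math
import statistics

def rectangular_to_standard(half_width):
    """Type B: convert a +/- half_width spec, assumed rectangular, to a standard uncertainty."""
    return half_width / math.sqrt(3)

def type_a_standard_uncertainty(readings):
    """Type A: sample standard deviation divided by sqrt(N) (no Student's t factor)."""
    return statistics.stdev(readings) / math.sqrt(len(readings))

readings = [20.1, 20.3, 19.9, 20.2, 20.0]   # hypothetical repeated readings

u_b = rectangular_to_standard(1.0)          # instrument spec of +/-1
u_a = type_a_standard_uncertainty(readings)
u_c = math.sqrt(u_a**2 + u_b**2)            # combined standard uncertainty
U = 2 * u_c                                 # expanded uncertainty, k=2

print(f"u_a={u_a:.3f}  u_b={u_b:.3f}  u_c={u_c:.3f}  U(k=2)={U:.3f}")
```

With these numbers the Type B term dominates, so taking more readings barely changes the combined result.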

You cannot average away this uncertainty. Say I had a DMM with stated accuracy of +/-10mV. I measure a voltage of 1.000V. I take a second reading and get 0.999V. What is the real voltage? All I can do is average the 2 values and apply the rules to report the value with a stated uncertainty. More than that I cannot say.

Now say I subtract the values and get 1mV. What is my uncertainty? Since I used the same DMM at the same temp, humidity etc I might expect that some of the reading value is down to systematic error in the meter. But I don’t know how much. I only have the spec and I have to assume a rectangular distribution. In other words my first measured value could be anywhere from 0.99V to 1.01V. I simply do not know the ‘true’ value. My second reading is between 0.989V and 1.009V. The reading ranges overlap.

If the 2 readings are uncorrelated then laws of propagation of uncertainty would give me sqrt(sum vars). Of course there is likely some correlation between the 2 values. But how correlated are they? Who determines this number. And remember that this is for 2 readings using the same DMM. Now I want to combine a number of temp readings from different instruments. They are not correlated. So uncertainties from different stations should be propagated orthogonally and averaged. Then, if all stations had 0.5dgC uncertainty the final uncertainty would also be 0.5dgC.

Absolutely true. One small point I would offer regarding the coverage factor k that converts combined uncertainty into expanded uncertainty: if the distribution of the combined uncertainty is normal, then using k=2 would represent 95% coverage for the uncertainty interval. However, if the combined uncertainty has elements for which the distribution isn’t known (especially Type B elements such as from a DMM manufacturer’s error spec sheet), then strictly speaking, it is incorrect to say that using k=2 gives 95% coverage.

In test laboratory certification that follows ISO 17025, a lab is required to develop a formal UA for all calibrations performed, and report uncertainties for each calibration result using k=2 for expanded uncertainty. So in this respect, k=2 has become simply the standard coverage factor.

It is very common to see expanded uncertainty referred to as “U_95”, and the GUM unfortunately seems to imply this. But using “U_k=2” instead eliminates any confusion, and it gives an easy way for downstream users to convert the calibration lab’s expanded uncertainty back into combined uncertainty to use in their own uncertainty calculations.

Thanks Carlo, Monte. It is my understanding (& maybe I am wrong) that all uncertainties are first converted to standard uncertainties (i.e. a normalized equivalent), then combined orthogonally and thus it is correct to use k=2 for 95%. For example, for type B instruments the distribution is generally considered rectangular (let’s say +/-a as a spec). Thus, by calculating a/sqrt(3) it becomes a standard uncertainty, i.e. equivalent to 1 sigma for a normal distribution and can therefore be combined orthogonally with a type A normal distribution. I am not a metrologist however so maybe I am wrong about this.

Nor am I; I was made aware of this issue by a mathematician friend. You are correct about the Type B calculations but remember that the choice of rectangular, trapezoidal, triangular is mostly a matter of engineering judgement of the interval within which a true value might lie. Then consider a combined uncertainty that is comprised of several Type Bs, and the actual distribution becomes intractable without a lot more information that usually isn’t available. And if it was available, you then might be able to consider them as Type As instead.

Analog Design Engineer said: “It is my understanding (& maybe I am wrong) that all uncertainties are first converted to standard uncertainties”

Correct. The uncertainty u as described and used in the GUM is a standard deviation (see 3.3.5). Any method computing the uncertainty u requires all dependent uncertainties used as inputs to also be standardized.

Analog Design Engineer said: “For example, for type B instruments the distribution is generally considered rectangular (lets say +/a as a spec).”

Not necessarily. And, in fact, if using the recommendation in the GUM 7.2.3 it should be stated as k*u_c(y) which is an expanded standard uncertainty.

However, per F.2.4.2, if nothing is known about the distribution of the stated uncertainty then it must be assumed to be rectangular.

Analog Design Engineer said: “I am not a metrologist”

For the record…I’m not a metrologist either.

“The uncertainty u as described and used in the GUM is a standard deviation”

Standard deviation only applies if you have a normal distribution, i.e. multiple measurements of the same thing.

Standard deviation does *NOT* apply if you do not have a normal distribution (or at least one that is identically distributed). Multiple measurements of different things (i.e. temperature at different locations) cannot be guaranteed to be either a normal distribution or an identically distributed distribution. Therefore standard deviation is not a proper description of the uncertainty associated with multiple measurements of different things.

Another fail.

How could anyone tell?

This method works for measurements of the same thing where the uncertainty distribution can be classified in some manner, always requiring that the distribution be identically distributed. For a normal distribution you divide by sqrt(2). For a rectangular distribution you divide by sqrt(3), for a triangle distribution you divide by sqrt(6).

For a distribution that is not identically distributed there is no standard method of dividing the uncertainty to get a standard deviation that I know of. In fact, for a distribution that is not identically distributed the use of a standard deviation is pretty much meaningless; it is not a proper statistical descriptor for such a distribution!

Since temperatures from numerous locations are not a normal distribution or even identically distributed, the total uncertainty from propagating the uncertainty of individual components is probably as good as you get for use.

Note that for Type Bs, the divisors come out to be 1.41, 1.73, and 2.45. In terms of relative uncertainty values, this range is rather small. It is possible to follow a worst-case path and just assume all are rectangular; for triangular the over-estimation is only a factor of 1.73 (sqrt(3), which shows up in many places in UA).

“It is possible to follow a worst-case path and just assume all are rectangular”

For measurements of the same thing, I agree. For measurements of different things I don’t. The distribution of measurements of different things can be multi-modal, as can the uncertainties associated with the measurements. Assuming a rectangular distribution is certainly possible but it may be WAY OFF from reality.

Consider that winter temps have a higher variance than summer temps. How does assuming a rectangular distribution have a chance of correctly describing a combination of the two?

I can’t argue, but many times the actual distribution isn’t known and a Type B remains a judgement call. Also, using relative uncertainty doesn’t work with temperature data without converting to absolute temperature (K). This points back to the issue of how climate science totally ignores actual distributions in the data they average into horizontal lines.

It is interesting that GUM says if your uncertainty has a large Type B contribution, you need to do something else (how “large” is fuzzy and not defined). But if the measurement is primarily electrical measurements using DVMs, for example, there is no way to reduce the Type B contribution. You are stuck with the manufacturer’s error specs.

“This points back to the issue of how climate science totally ignores actual distributions in the data they average into horizontal lines.”

I agree with this.

I’ve often thought that before anyone gets a PhD in a physical science they should be required to pass a journeyman’s exam in something, carpenter, electrician, welder, machinist, etc. It might give them some well needed grounding in reality.

A buddy of mine at NIST had a boss for a time who had a PhD in EE from somewhere in Taiwan—this guy was so clueless he literally did not know what a resistor looked like, let alone the value/tolerance bands.

Most of the good engineering schools here in the states still require labs where you actually have to build things. Not saying they learn anything, some of the lab experiments are like the old Heathkit kits – find a part that looks like this and stick in the breadboard here and here. When done measure the voltage at Point A on the breadboard and record it here.

I was building vacuum deposition systems as a senior.

Wonder how much of this started with the merging of EE and CS departments?

I’m sure the merging was a factor. The truly sad thing is seeing so few first year engineering students today and the increasing number that drop out at 2nd semester. This is at both my alma mater and my son’s alma mater.

And it’s not just in engineering. My youngest son (now a PhD in immunology) was told they no longer had a math requirement for microbiology majors (around 2005). They said if you needed any math done to just find a math student or grad student to do it for you. The blind leading the blind – the microbiologist that knew no math and the math major that knew no biology. He didn’t buy into that and took basic calculus and statistics so he could judge data for himself. But he thinks that is a large contributor to why so many studies today can’t be replicated.

Analog Design Engineer said: “You cannot average away this uncertainty.”

Nobody is saying that the uncertainty of individual measurements goes away when you average those measurements. The contrarians who disagree with the GUM are reframing the debate to make it sound like people are saying that as a strawman argument to then attack the conclusion that the uncertainty of the average as computed via equation 10 is u(x_avg) = sqrt[Σ[(1/N)^2 * u(x_i)^2]] when x_i are uncorrelated.

Analog Design Engineer said: “Say I had a DMM with stated accuracy of +/-10mV. I measure a voltage of 1.000V. I take a second reading and get 0.999V. What is the real voltage?”

The uncertainty of each measurement is still ±10 mV. It doesn’t matter how many measurements you take. It is always ±10 mV.

Analog Design Engineer said: “Now say I subtract the values and get 1mV. What is my uncertainty?”

Using the procedure in the GUM defined in section 5 and assuming the measurements are uncorrelated we get the following.

Y = f(x_1, x_2) = x_1 – x_2

Our partial derivatives are:

∂f/∂x_1 = 1

∂f/∂x_2 = -1

Let:

x_1 = 1.000 ± 0.010 mV (rectangular)

x_2 = 0.999 ± 0.010 mV (rectangular)

Therefore

u(x_1) = 0.010 / sqrt(3) = 0.0058 mV

u(x_2) = 0.010 / sqrt(3) = 0.0058 mV

Starting with GUM equation 10:

u_c^2(Y) = Σ[(∂f/∂x_i)^2 * u(x_i)^2, 1, N]

u_c^2(Y) = 1^2 * 0.0058^2 + 1^2 * 0.0058^2

u_c^2(Y) = 0.000067

u_c(Y) = sqrt[0.000067]

u_c(Y) = 0.0082

Therefore

Y = 0.001 ± 0.0082 mV

I confirmed this with the NIST uncertainty machine.

The astute reader will recognize that the linear approximation of the law of propagation of uncertainty reduces to the familiar root sum square in this case because Y = f(x_1, x_2) = x_1 – x_2 and the partial derivatives with respect to the inputs both have magnitude 1.

Analog Design Engineer said: “If the 2 readings are uncorrelated then laws of propagation of uncertainty would give me sqrt(sum vars).”

That is correct.

Analog Design Engineer said: “Of course there is likely some correlation between the 2 values. But how correlated are they? Who determines this number.”

Absolutely. They will almost certainly have some correlation. You determine their correlation experimentally. You can use the procedure and equations in GUM JCGM 100:2008 section F.1.2.3 to form the correlation matrix that is then used in GUM equation 15. The NIST uncertainty machine allows you to enter this correlation matrix as well.
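To see how much the correlation matters for the two-reading difference, here is a minimal sketch of the general covariance form of the propagation law, u_c^2 = Σ_i Σ_j c_i c_j u_i u_j r_ij. The function name `combined_uncertainty` and the trial correlation values are assumptions chosen for illustration; they are not taken from the GUM or from the NIST tool.

```python
import math

def combined_uncertainty(sens, u, corr):
    """Combined standard uncertainty with correlated inputs.

    sens : sensitivity coefficients (partial derivatives of the model)
    u    : standard uncertainties of the inputs
    corr : correlation matrix r_ij (1.0 on the diagonal)
    """
    n = len(sens)
    variance = sum(
        sens[i] * sens[j] * u[i] * u[j] * corr[i][j]
        for i in range(n)
        for j in range(n)
    )
    # Guard against a tiny negative value from floating-point rounding
    return math.sqrt(max(variance, 0.0))

u_x = 0.010 / math.sqrt(3)   # +/-0.010 V treated as rectangular
sens = [1.0, -1.0]           # model: Y = x_1 - x_2

results = {}
for r in (0.0, 0.5, 1.0):    # assumed trial correlation coefficients
    corr = [[1.0, r], [r, 1.0]]
    results[r] = combined_uncertainty(sens, [u_x, u_x], corr)
    print(f"r = {r}: u_c(Y) = {results[r]:.4f} V")
```

With r = 0 this reproduces the root sum square value; as r approaches 1 the uncertainty of the difference shrinks toward zero, which is why the correlation has to be determined rather than assumed.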

Analog Design Engineer said: “Now I want to combine a number of temp readings from different instruments.”

How are you wanting to combine them? Can you define the combining function f mathematically?

Analog Design Engineer said: “Then, if all stations had 0.5dgC uncertainty the final uncertainty would also be 0.5dgC.”

Not necessarily. It depends on how you are combining them. For uncorrelated measurements this is only true when the function yields ∂f/∂x_i = 0.5 for all x_i.

sorry but I stopped after:

“x_1 = 1.000 ± 0.010 mV (rectangular)”

its either 1000 +/- 10mV or 1.000 +/-0.01Volts

Analog Design Engineer said: “its either 1000 +/- 10mV or 1.000 +/-0.01Volts”

Doh…typo. I hate it when I do that. Thank you for spotting that so that I can fix it. Since I’m not allowed to correct it I’ll replicate that portion in its entirety with the typos fixed and boldened. If you spot any more typos let me know.

Using the procedure in the GUM defined in section 5 and assuming the measurements are uncorrelated we get the following.

Y = f(x_1, x_2) = x_1 – x_2

Our partial derivatives are:

∂f/∂x_1 = 1

∂f/∂x_2 = -1

Let:

x_1 = 1.000 ± 0.010 **V** (rectangular)

x_2 = 0.999 ± 0.010 **V** (rectangular)

Therefore

u(x_1) = 0.010 / sqrt(3) = 0.0058 **V**

u(x_2) = 0.010 / sqrt(3) = 0.0058 **V**

Starting with GUM equation 10:

u_c^2(Y) = Σ[(∂f/∂x_i)^2 * u(x_i)^2, 1, N]

u_c^2(Y) = 1^2 * 0.0058^2 + (-1)^2 * 0.0058^2

u_c^2(Y) = 0.000067

u_c(Y) = sqrt[0.000067]

u_c(Y) = 0.0082

Therefore

Y = 0.001 ± 0.0082 **V**

I confirmed this with the NIST uncertainty machine.

The astute reader will recognize that the linear approximation of the law of propagation of uncertainty reduces to the familiar root sum square in this case because Y = f(x_1, x_2) = x_1 – x_2 and the partial derivatives with respect to the inputs both have magnitude 1.
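For anyone who wants to check the arithmetic without the NIST tool, the calculation can be sketched in a few lines of Python. The helper names `sensitivities` and `rss_uncertainty` are made up for this illustration, and the partial derivatives are estimated by a simple central difference rather than taken from any library.

```python
import math

def sensitivities(f, x, h=1e-6):
    """Estimate the partial derivatives of f at x by central differences."""
    coeffs = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += h
        lo[i] -= h
        coeffs.append((f(hi) - f(lo)) / (2 * h))
    return coeffs

def rss_uncertainty(coeffs, u):
    """GUM equation 10 for uncorrelated inputs: root sum of (c_i * u_i)^2."""
    return math.sqrt(sum((c * ui) ** 2 for c, ui in zip(coeffs, u)))

def model(x):
    # Y = x_1 - x_2
    return x[0] - x[1]

readings = [1.000, 0.999]        # volts
u_x = 0.010 / math.sqrt(3)       # +/-0.010 V treated as rectangular

coeffs = sensitivities(model, readings)
u_diff = rss_uncertainty(coeffs, [u_x, u_x])

print(round(u_x, 4))     # 0.0058
print(round(u_diff, 4))  # 0.0082
```

The sign of the second coefficient comes out negative, but since each term is squared it makes no difference to the result when the inputs are uncorrelated.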

And so, bdgwx, you will agree that in this case the uncertainty value is more than 8 times the difference. Similarly, subtracting 2 temperatures which perhaps vary by 0.1dgC using instruments with stated accuracy of 0.5dgC will yield an uncertainty far larger than the difference. Thus, how could anyone possibly claim uncertainty to hundredths or thousandths of a degree on a globally calculated average.

Analog Design Engineer said: “And so, bdgwx, you will agree that in this case the uncertainty value is more than 8 times the difference.”

Absolutely. That’s what the math says.

Analog Design Engineer said: “Similarly, subtracting 2 temperatures which perhaps vary by 0.1dgC using instruments with stated accuracy of 0.5dgC will yield a uncertainty far larger than the difference.”

The NIST uncertainty machine says that for the measurement model y = f(x_1, x_2) = x_1 – x_2 and where u(x_1) and u(x_2) are both 0.5 C then u_c(y) = 0.71 C. So if abs(x_1 – x_2) = 0.1 then we cannot say that x_1 and x_2 differ by a statistically significant amount.

Analog Design Engineer: “Thus, how could anyone possibly claim uncertainty to hundredths or thousandths of a degree on a globally calculated average.”

I’ve not seen anyone claim uncertainty for a global average temperature down to ±0.01 C, never mind into the thousandths. The lowest I’ve seen is around ±0.03 C from BEST. Anyway, the primary reason why these uncertainties are as low as they are given the relatively high instrumental uncertainty is due to the law of propagation of uncertainty equation E.3 in GUM JCGM 100:2008. When your measurement model is an average (or more generally contains a division) the combined uncertainty is generally less (often significantly) than the uncertainty of the inputs to that model.

Consider your example of 1.000 ± 0.010 V and 0.999 ± 0.010 V. When the measurement model is y = f(x_1, x_2) = (1/2)x_1 + (1/2)x_2 then u_c(y) = 0.0041 V. I encourage you to verify this with the NIST uncertainty calculator. Use a rectangular distribution for both inputs and enter 0.5*x0 + 0.5*x1 as the R expression at the bottom.
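The same figure can be cross-checked with a quick Monte Carlo run. This is a sketch under the stated assumptions only: each reading's error is drawn independently from a rectangular distribution of half-width 0.010 V, and the seed and sample count are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(42)   # fixed seed so the run is repeatable
n_trials = 200_000
a = 0.010                 # half-width of the assumed rectangular error, volts

means = []
for _ in range(n_trials):
    # Each reading gets its own independent rectangular error
    x1 = 1.000 + rng.uniform(-a, a)
    x2 = 0.999 + rng.uniform(-a, a)
    means.append(0.5 * x1 + 0.5 * x2)

u_mc = statistics.pstdev(means)                  # Monte Carlo estimate
u_analytic = (a / math.sqrt(3)) / math.sqrt(2)   # equation 10 with c_i = 0.5

print(f"Monte Carlo: {u_mc:.4f} V, analytic: {u_analytic:.4f} V")
```

The agreement only shows that the propagation formula matches a simulation built on the same independence assumption. It does not settle whether that assumption holds for any particular set of real measurements.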

The usual bgwxyz obfuscation — 0.03 is certainly in this range, and is most certainly absurd.

Go back to playing with the NIST calculator, you have been exposed, again.

HAHAHAHAHAHAHA

Like I said, totally absurd.

FOUR milli-Kelvins?

HAHAHAHAHAHAHAHAAHAHAHA

And you mocked me for using this unit, yet here YOU are!

How many anomaly graphs have you seen that show values of 0.025? How about in 1900 when temps were only recorded to integer values?

Look at this page from giss. Look at the very first entry of 1880. An anomaly of “-0.17”!

Tell us how using temps recorded as integers with an uncertainty of ± 1 and an integer divisor can possibly provide a measurement of -0.17. Do these people think that Significant Digits don’t apply to their work?

Just visualize for me and add 1 to each temperature, then subtract 1 from each temperature and use those as the limits of what might be possible. Draw a black line between those limits. Do you really think you could find a line of -0.17? That is what uncertainty is. You can’t see between those limits. You don’t know what the true values are between those lines and, you can never know.

you forgot the link.

“Anyway, the primary reason why these uncertainties are as low as they are given the relatively high instrumental uncertainty is due to the law of propagation of uncertainty equation E.3 in GUM JCGM 100:2008.”

E.3.1 starts off with defining z = f(w1,w2, …, wn).

If w1, w2, …, wn are uncorrelated, which temperatures would be, then we get the standard law of propagation:

(σ_z)^2 = Σ (∂f/∂w_i)^2 (σ_i)^2

————————————————————-

E.3.4 says: “Consider the following example: z depends on only one input quantity w, z = f(w), where w is estimated by averaging n values w_k of w; **these n values are obtained from n independent repeated observations q_k of a random variable q;** and w_k and q_k are related by” (bolding mine, tg)

w_k = α + βq_k

Here α is a constant “systematic” offset or shift common to each observation and β is a common scale factor. The offset and the scale factor, although fixed during the course of the observations, are assumed to be characterized by a priori probability distributions, with α and β the best estimates of the expectations of these distributions.

The best estimate of w is the arithmetic mean or average w̄ obtained from

w̄ = (1/n) Σ w_k = (1/n) Σ (α + βq_k)

The quantity z is then estimated by f(w̄) = f(α, β, q_1, q_2, …, q_n) and the estimate u^2(z) of its variance σ^2(z) is obtained from Equation E.3. If for simplicity it is assumed that z = w so that the best estimate of z is z = f(w̄) = w̄, then the estimate u^2(z) can be readily found.

—————————————————-

This only applies, as usual, to the situation where you have multiple measurements of the same thing! Two independent weight measurements are not the same thing. Two independent temperature measurements are not the same thing.

Let me repeat the restrictions for use: “these n values are obtained from n independent repeated observations q_k of a random variable q;”

Multiple measurements of the same thing!

No measurement data set that I know of has multiple measurements of the same temperature. They all include multiple measurements of different things. Therefore Section E.3 does not apply.

Let me quote further from Section E.3.5:

“Of more significance, in some traditional treatments of measurement uncertainty, Equation (E.6) is questioned because no distinction is made between uncertainties arising from systematic effects and those arising from random effects. In particular, combining variances obtained from a priori probability distributions with those obtained from frequency-based distributions is deprecated because **the concept of probability is considered to be applicable only to events that can be repeated a large number of times under essentially the same conditions,** with the probability p of an event (0 ≤ p ≤ 1) indicating the relative frequency with which the event will occur.” (bolding mine, tg)

I have yet to see a proper analysis of uncertainty in most of the clisci literature. I have seen excuses for using small uncertainties ranging from “the GUM covers both multiple measurements of the same thing and multiple measurements of different things” to “averages cancel uncertainty” to “anomalies cancel uncertainties” to “measurement device resolution is its uncertainty” to “uncertainty is a random walk which eventually cancels” to “standard deviation of the sample means is the uncertainty of the population”. There are probably others. They are all wrong. Every Single ONE.

They are the very same inanities used to denigrate Pat Frank’s analysis of the accumulation of uncertainty in iterative models from inputs with uncertainty.

The GAT is a joke. Send in the clowns.

blob hops this freight train just down-thread with a smorgasbord of hand-waved word salad to arrive at the impossibly small uncertainty station.

Have to educate them one by one. Not a single one can tell the difference between multiple measurements of the same thing and single measurements of multiple different things.

BZZZZZZT. Same old nonsense, a different day.

Get some clues PDQ.

“The contrarians who disagree with the GUM are reframing the debate to make it sound like people are saying that as a strawman argument to then attack the conclusion that the uncertainty of the average as computed via equation 10 is u(x_avg) = sqrt[Σ[(1/N)^2 * u(x_i)^2]] when x_i are uncorrelated.”

The contrarians are trying to teach you basic metrology, but like a stubborn mule you refuse to learn.

You are trying to fool everyone that the GUM talks about calculating the uncertainty of independent objects. It doesn’t.

For independent objects the uncertainty is calculated as follows.

—————————————————-

Avg = [Σ(x1, …, xn) ] /n

u(avg) = sqrt[ (u(x1)/x1)^2 + (u(x2)/x2)^2 + … (u(xn)/xn)^2 + (u(n)/n)^2 ]

Since u(n) = 0 because it is a constant the u(avg) becomes

sqrt[ (u(x1)/x1)^2 + (u(x2)/x2)^2 + … (u(xn)/xn)^2 ]

i.e. the uncertainty of the sum.

—————————————————

Equation 10 in JCGM 100:2008 is:

[ u_c(y) ]^2 = Σ (ẟf/ẟx_i)^2 u(x_i)^2 evaluated from 1 to N

Where f is the function given in Equation 1. Each u(x_i) is a standard uncertainty evaluated as described in 4.2 (Type A evaluation) or as in 4.3 (Type B evaluation). The combined standard uncertainty u_c(y) is an estimated standard deviation and characterizes the dispersion of the values that could reasonably be attributed to the measurand (Y).

Section 4.2

4.2.1 In most cases, the best available estimate of the expectation or expected value μq of a quantity q that varies randomly [a random variable (C.2.2)], and for which n independent observations qk have been obtained under the same conditions of measurement (see B.2.15), is the arithmetic mean or average q (C.2.19) of the n observations:

———————————————-

Since we have ONE measurement of multiple things, the arithmetic mean or average doesn’t apply. This only applies when you have a distribution of measurements like those described in Figure 1 attached. Multiple measurements of different things cannot be guaranteed to provide this kind of distribution.

“with a rectangular distribution assumed”

I have never agreed with this. The actual distribution is unknown. In fact, if you consider that the true value lies within the interval then the true value has a probability of one and all the other values in the distribution have a probability of zero. An impulse function, if you will. The problem is that you don’t know where that impulse lies within the interval. That’s why it is called “uncertainty” – it’s unknown and can never be known.

“They are not correlated. So uncertainties from different stations should be propagated orthogonally and averaged. Then, if all stations had 0.5dgC uncertainty the final uncertainty would also be 0.5dgC.”

If they are not correlated then propagating them orthogonally would mean you add them using root-sum-square I believe. In other words the uncertainty would grow, it would not remain constant.

Rick,

Good summary! I would only note that the minute that calibrated instrument is placed in the field, component drift will alter the instrument. While the systematic bias might be removed in the lab it won’t stay removed in the field. Thus the usual specification for field temperature measurement devices of +/- 0.5C.

You don’t seem to know what uncertainty is.

What is the uncertainty of the 30-year baseline average that is subtracted from all the observations to create the anomaly?

That *IS* the question.

According to bdgwx it is less than the uncertainty of each individual component in the average!

Clyde Spencer: “What is the uncertainty of the 30-year baseline average that is subtracted from all the observations to create the anomaly?”

Using the terminology from NIST TN 1297 and JCGM 100:2008 the uncertainty has two components: the systematic effect and the random effect. So for any observation Tx_obs = Tx + B + Rx where B is the error from the systematic component and Rx is the error from the random component. So Tbaseline = Σ[Tx + B + Rx, 1, N] / N = Σ[Tx + Rx, 1, N] / N + B.

How many more times are you going to spam this false garbage?

I’ll ask another question: How does one characterize the systematic component (B), when the 30-year time series has a trend (increasing temperature)?

In this particular case we don’t even attempt to characterize it with a value. We leave it as a variable to be cancelled via algebra. But generally speaking the GUM does describe a procedure for doing so numerically in section F if that is desired.

He excels in nonsense.

Your analysis assumes that B is the same for every measurement, and for all time, everywhere in the world?

Retired_Engineer_Jim said: “Your analysis assumes that B is the same for every measurement, and for all time, everywhere in the world?”

Correct. Though by “every measurement” in this context it is all of the annual global average temperature values in graph 1 of the Schmidt article. If they all have a systematic bias B then that bias cancels out when you do the conversion to anomalies.

And you DON’T know this to be true, and can NEVER know them.

What is this, psychic data manipulation?

Except you forgot that the uncertainty of the average is not B but instead nB!

bdgwx==> This is where you go off the rails. The systematic error (or component of uncertainty) is likely different for every individual instrument. Further, it is unlikely to be constant over time. It may also be non-linear over the instrument range. e.g. some instruments have MU stated as “+/- X” while others may have it stated as “+/- Y% of measured value” or “+/- Z% of full scale”. The assumption that it’s constant across all measurements of a specific parameter is just plain wrong. I can buy temperature measurement instruments for anywhere from $10 to $10,000 with various levels of accuracy (MU) from say +/- 2 dg C to +/- 0.001 dg F.

The only way to legitimately reduce MU is to average multiple measurements of the same thing with the same instrument done at the same time under stable environmental conditions. You can then divide the instrument’s stated MU by the square root of the number of readings. BUT the MU thus derived is the MU of the average. It does not do anything to change the MU of individual measurements. But it does reduce the MU of the result of the measured item at that time. That’s because we assume that the variation in measurements is random and normally distributed. But this assumption is only valid in general because we use calibration methods and references to minimize possible systematic error to insignificant levels. If, in fact, the primary source of uncertainty was systematic error (bias) averaging multiple measurements would do nothing to improve uncertainty.

By the way, I am a trained metrologist, PEME (retired) and have conducted seminars on the GUM, ISO 17025 and compliance with MU reporting requirements for testing and calibration laboratories. There is a great deal more nuance to the subject, but you need to give up on the idea that your “B” is a correct representation of bias of systematic error. In the real world we simply never know what it actually is and can, at best, provide a reasonable estimate of its probable maximum.

*APPLAUSE!*

Maybe this lesson will sink in.

No, it won’t.

Rich C said: “The systematic error (or component of uncertainty) is likely different for every individual instrument.”

It’s not just likely; it’s certain. But we aren’t discussing the systematic error of the individual observation. We are discussing the systematic error of the measurement model that computes the global average absolute temperature. There will be a systematic effect component that is the same for every monthly value. And don’t hear what isn’t being said. It is not being said that there isn’t a systematic effect that does change from month to month, nor is it being said that there isn’t a systematic effect on every individual measurement that is different.

Rich C said: “Further, it is unlikely to be constant over time.”

Absolutely. Such is the case with the time-of-observation bias, instrument/shelter change bias, station move bias, etc. That is not being discussed at this time. That is a whole other complication.

Rich C said: “The only way to legitimately reduce MU is to average multiple measurements of the same thing with the same instrument done at the same time under stable environmental conditions.”

That is certainly true. But we aren’t discussing the reduction of MU via averaging right now. We are discussing the removal of a time invariant systematic error as can be seen in graph 1.

Rich C said: “If, in fact, the primary source of uncertainty was systematic error (bias) averaging multiple measurements would do nothing to improve uncertainty.”

Nobody is saying it would. In fact, I stated this fact explicitly with the math Σ[Xi + B, 1, N] / N = Σ[Xi, 1, N] / N + B above. The point is that if you convert to anomalies via ΔXj = (Xj + B + Rj) – Σ[Xi + B + Ri, 1, N] / N the B term cancels completely. Do the algebra and prove this out for yourself. And don’t hear what isn’t being said. Anomalization has advantages, but it is not a panacea. It comes at the cost of increasing the random component of uncertainty and removing the absolute component of the temperature.
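The algebra can be tried out numerically. The sketch below uses entirely made-up numbers (a 30-value series, a noise level of 0.2, a constant bias B = 0.7) and compares two cases: a bias that is identical for every value, and a bias that differs from value to value. Both cases are assumptions for illustration, not a model of any real station record.

```python
import random

def anomalies(values):
    """Subtract the series mean (the baseline) from every value."""
    baseline = sum(values) / len(values)
    return [v - baseline for v in values]

rng = random.Random(1)
true_vals = [15.0 + 0.01 * k for k in range(30)]   # hypothetical series with a trend
noise = [rng.gauss(0.0, 0.2) for _ in true_vals]   # hypothetical random component R

unbiased = anomalies([t + r for t, r in zip(true_vals, noise)])

# Case 1: the same bias B on every value (the assumption in the algebra above)
B = 0.7
const_bias = anomalies([t + B + r for t, r in zip(true_vals, noise)])
max_diff_const = max(abs(a - b) for a, b in zip(const_bias, unbiased))

# Case 2: a different bias on every value
biases = [rng.uniform(-0.5, 0.5) for _ in true_vals]
varying_bias = anomalies([t + b + r for t, b, r in zip(true_vals, biases, noise)])
max_diff_var = max(abs(a - b) for a, b in zip(varying_bias, unbiased))

print("largest anomaly change, constant bias:", max_diff_const)
print("largest anomaly change, varying bias: ", max_diff_var)
```

In the first case the anomalies are unchanged apart from floating-point rounding; in the second they are not. The cancellation therefore rests entirely on whether the bias really is the same for every value in the series, which is the point being argued in this thread.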

Rich C said: “There is a great deal more nuance to the subject, but you need to give up on the idea that your “B” is a correct representation of bias of systematic error. In the real world we simply never know what it actually is and can, at best, provide a reasonable estimate of its probable maximum.”

And yet the GUM JCGM 100:2008 says that the systematic effect can be corrected. They even provide a procedure for determining B (they use lower case b) experimentally in F.2.4.5. And notice from equation F.7b that when the correction b for the systematic effect is constant then u(b) is zero. Also notice that the uncertainty can be expanded such that U = u + b.

“And yet the GUM JCGM 100:2008 says that the systematic effect can be corrected.”

As I stated earlier, if a systematic effect is of known magnitude and corrected for, it is no longer a systematic error. The systematic error component of MU is, by definition, unknown. If I know my bathroom scale always shows 3 pounds more than my true weight I just subtract 3 from what it indicates – no error, but if I don’t know it’s off I think I weigh 3 pounds more than I really do – the error exists and is not recognized.

Now instead of wanting to know your absolute weight W you only want to know the change in your weight ΔW. Does the 3 lb systematic error, which exists on each weight measurement W, contaminate ΔW?

Inappropriate analogy to global air temperature measurements—another fail.

And you artfully dodged the question about error corrections for a calibration versus unknown errors.

I just don’t understand why this point keeps getting lost by those trying to justify the GAT as actually meaning something.

Autism perhaps? It’s all I can think of.

Once again, you are talking about measuring the same thing!

If his neighbor weighs himself on a different scale with a different systematic bias and you average those weights together how does that correct obvious systematic bias? Even taking the deltas won’t cause a cancellation. Some of the systematic bias will still be there.

And let’s be clear, with Tmax and Tmin you are measuring different things. When you ADD them together to form a sum the systematic bias in each ADDS. Dividing by 2 doesn’t decrease that systematic bias. It remains. Dividing by a constant with no uncertainty neither adds to nor subtracts from uncertainty. To use the GUM the contribution from the constant would be (ẟf/ẟn)u(n).

ẟf/ẟn = 0 for a constant and u(n) = 0 for a constant. Therefore u_c(y) would see no contribution, either positive or negative, by dividing by n.

When you average those mid-range values to create an anomaly those added uncertainties from Tmax and Tmin will remain attached to each component. And, once again, according to Equation 10 those contributions to uncertainty will add. Dividing by the number of components, as above, will neither add or subtract from the total u_c(y). Therefore the uncertainty of the base used to derive the anomaly will be the sum of the uncertainties of the individual components. When you subtract the current mid-range temp from the average mid-range temp it ADDS to the total uncertainty from the base.

The uncertainties for T1+T2 and T1-T2 both add. They don’t subtract. And (ẟf/ẟn)u(n) still remains 0.

He thinks if he can find ANY examples of constants being subtracted during a measurement, his nonsense is vindicated. Just like he was doing with temperature averages.

RickC,

Elsewhere, I have often used the bathroom scales as an analogy for explaining measurement.

One electronic type allows the user to put the instrument on the floor, then do something to activate an automatic zero setting. Step on the scales and note your weight, time after time. (Optionally plot your weight into a time series and self-flagellate.)

The zero set process is similar to subtracting a mean T from a series to get an anomaly. Some commenters think you can improve uncertainty by doing this, some think you can ignore that the zero setting mechanism has its own uncertainties that cannot be thrown away. And more.

Then there are those who try to use the bathroom scales to weigh a letter for postage calculation. (I won’t describe what a ‘letter’ is, see Wiki). This useful analogy with T observations can raise many more sub-topics of total uncertainty, but not from me here this time.

Geoff S

Even the most expensive bathroom scales are sensitive to where you place your feet, to the front of the scale, to the back, or to the sides. It can easily make a 1lb difference or more.

Zeroing the scale won’t eliminate this systematic uncertainty.

“The systematic error component of MU is, by definition, unknown.”

Somehow this point always seems to escape those trying to justify the GAT as a valid metric!

Congrats, you found something new in the GUM to cherry pick.

bdgwx and Rick C,

Please recall that this article is based on the question –

“If a person seeks to know the separation of two daily temperatures in degrees C that allows a confident claim that the two temperatures are different statistically, by how much would the two values be separated?”

Would you and Rick C give your clear answers to that question at some stage to focus the discussion? Appreciated

Geoff S

Geoff Sherrington said: “Would you and Rick C give your clear answers to that question at some stage to focus the discussion?”

Sure. It’s a great question. We should be able to use the GUM to answer it generally under the assumption that “different statistically” means abs(y) > 2*u_c(y). And by “daily temperature” I assume you mean (Tmin + Tmax) / 2.

Our measurement model will be:

y = f(Tx_min, Tx_max, Ty_min, Ty_max)

y = (Tx_min + Tx_max) / 2 – (Ty_min + Ty_max) / 2.

Using the NIST uncertainty machine we can see that u(y) = 0.5.

So we need the separation to be greater than 1.0 C.
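That figure can be reproduced in a few lines, assuming u(T) = 0.5 C for each of the four readings, no correlation between them, and a coverage factor of 2 for “different statistically”. The function name `separation_threshold` is made up for this sketch.

```python
import math

def separation_threshold(u_T, k=2.0):
    """Smallest |y| counted as significant for
    y = (Tx_min + Tx_max)/2 - (Ty_min + Ty_max)/2,
    assuming four uncorrelated inputs that all have standard uncertainty u_T.
    """
    coeffs = [0.5, 0.5, -0.5, -0.5]   # partial derivatives of y
    u_c = math.sqrt(sum((c * u_T) ** 2 for c in coeffs))
    return u_c, k * u_c

u_c, threshold = separation_threshold(0.5)
print(u_c, threshold)   # 0.5 1.0
```

Changing `u_T` or the coverage factor `k` scales the threshold directly, so the answer to the question depends mostly on what one takes the uncertainty of a single reading to be.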

Geoff Sherrington said: “Would you and Rick C give your clear answers to that question at some stage to focus the discussion?”

bdgwx said: “So we need the separation to be greater than 1.0 C”

I forgot to mention that I was using u(T) = 0.5 C uncorrelated for that.

Anyway here is the general analytic derivation using the GUM.

Our measurement model will be:

y = f(Tx_min, Tx_max, Ty_min, Ty_max)

y = (Tx_min + Tx_max) / 2 – (Ty_min + Ty_max) / 2.

Therefore the partial derivatives are:

∂f/∂Tx_min = 0.5

∂f/∂Tx_max = 0.5

∂f/∂Ty_min = -0.5

∂f/∂Ty_max = -0.5

Starting with GUM equation 10 assuming no correlation:

u_c^2(y) = Σ[(∂f/∂x_i)^2 * u(x_i)^2, 1, N]

Assuming:

u(T) = u(Tx_min) = u(Tx_max) = u(Ty_min) = u(Ty_max)

Then:

u_c^2(y) = 0.5^2 * u(T)^2 + 0.5^2 * u(T)^2 + (-0.5)^2 * u(T)^2 + (-0.5)^2 * u(T)^2

u_c^2(y) = 0.25 * u(T)^2 + 0.25 * u(T)^2 + 0.25 * u(T)^2 + 0.25 * u(T)^2

u_c^2(y) = 0.25 * (u(T)^2 + u(T)^2 + u(T)^2 + u(T)^2)

u_c^2(y) = (1/4) * (4 * u(T)^2)

u_c^2(y) = u(T)^2

u_c(y) = sqrt[u(T)^2]

u_c(y) = u(T)

2*u_c(y) = 2*u(T)

“∂f/∂Tx_min = 0.5

∂f/∂Tx_max = 0.5

∂f/∂Ty_min = -0.5

∂f/∂Ty_max = -0.5”

The contribution of each component is actually (ẟf/ẟx_i)u(x_i).

The partial derivative is actually 1. u(x_i) = 0.5

“y = (Tx_min + Tx_max) / 2 – (Ty_min + Ty_max) / 2.”

For a DAILY mid-range temperature the model would be

y = (Tx_min + Tx_max)/2

u(Tx_min) = (ẟf/ẟx)u(x) = (1)0.5 = 0.5

u(Tx_max) = (ẟf/ẟx)u(x) = (1)0.5 = 0.5

u_c(y) = sqrt[ 0.5^2 + 0.5^2 ] = sqrt[ 0.5 ] = 0.7

2* u_c(y) = 1.4

When you average all the mid-range values together to get a base to use in calculating anomalies per Eq 10 from the GUM you get

u_c(y)^2 = Σ (ẟf/ẟx_i)^2 u^2(x_i)

ẟf/ẟx_i = 1 except for ẟf/ẟn = 0

u^2(x_i) = 0.7^2 = 0.5

u_c(y)^2 = n * 0.5

u_c(y) = sqrt[ n * 0.5 ] = 0.7 * sqrt(n)

So the significance factor would be

2 * (0.7 sqrt(n) )

If n = 1000 then you get 2(0.7 * 32) = 44

Thus both the uncertainty and the significance factor outweigh any anomaly change we’ve seen for the GAT.

Pretty sure I did the math right. Any corrections?

This number looks familiar!

Is this true when the 30-year time-series has a trend?

CS said: “Is this true when the 30-year time-series has a trend?”

Yes. And I want to make sure it is perfectly clear that B is a time invariant systematic effect that contaminates all monthly absolute global average temperature values.

Don’t hear what isn’t being said. It is not being said that all systematic effects are time invariant. It is not being said that there aren’t station specific systematic effects, either time invariant or not. It is not being said that there are no random effects.

Again…anomalization has advantages, but it is not a panacea. You trade the removal of the time invariant systematic effect B for losing the absolute temperature and increasing the random component of the uncertainty.

And now you are back to telling bald-faced lies.

Since when is B time invariant? If it was time invariant then why the need to do homogenization? What is the random uncertainty of a digital measuring station as opposed to its systematic uncertainty?

“The point is that if you convert to anomalies via ΔXj = (Xj + B + Rj) – Σ[Xi + B + Ri, 1, N] / N the B term cancels completely.”

This can only happen if you are measuring the SAME THING using the SAME DEVICE!

And even then you can’t be assured that the systematic bias is equal for both measurements. You simply don’t know that. The systematic bias may be dependent (and probably is) on where you place your feet on the scale. It may be dependent on the temperature in the house. It may be dependent on the humidity in the house. Depending on the hysteresis of the scale it may even depend on the weight of someone who used the scale before you!

The issue is that you don’t KNOW. Therefore you simply can’t assume it all cancels except under controlled laboratory conditions!
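Both sides of this exchange can be illustrated with a toy series. The sketch below uses hypothetical values and sets the random term R to zero so that only the systematic term B is visible. It compares anomalies computed from biased readings with anomalies computed from the unbiased values, once with the same B in every reading and once with a B that drifts.

```python
def anomalies(readings):
    """Subtract the mean of the series from every reading."""
    baseline = sum(readings) / len(readings)
    return [r - baseline for r in readings]

# Hypothetical true values X_i; random effects R_i are set to zero here so
# that only the behaviour of the systematic term B is visible.
true_values = [14.1, 14.3, 14.0, 14.6, 14.8]

constant_bias = [0.3] * len(true_values)          # same B in every reading
drifting_bias = [0.30, 0.35, 0.40, 0.45, 0.50]    # B changes over time

reference = anomalies(true_values)
with_constant = anomalies([x + b for x, b in zip(true_values, constant_bias)])
with_drift = anomalies([x + b for x, b in zip(true_values, drifting_bias)])

err_constant = max(abs(a - r) for a, r in zip(with_constant, reference))
err_drift = max(abs(a - r) for a, r in zip(with_drift, reference))

print(f"max anomaly error, constant B: {err_constant:.3f}")
print(f"max anomaly error, drifting B: {err_drift:.3f}")
```

A constant B drops out of the anomalies to within floating-point rounding. A drifting B leaves a residual equal to its departure from its own mean. Which case describes real stations is the point in dispute, and the sketch does not settle that.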

He refuses to acknowledge the truth.

I’m reminded of a quote from a Tom Cruise movie!

Thank you, Rick C,

This is an excellent example of the high quality type of comment that I was hoping for, reflecting experience. Part 2 in a few days is the more mathematical part. I hope that you comment again.

One gains an impression here that few who comment have read enough of the GUM. Geoff S

Yes he does, and it is complete and total nonsense. He has no experience or training in real world measurement metrology.

Perfect for the BOM !

You can’t, bgwxyz has no idea what he is yapping about.

Quite so. It stands to reason that if the individual temperature records have a +- 0.5C uncertainty then the anomaly between such records is likewise uncertain.

Not necessarily. Read NIST TN 1297 that Geoff linked to. Uncertainty is composed of a systematic effect and a random effect. Anomalization will remove the time invariant portion of the systematic effect. The math proving this is quite simple. I posted it above.

WRONG.

Yep. bdgwx assumes that the uncertainty of the average is the same as the uncertainty of a single individual component.

When you are combining unlike things the uncertainty of the average is *NOT* the uncertainty of an individual component but a sum of *all* of the individual uncertainties.

You would think that a statistician would understand that, from a statistics viewpoint, Variance_total = V1 + V2 + … + Vn when combining individual, random components. The variance of individual random measurements of different things can be assumed to be the uncertainty interval of each component. Thus the variances add, meaning the uncertainties add.

[note: each individual measurement component is a random variable with a population size of one and a variance of +/-u]

Yes one should expect such, but in climastrology, they have vested interests that need the “errors” in the trend lines be as small as possible, so they hand wave and make stuff up as they go along.

WRONG!

bdgwx ==> The claimed uncertainty for anomaly values…..

They reduce the uncertainty by a whole order of magnitude, based on the same uncertain measurements….

That is not possible in the real world — only in the magical world of CliSci.

The article you cited says it is possible.

Baloney

It’s right there in the article Kip linked to. 0.5 C for absolute values and 0.05 C for anomaly values.

Temperature (and real world observation) uncertainties are not the same thing as mathematical probability uncertainties and should never be used together as if they were.

That directly contradicts what NIST Technical Note 1297 and GUM JCGM 100:2008 say and which Geoff linked to in his article.

More garbage, it is YOU who does not understand that which you attempt to read.

In his dreams

Kip was using the value stated by GISTEMP!

Even using their value the result gives an uncertainty interval that overwhelms the differences trying to be identified!

Without wasting time looking at the details, I suspect the 0.05 is the SEM, which is the interval within which the population mean may lie. That IS NOT the uncertainty in measurements.

From:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2959222/#

You obviously don’t know what a statistical descriptor is. It is a numerical value that describes the data you are summarizing. That is why you use the SD of the data and not the SEM which is only the standard deviation of the sample means distribution.
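The two descriptors being contrasted here are easy to compute side by side. The sketch below uses a made-up sample of readings; it only shows how the sample standard deviation (SD) and the standard error of the mean (SEM) are related, and takes no position on which one is the appropriate measurement uncertainty.

```python
import math
import statistics

# Hypothetical sample of temperature readings (°C).
readings = [14.2, 15.1, 13.8, 14.9, 15.4, 14.0, 14.7, 15.0]

sd = statistics.stdev(readings)          # spread of the individual readings
sem = sd / math.sqrt(len(readings))      # spread expected of the sample mean

print(f"SD  = {sd:.3f}")
print(f"SEM = {sem:.3f}")
```

The SEM is always the SD divided by the square root of the sample size, so it shrinks as the sample grows while the SD does not.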

You’ll never get bdgwx or bellman to admit to this. They are too steeped in clisci religious dogma.

Bullshite.

That paper is utter garbage. You can’t have an uncertainty of ±0.15 °C in an 1880 thermometer that has gradations of 1 °C. You morons.

The ±0.15 C figure is not that of the uncertainty of the thermometer. It is the uncertainty of the measurements based on the measurement model defined by GISS.

Can anyone parse this into English. I failed…

This has been argued about many times. As I understand it,

Laplace produced a mathematical proof more than 200 years ago for the special condition: multiple measurements of one thing, when those measurements meet the conditions of the proof.

As far as applying the idea to measurements of many different things with different instruments under different conditions, not meeting the conditions of Laplace’s proof, only assertions have been presented, not unlike the assertion that people in hell want ice water.

Perfect!

Would you mind posting a link to material discussing this?

“The standard version of the central limit theorem, first proved by the French mathematician Pierre-Simon Laplace in 1810, states that the sum or average of an infinite sequence of independent and identically distributed random variables, when suitably rescaled, tends to a normal distribution.”

“Laplace and his contemporaries were interested in the theorem primarily because of its importance in repeated measurements of the same quantity. If the individual measurements could be viewed as approximately independent and identically distributed, then their mean could be approximated by a normal distribution.”

https://www.britannica.com/science/central-limit-theorem

Jim Gorman,

Jim,

Thank you again. I quoted that Laplace assertion near the end of my essay above.

Geoff S

“While the absolute value has an uncertainty of around ±0.5 C the anomaly values are around ±0.05 C for recent decades”

This, along with the rest of your posts in this subthread, is malarkey!

“ΔT = (Tx + B) – (Tavg + B) = (Tx – Tavg) + (B – B) = Tx – Tavg”

The problem is that Tavg + B TOTALLY UNDERESTIMATES THE UNCERTAINTY OF THE AVERAGE!

The uncertainty of the average GROWS as you create the average since you are combining different things! It is *NOT* the same as the uncertainty for an individual measurand!

This is nothing more than the typical garbage of “uncertainty disappears when you average”. The average is made up of DIFFERENT OBJECTS. Those measurements cannot be assumed to create an identically distributed distribution in which uncertainty cancels out to give a True Value. When you combine different things you must *add* the uncertainties, either directly or by root-sum-square.

Thus your formula should be ΔT = (Tx + B) – (Tavg + nB) where n is the additive sum of the uncertainties of the individual components making up the average.

The uncertainties of the individual components propagate through to the average. It is just that simple. You simply cannot have less error in the average than you do in the individual components making up that average.

A group of 100 2’x4′ boards, each with an uncertainty of 1/8″ does not magically give you an average length with an uncertainty of 1/80″ of an inch. First off, the average itself is meaningless when you are dealing with different measurands. Secondly, the uncertainty distribution is not guaranteed to be identically distributed.

“A group of 100 2’x4′ boards, each with an uncertainty of 1/8″ does not magically give you an average length with an uncertainty of 1/80″ of an inch.”

The uncertainty of each board is not under discussion. The uncertainty of the average of all of them is. Mainly for trending.

Trend the 125 averages. You will find that, even given the coarse 1/8″ uncertainty of each board length relative to what was intended, that trend is 0.00100 +/- 0.00001 inches per every stack of 100. I.e., a ratio of average/standard deviation of ~100. If I graphed that line segment and its error bands, you would only be able to see the line…

How exactly do you propose to ADD one mil onto the end of a sawed board, blob?

Trendology at its finest.

Reading comprehension. My first instruction was to buy a forest, so Mr. Gorman could compile 125 stacks of 100 boards/stack.

Hey! sqrt(125*100) is very close to the ratio of the trend to its standard error. Who’da thunk it?

But you make a good point. You give Mr. G so many fact-free attaboys that you must know him better than I. I figured that he was the kind of guy that needed lots of boards nearby, for his Davidian compound. But maybe not. So, if he merely started with boards an eighth of an inch longer, measured, shaved 0.001″ from each, remeasured, and repeated 125* over, he would end up with the same trend and the same trend/standard error.

Admittedly lots of shaving, measuring, data gathering and evaluating. But you could help. It’s not as if either of you are busy….

“So, if he merely started with boards an eighth of an inch longer”

Who said they were longer?

“shaved 0.001″ from each”

How do you shave 0.001″ from anything when your measuring device has an uncertainty of 1/8″?

“he would end up with the same trend and the same trend/standard error.”

What’s the trend got to do with the average and its uncertainty?

You are doing the very same thing the clisci folks do. Equating resolution with uncertainty!

Try again!

Word-salad blob digs deep and hits loon-ville.

What’s the trend got anything to do with this?

“that trend”

If your measurement uncertainty is 1/8″ how are you going to add 1mil to each board? And what difference does the *trend* make!

You don’t use the trend in using the boards to build a stud wall or a beam spanning a foundation!

I’ll stand by my statement: “A group of 100 2’x4′ boards, each with an uncertainty of 1/8″ does not magically give you an average length with an uncertainty of 1/80″ of an inch.”

You haven’t disproved that in any way, shape, or form.

“You don’t use the trend in using the boards to build a stud wall or a beam spanning a foundation!”

I agree. But your “use” is irrelevant to the parameters – averages and trends – under discussion. OTOH, my example is almost perfectly relevant.

“I agree. But your “use” is irrelevant to the parameters – averages and trends – under discussion. OTOH, my example is almost perfectly relevant.”

The average is *NOT* a trend. The entire thread is about the average!

And the uncertainty of the average is *NOT* the standard deviation of sample means.

Nor can the average have an uncertainty of 1/80″ when the components have an uncertainty of 1/8″

if Avg = (x1 + x2 + … + xn) /n

then the uncertainty of Avg is:

u(Avg)^2 = (u(x1)/x1)^2 + (u(x2)/x2)^2 + … + (u(xn)/xn)^2 + (u(n)/n)^2

u(n)/n = 0

so

u(Avg)^2 = (u(x1)/x1)^2 + (u(x2)/x2)^2 + … + (u(xn)/xn)^2

Uncertainty GROWS.

Each of x_i is an independent, random variable with a variance (uncertainty) of 1/8″.

When adding independent, random variables you add variances.

The exact same thing happens.

σ(Avg)^2 = (σ_x1)^2 + (σ_x2)^2 + … + (σ_xn)^2

It truly *IS* just that simple.
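The rule for adding variances of independent random variables can be checked numerically for the board example. The sketch below is a minimal simulation under stated assumptions: each board's error is treated as an independent, zero-mean Gaussian draw with a standard deviation of 1/8″, and the trial count and seed are arbitrary. Whether real measurements satisfy those assumptions is exactly what is contested in this thread. The spread of the summed error and the spread of the averaged error are reported separately because they differ by the factor n.

```python
import random
import statistics

rng = random.Random(0)          # fixed seed so the run is repeatable
n_boards = 100
sigma = 0.125                   # 1/8 inch, treated here as a standard deviation
trials = 2000

sums, means = [], []
for _ in range(trials):
    errors = [rng.gauss(0.0, sigma) for _ in range(n_boards)]
    total = sum(errors)
    sums.append(total)
    means.append(total / n_boards)

sd_sum = statistics.stdev(sums)
sd_mean = statistics.stdev(means)
print(f"spread of the summed error:   {sd_sum:.3f} in")
print(f"spread of the averaged error: {sd_mean:.4f} in")
```

Under these assumptions the variance of the sum is n times the single-board variance, and dividing the sum by n divides that variance by n squared.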

Only in alt.world does more data result in more uncertainty. This is truly down the rabbit hole stuff.

But it could be worse. This QAnon version of statistics is only passed back and forth under the rock, as none of it’s adherents venture out from it.

blob reveals his real lib-tard TDS feathers.

What a surprise.

“Only in alt.world does more data result in more uncertainty. This is truly down the rabbit hole stuff.”

From the GUM:

u_c^2(y) = Σ (∂f/∂x_i)^2 u^2(x_i) evaluated from i=1 to i=N

Thus the greater the value of N, the more uncertainty terms (i.e. x_i terms) you have and the larger u_c^2(y) becomes.

See Sec 5.1.2, Equation 10.

Why don’t you come back and tell us that JCGM 100:2008_E, Sec 5.1.2 is wrong as all git out.

Dare you!

For the lurkers…GUM equation 10 can be used to adjudicate this debate.

u_c^2(y) = Σ[(∂f/∂x_i)^2 * u^2(x_i), 1, N]

Let

y = Σ[x_i, 1, N] / N

u(x) = u(x_i) for all x_i

Therefore

∂f/∂x = ∂f/∂x_i = 1/N for all x_i

So

u_c^2(y) = Σ[(∂f/∂x_i)^2 * u^2(x_i), 1, N]

u_c^2(y) = Σ[(1/N)^2 * u^2(x), 1, N]

u_c^2(y) = N * [(1/N)^2 * u^2(x)]

u_c^2(y) = 1/N * u^2(x)

u_c^2(y) = u^2(x) / N

u_c(y) = u(x) / sqrt(N)

Both the partial derivatives ∂f/∂x and the final result can be confirmed with the NIST uncertainty machine. You can also verify this with your favorite computer algebra system.
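For readers without a computer algebra system, the same check can be done numerically. The sketch below estimates each partial derivative of the mean by central differences and feeds it into GUM Eq. 10 for uncorrelated inputs. The helper names and the input values (100 readings, each with standard uncertainty 0.5) are hypothetical, chosen only for illustration.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sensitivity(f, xs, i, h=1e-3):
    """Central-difference estimate of the partial derivative of f w.r.t. xs[i]."""
    up = list(xs)
    up[i] += h
    down = list(xs)
    down[i] -= h
    return (f(up) - f(down)) / (2 * h)

def combined_uncertainty(f, xs, us):
    """GUM Eq. 10 for uncorrelated inputs, with numerically estimated sensitivities."""
    return math.sqrt(sum((sensitivity(f, xs, i) * us[i]) ** 2 for i in range(len(xs))))

# Hypothetical inputs: N = 100 readings, each with standard uncertainty 0.5.
N = 100
xs = [20.0 + 0.01 * i for i in range(N)]
us = [0.5] * N

u_mean = combined_uncertainty(mean, xs, us)
print(f"numerical u_c(y) = {u_mean:.4f}")
print(f"u(x)/sqrt(N)     = {0.5 / math.sqrt(N):.4f}")
```

Because `us` is a list, unequal uncertainties can be tried by changing its entries. The sketch covers only the uncorrelated case that Eq. 10 addresses.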

“y = Σ[x_i, 1, N] / N

u(x) = u(x_i) for all x_i”

You just defined a relationship. N is part of that relationship, the rest of the components are x_i, not x_i/N.

rewritten the equation becomes (x_1 + x_2 + … + x_n) / n

Since n is a component of the relationship the n has to be added in the uncertainty calculation. It can’t just be ignored.

Thus your uncertainty equation becomes

u_c^2 = ∂f/∂x = ∂f/∂x_i with no 1/N factor.

The uncertainty of N is added to the uncertainty of the rest of the individual components and since the uncertainty of N is zero, it contributes nothing.

This is no different than any formula evaluated for uncertainty. e.g. m = kX/g. The uncertainty is not evaluated for k/g and X/g but for k, g, and X, the individual components.

The unending skipping record…you are beyond hope and help.

In the above derivation we let u(x) = u(x_i) for all x_i. Let’s relax this assumption and say u(x_i) is different for all x_i.

Let

y = Σ[x_i, 1, N] / N

Therefore

∂f/∂x = ∂f/∂x_i = 1/N for all x_i

So

u_c^2(y) = Σ[(∂f/∂x_i)^2 * u^2(x_i), 1, N]

u_c^2(y) = Σ[(1/N)^2 * u^2(x_i), 1, N]

u_c^2(y) = (1/N)^2 * Σ[u^2(x_i), 1, N]

u_c^2(y) = Σ[u^2(x_i), 1, N] / N^2

u_c(y) = sqrt[Σ[u^2(x_i), 1, N]] / N

The astute reader will recognize this as the root sum square rule divided by N. With equal u(x_i) it reduces to u(x) / sqrt(N), matching the previous derivation.

And it is WRONG, regardless of how many times you spam it.

Nope!

y = (x1 + x2 + … + xn)/n

The components of the relationship are

x1, x2, …, xn, n

The components are *NOT* x1/N, x2/N, etc

Each uncertainty component is ∂f/∂x_i * u(x_i) as well as ∂f/∂N * u(N)

This is no different than calculating the uncertainty of the relationship of a spring

m = kX/g.

Each separate component is k, X, and 1/g, not k/g and X/g. You do not find ∂f/∂(k/g) and ∂f/∂(X/g). You find the uncertainty of each individual component, ∂f/∂k, ∂f/∂X, and ∂f/∂(1/g). Since “g” is a constant it does not contribute to the uncertainty.

“g” is a constant (its uncertainty is usually considered to be insignificant) just like N is a constant.

In fact, according to John Taylor’s tome, you should actually use relative uncertainties in this case, see his rules 3.18 and 3.47.

[u_c(y)/y]^2 = [ ∂f/∂x1 * u(x1)/x1]^2 + … + [ ∂f/∂xn * u(xn)/xn]^2 + [∂f/∂n * u(n)/n]^2

Thank you, Kip.

Gavin is already bookmarked for likely inclusion in Part Two.

Geoff S

Very welcome essay. I look forward to part 2.

One of the best articles on this website so far in 2022.

No author has had less need to tell us he was a scientist than this author.

I wonder how uncertainty with infilled numbers compares with uncertainty from homogenization? The infilled numbers can never be verified. There are no data. I don’t know how much infilling is done in Australia.

Moving outside of science:

The Australian average temperature is whatever the BOM says it is. Can the BOM be trusted? Perfect measurements mean nothing without honest scientists to compile them.

I mean that after homogenization, does the BOM also “pasteurize” their numbers with a “thumb on the scale”?

RG said: “The infilled numbers can never be verified.”

Sure they can. Just perform a data-denial experiment. BEST uses jackknife resampling (which is a form of data-denial) to assess the uncertainty of their kriging technique. HadCRUTv5 uses stochastic ensembling to assess their combination of Gaussian regression and homogenization technique.

RG said: “I wonder how uncertainty with infilled numbers compares with uncertainty from homogenization?”

It’s a good question. I see two possibilities. First, use a dataset that performs infilling but not homogenization (like BEST) and compare it to a dataset that performs homogenization but not infilling (like HadCRUTv4). Second, use a dataset that performs both homogenization and infilling (like HadCRUTv5-infilled) and compare it to a dataset that performs only homogenization (like HadCRUTv5-noninfilled).

Using the latter option I see that from 1979/01 to 2022/06 the average of the infilled/homogenized monthly uncertainties is 0.025 with a median of 0.024 whereas the average of the noninfilled/homogenized monthly uncertainties is 0.064 with a median of 0.062. We can thus conclude that infilling reduces the monthly uncertainty by about 0.04.

There are many examples of data denial experiments available. Lupu et al. 2012 is one such example. They answer the question of how well a 3D-VAR or 4D-VAR observing system performs when it is challenged by denying it data that it would typically ingest. It might be a good place to start for those who are not familiar with the data denial class of experiments.
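The idea of a data-denial experiment can be sketched in a few lines. The toy below is not the method used by BEST, HadCRUT, or Lupu et al.; it assumes a handful of made-up stations along a one-dimensional transect and a deliberately simple infilling rule (linear interpolation between the two neighbours). Each interior station is withheld in turn, infilled from its neighbours, and compared with the value that was actually recorded.

```python
import math

def infill_linear(x_left, v_left, x_right, v_right, x_target):
    """Estimate a value at x_target by linear interpolation between two neighbours."""
    w = (x_target - x_left) / (x_right - x_left)
    return v_left + w * (v_right - v_left)

def data_denial_rmse(positions, values):
    """Withhold each interior station in turn, infill it from its two
    neighbours, and return the root-mean-square infilling error."""
    errors = []
    for i in range(1, len(values) - 1):
        estimate = infill_linear(positions[i - 1], values[i - 1],
                                 positions[i + 1], values[i + 1],
                                 positions[i])
        errors.append(estimate - values[i])
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical station positions (km along a transect) and temperatures (°C).
positions = [0.0, 10.0, 25.0, 40.0, 60.0, 75.0]
temps = [21.3, 21.9, 22.4, 23.8, 24.1, 25.0]
rmse = data_denial_rmse(positions, temps)
print(f"data-denial RMSE of infilling: {rmse:.2f} °C")
```

The resulting RMSE measures how well this particular infilling rule recovers values that were measured but withheld. It says nothing about locations where no measurement was ever made.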

A pro-warming biased BOM employee is tasked with inventing numbers for infilling. The employee believes in a coming global warming crisis and his organization has predicted a coming global warming crisis. But that bias will have no effect on the numbers used for infilling? Don’t make me laugh.

When I use the word “bias” I’m talking about a systematic error as defined in NIST TN 1297 or JCGM 200:2012, with E = Es + Er, where Es is systematic error and Er is random error. In other words, it is error minus random error. I’m not talking about cognitive biases, which are a completely different thing.

Total clown show.

Admission of data fraud NOTED.

Both honored more in the breach than the observance.

Note the qualifier “legitimate” in the “dissent” principle, a smokescreen for providing the climate fear mongers license to simply deem skeptical disagreements to not be “legitimate.”

And of course the second is all about ensuring that “federal (pseudo) scientists” should be free to spew their alarmist claptrap to the press. Should any scientist express remotely skeptical views, a shitstorm would certainly ensue!

“Uncertainty of routine datasets.”

This is an important topic. For future comparison. We should get it right going forward.

Meanwhile, there are 50 million daily measurements over ~120 years, all ‘uncertain’, in the ushcn.tmax RAW.

Thought experiment:

Visualize a single station. It has measured TMAX 48,910 times since 1888. True, the accuracy-to-perfection might have been off consistently by x degrees, plus or minus. Worse, it might have been on a swerving path up and down per accuracy-to-perfection.

Yet … might there be value in observing the trend of this station? If accuracy has been random and swerving, might not the over/under wash? Might not the variance be trivial in the first place?

For our visualized station, even though measurements have been ‘uncertain,’ the Climate Crisis would poke its nose out. Would raise its hand. Would make its factual basis felt. Would be visible in the graph of its own readings.

Now, one by one, (no blending or averaging across stations for this thought experiment) examine 1218 actual stations’ graph of TMAX RAW. Most, if not all, would display the spike claimed by NOAA et al.

They do not.

Objection already handled: “that’s only for the USA, a small portion of the earth.” Really? The Climate Crisis is raging all around the world, where the historical record of direct measurement is sparse, uncertain (/s) or completely absent, yet in the USA where 50 million measurements were made, somehow in our lovely little continent, the Climate Crisis has avoided being detected in the trends of individual stations?