By Geoff Sherrington.
Scientist, Melbourne, Australia.
Disgraceful conduct from the Australian Government’s Bureau of Meteorology, BOM, is alleged and documented. This article is short, but most of its links are quite long because they are thorough. Please persevere; there are many hidden gems in this article. Use it as a reference library if you wish.
Please digest a 2006 email from Dr David Jones, Senior Climatologist at BOM, to Prof Phil Jones, University of East Anglia. Also at Climategate #0601.txt.
… “ Fortunately in Australia our sceptics are rather scientifically incompetent. It is also easier for us in that we have a policy of providing any complainer with every single station observation when they question our data (this usually snows them) and the Australian data is in pretty good order anyway. Truth be know (sic), climate change here is now running so rampant that we don’t need meteorological data to see it. Recent polls show that Australians now rate climate change as a greater threat than world terrorism.”
From the start, BOM were unwilling to exist outside their own cocoon of beliefs.
For me, here is when it all began. Prof Phil Jones to my geologist colleague Warwick Hughes, early 1990s, about his Australian BOM observations – “Why should I make the data available to you, when your aim is to try and find something wrong with it.”
Almost all of the many criticisms of BOM by Australians have essentially the same starting point: the “raw” daily Tmax and Tmin historic temperature observations available to the public from the BOM web site Climate Data Online, CDO. Photos of some original observation sheets for Melbourne 1859 and 1860 are here. Note the frequent corrections in pen and ink. A broad question is why such raw data are seldom used for analysis.
BOM QUOTE. “Yes—the Bureau provides the public with raw, unadjusted temperature data for each station or site in the national climate database, as well as adjusted temperature data for 112 locations across Australia….
…. The Bureau does not alter or delete the original temperature data measured at individual stations.”
Are the “raw” data really raw? Examples of manipulation are here. (Please pardon my typo dates of JULY 2014 and JULY 2015 in the first figure. They should be 1914 and 1915.)
There are more claims of BOM adjusting data. “In December 2009 the BoM warmed the RAW mean minimum and maximum temperatures by about half a degree from the temperatures that had previously been recorded on the BOM website database for August 2009.”
Then there is metrication, because the originals were in Fahrenheit and we now use Celsius. BOM temperatures went metric on 1 September 1972, and Fahrenheit degrees were converted to Celsius. This conversion introduces an error when too few significant figures are carried.
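To see the size of this effect, here is a minimal sketch (purely illustrative, not BOM’s documented workflow) assuming observations were logged to the nearest 0.1 °F and then stored to only one decimal place of Celsius:

```python
# Illustrative sketch of metrication quantization: a reading recorded to
# 0.1 F is converted to Celsius, then stored to one decimal place.
def f_to_c(f):
    return (f - 32.0) * 5.0 / 9.0

worst = 0.0
for tenths in range(500, 1101):    # 50.0 F to 110.0 F in 0.1 F steps
    f = tenths / 10.0
    exact_c = f_to_c(f)
    stored_c = round(exact_c, 1)   # one decimal place of Celsius
    worst = max(worst, abs(stored_c - exact_c))

print(f"worst-case conversion rounding error: {worst:.3f} C")  # ~0.05 C
```

The quantization error alone approaches 0.05 °C; coarser original readings, or double rounding, would make it larger.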
Another cause to doubt these BOM temperatures comes from two independent compilations that differ from each other. First, the Commonwealth of Australia Yearbooks for 1953 and 1954; second, a CSIR (before CSIRO) compilation of earlier temperatures printed in 1933.
There have been changes of instruments over the decades. At first, put simply, there were Liquid-In-Glass thermometers, LIG, in large Stevenson screens. LIG have measurement uncertainty errors of their own. Later, the screens became smaller and smaller; the agreement between screens was studied here and again here, claiming a false 0.5 °C warming; and the thermometers became electronic, with possibly different response times to changes.

The examples so far raise “uncertainty”. This observer was uncertain between 81.7 and 81.8 °F. The BOM is uncertain if a change in December 2009 was natural or man-made. Uncertainty comes from changes of instruments and locations. There is parallax uncertainty when reading an LIG thermometer, and so on.
An estimate of the total measurement uncertainty of all of the BOM historic temperature data is absent, but needed. The classic reference for uncertainty is from the International Bureau of Weights and Measures, BIPM, in France, in the Guide to the Expression of Uncertainty in Measurement, GUM. BOM has stated its compliance with GUM. Importantly, GUM has no provision for the use of statistics with data that are guesses, as much BOM work like ACORN-SAT is alleged to be.
BOM has its uncertainty methods in Report ITR 716 of March 2022, quoted in reply to my 2018 question “If a person seeks to know the separation of two daily temperatures in degrees C that allows a confident claim that the two temperatures are different statistically, by how much would the two values be separated?” Getting this answer was like pulling teeth. The answer was a non-answer because of a BOM rider that “This is not an estimate of the uncertainty of the ACORN-SAT’s temperature measurement in the field” – which is what was requested.
This cascade of often uncalculated uncertainties has caused production of adjusted temperature sets, although almost all were derived from the originals. BOM started in public with a “High Quality” set, then added 6 versions of Australian Climate Observation Reference Network – Surface Air Temperature, ACORN-SAT.
ACORN-SAT uses time series plots of temperature versus time and/or customary statistics to detect break points, some of which have plausible causes in metadata, others being “statistical” without a known cause. Usually, there is no known way to determine the magnitude of an adjustment. Recourse is made to inferred adjustments from the patterns of other stations, nearby to distant.
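As an illustration of how a purely “statistical” break is found, here is a minimal sketch of a generic difference-of-means breakpoint scan on synthetic data. It is not ACORN-SAT’s actual algorithm (which matches percentiles against neighbour stations); it only shows the kind of detection step involved:

```python
import random

# Synthetic annual means with a deliberate +0.6 C step at index 60,
# plus Gaussian weather noise. Numbers are illustrative only.
random.seed(1)
n = 100
series = [20.0 + random.gauss(0, 0.5) + (0.6 if i >= 60 else 0.0)
          for i in range(n)]

# Scan for the split point that maximizes the gap between the mean
# before and the mean after -- a crude "statistical" break detector.
best_i, best_gap = None, 0.0
for i in range(10, n - 10):        # keep at least 10 points on each side
    left = sum(series[:i]) / i
    right = sum(series[i:]) / (n - i)
    if abs(right - left) > best_gap:
        best_i, best_gap = i, abs(right - left)

print(f"candidate break at index {best_i}, apparent step ~{best_gap:.2f} C")
```

Note that the scan reports a break and an apparent step size whether or not any metadata supports it; deciding what adjustment to apply is a separate, and subjective, step.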
Note: There is an exception. To the author’s knowledge, the only type of adjustment free from subjective guesswork is found here on Bomwatch.com blog. It is based on water, like rain or lawn sprinkler maintenance, cooling weather station sites by evaporation. An outcome of this method by colleague Dr Bill Johnston is that few if any corrected Australian stations show any significant warming over many decades to now.
A typical conventional time series plot follows, showing Melbourne.

The number in the box before “x” is the trend in °C/year for maximum (blue) and minimum (tan) temperatures. Visually, Tmax might have a break point at about 2010. There was a station change in year 2014, so it is accepted that some type of adjustment is warranted – but how much adjustment? Author’s note: I do not endorse the linear least squares procedure for analysis of these time series graphs. I show it here because it is in widespread use by others.
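For readers unfamiliar with how such trend boxes are produced, here is a minimal sketch of the ordinary least-squares slope behind them, on synthetic data; shown, as noted, only because the method is in widespread use, not as an endorsement:

```python
# Ordinary least-squares slope of temperature versus year.
def ols_slope(years, temps):
    n = len(years)
    xm = sum(years) / n
    ym = sum(temps) / n
    num = sum((x - xm) * (y - ym) for x, y in zip(years, temps))
    den = sum((x - xm) ** 2 for x in years)
    return num / den                        # trend in degrees C per year

years = list(range(1856, 2024))
temps = [10.0 + 0.0153 * (y - 1856) for y in years]   # synthetic series
print(f"{ols_slope(years, temps) * 100:.2f} C per century")  # prints 1.53
```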
In this case, comparisons with numerous stations show Melbourne as the odd man out and by how much. (Good luck with inferring a magnitude for adjustment from nearby stations).
This graph suggests that Tmax has increased at 0.7 °C per century and Tmin at 1.53 °C per century since 1856. These are not the numbers used for calculation of Australia’s national global warming; adjusted ACORN-SAT numbers are used. The result is the official claim that Australia has warmed by 1.51 +/- 0.23 °C in the last 134 years. The author claims this is wrong. The better, unadjusted estimate is less than 1 °C over that time.
Adjustments and cherry picking are used to create stories from this CDO data. Natural events like heatwaves are claimed to be becoming “catastrophic”. Two heatwave offenders, authors of a highly-cited paper, are Lisa Alexander and Sarah Perkins-Kirkpatrick. Australian scientist Joelle Gergis, as seen by Stephen McIntyre of the Climate Audit blog, is another part of this trio of young female scientists whose work often shows emotion and cherry picking, for example using 1950 or so as a start date for analysis.
Numerous Australian citizens and scientists have objected to the stories because they are not based on solid science. BOM has chosen the anti-science technique of ignoring dissenting articles. Many have been linked here. My apologies are extended to any authors who are not linked, by my error, but who know they should be on the list of dissidents.
The consequences of poor science by BOM include Australian political adoption of “Net zero Carbon” (whatever that means) leading to a loss of electrical generation from cheap and reliable hydrocarbon combustion. The Australian economy is suffering as other countries such as the US are distancing themselves from this poor science.
BOM should at least attempt to face and answer the many criticisms in this article. The BOM silence is deafening. Why do they decline to answer even a single slip-up?
My best answer is a quote from the movie “Dr Strangelove”. The full movie title is “Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb”. You are invited to drop the final “b” for relevance.
Air Force General Turgidson is in the Pentagon War Room with President Muffley just before a wayward nuclear bomb from a SAC B-52 is likely to trigger the Doomsday Machine and end Life on Earth.
Turgidson: The duty officer asked General Ripper to confirm the fact that he had issued the go code and he said, “Yes gentlemen, they are on their way in and no one can bring them back. For the sake of our country and our way of life, I suggest you get the rest of SAC in after them, otherwise we will be totally destroyed by red retaliation. My boys will give you the best kind of start, fourteen hundred megatons worth, and you sure as hell won’t stop them now. So let’s get going. There’s no other choice. God willing, we will prevail in peace and freedom from fear and in true health through the purity and essence of our natural fluids. God bless you all.” Then he hung up. We’re still trying to figure out the meaning of that last phrase, sir.
Muffley: There’s nothing to figure out General Turgidson. This man is obviously a psychotic.
Turgidson: Well, I’d like to hold off judgment on a thing like that, sir, until all the facts are in.
Muffley: (anger rising) General Turgidson, when you instituted the human reliability tests, you assured me there was no possibility of such a thing ever occurring.
Turgidson: Well, I don’t think it’s quite fair to condemn a whole program because of a single slip up, sir.
- THAT famous email explained and the first Volunteer Global Warming Skeptic « JoNova
- Climate Data Online – Map search
- https://www.geoffstuff.com/origts.jpg
- Uncertainty Of Measurement of Routine Temperatures–Part Three – Watts Up With That?
- Australian Bureau of Meteorology temperature database bug
- Temperature roundings, metrication in Australia
- Australian historic and modern temperature averages
- https://doi.org/10.3390/s23135976
- Model simulations don’t reflect the climate of Sydney and Melbourne
- A Preliminary Investigation of Temperature Screen Design and Their Impacts on Temperature Measurements
- Another Temperature Bias: The Shrinking Stevenson Screen = Warming – Watts Up With That?
- Analysis of Parallel Tmax Data from Brisbane Aero | kenskingdom
- Thermometer Equivalence – Jennifer Marohasy
- https://doi.org/10.59161/JCGM100-2008E
- http://www.geoffstuff.com/bomitr.pdf
- https://www.geoffstuff.com/bomquest.docx
- Australian Climate Observations Reference Network – Surface Air Temperature Dataset – Dataset – Data.gov.au
- https://www.geoffstuff.com/querycoldst.docx
- About – www.BomWatch.com.au
- https://www.geoffstuff.com/halfwarm.docx
- https://www.geoffstuff.com/nothot2024.docx
- On the Measurement of Heat Waves, Journal of Climate, Volume 26, Issue 13 (2013)
- Joelle Gergis, Data Torturer « Climate Audit
- Update on Australian NetZero efforts – Climate Etc.
- Trump’s EPA revokes the “endangerment finding” on greenhouse gases, a major reversal in climate policy. Here’s what to know. – CBS News
- Dr. Strangelove | Summary, Characters, & Facts | Britannica
I found Melbourne to be on the cool side but few things are as insane as Canadian alarmists worrying that it might warm up a degree or two.
” few things are as insane as Canadian alarmists worrying that it might warm up a degree or two.”
It’s not an unreasonable concern. In the Canadian Arctic, permafrost acts like the ground’s natural structural cement. When warming causes it to thaw, the loss of ground ice reduces soil strength and leads to uneven settling. Because much northern infrastructure was engineered assuming permanently frozen ground, this is already creating problems and is likely to worsen over time.
https://www.canada.ca/en/services/policing/emergencies/preparedness/get-prepared/hazards-emergencies/permafrost.html
Re: “Because much northern infrastructure was engineered assuming permanently frozen ground, this is already creating problems and is likely to worsen over time.“
Turgidson: Well, I don’t think it’s quite fair to condemn a whole program because of a single slip up, sir.
Permafrost doesn’t melt when it gets colder.
Interesting that you post a graph from a Blog called “sunshinehours”, that brings up zero references on Google, though I managed to find it via a single post on a blog by a certain “sunshinehurs1”
As regards his graph he/she says …
“I download monthly data from the Environment Canada (EC) websites. EC treats some stations as special and calculates anomalies against what they call Normals. As of today the Normals are calculated for 1971-2000 and I am using those special stations. “
So what exactly makes the special stations “special”?
You do know that if the balance of stations (due to closures) through the period shifts north then there will be a cold bias introduced?
And there were a lot of station closures in that period.
“Since 2000, Environment and Climate Change Canada (ECCC) has continued to transition from manual, human-staffed weather stations to automated systems, resulting in the closure or conversion of numerous traditional climate monitoring sites”
EC says …. “Long-term observations from these stations, and others, constitute an important part of Canada’s climate record. The 30-year averages of these climate variables such as temperature and precipitation are calculated from these station observations. These 30-year averages are called Climate Normals and they are used to describe the average climate conditions of a particular location.”
So it seems that plot is using stations outside of the ones that EC uses as its database for the calculation of the 30 yr climate of Canada, and is in fact a plot of these “special” stations’ differences from the “normals”.
It is not therefore a true spatially oriented, long term station record from well maintained stations.
I.e., it is bogus.
This is the true EC annual mean temperature trend for Canada ….
BTW: I do expect your usual accusation to be made here.
“Since 2000, Environment and Climate Change Canada (ECCC) has continued to transition from manual, human-staffed weather stations to automated systems, resulting in the closure or conversion of numerous traditional climate monitoring sites”
It is known that automated systems regularly cause higher temperature readings.
And as you say… close cooler stations and use stations in urban areas…
… you get a warming trend.
Use stations that are stable and have reliable enough data.. you get a cooling trend.
Why should Canada be warming, when USA has been cooling.. ?
“Use stations that are stable and have reliable enough data.. you get a cooling trend.”
That’s just the point – EC does, but that graph is of stations that aren’t!
“The 30-year averages of these climate variables such as temperature and precipitation are calculated from these station observations. These 30-year averages are called Climate Normals and they are used to describe the average climate conditions of a particular location.”
And not the ones you liked taken from some anonymous blogger from a hard to find corner of the internet.
BTW: would you care to have a chat with the Inuit? ……..
https://indigenouspeoplesatlasofcanada.ca/article/climate-climate-change/#:~:text=Unprecedented%20rates%20of%20summer%20sea,stretching%20back%20thousands%20of%20years.
“Unprecedented rates of summer sea ice loss, reduced sea ice in the winter, ocean acidification, temperature and sea level rise, melting permafrost, extreme weather events and severe coastal erosion undermine our ability to thrive in our environment. Rapid climate change is affecting our ability to access our country foods (wild foods harvested from our lands and waters) at a time when too many families are already struggling to put food on the table. There is an increase in hazards and risks on ice, including increased incidents of Inuit falling through ice, some of which can be attributed to the warming environment and unpredictable weather. There have been studies showing that Inuit who are cut off from traditional seasonal activities like hunting and fishing suffer impacts to their mental health and sense of identity.”
But of course you know better!
As for this baseless bias-bolstering myth of the US not warming …
Story Tip.
Any trend in Canadian “temperature” seems to be from “adjusted and homogenised” data…
…. and a major step change around 1998, when two things happened.
1.. The 1998 EL Nino
2.. Changes to AWS.
The shift in 1998 is NOT compatible with CO2 warming.
Joseph Hickey: “Is Canada Warming?” | Tom Nelson Pod #372
It should be noted that Ant’s data does not counter anything in the post I made.
In fact you can see the cooling from 1998-2015
And from our esteemed host Anthony Watts
Government report: Canadian climate data quality ‘disturbing’ – Watts Up With That?
And from the late Tim Ball
Another Model -vs- Reality problem – National Weather Offices: Canada, A Case Study With National And Global Implications. – Watts Up With That?
Bless 4 replies.
Must have got to him.
Didn’t read BTW, as I know it’s away with the fairies.
Or the rabbits.
BTW: This is a UAH TLT global map of T trends from 1978.
Notice that the contiguous US is in the 0.2 – 0.3 C/dec shading…
As is Alaska.
Are you going to say that Mr Spencer’s satellite is seeing UHI?
Anthony,
There is little point in your showing a temperature map as evidence of warming on this thread, whose central theme is the large uncertainty of such observations. That is an attempted diversion.
To keep with the theme, can you provide justification for officials who ignore this uncertainty on their way to crying catastrophe?
Geoff S
A more salient point is how the air becomes warmer than the permafrost such that it can melt the surface. Where does the heat come from? The permafrost itself? What about absorbed insolation increasing?
Of course they see UHI! Do you think the satellites have a filter to remove UHI?
So averages don’t work to give an accurate conclusion! Why do you think station closures and moves can’t introduce warm biases?
All I can think of is the adage about kneeling in a tub of ice with my head in a hot oven. My average temperature will be fine.
If averages have errors and biases, why would a “spatially oriented” average be any better?
You are missing the entire purpose of the essay. Topography does affect the microclimate significantly. Surrounding flora, even the height and species of grass under a station, can affect measurements.
Spatial separation is not the savior of averages.
Uncertainty is the issue in measurements and how they are properly compared. The GUM says this at the very beginning.
Not once have you mentioned the issue of measurement uncertainty and how it is propagated throughout temperature analysis. Everything you post about spatial averaging is moot unless you address the underlying accuracy and resolution of the measurements being made.
Here is an uncertainty budget I have started for ASOS stations in the U.S.
Sum:
0.333 + 0.001 + 0.053 + 0.003 + 0.084 + 0.030 + 0.013 + 0.084 + 0.003 = 0.604
u_c = sqrt(0.604) = 0.777 °C
—
Expanded uncertainty (95% confidence)
U = k • u_c = 2 • 0.777 = 1.55°C
Why don’t you address how this affects your spatial averaging.
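For anyone wanting to check the arithmetic, here is a minimal sketch reproducing the combined and expanded uncertainty from the nine budget terms above, treated (as listed) as variance contributions in °C²:

```python
import math

# The nine budget contributions add in quadrature; the expanded
# uncertainty applies a coverage factor k = 2 for ~95 % confidence.
variances = [0.333, 0.001, 0.053, 0.003, 0.084, 0.030, 0.013, 0.084, 0.003]

u_c = math.sqrt(sum(variances))    # combined standard uncertainty
U = 2.0 * u_c                      # expanded uncertainty, k = 2

print(f"u_c = {u_c:.3f} C, U = {U:.2f} C")   # u_c = 0.777 C, U = 1.55 C
```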
Are you sure that resolution uncertainty adds in quadrature? I was under the impression it is a type B uncertainty which is propagated separately, so that 0.029 forms a floor which is carried through. That would give u_c = 0.029 + sqrt (0.603)
Type A and Type B measurement uncertainties are handled the same way by the GUM. The GUM considers them to be equivalent. It’s just the process used to quantify the values that differs.
I think I misunderstood what was being said in the NASA Measurement Uncertainty Analysis.
That dates back to 2010, and still references errors and uncertainties. It shows the errors as adding, but the combined uncertainty is handled as you showed.
In any case, it’s somewhat larger than 0.005.
This is the part of metrology that takes some getting used to. Uncertainty is a compilation of all the standard uncertainties involved in a measurement.
If you use standard uncertainties, NIST and most textbooks require the expansion of the uncertainty by a factor of two.
It is not correct to just assume ±half-intervals is the only uncertainty. It is also not correct that a standard uncertainty less than the resolution allows one to state the measurement value to a resolution beyond what was measured.
It’s funny that you are the only person to question this when climate science routinely ignores the whole process. The assumption of random, Gaussian and cancels is pervasive.
I see Tim gave an answer.
But yes, I attempted to convert everything to a “standard uncertainty”, that is a standard deviation by estimating the distribution, and dividing by the appropriate factor. An example is assuming resolution of 0.1 is a uniform distribution with a half interval of ±0.05 and dividing by the √3 gives 0.029.
This doesn’t change the fact that significant digit rules preclude rounding temperatures to a value below the tenths digit except for interim calculations.
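The resolution-to-standard-uncertainty conversion described above, as a minimal sketch assuming the GUM’s rectangular-distribution treatment:

```python
import math

# A display resolution of 0.1 is treated as a uniform distribution over
# the half-interval +/-0.05, giving a Type B standard uncertainty of
# half-width / sqrt(3) (equivalently, full width / sqrt(12)).
resolution = 0.1
u_res = (resolution / 2.0) / math.sqrt(3.0)
print(f"u_res = {u_res:.4f}")      # 0.0289
```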
I’m not entirely sure about that. It can be quite useful to know that the average is somewhere between the measured values e.g. the average of 8 measurements of a big-end journal of {1.499″, 1.499″, 1.500″, 1.500″, 1.500″, 1.500″, 1.499″, 1.499″} reported as 1.4995″ +/- 0.0005″ rather than 1.500″.
Going off on a bit of a tangent here, you may want to investigate whether those measurements were around the mid-point between the lines on the micrometer thimble, or whether there is 0.001″ ovality or taper.
I suppose journal measurements could reasonably be regarded as an interim step in calculating bearing clearance, which is the object of the exercise.
You have kind of addressed the reason for going to intervals rather than plus/minus.
Under the old error paradigm, the mean was considered a true value. Not to be confused with the real true value. Under the uncertainty paradigm, the mean is the central value used to define an interval.
Here is the problem, you didn’t measure 1.5000 nor 1.4990. You can’t be certain what the last ten thousandths digit is. It could have been 1.5009 or 1.4995. The 1.499 measurements could have been 1.4999 or 1.4985.
In the long run, if you follow practice, you are going to cover all this in your report. That way someone reading it can rely on what is given.
One thing I can tell you is that if you quote an interval rather than a plus/minus value it is easier to explain that any single measurement in the future should fall into that interval and to not expect the central value to be the measurement.
That tends to be more of an issue with digital instruments than analogue. With a proper micrometer, you can see whether the reading is low or high. Of course, it can make rounding more of a challenge 🙂
Jeez, I hope 1.5009 doesn’t round to 1.500, or 1.4999 round to 1.499 😉
I know what you mean, though.
“I’m not entirely sure about that. It can be quite useful to know that the average is somewhere between the measured values e.g. the average of 8 measurements of a big-end journal of {1.499″, 1.499″, 1.500″, 1.500″, 1.500″, 1.500″, 1.499″, 1.499″} reported as 1.4995″ +/- 0.0005″ rather than 1.500″.”
You can perhaps put this argument forth when you are measuring the same thing multiple times using the same instrument under the same conditions. If you can show that the measurements form a random, Gaussian distribution then assuming the mean is the “best estimate” can possibly be justified, but it also requires the assumption that no systematic uncertainty exists. Violate any of these restrictions and going one digit further simply can’t be justified.
Remember the purpose of the uncertainty paradigm. It is to let others performing the same experiment determine if their measurements are a reasonable match to your measurement. When you quote your measurement past the resolution limits of your instrument you make it impossible for someone using a different instrument with the same resolution to know if their measurement matches yours.
You have ventured into the dimension of the “Great Unknown” by using a cloudy crystal ball.
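A minimal simulation of that point, assuming a hypothetical +0.3 calibration bias shared by every reading: the random scatter averages away, the systematic component does not:

```python
import random, statistics

# Averaging n repeated readings shrinks random scatter roughly as
# 1/sqrt(n), but a fixed systematic offset passes through the mean
# untouched, no matter how large n gets. Values are illustrative only.
random.seed(2)
true_value, bias = 20.0, 0.3
for n in (10, 100, 10000):
    readings = [true_value + bias + random.gauss(0, 0.5) for _ in range(n)]
    mean = statistics.fmean(readings)
    print(f"n={n:6d}  mean={mean:.3f}  error vs true={mean - true_value:+.3f}")
```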
We’re probably getting into the realm of what the meaning of “is” is.
You’re treating the mean as a measurement, when in reality it is just a summary statistic describing the central tendency of a set of numbers. Note that the median of an even number of numbers is interpolated to the mid-point between the middle 2 numbers.
Given the set {1.499″, 1.499″, 1.500″, 1.500″, 1.500″, 1.500″, 1.499″, 1.499″}, you have:
population size = 8
mean = 1.4995
median = 1.4995
mode = {1.499, 1.500}
standard deviation = 0.0005
Rounding these loses information.
From the GUM,
2.2.3, Note 3: “It is understood that the result of the measurement is the best estimate of the value of the measurand, and that all components of uncertainty, including those arising from systematic effects, such as components associated with corrections and reference standards, contribute to the dispersion.”
4.2.1 “In most cases, the best available estimate of the expectation or expected value μ_q of a quantity q that varies randomly [a random variable (C.2.2)], and for which n independent observations q_k have been obtained under the same conditions of measurement (see B.2.15), is the arithmetic mean or average q_bar (C.2.19) of the n observations:”
I would note that if your data is not random and Gaussian then the mean may very well not be the best estimate and nor would the median. The mode would be a far better estimate.
Rounding doesn’t lose any information that you actually know. The “best estimate” should convey what you actually know, not what you guess at when using a cloudy crystal ball. Stating the mean out to the ten-thousandths digit when you don’t actually know what that digit is does nothing except make others think your measuring device is better than it actually is.
If your resolution is only in the thousandths digit, then an interim calculation to the ten-thousandths digit is acceptable, but the final result should be rounded to the thousandths digit. In your case the measurement would be given as 1.500 +/- .0005.
The statistical descriptors of mean and standard deviation are only useful in one specific case, a Type A experimental situation. In all other cases the best estimate is just that and nothing more. And the uncertainty interval is typically not the standard deviation of the measurement data but the propagated measurement uncertainties of the individual data items.
This is why I have advocated here for a long time that the 5-number statistical descriptor would give a much better picture of the temperature data than a mean and standard deviation.
Methinks you are attributing too much meaning to the mean. To paraphrase Popeye the sailor man, it is what it is, and that’s all what it is. The mean and median are measures of centrality, both of which work “best” under certain conditions. Neither necessarily have any physical meaning. As the saying goes, the average person has one testicle and one ovary.
You’re arguing the mean has physical meaning. All the mean means in this case is that the centre of the measurements is somewhere between the measurements.
Rounding up skews that to the next highest potential measurement.
Ideally, the full data set and metadata should be available, but the summary statistics I provided should provide sufficient information to the next person measuring that journal.
We know from the modes that there were an equal number of 1.499″ and 1.500″ measurements, but just the mean and s.d. tell us that probably half the measurements were each value.
I was a bit remiss, and should have provided the range {1.499″, 1.500″}
The mean doesn’t mean much by itself, but the combination of summary statistics does provide useful information.
You are the one that is implying that averaging, i.e. finding the mean, can add to resolution. That’s attributing a lot of meaning to the mean.
GUM: “In most cases, the best available estimate of the expectation or expected value μ_q of a quantity q …. is the arithmetic mean or average q_bar (C.2.19) of the n observations:”
You have to start somewhere when setting standard methods and protocols.
GUM: “the best available estimate of the expectation”
The metrology standard, the GUM, *is* giving the mean a physical meaning when it comes to measurements.
“Rounding up skews that to the next highest potential measurement.”
And what does rounding down do?
Rounding up and down to match the resolution of the instrument will also increase the precision of the progression of measurement values. Rounding doesn’t really skew anything, it only realizes the *known* information instead of guessing at “unknown” information.
The final significant digit should represent the highest expectation estimate of the true value within the resolution of the measuring instrument. Going beyond the resolution of the measuring instrument can only represent a guess. What happens if the mean is a repeating decimal? Where do you place the final significant digit?
Summary statistics are a tool for understanding your data. Extending the summary statistics into the realm of the Great Unknown doesn’t further the understanding of the data. Understanding the source and limitations of the data is necessary for proper application of the statistical summaries.
I can only give you an example from my son’s journey in getting his microbiology degree at university. His first-year university advisor told him “don’t worry about taking courses in statistics, just give your data to a math major for analysis”. This does nothing but result in the blind leading the blind. The microbiology major won’t fully understand what the statistical summaries are saying and the math major won’t understand why he can’t take the value of the mean out to the digit limits of the computing device.
“GUM: “the best available estimate of the expectation”
The metrology standard, the GUM, *is* giving the mean a physical meaning when it comes to measurements.”
You blokes have just been arguing that the mean only applies to symmetric distributions 🙂
The mean is what it is, and that’s all what it is. The GUM is noting the use of the mean.
The GUM calls it an “estimate” rather than a “guess”. Yes, the measure of centrality can be expressed beyond the resolution of the measurements used to derive it. Not doing so leads to information loss by rounding up or down. It’s important to know that the centre is somewhere in the middle.
Strictly, it can be represented to the order of magnitude of the (sample|population) size, but in practice 1 extra significant digit.
You absolutely have to show that the centre of the measurements lies between potential measurements if it does. Perhaps it should be recorded as 1.4995 +/- 0.0005, or perhaps as {1.4990, 1.5000}, but rounding in either direction skews the result.
Rounding down gives 1.499 +/- 0.0005, which loses 1.500.
Rounding up gives 1.500 +/- 0.0005, which loses 1.499.
Reporting to the resolution limit of the instrument provides a shorthand method of showing the uncertainty bounds of the readings.
Quite so. The summary statistics don’t care what the data set represents. They are just maths.
It is important to know what the underlying data represents, and the limitations of the data, when evaluating the results.
The mathematicians apply the formulae blindly.
The statisticians understand the limitations of those formulae.
The subject matter experts understand the limits of the underlying data.
“You blokes have just been arguing that the mean only applies to symmetric distributions”
It’s a basic truth. That’s when the mean equals the mode. If the mean doesn’t equal the mode then exactly what does the standard statistical descriptors of mean/standard deviation tell you about the distribution?
“The GUM calls it an “estimate” rather than a “guess”.”
GUM: “7.2.6 The numerical values of the estimate y and its standard uncertainty u_c(y) or expanded uncertainty U should not be given with an excessive number of digits. It usually suffices to quote u_c(y) and U [as well as the standard uncertainties u(x_i) of the input estimates x_i] to at most two significant digits, although in some cases it may be necessary to retain additional digits to avoid round-off errors in subsequent calculations.”
“excessive number of digits” includes increasing resolution using averaging.
“Not doing so leads to information loss by rounding up or down.”
You are still using the standard climate science meme of “all measurement uncertainty is random and Gaussian”. You simply cannot use that assumption in the real world where different things are measured one time using different instruments and the measurements are put into a common data set. If nothing else sampling uncertainty will limit the certainty of what the mean actually is.
“You absolutely have to show that the centre of the measurements lies between potential measurements if it does. Perhaps it should be recorded as 1.4995 +/- 0.0005, or perhaps as {1.4999, 1.5000}, but rounding in either direction skews the result.”
You are caught in a catch-22. You didn’t give the actual measurement uncertainty of the data. So how do you propagate it? If it is greater than the extra significant digit then you simply don’t know what it is. This is the trap statisticians and climate scientists fall into when they think they can increase resolution past the limits of the measurements by using averaging. It’s the same logic as assuming that the only uncertainty is the sampling uncertainty and you can minimize it by increasing the number of samples. I.e, make the SEM smaller and smaller so you can justify adding resolution.
At a minimum, it’s half the resolution limit.
I think this is where we start working at cross purposes, and I think I’ve finally worked out why.
For statistical analysis purposes, the average is still an interim result as per GUM 7.2.6.
In most cases, it will be used in subsequent analysis.
Nevertheless, measurement uncertainties should be propagated at all steps, and really should be shown.
I’ve been busy and didn’t get to post a couple of things.
One of the problems that occurs with the mean is that the measurements used to calculate it are uncertain even if the numbers used are whole.
You posed 1.499 and 1.500 and used them to calculate a mean as if they were 100% accurate, i.e., 1.4995. But each of those measurements has uncertainty, which makes the mean itself uncertain. That is on top of any sampling uncertainty. And a standard deviation itself has uncertainty because it is calculated from uncertain values. That makes the standard deviation of the mean uncertain also. It is a large reason significant digits have such importance. Rounding to the nearest graduation on the measuring device helps overcome some of this, along with proper treatment of significant digits.
I can’t find the paper I thought I had saved but it discussed some of this.
Maybe I’m just decades out of date, but the implicit resolution uncertainty is +/- half the last quoted place – in this case 0.0005″.
I keep getting conflicting answers on what to report for the uncertainty of the mean; either the same as the measurements (so, 0.0005″), or the standard uncertainty (so, 0.001/sqrt(12))
I appreciate there are other sources of uncertainty, but was just trying to keep it to the basics.
/it occurred to me later that Tim is treating the average as a final measurement, while I’m treating it as an intermediate result.
Join the crowd. Read this and look for the section “Only Upper and Lower Limits”. This web site has some good training material.
Type A and Type B Uncertainty: Evaluating Uncertainty Components
Thanks. I’ll take a look at that.
“Only Upper and Lower Limits”
Unless I’m totally missing something, in the symmetric case half-width / sqrt(3) is the same as interval width / sqrt (12).
2a / sqrt(12) = (2 * a) / (sqrt(4) * sqrt(3)) = a / sqrt(3)
What is the advantage, apart from the fairly trivial calculation of the interval width, of dividing the half-width by sqrt(3) instead of consistently calculating the interval width and dividing by sqrt(12)?
It just seems rather inconsistent to use a special case to get the same result as the general case for minimal extra effort.
In any case, it appears the recommendation for the standard uncertainty of the average is the uncertainty interval of the measurements / sqrt (12)
Yes, confusing isn’t it. I try to keep in mind that what is being done is to develop a standard uncertainty that is common among different uncertainty categories. In the end, one must choose their method and discuss it when presenting a measurement. Climate science ignores the whole issue.
That’s a little harsh.
They wave their hands and say the uncertainty is less than 0.005 K 🙂
“Rounding these loses information.”
Ah, but is it information you really know? That is the question.
Is it actually a lack of precision? Will the next guy estimate the value the same as you? How about a digital readout? What will it read?
One of the reasons about rounding to the nearest graduation is that it provides a more reliable statement that others will likely observe regardless of their instrument.
Don’t let statistics fool you. The sample mean is only an estimate. The estimate is only a perfectly reliable estimate as n->∞.
Could your next reading be 1.498? Or, 1.5001? Could the mean move one way or another with a few more measurements? Uncertainty, uncertainty, too bad we can’t be perfect.
“The estimate is only a perfectly reliable estimate as n->∞.”
That is only true if you have a random and Gaussian distribution from multiple measurements of the same thing using the same instrument under the same conditions.
If you are using single measurements of different things from different instruments under different conditions, then the uncertainty interval grows as n gets larger since the uncertainties of the individual components add. In such a case the long shot sometimes wins, i.e. is the true value.
I simply said, “perfectly reliable estimate“. As n -> ∞, the range of measurements should also approach a valid estimate of the limit of what you can expect measurements to be. Of course if the distribution is Gaussian, then one can use standard deviation (or multiples of it) to estimate the range of measurements that fall within a given probability range. For example, one σ means ~68% chance that the next measurement will fall within that range.
It definitely is. It might be a bit fuzzy, but is the central value given the data set.
Precision or resolution?
In either case, the mean is a summary statistic calculated from the available data. If the data points have uncertainty, that should presumably be propagated to the mean (and median, mode and midpoint)
Statistical uncertainty is orthogonal to measurement uncertainty, and is expressed in the variance and standard deviation. The mean has no statistical uncertainty – it’s just a central estimate.
I should certainly hope so, given the same data set. The summary statistics may well be different for a different data set, which is as it should be.
It will read whatever it reads. That may result in a different data set, and different summary statistics.
Readings should definitely be rounded. Summary statistics, however, are derived from those readings and have to abide by their own rules.
If the mean is half way between graduations, so be it. The uncertainty of the data set is captured in the measurement uncertainties and data set uncertainty (s.d.)
The sample mean is an estimate of the population mean, so it becomes perfectly reliable when n = N.
Those are both quite possible. It makes no difference to the summary statistics of each data set.
You’re getting into sampling uncertainty there, which is where the SEM comes into play.
What you mean “we”, white man?
Something which just occurred to me. Are you treating the average of the measurements as a measurement? Rounding or truncation may well be appropriate in that case.
I’ve been treating this as statistical analysis of the data set. Rounding or truncation are not appropriate for that.
“Statistical uncertainty is orthogonal to measurement uncertainty, and is expressed in the variance and standard deviation. The mean has no statistical uncertainty – it’s just a central estimate.”
Unless you know the entire population, a mean calculated from samples certainly has statistical uncertainty – statisticians have mis-named it the “standard error of the mean”.
Every measurement data point in a sample should be given as “estimated value +/- measurement uncertainty”. Those measurement uncertainties then get propagated onto the mean of the sample. When you estimate the population mean by averaging the means of multiple samples then that average should also have the measurement uncertainties of the sample means propagated onto the estimate of the population mean.
Statisticians and climate scientists never propagate the measurement uncertainties of the sample data onto the estimate of the population mean. They just average the estimated means of samples and generate a sampling error value based on dividing the standard deviation of the sample mean estimated averages by sqrt(n).
This goes back to understanding the data you are using. Statisticians and climate scientists just drop measurement uncertainties because they don’t understand what the meaning and impacts of the uncertainties is. They base everything solely on extracting the estimated values of the actual data which is “estimated value +/- measurement uncertainty”.
In the real world, the measurement uncertainties will usually be the determining factor as to what the last significant digit should be. Taylor covers this explicitly. I don’t remember the numbers exactly but it’s like giving an average as 728 mph +/- 10 mph. If your uncertainty is in the tens digit then taking the average out to the units digit is “guessing” at a value for the units digit. This should be given as 730 mph +/- 10 mph. You don’t “lose” information from rounding the average because it’s information that you actually don’t know. You can’t lose what you don’t know.
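A minimal sketch of that distinction under the standard GUM treatment, with assumed, illustrative values: an uncorrelated per-reading component divides down as 1/sqrt(n) in the mean, while a component common to every reading does not divide at all:

```python
import math

# Propagating uncertainty onto a mean of n readings: the independent
# per-reading component u_r shrinks as 1/sqrt(n), but a fully correlated
# (systematic) component u_s, e.g. a shared calibration offset, does not.
n = 100
u_r = 0.5        # assumed independent, per-reading standard uncertainty
u_s = 0.3        # assumed fully correlated (systematic) component

u_mean = math.sqrt((u_r / math.sqrt(n)) ** 2 + u_s ** 2)
print(f"u(mean) = {u_mean:.3f}")   # floor set by u_s no matter how big n is
```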
Yes, I stuffed that one up. I think that was meant to be population mean, but you are absolutely correct regarding sample means. My bad 🙁
Yep.
That should be “mathematicians” rather than “statisticians”. They know and blindly apply the formulae to get answers they like, but don’t understand where they are inapplicable.
728 is close enough to 730 to round up, especially with uncertainty of +/- 10. 725 +/- 5 is not such a clear decision.
Remember that with 725 ±5, your uncertainty is in the units digit, not the tens digit.
Something like 725.3 ±5 would be incorrect both from an uncertainty standpoint and probably from significant digit rules also.
Yeah, 725.3 +/- 5 is certainly pushing it.
It’s an interesting thought, though.
725 +/- 5 could come from lots of distributions, but the simplest case is that half the readings are 720, and half are 730. If you have a sample size of 100 and a couple of outliers of 740 or 750 instead of 730, reporting the average as 725.3 could actually be useful in the absence of the other summary statistics.
Reporting to that additional s.d. puts up a bit of a flag that there might be something going on that’s worth looking at. It provides statistical information, even though it’s not valid measurement information.
The numbers you gave of 720, 730, 740, and 750, only have two significant digits. Quoting a mean to 4 sig figs is certainly pushing the available measured information.
That is where statistical parameters can lead one astray.
They can lead you astray, but they can be quite useful as a “that’s funny”* indicator. Most of the time there’s no “there” there, but sometimes there is.
Edward Lorenz came up with a new field of mathematics by premature rounding 🙂
[*] The quote attributed to Isaac Asimov is quite salient:
“The most exciting phrase to hear in science, the one that heralds the most discoveries, is not “Eureka!” (I found it!) but ‘That’s funny…”
“728 is close enough to 730 to round up, especially with uncertainty of +/- 10. 725 +/- 5 is not such a clear decision.”
725 +/- 5 gives an uncertainty interval of 720 to 730, assuming a Gaussian distribution for the uncertainty. If the uncertainty distribution *is* Gaussian, and the measurement data is as well, then 725 is an acceptable “best estimate” for the measurand property.
That’s a lot of assumptions that have to be met in the real world.
Remember that your crankshaft journal example had two basic unstated assumptions: that your measurements were all taken at the same place on the journal, and that the journal was perfectly round even after being in a worn condition. If either of these unstated assumptions was not met, then adding resolution through averaging is just adding information gleaned from a cloudy crystal ball.
Thank you. That’s all I was trying to get at.
Actually, it really just needs to be symmetric, rather than Gaussian.
I think the next paragraph said that with those readings you would be checking for ovality and taper.
I should have said they were taken at -45, 0, 45 and 90 degrees at 1/3 of the way from either end of the journal. I’m going to pretend that was deliberate to illustrate the importance of metadata.
It wasn’t – I just forgot to write it 🙁
If they were taken at the same place I’d probably be going for the higher resolution micrometer.
Well, in practice the readings are all within tolerance, so it doesn’t really matter 🙂
The mean is still what it is. I wouldn’t be relying on it for anything, though. If those were 8 readings taken at the same spot with the same micrometer, I’d want to know damned well that they were the result of rounding.
“Jeez, I hope 1.5009 doesn’t round to 1.500, or 1.4999 round to 1.499”
You simply can’t know whether it really is 1.5009 or 1.500 if your resolution is only in the thousandths digit. Adding a digit by averaging is nothing more than using a cloudy crystal ball to see into the Great Unknown. It doesn’t matter if you are using a digital or analog instrument. I do recognize your concern with rounding but uncertainty is based on the last reported digit. If the uncertainty is in the thousandths digit because of instrument resolution then going out to the ten-thousandths digit with your reported value is misleading readers about your results.
What you have described here is known as “truncation”. Truncation adds a systematic effect by always driving the reported result downward. Rounding rules are used to minimize this systematic effect.
Nah, Jim just didn’t proofread what he wrote 🙂
He probably meant either 1.5004 or 1.50049.
Quite so, but truncation isn’t rounding. I couldn’t resist having a little dig at:
“Here is the problem, you didn’t measure 1.5000 nor 1.4990. You can’t be certain what the last ten thousandths digit is. It could have been 1.5009 or 1.4995. The 1.499 measurements could have been 1.4999 or 1.4985.”
Don’t forget that the variance of daily temperatures changes according to latitude. Even if the measuring station locations were perfect for spatial sampling, you *still* can’t just directly average their values. You need to use a weighted average based on the variance of the data sets being used in the average. It’s really no different than propagating measurement uncertainty as if the uncertainties are variances.
Just one more thing climate science ignores by assuming that all variances are Gaussian, random, and cancel => including measurement uncertainties.
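A minimal sketch of the inverse-variance weighting being called for, with made-up station values and variances purely for illustration:

```python
# Inverse-variance weighting: stations with smaller variance get more
# weight. Values and variances below are hypothetical, not real data.
values    = [18.2, 22.5, 26.1]     # hypothetical station means, C
variances = [4.0, 2.0, 1.0]        # hypothetical variances, C^2

weights = [1.0 / v for v in variances]
weighted_mean = sum(w * x for w, x in zip(weights, values)) / sum(weights)
print(f"inverse-variance weighted mean = {weighted_mean:.2f} C")
```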
It is one reason that expanded uncertainty should be used. This is a value to be used for the next single measurement.
Technically each measurement should be evaluated with its own uncertainty budget with changed microclimate conditions.
A lot of the uncertainty budget items I included should be different when measuring Tmax vs Tmin. Items like solar radiation, ventilation (wind), dew point can cause different values of uncertainty. Hopefully a value like U =1.55°C will cover most of the uncertainty.
It surely makes measurement values in the hundredths and thousandths really questionable.
Thanks, Jim and Tim,
You reinforce in clear words that one big criticism of BOM that I made is the absence of not only a full, comprehensive estimate of uncertainty but also publication of it with emphasis on its consequences.
One consequence is that the claims of Australian warming should not be used by those legislating actions to combat future harm because the reality of future harm is lost in the big space between the uncertainty boundaries. It is in Arthur or Martha land.
The public and the legislators should be told this in clear words.
I am in no way in denial that Australia might have warmed over the last 150 years. I simply ask that it is stated that uncertainty exists.
Geoff S
It isn’t even the uncertainty that is the problem. The other big problem is using Tavg calculated by (Tmax+Tmin)/2. That hides the pertinent information about what is actually happening. Ignoring how seasonality affects annual averages hides whether summer, winter, or both are warming. Many of the Global Average Temperature groups average land air temperature with average Sea Surface Temperature – another average of two different things entirely. If they want to use SST because of the paucity of ocean air temperatures, then they should use land (soil) temperatures. Both the ocean and land are heat sinks that in essence remove insolation energy from immediate emission and store it for later release. I didn’t even have to argue with Copilot about that issue; it readily found studies that found maximum temps happen 2 – 4 hours after max insolation.
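A minimal sketch of the Tavg objection, using a synthetic, deliberately skewed diurnal cycle: the midrange (Tmax+Tmin)/2 and the true time-averaged temperature differ by well over a degree:

```python
import math

# A skewed synthetic diurnal cycle: a narrow warm pulse peaking at 15:00
# on a flat 15 C base. Purely illustrative, not observed data.
hours = [h / 4.0 for h in range(96)]                   # 15-minute steps
temps = [15.0 + 8.0 * math.exp(-((h - 15.0) / 4.0) ** 2) for h in hours]

midrange = (max(temps) + min(temps)) / 2.0             # (Tmax+Tmin)/2
true_mean = sum(temps) / len(temps)                    # time-integrated mean
print(f"midrange = {midrange:.2f} C, integrated mean = {true_mean:.2f} C")
```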
Jim,
Agreed on Taver.
I have avoided it where possible and in one of the links I describe why it is invalid.
Analogy: pilots routinely measure aircraft speed over the ground and speed through the air. Each has a purpose. But I have never seen them averaged. Although they are similar, their origins are too different.
Geoff S
“Although they are similar their origins are too different.”
Not for a climate scientist or a computer programmer.
Excellent.
Here is a document that it is pertinent.
IE361_Stat-and-Metrology-Notes.pdf
I have a couple of other references but I need to find them. I remember one of them discussed that rounding to the nearest graduation based on an estimated next digit allows a relatively constant representation across different instruments.
Canada is far cooler now than for most of the last 10,000 years
This is nothing new. Almost 20% of large urban areas in the US suffer from land subsidence causing infrastructure problems – due to extraction of ground water. Why should the Canadian Arctic be exempted from infrastructure problems?
Mr. Eldrosion: Your comments here demonstrate no education or experience in geology, arctic or otherwise, only a capacity to cherry pick an article serving the cause. Never occurs to you that “natural structural cement” is no more permanent than man-made cement; both will be broken up by the earth, over time. Do you really suppose that we can preserve permafrost by building windmills?? Evidently, you do.
So permafrost areas warming from -20C to -18C is a catastrophe. Got it.
Do your freezer contents thaw out at -18C? </sarc>
This is funny.
Paul Courtney clutches his pearls and accuses me of cherry-picking, while Bnice simultaneously rolls out a conveniently curated 1998–2015 window that delivers the cooling trend he was looking for.
The irony: classic WUWT.
Since any “permafrost” problems are likely to be far north of the “100-200km from USA border” where a vast majority of the population is.. we should only look at the far north
(not the urban regions where Ant’s temperatures come from, and that don’t go back to the 1930s, 40s.)
Maybe see what Alaska is doing??
Oh look, there’s that 1930s-40s peak similar to now.
An up to date graph ….
Eldrosion,
Do you regard my article that is the basis of the thread here, to be “classic WUWT?”
If you do, then would you please admit that WUWT is of rather high standard?
If you do not, what do you find “ironic” about other WUWT articles?
…
What is your purpose? Instead of commenting on my lead article that has hundreds of hours of work over 30 years, you choose a lesser comment by a reader for your stunning riposte? Why?
Geoff S
Eldrosion, before the ground froze, it was warmer than now.
It has cooled – due to natural causes. It may warm, also due to natural causes.
As Feynman said “Nature can’t be fooled.”
Unlike yourself, apparently. Maybe you have fooled yourself into believing that permafrost can be melted by the magical effect of CO2.
That would be really foolish, wouldn’t it?
The BoM, CSIRO and other government bureaucracies in Australia were captured by ideological “progressives” (socialists) many years ago.
The “long march through the institutions” as their Fabian Society plotted back in the 1890s.
AU is not an outlier in this capture by leftist ideology.
It has happened / is happening in most western democracies.
Communism ideology will never be eradicated.
For Australia, it’s like the cane toads invasion.
Well Mr, the cane toads have met their match in southern Queensland where the magpies and crows turn them over and gorge themselves. The ‘socialists’ seem harder to eradicate, I agree.
However, the climate catastrophists have almost been stymied by the reality of mild modern warming unrelated to any industrial emissions or CO2 from any source, as defined by recent validated empirical research since the ‘climategate’ saga of 2009. This is where Phil Jones and fellow alarmists, Santer, Hansen and Mann of ‘hockey stick’ ill fame tried to cover up their unscientific temperature data manipulations and ‘heavying’ of peer reviewers and editors to eliminate inconvenient papers.
The BoM are guilty of similar unscientific, bureaucratic elitist behavior and urgently need to be dealt with by a Royal Commission, along with their alarmist mates in the CSIRO. In Australia, we cannot sort out the irresponsible government-inspired energy crisis until we sort out the crazy climate non-crisis that has justified the current stupid energy policies in vogue. Deal with the root source of the evil climate ideology and the political solutions as exemplified by One Nation and hopefully the LNP will become obvious. Otherwise, Australia has no guaranteed economic or security future as an independent nation, it’s that basic a problem.
A #2 iron also does the job on cane toads ☠️
So does a hockey stick or a lawn mower! The latter can be rather messy and usually happens because the damn toads hide in the longer grass around the base of trees. Satisfying though. Also explains the erratic behavior of Queensland drivers: they are all over the road trying to run over the cane toads.
https://www.youtube.com/watch?v=gi4mfYwQi84 enjoy!
Good illustration of the cane toad offing methods in this post!
…That’s my excuse and I am sticking to it.
Why did the cane toad cross the road?
To get to his flat mate.
Geoff S
I prefer a 3 iron, but that is just me.
For those not well versed in the Fabian concept, here is a speech from a former Australian Prime Minister in 1984. You do not have to be a skilled political analyst to see the relation to Communism, a political style that most genuine Australians want nothing to do with. The majority has voted against Communism wherever it has been an electoral candidate.
Geoff S
http://www.geoffstuff.com/fabian.pdf
Plenty of data and ACORN vers. Comparisons, adjustments, etc. have been pulled together here.
http://www.waclimate.net/acorn2/index.html
I just checked the website and was amazed at the amount of weather and climate data that was acquired and presented. It was determined that Oz has warmed by 0.6-0.7 °C since ca. 1900.
A few years ago I had an email conversation with the BoM regarding their forecast vs actual temps and why in Perth they seemed lopsided in summer. They claimed there was no bias. I had to pay them to buy a year’s worth of data to prove there was a +ve bias.
Ken at Kens Kingdom has done a survey of BoM stations like our host did in the US, and Ray Sanders did in the UK.
My take is that Australia sites are not a whole lot better than the farcical Met Office sites.
“Adjustments™” have been just as rampant.
For clarity, I did some thinking around this issue using the Spanish study comparing full-size and medium-size Stevenson screens (“The results show that the medium-sized Stevenson screen tended to overheat daily maximum air temperatures (0.54 °C on yearly average) and also air temperatures recorded at 1300 UTC. The differences on daily minimum air temperatures were negligible (−0.11 °C on yearly average).” from https://www.researchgate.net/publication/272494986_Impact_of_two_different_sized_Stevenson_screens_on_air_temperature_measurements) and my “educated guesses”, a.k.a. questions, on ChatGPT 5.2.
The research question was whether a gradual instrument transition in a meteorological station network could create an artificial temperature shift without leaving a clear discontinuity in the time series.
The analysis used a simplified five-station network model representing a regional Australian climate regime comparable to southeastern Australia (e.g., Victoria), where observed interannual mean temperature variability is on the order of ~0.5–1.0 °C based on Bureau of Meteorology annual summaries from recent decades (illustrative years 2019–2025).
A station change was assumed to introduce a mean +0.5 °C measurement bias with variance reflecting weather-dependent effects. Stations were converted sequentially over a three-year interval and the regional mean together with relative homogenization behavior was evaluated analytically via expected values.
The results show that during the early phase the shift is masked by natural variability, during the intermediate phase the majority of modified stations forces adjustments in the remaining minority stations, and after full conversion the entire network mean shifts by approximately the bias magnitude without a persistent breakpoint in the series.
The conclusion is that relative homogenization alone cannot reliably detect a rapid network-wide systematic measurement change, and that parallel measurements or external references are required to preserve absolute temperature level integrity.
A review of the reference list associated with the discussed article by Geoff, indicates that while it contains sources addressing instrument bias, screen design effects, measurement uncertainty, and homogenization methods, it does not include material demonstrating that relative network comparison by itself can detect a near-simultaneous systematic shift affecting most stations.
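A minimal sketch of that thought experiment, with all numbers illustrative rather than BOM data: five stations, each acquiring a +0.5 °C instrument bias in a staggered year, against natural year-to-year scatter:

```python
import random

# Five stations convert to a new instrument (assumed +0.5 C bias) in
# staggered years, on top of Gaussian weather noise. Numbers made up.
random.seed(3)
n_years, n_stations = 12, 5
switch_year = {0: 4, 1: 5, 2: 6, 3: 6, 4: 7}   # year each station converts

for yr in range(n_years):
    temps = [20.0 + random.gauss(0, 0.4) +
             (0.5 if yr >= switch_year[s] else 0.0)
             for s in range(n_stations)]
    print(f"year {yr:2d}: network mean {sum(temps) / n_stations:.2f} C")

# The network mean drifts up by ~0.5 C across years 4-7, with no single
# sharp breakpoint at any one station relative to its neighbours.
```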
This may seem negligible, but when discussing temperature changes at a resolution of 0.01 °C or even 0.001 °C, a value of 0.11 °C overwhelms those changes.
In climate science all measurement uncertainties are random, Gaussian, and cancel. The −0.11 °C on yearly average just disappears into the ether.
Congratulations Sherro. I gave up trying to get answers from them years ago. After years of looking I realised the BOM temperature record is useless for climate analysis.
Ken
Hi Ken,
I hope that I quoted you accurately and that I noted your important contribution to this topic over the years.
Plausibly, you and I agree that this historic temperature data was collected for other purposes, before global warming was a glint in the public eye, and that it has shortcomings for the purpose of claiming catastrophe from climate change. Indeed, without criticism of the many patient observers over many years who produced the BOM record from day to day, it has to be recognised that the BOM record is “not fit for purpose” for global warming study. The problem that we have tried to highlight is that our BOM should know this, perhaps does know this, but fails to take the correct scientific step of admitting it and its limitations.
That is why I mention the need for a comprehensive, over-arching estimate of the total uncertainty, GUM style, conducted by BOM as part of its expected duties. Geoff S