Kevin Kilty
Early last summer, during an internet search, Google’s AI Overview told me that Earth’s energy imbalance (EEI), that is, the difference between incoming solar irradiance and outgoing reflected solar plus longwave infrared radiation, is 0.5 ± 0.47 W/m2. Like most facts that AI Overview offers, I gave it little attention at the time, but filed it away in my memory.
Later, I began to ponder the number. Several things occurred to me. First, the uncertainty seemed to me an attempt to avoid writing the value as 0.5 ± 0.5 and thus potentially including zero in whatever confidence interval was being expressed. After all, it is difficult to speak of the world starting to cook, when a confidence interval on the source of heat includes zero – i.e. no heat source at all. Second, the number is very small considering that at any given time in the temperate zones incoming solar irradiance might be over 1,100 W/m2 and is highly variable to boot. I wondered “How does one measure this imbalance number and how well can we characterize its uncertainty?” What I expected to learn was that EEI is a very small number when compared to the variability of climate, and that its magnitude is greatly uncertain.
It’s difficult to get AI Overview to repeat things at a later time, and there was no use searching for the original source of the 0.5 ± 0.47 W/m2 figure using it. I eventually located the origin of this value but learned that it referred not to the EEI per se. Rather, it was a measure of the decade-to-decade change in EEI. Along the way, however, I found an array of EEI values stated in different places in different ways.[1] Estimates ranged from 0.47 W/m2 to 1.0 W/m2.
Some of the dispersion in these values can be explained by the figures covering different time periods. For example, the figures of 0.47 W/m2 and 0.76 W/m2 quoted by von Schuckmann et al. [2] are for the periods 1971-2018 and 2010-2018 respectively. Nonetheless, this is a rather large spread of estimates for the energy imbalance, considering that one person described it as “the most fundamental metric that the scientific community and public must be aware of as the measure of how well the world is doing in the task of bringing climate change under control…”
In addition, the uncertainties attached to this very important number are stated unclearly. Are they one-standard-deviation values, 95% confidence intervals, 90% intervals, or something else? It turns out they are a mixture. And since a number of people argue that the EEI is increasing with time, are these detrended mean values over some period plus an uncertainty, or are they instantaneous values at some particular time? Many questions to answer.
Let’s see what we may learn from some selected literature.
Energy accounting
Earth’s energy accounting is not terribly different from double-entry financial accounting. There is a revenue account (incoming solar irradiance) and the expenses of outgoing longwave and reflected solar irradiance. Meanwhile, in our energy ledger we have an assets account, where I would place any energy going toward useful things like warming the planet or greening it, but which alarmists would perhaps call the liabilities account, and an Earth’s energy equity account.
However we might name these accounts, our double-entry system must balance, and this should provide two independent ways of determining the energy balance or lack thereof. In one method we look at the flows of energy crossing a boundary at satellite altitude. In the other we look at all the places where energy appears to have accumulated. In principle the two ought to produce the same numbers to within the uncertainty of each.
Magnitude and Uncertainty
Since energy imbalance is one of many metrics being employed to argue for changes to lifestyles, products, and the economy as a whole, it had better be proven to be a highly certain menace. At a minimum this means being accurately measured.
Uncertainty of a measurement is not merely a matter of the randomness of a process being measured or randomness in sampling of a population. It has to include all aspects of a measurement, from the inherent randomness of the process being measured, through uncertainties of instrument construction, calibration, drift, installation (think of the surface stations project here), algorithms, and data reduction. If a stationary process is sampled in a truly random fashion, then the dispersion of measurements might encompass all of the components of uncertainty.[3] Exceptions to this include systematic errors and bias. Errors and bias are a particular concern when specialized or unique instruments and platforms are involved, because this equipment, consisting maybe of only a few units, will not produce a statistical ensemble, and the biases are unlikely to be recognized in the measurement statistics alone.
In particular, Henrion and Fischhoff found that measurements of physical constants tended to group near one another even when later, and better, measurements showed this grouping to be extremely unlikely.[4] They suggested this to be an artifact of the sociology of physical science. In one sense there was a conflict between the desire to produce the definitive work (small uncertainty) and the surety of being correct (larger confidence bounds), with the former often winning out. Another finding was that scientists did a poor job of evaluating bias and that supposedly independent research groups could be nudged toward a value found by the most influential group among them (the bandwagon effect). In climate science a similar nudge is produced by a desire to produce shocking results – an effect one might call the “it’s worse than we thought” bandwagon.
Direct measurements of energy flow
A direct measurement of the energy imbalance is available from satellites. NASA’s Clouds and the Earth’s Radiant Energy System (CERES) experiment has involved seven radiometers flying on five different satellites beginning in 1997. These radiometers enabled direct measurement of incoming solar radiation, outgoing (reflected) solar radiation, and outgoing LWIR. Two of these satellites, Terra and Aqua, reached the end of their planned operational life in 2023 but continued to operate afterward; both are now experiencing fuel and power limitations. A new addition to this experiment, Libera, is scheduled to fly early in 2028. This program has discovered some interesting things over the years.
At a September 2022 conference, Norman G. Loeb summarized the CERES mission, its findings, and some consequences of those findings.[5] According to this presentation, Earth’s energy imbalance has doubled, from 0.5 ± 0.2 W/m2 during the first 10 years of this century to 1.0 ± 0.2 W/m2 recently. The increase is the result of a 0.9 ± 0.3 W/m2 increase in absorbed solar radiation that is partially offset by a 0.4 ± 0.25 W/m2 increase in outgoing longwave radiation.
What do these uncertainty estimates mean, and how credible are they? To answer the “what” part of this question, slide 20 of the PowerPoint is reproduced in Figure 1.
Figure 1. From Loeb, N. 2022 [5].
This shows a comparison of direct measurements of incoming and outgoing radiation against planetary heat uptake, which I will address next, but which amounts to an independent, indirect measure of the radiation imbalance. The two measures trend closely together at 0.5 ± 0.47 W/m2 per decade. This slide reveals the source of AI Overview’s summary! The uncertainty is identified as a (5%, 95%) confidence interval, so ±0.47 translates to roughly ±0.29 as a standard deviation. What looks good in this graph is the agreement between what we suppose to be two independent estimates of the energy imbalance.
Further research shows this agreement not to be all it seems. Whatever coverage factor these ± figures refer to (σ, 2σ, 90% CI, 95% CI, etc.), a number built from the change in absorbed solar energy minus the change in outgoing longwave works out to 0.5 ± 0.39 W/m2; combining this with the 0.5 ± 0.2 W/m2 of the previous decade gives 1.0 ± 0.44 W/m2, not 1.0 ± 0.2 W/m2, unless of course there is some negative correlation among the terms, which is never explained. I have had real trouble making the figures I read consistent with one another. There’s more.
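To make the arithmetic explicit, here is a minimal Python sketch of the combinations just described; it treats the quoted ± figures as uncorrelated components sharing one coverage factor, which is an assumption the papers do not confirm:

```python
# A minimal sketch of the quadrature arithmetic discussed above, assuming the
# quoted ± figures are uncorrelated and share the same coverage factor.
from math import sqrt

def rss(*components):
    """Root-sum-square (quadrature) combination of uncorrelated components."""
    return sqrt(sum(c * c for c in components))

# Converting the (5%, 95%) half-width to a one-sigma figure:
sigma_from_ci90 = 0.47 / 1.645            # ~0.29 W/m2

d_asr, u_asr = 0.9, 0.30                  # change in absorbed solar, W/m2
d_olr, u_olr = 0.4, 0.25                  # change in outgoing longwave, W/m2

d_eei = d_asr - d_olr                     # 0.5 W/m2
u_d_eei = rss(u_asr, u_olr)               # ~0.39 W/m2

eei_prev, u_prev = 0.5, 0.2               # earlier decade, W/m2
u_new = rss(u_prev, u_d_eei)              # ~0.44 W/m2, not the quoted 0.2

print(f"90% half-width 0.47 -> sigma ~ {sigma_from_ci90:.2f} W/m2")
print(f"change in EEI: {d_eei:.2f} ± {u_d_eei:.2f} W/m2")
print(f"recent EEI:    {eei_prev + d_eei:.2f} ± {u_new:.2f} W/m2")
```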
I began my research on the uncertainty of the energy imbalance by referring back to a special volume of Reviews of Geophysics from May 1986 dealing with the instrumentation for the predecessor of CERES, the Earth Radiation Budget Experiment (ERBE).[6] There we find:
“These instruments are designed to provide radiant flux and radiant exitance measurements to better than 1% measurement uncertainty over a 2-year orbital lifetime.”[7]
This instrument uncertainty does not cover every contribution that is possible, but what is important is to compare it with attempts to reconcile CERES satellite estimates of EEI with ones based on heat uptake. In a paper whose title explains a lot, “Toward Optimal Closure of the Earth’s Top-of-Atmosphere Radiation Budget,” we learn that the calibrations of the instruments involved are accurate to 1-2%.[8] The underlying uncertainty of the instruments themselves doesn’t seem to have improved from ERBE to CERES.
There are two tables in this paper that are pertinent to our question of uncertainty. Table 1 shows the very large variations in imbalance, of tens of W/m2, that are due to internal climate variability – variations far larger than any credible EEI due to forcing. Table 2 shows a list of known and unknown biases. The known ones span a range from -2 to +7 W/m2, and the EEI obtained directly from the satellite measurements is too large by a factor of perhaps five to be credible.
While it seems that the intrinsic uncertainty of the instruments has not improved significantly from ERBE to CERES, there have been improvements to the measurements of the energy budget that come from comparisons among the CERES instruments, other space platforms, and ground truth. Irrespective of the cleverness of these schemes, they bring with them uncertainties from other instruments, platforms, and algorithms, which should be propagated into the results. I wished to see a full analysis of uncertainty somewhere, but I did not find one.
More concerning, though, is the discussion of algorithms intended to improve this situation for climate-modeling purposes by, in one case, making one set of adjustments to the LWIR radiance measurements and a different set to the SW radiance measurements in order to produce an EEI compatible with estimates of planetary heat uptake. The adjustments made were not small, but rather a factor of two larger than the EEI itself.[9] In other words, estimates of heat uptake in the oceans and atmosphere have nudged the satellite measurements in a preferred direction. The two methods of estimating EEI are not independent of one another. We don’t really have a double-entry accounting system.
Planetary Heat Uptake
The places on Earth where excess energy might be retained are as follows: 1) temperature change of the atmosphere below the tropopause, 2) temperature change of the oceans above 2000 m depth, 3) temperature change of the ground surface and subsurface, and 4) changes to cryosphere temperature and phase. A recent review of these data and results can be found in von Schuckmann et al. (2023).[2] Some estimates are truly remarkable, with a claimed relative uncertainty (uncertainty divided by expected value) below 10%, considering that items 2) through 4) are largely hidden from direct measurement.
However, the authors appear to recognize that their uncertainty estimates are not so rosy. They say:
“The ensemble spread gives an indication of the agreement among products and can be used as a proxy for uncertainty. The basic assumption for the error distribution is Gaussian with a mean of zero, which can be approximated by an ensemble of various products. However, it does not account for systematic errors that may result in biases across the ensemble and does not represent the full uncertainty.”
Nonetheless, we are told in perhaps the most optimistic case that the present rate of heat uptake is 0.76 ± 0.1 W/m2 [2], or perhaps as good as 0.77 ± 0.06 W/m2 [10]. These figures come from adding up contributions 1) through 4) in our list, though 89% comes from the ocean alone. The indicated uncertainty should come, at a minimum, from adding the respective uncertainties of the contributors in quadrature, just as though these data were uncorrelated, independent, and unbiased estimates.

Figure 2. Storage of energy imbalance in the ocean. From K. von Schuckmann et al. (2023) Heat stored in the Earth system 1960–2020, © Author(s) 2023. This work is distributed under the Creative Commons Attribution 4.0 License.
The gigantic effort of accounting for where all the energy imbalance goes by von Schuckmann et al. [2], which involves 68 co-authors at 61 institutions, is an avalanche of data stated in various units, over different time periods, and coming from numerous sources and subsets of sources. It is extremely difficult to sort out. Figure 2 shows just one example of what I see as discrepancies in the compilation. Compare, for instance, heat stored in the ocean from 1960 to 2020 with heat stored from 1993 to 2020. The 0-2000 m depth range surely is the sum of the 0-700 m and 700-2000 m ranges, as the sums in each case clearly show. Yet the uncertainty figures are impossible to reconcile: in one case uncertainties of 0.1 and 0.04 combine to 0.1, and in the other the same two combine to 0.2. Maybe there is covariance between the two depth ranges, though Figure 2 in von Schuckmann et al. [2] shows obvious positive correlation in all depth ranges – something that should amplify uncertainty, not reduce it.
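As a rough check on that reconciliation, the sketch below combines the two quoted layer uncertainties for a few assumed correlations r between the 0-700 m and 700-2000 m layers; the values of r are my assumptions, not anything stated in the paper:

```python
# Sketch: how the 0.1 and 0.04 W/m2 layer uncertainties can combine, as a
# function of an assumed correlation r between the two depth ranges.
from math import sqrt

u_upper, u_lower = 0.10, 0.04   # quoted uncertainties for 0-700 m and 700-2000 m, W/m2

def combined(u1, u2, r):
    """Uncertainty of a sum of two quantities with correlation coefficient r."""
    return sqrt(u1**2 + u2**2 + 2.0 * r * u1 * u2)

for r in (0.0, 0.5, 1.0):
    print(f"r = {r:.1f}: u(0-2000 m) = {combined(u_upper, u_lower, r):.2f} W/m2")
# r = 0 gives ~0.11 (rounds to 0.1); even perfect positive correlation only
# reaches 0.14, so a combined figure of 0.2 is hard to reproduce from these inputs.
```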
Acceleration of EEI
One prominent theme of all this work on energy imbalance is that the heat uptake is accelerating. Loeb et al. [10] suggest that satellite observations show the energy imbalance has doubled, from 0.5 ± 0.2 W/m2 during the 2000 to 2010 period to 1.0 ± 0.2 W/m2 during the 2010 to 2020 period – what I’d call decadal estimates. The increase is the result of 0.9 ± 0.3 W/m2 more absorbed solar radiation, partially offset by a 0.4 ± 0.25 W/m2 increase in outgoing longwave radiation.
Once again I have some trouble making all of this consistent. By the rules for adding uncorrelated estimates, one would infer the combined result of increasing absorbed solar radiation and increasing outgoing LWIR to be 0.5 ± 0.4 W/m2, yet Loeb [5] characterizes it as 0.5 ± 0.47 W/m2 with a (5%, 95%) confidence interval. It’s puzzling.
Von Schuckmann et al. [2] suggest the equivalent decadal numbers to be 0.48 ± 0.1 W/m2 and 0.76 ± 0.2 W/m2 (uncertainties they call a 90% confidence interval, 5%-95%), figures which, if the stated uncertainties are to be believed, are far smaller than what Loeb et al. [10] calculate.
Neglecting all the differences, however, it is apparent all groups are presenting evidence of accelerated imbalance. Is this well established, though?
I decided to perform a little experiment with the uahncdn_lt_6.0 dataset (University of Alabama in Huntsville), using an ordinary least squares linear regression. The month-to-month linear coefficient of increase is 0.0013 C/month, and the 95% confidence interval runs from 0.0012 to 0.0014 C/month. Figure 3 shows this. As Dr. Spencer has written on his blog, this roughly 0.02C change per year is what results from increased CO2 forcing, and anything departing from it is some influence of climate variability.
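For readers who want to repeat the exercise, here is a sketch of the kind of fit described; parsing of the actual UAH file is omitted, and the series below is a synthetic stand-in with a merely plausible slope and noise level, not the real data:

```python
# Ordinary least squares fit with a 95% confidence interval on the slope.
# The anomaly series here is a synthetic stand-in, NOT the real UAH data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
t = np.arange(540)                                  # ~45 years of months
anom = 0.0013 * t + rng.normal(0.0, 0.2, t.size)    # stand-in anomalies, deg C

res = stats.linregress(t, anom)
half = stats.t.ppf(0.975, t.size - 2) * res.stderr  # 95% CI half-width on the slope

print(f"slope = {res.slope:.4f} C/month "
      f"(95% CI {res.slope - half:.4f} to {res.slope + half:.4f})")
print(f"about {12 * res.slope:.3f} C per year")
```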
A close inspection of Figure 3 indicates the data are not randomly scattered about the best fit line. Their trend looks rather flat near the origin and at several places there is a bit more data below the line than above. Some extreme values occur at the end of the time series. This shape suggests some curvature like a squared term.

Figure 3.
So, I fit a purely quadratic function (no linear term) to the data (small diagram in the upper left of Figure 3) and got what seems to be a remarkably good-fitting curve that has zero slope at the origin and rises monotonically to the end. Ah! Evidence of an accelerating trend?
Not quite.
The fraction of variation explained by the line is 51% and by the quadratic is 53% – an insignificant difference. The standard error of the residuals is essentially no different (0.198 versus 0.195), and one suspects that the data don’t quite meet the requirement of being independent and identically distributed across their domain. In fact, it is simple to show there is correlation in these data across multiple time scales, and that correlation alone might fool a person into seeing a trend that isn’t there. This tendency toward a flattening followed by a brief but sharp rise appears in other data covering the past century as well, say 1920-1945. These temperature data are much too variable to establish a preference for the quadratic trend over the linear one.
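That comparison can be sketched the same way: fit a straight line and a purely quadratic curve, compare the explained variance and residual scatter, and check the residuals for the autocorrelation that undermines the usual significance tests (again on a synthetic stand-in series rather than the real data):

```python
# Linear versus purely quadratic (a + b*t^2) fit, plus a lag-1 autocorrelation
# check on the residuals. The series is a synthetic stand-in, not the UAH data.
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(540, dtype=float)
anom = 0.0013 * t + rng.normal(0.0, 0.2, t.size)

def fit(design):
    coef, *_ = np.linalg.lstsq(design, anom, rcond=None)
    resid = anom - design @ coef
    r2 = 1.0 - resid.var() / anom.var()
    return r2, resid

r2_lin, resid_lin = fit(np.column_stack([np.ones_like(t), t]))
r2_quad, resid_quad = fit(np.column_stack([np.ones_like(t), t**2]))

print(f"R^2 linear    = {r2_lin:.3f}, residual SD = {resid_lin.std(ddof=2):.3f}")
print(f"R^2 quadratic = {r2_quad:.3f}, residual SD = {resid_quad.std(ddof=2):.3f}")

# Lag-1 autocorrelation of the linear-fit residuals: if it is well above zero,
# the ordinary OLS confidence intervals are too narrow.
r1 = np.corrcoef(resid_lin[:-1], resid_lin[1:])[0, 1]
print(f"lag-1 autocorrelation of residuals = {r1:.2f}")
```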
Conclusions
When I began this effort I expected to discover that my search for the energy imbalance would find a very small number with a very large uncertainty. This is exactly what I found. In addition, I was surprised to see that estimates of heat uptake have provided a large nudge to the satellite data, which leaves the two estimates not independent. I also should have figured that whatever value of EEI resulted would serve some purpose in promoting worry over climate change.
Something that all investigators in this EEI community do is tie the energy imbalance to a litany of problems in the present world. To them a warming planet is nothing but trouble; there seems no upside at all. Von Schuckmann et al. [2] even go so far as to calculate, from their estimates of imbalance, and even considering the acknowledged unknown uncertainties and assumptions involved, how much CO2 would have to be removed from the atmosphere to bring us back to the year 1988, or to keep the Earth below a 2C temperature change from the late 1800s. Why these two goals are important isn’t clear, but we are told that they will somehow prevent climate problems.
But the issues that cause panic are not related to puny temperature change. Heat waves do not arise from a 1C or even 1.5C background temperature rise. They arise from UHI approaching 10C, resulting from the way we have constructed our mega-cities, which are full of energy-dissipating equipment and cut off from countryside breezes. Sea level rise, being puny, does not cause flooding; failures of civil engineering and people building in places they shouldn’t are what cause flooding. Knowing the EEI, which might be interesting in its own right, will aid in preventing none of this.
References:
1- See for example: Trenberth et al., Earth’s Energy Imbalance, Journal of Climate, 27(9), p. 3129-3144, 2014, DOI: https://doi.org/10.1175/JCLI-D-13-00294.1; von Schuckmann et al., Heat stored in the Earth system: where does the energy go?, Earth Syst. Sci. Data, 12, 2013–2041, https://doi.org/10.5194/essd-12-2013-2020, 2020; Loeb, N.G., et al., Observational Assessment of Changes in Earth’s Energy Imbalance Since 2000, Surv. Geophys. (2024), https://doi.org/10.1007/s10712-024-09838-8
2- von Schuckmann et al., 2023, Heat stored in the Earth system 1960–2020: where does the energy go?, Earth System Science Data, 15(4), 1675-1709. https://doi.org/10.5194/essd-15-1675-2023
3- JCGM 100:2008, Guide to the Expression of Uncertainty in Measurement. Available online at https://www.bipm.org/documents/20126/2071204/JCGM_100_2008_E.pdf
4- Max Henrion and Baruch Fischhoff, Am. J. Phys., 54, 791, 1986. See also Science, 289, 2260-2262, 29 September 2000.
5-Loeb, Norman, Trends in Earth’s Energy Imbalance, ISSI GEWEX Workshop, September 26-30, 2022, Bern, Switzerland. Powerpoint online at: https://ntrs.nasa.gov/api/citations/20220010511/downloads/Loeb_2022_compressed.pptx.pdf
6- B. Barkstrom and G. Smith, The Earth Radiation Budget Experiment: Science and Implementation, Reviews of Geophysics, v. 24, n. 2, p. 379-390, May 1986
7-Kopia, L. Earth Radiation Budget Experiment Scanner Instrument, Reviews of Geophysics, v. 24, n 2, p. 400-406, May 1986
8- Loeb et al., 2018, Clouds and the Earth’s Radiant Energy System (CERES) Energy Balanced and Filled (EBAF) Top-of-Atmosphere (TOA) Edition-4.0 Data Product, Journal of Climate, DOI: 10.1175/JCLI-D-17-0208.1. “With the most recent CERES edition-4 instrument calibration improvements, the net imbalance from the standard CERES data products is approximately 4.3 W/m2, much larger than the expected EEI.“
9-Loeb, et al, 2009, Toward Optimal Closure of the Earth’s Top-of-Atmosphere Radiation Budget. J. Climate, 22, 748–766, https://doi.org/10.1175/2008JCLI2637.1.
10-Loeb, et al, 2024, Observational Assessment of Changes in Earth’s Energy Imbalance Since 2000. Surveys in Geophysics; https://doi.org/10.1007/s10712-024-09838-8
“Knowing the EEI, which might be interesting in its own right, will aid in preventing none of this.” Well said.
Giving a precise number (3 significant digits) to a measurement only accurate to the nearest 1% of a four digit number is debating angels on the head of a pin (how big is an angel? Or a pinhead?).
AKA Wishful Thinking
Tom,
There is very little measurement data in any field that achieves a true accuracy of +/- 1%
Most is advertising stuff people are making up.
There was an interesting moment way back in the Apollo moon program when soils were sent to 20 or so of the “best” labs in the world for chemical analysis. IIRC, none achieved that 1%. Most were quite a distance from the mean of all labs (as the true answer was not known). George H Morrison authored some of the papers.
The basic assumption for the error distribution is Gaussian with a mean of zero,
=======
that only holds if the error is truly random. As soon as you start adjusting the data to fit the models the error is no longer random and your basic assumption fails.
Kevin: Very interesting post. It’s good to see a discussion from someone who has a sound understanding of measurement uncertainty for a change.
Given the basic numbers commonly quoted of Incoming SR = 340 W/m^2, Reflected (clouds and surface) SR = 100 and total outgoing LW = 240 the imbalance should be close to zero – 340-100-240 = 0. It’s hard to imagine that any instrumentation, sampling and data processing scheme could measure these variables to an uncertainty of even +/- 1%. But if you could the MUs for these measurements would be 3.4, 1.0 and 2.4 W/m^2 respectively. Computing the MU of the imbalance would thus be +/- 4.2 W/m^2. To get to the 0.47 MU claimed these measurements would have to have an uncertainty of 0.1%. In reality I suspect that the MUs of actual measurements are much larger. The claims of accelerating EEI made by the climate science industrial complex strike me as magical thinking.
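A minimal sketch of that arithmetic, using only the round-number fluxes above; quadrature of the three 1% terms gives about 4.3 W/m2, essentially the figure quoted, and the last line backs out the relative accuracy needed to reach ±0.47 W/m2:

```python
# Root-sum-square of 1% uncertainties on the three flux terms, and the relative
# accuracy that would be needed to shrink the combined figure to ±0.47 W/m2.
from math import sqrt

fluxes = {"SW in": 340.0, "SW reflected": 100.0, "LW out": 240.0}   # W/m2

u_at_1pct = [0.01 * f for f in fluxes.values()]
u_eei = sqrt(sum(u * u for u in u_at_1pct))
print(f"EEI uncertainty with 1% components: ±{u_eei:.1f} W/m2")     # ~4.3

needed = 0.47 / sqrt(sum(f * f for f in fluxes.values()))
print(f"relative accuracy needed for ±0.47 W/m2: {100 * needed:.2f} %")  # ~0.1 %
```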
There are two methods for measuring EEI. The first approach is by calculating ΔU = ΔT * (m*c) for each energy reservoir in the climate system and combining those. The second is by combining ASR and OLR. The first approach is the more traditional method and has a lower uncertainty. The second approach has a relatively high uncertainty. I’m not aware of anyone that uses the latter method exclusively. CERES uses a method that is best described as a blend between the 1st and 2nd approaches. This is why Loeb et al.’s uncertainty is higher than Schuckmann et al. Both are still low enough to draw statistically significant conclusions though.
Nonsense, Rick’s comments are exactly right.
You will never understand real measurement uncertainty.
Wrong. The uncertainty in those variables carries through to the final calculation, making it impossible to draw any definitive conclusions.
He’s been told this many, many times yet refuses to learn anything about elementary data handling, much less measurement uncertainty.
dementia?
Good possibility.
The things I have refused to learn include…
. that addition (+) is equivalent to division (/).
. that d(x/n)/dx = 1.
. that Σx/n is equivalent to Σx.
. that PEMDAS rules of operator precedence can be ignored.
. that the NIST uncertainty machine does not provide correct answers.
. that a/b = b.
. that Σ(a^2) equals (Σa)^2.
. that sqrt[xy^2] = xy.
. that 78/2 = 78.
. that a/b = a/b * 100
. that an equation in the form of a/b = x is already solved for a.
. that W.m-2 is an extensive property.
. that intensive properties cannot be averaged.
. and more
The contrarian posse have been trying really hard to educate me on these points. Maybe it’s my dementia preventing me from “learning”? I don’t know. What were we talking about?
I demand to see a list.
Yep, you just refuse to believe two concepts. One, that uncertainty adds either directly or by RSS. Two, that “that Σx/n is equivalent to Σx” is not a proper form for uncertainty. Uncertainty is determined by analyzing each source individually.
Σx/n breaks down into “d/dx x” and “d/dx x” and are additive. “d/dx x =1” and “d/dx x = 0”.
This obviates most of your algebra errors when dealing with uncertainty.
Perhaps you can show us where sensitivity values are created using “n” observations.
Reread this and I made an error.
“d/dx x = 1” and “d/dx(1/2) = 0”.
Oh no! bozo-x has another entry for his enemies files!
Nice. Now tell the WUWT audience how those two facts combine and evaluate d(x/2)/dx.
For the lurkers: bwx here fancies himself as an expert in metrology and measurement uncertainty, and revels in pompously flaunting his nutty ideas about how averaging removes uncertainty and increase measurement resolution.
Not only “an expert” but the ONLY expert and anyone who disagrees is not only wrong but terribly wrong. He has denigrated people as not being smart enough to do calculus like PhD’s who have written books on the subject, analytic chemists with decades of experience, and people who have spent years meeting ISO measurement requirements.
Gut check – has anyone ever seen him respond with his experience in physical science and making measurements that meet science and legal requirements.
I have not seen him provide any credentials. As I recall, both he and bellman had no idea the GUM even existed until it was pointed out to them here in WUWT. 4-6 months later and they both became world-class experts.
and neither one has done anything but cherry pick bits and pieces that they think confirm their misconceptions!
It is obvious that never, not EVER, has bdgwx been involved in a project where the use of measurements and their uncertainty is entangled with legal liability, both civil and criminal.
When designing a bridge truss you simply do *NOT* assume the uncertainty of the shear strength of those beams is the AVERAGE UNCERTAINTY value of even 1000 measurements, not unless you want to be sued into poverty while you are spending 20 years in jail for criminal negligence.
Now tell the WUWT audience how those two facts combine and evaluate d(x/2)/dx.
THEY DO NOT COMBINE. Each element in a measurement equation has an uncertainty. Those uncertainties are additive in RSS.
You show a resource where constants like a count of variables have a δ other than zero.
You don’t even recognize that using measurements of similar dimensions and quantities does not require using partial differentials. Simple δxᵢ will suffice.
So for temperature measurements δq = √[δa² + δb²]
If you want to see the uncertainty of “x/constant” it is, by the quotient rule u = √[(δx/x)² + (δ(1/2)/(1/2))²] = √(δx/x)²
You continually want the uncertainty of an actual measurement to be reduced by the number of counts in an average. δx is the uncertainty in THE MEASUREMENT of x. It is not the uncertainty of x reduced by a counting number.
Read this page from Dr. Taylor about using a step-by-step method to evaluate the pieces of an expression. I wasn’t joking about “x/n” breaking down into (δx/x + δn/n). When “n” is a constant, the uncertainty δn equals 0!
You can’t win this because you can’t show any examples to support your position from any texts or internet sources.
Unless you follow up with another post clarifying your position I don’t have much choice but to accept that you still don’t know how to evaluate derivatives like d(x/2)/dx. I’d typically guide you through the steps, but I’ve done it so many times already I’m losing interest at least for now. I guess I had moment there where I thought you may have had an epiphany which is why I followed up. Oh well.
BTW…it appears you attempted to answer δ(x/2), which I didn’t ask about, and ended up making yet another trivial algebra mistake since δ(x/2) = √(δx/x)² is not correct. The correct answer is δ(x/2) = δx/2.
Brave, brave Sir Robin! Sir Robin ran away!
You are incorrect. You may evaluate a function or expression that way, but UNCERTAINTY IS NOT COMPUTED BY THAT METHOD. Each component adds separately to the uncertainty.
Look at Dr. Taylor on page 61 and the rule at 3.18. When,
q = x/u then,
δq/|q| = √[(δx/x)² + (δu/u)²] and,
If u is a constant value δu=0, and that term disappears from the uncertainty equation.
Did you not read what Dr. Taylor says about splitting a problem into pieces? “x/n” is a quotient. Quotient uncertainties add with fractional uncertainties. With simple variables, all you need are simple delta values. δq/q = √[(δx/x)² + (δ(n)/(n))²] = √(δx/x)² (δ(n) for a constant equals 0).
Here is another link. Look carefully at the highlighted text.
I’ve tried to be fair and not give just my opinion. Instead, I have given resource after resource from accepted sources. You have basically inferred that it is your opinion that they are all incorrect.
Until you can begin to show sources of your own that back up what you claim I will not play your game of one upmanship. I’ll warn you now, that you will not find any resources that confirm your method.
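For what it is worth, the two forms being argued over give the same number when worked through; a small sketch with an arbitrary example value x = 10 ± 0.5 (my numbers, chosen only for illustration):

```python
# Taylor's quotient rule (rule 3.18) with a zero-uncertainty constant in the
# denominator, versus direct propagation through x/2. They agree numerically.
x, dx = 10.0, 0.5        # example measurement and its uncertainty (illustrative)
n = 2.0                  # exact constant, so δn = 0

q = x / n
frac_q = ((dx / x) ** 2 + (0.0 / n) ** 2) ** 0.5   # quotient rule, δn = 0
dq_from_rule = abs(q) * frac_q                     # fractional -> absolute
dq_direct = dx / n                                 # δ(x/2) = δx/2

print(f"quotient rule: δq = {dq_from_rule:.3f}")   # 0.250
print(f"direct:        δq = {dq_direct:.3f}")      # 0.250
```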
No dementia. bdgwx is just a silly goose. His reply to you shows that.
Thanks for the vote of confidence regarding my supposed dementia. Anyway, am I correct in assuming that anyone who struggles to learn one of the “truths” above is a “silly goose”?
CERES says the EEI has increased from 0.5 ± 0.2 W.m-2 to 1.0 ± 0.2 W.m-2 from 2000/03-2010/02 to 2013/01-2022/12. That gives us a statistically significant result of an increase in the EEI of at least 40% and as high as 160% [Loeb et al. 2024]
I’m right, you will never learn anything.
bdgwx,
There is no evidence I have found for your assertion that “Both are still low enough to draw statistically significant conclusions though.”
Maybe I have not read all the papers. On which do you rely?
Geoff S
I relied on the NIST uncertainty machine to support my statement that statistically significant conclusions can be made.
It’s a computer program: garbage in, garbage out.
And ignored the statement on uncertainty quoted
However, it does not account for systematic errors that may result in biases across the ensemble and does not represent the full uncertainty.
The interesting thing here, and I hoped it would come through in what I wrote, is that these researchers know that their estimates of uncertainty are much too optimistic, that they don’t take into account biases and systematic errors, but still they repeat the much too optimistic uncertainty bounds and insist that their estimates are actually underestimates: things are worse than they are saying…
Can you post a link to a dataset that reports the correct uncertainty and systematic errors ?
Try John Clauser’s presentation, here:
http://www.sepp.org/science_papers/John_Clauser_ICSF_FINAL_May-8_2024.pdf
Kevin / bdgwx – Where do all these small uncertainty intervals come from? Per Clauser, they’re an order of magnitude greater than the so-called EEI.
From the way climate science typically operates these uncertainties are probably sampling error, i.e. the standard deviation of the sample means – many times mis-identified as the uncertainty of the mean. One problem is that they really only have one sample and assume that the standard deviation of that one sample is also the standard deviation of the “population”. They then assume that the parent and sample distribution is random and Gaussian so they can use the sampling uncertainty as the “measurement uncertainty”.
One factor in the uncertainty budget for all of this should be the fact that radiance is impacted by the transiting media, e.g. water vapor absorbing outgoing microwave energy. It creates a variability in the results that is typically not accounted for in the uncertainty budget.
I applaud Kevin’s analysis of the data that is available. But until I see a *complete* uncertainty budget, including systematic bias of the measuring devices, I’m going to take all of this with a big grain of salt.
Plus the numbers are averaged to death, over the globe and over a year.
I agree with you.
Thanks. So per Clauser the “correct” imbalance and uncertainty is 9 ± 10 W.m-2. No wait it’s 0.5 ± 3.9 W.m-2. No wait it’s 0 ± 5.0 W.m-2. No wait it is 0 ± 3.5 W.m-2. No wait it is 0 ± 5.6 W.m-2. No wait it is 3.5 ± 3.6 W.m-2. No wait it is 2.5 ± 3.0 W.m-2. Clauser provides a lot of “correct” answers in that document. Which one of those is the correct “correct” answer? And is this correct “correct” answer consistent with other data points? And where are his “correct” estimates using the method where EEI is measured directly as opposed to indirectly via the subtraction of OLR from ASR?
You still don’t understand that true values of a measurement are unknowable.
And of course, that error is not uncertainty.
‘Clauser provides a lot of “correct” answers in that document. Which one of those is the correct “correct” answer?’
Short answer is that none of them are correct. What Clauser has done is to take the data from each of the alarmists’ studies as given in order to point out that had any of them actually done the arithmetic correctly AND properly calculated the uncertainty of their results, their studies would not have supported their climate alarmism.
Why does Clauser use a suboptimal strategy for measuring EEI?
Why does Clauser ignore the fact that the 1LOT strongly hints at the possibility that the correlation r(ASR, OLR) > 0 and that this fact will alter his uncertainty calculations?
Why hasn’t Clauser published his results and own EEI measurement?
What makes you think Clauser and Clauser alone has a monopoly on doing arithmetic correctly?
And if none of Clauser’s attempts at a correct figure are actually correct then how could he possibly know that he’s right and everyone else is wrong?
‘Why does Clauser use a suboptimal strategy for measuring EEI?’
EEI = SW_in – SW_out – LW_out. Why is this a suboptimal ‘strategy’?
“And if none of Clauser’s attempts at a correct figure are actually correct then how could he possibly know that he’s right and everyone else is wrong?’
Again, Clauser isn’t attempting to provide the correct figure for EEI – he’s just demonstrating that the data used by each of each of the researchers cited does not support their estimates of EEI, nor are any of these estimates significant when their respective uncertainties are correctly calculated.
‘…the 1LOT strongly hints at the possibility that the correlation r(ASR, OLR) > 0…’
Interesting. I remember when climate alarmists used to state outright that r(ASR, OLR) < 0, i.e., that increasing GHGs decreased OLR causing ASR to increase. Perhaps a shift in the ‘canonical’ narrative was required to bail out the modelers?
SW_in, SW_out, and LW_out each have high uncertainty so when you combine them into EEI it too has a high uncertainty.
Clauser isn’t demonstrating that though. What he is demonstrating that his own understanding of how to estimate EEI is flawed and results in a very high uncertainty.
What are you talking about? You do realize this is the notation used in JCGM 100:2008 section 5.2 right?
That has nothing to do with this discussion.
Whenever you see bwx typing “1LOT” and “2LOT”, you know you are also going to see some really wacky stuff.
Nope. That isn’t right. Or at least I should probably clarify this statement. It’s not that measuring EEI via the relationship EEI = ASR – OLR only is flawed. It is actually a perfectly fine way to do it in theory. It’s just that in practice it results in a very high uncertainty. I think what is actually flawed here is in thinking it is a better way to do it than what was actually done by the scientists Clauser criticized.
And what you refuse to acknowledge is that all the methods have real uncertainty limits far greater than the tiny numbers typically claimed.
‘SW_in, SW_out, and LW_out each have high uncertainty so when you combine them into EEI it too has a high uncertainty.’
That’s rather the point, isn’t it? The only relevant / observable data we have in hand is too coarse to provide a meaningful estimate of EEI.
Your preferred indirect methods of computing EEI bring to mind Santer’s invocation of wind shear to ‘find’ the mid-tropospheric hot spot when the on-board radio-sonde thermometers failed to corroborate the GCMs.
The relevant / observable data we have in hand using Clauser’s naive method is too coarse to provide a meaningful estimate of EEI. That is a fact that all of the scientists Clauser criticized in that publication had figured out long before he did.
‘It is a fact that all of those scientists who Clauser criticized in that publication had long figured before he did.’
So, you’re saying that they knew a priori they had no data with which to produce meaningful estimates of EEI, but then went ahead and published in the peer-reviewed literature anyway?
What I’m saying is that they knew they didn’t have sufficient accuracy to produce a meaningful estimate of EEI using the trivial method Clauser used in that presentation so they used a different and better method instead. Clauser wasn’t calculating the uncertainty of the result these scientists provided. He was calculating the uncertainty of the result he provided.
‘What I’m saying is that they knew they didn’t have sufficient accuracy to produce a meaningful estimate of EEI using the trivial method Clauser used in that presentation so they used a different and better method instead.’
By ‘better method’ do you mean ‘…by calculating ΔU = ΔT * (m*c) for each energy reservoir in the climate system and combining those.’?
No uncertainty in doing that, I’m sure.
Yes that’s what I mean. Unfortunately there is still uncertainty though. It is unavoidable
They come from many different methods. For example, in my reply to bdgwx, above, Good et al took an interesting stab at the problem by first gridding data, then taking the time series at each grid value and subjecting it to a Kalman filter — pretty ingenious I thought. However, if you look at their estimates of uncertainty they vary greatly from coastal regions, especially near currents like the Gulf Stream where uncertainties are large, to the open ocean where the uncertainty is small. Small in the open ocean because the temperature change month to month is easy to project. If you know about the Kalman filter, the difference between a projection one time-step ahead and the actual measurements is an estimator of error.
V.Schuckmann et al was a review article and a person would have to go back to the original work to get an idea of what methods they tried and what they considered.
Thanks, Kevin. I would question, however, how ‘ingenious’ it is to take data that has been measured with error and then run it through several models to obtain one’s results. Maybe that works for signal processing of noisy data, e.g., seismic, that can be ‘stacked’ to better reveal a real underlying physical process, but I’m skeptical of its efficacy if said process isn’t already apparent from the raw data.
What assumptions does the Kalman filter make in this case? If it assumes that the noise (i.e. natural variation) and measurement uncertainty is random and Gaussian (e.g. white noise), then that is similar to what climate science does – assume all measurement uncertainty is random, Gaussian, and cancels.
If this is what the Kalman filter does then it understates the actual measurement uncertainty.
The standard bwx poser, he refuses to believe that literature papers can be filled with BS, especially if they agree with his alarmist ideas about the climate.
Do you mean one dealing with EEI, or just any example of one?
I have not found anything dealing with EEI that looks complete. I can say I was intrigued by the work that Good, et al did with the EN4 dataset. I found the whole exposition very difficult to follow until I realized they were simply gridding data (objective analysis) followed by what is effectively a Kalman filter. They didn’t mention Kalman, but that is what they are doing. Unfortunately this only works as well as it does because of the rather large set of Argo buoys. Their effort pre-Argo might not work so well.
For a good example of an experiment using a unique instrument, then the error budget put together by Schwarz et. al. (Science, 282,2230-2234, 18 Dec 1998) seem to me an example of very complete effort. They were out to measure the Newtonian gravitational constant, G, with a free-fall apparatus.
Sorry, I forgot the Good et al reference
JOURNAL OF GEOPHYSICAL RESEARCH: OCEANS, VOL. 118, 6704–6716, doi:10.1002/2013JC009067, 2013
We need to know the correct value and uncertainty for EEI to know how wrong Loeb et al., Schuckmann et al., etc. actually are.
The EN4 dataset is good comparison. It is consistent with the others though. See [Shuckmann et al. 2023] figure 3.
BTW…the fact that Schuckmann et al. use EN4 and that Loeb et al. use Schuckmann et al. as their in-situ anchor means that Good et al. cannot be used as independent cross check since both Schuckmann et al. and Loeb et al.’s EEI estimates are dependent on Good et al.’s EN4 dataset.
It’s even a bit worse. Planetary heat uptake has nudged the satellite measurements into agreement, so they aren’t independent of one another at all.
Did your mother drop you on your head? I’m really asking because you can’t, by definition, know uncertainty.
It is not possible to know error, but it is possible to know uncertainty. See JCGM 100:2008, NIST TN 1297, or equivalent literature for details on how this is accomplished.
I don’t know why you get down voted on this, you are pretty dead on. I’d only say it’s possible to estimate uncertainty, and in fact if a person doesn’t, then they have no idea if their method has the resolution or power that a measurement requires.
His only interest in the GUM is cherry-picking whatever he can to “prove” his claims that averaging air temperatures makes measurement uncertainty vanish and increases measurement resolution. He also believes that subtracting a baseline during an anomaly calculation cancels “error”.
If you put just 10 temperatures into the NIST machine where one set of five is surrounding 10C and the other five surrounding 20C, i.e. a multi-modal distribution, with the SD of the temps around 10C being 1.5C and the SD of the temps around 20C being 1C you get an average and standard deviation. This is like combining southern hemisphere and northern hemisphere temps, one in winter and the other in summer with different variances for each.
Exactly what is the NIST machine telling you with the average and the standard deviation?
I did this with the following temps
8C +/- 1.5C
9C +/- 1.5C
10C +/- 1.5C
11C +/- 1.5C
12C +/- 1.5C
18C +/- 1.2C
19C +/- 1.2C
20C +/- 1.2C
21C +/- 1.2C
22C +/- 1.2C
The NIST machine gives an average of 15C with an SD of 0.43C.
0.43C appears to be the SEM, i.e. the SD of the distribution divided by n.
The SEM is *NOT* the uncertainty of the average. The SD of the distribution is about 5C. Propagation of the uncertainty values gives a u(y) of about 4.8C.
So I’ll ask again. What does anyone think the NIST machine is telling you?
And I *still* don’t know what anyone thinks an average of 15C +/- 5C tells you about the physical reality of the “globe”. Doing anomalies doesn’t help. The anomaly will inherit that +/- 5C measurement uncertainty making it impossible to determine actual differences even in the tenths digit!
A really good example, JG.
Tim (not Jim) is a well known idiot here, he’s messing up everything. For that matter Jim is also one. He simply doesn’t get what the NIST machine is calculating here.
The NIST is *NOT* calculating measurement uncertainty. Anyone that actually understands metrology would see that right away. But, of course, that isn’t you.
Kevin, It’s not strictly the SEM though you do get the same result because that’s how the law of propagation of uncertainty and monte carlo simulations play out for this particular measurement model. It’s just that the NIST uncertainty machine isn’t actually using the SEM formula. It calculates u(y) where y is the measurement model specified. When y = (x0+x1+x2+x3+x4+x5+x6+x7+x8+x9)/10 then u(y) = 0.429. We report the measurement of the measurand y as 15 ± 0.4 C. Gorman’s report of u(y) = 4.8 C does not arise as a result of the propagation of uncertainty as he claims. I encourage you to use the NIST uncertainty machine and verify this yourself.
Here is the configuration that you can use by copying into a .um file and loading into the NIST uncertainty machine.
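Equivalently, a Monte Carlo sketch in Python of the same measurement model (ten independent Gaussian inputs with the means and standard deviations listed above) reproduces the roughly 0.43 C figure both sides agree the machine reports, while leaving the argument over what that figure means untouched:

```python
# Monte Carlo sketch of y = (x0 + ... + x9)/10 with the ten inputs treated as
# independent Gaussians centred on 8..12 C (sd 1.5 C) and 18..22 C (sd 1.2 C).
import numpy as np

rng = np.random.default_rng(1)
means = np.array([8, 9, 10, 11, 12, 18, 19, 20, 21, 22], dtype=float)
sds = np.array([1.5] * 5 + [1.2] * 5)

draws = rng.normal(means, sds, size=(200_000, 10))
y = draws.mean(axis=1)

print(f"mean(y) = {y.mean():.2f} C, sd(y) = {y.std(ddof=1):.3f} C")  # ~15, ~0.43
print(f"sd of the ten stated values = {means.std(ddof=1):.2f} C")    # ~5.5
```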
Tell what random variables you are using for each of those input values.
Tell how you get a full month of data into the UM.
You still don’t appear to understand that each “input variable” is associated with different random variables containing measurements of different characteristics.
Tell us why NIST didn’t calculate the monthly Tmax average and its uncertainty using your method of dividing everything by 22 in TN 1900!
Until you can refute NIST you are just blowing smoke at everyone and wasting bandwidth.
If you want to be the final determiner of what is correct, that is your prerogative. But, you also need to refute all the real expert sources that we post.
Dr. Taylor, Dr. Bevington and Dr. Kilty(?) that actually teach this stuff all refute what you are preaching.
You can start by giving a refutation of constants having no uncertainty. I’ve provided examples from NIST, Dr. Taylor, and Dr. Possolo. Instead of telling us we are wrong, address why you think these eminent sources are incorrect.
Yep!
Again, the average formula is not a measurement model!
Will you ever grasp this?
No, you’ll just keep plowing ahead making a fool of yourself, just like with your oven door heating the oven nonsense.
I will guarantee you that bdgwx *and* bellman with both say that 6051.73 +/- 30 is a valid measurement statement!
They will also say that a repeating decimal for the average value indicates an infinite resolution for the measurand property.
Of course, they would just omit the +/- 30, problem solved.
But at least they have a “measurement model” to yap about!
Yep. The old statistician and climate science meme of “all measurement uncertainty is random, Gaussian, and cancels”!
“Kevin, It’s not strictly the SEM though you do get the same result because that’s how the law of propagation of uncertainty and monte carlo simulations play out for this particular measurement model.”
” It calculates u(y) where y is the measurement model specified. When y = (x0+x1+x2+x3+x4+x5+x6+x7+x8+x9)/10″
This is *NOT* a measurement model. It is a model for calculating a STATISTICAL DESCRIPTOR of the distribution of the stated values of the measurements- namely the AVERAGE value of the stated values.
Similarly the formula u(Σx)/n is *NOT* a measurement model. It is finding a STATISTICAL DESCRIPTOR of the distribution of measurement uncertainty values – namely the AVERAGE of the uncertainty values.
And, as usual with statisticians doing climate science, all the other statistical descriptors for the distributions of the stated values and their uncertainty is ignored – i.e. the variance, the kurtosis, and the skewness.
The average value of the stated values is an ESTIMATE of a property of the measurand. The average value of the accompanying measurement uncertainties is MEANINGLESS. The average measurement uncertainty is *NOT* a measure of the accuracy of the measurements made of the measurand. The metric for the accuracy of the estimated value is the sum of the measurement uncertainties and not the average measurement uncertainty.
All you are doing is finding statistical descriptors for the distributions of the stated values and for the uncertainties associated with those stated values. That has *NOTHING* to do with the dispersion of reasonable values that can be assigned to the measurand. That dispersion is defined by the uncertainty interval and the uncertainty interval is *NOT* the average uncertainty!
Tim, not again, you messed it up just as you mess it up every time. These are measurements with a calibrated instrument. It means every measurement is just a random variable with a known distribution (and the SD listed). If you average them, you get another random variable with an SD of 0.43. This is it.
Malarky!
This was meant to show that the NIST machine is *NOT* evaluating measurement uncertainty.
Nor did you answer my question. What I put in is a MULTI-MODAL distribution!
What does an SD of 0.43 tell you about a multi-modal distribution?
Especially when the variance of the distribution is so wide. You seem to think that Var_total ≠ Var1 + Var 2 + … when combining random variables!
If the standard deviation of the random variables represent their measurement uncertainty then the measurement uncertainty GROWS as you combine measurements!
In addition, for measurements, the measurement uncertainty is a combination of random error and systematic bias.
u(total) = u(random) + u(systematic)
Since you simply cannot know the value of each component there is no way to reduce the measurement uncertainty. The best you can do is to add the measurement uncertainties in quadrature in case *some* cancellation occurs. This is the BEST case measurement uncertainty. If there is a possibility that the systematic bias is significantly larger than the random error then the measurement uncertainties should be added DIRECTLY.
When you have studied something other than Stat 101 in high school then come back and lecture us on measurement uncertainty.
Please come back when you have understood at last what a distribution is. You use a calibrated instrument so each measurement is just a random variable’s outcome and the distribution for that is known.
Nutter.
Yep!
You still only have one measurement with an uncertainty interval. The GUM defines the uncertainty interval as the dispersion of values that can reasonably be assigned to the measurand. That does *NOT* define a distribution, it remains a single measurement whose true value is part of the GREAT UNKNOWN.
If you make a single measurement, there is no distribution to judge the uncertainty. That requires a Type B evaluation. If you have multiple measurements of exactly the same thing under repeatable conditions the SDOM will provide a measure of the uncertainty in the calculation of the mean.
If you do not or can not measure the same exact thing, then the best you can do is reproducibility uncertainty which is the SD. Now you have two uncertainties to deal with, measurement uncertainty in each measurement and the uncertainty of the average. Do you think the total uncertainty decreases?
A fine assertion with no understanding or knowledge of measurements. Read the GUM again.
A random variable is used to contain multiple measurements of the same thing. That is, length, or width, or height or radius, or pressure, or temperature. The statistical descriptors (μ,σ) of a single random variable are defined as a stated value and a standard deviation (uncertainty) for that unique measurement.
When dealing with measurements that make up a measurand, you deal with only the stated values and the uncertainty of the stated value. The uncertainty of a measurand is the combination of all the uncertainties added together. Constants have no uncertainty therefore they fall out of the uncertainty equation.
I am sorry that your statistical education didn’t prepare you to deal with measurements properly.
Don’t be the idiot you usually are. If you make a single measurement, you do it with an instrument that has known characteristics. So the measurement is just an outcome of a random variable. Full stop.
Bullshit, from a bullshitter.
“known characteristics”
So what? You still only have one measurement with an uncertainty interval. That is *NOT* a distribution of multiple measurements under repeatability conditions.
While we’re on the subject of uncertainty, could somebody explain the theoretical basis for reducing the uncertainty to a value below the resolution limits of the individual measurements?
For a measurement model y = f(w1, …, wn) it is possible for u(y) < u(xi) when the partial derivative of ∂f/∂wi < 1/sqrt(n) and the covariance between wi and wj is zero. The partial derivative requirement is dependent on the covariance. The theoretical basis is the law of propagation [JCGM 100:2008, E.3].
It is important to reinforce the fact that this is only for u(y); not u(xi). The individual u(xi) uncertainties cannot be reduced.
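A small sketch of what that condition means for the simplest measurement model, the mean of n equally uncertain, uncorrelated inputs; each sensitivity coefficient is 1/n, which is below 1/sqrt(n) for n > 1:

```python
# GUM law of propagation, u(y)^2 = sum_i (df/dw_i)^2 u(w_i)^2, applied to the
# mean of n uncorrelated inputs that share the same standard uncertainty u_x.
from math import sqrt

def u_mean(u_x, n):
    """u(y) for y = (w1 + ... + wn)/n; each df/dw_i = 1/n, so u(y) = u_x/sqrt(n)."""
    return u_x / sqrt(n)

u_x = 0.5
for n in (1, 2, 4, 10):
    print(f"n = {n:2d}: u(y) = {u_mean(u_x, n):.3f}")
# Whether this u(y) is the right number to attach to a physical claim is what
# the thread is arguing about; the formula itself is just JCGM 100:2008 eq. (10).
```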
Thanks. Under what circumstances would those criteria be met?
See? Prediction confirmed.
Many. For the case where the inputs are uncorrelated then y = (a+b)/2 is an example. For the case where the inputs are correlated then y = a-b is an example.
I don’t think that’s correct. If the instrumental resolution is a constant (R), then you have:
y = ((a +/- R/2) + (b +/- R/2) )/2
= ( (a + b) +/- (R/2 + R/2) ) / 2
= ( a + b) / 2 +/- R/2
y = ( a +/- R/2) – (b +/- R/2)
= a – b +/- R
You might get away with
y = (a – b) +/- R/2
but I don’t think so.
Transferring measurements always adds to the uncertainty. Would that such was not the case 🙁
You can verify this with the NIST uncertainty machine.
I’m not sure what transferring measurements means.
The NIST uncertainty machine didn’t tell you the answer? You have a very serious problem.
One of the classic examples is the use of a gage block to calibrate an external micrometer. The gage block dimension is typically known to at least an order of magnitude higher resolution than the micrometer is capable of displaying. In that case, the uncertainty is dominated by the micrometer.
To preempt karlomonte, both have been temperature stabilised 🙂
Another is measuring wear in a cylinder bore in an engine.
The nominal dimension (e.g. 3.000″) is set on an external micrometer, and used to set zero on an internal bore gauge. The bore gauge reading is added to (or subtracted from) the nominal dimension.
Hey, glad to see an engine guy. Most people have no clue about checking overall wear and eccentricity in a bore. Rings can only do so much to stop blow by in an eccentric cylinder.
Bore gauges are excellent for checking eccentricity and taper, even if they lose a bit due to transferring measurements.
At the other end of the scale, three-point internal micrometers are great for finished dimensions. They aren’t cheap, though 🙁
You get what you pay for, right?
Not always. There is some expensive junk out there.
The good stuff is rarely cheap, but expensive stuff isn’t always good. Not taking pot shots at SnapOff, or anything 🙂
But according to these guys, all you need is the micrometer: take 100 readings, average and divide by sqrt(100).
Wa La! An order of magnitude increase in the resolution!
Oh, you’ve read Morice et al (2012) and Lenssen et al (2019) as well?
No, I’ve not.
They are quite enlightening.
I found this little gem in Morice.
Does that surprise anyone? No wonder a calculated anomaly starts with no uncertainty.
bdgwx is not going to understand what you are talking about. He’s a blackboard statistician with little to no real world experience.
The cylinder bore isn’t a lot different than what I do with making settings for jewels. The jewel may have a nominal diameter of 2mm or 3mm but is very seldom exactly that unless you pay out the wazoo for calibrated stones. It’s easier to take a pair of dividers with fine tips to measure the stone diameter and then transfer that distance to a caliper or micrometer to get a measurement readout. The divider distance is pretty exact, the reading of the caliper or micrometer isn’t. They both have uncertainties that dominate, although the actual measurement isn’t all that important. You either use a round burr or setting burr that matches the divider distance to cut the seat for the stone.
You won’t get an explicit answer.
Ask yourself what ∂f/∂wi < 1/sqrt(n) really means. Compute the ∂f/∂x x¹. FYI it = 1. Read GUM 5.1.6 Note 2. It says:
More to the point is that GUM E.3 is a justification for treating all uncertainties the same.
If one considers E.3.4 you see that this applies to averages of temperatures for a given period of time. In essence,
z = f(w) = (1/n)Σ₁ⁿwₖ
Goodness only knows why some clown downvoted you 🙁
The partial derivative is *ONLY* a weighting factor for the addition of the measurement uncertainty of the individual elements. It is *NOT* a method of seeing things in a foggy crystal ball.
Significant digit rules simply don’t allow knowing things you can’t possibly know. Covariance doesn’t matter, it doesn’t let you know more than you can know.
u(y) can have no more significant digits or more decimal places than the u(x) with the worst resolution.
If what you are saying were true then a repeating decimal for u(y) would be considered to have infinite resolution – an impossibility.
bgx will have a load of nonsense for you to digest, oc.
He was correct, if the criteria are met. I don’t know if it can apply to resolution bounds, which is why I asked.
He believes averaging increases instrument resolution.
He is dead wrong.
I’m not sure you’re absolutely correct on that point.
Averaging a bulk measurement can give resolution of the individual items better than the instrument resolution.
The example in an earlier discussion was measuring the thickness of a new ream (500 sheets) of 90gsm printer paper (nominal 112 micron) with a micrometer with a resolution of 0.01mm (10 micron) vs measuring the individual sheets with the same instrument.
When it comes to combining measurements and taking an average, my educational background says the resolution provides a floor on the uncertainty. That’s only to undergrad level, so there are lots of things I don’t know.
“Averaging a bulk measurement can give resolution of the individual items better than the instrument resolution.”
Nope.
Taylor has two rules for this.
If these rules are not followed then you can get results like 6051.73 +/- 30 meters per second. If your uncertainty is 30 meters/second then how can you know what the values of the units, tenths, and hundredths digits are?
The true value can be *anywhere* in the uncertainty interval. Giving a stated value with more resolution than the measurements provide for is saying you know exactly where in the uncertainty interval the true value lies.
If this concept of statisticians and climate scientists were true then you wouldn’t need expensive micrometers. Just use a meter stick marked in centimeters to measure everything and average the results. You might have to make a lot of measurements but you would save a lot of money in equipment.
Here is the problem with doing that. Dr. Taylor made this point in his example. You must be absolutely sure that every piece in that collection is exactly the same.
By measuring the collection, you are bypassing a critical step. You don’t know what the standard deviation of the entire collection actually is. All you know is what the average size and average uncertainty is. If the individual pieces vary, you will never know what the actual distribution looks like.
You could end up with two sheets taking up half the stack and the other 498 sheets taking up the other half, and you’ll never know.
Quality control deals with this every day. What sample size is necessary to spot when things are going awry? It is why many processes have sieves, first one to eliminate the items that are too small and then a second of the right size to eliminate those that are too large.
That’s why I said a freshly opened ream of 90 gsm printer paper.
Unless it’s come from the Yum Cha Paper Company, each sheet should be within spec.
Agreed.
When we discussed this earlier, I had the second step of measuring each individual sheet with a 0.01mm micrometer (yes, using a micrometer has its own pitfalls).
The potted summary was that the sheets measure at 0.11mm +/- 0.005mm each, while the ream is 56.02mm +/- 0.005mm.
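Putting those numbers side by side in a few lines of Python (my arithmetic, using the figures above):

# Ream of 500 sheets of 90 gsm paper, nominal 112 micron (0.112 mm) per sheet
n_sheets = 500
ream, ream_u = 56.02, 0.005       # mm: single bulk reading and its half-resolution
sheet, sheet_u = 0.11, 0.005      # mm: what one sheet reads on the same micrometer

per_sheet = ream / n_sheets       # 0.11204 mm, but only if every sheet is identical
per_sheet_u = ream_u / n_sheets   # 0.00001 mm apportioned to each sheet

print(per_sheet, per_sheet_u)     # 0.11204 1e-05
print(sheet, sheet_u)             # 0.11 0.005 -- resolution-limited individual reading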
It basically comes down to well posed questions. Obviously paper companies have good control of their product and average width is a good choice. However, portraying a monthly average along with an average uncertainty as typical of every single day is la-la-land stuff and inappropriate. The days are far from being the same. Uncertainty is supposed to describe the dispersion surrounding the mean. An average uncertainty doesn’t really tell you anything.
He is not correct. I do not think or have ever advocated for the absurd assertion that averaging increases instrument resolution.
Then why do you support GAT anomalies in the milli-Kelvin range from temperature measurement devices recording temperatures in the units digit?
Averaging can give resolution better than the instrumental resolution, but only for bulk measurements.
Correct. It just doesn’t in anyway change the instrumental resolution.
“bulk measurements”
I.e., those employed for every evaluation of averages, trends, accelerations now under discussion?
Averages are statistical descriptors of a distribution of measurements.
Bulk measurements are NOT an average of a distribution. One can determine a common value of the constituent items but one must assume all the parts are identical. In other words the average of each part is the same and the uncertainty (standard deviation) of each part is equal.
This does not replace actual measurement of each piece
No. A bulk measurement is a single measurement of multiple items.
Yes. If you mean that the data used for monthly GAT anomalies, for example, is not a “bulk measurement”, because there are different temperatures used in the evaluation from different places, ok. But how are they less useful than the so-called “bulk measurements”?
In general, there seems to be lots of hand waving over the possible changes in results from using data that has unknown, systematic errors. For the scope of any eval, these errors can be:
The “Oh, we can’t use this data, because it has Big Foot systematic errors in it”, whine, without an honest assessment of how they could actually affect results, can be just as easily applied to any data evaluation.
Still the fact remains that GATs are meaningless metrics that cannot tell anything about climate.
And your #3 is an out-and-out lie.
Provide me with even an implausible example of nearly constant changes in systematic temp measurement errors, over a climatically significant period, large enough and long enough to qualitatively change trend and/or acceleration expected values.
Or deflect to your usual “word salad” whine. Gosh folks, wonder which….
Pat Frank demonstrated just such in liquid thermometers built before 1900.
And you still have to believe 10 milli-Kelvin changes can be discerned in integer degree-F data.
And since it was detectable – as any such trend over time would be – it is correctable.
Bullshit, blob—you still have zero understanding of non-random effects, which cannot be treated with statistics.
The referenced, totally uncited, article makes no mention of trend expected values changing. The closest Dr. Frank comes is by displaying his uncertainty “whiskers” and throwing up his hands, yelling “too much uncertainty”.
https://www.mdpi.com/1424-8220/23/13/5976
And has been demo’d here and elsewhere, he boned up his statistics.
AGAIN, it’s your little coterie agin the world. You, the Gormans, Dr. Frank, need to pay a little attention to the rule of Raylan.
“If you run into an asshole in the morning, you ran into an asshole. If you run into assholes all day, you’re the asshole.”
The point is, you stupid git, that the thermometer calibrations drift with time — you cannot blindly correct for effects like this. Only insane climate pseudoscientists believe it is possible.
Not unless you have a time machine. You have no way to assess any of the uncertainty from that far in the past. Correcting data to make it agree with current temperatures is a fool’s errand.
Climate pseudoscientists don’t need time travel, they are psychics.
Here is a table from the ASOS user manual. ASOS has been around since the early 1980’s. That is about 44 years, which meets the warmists’ requirement for a climate assessment period.
Now it’s your turn to show a study, any study, that has propagated the uncertainty shown in this table from single measurements to daily averages and on to monthly averages.
Then show a study that includes that uncertainty through an anomaly calculation. As you should know, the subtraction of two random variables requires the variances to be added. Even if the baseline average had zero variance, LOL, the anomaly would end up with a substantial variance from just the monthly average.
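As a toy illustration of that variance addition (my own made-up numbers; the ±1.8 °F is the ASOS repeatability figure quoted further down the thread):

import numpy as np

rng = np.random.default_rng(0)
u_month, u_base = 1.8, 1.8                  # deg F, treated here as purely random spreads

monthly  = 70.0 + rng.normal(0, u_month, 100_000)   # toy monthly averages
baseline = 68.0 + rng.normal(0, u_base, 100_000)    # toy baseline averages

anomaly = monthly - baseline
print(anomaly.std(), np.hypot(u_month, u_base))     # both ~2.55: the variances add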
All the ruler monkeys can manage in reply is to push the red button, no surprise.
This doesn’t even justify an answer. “Error” has been deprecated for almost half a century now. Get with the program and discuss your questions in terms of uncertainty.
He can’t even comprehend that jamming northern hemisphere and southern hemisphere temps (including anomalies) together results in a multi-modal distribution – and the average is useless in describing the distribution!
Nope!
Another inconvenient little detail that is just swept under the carpet.
It’s not a matter of usefulness, it’s the resolution limits.
1 bulk measurement of 1,000 items to a resolution of 0.1 x 10^1 gives much better resolution than measuring each of those 1,000 items to a resolution of 0.1 x 10^1.
Should those resolution uncertainties be added in quadrature or directly? That’s above my pay grade, but adding directly seems correct for resolution bounds. That way, averaging brings you back to the same range as the individual measurements.
I agree with your 1st half. And your discussion of the increased resolution available makes common sense.
But since I have never seen the term “bulk measurement” used to describe those collected and discussed in climatic evaluations, I misunderstood what you were referring to. The closest I’ve seen here are the boards in the back of a Gorman pickup truck, laid end to end.
Yer still a clown, blob, pushing the climate crazy agenda.
I see where you’re coming from, but those are multiple measurements rather than a bulk measurement.
The bulk measurement only has 1 uncertainty to worry about, and that can be scaled down by averaging.
Multiple measurements each have an associated uncertainty, which all have to be added before scaling down for the average.
The quote from Dr. Taylor’s book which Tim posted below covers the bulk measurement much better than I could.
Nbr 2. Random is not sufficient. It would also have to be Gaussian or at least symmetric. Even very skewed distributions can be random and the measurement uncertainty won’t cancel.
You’ve had this pointed out to you several times in the past. Does it just go in one ear and out the other?
Nbr 3. If you don’t know what is happening because of measurement uncertainty then you can’t know if a trend developed solely using stated values is correct or not. E.g. UHI *does* happen in one direction – and is more than sufficient to affect trends. The same thing happens with the measurement device enclosure. UV *will* affect the reflectivity of the enclosure over time which will also change the ambient temperature inside the enclosure and is more than sufficient to prevent distinguishing changes in the milli-Kelvin.
This has also been pointed out to you several times in the past. Does it just go in one ear and out the other?
Here are some quotes from the GUM.
0.4: The ideal method for evaluating and expressing the uncertainty of the result of a measurement should be:
The actual quantity used to express uncertainty should be:
Examine the bolded part closely. Uncertainty propagates and adds throughout a series of calculations.
The values used in measurement have international agreement in order to present common results. The determination and propagation of uncertainty have exacting methods so that people reviewing them can reliably understand what has occurred.
The fundamental statistical analysis of uncertainty relies on probability distributions when sufficient measurement information is available. The basic category of measurement is random variables which have a mean and a standard deviation. In measurement terms, a stated value and uncertainty.
Like it or not the uncertainty in measurements is real and scientific use of measurements requires following internationally accepted protocols. Creating something as simple as a linear regression must include a proper assessment of how uncertainty affects the end result.
He believes that non-random “errors” cancel if you cram enough of them together.
Just another verse from the “random, Gaussian, and cancels” chorus.
Sorry OC, that just isn’t true. Don’t construe that a bulk measurement provides better resolution of each individual piece. Again, the assumption must be that each piece is EXACTLY the same.
I think it was you that pointed out that the air gaps are included in the bulk measurement.
Lastly, you are going from a larger value and apportioning to each unit in the collection. That is not the same as taking a single measurement and extrapolating it to a larger collection by assuming it is representative of additional pieces.
Here is a page from Dr. Taylor’s book. Two things to notice: one, the requirement for all pieces to be identical; two, that the significant digits remain the same in the calculations. Consequently, you don’t really gain any resolution.
Yep, that’s interpolation vs. extrapolation.
Rule 0 – thou shalt not extrapolate.
Fair enough. Averaging doesn’t change the resolution. It just decreases the exponent.
That’s where I was headed until I took a wrong turn.
The rules apply equally to any type of uncertainty. The issue with resolution is that it can dominate over the random component of uncertainty resulting in high correlation between measurements. Think of a true value of 1.75 which is measured repeatedly with a resolution of ±0.5. The value will almost always get rounded up to 2 leaving you with ~0.25 error on each measurement attempt. You can still use the law of propagation of uncertainty but you’ll probably want to set your correlation value r(xi, xj) to a high enough value to compensate.
You still don’t grasp the basics of the subject, yet you pontificate (for the lurkers, hehehe) as if you do.
“Think of a true value of 1.75 which is measured repeatedly with a resolution of ±0.5.”
The rest of the world moved away from “true value +/- error” almost 50 years ago. But not bdgwx.
That 1.75 is a BEST ESTIMATE! It is *NOT* a “true value”. It will *never* be a true value unless you have a measuring instrument with infinite resolution (and if you have one of those how much do you want for it?).
Too bad you don’t ever show this in your assertions.
Really? Read this document.
https://www.isobudgets.com/calculate-resolution-uncertainty/#:~:text=When%20converting%20resolution%20uncertainty%20to
If you have a digital device that rounds the last digit and reads 1.75, your resolution uncertainty (with lots of assumptions) is 0.01/2 = ±0.005.
If your resolution is ±0.5, how do you justify estimating 1.75?
With a ±0.5 uncertainty, a digital device only reads to the units digits! And, that is only if it rounds accurately!
Similarly, an analog scale will have a resolution (graduations) of units.
And, both of these are for readings, not measurements where placing the index adds uncertainty as does recognizing where the mark lies on the device.
Watch this and see if you can reduce your ignorance.
https://m.youtube.com/watch?v=ul3e-HXAeZA&t=258s&ab_channel=ScienceShorts
That equally applies to a true value of 2.24.
All you can know about d = 2.0 +/- 0.25 is that 1.75 <= d < 2.25.
If it’s important to have a higher resolution, you need a higher resolution instrument.
<pedantry>With a resolution of .5, decimal readings give spurious precision. 2 +/- 1/4 would be preferable.</pedantry>
Actually with a ±0.5 the graduations on an analog scale will be in units, i.e., 2 ±0.5. Same for a rounding digital device. The smallest displayed digit will be in the units digit and provides a ±0.5 uncertainty due to the rounding. You’ll never know where the actual value was in the interval from 1.5 to 2.5.
Technically to get ±0.25, you would need graduations at the 0.5 positions. Then you would be estimating whether the reading was more than halfway to the next 1/2 graduation. Digital devices can’t do that. By adding one decimal digit (tenths) you move to a ±0.05 for a rounding device.
Yes, I misread “with a resolution of ±0.5.” as “with a resolution of 0.5.” Mea culpa.
Yes, I misread “with a resolution of ±0.5
No problem. With my eye problems sometimes it is hard to read numbers correctly. Thank God for autocorrect on text.
Again, the metrology experts here believe you can average your way to higher resolutions, regardless of the the instrument.
For integer resolution it is ±0.5 so it is 2 ± 0.5 with the ±0.5 being rectangular. So all you can know it is that it is 1.5 <= d < 2.5.
Yes, especially if measuring the same measurand. Since measurements of the same measurand will be highly correlated perhaps even approaching r(xi, xj) = 1 if the resolution uncertainty is the dominating component then averaging isn’t going to do much, if anything, for you here.
If, however, you are averaging different measurands then the measurements will be loosely correlated perhaps even approaching r(xi, xj) = 0. In that case the uncertainty of the average will be lower than the u(xi) = 0.29 (note that 0.5 rectangular is 0.29 standard). You can prove this out with a trivial monte carlo simulation in Excel.
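Rather than Excel, here is the same trivial Monte Carlo in a few lines of Python (a sketch of the two cases described above, nothing more):

import numpy as np

rng = np.random.default_rng(1)
n, trials, res = 100, 20_000, 1.0               # 100 readings, resolution of 1 unit

def read(true_vals):
    # A reading is the true value rounded to the instrument resolution
    return np.round(true_vals / res) * res

# Same measurand read 100 times: every reading rounds the same way, so the
# average never gets closer to 1.75 no matter how many readings are taken.
same = read(np.full((trials, n), 1.75)).mean(axis=1)
print(same.std(), abs(same.mean() - 1.75))      # 0.0 and 0.25

# 100 different measurands spread across the scale: the rounding errors are
# roughly independent and the spread of the average error shrinks.
true_vals = rng.uniform(0, 10, (trials, n))
err = read(true_vals).mean(axis=1) - true_vals.mean(axis=1)
print(err.std())                                # ~0.03, i.e. about 0.29 / sqrt(100)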
That’s why I used the ream of 90 gsm paper as an example.
The spec says the sheet thickness is 112 micron, a bulk measurement confirmed this, but all of the individual measurements were 0.11 mm.
Granted, there should be a high degree of correlation within a single ream, but any in-spec sheet of 90 gsm paper from any batch from any manufacturer should read 0.11mm even though bulk measurement of the reams they came from are in the range of 55.00mm – 57.00mm.
It is the correlation of the measurement uncertainties that affects the propagation of measurement uncertainties, not the correlation of the estimated stated values.
If you are measuring sheets of paper from different reams using different devices it’s not obvious how the measurement uncertainties would be correlated. If you are measuring different sheets of paper from the same ream with the same device then that would be considered to be an experimental data set and the standard deviation of the data would be considered to define the measurement uncertainty (Type A).
The resolution uncertainty is +/- the half intervals (or +/- half the final recorded digit).
Come to think of it, that is always the case, so probably means that the resolution uncertainty should always be propagated by addition rather than RSS.
In this case, we have hit the resolution limit of the measuring device, and all of the readings are 0.11mm (1.1e-4 m). The variance and s.d. are 0. Does the measurement uncertainty exclude the resolution uncertainty?
Consider using digital voltmeter in a measurement: in an uncertainty analysis of whatever the procedure it is part of, you need to quantify its uncertainty. The manufacturer doesn’t give you this on a platter, instead buried in the manual are tables of “error specs”, that depend on operating temperature (delta T from its calibration T), time interval since calibration (they drift), the particular voltage range you are using (the smaller the range, the larger the error band), and how close you are to full scale on the range. There are also specs for the A-D conversion, which should include the digit resolution. After constructing intervals for all of these, you then use RSS to get a combined uncertainty interval for the instrument.
After all this, then you can decide if the instrumental uncertainty is small enough in comparison with other uncertainty elements in the measurement procedure that you can neglect it in the final analysis.
Yep.
Resolution determines the decimal place where the uncertainty begins.
Assuming an analog scale the resolution uncertainty is half the resolution. So a LIG thermometer marked in integers would have a resolution uncertainty of +/- 0.5. To convert this to standard measurement uncertainty you divide it by sqrt(3). This *would* add to your measurement uncertainty budget for each measurement and so would get “added” by root-sum-square as part of the overall measurement uncertainty in a group of measurements. (You do similar things with digital scales but I’m not going to try and explain that here).
This is one reason why temperature measurements will *never* be totally random, Gaussian, and cancel. There will always be some resolution uncertainty with any instrument. The big question is whether or not the resolution limit is better than the differences you are trying to discern. To discern milli-Kelvin differences your resolution limit needs to be down the scale at the .001deg point. Otherwise your resolution uncertainty, when propagated to standard measurement uncertainty, is going to mask any differences in milli-Kelvin.
The average uncertainty of a set of measurements is *not* the uncertainty of the average. You can’t increase resolution by averaging, it’s a fools errand since the resolution uncertainty will always be the limit. You can only increase resolution by using better instrumentation.
Increasing resolution may not even be a panacea if the natural variation militates against it, i.e. the standard deviation of your data. You just find the edges of the standard deviation interval to a decimal point further along the scale. Percentage-wise it may be meaningless. That’s why using a $5000 micrometer to measure the diameter of a piece of house wiring is a waste of money!
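The conversion in numbers (a sketch of the steps just described; the 30 readings for a month are only an illustration):

import math

half_width = 0.5                       # LIG thermometer marked in whole degrees
u_res = half_width / math.sqrt(3)      # rectangular -> standard uncertainty, ~0.289

n = 30                                 # e.g. one reading a day for a month
u_sum_rss = math.sqrt(n) * u_res       # RSS of 30 equal components, ~1.58
print(round(u_res, 3), round(u_sum_rss, 2))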
Show how averaging different measurands will reduce the uncertainty. Have you not learned that the uncertainty in an average of different things is based upon the standard deviation of the average?
Resolution uncertainty also controls the number of significant digits that can be used. You don’t seem to have discovered that either.
But hey, he knows how to stuff stuff into the NIST uncertainty machine.
The covariance term has r(xi,xj) defined as
r(xi,xj) = u(xi,xj) / [u(xi)u(xj)]
It is the correlation of the measurement UNCERTAINTIES that is the issue, not the correlation of the stated values. It’s not obvious how the measurement uncertainties of measuring different measurands (temperature) using different devices would have significant correlation.
“If, however, you are averaging different measurands then the measurements will be loosely correlated”
How will they be loosely correlated? How are temperatures in the SH in winter loosely correlated to temperatures in the NH in summer? Are you trying to say that just because they are both temperatures that they are somehow correlated?
Remember, the covariance term in the propagation of measurement uncertainty is the covariance of the UNCERTAINTIES, not the covariance of the stated values. In other words the uncertainties have to be correlated. See Section 5.2.2, Eq 14 of the GUM.
r(xi,xj) = u(xi,xj) / [u(xi)u(xj)]
I would agree that there is typically some correlation between the uncertainties associated with temperature measurement stations because of similar designs but how significant it is is difficult to assess. In fact, I have yet to find any literature in climate science that even addresses the issue, mainly because climate science always assumes that the measurement uncertainty is random, Gaussian, and cancels. My considered opinion is that the correlation between the measurement uncertainties will be so close to zero as to actually be zero. Microclimate differences alone will ruin any correlation between the measurement uncertainties.
The typical example used is putting two different thermometers in your house, each at a different end of the house. The temperature readings will be correlated but the measurement uncertainties of the instruments will *not* be. When calculating the measurement uncertainty of the average value the uncertainties of each will add in quadrature with no covariance term. This same relationship holds when you have one thermometer at Location A and another one at Location B 100 miles away. The temperatures may show seasonal correlation but their uncertainties will remain uncorrelated and will add in quadrature.
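Written out for the average of two readings, using the correlation term the GUM defines in 5.2.2 (a sketch with the sensitivity coefficients of 1/2 put in by hand):

import math

def u_mean_of_two(u1, u2, r):
    # Law of propagation for y = (x1 + x2)/2; c1 = c2 = 1/2
    c1 = c2 = 0.5
    return math.sqrt((c1 * u1)**2 + (c2 * u2)**2 + 2 * c1 * c2 * u1 * u2 * r)

u = 0.5
print(u_mean_of_two(u, u, r=0.0))   # ~0.354: uncorrelated, adds in quadrature
print(u_mean_of_two(u, u, r=1.0))   # 0.5: fully correlated, no reduction at all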
Some people take the time and make an effort to study the subject they are making assertions about.
Others just throw stuff at the wall to see what might stick. You’d think they would get tired of seeing everything slide to the floor! These folks never, ever show any resource material to support their assertions. Pretty much trolling with red herrings.
Averaging can not increase resolution beyond what was actually measured.
You quickly fall into the trap that dividing by √n makes the standard deviation of the mean smaller and smaller, thereby justifying the inclusion of more and more decimals to the value of the mean. This is one reason for resolutely holding to the practice of significant figures.
At one time in the past Washington University at St. Louis had a lab course document discussing this and it was so profound I saved it.
Every equation in the GUM adds uncertainties. The squaring of uncertainties ensures there are no minus signs. Relative uncertainties use absolute values to ensure ratios are always positive.
If you see a value of 12 ±0.0001, alarm bells should go off. If you see a result of an average of 12.001 ±0.0001 when the data has no decimals, alarm bells should go off. That is what made my bells ring many years ago about climate science.
The implication of using 1 / √n is that u approaches zero as n goes to infinity. This is clearly absurd, except in climate science.
Anytime you see a dividing by n or the sqrt(n) in measurement uncertainty analysis you know someone is fiddling with the accuracy of their measurements.
Even for repeated measurements of the same thing using the same device under the same conditions the average value of the measurements is *still* just the BEST ESTIMATE of the measured property of the measurand. It is *not* a true value. It will *never* be a true value no matter how many measurements you make because as Bevington points out the more measurements you make the higher the chance of encountering outliers.
Nor is the average uncertainty (i.e. dividing by n) the uncertainty of the average. The average measurement uncertainty is nothing more than the value that, when multiplied by n, will give the same total uncertainty as propagating the element uncertainties. The average uncertainty is nothing more than one element of a statistical description of the measurement uncertainty distribution. It is *NOT* the accuracy of the average value.
Dividing the total uncertainty by sqrt(n) is no better. The total measurement uncertainty is what tells others about the accuracy of your determination of the property of a measurand, it is the dispersion of the reasonable values that can be assigned to the measured properties of the measurand. And it goes with the BEST ESTIMATE of the value of the measurand. That dispersion of reasonable values that can be attributed to the measurand is not reduced by the number of measurements you take, if that were true there would be no need for micrometers, only meter sticks marked with centimeter tick marks. You could get any accuracy you want for any measurement by just taking enough measurements.
This is why a great deal of climate science is basically fraudulent.
Yep.
That might depend on how the average was derived.
If the 12.001 is derived from 1000 measurements where 999 of them were 12 and 1 was 13, the average may be legit.
The uncertainty is 3 orders of magnitude too low, though.
There are any number of combinations which will give an average of 12.001, so one would want at least the s.d. Sample size would be handy, as would the mean and mode.
A mean is just numerator/denominator, so it’s legitimate to have the number of decimal places in line with the order of magnitude of the denominator. All that 12.001 is saying is 12001/1000 (or 24002/2000, or 60005/5000 or …)
The only way this could occur is if someone ignores significant digits.
“If the 12.001 is derived from 1000 measurements where 999 of them were 12 and 1 was 13, the average may be legit.”
Only to a statistician or climate scientist. If the measurement data is recorded in the units digit then the average should have no more decimal places than the units digit. You simply don’t *know* anything beyond the units digit, it’s part of the Great Unknown.
If you have 999 measurements at 12 and one at 13 then serious, VERY serious, consideration should be given to classifying that data point as an outlier and dropping it from the distribution. As Bevington points out the more measurements you make the higher the chances are of generating outliers. Both Bevington and Taylor have sections on how to identify and eliminate spurious data resulting in outliers. If I remember correctly anything outside 3 or 4 standard deviations should be considered for elimination IF you are sure your distribution is Gaussian. I say “considered” because it may very well be legitimate data but that must be evaluated and an informed judgement made. I would also point out that dropping the outlier will have a larger impact on the variance than on the mean since its difference is squared when calculating variance.
Bevington also points out that in a large number of measurements, even of the same thing using the same instrument, it’s hard to eliminate all systematic uncertainty. Not all systematic uncertainty is calibration bias. Even a particle of dust in the bearings of a balance beam can generate a systematic bias for those measurements made between the time the dust particle lands and when it continues on its travels. A momentary draft affecting the clock frequency in a high resolution frequency counter can generate a temporary systematic bias because of clock jitter.
Again, you can’t know what you don’t know to begin with. No amount of averaging will lift the veil and let you see into the Great Unknown.
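To make the screening step concrete, here is a minimal n-sigma check (my own illustration, not Bevington’s or Taylor’s exact procedure) run on the 999-at-12, 1-at-13 example from upthread:

import numpy as np

data = np.array([12] * 999 + [13], dtype=float)
mean, sd = data.mean(), data.std(ddof=1)

k = 4                                           # screen at 4 standard deviations
suspects = data[np.abs(data - mean) > k * sd]
print(round(mean, 3), round(sd, 3), suspects)   # 12.001, ~0.032, [13.]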
It is statistically valid.
If you are treating this as multiple measurements of a single object, I will agree. If it is the average of measurements of O(10^3) objects, I don’t.
That is precisely why you don’t want to discard the information contained in the average.
If the average is rounded to 12, it won’t tell you that there is something further to investigate.
If we have 999 widgets which measure 12 units, and 1 which measures 13, we want to know that we need to look more closely.
Even more so if we have 500 at 10, 499 at 14 and 1 at 15.
That is also why we want the 3 Ms and the variance or s.d.
Just like taking measurements with a 0.0001″ micrometer 🙁
“It is statistically valid.”
Not in the real worlds of metrology and physical science. It gives the impression of resolution that is simply fraudulent.
“If you are treating this as multiple measurements of a single object, I will agree. If it is the average of measurements of O(10^3) objects, I don’t.”
Nope. It applies to both. It isn’t an issue of what you are measuring, it is an issue of the measurement process and the instrumentation being used. You simply can’t know what you don’t know. If the measurements were being made with an instrument capable of resolution in the thousandths digit then the recorded values should be given out to the thousandths digit – you would have things that you *know* in that case and the statement of the average would be legit out to the thousandths digit.
“That is precisely why you don’t want to discard the information contained in the average”
If the data doesn’t provide that information then the average cannot provide it either. The *only* reason to go one more digit past what you know is to reduce rounding error in interim calculations. The final answer, however, should have no more resolution than that provided by the data.
“If the average is rounded to 12, it won’t tell you that there is something further to investigate.”
That’s because there is nothing further to investigate. Going further is claiming you can see through the veil separating reality from the Great Unknown. It’s how carnival fortune tellers make their living.
“If we have 999 widgets which measure 12 units, and 1 which measures 13, we want to know that we need to look more closely.”
Probably not. Again, as Bevington specifically points out, outliers are a fact of life, especially when you are making a large number of measurements. That’s how you can justify looking to eliminate outliers that are clearly spurious. That outlier can legitimately be caused by systematic bias (temporary) that will never be able to be quantified. This is actually *more* probable in the measurement of different things using different instruments than it is in multiple measurements of the same thing using the same instrument.
“Even more so if we have 500 at 10, 499 at 14 and 1 at 15.”
The proper place to indicate this is in the dispersion of reasonable values that can be attributed to the measurand, i.e. the uncertainty interval, and not in showing decimal places in the stated value that you can’t possibly know. The proper given value would be 12 +/- 2.
And in the world of process control charting, those 1000 widget measurements are made to determine if the widget manufacturing process is under control, and if the widget measurement procedure is under control. 1 out of 1000 being 8% high is a flag which tells you there is something out-of-control, and that it needs to be investigated. Outlier determination can be a tough nut to crack in control charts.
Averaging those 1000 values tells you nothing.
“something out-of-control, and that it needs to be investigated.”
There should be a method for something like that to be pulled off the line for failure analysis. Otherwise it is impossible to determine if it is a true outlier or not. It may not even be a problem on the line but in the material being fed into the line – meaning it would need to be worked out with a supplier and not by troubleshooting the production line.
This is the world of the process control engineer, it hinges on charting on an on-going regular basis, otherwise something can change and be missed.
And yes, on a line, the engineer needs access to all the information about anything that can affect the process: materials, personnel, history, etc.
On the measurement side, a lab should have control samples that are regularly put through the measurement process and the results plotted versus time. These can tell you a lot, but the temptation is to pass because they can be seen as not contributing to the bottom line.
ASTM has a bunch of good standards on control charts, tells you how to draw limit lines, for one. Outlier determination can be quite difficult, and also requires honest estimates of the uncertainty limits. The climate pseudoscientists who go to any lengths to get uncertainty numbers as small as possible would not survive.
That’s probably what Edward Lorenz thought 🙂
I don’t think that’s how it’s usually done. The mean is still 12 (or 12.001) +/- 0.5, with an s.d. of 2 (or 2.001).
Should the s.d. be 2 +/- 0.5, since the measurements are all +/- 0.5?
It also has a mode of 10 +/- 0.5 and a median of 14 +/- 0.5.
There are lots of warning bells there.
Getting back to the earlier 999 at 12 and 1 at 13, the median would be 12.5, which would also alert us.
It all boils down to this: giving just the mean is only slightly more useful than a chocolate teapot.
“I don’t think that’s how it’s usually done.”
That is exactly how it is done if you are measuring the same thing multiple times using the same instrument under repeatability conditions. In that case the sd is the measurement uncertainty.
Remember what the purpose is behind the statement of the measurement value. It’s to communicate to others what they can expect to see if they perform the same measurement.
If you are measuring 1000 different things using different measuring devices under different conditions then the uncertainty is the root-sum-square of the individual measurement uncertainties.
In neither case is the uncertainty of the mean the average measurement uncertainty of the measurements.
“Should the s.d. be 2 +/- 0.5, since the measurements are all +/- 0.5?”
Not with the problem formulated as it is. The sd is the measurement of the dispersion of the data values. If you are measuring different things with different instruments then the resolution uncertainty should be part of the measurement uncertainty budget for each individual measurement which then get added in root-sum-square.
“It also has a mode of 10 +/- 0.5 and a median of 14 +/- 0.5.
There are lots of warning bells there.”
Yep. It means the distribution is *NOT* Gaussian. The average and sd are not sufficient to describe the distribution. The kurtosis and skewness should be calculated or the 5-number statistical description should be used.
If I looked at a histogram of this distribution (500 at 10, 499 at 14, and one at 15) on a graph I would probably tell you that you have a multi-modal distribution with one mode at 10 and a second mode at 14. The first thing I would ask is whether or not you were measuring the same thing or different things – e.g. were you measuring the heights of a mix of 25 gallon and 30 gallon drums?
“It all boils down to this: giving just the mean is only slightly more useful than a chocolate teapot.”
Yep. Which is why the GAT is so useless without a propagation of the associated variances right along with the averages.
Thanks. I’ll file that one away.
This as well. So, the s.d. is calculated from the stated values?
Yeah, charting the actual data set tells you a lot more than the summary statistics do.
Another one might be eccentricity and taper of a cylinder, or thicknesses through the axes of a rectangular cube. Context is useful as well.
That reminds me of an old joke.
Two veterinarians are arguing about the merits of their favourite beer while enjoying a few of them.
They decide to send samples to the lab for analysis to prove which is best.
A few days later, the results come back – both horses have diabetes.
“This as well. So, the s.d. is calculated from the stated values?”
If you are measuring the same thing multiple times using the same instrument (assumed to be calibrated) under the same conditions. The only uncertainty should be random. If that uncertainty can be assumed to be Gaussian, e.g. same person reading the same analog dial for each measurement, then that random measurement uncertainty should cancel, leaving the dispersion of the measurements as the only indicator of the measurement uncertainty. See Possolo’s treatment of 20 maximum temperatures. He assumed no systematic uncertainty in the measuring station and that any variation from random fluctuations would cancel. Thus the dispersion of the stated values would describe the measurement uncertainty.
This is a *lot* of assumptions to make. Possolo was using this as a *teaching example* of how to handle experimental data. If the assumptions cannot be justified then you must handle the measurement uncertainty differently. Temperature measurements from around the globe simply don’t meet any of the necessary assumptions to justify assuming all measurement uncertainty is purely random and cancels.
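A sketch of that Type A treatment with made-up numbers (placeholders; I don’t have Possolo’s twenty values to hand): under the stated assumptions the dispersion of the recorded values is itself the indicator of the measurement uncertainty.

import statistics as st

# Hypothetical daily maximum temperatures, deg C -- stand-ins, not Possolo's data
tmax = [24.1, 25.3, 23.8, 24.9, 25.0, 24.4, 23.9, 25.6, 24.7, 24.2,
        25.1, 24.8, 23.7, 24.5, 25.2, 24.0, 24.6, 25.4, 24.3, 24.9]

mean = st.mean(tmax)
s = st.stdev(tmax)        # dispersion of the stated values (Type A evaluation)
print(round(mean, 2), round(s, 2))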
“A few days later, the results come back – both horses have diabetes.”
Lovely! I haven’t heard that one! Yep, context is everything!
Many folks read the GUM and, because it shows how to calculate an experimental standard deviation of the mean, assume that it is an appropriate uncertainty value.
As Tim points out, if you read it closely and really study it and other documents, one realizes that it is only appropriate under repeatable conditions.
Repeatable conditions INCLUDE using the SAME THING as the measurand. Things like weighing a single mass, the length of the same rod, or the thickness of a single sheet of paper.
Once you pick up another thing, even if similar, assessing repeatability uncertainty from a statistical analysis of measurements goes out the window. You are now into reproducibility uncertainty. You are left with assessing the repeatability via a Type B evaluation.
It is worth noting that ISO requires all three of these items to be included in an uncertainty budget (among other items).
It is why reproducibility uncertainty in daily temperatures is usually the dominant uncertainty.
I think it is important to recognize that NOAA has assessed the repeatability uncertainty of ASOS devices to be ±1.8° F. If one truly assesses uncertainty, that uncertainty should be added to the reproducibility uncertainty.
I have not been able to find any information on what goes into the 1.8 figure, but I suspect it is a sum of all the uncertainties that can be allocated to single readings of temperature.
On further thought, that doesn’t seem right.
If, instead, all 1000 readings were 12, the s.d. would be 0, and you would have 12 +/- 0. Unless the resolution uncertainty of +/- 0.5 is implicit, of course.
Is this a case where RSS should be used to propagate the uncertainty from the 2 sources?
The 12 +/- 2 would still be 12 +/- 2 (well, 2.06), but the 12 +/- 0.5 would be unchanged.
“If, instead, all 1000 readings were 12″
If all the readings were exactly the same then resolution uncertainty doesn’t really apply does it?
Or are you saying that some readings were actually 11.9 rounded to 12 and some were 12.3 rounded to 12? If that is the case and the instrument actually reads in the tenths then the readings should have been recorded in the tenths and the measurement uncertainty propagated appropriately.
This is why all the tomes on metrology emphasize that a measurement plan on critical items should be laid out in detail including both the instrumentation, the environment, any measurement uncertainty factors, and what is to be done to minimize systematic measurement uncertainty.
Taylor says the stated value should have no more decimal places than the measurement uncertainty and that measurement uncertainty should *usually* only have one significant digit. If you have +/- 0.5 measurement uncertainty (i.e. resolution uncertainty is the only component) then the stated value would typically be stated as 12.0 +/- 0.5. A stated value of 12 would usually imply a measurement uncertainty in the units digit, e.g. 12 +/- 1, at least in my experience.
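That convention is mechanical enough to put in a few lines (a sketch of the rule of thumb as stated above, not a general-purpose routine): round the uncertainty to one significant digit, then round the stated value to the same decimal place.

import math

def report(value, u):
    # One significant digit on u, then match the value's decimal place to it
    exp = math.floor(math.log10(abs(u)))
    digits = max(-exp, 0)
    return f"{round(value, -exp):.{digits}f} +/- {round(u, -exp):.{digits}f}"

print(report(12.0347, 0.5))   # 12.0 +/- 0.5
print(report(6051.73, 30))    # 6050 +/- 30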
If all the readings are the same, then you have reached the precision limit of the device. In order to detect the actual variation of the measurand, you will need a better device that will indicate smaller changes.
Here is a good document.
WHAT DO YOU MEAN BY ACCURACY?
Microsoft Word – D_Basic.doc (gagesite.com)
My guess is that not a single one of the statisticians on here trying to justify the measurement uncertainty as being the “average” measurement uncertainty has even the slightest idea of what this document is covering. The same applies to those climate scientists trying to justify sampling error as being the measurement uncertainty of the GAT.
Some important excerpts:
“Precision (also known as repeatability), is the ability of a gage or gaging system to produce the same reading every time the same dimension is measured. A gage can be extremely precise and still highly inaccurate. Picture a bowler who rolls nothing but 7-10 splits time after time. That is precision without accuracy.”
This is *NEVER* considered in climate science or statistics. Everything is always 100% accurate.
“Closely related to precision is stability, which is the gage’s consistency over a long period of time. The gage may have good precision for the first 15 uses, but how about the first 150? All gages are subject to sources of long-term wear and degradation.”
This is *NEVER* considered in climate science. Once calibrated the measuring station remains calibrated forever.
“The fixture establishes the basic relationship between the measuring instrument (that is, a dial indicator) and the workpiece, so any error in the fixture inevitably shows up in the measurements.”
This is totally ignored in climate science although Hubbard and Lin had an important paper on this clear back in 2002.
“Annual calibration is considered the minimum, but for gages that are used in demanding environments, gages that are used by several operators or for many different parts, and gages used in high-volume applications, shorter intervals are needed. Frequent calibration is also required when gaging parts that will be used in critical applications, and where the cost of being wrong is high.”
Attempting to identify differences in temperature down to the milli-Kelvin means that frequent calibration is needed. Another requirement that is ignored by climate science and statisticians.
“Closely related to precision is stability, which is the gage’s consistency over a long period of time. The gage may have good precision for the first 15 uses, but how about the first 150? All gages are subject to sources of long-term wear and degradation.”
Blasted US spelling. A gage is a static reference, such as a gage pin or a gage block.
A gauge is a measuring instrument, such as a dial gauge.
I can accept a gauge going out of whack, but my set of gage blocks shouldn’t be made of plasticine.
Why not? We can’t know what’s below the unit figure measured, so anything 11.5 <= dimension < 12.5 will be read as 12.
The instrument used to take the measurement can only read to the units place. Like the ream of paper example, an instrument which can read to the tenth of a unit may well read something in the first decimal place. That doesn’t alter the actual readings, which were to the units place.
That’s another one to file away. I would have thought that 12.0 implies 12.05 to 12.15 if the +/- isn’t stated.
D’oh!
11.95 to 12.05
Another piece of evidence for a lack of scientific training. ’12’ by itself indicates a measurement to the units digit. ‘12.0’ indicates a measurement to exactly ‘.0’. It is no different than a measurement of ‘12.2’.
That’s a good point. We’ve been stumbling about in a fog of imprecise terminology.
If we use proper notation like civilised human beings:
12 should be 1.2 x 10^1, and 12.0 should be 1.20 x 10^1. Similarly, 12.2 is 1.22 x 10^1.
That makes the implicit resolution bounds readily apparent.
Averaging doesn’t improve resolution – my bad.
What it does is decrease the exponent.
Scientific notation has many excellent things to recommend it. Not the least important is the recognition of significant digit rules and requirements.
So, with the example of the average of 1,000 (a count, so exact number) entries measured to the units place which sum to 12,001 (1.2001e4), should the average be expressed as 1.2001e1, or to the resolution of the measurements (1.2e1)?
The first retains the significant digits, the second retains the measurement resolution.
If I read your problem correctly, you measured 1000 things to the units digit, i.e., plus or minus 0.5.
Then you want to find the average by summing the measurements.
When summed, you have a computation of 12001. Which is 1.2001e4.
Dividing by 1000 to obtain an average would give 1.2001e1. You are not adding or subtracting significant digits in this calculation.
However, the uncertainty of the 1.2001e1 would be calculated by propagating each individual uncertainty to the total.
You see, every average has a standard deviation.
A standard deviation can be 0, if the items are all exactly the same or if the resolution of the measuring device is not capable of distinguishing the very small changes.
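Here is that arithmetic written out (a sketch; it stops at the uncontested steps, since how the propagated uncertainty then attaches to the average is exactly what is being argued in this thread):

import math

readings = [12] * 999 + [13]        # each recorded to the units digit, +/- 0.5
total = sum(readings)               # 12001
mean = total / len(readings)        # 12.001

u_reading = 0.5
u_total_rss = u_reading * math.sqrt(len(readings))   # ~15.8 for the sum, in quadrature
print(total, mean, round(u_total_rss, 1))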
This is a good point. The uncertainty is not directly related to the stated value. If you measure each item to ±0.5, that is the uncertainty you must propagate (through RSS probably).
In a thread with Tim and bdgwx above, Tim pointed out that the correlation is tied to the measurement uncertainty. Once you hit the instrument resolution limit, the correlation of the measurement uncertainty will be 1. That seems to indicate that the uncertainty should be propagated directly rather than through RSS.
The correlation of the RESOLUTION UNCERTAINTY may be 1. The total measurement uncertainty will not be correlated.
The resolution uncertainty is folded into each individual measurement’s measurement uncertainty and that is propagated RSS.
It’s just like systematic uncertainty in a measurement. It gets folded into the total measurement uncertainty and propagated RSS. In fact, it would seem that resolution uncertainty *is*, in fact, a systematic uncertainty that cannot be eliminated. u(total) = u(random) + u(systematic). It is u(total) that gets propagated, not u(random) or u(systematic) separately.
All you can do about resolution uncertainty is reduce it by using instruments with a higher resolution. You can’t eliminate it but you can make it smaller than the other measurement uncertainties in the budget so it has little impact. I believe Taylor and Bevington both address this but I don’t have time right now to look up their quotes.
Doesn’t that depend on which source of uncertainty dominates?
In the ream of paper example, we know that the resolution limit of the instrument has been reached because all the readings are identical.
The s.d. is 0, so that doesn’t seem to be an appropriate uncertainty of the mean.
Adding the resolution uncertainties directly gives a total uncertainty of 500 * +/- 0.005mm (2.5 mm). Averaging brings it back to 0.005mm.
Using RSS, we have a total uncertainty of sqrt(500 * 0.000025) = sqrt(0.0125) = 0.11mm, which is coincidentally the same as the measurement.
Averaging that, we have 0.00022mm.
We know from other measurements that the average paper thickness is 0.112mm, so 0.11mm +/- 0.005mm seems to cover that just as well as the 500 measurements of 0.11mm +/- 0.005mm.
The total RSS uncertainty seems way too broad, and the average RSS uncertainty way too narrow.
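For reference, the two routes described above, in numbers (just the arithmetic as stated; which route is appropriate is the open question here):

import math

n, u = 500, 0.005                    # sheets and per-reading resolution bound, mm

direct_total = n * u                 # 2.5 mm
rss_total = math.sqrt(n) * u         # ~0.112 mm

print(direct_total, direct_total / n)                 # 2.5 and 0.005
print(round(rss_total, 4), round(rss_total / n, 6))   # 0.1118 and 0.000224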
That can’t be done with measurements which have already been recorded. They are what they are and that’s all what they are.
Those could be quite enlightening.
After thinking about bulk measurements such as paper thickness, I believe it is appropriate to discuss what a measurement is.
A measurand is something that can be measured and assigned a quantity along with an uncertainty value associated with the measurement.
Some phenomena are sufficiently small that a bulk measurement can be made. Are they of value? If you are comparing bulk measurements, that is probably of value.
Is trying to interpolate from a bulk measurement to individual pieces of value? Only if the assumption is made that all the pieces are exactly the same. That may be appropriate in some circumstances such as a ream of paper.
However, if the measurand is the width of each piece, that can only be assessed by measuring each piece. The estimate obtained from the bulk measurement will be useful in determining the requirements of the measuring device used for each piece. Measuring each piece will provide a distribution of the whole which can be statistically analyzed to obtain a mean and a standard deviation. Then one can determine if the aggregation of pieces has an appropriate mean and total uncertainty.
Lastly, the bulk measurand must be measurable. This will rule out many phenomena. For example, how do you measure bulk temperatures? Say for a month? You are basically stuck with individual measurements!!
That will be the best case. Anything else will have greater uncertainty.
This is what Bob and I discussed earlier. What you’re describing is multiple measurements rather than a bulk measurement. A bulk measurement of width would be the width of the roll of paper before it goes through the knives. Of course, there will be some loss due to cutting. It’s not applicable to a ream in any case.
Yep, the aggregation must be measurable as a unit. Temperatures for a month are multiple individual measurements like your paper widths.
The average paper thickness benefits from the reasonable assumption that the relative uncertainty for each individual sheet is the same as for the ream.
“Doesn’t that depend on which source of uncertainty dominates?”
Sort of. You have to use an appropriate instrument for the differences you are trying to discern. The total measurement uncertainty is going to include all of the foibles of the instrument and testing procedure, e.g. calibration drift, temperature changes, lead resistances, pressure of the contact heads, etc. If you are using an appropriate instrument the resolution uncertainty should not dominate.
“The total RSS uncertainty seems way too broad, and the average RSS uncertainty way too narrow.”
Taylor describes direct addition as the worst possible case and RSS as the best possible case. The actual uncertainty is usually somewhere in between in the real world.
“That can’t be done with measurements which have already been recorded. They are what they are and that’s all what they are.”
And if they aren’t sufficient to discern the difference you are looking for then they aren’t fit for purpose. Either get different data or find something else to do. It’s why climate science always assuming that measurement uncertainty cancels is such a joke. Substituting the standard deviation of the sample means as a metric for measurement uncertainty is only fooling yourself. Trying to say the average uncertainty is the measurement uncertainty of the average is the same thing. The measurement uncertainty of the average is the measurement uncertainty of the sum since it is the sum that is used to find the average and the number of data elements has no uncertainty to add.
if q = Σx then
q/n = Σx / n is the average
[u(q/n)/(q/n)]^2 = [(1/n) * u(Σx)/Σx ]^2 + [u(n)/n]^2
==> [u(q/n)/q]^2 = (n^2) * [u(Σx)/Σx]^2 / (n^2)
==> u(q/n)/q = u(Σx)/Σx
u(q/n) = q * u(Σx)/Σx = Σx * [u(Σx)/Σx]
u(q/n) = u(Σx)
The n’s cancel. You can’t just redefine q/n as being just “q” like bdgwx and bellman like to do.
The uncertainty of the mean is just the uncertainty of the sum, not the average uncertainty. This matches with the variance of a distribution being the metric for the uncertainty of the average, i.e. the range of values that can reasonably be attributed to the mean. The measurement uncertainty of the average is *not* how closely you can calculate the mean.
The GUM Section 4.2 is where much attention is given. It covers Type A uncertainty analysis. The analysis is based on multiple observations that form a random variable which has a mean and a standard deviation.
Taking a series of observations and defining the measurand to be an average of these observations is nothing more than finding the mean of a series of observations. Like it or not, the uncertainty of that series of observations is the standard deviation of the series.
You end up with a Type A evaluation of that series of observations which you have already declared as a measurand!
And then accuse you of “algebra mistakes”!
Resolution uncertainty is the worst possible case.
Even if there are no other sources of uncertainty, all that is known is that the measurement is somewhere within 1/2 resolution unit either side of the stated value. The only way to do better is to use a higher resolution measuring instrument.
It comes back to your example of taking 1,000 measurements of a big end journal using a yardstick.
Shouldn’t the relative uncertainty of the mean be the same as the relative uncertainty of the sum?
Yes, if relative uncertainty is the appropriate measurement. If you are connecting beams using fish plates to span a distance it is the absolute measurement uncertainty that is of most importance. If q = Σx then u(q) = u(Σx). Relative uncertainty winds up being u(q)/q = u(Σx)/Σx. With substitution this is u(q)/Σx = u(Σx)/Σx ==> u(q) = u(Σx).
It’s the same for adding grains of gunpowder into a shell casing. u(q) = u(Σx).
Relative uncertainty becomes important when the total has components with vastly different sizes or different units. E.g. when calculating density, mass divided by volume. You have to use the relative uncertainties (which are percentages and are dimensionless) in this case. The relative uncertainty of the density is the sum of the relative uncertainties of each component.
Don’t get confused by the names “uncertainty” and “relative uncertainty”. The total uncertainty is still a sum of the component uncertainties.
Shouldn’t you be able to losslessly switch between absolute and relative uncertainty? They’re just different ways of showing the same thing – like
How do you do that with components that have different dimensions?
Consider Possolo’s analysis of the volume of a barrel, v = πHR^2. The dimensions are H in m and R^2 in m^2. You can’t directly add components where they have different dimensions. Since percentages are dimensionless you can convert to percentages (i.e. relative uncertainties) and add those.
u(v)/v = u(H)/H + u(R)/R + u(R)/R = P
u(v) = P * v.
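With some made-up dimensions (mine, not Possolo’s actual barrel), the arithmetic looks like this:

import math

H, uH = 1.00, 0.01        # m, hypothetical height and its uncertainty
R, uR = 0.30, 0.005       # m, hypothetical radius and its uncertainty

v = math.pi * H * R**2
P = uH / H + uR / R + uR / R      # relative uncertainties add (R counted twice)
print(round(v, 4), round(P, 4), round(P * v, 4))   # ~0.2827, ~0.0433, ~0.0123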
You’ve answered your own question.
I’ll disagree with Tim.
The uncertainty of the mean is the standard deviation of the sample means (or what it would be if you had multiple samples). It is the dispersion of a sample means distribution which describes where the mean may lie.
The uncertainty of the sum is the standard deviation of the sum.
They are two different things whether you are discussing full values or relative values.
Perhaps I am misunderstanding your definitions. If so, let me know.
I didn’t think of the SEM being involved. Just the measurement uncertainty of the mean, not the sampling error of the mean (i.e. the SEM).
I think you just agreed with The Mathematicians(R)
It seems you are.
Tim said:
which prompted the question about the relative uncertainty vs absolute uncertainty.
and:
Not really. The mean is one of the measures of centrality of the (sample|population). The standard deviation is the equivalent measure of dispersion. They are complementary measures.
You are assuming everything is Gaussian. What does the standard deviation tell you about a skewed distribution? About a multi-modal distribution?
For a Gaussian:
standard deviation = measure of dispersion
variance = measure of dispersion
measurement uncertainty = dispersion of values that can reasonably be assigned to the measurand.
If the average is the estimated value of the measurand then the variance of the distribution has a direct impact on the values that can reasonably be assigned to the average.
It tells you about as much as the mean does. Neither are much use in a skewed or multi-modal distribution.
The derivation of the variance and hence standard deviation make this rather readily apparent.
As I said:
The bare minimum set of summary statistics is mean, median, mode, standard deviation or variance, and sample size.
I would add the quartiles. The 5-number statistical description is min, max, median, quartile 1, and quartile 3 and will work for almost all distributions, even skewed ones. The mode is needed if the distribution is triangular at all.
The *big* problem with climate science statistics is that jamming two different populations together many times generates a bimodal distribution. E.g. cold southern hemisphere temps with warm northern hemisphere temps. The mean of that bimodal distribution is not very revealing nor is the standard deviation. Using anomalies doesn’t help much because the variance of cold temps is different than the variance of warm temps. Jamming the anomalies together generates a skewed distribution where the mean and standard deviation are not very useful as descriptive statistics. This doesn’t even begin to address the problem of jamming monthly averages together into an annual average where some months are cold and some are warm and therefore have different variances.
I can’t begin to express how dismaying it is to see climate science assume everything is Gaussian and then even ignore standard deviation of the data while focusing only on the “average”.
Why quartiles 1 and 3?
On a skewed distribution, the inter-quartile range is similar to the standard deviation for a Gaussian distribution. The inter-quartile range is the middle half of the data. Quartile 1 is the 25% mark and Q3 is the 75% mark.
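To see it with numbers, here is a small sketch in Python using a synthetic right-skewed sample of my own making (not anyone's real data): the five-number summary and IQR describe the shape, while the mean and SD alone mislead.

import numpy as np

rng = np.random.default_rng(1)
# Synthetic, strongly right-skewed sample (illustrative only)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

q1, med, q3 = np.percentile(x, [25, 50, 75])
print(f"mean = {x.mean():.2f}, std = {x.std(ddof=1):.2f}")   # both pulled up by the long tail
print(f"min = {x.min():.2f}, Q1 = {q1:.2f}, median = {med:.2f}, "
      f"Q3 = {q3:.2f}, max = {x.max():.2f}")
print(f"IQR = {q3 - q1:.2f}  (the middle half of the data)")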
Ah, of course. Thanks.
That’s one of the reasons for using anomalies, but that’s a bit shaky. It might be useful if everything is warming, but probably not if the variability is increasing.
Yeah, the current derivation of anomalies leaves something to be desired. They really need to be normalised or standardised as well as the current re-baselining..
Don’t forget changes 2 orders of magnitude smaller than the resolution bounds of the underlying measurements.
“Yeah, the current derivation of anomalies leaves something to be desired. They really need to be normalised or standardised as well as the current re-baselining..”
yep.
“Don’t forget changes 2 orders of magnitude smaller than the resolution bounds of the underlying measurements.”
Yep.
Oh, cool. I’m -1.
We must have a statistics denier 🙂
Relatively simple rules that climate science goes to great lengths to ignore.
You can’t. Doing so is nothing more than seeing a swirl in the fog of a crystal ball.
You don’t know what you don’t know. You *can’t* know what you don’t know.
There is a reason why physical science has the significant digit rules. It’s only statisticians and climate scientists that claim they can see into the Great Unknown and discern what they can’t know.
So you believe that ASOS stations have a characteristic repeatable uncertainty of ±1.8°F and CRN has ±0.3°C, right?
Do your devices have environmental sensors to make sure that you are meeting the calibration standard environment?
It is interesting, KM, that here we have an example of the belief that uncertainty is like the standard error of the mean. It grows smaller with more iid data. Of course it is no such thing. Nick Stokes I think presents a different misunderstanding. He has uncertainty confused with stability of a numerical method, which will prevent round-off errors from accumulating, but this isn’t uncertainty either.
You are exactly right, Kevin. When Pat Frank went through the wringer trying to get his paper on the uncertainty of the GCMs published, which shows the uncertainty of the outputs grows with each iteration, the #1 objection used by the reviewers was that "the error can't be this big!" This same objection is used to claim that the uncertainty of GAT calculations cannot grow with every temperature location that is added.
They don’t understand that uncertainty is not error. And in fact they refuse to understand the difference, even though they try to use the GUM to justify using the SEM.
They also refuse to acknowledge that temperature measurements are time series, which means it is impossible to measure the same quantity more than once, and N is always equal to one.
The problem is that temperature data is never IID. It consists foremost of single measurements made at different times. Temperature measurements never meet GUM requirements for repeatability, which is required to use the standard error of the mean.
Here is a paper describing the difference in clear terms.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1255808/
You can not KNOW uncertainty. You can estimate what it might be through an uncertainty budget.
I have yet to see a climate science paper address an uncertainty budget in detail.
Items like repeatability, reproducibility, resolution, drift. These are all items that affect each and every measurement and the uncertainties add.
Repeatable versus non-repeatable measurements, and how the distinction affects uncertainty, are never discussed.
Here is an article on resolution uncertainty. It is never discussed. https://www.isobudgets.com/calculate-resolution-uncertainty/
You can not reduce uncertainty by adding the uncertainty values via RSS and then dividing by the number of items. That operation is not found in any metrological text.
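For concreteness, here is a minimal sketch of a per-reading uncertainty budget in Python, with made-up component values (not from any real calibration report). It shows the straight sum and the GUM quadrature combination side by side; neither shrinks as you add more stations, because the budget attaches to every individual reading.

import math

# Hypothetical per-reading uncertainty budget for a temperature sensor (deg C).
# All component values are illustrative only.
budget = {
    "repeatability": 0.15,
    "reproducibility": 0.20,
    "resolution": 0.25 / math.sqrt(3),   # 0.5 C readout => half-width 0.25 C, rectangular
    "drift": 0.10,
}

rss = math.sqrt(sum(u**2 for u in budget.values()))  # GUM combination in quadrature
worst_case = sum(budget.values())                    # straight addition (worst case)

print(f"combined (RSS)      = {rss:.2f} C")
print(f"combined (straight) = {worst_case:.2f} C")
# Either way, this budget applies to every individual reading; averaging many
# readings of *different* things does not make these components go away.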
The “statisticians” on here don’t seem to understand that even a 95% coverage factor is *NOT* 100% certainty! There is still a 5% chance of the true value being outside the 95% interval. The uncertainty interval is truly only an indicator to others about what they might expect to measure in similar circumstances, it is not a guarantee that what they measure will lie in that interval. If their measurement lies outside the uncertainty interval then it becomes a mission to reconcile the differences in the two measurements and the uncertainty intervals.
Kevin,
Does Argo really have enough buoys to sample each “different” portion of the ocean? I suspect not.
The fall-back position is that all Earth Science data are unknowable. That most are subjected to adjustments. That not all data are able to be adjusted. I gave a case for the latter, the CO2 from Mauna Loa, here –
Geoff S
https://wattsupwiththat.com/2022/04/20/sorry-but-hard-science-is-not-done-this-way/
Let me explain my statement about Argo a bit more completely. There are a fair number of Argo buoys (3800 at present) and even this is small compared to the size of the oceans. However, the buoys are supposedly built/calibrated to a standard. Thus, even though they all possess errors of some sort, they at least produce a statistical ensemble of errors.
Now the buoy data goes through an objective analysis where the drifting buoys are used to determine temperature profiles on a fixed grid. I don’t know the exact form of the objective function they seek to minimize, but this is a good way to make the limited data cover the ocean as long as they don’t underspecify badly — i.e. too many grid nodes for the number of buoys.
With this done they project forward a month to estimate what the grid values should be. Then they compare these to actual measurements next month. It's a Kalman filter of sorts, built from a state-space kind of stance. The difference is a statistical ensemble that should include the instrument errors of the buoys, plus the errors introduced by the drifting of the buoys into new locations, plus modification of the ocean over the past month. So far not a bad job. It is a lot like Gage R&R in manufacturing.
Here is where things now went wrong, I think. They get an uncertainty that is very location dependent — understandably so. Large uncertainty near major currents where temperature gradients are greatest — completely reasonable. Small uncertainty out in the open ocean — again reasonable. Then they average it all just as the NIST machine you and KM are speaking of does, which is wrong. They are getting an average uncertainty for the whole of the oceans which makes no sense.
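As a purely illustrative aside (not the actual Argo objective analysis, whose objective function I don't know), here is a scalar sketch of the predict-compare-update cycle described two paragraphs up, with invented numbers for the process and measurement variances.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative scalar filter for one grid node's temperature (deg C).
x_est, P = 15.0, 1.0     # initial estimate and its variance
Q = 0.05**2              # assumed month-to-month process variance (ocean change, drift)
R = 0.2**2               # assumed variance of next month's buoy-derived value

truth = 15.0
for month in range(12):
    truth += rng.normal(0.0, 0.05)         # the ocean actually changes a little
    # Predict: carry last month's gridded estimate forward
    x_pred, P_pred = x_est, P + Q
    # Compare: next month's measurement at (roughly) this node
    z = truth + rng.normal(0.0, 0.2)
    innovation = z - x_pred                # the ensemble of these is the error statistic
    # Update: blend prediction and measurement by their variances
    K = P_pred / (P_pred + R)
    x_est = x_pred + K * innovation
    P = (1 - K) * P_pred
    print(f"month {month+1:2d}: innovation = {innovation:+.3f}, "
          f"estimate = {x_est:.3f} +/- {np.sqrt(P):.3f}")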
Yes! An average uncertainty tells you nothing about the dispersion of data around the mean.
I can take 10,000 2×4's varying in length from 1 foot to 16 feet, average the lengths, calculate the SD, divide the SD by √10,000 to get an SEM, and tell you the average length is 8 feet ± 1 inch, all the while knowing that doesn't describe the actual dispersion.
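Here is that 2×4 example in a few lines of Python (lengths assumed uniform between 1 and 16 feet, my choice): the SEM comes out around half an inch even though the boards themselves are spread over 15 feet.

import numpy as np

rng = np.random.default_rng(42)
lengths_ft = rng.uniform(1.0, 16.0, size=10_000)   # assumed uniform spread of board lengths

mean_ft = lengths_ft.mean()
sd_in = lengths_ft.std(ddof=1) * 12                # dispersion of the boards, in inches
sem_in = sd_in / np.sqrt(lengths_ft.size)          # "uncertainty of the mean", in inches

print(f"mean length ~ {mean_ft:.1f} ft")
print(f"SD of the boards ~ {sd_in:.0f} inches")    # ~52 inches: the real dispersion
print(f"SEM ~ {sem_in:.1f} inches")                # ~0.5 inch: says nothing about the spread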
Have you found ANY papers that assemble an uncertainty budget to ISO standards regarding temperature?
I have not, but this discussion will be part of Part II when I get to it.
If you are referring to [Good et al. 2013], at no time do they take all of the uncertainty estimations from their 1°x1°x42 grid and average them. In fact, this is just a methods paper introducing EN4. They don't provide any global scale processing of the grid to convert it to a form (like OHC) in which it can be directly used to estimate EEI. They (more recently Dr. Killick) do so in other publications; just not this one.
What many scientists do fits into the Type A evaluation classification: they take the gridded uncertainties and bootstrap them to estimate the uncertainty of their OHC measurement model. [JCGM 6:2020 reference 53] and [Efron & Tibshirani 1993: An Introduction to the Bootstrap]. For multiple-dataset survey studies they will then refine the uncertainty via [JCGM 100:2008, section 4.2] or [JCGM 6:2020, section 11.10.4], not unlike what [Wang et al. 2017] do.
I’ve not seen anyone use the average of the gridded uncertainties as a means of propagating uncertainty or assessing the final uncertainty of a spatially processed grid of ocean measurements.
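For readers unfamiliar with the step being described, here is a bare-bones, parametric-bootstrap-style sketch in Python with fabricated gridded values and uncertainties; it only illustrates the resampling idea from Efron & Tibshirani, not the actual EN4 or OHC processing.

import numpy as np

rng = np.random.default_rng(7)

# Fabricated "gridded" anomalies and their per-cell standard uncertainties.
# These numbers are placeholders, not EN4 output.
n_cells = 500
grid_vals = rng.normal(0.3, 0.6, size=n_cells)
grid_u = rng.uniform(0.05, 0.4, size=n_cells)
weights = np.full(n_cells, 1.0 / n_cells)          # stand-in for area weights

def global_mean(vals, w=weights):
    return np.sum(w * vals)

# Perturb each cell by its own uncertainty and recompute the global statistic
# many times; the spread of the replicates is the uncertainty estimate.
reps = np.array([
    global_mean(grid_vals + rng.normal(0.0, grid_u))
    for _ in range(2000)
])

print(f"global mean  = {global_mean(grid_vals):.3f}")
print(f"bootstrap sd = {reps.std(ddof=1):.3f}")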
How would you know? You don’t understand propagation of uncertainty at all.
I can't say that Good et al did this, but look at the result of their work when put to use. It is weighted heavily toward the very small open ocean values. This suggests to me that they, or someone, ended up with an average of some sort.
Kevin, a must-read reference to add to your work and cited references is Liang et al. 2019, "Remote Sensing of Earth's Energy Budget: Synthesis & Review": https://doi.org/10.1080/17538947.2019.1597189
I have not read it, but have added it to the library…
One sentence from the paper (which looks like a massive review paper):
The authors seem to ignore significant digit rules; there is no way albedo is known to four digits.
Do they really even know it accurately in the tenths digit?
Albedo? Not a chance.
As the average temperature of the entire planet is some 2700 K (about 2400°C, or 4400°F), I would hope that the net flow is outward lest we reach similar temperatures here at the surface.
Where does the heat from the molten core of iron at the centre of the Earth get accounted for..?
At the rim of active volcanoes?
Sure, and underwater geysers and possibly air temperature changes in sufficiently deep mines or caves. But I don’t remember seeing anything like that in any energy budget diagram – unless it’s shoehorned into the “emitted by the surface” numbers.
You bring up an interesting point that is discussed here at WUWT occasionally. First, the heat flow coming from the core itself is pretty small because the worldwide average is in the neighborhood of 90 mW/m^2, and there is a substantial generation of this heat flow from radioactive decay in the crust and mantle. So, that from the core is very small. In fact, I have wondered if there is enough to maintain the dynamo producing the magnetic field — that the magnetic field might result from continued accretion and phase changes rather than heat flow.
I digress.
I know that magmatism at plate boundaries and a few places intraplate is impressive, but it still amounts to very little when averaged over the whole earth and over all time.
Ok, makes sense. I figured that 5000+ deg C has to be making itself known somehow, but if it’s reduced that much by the mantle then it’s not much more than a rounding error in the total budget.
Don't know if this is pertinent or not, but I found this in a reference I ran across a few years ago.
Have a paper title but not link:
Satellite and Ocean Data Reveal Marked Increase in Earth’s Heating Rate, Norman G. Loeb et al., 2021.
At least corroborates your memory but they have 0.50, not 0.5.
Also, the first sentence of the abstract of the paper states:
The above article discusses an EEI of 0.5 ± 0.47 W/m2 . . . but what you reference is a rate of change in EEI of 0.50 ± 0.47 W/m2/decade.
If the two statements are both true, it would be an unbelievable coincidence!
Most of the references appear to be rather sloppy in their use of units, particularly time.
I originally misread the figure 0.5 ± 0.47 W/m2 which AI Overview provided as an Imbalance when it is in fact a change in Imbalance per decade. I explained this in the text.
Now I’m really confused. So when do we start applying rate of change in EEI?
If we start it, say, 200 years ago after the Little Ice Age and at the approximate start of the Industrial Revolution, that would amount to the uncertainty in EEI slope now being 20 * 0.47 = ± 9.4 W/m2. Alternatively, it could be interpreted to mean the magnitude (independent of uncertainty) is now 20 * 0.5 = +10 W/m2.
However, if the rate of change in EEI is only meant to apply to, say, the last five years, what was happening to EEI rate of change in the preceding 175 or even 500 years? And in this interpretation, what is the explanation for the shift in the rate of change of EEI?
ROTFL.
According to the cartoon version the EEI was zero until we began driving SUVs. Now it goes up at an increasing rate, according to them, all because of additional CO2 and the feedbacks and so forth and so on. Maybe another 1/2 watt per meter squared every decade. They are proposing a big effort to monitor it on a continuing basis. But why is it more useful than just using satellites to monitor lower troposphere temperature? What I conclude here is that it is difficult to substantiate because it is a small number swamped by bigger ones involved in its measurement.
What about the transfer of heat from our rocky planet's core, mantle, and crust to the atmosphere due to radioactive decay? If I do my math right it comes to around 0.092 W/m2, which is not insignificant. The transfer is not evenly distributed of course, with differing transfer rates between oceans and land surface, and volcanic activity having strong localized effects. But over long timeframes, it's not insignificant compared to the transfer between space and the Earth's atmosphere.
It gave Lord Kelvin problems.
Compared to an average incoming solar insolation of 1361 W/m^2, with say 70% of that being absorbed by Earth’s atmosphere and surface (i.e., 953 W/m^2) and thus needed to be equaled by Earth’s thermal emissions so as to have nearly constant average surface temperature, 0.092/953 = 0.000097 ~ 0.01%.
IMHO, one-hundredth of one percent is insignificant.
But the net heat transfer is vastly less than the total influx of solar heat transfer, is the point. The heat generated by our rocky core to its atmosphere is greater than 10% of the estimated net value as described in this post. And even the net value of heat transfer could be as little as zero, given the error band. So the known heat flux generated by the earth is NOT insignificant, and could exceed the net heat transfer resulting from solar inputs.
“. . . could exceed the net heat transfer resulting from solar inputs.”
By definition, the word “net” implies adding two or more numbers or subtracting at least one number from one or more other numbers.
I could be wrong, but I’m pretty sure there is no such thing as net transfer resulting from a solar input (i.e, a single numerical value).
See my reply to pariahdog above.
Duane,
This was triggered by the recent Pat Frank paper.
Geosciences 2024, 14, 238. https://doi.org/10.3390/geosciences14090238
Pat argued that past big flood basalt eruptions were adequately energetic to change global sea temperatures significantly. There are various ways to model this thermal process, but I seem to be missing one of the time factors. From what I understand of conventional wisdom, a flood basalt does not cease to be hot once it has erupted. It is still attached to your deeper, rather hot source, and so thermal conductivity equations apply. Think not of a concealed magma with its slow contribution at the surface, but add a "wick" now and then that taps rather more heat from below. That magma might be hotter for rather longer at the surface than some models have it. (I might have not read adequately on this – been too ill for some months now.)
Geoff S
These huge flood basalt provinces are a different creature from ordinary interplate magmatism — huge. I don't know a lot about the volumes involved, but it might be a good topic for Part II.
Whoops, guess I should have read more thoroughly before commenting. You already referenced a bunch of Loeb’s stuff.
The upper 2000 meters ocean heat content barely changed 1965-1995, during the cold AMO phase, the bulk of the rise is since 1995, because of the warm AMO reducing low cloud cover. But critically, the AMO warmed due to weaker solar wind states since 1995 causing negative North Atlantic Oscillation regimes.
Would love to see where they measured the ocean heat content in 1960 😉
“Heat waves do not arise from a 1C or even 1.5C background temperature rise.”
Heatwaves regularly arise from discrete solar forcing:
https://docs.google.com/document/d/e/2PACX-1vQemMt_PNwwBKNOS7GSP7gbWDmcDBJ80UJzkqDIQ75_Sctjn89VoM5MIYHQWHkpn88cMQXkKjXznM-u/pub
Ulric,

Here is how hot a “severe” heatwave can get.
Hottest evah for Bourke and for Sydney, 650 km apart NW-SE of each other.
You can now pose your assertion more like “Heatwaves at 15 deg C above other hot heatwaves do not arise from a 1C or even 1.5C background temperature rise.”
Geoff S
Ah, back to the famous Oz 1896 heatwave.
So Geoff, would you advocate that the BOM revert back to the meteorological observation standards of the 19th century? (Actually, before 1910.)
https://www.abc.net.au/news/2019-12-21/1896-heatwave-killed-435-climate-scientists-cant-compare-today/11809998
“Methods of recording temperature were not standardised until the early 1900s, leading to inflated temperature readings before then.
The global standard for temperature measurement includes the use of a Stevenson screen, which is a white louvred box allowing ventilation and ensuring thermometers inside are never exposed to the sun.
A Stevenson screen was not installed in Bourke until August 1908, meaning temperature readings from before that could be inflated by as much as 2C.
University of Melbourne climate researcher Linden Ashcroft said thermometers in Bourke were likely placed in sub-standard conditions in 1896.
“Some thermometers were under verandahs, or they were against stone buildings,” she said.
“So sometimes the thermometer would get exposure to the sun, and that doesn’t mean that you’re capturing the temperature of the ambient air, you’re capturing the temperature of what it would feel like standing in the sun.
“Before about 1910 for the country, temperature observations are a little bit higher — they are unnaturally higher than we would expect.”
The Bureau of Meteorology noted in a 2017 report the 1896 data “cannot be easily compared with modern recordings”.
“Detailed study has shown that extreme temperatures recorded at Bourke during the 1896 heatwave were likely suspect due to non-standard exposure, and likely around two degrees warmer than temperatures recorded with standard instrumentation.”
https://theconversation.com/factcheck-was-the-1896-heatwave-wiped-from-the-record-33742
Hot history?
So did the different ways of exposing the thermometers seriously bias the 19th-century observations, relative to modern readings? The answer is yes, and we have Charles Todd (of overland telegraph fame) to thank for answering this question.
In 1887, Todd set up what must be one of the longest-running scientific experiments ever, when he installed thermometers in a Stevenson Screen and on a Glaisher Stand at Adelaide Observatory (as seen in the illustration here). Observations were taken in both exposures until 1948.
The results of this 61-year experiment show that summer daytime temperatures measured using the Glaisher Stand are, on average, 1C warmer than in the Stevenson Screen. And this was at a well-maintained station – if a Glaisher Stand is not used properly, direct sunshine can fall on the thermometers, dramatically increasing the warm bias (and this was probably what happened at some stations, given that we know equipment was not always well maintained).
So, for much of Australia, temperatures recorded before Hunt’s insistence on standardising weather stations in about 1908 would be biased towards warmer temperatures, relative to modern observations. The poor maintenance noted by Hunt in 1907 would also lead to biased temperature recordings compared with a modern, well-maintained site.
This meant that when my former Bureau colleagues and I began work in the early 1990s on producing a credible Australian temperature record, we quickly realised that it would be very difficult to compare the 19th-century temperature data with modern observations. We thought it safer to start our record at a date when we knew thermometer exposure had been standardised.”
Anthony,
The temperature deviations that have been discussed for years have had differences questioned when they are from a tenth to a couple of degrees out of kilter. Here we are talking (for Bourke) ten degrees apart. Different concepts of investigation are needed for these "severe" excursions. There seems little realization that the global warming/heatwave scare is about (say) 2 degrees or less above expected, which is not really severe. Severe has to go into double figures of degrees C because it is recorded adequately. There is slim chance that these Bourke temperatures in Jan 1896 should be discarded, because even giving them a generous uncertainty, they are still head and shoulders above "non-severe". Birds dropped dead in flight, many people died, special trains were supplied to get people away from the centres of heat.
Twenty-two consecutive days above 100F are real, whether you use the raw data (yes, errors were likely to be large) or whether you knock 2C off them all. Question is, what was the heat source?
Geoff S
Tony Heller has about 60 posts on the heat of 1896;
https://realclimatescience.com/?s=1896#gsc.tab=0
All, as far as I could see, US centric.
“Here we are talking (for Bourke) ten degrees apart.”
Than Sydney?
Sorry, why do you think they are directly comparable?
And which station in Sydney?
Stations next/near the sea will have (almost always) a lower temp due to induced sea breezes.
Yes, and easily by 10C.
What makes you think they should be similar/the same?
Do you have a link to the synoptic charts over that period?
The summer of 1960, that you take Sydney from, produced …
”The highest temperature ever recorded in Australia is 50.7 °C (123.3 °F), which was recorded on 2 January 1960 at Oodnadatta, South Australia, and 13 January 2022 at Onslow, Western Australia.”
Just 1C behind Bourke 1896, which was likely 2C too high due to non-standard exposure.
”Birds dropped dead in flight, many people died, special trains were supplied to get people away from the centres of heat.”
Again not comparable. The world of 1896 was vastly different for people than today …
”University of New South Wales climate researcher Sarah Perkins-Kirkpatrick said people in 1896 were largely unprepared for extreme heat, meaning they were more vulnerable to its effects.”
”Back then everyone wore a lot more clothing than what they do now, there was no air conditioning, people worked outside, they moved outside a lot,” she said.
”Twenty-two consecutive days above 100F are real…”
Not the same thing as an all time max temp tho.
I don’t dispute that the summer was exceptional.
Is there anyone working for or responsible for preparing the various reports the IPCC creates and issues who could pass first-year college mathematics or even senior-year high school mathematics?
Even my older brother had to take a two term course in Calculus (500 series) and receive a “B” or better to receive his MBA. He never used anything he learned in that course. However, he did advance to VP Operations for a well known trans am trucking co.
When alarmists talk about increasing Earth energy imbalance, I ask them how they know our CO₂ emissions are the cause. The conclusion is always by fiat. Some go so far as to list known “forcings,” eliminate them all, and then say that CO₂ is the only possible explanation.
But the whole question is ill-posed. Knowing causal sources requires having a valid physical theory of the climate. But there is no such theory. All causal sources of tropospheric sensible heat are not known. CO₂ is not known to be the unique causal driver. It's not even known to be a possible causal driver at all. Radiation physics plus insistence is the entire source of CO₂ guilt.
That all said, I am beside myself to read your scholarly discussion of uncertainty Kevin; and on a complicated subject. 🙂
When one looks at the published Earth energy budget, such as Stephens 2015, one finds the uncertainties in all the fluxes are much, much larger than the purported TOA 0.5 W/m² imbalance. Even TOA flux itself is uncertain by ±3.3 W/m². And the surface budget is ±17 W/m².
So, I’ve never understood how these folks can claim to detect a 0.5±0.5 W/m² imbalance.
Anyone familiar with radiometric instrumentation and measurements immediately knows that claims of 0.5 W/m2 uncertainties are … optimistic.
Nowhere have I seen the uncertainties of individual radiometric constituents of EEI calculations combined into a realistic total uncertainty.
That is what prompted me to look into this, and I’d hoped to find a very full discussion of uncertainty, but without success.
This is sad, but not surprising.
Kevin,
Tom Berger and I had a go in 3 WUWT articles.
Here is a link.
Geoff S
https://wattsupwiththat.com/?s=sherrington+uncertainty
I recall this.
Well said.
Pat Frank underscores, in my opinion, the pretentiousness of claiming to understand the earth’s energy balance. Maybe a hundred, five hundred, or a thousand years from now that will be possible.
“Knowing causal sources requires having a valid physical theory of the climate. But there is no such theory.” The proverbial Elephant in the Room. It’s so obvious, but few people seem to get it.
“Radiation physics plus insistence is the entire source of CO₂ guilt.” Agreed. Radiative Transfer is a handbag of mathematical idealizations invented 150 years ago. The world is not a gas of photons in thermal equilibrium with the walls of a box.
Just because the baby food we all learned in graduate physics is too feeble to allow us to understand the climate, it doesn’t mean we shouldn’t keep trying. We just have to manage our expectations.
Pat, I am forever impressed with people working on measurements of physical constants because their work has to be both accurate and precise. They have to be able to characterize the uncertainty in unique instrumentation platforms.
Having said that, they sometimes fail. If you've never read the Henrion/Baruch paper, you should sometime. There is some amazing discussion about metrology in there that we all should ponder.
Kevin, I read the Henrion/Fischhoff paper long ago. But in a funny/serious context.
Back in the mid-1990’s I participated in long-running debates about science and religion, specifically Creationism.
One of the arguments of young-Earth creationism is that the speed of light declined with time. Faster in the past, slower now.
Distances that require light to travel billions of years now, were traversed in 4000 years then. Hence, the age of the universe is an artifact of a declining ‘c,’ Creation is recent, Genesis is literally true, and the Flood really happened. Creation proved by science. 🙂
The argument was all in a 1981 paper The Velocity of Light and the Age of the Universe by B. Setterfield.
He shows points taken from Figure 1 in Henrion/Fischoff, which provide the experimental estimates of lightspeed made from 1870-1960.
Setterfield fit the points with a declining polynomial and called it real.
One can still find it online at Answers in Genesis, the flagship Creationist site.
Oh, my. And then frozen in print forever.
And thanks for correcting my reference as it is Baruch Fischoff and Max Henrion. My 72 year old brain gets things backward sometimes.
There are two fatal problems with the idea:
Real measurement uncertainties are ignored, and the measured values of c are treated as exact numbers. In the 1800s c could only be estimated.
Because c = 1 / sqrt(epsilon_0 * mu_0), a variable speed of light requires that the electric permittivity and magnetic permeability also change over time. The implications of this are immense in that the fundamental structure of the universe must change, including the nature of matter.
“But the whole question is ill-posed.”
Exactly. This is important for folks to understand.
The “forcing” + “feedback” framing of the investigation of a climate system response to incremental non-condensing GHGs was inherently unsound all along, as I see it. Why even bother to compute an average EEI for the whole planet over time? Only to go along with the assumed GHG “forcings.”
The usual EEI diagrams are really nothing but cartoons.
Here is something along the lines of what I have said about metrology, below, but which I figured to be a pretty daunting topic, so I'll try it out on you first. The uncertainty measurements are almost all eventually turned into (5%,95%) intervals. So, let's assume that von Schuckmann et al have a best value and uncertainty that is definitive, 0.76 ± 0.1 W/m2. Then Loeb et al's best value, 1.0, is 4 sigma away. This would happen with a probability less than one in ten thousand. Now do the same switching the two values. 0.76 is 1.97 standard deviations away from Loeb's best estimate, and according to their (5%,95%) interval that means one would get such a result with probability of about 2.4%. Not as bad as the other way around, only because Loeb et al have a broader interval of uncertainty. At any rate, the two results are not very likely in view of one another. Until a better estimate of uncertainty is made, I have trouble interpreting anything.
Frankly, Pat, we could have sat down with a few solutions of heat transfer, a table of heat capacities, a few measurements of temperature change over the years, and been able to predict EEI to about the same precision as all this expensive field work has accomplished, and still been faced with the question of "How does it help us to know this?"
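For anyone who wants to check the arithmetic above: assuming both quoted ± values are (5%,95%) half-widths of normal distributions, and taking Loeb's half-width as roughly 0.2 W/m2 (my back-calculation from the 1.97 sigma figure, not a quoted number), a few lines of Python reproduce the probabilities.

from scipy.stats import norm

z90 = norm.ppf(0.95)                 # ~1.645: converts a (5%,95%) half-width to 1 sigma

vs_best, vs_half = 0.76, 0.10        # von Schuckmann et al., half-width as quoted
loeb_best, loeb_half = 1.00, 0.20    # Loeb et al.; half-width back-calculated, assumed

vs_sigma = vs_half / z90
loeb_sigma = loeb_half / z90

z1 = (loeb_best - vs_best) / vs_sigma     # Loeb's value judged against von Schuckmann
z2 = (loeb_best - vs_best) / loeb_sigma   # and the reverse

print(f"z = {z1:.1f}, one-sided P = {norm.sf(z1):.1e}")   # ~3.9 sigma, < 1 in 10,000
print(f"z = {z2:.2f}, one-sided P = {norm.sf(z2):.3f}")   # ~1.97 sigma, ~2.4%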
In a real measurement uncertainty analysis, all individual contributions to the uncertainty of a single measurement result have to be analyzed and quantified, and then the combined uncertainty calculated. The contributions will include those that can’t be quantified with standard statistics, i.e. with standard deviations of repeated measurements of the same quantity. These non-statistical contribution therefore must be obtained using other methods.
What this means is that the probability distribution of a combined uncertainty is nearly always unknown. Standard texts call for calculating the expanded uncertainty by multiplying by 2, and it is commonly assumed this provides a 95% coverage factor. But this really isn't true.
There is momentum toward reporting only the uncertainty interval endpoints, which reflects how a true value is expected to fall anywhere inside the interval with equal probability.
Climate science is a long way away from this as they normally just ignore non-statistical contributions. This is unethical.
Multiplying by two, you mean a coverage factor of two? And then there is the issue of correlation. And that presumes we are working with Gaussian errors; fat-tailed/skewed distributions complicate things. It's hard work, as you know.
When you advocate just endpoints with equal probability in between, this occurs in manufacturing. I had advised my students to start thinking about Monte Carlo at this point.
Yes, on all points. 2 is the coverage factor specified by ISO 17025 for measurements reported by certified laboratories. Because it is close to 1.96 student’s t, people assume it gives 95%, but it is really just an arbitrary number without a solid link to a probability distribution.
Falling anywhere within the uncertainty interval doesn't necessarily mean each value has an equal probability. This would imply a uniform probability, but many measurement instruments have an asymmetric uncertainty, e.g. the uncertainty in a LIG thermometer is different if the temperature is rising than it is if it is falling. Anything that does digital sampling has a different uncertainty if the value is going up than it does if the value is going down, simply because of the "recognition threshold" interval for seeing a change.
Gahhh. I think you should do the experiment, Kevin. Write it up and send it to System Science Data, written in grave scientific prose.
Add as much complex math as you can. Maybe a complex uncertainty analysis. An abuse of your considerable talents, but maybe worthy of Sokal/Bricmont (pdf) 🙂
I dunno. Sokal would be a hard act to emulate. In fact, it was Mermin’s response to Sokal that caused me to drop my AIP membership. Pandering to the inventors of “woke” was more than I could take.
Climate science wouldn’t know an uncertainty budget if it bit them on the butt. Yet they claim to be experts in assessing and calculating “uncertainty”. It is basically done in a backwards fashion. What can they get away with without people pillorying them.
From the above article:
“Second, the number is very small considering that at any given time in the temperate zones incoming solar irradiance might be over 1,100 W/m2 and is highly variable to boot.”
Yes, but it goes even further than that.
As explained at https://en.wikipedia.org/wiki/Solar_irradiance :
“Total solar irradiance (TSI) is a measure of the solar power over all wavelengths per unit area incident on the Earth’s upper atmosphere. It is measured perpendicular to the incoming sunlight. The solar constant is a conventional measure of mean TSI at a distance of one astronomical unit (AU) . . . The average annual solar radiation arriving at the top of the Earth’s atmosphere is about 1361 W/m2. This represents the power per unit area of solar irradiance across the spherical surface surrounding the Sun with a radius equal to the distance to the Earth (1 AU).”
So, first off, does anyone really believe that considering all the variables involved with calculating both the average incoming power flux and the average outgoing power flux, the power balance (it's a scientific misnomer to say it's an energy balance) of Earth can be calculated (independent of magnitude) to an uncertainty of ± 0.47 W/m2? Heck, that's a total uncertainty band of 94 parts out of 136,100, or 0.07%! I daresay none of the instruments used to obtain numerical data used in calculating the energy balance has that level of accuracy. And the overall accuracy in a chain of calculations can never be better than the worst accuracy-of-measurement involved in that chain!
Next, because the Earth is in an elliptical orbit around the Sun, there are only two instants in a given year when the Earth and Sun are exactly 1 AU apart. Otherwise, the Earth-Sun distance varies continuously from a minimum of 0.983 AU (at perihelion) to 1.017 AU (at aphelion). This means that there never really is such a thing as an "energy balance" on Earth in terms of incoming energy versus outgoing energy. There is, and always will be, an energy imbalance due to the fact that the Earth has various heat capacitances (primarily the oceans and ice sheets) and a hydrological cycle that together create an unavoidable time phase delay in Earth's emitted thermal radiation power "attempting" to achieve balance with its incoming net absorbed solar power. If one appeals to averaging over time to obtain numerical comparisons, then such a process will accumulate errors over that same time that drive uncertainties well beyond the claimed ± 0.47 W/m2. The climate forcings on Earth (including its clouds and snow/ice surface coverage, and resulting variable albedo) are just not that repeatable day-to-day, year-to-year, or millennium-to-millennium.
Finally, to first order, Earth's thermal power emissions are dependent on its radiation temperature raised to the fourth power. In terms of small-difference calculation, that means a given percentage change in temperature will create about four times the change in thermal radiation power, all other things (such as Earth's effective emissivity) considered unchanged. The UAH temperature trending (graph in right hand column of the webpage) shows that Earth's GLAT has increased by about 0.6 °C from the 1980–1985 span to the 2016–2021 span. The global average temperature of the lower atmosphere is about 15 °C, so that 0.6 °C difference represents a 4% change over that time, equivalent to a linear change of 0.1% per year. Multiply that by 4 and we get an equivalent of 0.4% increase in Earth's radiated power per year. In turn, that's more than 6 times the above-discussed total uncertainty band of 0.07% that is claimed for a so-called "energy balance" calculation uncertainty. So now look at the variability in that UAH graph of GLAT and tell me over what time span a claim of 0.07% total uncertainty band is credible!
IT’S JUST NOT THERE . . . it’s a totally bogus uncertainty value.
Where the instantaneous flux varies with the square of the inverse-distance. Therefore, using the average distance gives the wrong answer. It effectively treats the flux as varying linearly with distance. To get the correct answer for 365.2 days, requires one to integrate the inverse-distance squared.
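A quick numerical check of that point (my own sketch, using e = 0.0167 and 1361 W/m2 at exactly 1 AU): time-averaging the inverse-square flux over the elliptical orbit gives a slightly different annual mean than evaluating the flux at an average distance, a difference of a few tenths of a W/m2, which is not small next to the claimed imbalance.

import numpy as np

S0 = 1361.0      # W/m^2 at exactly 1 AU
e = 0.0167       # Earth's orbital eccentricity
a = 1.0          # semi-major axis in AU

# Parameterize the orbit by eccentric anomaly E; r = a(1 - e cos E), and time
# advances as dt proportional to (1 - e cos E) dE, so a time average is a weighted mean.
E = np.linspace(0.0, 2.0 * np.pi, 200_000, endpoint=False)
r = a * (1.0 - e * np.cos(E))
w = 1.0 - e * np.cos(E)                       # time weighting along the orbit

flux = S0 * (a / r) ** 2                      # instantaneous inverse-square flux
flux_time_avg = np.average(flux, weights=w)

r_time_avg = np.average(r, weights=w)
flux_at_mean_r = S0 * (a / r_time_avg) ** 2

print(f"time-averaged flux         = {flux_time_avg:.2f} W/m^2")   # ~1361.2
print(f"flux at 1 AU               = {S0:.2f} W/m^2")
print(f"flux at time-mean distance = {flux_at_mean_r:.2f} W/m^2")  # ~1360.6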
Clyde, I never implied or stated that using the average distance gives “an answer”. I am aware that radiation power flux decreases by square of the distance between emitter and receptor . . . I believe that is now taught in high school.
BTW, to get the correct “answer” requires factoring in the TSI variation with the approximate 11-year sunspot cycle (amounting to about 0.1% total variation, or about 1.0 W/m^2). That, and about a dozen other independent variables need be integrated over the interval for which one wants to try to calculate a net power flux imbalance for Earth.
It’s a fool’s errand to try to do such with an end uncertainty of less than about ±5 W/m^2 IMHO.
“That, and about a dozen other independent variables need be integrated over the interval for which one wants to try to calculate a net power flux imbalance for Earth.”
And once you have the flux imbalance you still need to evaluate a polynomial relationship to temperature – where the polynomial factors are mostly unknown!
Ooops, my mistake . . . in the second-to-last paragraph I forgot to use the absolute temperature scale in calculating the delta-change in temperature and the related delta-change in Stefan-Boltzmann radiation power. The correct calculation is 0.6 K/288 K = 0.21%, yielding a calculated 0.83% change in radiated power. Spread over the 36-year interval of that change, the equivalent linear rate of change is 0.02% per year. Therefore, even considering the variability seen in the UAH graph, it is not a good argument by itself to dismiss a claimed total uncertainty band of 0.07% on Earth's calculated power flux imbalance.
My apologies for my mistake. But my bottom line conclusion above remains the same.
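For the record, here is the corrected arithmetic spelled out in a couple of lines (same rounded inputs as above):

dT = 0.6          # K, approximate UAH change between the two spans
T = 288.0         # K, approximate global mean surface temperature
years = 36

rel_dT = dT / T                    # ~0.21%
rel_dPower = 4 * rel_dT            # small-change Stefan-Boltzmann: dP/P ~ 4*dT/T, ~0.83%
per_year = rel_dPower / years      # ~0.02% per year

print(f"dT/T = {rel_dT:.2%}, dP/P ~ {rel_dPower:.2%}, per year ~ {per_year:.3%}")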
if you find an imbalance then your data is wrong … your example of double entry accounting is slightly off because you CAN end up with an imbalance … in the case of the planet we know inbound HAS TO equal outbound … the only way there is an imbalance is bad data entry/fraudulent data entry or you aren't measuring what you think you are measuring …
I understand the whole argument is that the “finding” of an EEI of magnitude +0.5 W/m^2 supports the scientific observations that Earth has been warming for the last 50 or more years (see the UAH plot of GLAT in the right column of this webpage).
It is not true that for Earth (or any other planet for that matter) the inbound power flux has to equal the outbound power flux. How else could Earth cycle through glacial and interglacial periods, let alone Ice Ages?
Drawing a quadratic through those data points is meaningless, there is nothing fundamental that indicates what the shape of a time series should or would be. A sinusoid could also give low fit residuals, and be just as meaningless.
The only reason to do this is to claim an “acceleration” is occurring.
Agreed.
It has been my experience that climatologists do a poor job of providing any uncertainty information, let alone defining what the number means. It seems that when an uncertainty is provided, it is usually 1-sigma rather than the 2-sigma more commonly used in other disciplines.
And the uncertainties are based on the small numbers at the end of the calculation chain. All the uncertainties from the beginning are simply dropped rather than doing the work to propagate them.
I read this and have a hard time separating heat from energy from radiation. The three are not equal. We account for incoming SW that can do work and equate it with outgoing LW which can’t do work.
There is 40 gigatons of CO2 mass being added but that never seems to be accounted for.
This is incorrect. The actual figure is 0.5 ± 0.47 W.m-2.decade-1.
Note that the correct figure has units that include decade-1, since it is the rate of change of EEI and not the EEI itself.
The correct figure for the EEI itself is 0.41 ± 0.48 W.m-2 in mid 2005 and 1.12 ± 0.48 W.m-2 in mid 2019.
[Loeb et al. 2021]
From my discussion…
Not sure what your complaint is.
If it was using Loeb et al. 2021 then why wouldn’t it report 1.12 ± 0.48 W.m-2?
Could very well be, but I was using Loeb's talk from 2022. I don't know the reason for the evolution from 1.12 to 1.0 between 2021 and 2022, but in speaking about propagation of uncertainty I wanted to use the 0.5 EEI, then the 0.5 change in EEI, to arrive at the new 1.0 EEI — do you see? It's pedagogical.
Oh yeah, maybe the AI wasn’t using the Loeb et al. 2021 paper and found a different source. It still got the units wrong though.
Most climatology time-series data is non-stationary. It looks similar to drift in noisy electronic instrumentation as it warms up or ages, although that is an unlikely source of the variance. However, that suggests a way to treat the data. If the time-series data are de-trended, it provides some insight on the short-term natural variation (noise) from the calculation of the residual standard deviation or standard error of the residual mean. (That, unfortunately, doesn’t address the problem of the common practice of erroneously claiming to improve the precision by using a large number of measurements from different thermometers measuring different air masses.) The long-term trend is more problematic because what appears to be a linear trend may be a very long period cyclic variation that will eventually reverse. But, calculating the standard deviation of the trend line separately from the original measured data provides a sense of the magnitude of the non-random variations. Working with anomalies doesn’t address this two-part problem.
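A minimal sketch of that two-part treatment, on a synthetic monthly series of my own making (not any real record): remove a fitted linear trend, then look at the residual scatter and the trend separately.

import numpy as np

rng = np.random.default_rng(3)

# Synthetic monthly anomaly series: a slow trend plus short-term "noise".
months = np.arange(480)                       # 40 years of monthly values
series = 0.0015 * months + rng.normal(0.0, 0.25, size=months.size)

# Fit and remove a linear trend.
slope, intercept = np.polyfit(months, series, 1)
trend = slope * months + intercept
residuals = series - trend

print(f"residual SD (short-term variation) = {residuals.std(ddof=1):.3f}")
print(f"SE of residual mean                = {residuals.std(ddof=1)/np.sqrt(residuals.size):.4f}")
print(f"SD of the trend-line values        = {trend.std(ddof=1):.3f}")
# Note: none of this says whether the fitted trend is truly linear or just part
# of a long cycle, which is the caveat raised above.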
+100
Harold the Organic Chemist Asks:
How much of the earth's increase in temperature is due to the black rubber dust and particles from tires? Since ca. 1900, how many billions of pounds of this black rubber material have been released into the environment? The rubber in tires is non-biodegradable.
Another source of black carbon is wildfires. In Canada last year, there was a record setting number of wild fires. Presently, in California there are many fires.
Non-biodegradable, but the particles are so small (large surface to mass ratio) aren’t they subject to chemical degradation by reactive oxygen in sunlight?
Rubber tires contain anti-oxidants, UV-protectants, and anti-microbial and
anti-fungal compounds. Once the rubber particles are released into the environment, they are there forever.
Yes, I should have known they’d contain stabilizers of various sorts. Then shouldn’t the shoulders of roads be covered in rubber particles? I don’t know that I observe such.
As an undergrad I attended a seminar where a visiting chemist talked about tire rubber dust. “Somewhere, it has to go,” he said.
Turns out, most of it goes into gutters and washes down the sewers. From there, eventually out to sea. So, I doubt it has much impact on the climate.
Well, in Wyoming it's a long way to the sea, so it can't do much else than blow into Nebraska.
uncertainty?
There’s no uncertainty in climate science
Uncertainty is for old fogie scientists and losers.
Everything is certain and stated with great confidence.
There is always an energy imbalance. That’s why earth is always cooling or warming.
The measurements are not precise. There are almost no measurements of half the ocean volume (over 2000 meters deep). But everyone “knows” the official number.
The planet is going to be warmer in 100 years, unless it is cooler. We ought to move on to solving real problems such as wars, illegal immigration, crime, people trying to assassinate Trump, a dementia president, and a dingbat running for president. They are all more important than warmer winters from CO2 emissions, which is not even a problem!
“a dementia president”
You are a very confused little child, aren’t you, RG
Biden got kicked out.
Please try to keep up. !!
Warmer winters come from URBAN warming, not CO2.
Who exactly do you think is the current president of the US? It is Biden, and he has not been kicked out.
There are HUGE errors in the process as defined. What about the energy from the sun that goes into plants and not into heating the Earth?
As an example: a gas pipe goes past an industrial estate. You can measure how much is in the pipe on the way in (energy from the sun). You can measure how much remains in the gas main after the industrial estate (reflected energy, including LWIR).
The assessments offered in the science papers make the assumption that any energy absorbed by the Earth must be used to make it warmer, and hence an imbalance is a bad thing.
Now, look back at the industrial estate: one small factory is taking the gas and making polyethylene pellets, which are fabricated into long-term storage products and not burned to make heat. If they make more this year than last year, it would be very WRONG to assume that the estate is getting hotter.
And the Earth is getting greener, more incoming energy is being used to make long term stored carbon, not heat.
So why the panic. The Earth is greening, that seems to be provable. Where is the energy calculation about that? And how does it align with the reported EEI? What if the Earth is greening at a rate that suggests cooling? ie EEI is negative when you count the extra greening?
I had thought about looking at increasing primary production as another reservoir, but photosynthesis is not very efficient and probably amounts to less than geothermal heat, which is itself minuscule compared to SW and LWIR flux. Part II, when I get to writing it, will look at some issues like this.
Wiki says that photosynthesis is between 3 and 6% efficient. So let’s say the plants get 300W of sunlight on a cloudy day, that means 10 to 20 watts of energy is being absorbed by the process.
So even on a cloudy day, this process is 20 to 40 times the EEI value as noted in the article. If the amount of green matter is not being assessed, especially noting the enhanced greening, then surely this CAN be responsible for the entire EEI value as reported, (and maybe more).
I’d say that the issue needs to be resolved and not just dismissed. A greening world has the potential to be very significant with regard to the EEI.
Looking forward to how you cover this.
Yes, except this energy ends up being lost to seasonal decay of greenery and is the source of that little wiggle atop the Mauna Loa curve of CO2. Is the amplitude of that wiggle getting larger? I’d like to know.
Remember all that coal and oil we’re burning. That was in excess when it was growing too.
Still think it’s trivial?
I think you may be hoping/trying to say that all green matter is “steady state” and hence can be ignored. I think this is a gross over simplification. Who knows, we could be in a growth stage now in the same magnitudes as the carboniferous. The greening of the Earth is real. It doesn’t all wash out in autumn/winter.
If you look at the areas that are greener now, it's huge; the 0.5 out of 1000 is 1 part in 2000. I'd be surprised if the Earth's surface AND its oceans aren't showing growth in advance of 1 part in 2000. Even rudimentary science shows that plant growth is massively increased in a high CO2 environment.
The EEI is one part in 2000. Greening measurements swamp that. Ignoring it or sweeping it under the carpet because it’s not your pet subject is just ignoring the elephant in the room. Is that the plan. Just ignore it and hope it’s noise?
The little wiggle is assumed to be due to growth and decay of flora. Nowhere is the ocean temperature measured, and the amount of CO2 absorbed/released by the ocean is fundamentally defined by Henry's Law, basically temperature.
I have been in Hawaii in all 4 seasons and the island temperatures are not constant from season to season. My conclusion is that at least some of the seasonal wiggle is due to the ocean.
Very nice Kevin. I have a problem with things like this:
“More concerning, though, is the discussion on algorithms to improve this circumstance for purposes of climate modeling by, in one case, making adjustments to LWIR radiance measurement and also different adjustments to SW radiance measurements.”
I don’t understand how they can make adjustments to measurements for purposes of modeling. If the measurement was correct why would it need to be adjusted?
To ask the question is to answer it, I think. There are many things I learned that have made me leery of the value of these “very important numbers”.
Someone buy this man a beer. He deserves it. I would if I knew him and then we would laugh at the idiotic comments by the warmunistas.