What do you mean by “mean”: an essay on black boxes, emulators, and uncertainty

Guest post by Richard Booth, Ph.D.

References:

[1] https://wattsupwiththat.com/2019/09/07/propagation-of-error-and-the-reliability-of-global-air-temperature-projections-mark-ii/

[2] https://wattsupwiththat.com/2019/10/15/why-roy-spencers-criticism-is-wrong  

A. Introduction

I suspect that we can all remember childish arguments of the form “person A: what do you mean by x, B: oh I really mean y, A: but what does y really mean, B: oh, put in other words, it means z, A: really (?), but in any case what do you even mean by mean?”  Then in adult discussion there can be so much interpretation of words, and occasional (sometimes innocent) misdirection, that it is hard to draw a sound conclusion.  And where statistics are involved, it is not just “what do you mean by mean” (arithmetic?, geometric?, root mean square?) but “what do you mean by error”, “what do you mean by uncertainty”, etc., etc.?

Starting in the late summer of 2019 there were several WUWT postings on the subject of Dr. Pat Frank’s paper [1], and they often seemed to get bogged down in these questions of meaning and understanding.  A good deal of progress was made, but some arguments were left unresolved, so in this essay I revisit some of the themes which emerged.  Here is a list of sections:

B. Black Box and Emulator Theory – 2.3 pages (A4)

C. Plausibility of New Parameters – 0.6 pages

D. Emulator Parameters – 1.5 pages

E. Error and Uncertainty – 3.2 pages

F. Uniform Uncertainty (compared to Trapezium Uncertainty) – 2.5 pages

G. Further Examples – 1.2 pages

H. The Implications for Pat Frank’s Paper – 0.6 pages

I. The Implications for GCMs – 0.5 pages

Some of those sections are quite long, but each has a summary at its end, to help readers who are short of time and/or do not wish to wade through a good deal of mathematics.  The length is unfortunately necessary to develop interesting mathematics around emulators, errors, and uncertainty, whilst including examples which may shed some light on the concepts.  There is enough complication in the theory that I cannot guarantee that there isn’t the odd mistake.  When referring to [1] or [2], including their comments sections, I shall refer to Dr. Frank and Dr. Roy Spencer by name, but put the various commenters under the label “Commenters”.

I am choosing this opportunity to “come out” from behind my blog name “See – Owe to Rich”.  I am Richard Booth, Ph.D., and author of “On the influence of solar cycle lengths and carbon dioxide on global temperatures”.  Published in 2018 by the Journal of Atmospheric and Solar-Terrestrial Physics (JASTP), it is a rare example of a peer-reviewed connection between solar variations and climate which is founded on solid statistics, and is available at https://doi.org/10.1016/j.jastp.2018.01.026 (paywalled)  or in publicly accessible pre-print form at https://github.com/rjbooth88/hello-climate/files/1835197/s-co2-paper-correct.docx . I retired in 2019 from the British Civil Service, and though I wasn’t working on climate science there, I decided in 2007 that as I had lukewarmer/sceptical views which were against the official government policy, alas sustained through several administrations, I should use the pseudonym on climate blogs whilst I was still in employment.

B. Black Box and Emulator Theory

Consider a general “black box”, which has been designed to estimate some quantity of interest in the past, and to predict its value in the future.  Consider also an “emulator”, which is an attempt to provide a simpler estimate of the past black box values and to predict the black box output into the future.  Last, but not least, consider reality, the actual value of the quantity of interest.

Each of these three entities,

  •  black box
  • emulator
  • reality

can be modelled as a time series with a statistical distribution.  They are all numerical quantities (possibly multivariate) with uncertainty surrounding them, and the only successful mathematics which has been devised for analysis of such is probability and statistics.  It may be objected that reality is not statistical, because it has a particular measured value.  But that is only true after the fact, or as they say in the trade, a posteriori.  Beforehand, a priori, reality is a statistical distribution of a random variable, whether the quantity be the landing face of the die I am about to throw or the global HadCRUT4 anomaly averaged across 2020.

It may also be objected that many black boxes, for example General Circulation Models (GCMs), are not statistical, because they follow a time evolution with deterministic physical equations.  Nevertheless, the evolution depends on the initial state, and because climate is famously “chaotic”, tiny perturbations to that state lead to sizeable divergence later.  The chaotic system tends to revolve around a small number of attractors, and the breadth of orbits around each attractor can be studied by computer and matched to statistical distributions.

The most important parameters associated with a probability distribution of a continuous real variable are the mean (measure of location) and the standard deviation (measure of dispersion).  So across the 3 entities there are 6 important parameters; I shall use E[] to denote expectation or mean value, and Var[] to denote variance which is squared standard deviation.  What relationships between these 6 allow the defensible (one cannot assert “valid”) conclusion that the black box is “good”, or that the emulator is “good”? 

In general, since the purpose of an emulator is to emulate, it should do that with as high a fidelity as possible.  So for an emulator to be good, it should, like the Turing Test of whether a computer is a good emulator of a human, be able to display a similar spread/deviation/range of the black box as well as the mean/average component.  Ideally one would not be able to tell the output of one from that of the other.

To make things more concrete, I shall assume that the entities are each a uniform discrete time series, in other words a set of values evenly spaced across time with a given interval, such as a day, a month, or a year.  Let:

  X(t) be the random variable for reality at integer time t;

  M(t) be the random variable for the black box Model;

  W(t) be the random variable for some emulator (White box) of the black box

  Ri(t) be the random variable for some contributor to an entity, possibly an error term.

 Now choose a concrete time evolution of W(t) which does have some generality:

  (1)  W(t) = (1-a)W(t-1) + R1(t) + R2(t) + R3(t), where 0 ≤ a ≤ 1

The reason for the 3 R terms will become apparent in a moment.  First note that the new value W(t) is partly dependent on the old one W(t-1) and partly on the random Ri(t) terms.  If a=0 then there is no decay, and a putative flap of a butterfly’s wings contributing to W(t-1) carries on undiminished in perpetuity.  In Section C I describe how the decaying case a>0 is plausible.

R1(t) is to be the component which represents changes in major causal influences, such as the sun and carbon dioxide.  R2(t) is to be a component which represents a strong contribution with observably high variance, for example the Longwave Cloud Forcing (LCF).  Some emulators might ignore this, but it could have a serious impact on how accurately the emulator follows the black box.  R3(t) is a putative component which is negatively correlated with R2(t) with coefficient -r, with the potential (dependent on exact parameters) to mitigate the high variance of R2(t).  We shall call R3(t) the “mystery component”, and its inclusion is justified in Section C.

Equation (1) can be “solved”, i.e. the recursion removed, but first we need to specify time limits.  We assume that the black box was run and calibrated against data from time 0 to the present time P, and then we are interested in future times P+1, P+2,… up to F. The solution to Equation (1) is

  (2)  W(t) = ∑_{i=0}^{t-1} (1-a)^i (R1(t-i) + R2(t-i) + R3(t-i)) + (1-a)^t W(0)

The expectation of W(t) depends on the expectations of each Rk(t), and to make further analytical progress we need to make assumptions about these.  Specifically, assume that

  (3)  E[R1(t)] = bt + c, E[R2(t)] = d, E[R3(t)] = 0

Then a modicum of algebra derives

  (4)  E[W(t)] = b(at + a - 1 + (1-a)^(t+1))/a^2 + (c+d)(1 - (1-a)^t)/a + (1-a)^t W(0)

In the limit as a tends to 0, we get the special case

  (5)  E[W(t)] = bt(t+1)/2 + (c+d)t + W(0)

Next we consider variance, with the following assumptions:

  (6)  Var[Rk(t)] = sk^2, Cov[R2(t),R3(t)] = -r s2 s3, and all other covariances (within or across time) are 0, so
  (7)  Var[W(t)] = (s1^2 + s2^2 + s3^2 - 2r s2 s3)(1 - (1-a)^(2t))/(2a - a^2)

and as a tends to zero the final factor (1 - (1-a)^(2t))/(2a - a^2) tends to t, implying that the variance grows linearly with t.
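
To make these formulae concrete, here is a minimal Python sketch (an illustration only, with arbitrarily chosen parameter values and Gaussian R terms, since only their means and covariances matter here) which simulates Equation (1) many times and compares the sample mean and variance of W(t) with Equations (4) and (7).

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary illustrative parameters, not taken from any model or paper
a, b, c, d = 0.1, 0.02, 0.1, 0.05     # decay and drift parameters
s1, s2, s3, r = 0.3, 1.0, 1.0, 0.95   # standard deviations and anti-correlation
T, runs, W0 = 80, 20000, 0.0

W = np.full(runs, W0)
for t in range(1, T + 1):
    R1 = rng.normal(b * t + c, s1, runs)          # E[R1(t)] = bt + c, as in (3)
    R2 = rng.normal(d, s2, runs)                  # E[R2(t)] = d
    # R3: mean 0, s.d. s3, correlation -r with R2
    R3 = s3 * (-r * (R2 - d) / s2 + np.sqrt(1 - r**2) * rng.normal(size=runs))
    W = (1 - a) * W + R1 + R2 + R3                # Equation (1)

q = 1 - a
mean_theory = (b * (a * T + a - 1 + q**(T + 1)) / a**2
               + (c + d) * (1 - q**T) / a + q**T * W0)            # Equation (4)
var_theory = (s1**2 + s2**2 + s3**2 - 2 * r * s2 * s3) \
             * (1 - q**(2 * T)) / (2 * a - a**2)                  # Equation (7)

print("mean: simulated %.3f, theory %.3f" % (W.mean(), mean_theory))
print("var : simulated %.3f, theory %.3f" % (W.var(), var_theory))
```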

Summary of Section B:

  • A good emulator can mimic the output of the black box.
  • A fairly general iterative emulator model (1) is presented.
  • Formulae are given for expectation and variance of the emulator as a function of time t and various parameters.
  • The 2 extra parameters, a and R3(t), over and above those of Pat Frank’s emulator, can make a huge difference to the evolution.
  • The “magic” component R3(t) with anti-correlation -r to R2(t) can greatly reduce model error variance whilst retaining linear growth in the absence of decay.
  • Any decay rate a>0 completely changes the propagation of error variance from linear growth to convergence to a finite limit.

C. Plausibility of New Parameters

The decaying case a>0 may at first sight seem implausible.  But here is a way it could arise.  Postulate a model with 3 main variables, M(t) the temperature, F(t) the forcing, and H(t) the heat content of land and oceans.  Let

  M(t) = b + cF(t) + dH(t-1)

(Now by the Stefan-Boltzmann equation M should be related to F^(1/4), but locally it can be linearized by a binomial expansion.)  The theory here is that temperature is fed both by instantaneous radiative forcing F(t) and by previously stored heat H(t-1).  (After all, climate scientists are currently worrying about how much heat is going into the oceans.)  Next, the heat changes by an amount dependent on the change in temperature:

  H(t-1) = H(t-2) + e(M(t-1)-M(t-2)) = H(0) + e(M(t-1)-M(0))

Combining these two equations we get

  M(t) = b + cF(t) + d(H(0) + e(M(t-1)-M(0))) = f + cF(t) + (1-a)M(t-1)

where a = 1-de, f = b+dH(0)-deM(0).  This now has the same form as Equation (1); there may be some quibbles about it, but it shows a proof of concept of heat buffering leading to a decay parameter.
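
As a quick sanity check on that algebra, the short sketch below (with made-up constants) iterates the two-equation system directly and confirms that it reproduces the single recursion M(t) = f + cF(t) + (1-a)M(t-1).

```python
# Numerical check of the heat-buffering reduction, using made-up constants.
b, c, d, e = 2.0, 0.5, 0.3, 0.8
M0, H0 = 14.0, 10.0
F = [1.0, 1.3, 0.9, 1.7, 2.1]            # arbitrary forcing series F(1)..F(5)

a = 1 - d * e
f = b + d * H0 - d * e * M0

# Direct iteration of the two-equation system
M, H = [M0], H0
for t, Ft in enumerate(F, start=1):
    M.append(b + c * Ft + d * H)         # M(t) = b + cF(t) + dH(t-1)
    H = H + e * (M[t] - M[t - 1])        # heat updated by the temperature change

# Reduced recursion M(t) = f + cF(t) + (1-a)M(t-1)
Mr = [M0]
for Ft in F:
    Mr.append(f + c * Ft + (1 - a) * Mr[-1])

print(max(abs(x - y) for x, y in zip(M, Mr)))   # ~1e-15: the two agree
```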

For the anti-correlated R3(t), consider reference [2]. Roy Spencer, who has serious scientific credentials, had written “CMIP5 models do NOT have significant global energy imbalances causing spurious temperature trends because any model systematic biases in (say) clouds are cancelled out by other model biases”.  This means that in order to maintain approximate Top Of Atmosphere (TOA) radiative balance, some approximate cancellation is forced, which is equivalent to there being an R3(t) with high anti-correlation to R2(t).  The scientific implications of this are discussed further in Section I.

Summary of Section C:

  • A decay parameter is justified by a heat reservoir.
  • Anti-correlation is justified by GCMs’ deliberate balancing of TOA radiation.

D. Emulator Parameters

Dr. Pat Frank’s emulator falls within the general model above.  The constants from his paper, 33 K, 0.42, 33.3 W/m^2, and +/-4 W/m^2 (the last being the error in LCF), combine to give 33*0.42/33.3 = 0.416 and 0.416*4 = 1.664, which are used here.  So we can choose a = 0, b = 0, c+d = 0.416 F(t), where F(t) is the new GHG forcing (W/m^2) in period t, s1 = 0, s2 = 1.664, s3 = 0, and then derive

  (8)  W(t) = (c+d)t + W(0) +/- sqrt(t) s2

(I defer discussion of the meaning of the +/- sqrt(t) s2, be it uncertainty or error or something else, to Section E.  Note that F(t) has to be constant to directly use the theory here.)

But by using more general parameters it is possible to get a smaller value of the +/- term.  There are two main ways to do this – by covariance or by decay, each separately justified in Section C.

In the covariance case, choose s3 = s2 and r = 0.95 (say).  Then in this high anti-correlation case, still with a = 0, Equation (7) gives

  (9)  Var[W(t)] = 0.1 s2^2 t   (instead of s2^2 t)

In the case of decay but no anti-correlation, a > 0 and s3 = 0 (so R3(t) = 0 with probability 1).  Now, as t gets large, we have

  (10)  Var[W(t)] = (s1^2 + s2^2)/(2a - a^2)

so the variance does not increase without limit as in the a = 0 case.  But with a > 0, the mean also changes, and for large t Equation (4) implies it is

  (11)  E[W(t)] ~ bt/a + (b + c + d - b/a)/a

Now if we choose b = a(c+d) then that becomes (c+d)(t+1), which is practically indistinguishable from the (c+d)t in Equation (8) derived with a = 0, so we have obtained a similar expectation but, in Equation (10), a smaller variance.
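
The practical effect of these choices can be seen in a small simulation sketch (illustrative only: s2 = 1.664, c+d = 0.416 and r = 0.95 are taken from the discussion above, while a = 0.2 and the Gaussian noise are hypothetical).  The three emulators track almost the same mean, but their spread after 80 steps differs greatly.

```python
import numpy as np

rng = np.random.default_rng(0)
T, runs, s2 = 80, 20000, 1.664
cd = 0.416                 # c+d, with constant unit forcing as in the Frank-like case

def run(a, r, s3):
    # Deterministic drift b*t + (c+d) each step; R2 and R3 carry the noise.
    b = a * cd             # choose b = a(c+d) so the mean track matches (Section D)
    W = np.zeros(runs)
    for t in range(1, T + 1):
        R2 = rng.normal(0.0, s2, runs)
        R3 = s3 * (-r * R2 / s2 + np.sqrt(1 - r**2) * rng.normal(size=runs))
        W = (1 - a) * W + b * t + cd + R2 + R3
    return W

cases = {"Frank-like (a=0, no R3)":     (0.0, 0.0, 0.0),
         "anti-correlated R3 (r=0.95)": (0.0, 0.95, s2),
         "decay (a=0.2)":               (0.2, 0.0, 0.0)}
for label, (a, r, s3) in cases.items():
    W = run(a, r, s3)
    print("%-29s mean %6.1f   s.d. %5.1f" % (label, W.mean(), W.std()))
```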

To streamline the notation, now let the parameters a, b, c, d, r be placed in a vector u, and let

  (12)  E[W(t)] = mw(t;u),  Var[W(t)] = sw^2(t;u)

(I am using a subscript ‘w’ for statistics relating to W(t), and ‘m’ for those relating to M(t).)  With 4 parameters (a, b, c+d, r) to set here, how should we choose the “best”?  Well, comparisons of W(t) with M(t) and X(t) can be made, the latter just in the calibration period t = 1 to t = P.  The nature of comparisons depends on whether or not just one, or many, observations of the series M(t) are available.

Case 1: Many series

With a deterministic black box, many observed series can be created if small perturbations are made to initial conditions and if the evolution of the black box output is mathematically chaotic.  In this case, a mean mm(t) and a standard deviation sm(t) can be derived from the many series.  Then curve fitting can be applied to mw(t;u) – mm(t) and sw(t;u) – sm(t) by varying u.  Something like Akaike’s Information Criterion (AIC) might be used for comparing competing models.  But in any case it should be easy to notice whether sm(t) grows like sqrt(t), as in the a=0 case, or tends to a limit, as in the a>0 case.

Case 2: One series

If chaotic evolution is not sufficient to randomize the black box, or if the black box owner cannot be persuaded to generate multiple series, there may be only one observed series m(t) of the random variable M(t).  In this case Var[M(t)] cannot be estimated unless some functional form, such as g+ht, is assumed for mm(t), when (m(t)-g-ht)2 becomes a single observation estimate of Var[M(t)] for each t, allowing an assumed constant variance to be estimated.  So some progress in fitting W(t;u) to m(t) may still be possible in this case.

Pat Frank’s paper effectively uses a particular W(t;u) (see Equation (8) above) which has fitted mw(t;u) to mm(t), but ignores the variance comparison.  That is, s2 in (8) was chosen from an error term from LCF without regard to the actual variance of the black box output M(t).

Summary of Section D:

  • Pat Frank’s emulator model is a special case of the models presented in Section B, where error variance is given by Equation (7).
  • More general parameters can lead to lower propagation of error variance over time (or indeed, higher).
  • Fitting emulator mean to black box mean does not discriminate between emulators with differing error variances.
  • Comparison of emulator to randomized black box runs can achieve this discrimination.

E. Error and Uncertainty

In the sections above I have made scant reference to “uncertainty”, and a lot to probability theory and error distributions.  Some previous Commenters repeated the mantra “error is not uncertainty”, and this section addresses that question.  Pat Frank and others referred to the following “bible” for measurement uncertainty

https://www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.pdf ; that document is replete with references to probability theory.  It defines measurement uncertainty as a parameter which is associated with the result of a measurement and that characterizes the dispersion of the values that could reasonably be attributed to the measurand.  It acknowledges that the dispersion might be described in different ways, but gives standard deviations and confidence intervals as principal examples.  The document also says that that definition is not inconsistent with two other definitions of uncertainty, which include the difference between the measurement and the true value. 

Here I explain why they might be thought consistent, using my notation above.  Let M be the measurement, and X again be the true value to infinite precision (OK, perhaps only to within Heisenberg quantum uncertainty.)  Then the JCGM’s main definition is a parameter associated with the statistical distribution of M alone, generally called “precision”, whereas the other two definitions are respectively a function of M-X and a very high confidence interval for X.  Both of those include X, and are predicated on what is known as the “accuracy” of the measurement of M.  (The JCGM says this is unknowable, but does not consider the possibility of a different and highly accurate measurement of X.)  Now, M-X is just a shift of M by a constant, so the dispersion of M around its mean is the same as the dispersion of M-X around its mean.  So provided that uncertainty describes dispersion (most simply measured by variance) and not location, they are indeed the same.  And importantly, the statistical theory for compounding variance is the same in each case.

Where does this leave us with respect to error versus uncertainty?  Assuming that X is a single fixed value, then prior to measurement, M-X is a random variable representing the error, with some probability distribution having mean mm-X and standard deviation sm.  b = mm-X is known as the bias of the measurement, and +/-sm is described by the JCGM 2.3.1 as the “standard” uncertainty parameter.  So standard uncertainty is just the s.d. of error, and more general uncertainty is a more general description of the error distribution relative to its mean.

There are two ways of finding out about sm: by statistical analysis of multiple measurements (if possible) or by appealing to an oracle, such as the manufacturer of the measurement device, who might supply information over and beyond the standard deviation.  In both cases the output resolution of the device may have some bearing on the matter. 

However, low uncertainty is not of much use if the bias is large.  The real error statistic of interest is E[(M-X)^2] = E[((M-mm)+(mm-X))^2] = Var[M] + b^2, covering both a precision component and an accuracy component.

Sometimes the uncertainty/error in a measurement is not of great consequence per se, but feeds into a parameter of a mathematical model and thence into the output of that model.  This is the case with LCF feeding into radiative forcings in GCMs and then into temperature, and likewise with Pat Frank’s emulator of them.  But the theory of converting variances and covariances of input parameter errors into output error via differentiation is well established, and is given in Equation (13) of the JCGM.
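
For readers who want the mechanics, here is a small sketch of that standard first-order propagation rule (cited above as Equation (13) of the JCGM), applied to a toy function; the function and the input uncertainties are invented purely for illustration.

```python
import numpy as np

def propagate(f, x, u, cov=None, h=1e-6):
    """First-order propagation of input standard uncertainties u (and optional
    covariance matrix) through y = f(x), using numerical partial derivatives."""
    x = np.asarray(x, dtype=float)
    grad = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
                     for e in np.eye(len(x))])
    C = np.diag(np.asarray(u, dtype=float) ** 2) if cov is None else np.asarray(cov)
    return float(np.sqrt(grad @ C @ grad))        # combined standard uncertainty

# Toy example: y = x1 * x2^2, with invented standard uncertainties 0.1 and 0.05
y_unc = propagate(lambda x: x[0] * x[1] ** 2, x=[2.0, 3.0], u=[0.1, 0.05])
print(round(y_unc, 3))   # sqrt((9*0.1)^2 + (12*0.05)^2) = sqrt(0.81 + 0.36) ~ 1.082
```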

To illuminate the above, we now turn to some examples, principally provided by Pat Frank and Commenters.

Example 1: The 1-foot end-to-end ruler

In this example we are given a 1-foot ruler with no gaps at the ends and no markings, and the manufacturer assures us that the true length is 12”+/-e”; originally e = 1 was chosen, but as that seems ridiculously large I shall choose e = 0.1 here.  So the end-to-end length of the ruler is in error by up to 0.1” either way, and furthermore the manufacturer assures us that any error in that interval is equally likely. I shall repeat a notation I introduced in an earlier blog comment, which is to write 12+/-_0.1 for this case, where the _ denotes a uniform probability distribution, instead of a single standard deviation for +/-.  (The standard deviation for a random variable uniform in [-a,a] is a/sqrt(3) = 0.577a, so b +/-_ a and b +/- 0.577a are loosely equivalent, except that the implicit distributions are different.  This is covered in the JCGM, where “rectangular” is used in place of “uniform”.)

Now, I want to build a model train table 10 feet long, to as high an accuracy as my budget and skill allow.  If I have only 1 ruler, it is hard to see how I can do better than get a table which is 120+/-_1.0”.  But if I buy 10 rulers (9 rulers and 1 ruler to rule them all would be apt if one of them was assured of accuracy to within a thousandth of an inch!), and I am assured by the manufacturer that they were independently machined, then by the rule of addition of independent variances, the uncertainty in the sum of the lengths is sqrt(10) times the uncertainty of each.

So using all 10 rulers placed end to end, the expected length is 120” and the standard deviation (uncertainty) gets multiplied by sqrt(10) instead of 10 for the single ruler case, an improvement by a factor of 3.16.  The value for the s.d. is 0.0577 sqrt(10) = 0.183”.   

To get the exact uncertainty distribution we would have to do what is called convolving of distributions to find the distribution of sum_1^10 (X_i-12).  It is not a uniform distribution, but looks a little like a normal distribution, as the Central Limit Theorem would suggest.  Its “support” is of course not infinite but the interval (-1”,+1”); however it tails off smoothly at the edges.  (In fact, recursion shows that the probability of it being less than (-1+x), for 0<x<0.2, is (5x)^10/10!   That ! is a factorial, and with -1+x = -0.8 it gives the small probability of 2.76e-7, a tiny chance of it being within 0.2” of the lower end of the interval.)

Now that seemed like a sensible use of the 10 rulers, but oddly enough it isn’t the best use.  Instead, sort them by length, and use the shortest and longest 5 times over.  We could do this even if we bought n rulers, not equal to 10.  We know by symmetry that the shortest plus longest has a mean error of 0, but calculating the variance is more tricky.

The error of the ith shortest ruler, plus 0.1, times 5, say Yi, has a Beta distribution (range from 0 to 1) with parameters (i, n+1-i).  The variance of Yi is i(n+1-i)/((n+1)^2(n+2)), which can be found at https://en.wikipedia.org/wiki/Beta_distribution .  Now

  Var[Y1 + Yn] = 2(Var[Y1] + Cov[Y1,Yn]) by symmetry.

Unfortunately that Wikipedia page does not give that covariance, but I have derived this to be

  (13)  Cov[Yi,Yj] = i(n+1-j) / [(n+2)(n+1)^2] for i <= j, so
  (14)  Var[Y1 + Yn] = 2(n+1) / [(n+2)(n+1)^2] = 2 / [(n+2)(n+1)]

Using the two rulers 5 times multiplies the variance by 25, but removing the scaling of 5 in going from ruler to Yi cancels this.  So (14) is also the variance of the error of our final measurement.

Now take n = 10 and we get uncertainty = square root of variance = sqrt(2/132) = 0.123”, which is less than the 0.183” from using all 10 rulers.  But if we were lavish and bought 100 rulers, it would come down to sqrt(2/10302) = 0.014”.

Having discovered this trick, it would be tempting to extend it and use (Y1 + Y2 + Yn-1 + Yn)/2.  But this doesn’t help, as the variance for that is (5n+5)/[2(n+2)(n+1)^2] = 5/[2(n+2)(n+1)], which is bigger than (14).

I confess it surprised me that it is better to use the extremal rulers rather than the mean of them all.  But I tested the mathematics both by Monte Carlo and by checking that the variance of the sum of all n sorted rulers, computed via (13), agrees with the variance of the sum of the n unsorted rulers; for n=10 they agreed exactly.  I think the method is effective because the variance of the extremal rulers is small: their lengths bump up against the hard limits of the uniform distribution.
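
For anyone who wants to repeat that check, here is a sketch of the sort of Monte Carlo involved (my own reconstruction, assuming ruler errors uniform in +/-0.1”): it compares laying all 10 rulers end to end with using the shortest and longest 5 times each, and should give uncertainties near the 0.183” and 0.123” quoted above.

```python
import numpy as np

rng = np.random.default_rng(42)
n, trials, e = 10, 200_000, 0.1           # 10 rulers, errors uniform in [-e, e]

errors = rng.uniform(-e, e, size=(trials, n))

# Strategy 1: lay all 10 rulers end to end
all_ten = errors.sum(axis=1)

# Strategy 2: sort, then use the shortest and the longest 5 times each
srt = np.sort(errors, axis=1)
extremes = 5 * (srt[:, 0] + srt[:, -1])

print('all ten:  s.d. = %.3f in  (theory 0.183)' % all_ten.std())
print('extremes: s.d. = %.3f in  (theory 0.123)' % extremes.std())
```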

That inference is confirmed by Monte Carlo experiments with, in addition to the uniform, a triangular and a normal distribution for Yi, still wanting a total length of 10 rulers, but having acquired n=100 of them.  The triangular has the same range as the uniform, and half the variance, and the normal has the same variance as the uniform, implying that the endpoints of the uniform represent +/-sqrt(3) standard deviations for the normal, covering 92% of its distribution.

In the following table 3 subsets of the 100 are considered, pared down from a dozen or so experiments.  Each subset is optimal, within the experiments tried, for one or more distributions (starred).  A subset a,b,c,… means that the a-th shortest and longest rulers are used, then the b-th shortest and longest, etc.  The fraction following the distribution is the variance of a single sample.  The decimal values are variances of the total lengths of the selected rulers, scaled up to 10 rulers.

Variances of the scaled total lengths:

  dist (single-sample var)   subset 1    subset 1,12,23,34,45   subset 1,34
  U(0,1)      (1/12)         0.00479*    0.0689                 0.0449
  N(0,1/12)   (1/12)         0.781       0.1028*                0.2384
  T(0,1)      (1/24)         0.0531      0.0353                 0.0328*

We see that by far the smallest variance, 0.00479, occurs if we are guaranteed a uniform distribution, by using a single extreme pair, but that strategy isn’t optimal for the other 2 distributions.  5 well-spaced pairs are best for the normal, and quite good for the triangular, though the latter is slightly better with 2 well-spaced pairs.
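
A Monte Carlo sketch of the kind of experiment behind this table is given below.  It is my reconstruction, assuming the convention described above: n = 100 rulers are drawn from each distribution, sorted, the listed ranks are taken from both ends, and the total of the selected rulers is scaled up to the length of 10 rulers.  The exact outputs will differ slightly from the table because of sampling noise (and any difference in convention).

```python
import numpy as np

rng = np.random.default_rng(7)
n, trials = 100, 50_000

samplers = {
    "U(0,1)":    lambda size: rng.uniform(0, 1, size),
    "N(0,1/12)": lambda size: rng.normal(0.5, np.sqrt(1 / 12), size),
    "T(0,1)":    lambda size: rng.triangular(0, 0.5, 1, size),
}
subsets = {"1": [1], "1,12,23,34,45": [1, 12, 23, 34, 45], "1,34": [1, 34]}

for name, draw in samplers.items():
    x = np.sort(draw((trials, n)), axis=1)
    row = []
    for ranks in subsets.values():
        idx = [r - 1 for r in ranks] + [n - r for r in ranks]     # both ends
        total = x[:, idx].sum(axis=1) * (10 / (2 * len(ranks)))   # scale to 10 rulers
        row.append(total.var())
    print("%-10s" % name, "  ".join("%.4f" % v for v in row))
```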

Unless the manufacturer can guarantee the shape of the error distribution, the assumption that it is uniform would be quite dangerous when choosing a strategy for the use of the available rulers.

Summary of Section E:

  • Uncertainty should properly be thought of as the dispersion of a distribution of random variables, possibly “hidden”, representing errors, even though that distribution might not be fully specified.
  • In the absence of clarification, a +/-u uncertainty value should be taken as one standard deviation of the error distribution.
  • The assumption, probably through ignorance, that +/-u represents a sharply bounded uniform (or “rectangular”) distribution, allows clever tricks to be played on sorted samples yielding implausibly small variances/uncertainties.
  • The very nature of errors being compounded from multiple sources supports the idea that a normal error distribution is a good approximation.

F. Uniform Uncertainty (compared to Trapezium Uncertainty)

As an interlude between examples, in this section we study further implications of a uniform uncertainty interval, most especially for a digital device.  By suitable scaling we can assume that the possible outputs are a complete range of integers, e.g. 0 to 1000.  We use Bayesian statistics to describe the problem.

Let X be a random variable for the true infinitely precise value which we attempt to measure.

Let x be the value of X actually occurring at some particular time.

Let M be our measurement, a random variable but including the possibility of zero variance.  Note that M is an integer.

Let D be the error, = M – X.

Let f(x) be a chosen (Bayesian) prior probability density function (p.d.f.) for X, P[X’=’x].

Let g(y;x) be a probability function (p.f.) for M over a range of integer y values, dependent on x, written g(y;x) = P[M=y | X’=’x]  (the PRECISION distribution).

Let c be a “constant” of proportionality, determined in each separate case by making the relevant probabilities add up to 1.  Then after measurement M, the posterior probability for X taking the value x is, by Bayes’ Theorem,

  (15)  P[X’=’x | M=y]  =  P[M=y | X’=’x] P[X’=’x] / c = g(y;x) f(x) / c

Usually we will take f(x) = P[X ‘=’ x] to be an “uninformative” prior, i.e. uniform over a large range bound to contain x, so it has essentially no influence.  In this case,

  (16)  P[X’=’x | M=y] = g(y;x)/c, where c = ∫ g(y;x) dx   (the UNCERTAINTY distribution)

Then P[D=z | M=y] = P[X=M-z | M=y] = g(y;y-z)/c.  Now assume that g() is translation invariant, so g(y;y-z) = g(0;-z) =: c h(z) defines the function h(), with ∫ h(z)dz = 1.  Then

  (17)  P[D=z | M=y] = h(z), independent of y   (ERROR DISTRIBUTION = shifted u.d.)

In addition to this distribution of error given observation, we may also be interested in the distribution of error given the true (albeit unknown) value.  (It took me a long time to work out how to evaluate this.)

Let A be the event {D = z}, B be {M = y}, C be {X = x}.  These events have a causal linkage, which is that they can simultaneously occur if and only if z = y-x.  And when that equation holds, so z can be replaced by y-x, then given that one of the events holds, either both or none of the other two occur, and therefore they have equal probability.  It follows that:

  P[A|C] = P[B|C] = P[C|B]P[B]/P[C] = P[A|B]P[B]/P[C]

  (18)  P[D = z = y-x | X = x] = P[D = y-x | M = y] P[M = y] / P[X = x]

Of the 3 terms on the RHS, the first is h(y-x) from Equation (17), the third is f(x) from Equation (15), and the second is a new prior.  This prior must be closely related to f(), which we took to be uninformative, because M is an integer value near to X.  The upshot is that under these assumptions the LHS is proportional to h(y-x), so

  (19)  P[D = y-x | X = x] = h(y-x) / ∑_i h(i-x)

Let x’ be the nearest integer to x, and a = x’-x, lying in the interval [-1/2,1/2).  Then y-x = y+a-x’ = a+k where k is an integer.  Then the mean m and variance s2 of D given X=x are:

  (20)  m = ∑_k (a+k) h(a+k) / ∑_k h(a+k);   s^2 = ∑_k (a+k-m)^2 h(a+k) / ∑_k h(a+k)

A case of obvious interest would be an uncertainty interval which is +/-e uniform.  That would correspond to h(z) = 1/(2e) for b-e < z < b+e and 0 elsewhere, where b is the bias of the error.  We now evaluate the statistics for the case b = 0 and e ≤ 1.  The symmetry in a means that we need only consider a > 0.  -e < a+k < e implies that -e-a < k < e-a.  If e < ½ there is an a slightly bigger than e such that no integer k lies in that interval, which is impossible, so e must be at least ½.  Since h(z) is constant over its range, in (20) cancellation allows us to replace h(a+k) with 1.

  (21)  If a < 1-e then only k=0 is possible, and m = a, s^2 = 0.
  (22)  If a > 1-e then k=-1 and k=0 are both possible, and m = a - ½, s^2 = ¼.

When s^2 is averaged over all a we get 2(e-½)(¼) = (2e-1)/4.

It is not plausible for e to be ½, for then s^2 would be 0 whatever the fractional part a of x was.  Since s^2 is the variance of M-X given X=x, that implies that M is completely determined by X.  That might sound reasonable, but in this example it means that as X changes from 314.499999 to 314.500000, M absolutely has to flip from 314 to 315, and that implies that the device, despite giving output resolution to an integer, actually has infinite precision, and is therefore not a real device.

For e > ½, s^2 is zero for a in an interval of width 2-2e, and non-zero in two intervals of total width 2e-1.  In these intervals for a (translating to x), it is non-deterministic whether the output M is 314, say, or 315.

In Equations (21) and (22) there is a disconcerting discontinuity in the expected error: it jumps from 1-e just below a = 1-e to ½-e just above it.  This arises from the cliff edge in the uniform h(z).  More sophisticated choices of h(z), such as a normal distribution, a triangular distribution, or a trapezium distribution like the following, do not exhibit this feature:

  (23)  h(z) =

{ 2(z+3/4) for -3/4<z<-1/4

{ 1 for -1/4<z<1/4

{ 2(3/4-z) for 1/4<z<3/4

For this example we find

  (24)  if 0 < a < 1/4, m = a and s^2 = 0;

           if 1/4 < a < 3/4, m = 1/2 - a, s^2 = 4(a-1/4)(3/4-a) <= 1/4

Note that the discontinuity does not occur here, as m is a continuous function of a even at a = 1/4.  The averaged s^2 is 1/12, less than the 1/8 from the U[-3/4,3/4] distribution.
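
A short numerical sketch of Equation (20) makes this comparison easy to reproduce (the grid over a and the choice of e = 3/4 for the uniform case mirror the discussion above): it should give an average s^2 of about 1/8 for the uniform density and about 1/12 for the trapezium of Equation (23).

```python
import numpy as np

def error_stats(h, a, kmax=5):
    # Mean and variance of the error D given the true value, via Equation (20),
    # for a translation-invariant error density h and offset a = x' - x.
    k = np.arange(-kmax, kmax + 1)
    w = h(a + k)
    m = np.sum((a + k) * w) / np.sum(w)
    s2 = np.sum((a + k - m) ** 2 * w) / np.sum(w)
    return m, s2

def h_uniform(z, e=0.75):
    return np.where(np.abs(z) < e, 1.0 / (2 * e), 0.0)

def h_trapezium(z):                       # Equation (23)
    z = np.abs(z)
    return np.where(z < 0.25, 1.0, np.where(z < 0.75, 2 * (0.75 - z), 0.0))

a_grid = np.linspace(-0.5, 0.5, 20001)
for name, h in [("uniform, e=3/4", h_uniform), ("trapezium", h_trapezium)]:
    s2 = np.array([error_stats(h, a)[1] for a in a_grid])
    print("%-15s average s^2 = %.4f" % (name, s2.mean()))
# Expect about 0.125 (= 1/8) for the uniform and 0.0833 (= 1/12) for the trapezium.
```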

All the above is for a device with a digital output, presumed to change slowly enough to be read reliably by a human.  In the case of an analogue device, like a mercury thermometer, then a human’s reading of the device provides an added error/uncertainty.  The human’s reading error is almost certainly not uniform (we can be more confident when the reading is close to a mark than when it is not), and in any case the sum of instrument and human error is almost certainly not uniform.

Summary of Section F:

  • The PRECISION distribution, of an output given the true state, induces an ERROR distribution given some assumptions on translation invariance and flat priors.
  • The range of supported values of the error distribution must exceed the output resolution width, since otherwise infinite precision is implied.
  • Even when that criterion is satisfied, the assumption of a uniform ERROR distribution leads to a discontinuity in mean error as a function of the true value.
  • A corollary is that if your car reports ambient temperature to the nearest half degree, then sometimes, even in steady conditions, its error will exceed half a degree.

G. Further Examples

Example 2: the marked 1-foot ruler

In this variant, the rulers have markings and an indeterminate length at each end.  Now multiple rulers cannot usefully be laid end to end, and the human eye must be used to judge and mark the 12” positions.  This adds human error/uncertainty to the measurement process, which varies from human to human, and from day to day.  The question of how hard a human should try in order to avoid adding significant uncertainty is considered in the next example.

Example 3: Pat Frank’s Thermometer

Pat Frank introduced the interesting example of a classical liquid-in-glass (LiG) thermometer whose resolution is +/-0.25K.  He claimed that everything inside that half-degree interval was a uniform blur, but went on to explain that the uncertainty was due to at least 4 things: the thermometer capillary is not of uniform width; the inner surface of the glass is not perfectly smooth and uniform; the liquid inside is not of constant purity; and the entire thermometer body is not at a constant temperature.  He did not include the fact that during calibration human error in reading the instrument may have been introduced.  So the summation of 5 or more errors implies (except in mathematically “pathological” cases) that the sum is not uniformly distributed.  In fact a normal distribution, perhaps truncated if huge errors with infinitesimal probability are unpalatable, makes much more sense.

The interesting question arises as to what the (hypothetical) manufacturers meant when they said the resolution was +/-0.25K.  Did they actually mean a 1-sigma, or perhaps a 2-sigma, interval?  For deciding how to read, record, and use the data from the instrument, that information is rather vital.

Pat went on to say that a temperature reading taken from that thermometer and written as, e.g., 25.1 C, is meaningless past the decimal point. (He didn’t say, but presumably would consider 25.5 C to be meaningful, given the half-degree uncertainty interval.)  But this isn’t true; assuming that someone cares about the accuracy of the reading, it doesn’t help to compound instrumental error with deliberate human reading error.  Suppose that the quoted +/-0.25K actually corresponds to 2-sigma, the manufacturer wanting to give a reasonably firm bound; then the instrument’s error variance is ((1/2)(1/4))^2 = 1/64.  If t^2 is the error variance of the observer, then the final variance is 1/64 + t^2.

The observer should not aim for a ridiculously low t, even if achievable, and perhaps a high t is not so bad if the observations are not that important.  But beware: observations can increase in importance beyond the expectations of the observer.  For example we value temperature observations from 1870 because they tell us about the idyllic pre-industrial, pre-climate change, world!  In the present example, I would recommend trying for t^2 = 1/100, or as near as can be achieved within reason.  Note that if the observer can manage to read uniformly within +/-0.1 C, then t^2 = 1/300.   But if instead she reads to within +/-0.25, then t^2 = 1/48 and the overall variance is multiplied by (1 + 64/48) = 7/3 relative to the instrument alone, i.e. the overall uncertainty (s.d.) is multiplied by sqrt(7/3) ~ 1.53, which is a significant impairment of precision.
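
The arithmetic in this example is collected in the short sketch below (using the same assumptions as above: the quoted +/-0.25K is treated as a 2-sigma bound, and the observer's reading errors are uniform).

```python
import math

inst_var = (0.25 / 2) ** 2            # instrument: +/-0.25K taken as 2-sigma -> 1/64
cases = [("observer reads to +/-0.1",  0.1 ** 2 / 3),     # uniform +/-0.1  -> 1/300
         ("observer reads to +/-0.25", 0.25 ** 2 / 3)]    # uniform +/-0.25 -> 1/48
for label, obs_var in cases:
    total = inst_var + obs_var
    print("%-26s total s.d. = %.3f K  (inflation factor %.2f)"
          % (label, math.sqrt(total), math.sqrt(total / inst_var)))
# The sloppier reading inflates the overall uncertainty by sqrt(7/3) ~ 1.53.
```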

The moral is that it is vital to know what uncertainty variance the manufacturer really believes to be the case, that guidelines for observers should then be appropriately framed, and that sloppiness has consequences.

Summary of Section G:

  • Again, real life examples suggest the compounding of errors, leading to approximately normal distributions.
  • Given a reference uncertainty value from an analogue device, if the observer has the skill and time and inclination then she can reduce overall uncertainty by reading to a greater precision than the reference value.

H. The Implications for Pat Frank’s Paper

The implication of Section B is that a good emulator can be run with pseudorandom numbers and give output which is similar to that of the black box.  The implication of Section E is that uncertainty analysis is really error analysis, and good headway can be made by postulating the existence of hidden random variables through which statistics can be derived.  The implication of Section D is that many emulators of GCM outputs are possible, and just because a particular one seems to fit mean values quite well does not mean that the nature of its error propagation is correct.  The only way to arbitrate between emulators would be to carry out Monte Carlo experiments with the black boxes and the emulators.  This might be expensive, but assuming that emulators have any value at all, it would increase that value.

Frank’s emulator does visibly give a decent fit to the annual means of its target, but that isn’t sufficient evidence to assert that it is a good emulator.  Frank’s paper claims that GCM projections to 2100 have an uncertainty of at least +/-15K.  Because, via Section E, uncertainty really means a measure of dispersion, this means that Equation (1) with the equivalent of Frank’s parameters, run over many examples of 80-year series, would show an envelope in which a good proportion of runs reach +15K or more, a good proportion reach -15K or less, and a good proportion stay within those bounds.  This is just the nature of random walks with square-root-of-time evolution.

But the GCM outputs represented by CMIP5 do not show this behaviour, even though, climate being chaotic, different initial conditions should lead to such variety.  Therefore Frank’s emulator is not objectively a good one.  And the reason is that, as mentioned in Section C, the GCMs have corrective mechanisms to cancel out TOA imbalances except for, presumably, those induced by the rather small increase of greenhouse gases from one iteration to the next.

However the real value in Frank’s paper is first the attention drawn to the relatively large annual errors in the radiation budget arising from long wave cloud forcing, and second the revelation through comments on it that GCMs have ways of systematically squashing these errors.

Summary of Section H:

  • Frank’s emulator is not good in regard to matching GCM output error distributions.
  • Frank’s paper has valuable data on LCF errors.
  • Thereby it has forced “GCM auto-correction” out of the woodwork.

I. The Implications for GCMs

The “systematic squashing” of the +/-4 W/m^2 annual error in LCF inside the GCMs is an issue of which I for one was unaware before Pat Frank’s paper. 

The implication of comments by Roy Spencer is that there really is something like a “magic” component R3(t) anti-correlated with R2(t), though the effect would be similar if it was anti-correlated with R2(t-1) instead, which might be plausible with a new time step doing some automatic correction of overshooting or undershooting on the old time step.  GCM experts would be able to confirm or deny that possibility.

In addition, there is the question of a decay rate a, so that only a proportion (1-a) of previous forcing carries into the next time step, as justified by the heat reservoir concept in Section C.  After all, GCMs presumably do try to model the transport of heat in ocean currents, with concomitant heat storage.

It is very disturbing that GCMs have to resort to error correction techniques to achieve approximate TOA balance.  The two advantages of doing so are that they are better able to model past temperatures, and that they do a good job in constraining the uncertainty of their output to the year 2100.  But the huge disadvantage is that it looks like a charlatan’s trick; where is the vaunted skill of these GCMs, compared with anyone picking their favourite number for climate sensitivity and drawing straight lines against log(CO2)?  In theory, an advantage of GCMs might be an ability to explain regional differences in warming.  But I have not seen any strong claims that that is so, with the current state of the science.

Summary of Section I:

  • Auto-correction of TOA radiative balance helps to keep GCMs within reasonable bounds.
  • Details of how this is done would be of great interest; the practice seems dubious at best because it highlights shortcomings in GCMs’ modelling of physical reality.
184 Comments
1sky1
February 12, 2020 3:40 pm

The real elephant in the room here is the lack of solid recognition that Booth’s Eq. 1 is a valid analytic representation of an ARIMA process where only the recursive (1-a)-term is a genuine parameter specifying the system response. We don’t have the situation that von Neumann delineated. The three additive R terms simply specify the input that gives rise to the output. And it’s only in the case where a = 0 that we get an output that behaves in accordance with Frank’s presumption of a random walk with ever-expanding variance. But that mathematically unstable case is a physically unrealistic representation of any Hamiltonian system in which finite energy is preserved.

Reply to  1sky1
February 15, 2020 2:28 pm

1sky1, “Frank’s presumption of a random walk”

I make no such presumption.

The variance expands as an indication of increasing ignorance. Not as a measure of increasing distance between true and predicted.

February 13, 2020 6:18 am

1sky1: thank you for those supportive words, but we don’t have a closed Hamiltonian system on Earth, because radiation in and out can vary.

Rich.

1sky1
Reply to  See - owe to Rich
February 13, 2020 1:13 pm

What is germane to the behavior of GCM outputs is the analysis of time-varying Hamiltonian systems, such as treated by: https://www.sciencedirect.com/science/article/pii/S0898122198800321.

Reply to  1sky1
February 15, 2020 2:30 pm

What is germane to the behavior of GCM outputs is comparison with well-constrained observations.

Reply to  1sky1
February 16, 2020 2:54 am

1sky1: Wow, that paper is deep mathematics! I’m afraid the 9-line Conclusions didn’t make me any the wiser. Do you have an “elevator speech” to explain the paper?

Rich.

1sky1
Reply to  See - owe to Rich
February 18, 2020 6:01 pm

The point of referencing that paper was not to delve into its arcane treatment of some mathematical properties, but to point out simply that variability of energy inputs and outputs does not exclude Hamiltonian systems. Nor, in the customary thermodynamic sense, does a closed system require non-varying energy levels.

February 14, 2020 3:29 am

I previously wrote (Feb12 5:07am) an apology, regarding uncertainties of means, about not distinguishing between two cases, the first being repeated measurements of the same variable under apparently identical conditions, and the second being single measurements of many different variables. I can now give some more detail on this.

The second case is far the easier, as follows. We have n pairs (X_i,M_i) where X_i is the true unknown value of the measurand and M_i is the measurement. M_i is not equal to X_i because of both random variation in the measurement process and because it is quantized digital output, which by appropriate scaling we assume to be an integer. We assume that the error D_i = M_i-X_i has a probability distribution, reflecting our ignorance. Some people prefer to assign a uniform distribution, and that is the easiest case to analyze, so I assume each D_i is uniform in [-e,+e] with e at least a half because of the quantization. Note that Var[D_i] = e^2/3.

Then given the measurements M_i, each X_i is uniform in [M_i-e,M_i+e]. The difference between the mean of the sample M_i’s and the true values X_i is the mean, D*, of the D_i’s, D* = sum_{i=1}^n D_i/n. Var[D*] = (n e^2/3)/n^2 = e^2/(3n). The standard uncertainty of the mean of the X_i’s is the square root of that, decreasing with sqrt(n) in the denominator.

So while the uncertainty in the sum of the X’s increases with n, the uncertainty in the mean decreases.

Returning to the first case, there is only one X. I now constrain e to be at most 1, so there are at most 2 possible values for each M_i, thereby simplifying the problem. We assume an uninformative prior for X, in which the probability that X lies between x and x+d is a tiny number ud.

Consider X in the unit interval (-1+e, e). If -1+e < X < 1-e then each M_i must be 0. We can write:

P[M_1=…M_n=0, -1+e<X<1-e] = 2(1-e)u

But if 1-e < X < e then each M_i can be either 0 or 1. Because the error distribution for M_i-X is uniform, any legal value of M_i is equally likely, so M_i = 0 with probability ½, and

P[M_1=m_1,…M_n=m_n, 1-e<X<e] = (2e-1)u/2^n

where each m_i is 0 or 1. In this case the probability that each M_i is identical is 2/2^n, which diminishes rapidly to 0 as n grows. So for large n, we can assert that if each M_i = 0 then -1+e < X < 1-e. The variance of X in the interval (-1+e,1-e) is (1-e)^2/3.

If on the other hand the M_i’s are a mixture of 0’s and 1’s, we know that 1-e < X < e and the variance of X in that interval is (2e-1)^2/12.

Overall, the mean variance of X, taking into account the width of the 2 intervals, is

V = 2(1-e)(1-e)^2/3 + (2e-1)(2e-1)^2/12 = (e-3/4)^2 + 1/48

Here is a table of the e, V, and the variance e^2/3 arising from a single observation (uncertainties are the square roots of these). At e = ½ all the variances equal 1/12 which is the variance of the output resolution; Section F explains why such an e is implausible as it implies infinite precision which gets discarded. At e = ¾, sqrt(V) is one half of the s.d. of the output resolution, contradicting statements that it is impossible to go below that bound.

   e      e^2/3     V
 0.500   0.0833   0.0833
 0.750   0.1875   0.0208
 0.866   0.2500   0.0343
 1.000   0.3333   0.0833
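
Here is a quick numerical sketch checking that identity and the V column of the table:

```python
import numpy as np

e = np.linspace(0.5, 1.0, 501)
V_pieces  = 2 * (1 - e) ** 3 / 3 + (2 * e - 1) ** 3 / 12   # the two-regime average above
V_claimed = (e - 0.75) ** 2 + 1 / 48
print(np.max(np.abs(V_pieces - V_claimed)))                # ~1e-16: identical

for ev in (0.5, 0.75, 0.866, 1.0):                         # reproduces the V column
    print(ev, round((ev - 0.75) ** 2 + 1 / 48, 4))
```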

I have also, with even greater difficulty, worked out V_2 for the case n=2, but that may be of marginal interest. I would like to do calculations for normal errors, but that would require computer calculation, and I see this topic as a distraction from the more important question of the evolution of uncertainty.

Rich.

Reply to  See - owe to Rich
February 14, 2020 8:19 am

Rich,

How do you know what X_i actually is since any value within the uncertainty interval can be the true value?

If the true value is at one extreme or the other then you no longer have: “Then given the MEASUREMENTS M_i, each X_i is uniform in [M_i-e,M_i+e].” (capitalization mine, tim).

1. The true value could be at the very extreme end of the uncertainty interval so there is no +/- e but only +e or -e.
2. You are working with a SINGLE measurement with an uncertainty interval, not multiple measurements of the same thing using the same device. There is no “measurments”.

You simply can’t assume that X_i, i.e. the mean, is the true value.

You keep falling back into the same old central limit theory assuming you have multiple measurements that can be combined to more accurately calculate the mean. I.e. “M_i is not equal to X_i because of both random variation in the measurement process”. This assumes you have multiple measurements that form a “random variation in the measurement process”.

There *is* no random variation in the measurement process. There is SINGLE measurement that has an uncertainty. When you calculate the mean temperature for a day at a measuring station you use a SINGLE measurement for the maximum temperature, a measurement that has an uncertainty interval. You don’t take multiple measurements at that station at the same point in time that can be used to generate an accurate mean using the central limit theory. The same thing applies to the minimum temperature. You have a *SINGLE* measurement, not multiple measurements.

The plus and minus interval for uncertainty is *not* based on multiple measurements so there is no actual random probability distribution of measurements. The only reason for assuming a uniform distribution is to try and develop a way to add the uncertainty intervals based on known mathematics for random probability distributions. But since there are *not* multiple measurements you can’t take the similarity past figuring out how to handle the combining of the intervals.

There is nothing wrong with your math, only with the assumptions that an uncertainty interval is a probability distribution of multiple measurements.

“So while the uncertainty in the sum of the X’s increases with n, the uncertainty in the mean decreases.”
“the probability that X lies between x and x+d is a tiny number ud.”
“Because the error distribution for M_i-X is uniform”

All of these assume a probability distribution for multiple measurements of the same thing. An uncertainty interval associated with a single measurement is *NOT* a probability distribution, not even a uniform one. It is only useful to consider as such in order to figure out a way to combine multiple separate single measurements of different things – root-sum-square.

You continue to ignore how to combine a minimum temperature measurement of 60deg +/- 0.5deg and 72deg +/- 0.5deg and focus instead on how you can say that 60deg or 72deg is the “true value” using the central limit theory based on the uncertainty interval of each being a probability distribution of multiple measurements.

The fact is that when you combine two separate measurements, i.e. try to calculate a mean between the two, that mean will have a variance (i.e. uncertainty interval) that is larger then the variance of each component. And the variances add as root-sum-square. Just like Pat pointed out in his analysis.

If you have two measurements with an uncertainty interval of +/- 0.5deg then when combined you will have an uncertainty interval of sqrt( 0.25 + 0.25) = +/- 0.7deg a value larger than that of either component. This applies every time you do an iterative step in a CGM. Combine thirty of these to get a monthly average and your uncertainty interval becomes sqrt(30 * 0.7) = +/- 5deg. How in Pete’s name could you possibly say that one year is 0.01deg hotter than the other when your uncertainty interval spans a total of 10deg? And an uncertainty interval of +/- 0.5deg is certainly not unreasonable for measurements taken in the late 19th century and early 20th century. In fact, unless modern measurement devices are regularly calibrated, +/- 0.5deg is not an unreasonable assumption for the uncertainty them either!

This also means that when you do something like take daily averages of hundreds of stations and try to combine them that the uncertainty interval grows and grows. At some point the uncertainty overwhelms your ability to say that comparing one set of averages is X amount different that another.

February 15, 2020 2:57 am

Tim, I’ll deal with your specific points and then say something about when my calculations are or are not applicable, labelled APP.

T: How do you know what X_i actually is since any value within the uncertainty interval can be the true value?

Doh! Where did I say that X_i was known?

T: If the true value is at one extreme or the other then you no longer have: “Then given the MEASUREMENTS M_i, each X_i is uniform in [M_i-e,M_i+e].” (capitalization mine, tim).

Given that X is unknown (now dropping the i), we treat it as a random variable within that assumed uncertainty interval. We can write a probability density equation P[X ‘=’ x] = 1/(2e). Its actual value, x, could as you say be anywhere in the interval.

T: 1. The true value could be at the very extreme end of the uncertainty interval so there is no +/- e but only +e or -e.

I don’t understand your notation.

T: 2. You are working with a SINGLE measurement with an uncertainty interval, not multiple measurements of the same thing using the same device. There is no “measurments”.

No, in my second case I have a single measurement M_i approximating each unknown X_i, with the i’s representing different times and places. For my first case, see APP below.

T: You simply can’t assume that X_i, i.e. the mean, is the true value.

And I never did…

T: You keep falling back into the same old central limit theory assuming you have multiple measurements that can be combined to more accurately calculate the mean. I.e. “M_i is not equal to X_i because of both random variation in the measurement process”. This assumes you have multiple measurements that form a “random variation in the measurement process”.

I’ll have to keep on forgiving you for inappropriate reference to the Central Limit Theorem, which deals with the tendency to normality of summed random variables, not the reduction of variance in a mean.

T: There *is* no random variation in the measurement process. There is SINGLE measurement that has an uncertainty. When you calculate the mean temperature for a day at a measuring station you use a SINGLE measurement for the maximum temperature, a measurement that has an uncertainty interval. You don’t take multiple measurements at that station at the same point in time that can be used to generate an accurate mean using the central limit theory. The same thing applies to the minimum temperature. You have a *SINGLE* measurement, not multiple measurements.

See APP.

T: The plus and minus interval for uncertainty is *not* based on multiple measurements so there is no actual random probability distribution of measurements. The only reason for assuming a uniform distribution is to try and develop a way to add the uncertainty intervals based on known mathematics for random probability distributions. But since there are *not* multiple measurements you can’t take the similarity past figuring out how to handle the combining of the intervals.

If e > 1/2 then for some values of X, more than 1 possibility for M exists. See APP.

T: There is nothing wrong with your math, only with the assumptions that an uncertainty interval is a probability distribution of multiple measurements.

T(R): “So while the uncertainty in the sum of the X’s increases with n, the uncertainty in the mean decreases.”
“the probability that X lies between x and x+d is a tiny number ud.”
“Because the error distribution for M_i-X is uniform”

T: All of these assume a probability distribution for multiple measurements of the same thing. An uncertainty interval associated with a single measurement is *NOT* a probability distribution, not even a uniform one. It is only useful to consider as such in order to figure out a way to combine multiple separate single measurements of different things – root-sum-square.

No, in my second case they are distributions for measurements of many different things. The only way to use probability theory properly is to assume that some distribution exists, and then find out its implications. A national standards body with expensive equipment to measure values to within very small uncertainty will be able to measure error distributions for inferior devices.

T: You continue to ignore how to combine a minimum temperature measurement of 60deg +/- 0.5deg and 72deg +/- 0.5deg and focus instead on how you can say that 60deg or 72deg is the “true value” using the central limit theory based on the uncertainty interval of each being a probability distribution of multiple measurements.

The first clause is correct – I haven’t looked at it.

T: The fact is that when you combine two separate measurements, i.e. try to calculate a mean between the two, that mean will have a variance (i.e. uncertainty interval) that is larger then the variance of each component. And the variances add as root-sum-square. Just like Pat pointed out in his analysis.

This is your main error, which I have pointed out before. Let the uncertainties of X_1 and X_2 be u_1 and u_2 respectively. We can agree that the uncertainty of X_1+X_2 is u = sqrt(u_1^2+u_2^2). But what is the uncertainty of Y = (X_1+X_2)/1000? It is u/1000. The JCGM defines uncertainty to be a measure of dispersion of the values a measurand could reasonably take. Since Y is 1000 times smaller than the X’s, its value and uncertainty are 1000 times smaller. Now replace 1000 by 2. The uncertainty of (X_1+X_2)/2 is u/2. If u_1 = u_2, then u/2 = u_1/sqrt(2) < u_1. QED

T: If you have two measurements with an uncertainty interval of +/- 0.5deg then when combined you will have an uncertainty interval of sqrt( 0.25 + 0.25) = +/- 0.7deg, a value larger than that of either component. This applies every time you do an iterative step in a GCM. Combine thirty of these to get a monthly average and your uncertainty interval becomes sqrt(30 * 0.7) = +/- 5deg. How in Pete’s name could you possibly say that one year is 0.01deg hotter than the other when your uncertainty interval spans a total of 10deg? And an uncertainty interval of +/- 0.5deg is certainly not unreasonable for measurements taken in the late 19th century and early 20th century. In fact, unless modern measurement devices are regularly calibrated, +/- 0.5deg is not an unreasonable assumption for them either!

See my reply to the previous paragraph.

T: This also means that when you do something like take daily averages of hundreds of stations and try to combine them that the uncertainty interval grows and grows. At some point the uncertainty overwhelms your ability to say that comparing one set of averages is X amount different than another.

No, as before, with n independent measurements, uncertainties of sums increases with n, uncertainties of means decreases. Now for:

APP: This is about the applicability of my calculations for multiple measurements M_i on a single measurand X. It assumes that independent measurements are possible, and that may not be the case. Let us take a particular value of e, 0.7, to demonstrate. We have digital output as integers and e = 0.7 means that in addition to the systematic rounding by up to +/-0.5 of the true value X, there is a further interval of +/-0.2. So, if X is truly 31.42, the device, before rounding, can register anywhere between 31.22 and 31.62 with equal likelihood. Let's call that pre-rounding value Y_1. Values between 31.22 and 31.5 will be output as 31, and those between 31.5 and 31.62 will be output as 32.

Now, suppose there is good reason to believe that X has not changed. For example we might be in laboratory conditions where we are tightly constraining it. What if we take a new measurement, say one minute later? It all depends on the nature of the device as to whether the new value Y_2, before rounding, is independent of Y_1. If Y fluctuates rapidly, then independence of Y_1 and Y_2 seems reasonable. For example, remember the old speedometers with analogue needles which would wobble noticeably. On the other hand, Y might be pretty stable over short periods, but affected by lunar tide, or cosmic particles, or Earth's magnetic field, etc. etc., and be more variable over a longer time. In this case Tim is right and a new M_2 is almost certain to agree with the previous M_1.

So my earlier demonstration, of a modest reduction in uncertainty with multiple observations of the same quantity, does depend on independence and that depends on the physics of the particular instrument.
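
For anyone who wants to see the favourable (independent) case in numbers, here is a rough Python simulation; the 31.42 and the 0.2 are just the figures from the example above, and nothing in it settles the physical question of whether the Y_i really are independent:

import numpy as np
rng = np.random.default_rng(1)
X = 31.42                                   # true value, unknown to the observer
y = rng.uniform(X - 0.2, X + 0.2, 50)       # 50 independent pre-rounding values Y_i
m = np.rint(y).astype(int)                  # the digital (rounded) outputs M_i
# A single reading of 31 on its own only says X lies in (30.3, 31.7).
# Seeing both 31s and 32s says X is within 0.2 of the rounding boundary 31.5, i.e. in (31.3, 31.7).
frac32 = np.mean(m == 32)                   # the fraction of 32s locates X within that interval
print(np.unique(m), 31.3 + 0.4 * frac32)    # point estimate comes out near 31.42

If the Y_i move together rather than independently, the extra readings add nothing, which is exactly the point above.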

Rich.

Reply to  See - owe to Rich
February 15, 2020 1:35 pm

“Doh! Where did I say that X_i was known?”

When you assume an equal “+/- e” then you have assumed that the mean is the true value and, therefore, that it is known.

“we treat it as a random variable”

It is not a “random variable”. Being a random variable assumes there are multiple measurements whose values take the form of a probability distribution.

Rich: “Then given the MEASUREMENTS M_i, each X_i is uniform in [M_i-e,M_i+e]”
tim: “The true value could be at the very extreme end of the uncertainty interval so there is no +/- e but only +e or -e.”

“I don’t understand your notation.”

What don’t you understand? You are the one that used the terminology. I am just pointing out that X_i, the true value, doesn’t have to be uniform with a uniform negative and positive interval.

Uncertainty is not error. Pat has said that often enough that it should be burned in everyone’s brain. Error you can resolve with multiple measurements of a standard, you can’t do the same with one measurement that has uncertainty.

“No, in my second case I have a single measurement M_i approximating each unknown X_i, with the i’s representing different times and places. For my first case, see APP below.”

How can any single M_i approximate a true value X_i? That’s the whole issue in a nutshell. If I give you a single measurement of 72deg +/- 0.5deg, how do you know that nominal measurement, 72deg, approximates the true value? If that were the case then why does the uncertainty interval even exist?

If you measure the temperature at different places with different thermometers then how do you add the uncertainty intervals together? You can’t do it with the central limit theory because that only holds for multiple measurements of the same thing, i.e. object, time, and place.

“If e > 1/2 then for some values of X, more than 1 possibility for M exists”

Again, you are assuming a probability function, see the word “possibility”.

“No, in my second case they are distributions for measurements of many different things. The only way to use probability theory properly is to assume that some distribution exists, and then find out its implications. A national standards body with expensive equipment to measure values to within very small uncertainty will be able to measure error distributions for inferior devices.”

Error is *not* uncertainty. If you are measuring a standard to determine calibration then you are doing a straight comparison to determine an error bias. That has nothing to do with uncertainty. The minute your calibrated instrument leaves the laboratory it will begin to lose calibration from aging, environment differences, etc. It will develop an uncertainty interval that only grows over time.

” But what is the uncertainty of Y = (X_1+X_2)/1000?”

You keep falling back into the same trap, over and over. There is no “1000”. The population size of a single measurement is “1”. Y = whatever the nominal value of the measurement is. There is no X_1 and X_2. Y = X_1 in every case. And Y has an uncertainty interval that you can’t resolve from that one measurement.

“No, as before, with n independent measurements, uncertainties of sums increases with n, uncertainties of means decreases. Now for:”

ONLY IF YOU ARE MEASURING THE SAME THING MULTIPLE TIMES! When I tell you that the temperature here, right now, is 40degF that is based on one measurement by one device at one point in time and at one location. Where do you keep coming up with independent measurements? And if the guy six miles down the road says his thermometer reads 39degF, that is one measurement by one device at one point in time and at one location.

Now tell me how averaging those two independent measurements with individual uncertainty intervals can be combined to give a mean that has a *smaller* uncertainty interval than either. These are not measuring the same thing with the same device in the same environment at the same time. It simply doesn’t matter how accurately you think you can calculate the mean of these two independent measurements, the overall uncertainty will grow, it will not decrease.

Combining two independent populations is just not as simple as calculating the mean and dividing by the total population. I tried to explain that with the two independent populations of pygmies and Watusis. An example you *still* have not addressed.

“This is about the applicability of my calculations for multiple measurements M_i on a single measurand X.”

You simply don’t have multiple measurements on a single measurand. You don’t even have the same measuring device! Take all 20 temperature stations within a ten-mile radius of my location. The maximum temperature is measured at each. Each measurement is a population of one. The measurements are all taken at different times, in different locations, using different instruments. Each single measurement has a different uncertainty interval depending on the instrument model, age, location, etc.

Now, you can certainly calculate the mean of those measurements. But the total uncertainty will be the root-sum-square of all the individual uncertainties. The total uncertainty will *not* decrease based on the size of the population. Not directly or by the square root.

1sky1
Reply to  See - owe to Rich
February 15, 2020 4:19 pm

[W]hen you do something like take daily averages of hundreds of stations and try to combine them…the uncertainty interval grows and grows.

This contention flies in the face of the fact that the variance of time-series of temperature in any homogeneous climate area DECREASES as more COHERENT time-series are averaged together. Each measurement–when considered as a deviation from its own station mean–is a SINGLE REALIZATION, but NOT a POPULATION of one, as erroneously claimed. Sheer ignorance of this demonstrable empirical fact, along with the simplistic presumption that all measurements are stochastically independent, is what underlies the misbegotten random-walk conception of climatic uncertainty argued with Pavlovian persistence here.

Reply to  1sky1
February 16, 2020 9:57 am

As usual, 1sky1, you ignore the impact of non-normal systematic measurement error. Is yours a Pavlovian blindness, too?

Every measurement has a unique deviation from the physically true temperature that is not known to belong to any normally-distributed population.

Those deviations arise from uncontrolled environmental variables, especially wind speed and solar irradiance. Messy, isn’t it.

Like Rich, 1sky1, you live in a Platonic fantasyland.

Reply to  1sky1
February 16, 2020 10:39 am

sky:

“This contention flies in the face of the fact that the variance of time-series of temperature in any homogeneous climate area DECREASES as more COHERENT time-series are averaged together.”

The measurements are not time coherent. What makes you think they are? They are maximum temperatures and maximum temperatures can occur at various times even in an homogeneous climate area. They are minimum temperatures and minimum temperatures can occur at various times even in an homogeneous climate area. Even stations only a mile apart can have different cloud coverage and different wind conditions, both of which can affect their readings.

“Each measurement–when considered as a deviation from its own station mean”

How do you get a station mean? That would require multiple measurements of the same thing and no weather data collection station that I know of does that. If a measurement device has an uncertainty interval associated with it, no amount of calculating a daily mean from multiple measurements can decrease that uncertainty interval.

“is a SINGLE REALIZATION, but NOT a POPULATION of one, as erroneously claimed. ”

Of course it is a population of one. And it has an uncertainty interval.

“Sheer ignorance of this demonstrable empirical fact”

The empirical fact is that stations take one measurement at a time, separated in time from each other. Each station measures a different thing, like two investigators of which one measures the height of a pygmy and the other the height of a Watusi. How do you combine each of those measurements into a useful mean? How does combining those two measurements decrease the overall uncertainty associated with the measurements?

“along with the simplistic presumption that all measurements are stochastically independent, is what underlies the misbegotten random-walk conception of climatic uncertainty argued with Pavlovian persistence here.”

What makes you think the temperature measurements are made at random? That *is* the definition of stochastic, a random process, specifically that of a random variable.

They *are* independent. The temperature reading at my weather station is totally independent of the temperature reading at another weather station 5 miles away! They are simply not measuring the same thing and they are not the same measuring device. And they each have their own uncertainty interval, independent of each other.

If you had actually been paying attention, you would know that uncertainty does *not* result in a random walk. Uncertainty is not a random variable that provides an equal number of values on each side of a mean, and it is that characteristic which causes a random walk. Sometimes you turn left and sometimes you turn right. An uncertainty interval doesn’t ever tell you which way to turn!

1sky1
Reply to  1sky1
February 16, 2020 4:19 pm

Frank clings doggedly to the unfounded notion that “non-normal systematic measurement error” somehow overturns everything that is known analytically about stochastic processes in the ensemble sense and in their individual realizations. Truly systematic error introduces the well-known feature of systematic bias, which can be readily identified and removed. But what we have with sheltered temperature measurements in situ is sporadic (episodic) bias, which itself is a random process. Such bias becomes gaussian “noise” in the case of AGGREGATED station data.

Gorman, once again, continues to express his blind faith, which flies in the face of demonstrable station data analyses.

1sky1
Reply to  1sky1
February 16, 2020 4:44 pm

Frank clings doggedly to the unfounded notion that “non-normal systematic measurement error” somehow overturns everything that is known analytically about stochastic processes, both in the ensemble sense and in the case of individual realizations. Truly systematic measurement error introduces a well-known bias, which can readily be removed. But in the case of sheltered temperature measurements in situ, what we have is sporadic episodes of bias, which themselves produce a random process. That process becomes gaussian “noise” when station data are aggregated over a sufficiently large number. The independence of measurement uncertainty at different stations thus leads to a reduction in total variance of the data in the aggregate case.

Gorman’s ex ante argumentation is patently unaware of all of this and fails to come to grips with the well-known cross spectral coherence of nearby stations.

Reply to  1sky1
February 16, 2020 6:27 pm

1sky1 “Truly systematic measurement error introduces a well-known bias, which can readily be removed.

Do you understand the concept and impact of uncontrolled variables, 1sky1?

1sky1 “But in the case of sheltered temperature measurements in situ, what we have is sporadic episodes of bias, which themselves produce a random process.

Undemonstrated anywhere.

Hubbard and Lin (2002) doi: 10.1029/2001GL013191 combined thousands of single-instrument measurements and found non-normal distributions of error. Under ideal conditions of repair, calibration, and siting.

That process becomes gaussian “noise” when station data are aggregated over a sufficiently large number.

Hand-waving. You don’t know that, and neither does anyone else.

… the well-known cross spectral coherence of nearby stations

You’re in for a surprise.

Reply to  See - owe to Rich
February 16, 2020 6:55 am

Tim Feb 15 1:35pm

This is getting wearisome. You went up in my estimation last October, Tim, but have rather declined since. The problem is that you seem to be denying some basic mathematics, so this may well be the last time I respond to you. I’m going to mark new comments with ‘RN’ below.

“Doh! Where did I say that X_i was known?”

When you assume an equal “+/- e” then you have assumed that the mean is the true value and, therefore, that it is known.

RN: No, +/-e (uniform) means there is an interval (X-e,X+e) in which M must lie. In addition, M must be an integer under the assumption of digital output appropriately scaled. After M=m is observed, we know that X is in the interval (m-e,m+e). Obviously we don’t know what X is.

“we treat it as a random variable”

It is not a “random variable”. Being a random variable assumes there are multiple measurements whose values take the form of a probability distribution.

RN: False. Suppose I choose to throw a fair die once? Perhaps it has been thrown many times in the past to establish its fairness. Or perhaps never, but comes from a sample which has through trials been shown statistically fair. Before I throw it, it is a random variable with probability 1/6 of each face coming topmost. After I throw it, it is a measurement of that random variable, and is now a fixed value.

Rich: “Then given the MEASUREMENTS M_i, each X_i is uniform in [M_i-e,M_i+e]”
tim: “The true value could be at the very extreme end of the uncertainty interval so there is no +/- e but only +e or -e.”

“I don’t understand your notation.”

What don’t you understand? You are the one that used the terminology. I am just pointing out that X_i, the true value, doesn’t have to be uniform with a uniform negative and positive interval.

RN: True, X_i doesn’t have to be uniform, but science proceeds by making assumptions and testing them where possible. For simplicity, I have mostly been assuming uniformity, as indeed have you in many of your comments.

Uncertainty is not error. Pat has said that often enough that it should be burned in everyone’s brain. Error you can resolve with multiple measurements of a standard, you can’t do the same with one measurement that has uncertainty.

RN: Yes, uncertainty is not error. It is the distribution, or often just the dispersion of, or often just the standard deviation of the dispersion of, the random variable which represents the error M-X before measurement – see the JCGM.

“No, in my second case I have a single measurement M_i approximating each unknown X_i, with the i’s representing different times and places. For my first case, see APP below.”

How can any single M_i approximate a true value X_i? That’s the whole issue in a nutshell. If I give you a single measurement of 72deg +/- 0.5deg, how do you know that nominal measurement, 72deg, approximates the true value? If that were the case then why does the uncertainty interval even exist?

RN: I know it approximates the true value because you, or someone else, swore blind that the thermometer reads accurately to within a certain error, some of whose parameters have been determined. If your thermometer is really only accurate to within 2 degrees, don’t tell me it’s accurate to within half a degree. And the uncertainty “interval” exists to reflect exactly those statements about its error.

If you measure the temperature at different places with different thermometers then how do you add the uncertainty intervals together? You can’t do it with the central limit theory because that only holds for multiple measurements of the same thing, i.e. object, time, and place.

RN: False. (More forgiveness, I’m just back from Sunday church and you are inappropriately using “central limit theory” again. It’s OK, I know what you mean anyway.) No, the theory of the distribution of the sum, or of a mean, of n random variables does not depend on them being multiple measurements of the same thing. Probably best for you to read a good statistics book. The theory does have to take account of correlation between them, and gives a simpler result if there isn’t any, which incidentally is more plausible if they are at a different time or place.
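
As a sketch of that point in Python (uniform +/-0.5 errors assumed, and the n measurands deliberately all different), the error in the mean of the readings, relative to the mean of the true values, still shrinks like 1/sqrt(n):

import numpy as np
rng = np.random.default_rng(2)
for n in (1, 2, 10, 100):                         # number of *different* things, each measured once
    X = rng.uniform(0, 30, (100_000, n))          # a fresh set of true values in every trial
    M = X + rng.uniform(-0.5, 0.5, (100_000, n))  # one reading of each, independent uniform errors
    print(n, np.std(M.mean(axis=1) - X.mean(axis=1)))   # ~ 0.289/sqrt(n): 0.289, 0.204, 0.091, 0.029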

“If e > 1/2 then for some values of X, more than 1 possibility for M exists”

Again, you are assuming a probability function, see the word “possibility”.

RN: Yes, I am, quite justifiably. The theory of uncertainty for a sum of measurements in the JCGM does not proceed mathematically without assumption that a probability function, even if unknown, exists. Oh, I see that’s in my next quoted comment anyway.

“No, in my second case they are distributions for measurements of many different things. The only way to use probability theory properly is to assume that some distribution exists, and then find out its implications. A national standards body with expensive equipment to measure values to within very small uncertainty will be able to measure error distributions for inferior devices.”

Error is *not* uncertainty. If you are measuring a standard to determine calibration then you are doing a straight comparison to determine an error bias. That has nothing to do with uncertainty. The minute your calibrated instrument leaves the laboratory it will begin to lose calibration from aging, environment differences, etc. It will develop an uncertainty interval that only grows over time.

RN: Perhaps it will, but if we are to use the instrument to good effect we need an estimate, or perhaps worst case, of its uncertainty when we use it. Otherwise all bets are off, and we might as well say “Oh, we can’t measure global warming, we just believe in it”. Oh, lots of people do that anyway… And in any case the laboratory should have tested the instrument under varying environmental conditions, and supplied an uncertainty value appropriately. In fact, they might know it has great accuracy between 20 and 25 degC, but have to publish a worse figure in case someone uses it at -40 or +50.

” But what is the uncertainty of Y = (X_1+X_2)/1000?”

You keep falling back into the same trap, over and over. There is no “1000”. The population size of a single measurement is “1”. Y = whatever the nominal value of the measurement is. There is no X_1 and X_2. Y = X_1 in every case. And Y has an uncertainty interval that you can’t resolve from that one measurement.

RN: Now you are not only denying basic mathematics/statistics, but denying the existence of the number 1000! My X_1 and X_2 are, for example, the LCF values for the years 2011 and 2012, the uncertainties of which Pat Frank combines in the way I described. And the point about the 1000 is that scaling matters. So uncertainty of a mean is not the same as the uncertainty of a sum.

“No, as before, with n independent measurements, uncertainties of sums increases with n, uncertainties of means decreases. Now for:”

ONLY IF YOU ARE MEASURING THE SAME THING MULTIPLE TIMES! When I tell you that the temperature here, right now, is 40degF that is based on one measurement by one device at one point in time and at one location. Where do you keep coming up with independent measurements? And if the guy six miles down the road says his thermometer reads 39degF, that is one measurement by one device at one point in time and at one location.

Now tell me how averaging those two independent measurements with individual uncertainty intervals can be combined to give a mean that has a *smaller* uncertainty interval than either. These are not measuring the same thing with the same device in the same environment at the same time. It simply doesn’t matter how accurately you think you can calculate the mean of these two independent measurements, the overall uncertainty will grow, it will not decrease.

RN: Again, you use the word “combine”, but the combining function is important. Mean and sum are not the same function, because mean scales down by the sample size. It’s elementary mathematics.

Combining two independent populations is just not as simple as calculating the mean and dividing by the total population. I tried to explain that with the two independent populations of pygmies and Watusis. An example you *still* have not addressed.

RN: True, better things to do with my time I’m afraid.

“This is about the applicability of my calculations for multiple measurements M_i on a single measurand X.”

You simply don’t have multiple measurements on a single measurand. You don’t even have the same measuring device! Take all 20 temperature stations within a ten-mile radius of my location. The maximum temperature is measured at each. Each measurement is a population of one. The measurements are all taken at different times, in different locations, using different instruments. Each single measurement has a different uncertainty interval depending on the instrument model, age, location, etc.

RN: In the APP section, the type of measurand was not specified. Temperature may not be a good example for reasons you cite. That’s why I suggested an old style wobbly speedometer converted to digital output, but I expect there are better examples.

Now, you can certainly calculate the mean of those measurements. But the total uncertainty will be the root-sum-square of all the individual uncertainties. The total uncertainty will *not* decrease based on the size of the population. Not directly or by the square root.

RN: But in the APP section I did not suggest taking the mean of them. Why did you think I did? Again, your comment is wide of the mark.

Farewell and adieu,
Rich.

Reply to  See - owe to Rich
February 16, 2020 11:00 am

Rich, “[uncertainty is] just the standard deviation of the dispersion of, the random variable which represents the error …

Not in the real world of measurement science.

You quote the JCGM where it is elaborating the assumptions that suit your views, Rich. Your use of their authority is purely circular. You choose the part where they assume what you do, then you cite them as authority for your assumption.

In B.N. Taylor and C.E. Kuyatt, Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results, 1994, National Institute of Standards and Technology: Washington, DC, p. 20.

Under “D.1.1.6 systematic error [VIM 3.14]

[The] mean that would result from an infinite number of measurements of the same measurand carried out under repeatability conditions minus the value of the measurand

NOTES
1 Systematic error is equal to error minus random error
(my bold).
2 Like the value of the measurand, systematic error and its causes cannot be completely known.

Systematic error is not random error. The sign or magnitude of any instance of systematic error is not known.

In the Note under 3.2.3 in the JCGM “The uncertainty of a correction applied to a measurement result to compensate for a systematic effect is not the systematic error, often termed bias, in the measurement result due to the effect as it is sometimes called. It is instead a measure of the uncertainty of the result due to incomplete knowledge of the required value of the correction. The error arising from imperfect compensation of a systematic effect cannot be exactly known. The terms “error” and “uncertainty” should be used properly and care taken to distinguish between them.

JCGM under 3.3 Uncertainty
3.3.1 The uncertainty of the result of a measurement reflects the lack of exact knowledge of the value of the measurand (see 2.2). The result of a measurement after correction for recognized systematic effects is still only an estimate of the value of the measurand because of the uncertainty arising from random effects and from imperfect correction of the result for systematic effects.

NOTE The result of a measurement (after correction) can unknowably be very close to the value of the measurand (and hence have a negligible error) even though it may have a large uncertainty. Thus the uncertainty of the result of a measurement should not be confused with the remaining unknown error.

Under 3.3.2 “Of course, an unrecognized systematic effect cannot be taken into account in the evaluation of the uncertainty of the result of a measurement but contributes to its error.

Under 3.3.4 “Both types of evaluation [i.e., of random error and of systematic error — P] are based on probability distributions (C.2.3), and the uncertainty components resulting from either type are quantified by variances or standard deviations.

Under 5.1.4 “The combined standard uncertainty u[c(y)] [i.e., consisting of both random and systematic components — P] is the positive square root of the combined variance u²[c( y)], which is given by … Equation (10) … based on a first-order Taylor series approximation of Y = f (X₁, X₂, …, Xɴ), express what is termed in this Guide the law of propagation of uncertainty.”

You’re just being tendentious, Rich. Sticking to the incomplete view that allows your preferred conclusion.

Reply to  Pat Frank
February 17, 2020 1:52 am

Pat Feb 16 11:00am

I am amazed that you think that I am being “tendentious” and ignoring important parts of the JCGM! In the spirit of “what do you mean by ‘mean'” I shall once again supply some mathematics to elucidate, again with X = true (unknowable) value, M or M_i for measurements, D = M-X for error. For now I’ll just consider the case where M is an analogue (continuous) reading, because rounding does complicate matters. My position is that D is a random variable and that its distribution is the most general concept available for uncertainty. But I accept that the JCGM simplifies that into a bias element b = E[D] and an uncertainty element s = sqrt(Var[D]). So now I’ll annotate what you wrote, with ‘P’ prefixes for your paragraphs, and you’ll find little disagreement.

P: Under “D.1.1.6 systematic error [VIM 3.14]
[The] mean that would result from an infinite number of measurements of the same measurand carried out under repeatability conditions minus the value of the measurand.”

Infinite number of measurements would be M_1,…,M_n as n goes to infinity. Mean = sum M_i/n. “Minus the value of the measurand” is subtracting X, giving sum (M_i-X)/n = sum D_i/n. Providing that the M_i are relatively independent (not highly correlated), that average tends to b under the Weak Law of Large Numbers. So systematic error equals b.
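
In case the limit is not obvious, here is a toy Python illustration with an invented bias b = 0.3 and spread 0.5; the running mean of the errors D_i settles on b, not on zero:

import numpy as np
rng = np.random.default_rng(3)
b = 0.3                                   # invented systematic error (bias)
D = b + rng.normal(0, 0.5, 1_000_000)     # errors D_i = M_i - X, bias plus a random part
for n in (10, 1_000, 1_000_000):
    print(n, D[:n].mean())                # the running mean of the D_i tends to b = 0.3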

P: NOTES
1 Systematic error is equal to error minus random error.
2 Like the value of the measurand, systematic error and its causes cannot be completely known.

Note 1 implies that random error is error minus systematic error, which is D-b, with expectation E[D]-b = 0. So the term “random error” is for the departure of D from its mean, and standard uncertainty is its standard deviation, which is also its root mean square since it has mean 0. All seems reasonable – and I now notice that Section 3.2.2 confirms all that.

P: Systematic error is not random error. The sign or magnitude of any instance of systematic error is not known.

Indeed, the deduction made above is that the JCGM takes b to be the “systematic error”, and D-b to be the “random error”.

P: In the Note under 3.2.3 in the JCGM “The uncertainty of a correction applied to a measurement result to compensate for a systematic effect is not the systematic error, often termed bias, in the measurement result due to the effect as it is sometimes called. It is instead a measure of the uncertainty of the result due to incomplete knowledge of the required value of the correction. The error arising from imperfect compensation of a systematic effect cannot be exactly known. The terms “error” and “uncertainty” should be used properly and care taken to distinguish between them.”

That confirms that “systematic error” is what I was calling bias (the more usual name in statistics), which is b = E[D]. I agree with everything there.

P: JCGM under 3.3 Uncertainty
“3.3.1 The uncertainty of the result of a measurement reflects the lack of exact knowledge of the value of the measurand (see 2.2). The result of a measurement after correction for recognized systematic effects is still only an estimate of the value of the measurand because of the uncertainty arising from random effects and from imperfect correction of the result for systematic effects.

Yes, agree.

P: NOTE The result of a measurement (after correction) can unknowably be very close to the value of the measurand (and hence have a negligible error) even though it may have a large uncertainty. Thus the uncertainty of the result of a measurement should not be confused with the remaining unknown error.”

Yes, agree.

P: Under 3.3.2 “Of course, an unrecognized systematic effect cannot be taken into account in the evaluation of the uncertainty of the result of a measurement but contributes to its error.”

Yes, assuming “systematic effect” equates to “systematic error”, which is bias, then that affects the bias (b) and the total error (D), but not the “random error” and therefore not the uncertainty.

P: Under 3.3.4 “Both types of evaluation [i.e., of random error and of systematic error — P] are based on probability distributions (C.2.3), and the uncertainty components resulting from either type are quantified by variances or standard deviations.”

This is exactly what I have been saying but Tim Gorman appears to have been denying.

P: Under 5.1.4 “The combined standard uncertainty u[c(y)] [i.e., consisting of both random and systematic components — P] is the positive square root of the combined variance u²[c( y)], which is given by … Equation (10) … based on a first-order Taylor series approximation of Y = f (X₁, X₂, …, Xɴ), express what is termed in this Guide the law of propagation of uncertainty.”

Yes, with two caveats. The first is that, as you will know, that section is for uncorrelated errors. By the way, you are actually quoting from 5.1.2 not 5.1.4. The second caveat regards your parenthesis about random and systematic components. The “combined” adjective refers to combining the uncertainties u(x_i) of all the inputs to a function f. Each one of those can be either a Type A or a Type B uncertainty, which corresponds to the way in which the uncertainty value was derived, but does not differentiate between random and systematic components.

In fact, a systematic error does not per se contribute to an uncertainty term u(x_i). It is only the attempt to correct for the systematic error, i.e. to reduce the bias, which adds uncertainty. But typically the added uncertainty there is small. It can be shown under reasonable assumptions that using n observations to determine the correction leads to uncertainty being multiplied by (1+1/(2n)).
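
A minimal Python sketch of where that factor comes from, under the stated assumptions (the correction is the mean of n readings of a known standard, with the same random spread s as the field measurement; all the numbers are invented):

import numpy as np
rng = np.random.default_rng(4)
s, b, n, trials = 0.5, 0.8, 5, 200_000               # invented spread, bias, calibration size
cal = 20.0 + b + rng.normal(0, s, (trials, n))       # n readings of a standard known to be 20.0
b_hat = cal.mean(axis=1) - 20.0                      # estimated correction for the bias
field = 23.7 + b + rng.normal(0, s, trials)          # one field reading of an unknown value 23.7
print(np.std(field - b_hat) / s)                     # ~ sqrt(1 + 1/n), roughly 1 + 1/(2n) = 1.1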

So, Pat, where is it that you think I have been misusing the JCGM just in ways that suit me?

Reply to  See - owe to Rich
February 17, 2020 9:56 am

Rich — right at the start, “ D = M-X for error. … My position is that D is a random variable and that its distribution is the most general concept available for uncertainty. But I accept that the JCGM simplifies that into a bias element b = E[D] and an uncertainty element s = sqrt(Var[D]).

D is not a random variable. JCGM describes systematic error as unknowable. In insisting that D is a random variable, you’re immediately assuming your conclusion.

That’s as tendentious as it is possible to get.

Look at note 1 from Taylor and Kuyatt: “Systematic error is equal to error minus random error” Systematic error is definitively not randomly distributed.

Systematic error from uncontrolled variables is unknown in both sign and magnitude. Measurements that contain systematic errors can behave just like good data.

The systematic error in the prediction from an inadequate theory cannot be known, even in principle, because there are no well-constrained observables for comparison.

And then you go right ahead and assume from the outset that it’s all just random variables. It’s too much, Rich.

Science isn’t statistics. Physical methods and incomplete physical theories do not conform to any ideal.

No amount of mathematics will convert bad data into good. Or sharpen the blur within a resolution limit, for that matter.

Reply to  See - owe to Rich
February 17, 2020 12:14 pm

Rich, “Minus the value of the measurand” is subtracting X, giving sum (M_i-X)/n = sum D_i/n.”

Not quite.

“Minus the value of the measurand” is subtracting X from the mean of the measurements, giving {sum [(M_i)]/n} – X = systematic error S, because the random error D has been reduced by a factor of 1/sqrt(n), which vanishes as n goes to infinity.

Rich, “assuming “systematic effect” equates to “systematic error”, which is bias, then that affects the bias (b) and the total error (D), but not the “random error” and therefore not the uncertainty.

Rather, ‘yes the uncertainty.’ Unknown bias contributes uncertainty to a measurement.

Unknown bias cannot be subtracted away.

In real world measurements and observations, the size and sign of ’b’ are unknown. Hence the need for instrumental (or model) calibration under the conditions of the experiment.

Quoted P: Under 3.3.4 “Both types of evaluation [i.e., of random error and of systematic error — P] are based on probability distributions (C.2.3), and the uncertainty components resulting from either type are quantified by variances or standard deviations.”

Rich, “This is exactly what I have been saying but Tim Gorman appears to have been denying.

Rather, it’s what you denied just above, Rich, where you wrote (wrongly) that bias does not contribute to uncertainty.

The JCGM there specifically indicated that uncertainty arises from systematic error. You denied it. Tim Gorman has repeatedly explained it.

I believe the problem is that you’re supposing that the use of variances or standard deviations in the JCGM strictly imply the metrics of a normal distribution.

If you think this, you are under a serious misapprehension. They imply no such thing.

In practice, the same mathematics and the same terminology are used to evaluate non-normal systematic error, as are used for random error. See JCGM 5.1.2ff.

Sorry for the 5.1.2 – 5.1.4 mix-up. But the point remains.

Rich, “Yes, with two caveats. The first is that, as you will know, that section is for uncorrelated errors.

Actually, it’s for uncorrelated input quantities — the X_i, not the D_i.

The X_i would be multiple independent measurements of the same so-called measurand. In analytical work, statistical independence can be provided by making multiple independent samples, and then measuring each once. In a perfect system, all the X_i would be identical. This is never achieved in real labs.

Hence the need for calibration against known standards, and application of the calibration-derived uncertainty to every single experimental measurement. That uncertainty never diminishes with repeated measurements. In sums of experimental measurements, the final uncertainty is the rms.

When such measurements serially enter a sequential set of calculations, the uncertainties propagate through as the rss.

Rich, “Each one of those can be either a Type A or a Type B uncertainty, which corresponds to the way in which the uncertainty value was derived, but does not differentiate between random and systematic components.

From the JCGM, Type A evaluations of standard uncertainty components are founded on frequency distributions while Type B evaluations are founded on a priori distributions.

Only completed Type A evaluations provide information about the shape of the error distribution, potentially meeting the assumptions that fully justify statistical analysis.

Type B evaluations estimate uncertainty by bringing in external information. The assumptions that justify statistics need not be met.

Going down to 4.3 Type B evaluation of standard uncertainty, we find among the sources of Type B information: data provided in calibration and other certificates.

Calibration is exactly what we have been discussing, with respect to evaluation of systematic errors. GCM global cloud fraction simulation error is a systematic error, and satellite observations of cloud fraction are the (imperfect) calibration standard.

Take a look at JCGM Appendix F.2 Components evaluated by other means: Type B evaluation of standard uncertainty

One must go all the way down to F.2.6.3, to find mention of the situation we’re discussing here.

The very last sentence includes, “… when the effects of environmental influence quantities on the sample are significant, the skill and knowledge of the analyst derived from experience and all of the currently available information are required for evaluating the uncertainty.

F.2.6.2 also has relevance in the discussion of the effects of unknown sample inhomogeneities (i.e., uncontrolled environmental variables).

The effects of environmental influence quantities refers exactly to the impact of uncontrolled variables on the measurement, for which the complete uncertainty must account.

Calibration of an instrument against a well-known standard is the requisite way of assessing the accuracy of an experimental measurement. Calibration against a well constrained observable is the standard way of assessing the accuracy of a prediction from a physical model.

Calibration error includes both random and systematic components, and the sign and magnitude of the systematic error is not known for any single measured datum M_i, typically because the true value X_i is not known.

In a prediction of future climate states, the sign and magnitude of the systematic error in each step of a simulation is necessarily unknown. All one has is the prior information about predictive uncertainty determined by the calibration error statistic.

Regarding the meaning of the ±u(c) derived from systematic error of unknown sign and magnitude arising from uncontrolled variables, the JCGM says this:

Under 4.3.7, “In other cases, it may be possible to estimate only bounds (upper and lower limits) for Xi, in particular, to state that “the probability that the value of Xi lies within the interval a− to a+ for all practical purposes is equal to one and the probability that Xi lies outside this interval is essentially zero”. If there is no specific knowledge about the possible values of Xi within the interval, one can only assume that it is equally probable for Xi to lie anywhere within it…

Paragraph 4.3.7 is exactly what Tim Gorman has been explaining.

Rich, “So, Pat, where is it that you think I have been misusing the JCGM just in ways that suit me?

In your insistence that all error is random, Rich. In your denial of the impact of bias on uncertainty.

The JCGM does not support your view.

But your mistaken view is what allows you your preferred conclusions.

Nor does the actual practice of science support your view. Science is theory and result. Not just theory. Result is observation and measurement. Theory stands or falls on the judgment of result.

Observations and measurements always have an associated uncertainty that includes limits of resolution as well as external impacts. Physical science is messy that way.

And so physical scientists have had to develop methods to estimate the reliability of data. Statistics provides a very important method of estimation, and is used even when the structure of the error violates statistical assumptions.

Climate modeling — and consensus climatology in general — has neglected, even repudiated, the results part of science to their eventual utter downfall.

Reply to  See - owe to Rich
February 18, 2020 3:08 am

Pat Feb 17 9:56am

P: Rich — right at the start, “ D = M-X for error. … My position is that D is a random variable and that its distribution is the most general concept available for uncertainty. But I accept that the JCGM simplifies that into a bias element b = E[D] and an uncertainty element s = sqrt(Var[D]).”

P: D is not a random variable. JCGM describes systematic error as unknowable. In insisting that D is a random variable, you’re immediately assuming your conclusion.

I’m not sure what “conclusion” I have assumed, but let that pass; I suppose your implication is that if I got that wrong then you can safely ignore anything else I write. But my reading of the JCGM, replete with probability theory as it is, says I didn’t get it wrong; I’ll revisit that in the next paragraph. By all means, Pat, concentrate on physics rather than statistics, but don’t then claim that you know enough about statistics and uncertainty to claim that your emulator is good enough, even using valid equations of propagation of uncertainty, to say anything useful about the propagation of uncertainty in GCMs (not that I hold GCMs in huge regard myself).

So, is D a random variable? Why does the statistics of error exist, or the JCGM exist, if not to shed light on the discrepancy (error) between a presumed physical value X and a measurement M of it? Why does the JCGM talk of variance and standard deviation of error, if M-X is not fairly represented as a random variable? JCGM 3.3.4: “Both types of evaluation are based on probability distributions (C.2.3), and the uncertainty components resulting from either type are quantified by variances or standard deviations”.

P: Look at note 1 from Taylor and Kuyatt: “Systematic error is equal to error minus random error” Systematic error is definitively not randomly distributed.

Correct: systematic error = bias = a parameter of the error probability distribution, is unknown though some inferences from data may occur, and is not a random variable.

P: Systematic error from uncontrolled variables is unknown in both sign and magnitude. Measurements that contain systematic errors can behave just like good data.

Correct, but beware the “uncontrolled”.

P: The systematic error in the prediction from an inadequate theory cannot be known, even in principle, because there are no well-constrained observables for comparison.

I agree, but again beware the “inadequate”: the GCMs do have well-constrained observables for comparison, namely global temperature data. In any case, the crux of your paper isn’t about systematic error (= bias = mean error), it is about uncertainty (= standard deviation of error).

P: Science isn’t statistics. Physical methods and incomplete physical theories do not conform to any ideal. No amount of mathematics will convert bad data into good. Or sharpen the blur within a resolution limit, for that matter.

But where science uses numbers to draw conclusions, mathematics is needed to ensure that those conclusions are derived in a rational, justifiable, way. It is no good using formulae from the JCGM and then saying you don’t believe in any of the mathematics underpinning it.

I know that science isn’t, or shouldn’t be, about consensus, but I’d be fascinated to know how many readers with science/maths degrees agree with me or with you, and how the type of degree might affect those statistics!

Rich.

Reply to  See - owe to Rich
February 18, 2020 10:25 am

Rich, “I’m not sure what “conclusion” I have assumed, but let that pass; …

When you assume random variables you assume your conclusion, which is embedded in the supposition that the statistics of normal distributions uniformly apply to those variables and ultimately to physical error.

Rich, “I suppose your implication is that if I got that wrong then you can safely ignore anything else I write.

When have I ignored what you write, Rich? I’ve engaged you at virtually every turn.

Rich, “don’t then claim that you know enough about statistics and uncertainty to claim that your emulator is good enough, even using valid equations of propagation of uncertainty, to say anything useful about the propagation of uncertainty in GCMs

I don’t claim my emulator is good enough, Rich. I demonstrated that it is good enough.

I showed that it can accurately emulate the air temperature projections of any arbitrary CMIP3 or CMIP5 GCM. Sixty-eight examples, not counting the 19 in Figure 1.

I don’t propagate the error in GCMs, Rich. Supposing so is to make Nick Stokes’ mistake.

I propagate the error of GCMs; the error GCMs observably make. It’s the calibration error that exposes the resolution lower limit of GCMs as regards the tropospheric thermal energy flux. A resolution lower limit that is 114 times larger than the perturbation to be resolved.

GCMs plain cannot resolve the impact, if any, of CO2 emissions. There’s just no doubt that the error analysis is correct.

Rich, “Why does the JCGM talk of variance and standard deviation of error, if M-X is not fairly represented as a random variable?

Because the same statistical formalisms are used to estimate the uncertainty of non-normal error. JCGM says that over, and yet over again, whenever it discusses systematic error.

Quoting P: P: Systematic error from uncontrolled variables is unknown in both sign and magnitude. Measurements that contain systematic errors can behave just like good data.

Rich, “Correct, but beware the “uncontrolled”.

I acknowledge the uncontrolled. These are external variables that enter into the experiment or observation and modify the result. They are of unknown impact, and can be cryptic in that the experimenter or observer may not know of even their possible existence.

That definitely describes the case for GCM cloud error.

Rich, “I agree, but again beware the “inadequate”: the GCMs do have well-constrained observables for comparison, namely global temperature data.

Air temperature data are not well-constrained. It’s merely that the systematic measurement error is completely neglected as a standard of practice. This error arises from wind speed effects and solar irradiance, which impact the air temperature inside the sensor housing.

Workers in the field, at UKMet, UEA Climate Research Unit, NASA GISS and Berkeley BEST completely ignore measurement error. Mention does not appear in their work. They uniformly make your assumption that all measurement error is random — an assumption made without warrant and in the face of calibration experiments that demonstrate its contradiction. Their carelessness makes a mockery of science.

I’ve published on air temperature measurement error here (900 kB pdf), here (1 MB pdf) and here (abstract), and have finished the analysis for another paper that is going to seriously expose the field for their incompetence.

The global air temperature record is not known to better than ±0.5 C during the entire 20th century, and the 21st outside of the US CRN. Prior to 1900, the uncertainty becomes very large.

The entire field lives on false precision — like the rest of consensus climatology.

GCMs that calibrate on the air temperature record incorporate that ±0.5 C as a systematic error affecting their parameterizations — parameters that have their own physical uncertainty bounds. GCMs all reproduce the global air temperature record despite varying by factors of 2-3 in their climate sensitivity. That alone should give you fair warning that their projections are not reliable.

Even the TOA balance is not known to better than ±3.9 Wm⁻². GCMs do not have any well-constrained observables on which to rely. But the unconstrainedness is studiedly ignored in the field, leaving folks like you, Rich, subject to false confidence.

Rich, “In any case, the crux of your paper isn’t about systematic error (= bias = mean error), it is about uncertainty (= standard deviation of error).

It’s about the impact of GCM calibration error on predictive reliability. GCM error in simulated cloud fraction is a systematic physical error.

GCM systematic cloud error is the source of the long-wave cloud forcing error. That error is combined into a per-GCM global annual average uncertainty statistic, which shows that the thermal flux of the simulated atmosphere is wrong. That calibration statistic conditions every single air temperature projection.

Rich, “It is no good using formulae from the JCGM and then saying you don’t believe in any of the mathematics underpinning it.

I’ve never suggested that I don’t believe the mathematics. I’ve pointed out that the mathematics is used with a greater range of error types than where the statistical assumptions generally obtain.

Your apparent position is that wherever that mathematics appears, those assumptions apply. They don’t. Empirical science does not fit within statistical constraint. Scientists need an estimate of reliability. So, they’ve dragoooed statistics for use in places and ways that probably raise the neck hairs of statisticians.

Rich, I’d be fascinated to know how many readers with science/maths degrees agree with me or with you, and how the type of degree might affect those statistics!”

Multiple trained people have agreed with my work. Including, in this thread alone, Tim Gorman, David Dibbell, and Geoff Sherrington.

Elsewhere physicist Nick, ferd berple, Paul Penrose and JRF in Pensacola (my understanding is they are engineers) in this thread, meteorologist Mark Maguire, and implicitly the authors of the papers listed here, — see especially S.J. Kline.

And that doesn’t exhaust the number.

You’ve just set them all aside, Rich.

Reply to  See - owe to Rich
February 18, 2020 10:29 am

Should be ‘dragooned.’
and
Rich, “I’d be fascinated to know how many readers with science/maths degrees agree with me or with you, and how the type of degree might affect those statistics!

Regrets about the mistakes.

Reply to  See - owe to Rich
February 16, 2020 12:22 pm

Rich,

“This is getting wearisome. You went up in my estimation last October, Tim, but have rather declined since. The problem is that you seem to be denying some basic mathematics, so this may well be the last time I respond to you. I’m going to mark new comments with ‘RN’ below.”

When you have to resort to ad hominems you have lost the argument.

“No, +/-e (uniform) means there is an interval (X-e,X+e) in which M must lie.”

Why? Why isn’t the interval X+2e or X-2e? The fact that you are using the term “uniform” means you are assuming that the true value of the measurement *is* the mean.

“False. Suppose I choose to throw a fair die once? Perhaps it has been thrown many times in the past to establish its fairness. Or perhaps never, but comes from a sample which has through trials been shown statistically fair.”

A dice throw has no uncertainty. This is just proof that you are still confusing uncertainty as being a random variable.

“True, X_i doesn’t have to be uniform, but science proceeds by making assumptions and testing them where possible. For simplicity, I have mostly been assuming uniformity, as indeed have you in many of your comments.”

But you are assuming that uncertainty is a random variable. It isn’t. That isn’t simplicity. That is ignoring the actual characteristics of each. And I only assumed an interval of uncertainty resembles the variance of a random variable in order to try and understand how to add the uncertainty intervals of two independent measurements.

“Yes, uncertainty is not error. It is the distribution, or often just the dispersion of, or often just the standard deviation of the dispersion of, the random variable which represents the error M-X before measurement”

And now we are back to calling uncertainty a random variable with a probability function. It isn’t.

“I know it approximates the true value because you, or someone else, swore blind that the thermometer reads accurately to within a certain error, some of whose parameters have been determined. If your thermometer is really only accurate to within 2 degrees, don’t tell me it’s accurate to within half a degree. And the uncertainty “interval” exists to reflect exactly those statements about its error.”

Error is not uncertainty. You are still confusing those two things as well. Uncertainty has multiple components. For the liquid-in-glass thermometers used in the 19th century and most of the 20th century, the ability to read the thermometer meant knowing how to read a convex vs concave meniscus. Do *you* know how to do that? Don’t cheat and go look it up. Parallax is also a problem. A short person would many times read the thermometer differently than a tall person. These are just two contributors to uncertainty.

“No, the theory of the distribution of the sum, or of a mean, of n random variables does not depend on them being multiple measurements of the same thing.”

The distribution of the mean is all about taking several sets of random samples from a population and calculating the mean of each sample. You then average the means of the samples to get a more accurate mean for the population. First, the distribution of the sample means tells you nothing about the distribution of the overall population. Even a heavily skewed population will have its distribution of means tend toward a normal distribution. Second, again, with one measurement you have no population from which to draw samples. All you have is a nominal value and an uncertainty interval. The uncertainty interval does not represent a probability function for values in the interval.

“Perhaps it will, but if we are to use the instrument to good effect we need an estimate, or perhaps worst case, of its uncertainty when we use it.”

But that uncertainty is not a probability function.

““Oh, we can’t measure global warming, we just believe in it”.”

That is *NOT* what I or most on here are saying. We *are* saying that the uncertainty associated with the global annual temperature average is so large that trying to say that Year X1 is 0.01deg hotter than Year X2 is just a joke. The uncertainty interval just overwhelms the comparison. The proper assertion would be “we don’t know if this year was hotter than last year”.

The climate alarmists, and you apparently, want to ignore that there is any uncertainty in your results. The climate alarmists, and you apparently, keep trying to say that you can calculate the mean more and more accurately to any required number of significant digits with no uncertainty.

“So uncertainty of a mean is not the same as the uncertainty of a sum.”

But, again, the uncertainty of a mean tells you nothing about the population! And since uncertainty is not a probability function then using the distribution of the mean to calculate a more accurate mean is itself a meaningless exercise.

“RN: Again, you use the word “combine”, but the combining function is important. Mean and sum are not the same function, because mean scales down by the sample size. It’s elementary mathematics.”

No, the mean does not scale down for uncertainty. Uncertainty is not a probability function. There are no samples from the same population you can use to scale down the mean.

“RN: True, better things to do with my time I’m afraid.”

In other words you *know* that you have no answer. I didn’t think you would!

“RN: In the APP section, the type of measurand was not specified. Temperature may not be a good example for reasons you cite. That’s why I suggested an old style wobbly speedometer converted to digital output, but I expect there are better examples.”

But the issue at hand *IS* the temperature! Specifically the uncertainty associated with the global annual average temperature!

February 15, 2020 2:33 pm

Just to say, I’m working my way through Rich’s essay and am now down to Section D “Emulator Parameters.”

Thus far, the head-post analysis has not withstood examination.

February 16, 2020 8:07 am

David Dibbell Feb 11 2:10 pm

Dr. Booth – two points here. First, about what the “bias” is. From Figure 16 and its caption, it looks like this “bias” is the difference of annual means, gridpoint by gridpoint, subtracting the CERES values averaged over the period 2001-2015 from the CM4.0 values averaged over the period 1980-2014. (If I have misunderstood this, I welcome a correction.) Second, then, I’m not suggesting the RMSE 6 W/m^2 value from Figure 15 corresponds somehow to the +/- 4 W/m^2 value appearing in Pat Frank’s paper. They are quite different.

I’ve looked at that paper. Figures 15, 18, 19, 20, 21 all mention CM4.0 for 1980-2014. But Figure 16 doesn’t. I would hope that there they extracted the relevant CM4.0 data to match the CERES years. For the latter you say “2001-2015”, but in fact that dataset runs to March 2017. Possibly they intersected to use 2001-2014 for both CERES and CM4.0. It doesn’t really matter except in respect of the credence to attach to Figure 16.

You say that you don’t think that their 6W/m^2 corresponds in any way to Pat Frank’s 4W/m^2. Why? The Figure 16 unit is “Annual mean outgoing longwave radiation”, but as an anomaly wrt CERES. Pat Frank’s unit is described in “The CMIP5 models were reported to produce an annual average LWCF RMSE = ± 4 Wm–2 year^–1 model^–1, relative to the observational cloud standard (Lauer and Hamilton, 2013). This calibration error represents the average annual uncertainty within any CMIP5 simulated tropospheric thermal energy flux and is generally representative of all CMIP5 models.” So for a single model, both seem to be in W/m^2/y uncertainty.

Have I gone wrong somewhere?

Rich.

Reply to  See - owe to Rich
February 16, 2020 8:27 am

Oops, I got confused between Figures 15 and 16. I was looking at Figure 16(c), which I think is a relevant source of the 6 W/m^2, but you were looking at Figure 15.

Figure 16(c) is also rather interesting, because it gives a bias, -1.83 W/m^2. This means the standard deviation is sqrt(6.02^2-1.83^2) = 5.7 W/m^2. So a previous thought I had, that LCF error might be mostly bias and not standard deviation ~ uncertainty, looks to be wrong. However, that is all averaged over grid cells, and it is clear from the colours on the figures that there are strong geographical components.
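A quick check of that arithmetic, treating the 6.02 W/m^2 as the RMSE and the -1.83 W/m^2 as the bias from Figure 16(c) (my reading of the figure, which may be wrong):

import math

rmse, bias = 6.02, -1.83              # W/m^2, as read off Figure 16(c)
std = math.sqrt(rmse**2 - bias**2)    # using RMSE^2 = bias^2 + standard_deviation^2
print(round(std, 1))                  # 5.7 W/m^2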

I’d like to come back to the timescale 1/40000 years, which approximates GCM “ticks”, and my calculation that in that timescale LCF error might be expected to be +/-0.02W/m^2. And I’d like to ask Nick Stokes again what effect such an error would have when propagating through ticks. He has previously said that auto-corrections are made on the basis of conservation of energy, but not on the radiative equations, of which LCF is presumably a part. Do GCM outputs after a year resemble a +/-4 W/m^2 difference (I don’t think so)?
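For concreteness, the arithmetic behind a per-tick figure of that size, assuming the annual +/-4 W/m^2 is spread over the 40000 ticks in quadrature (which is only one way to arrive at such a number, not a statement of how GCMs actually behave):

import math

annual_uncertainty = 4.0          # W/m^2, the LWCF calibration figure under discussion
ticks_per_year = 40_000           # approximate number of GCM "ticks" in a year

# If per-tick errors added in quadrature to the annual figure,
# each tick would carry roughly:
per_tick = annual_uncertainty / math.sqrt(ticks_per_year)
print(per_tick)                   # 0.02 W/m^2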

Please can Nick, or some other GCM expert, comment on this?

Rich.

Reply to  See - owe to Rich
February 16, 2020 9:45 am

Rich, “Do GCM outputs after a year resemble a +/-4 W/m^2 difference (I don’t think so)?”

You still don’t get it. Incredible.

Maybe that’s why you went into statistics rather than physical science, Rich. The scientific method seems far beyond your grasp.

Nick Stokes, by the way, will only encourage you in your misguidedness. Doing so is in his interest.

Reply to  Pat Frank
February 17, 2020 2:09 am

Interesting, Pat. In the Tim Gorman school of discourse, you have just lost the argument by making an ad hominem comment. But fortunately I don’t belong to that school.

Your paper does many fine things, but in my opinion it only scratches the surface of what the GCMs can, or cannot, tell us. I am trying to delve deeper into how error actually propagates within them, not just how it might appear to from those scratches on the surface.

So far we have usefully discovered from Nick that some automatic error correction happens through application of conservation of energy, but apparently that doesn’t apply to the basic quantities of radiation floating around, and that is where I want to learn more. I have a nascent idea which I may share later.

Rich.

Reply to  See - owe to Rich
February 17, 2020 12:26 pm

That wasn’t an ad hominem argument, Rich. It was an observation based on many-times repeated experience.

Repeated experience would show you that I could never be an Olympic runner. It would not be an ad hominem for you to tell me so.

Error in a prediction is not corrected by adjusting output to falsely reproduce a calibration observable. The underlying physics remains incorrect.

Nick has evidenced no understanding of how to judge predictive error and uncertainty. Nor has Ken Rice (Mr. ATTP), nor any climate modeler of my experience.

Rich, you’d never let anyone get away with subtracting an error from a result obtained using defective statistics, and who then claims the method is predictively useful.

Why on earth would you let climate modeling get away with the same shenanigan?

Reply to  See - owe to Rich
February 18, 2020 10:31 am

Rich,
About “ticks”, please see this paper on the GFDL AM4.0/LM4.0 components of CM4.0. This gives the time steps, which are not the same for all aspects of the simulation. Search for “step”.

https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2017MS001209

At Table 1 and the related text, there is an interesting explanation of the choice of time step for shortwave radiation.

About the treatment of conservation of energy, see the Supplemental Information, for which a link to a pdf is provided at the end of the paper. I have pasted the first section below.

***************************
S1 Treatment of energy conservation in dynamical core

The dissipation of kinetic energy in this model, besides the part due to explicit vertical diffusion, occurs implicitly as a consequence of the advection algorithm. As a result, the dissipative heating balancing this loss of kinetic energy cannot easily be computed locally, and is, instead, returned to the flow by a spatially uniform tropospheric heating. This dissipative heating associated with the advection in the dynamical core in AM4.0 is ~ 2 Wm−2.

There is also another energy conservation inconsistency in that the energy conserved by the dynamical core involves a potential energy computed with the virtual temperature, while the model column physics uses temperature without the virtual effect, assuming that the conservation of internal plus potential energy, vertically integrated, reduces to the conservation of vertically integrated enthalpy, cpT. This discrepancy averages to 0.4 Wm−2. We adjust the dissipative heating correction in the dynamical core to account for this discrepancy. As a result, there is good consistency, within 0.1 Wm−2, between energy fluxes at the TOA and at the surface in equilibrium, with the net downward heat surface flux defined as Rsfc − LvE − S − LfPsnow. Here Rsfc is net downward LW + SW radiative flux, E surface evaporation of vapor, S upward sensible heat flux, Psnow surface precipitation flux of frozen water, Lv and Lf are the latent heat of vaporization and fusion respectively. A remaining problem is that these latent heats are assumed to be independent of temperature. Removing the latter inaccuracy in the most appropriate fashion would involve multiple changes to the code and was postponed to another development cycle.
**************************

DD

Reply to  See - owe to Rich
February 16, 2020 8:29 am

And so those geographical components could have high bias and low standard deviation…

Reply to  See - owe to Rich
February 16, 2020 4:50 pm

Rich,
Your question to me is: “You say that you don’t think that their 6W/m^2 corresponds in any way to Pat Frank’s 4W/m^2. Why?”
I actually said, “I’m not suggesting the RMSE 6 W/m^2 value from Figure 15 corresponds somehow to the +/- 4 W/m^2 value appearing in Pat Frank’s paper. They are quite different.”
But in any case, the answer to “why?” is apparent by considering this excerpt from Pat Frank’s opening paragraph in his paper: “A directly relevant GCM calibration metric is the annual average +/- 12.1% error in global annual average cloud fraction produced within CMIP5 climate models. This error is strongly pair-wise correlated across models, implying a source in deficient theory. The resulting long-wave cloud forcing (LWCF) error introduces an annual average +/- 4 W/m^2 [dd format edit] uncertainty into the simulated tropospheric thermal energy flux.” So it is a global annual average that is being characterized.
On the other hand, the RMSE 6 W/m^2 for outgoing longwave we’ve been referring to, for GFDL’s CM4.0, from both Figure 15 and Figure 16(c) in the referenced article, characterizes the gridpoint-by-gridpoint bias as defined in the caption to Figure 16. It is not a characterization of differences between model global annual average values and reference global annual average values. Nevertheless it is similarly revealing, as I see it, that the GCMs are simply not capable of resolving outgoing longwave emissions, or other related fluxes and conditions, closely enough to measured values to support projections of a temperature response to greenhouse gas forcing in a stepwise simulation.
I hope this helps by answering your question. I do see your later reply.
DD

Reply to  David Dibbell
February 19, 2020 8:00 am

Dibbell Feb 16 4:50pm

David, thanks for the clarification. I am certainly interested in the standard deviation = uncertainty in the annual average LWCF. However, when Googling I came across Calisto et al (2014) “Cloud radiative forcing intercomparison…” at https://www.ann-geophys.net/32/793/2014/angeo-32-793-2014.pdf and was intrigued by their Table 1. For model HadCM3 for example, the mean LWCF over 10 years was 21.2 W/m^2, but the biggest anomalies in the 10 yearly data points were apparently +0.68 and -0.64 W/m^2. That isn’t anything like +/-4 W/m^2, and even less like that value multiplied by sqrt(9) = 3 for accumulation of error over those 10 years.

Am I comparing apples to oranges again? I’ll have to look again at the paper Pat Frank cited for that.

Rich.

Reply to  See - owe to Rich
February 20, 2020 6:12 am

See – owe to Rich February 19, 2020 at 8:00 am

Rich, in the paper you linked, take a look at figure 1 lower left panel, and figure 7, the two lower panels for sea and land. From these charts the CLT (cloud amounts) from the CMIP5 models differ from the CERES values notably. It is from such differences that the theory deficiency becomes most apparent, and from which the +/- 4 W/m^2 uncertainty follows, as I understand it in Pat Frank’s paper. The fact that the Table 1 model outputs (again, in the paper you linked) look “better” than that implies compensating errors in the models.
DD

Reply to  See - owe to Rich
February 20, 2020 12:43 pm

Thanks, David. Actually, I’m not sure the Table 1 values do look better: the CERES LWC[R]F value has only 2 (almost only 1) models lying beyond it, so a strong bias is noticeable there. Your comment prompted me to look again.

So in addition to uncertainty from random errors, there is a sizeable systematic error/bias. From my comments on uncertainty versus bias further below, this means that if this bias were not corrected the models would drift away from reality even faster, linearly in time, rather than as the square root of time for uncertainty. Since this doesn’t occur, that bias must be partially cancelled by something else.

Nick Stokes has said that auto-correction occurs for “conservation of energy”, but did not admit to any auto-correction in the radiative forcings themselves. But perhaps there is some, to explain the results. Someone, somewhere, must have data on this.

Thanks very much for the alert on this.

Rich.

Reply to  See - owe to Rich
February 20, 2020 3:28 pm

See – owe to Rich February 20, 2020 at 12:43 pm

Rich,
About your point, “Since this doesn’t occur, that bias must be partially cancelled by something else.” In the paper you linked, search the term “compensating error” concerning cloud-related radiative forcing. Apparent stability is evidently achieved by tuning the model parameters. Here is a quote from Lauer and Hamilton 2013, “The problem of compensating biases in simulated cloud properties is not new and has been reported in previous studies.”

About your interest in how the conservation of energy is addressed, please see my comment and link elsewhere on this page about GFDL AM4.0/LM4.0, which you may not have explored yet. David Dibbell February 18, 2020 at 10:31 am

DD

Reply to  See - owe to Rich
February 21, 2020 2:18 am

David, thanks again. There are 7 matches to ‘compens’ in the paper, each of them interesting. Yes, during model calibration there is no doubt an objective function to be minimized (roughly RMS of model-reality I would guess) which will force cancellation of biases. Willis Eschenbach is no doubt appalled at the number of free parameters available to do that, but happy that they are at least based on credible physics. Oh, Willis, sorry if I am putting words in your mouth, I know you don’t like that…

This reduction of bias will have its own standard uncertainty associated with it, but I don’t think that Pat Frank has approached it that way; perhaps he will revisit his derivation of +/-4 W/m^2.

February 16, 2020 11:47 am

sky:

“This contention flies in the face of the fact that the variance of time-series of temperature in any homogeneous climate area DECREASES as more COHERENT time-series are averaged together.”

The measurements are not time coherent. What makes you think they are? They are maximum temperatures and maximum temperatures can occur at various times even in an homogeneous climate area. They are minimum temperatures and minimum temperatures can occur at various times even in an homogeneous climate area. Even stations only a mile apart can have different cloud coverage and different wind conditions, both of which can affect their readings.

“Each measurement–when considered as a deviation from its own station mean”

How do you get a station mean? That would require multiple measurements of the same thing and no weather data collection station that I know of does that. If a measurement device has an uncertainty interval associated with it, no amount of calculating a daily mean from multiple measurements can decrease that uncertainty interval.

“is a SINGLE REALIZATION, but NOT a POPULATION of one, as erroneously claimed. ”

Of course it is a population of one. And it has an uncertainty interval.

“Sheer ignorance of this demonstrable empirical fact”

The empirical fact is that stations take one measurement at a time, separated in time from each other. Each station measures a different thing, like two investigators of which one measures the height of a pygmy and the other the height of a Watusi. How do you combine each of those measurements into a useful mean? How does combining those two measurements decrease the overall uncertainty associated with the measurements?

“along with the simplistic presumption that all measurements are stochastically independent, is what underlies the misbegotten random-walk conception of climatic uncertainty argued with Pavlovian persistence here.”

What makes you think the temperature measurements are made at random? That *is* the definition of stochastic, a random process, specifically that of a random variable.

They *are* independent. The temperature reading at my weather station is totally independent of the temperature reading at another weather station 5 miles away! They are simply not measuring the same thing and they are not the same measuring device. And they each have their own uncertainty intervals, which are independent of each other.

If you had actually been paying attention, uncertainty does *not* result in a random walk. Uncertainty is not a random variable that provides an equal number of values on each side of a mean. And it is that characteristic that causes a random walk. Sometimes you turn left and sometimes you turn right. An uncertainty interval doesn’t ever tell you which way to turn!

(Rescued from spam bin) SUNMOD

Reply to  Tim Gorman
February 17, 2020 7:02 pm

Tim, 1sky1’s coherent time series remark probably refers to Hansen and Lebedeff (1987), Global Trends of Measured Surface Air Temperature, JGR 92 (D11), 13,345-13,372.

Figure 3 shows the correlation of air temperature, with r averaging ~0.5 at 1200 km for the northern hemisphere.

Hansen and Lebedeff didn’t consider measurement error at all.

February 18, 2020 2:40 pm

Pat Feb 17 12:14pm

I’m starting a new comment thread to reply to just two of the statements from that comment of yours.

First: “Unknown bias contributes uncertainty to a measurement… The JCGM there specifically indicated that uncertainty arises from systematic error.”

I believe you are wrong, wrong, wrong, but I think we need an independent expert arbitrator (where can we find one?). The main two parameters of a probability distribution are the mean and standard deviation. Do you agree? The mean is a measure of location, and the standard deviation is a measure of dispersion, or variation, about that mean. In the case of D = M-X I am using b for the mean and s for the standard deviation (which is also the standard deviation of M under the assumption that X is an unknown but fixed value).

The JCGM 2.2.3 says that ‘uncertainty’ “characterizes the dispersion of the values that could reasonably be attributed to the measurand. NOTE 1: The parameter may be, for example, a standard deviation (or a given multiple of it), or the half-width of an interval having a stated level of confidence.” So this is s; there is no room for a component involving the mean b there.

NOTE 3 there says “It is understood that the result of the measurement is the best estimate of the value of the measurand, and that all components of uncertainty, including those arising from systematic effects, such as components associated with corrections and reference standards, contribute to the dispersion.” The contribution from systematic effects is not, in my insufficiently humble opinion, from the systematic error (= bias = b) itself, but from perfectly valid attempts to reduce the bias via calibration. But this point is quite subtle, and as I say, we probably won’t agree on it so a third party is needed. It makes sense to me mathematically, since the systematic error is a fixed unknown value, with no dispersion/variance, but perhaps it doesn’t make sense to you from a physicist’s point of view. It’s funny, isn’t it, how difficult it is to agree on what “mean” means, or how difficult it is to interpret possibly contradictory passages from various bibles?
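To make that point concrete, a minimal sketch with made-up numbers: a fixed bias b shifts the whole distribution of M-X but contributes nothing to its dispersion, which is all that the standard deviation s measures.

import numpy as np

rng = np.random.default_rng(0)
X = 20.0          # the true value (unknown in practice); made up for illustration
b, s = 0.5, 0.2   # a fixed systematic error and a random-error standard deviation, made up

M = X + b + rng.normal(0.0, s, size=100_000)   # notional repeat measurements
D = M - X

print("mean of D (approx b):", D.mean())   # the bias only shifts the location
print("std of D  (approx s):", D.std())    # the dispersion is s, with no contribution from b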

Second: “Under 4.3.7, “In other cases, it may be possible to estimate only bounds (upper and lower limits) for Xi, in particular, to state that “the probability that the value of Xi lies within the interval a− to a+ for all practical purposes is equal to one and the probability that Xi lies outside this interval is essentially zero”. If there is no specific knowledge about the possible values of Xi within the interval, one can only assume that it is equally probable for Xi to lie anywhere within it…”

It is unfortunate that the JCGM gives any weight to that case, because it is very unlikely and usually suggests that information has been wantonly thrown away, and leads to bizarre effects as in my essay. And wow, I have just noticed the following in 4.3.9: “Such step function discontinuities in a probability distribution are often unphysical. In many cases, it is more realistic to expect that values near the bounds are less likely than those near the midpoint. It is then reasonable to replace the symmetric rectangular distribution with a symmetric trapezoidal distribution having equal sloping sides (an isosceles trapezoid)”. This is amazing, because I introduced a trapezoidal distribution in my essay with no prior knowledge of this 4.3.9, which I’ve only just seen. And I did that because the cliff edge of a uniform leads, as said, to bizarre effects. So please don’t rely on 4.3.7 to be a good corner of the uncertainty world in which to reside. Repeat: “often unphysical”; does that worry a physicist?
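For anyone who wants the difference in numbers, a small Monte Carlo sketch comparing a rectangular distribution on +/-a with an isosceles trapezoid on the same bounds; the trapezoid is built here as the sum of two uniforms, which is one convenient construction, not necessarily the JCGM’s, and the split of half-widths is illustrative only.

import numpy as np

rng = np.random.default_rng(1)
a, N = 1.0, 1_000_000            # half-width of the uncertainty interval; sample size

# Rectangular (uniform) distribution on [-a, a].
rect = rng.uniform(-a, a, N)

# Isosceles trapezoid on [-a, a]: sum of two uniforms whose half-widths add to a;
# the flat top then runs from -(p-q) to +(p-q).
p, q = 0.75 * a, 0.25 * a
trap = rng.uniform(-p, p, N) + rng.uniform(-q, q, N)

print("rectangular std:", rect.std(), " (theory a/sqrt(3) =", a / np.sqrt(3), ")")
print("trapezoidal std:", trap.std())   # smaller, and without the cliff edge at +/-a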

Reply to  See - owe to Rich
February 18, 2020 4:21 pm

Rich, “…but I think we need an independent expert arbitrator”

Why do we need an arbiter, Rich? The JCGM is right there in front of our eyes.

Here’s what you insisted is wrong, wrong, wrong: “The JCGM there specifically indicated that uncertainty arises from systematic error.”

Here’s what the JCGM says in the note under 3.3.3: “In some publications, uncertainty components are categorized as “random” and “systematic” and are associated with errors arising from random effects and known systematic effects, respectively.”

And, “B.2.22 systematic error: mean that would result from an infinite number of measurements of the same measurand carried out under repeatability conditions minus a true value of the measurand

NOTE 1 Systematic error is equal to error minus random error.

NOTE 2 Like true value, systematic error and its causes cannot be completely known.

NOTE 3 For a measuring instrument, see “bias” (VIM:1993, definition 5.25). [VIM:1993, definition 3.14]

Guide Comment: The error of the result of a measurement (see B.2.19) may often be considered as arising from a number of random and systematic effects that contribute individual components of error to the error of the result. Also see the Guide Comment to B.2.19 and to B.2.3.

Note under B2.23 and B2.24: “Since the systematic error cannot be known perfectly, the compensation cannot be complete.”

In GCM projections of future climate, the systematic cloud fraction error cannot be corrected at all.

Under E 3.6 c) “it is unnecessary to classify components as “random” or “systematic” (or in any other manner) when evaluating uncertainty because all components of uncertainty are treated in the same way.”

“Benefit c) is highly advantageous because such categorization is frequently a source of confusion; an uncertainty component is not either “random” or “systematic”. Its nature is conditioned by the use made of the corresponding quantity, or more formally, by the context in which the quantity appears in the mathematical model that describes the measurement. Thus, when its corresponding quantity is used in a different context, a “random” component may become a “systematic” component, and vice versa.”

It’s right there in front of you, Rich. Known systematic errors can be partially compensated. Unknown systematic errors cannot be compensated at all.

Partial or uncompensated, systematic error contributes to uncertainty in a result.

When a deficient theory is used in a step-wise series of calculations, the systematic errors are unknown, the biases are not known to be constant, and model calibration error must be propagated into the result.

No arbitrator needed. Or wanted. No arguments from authority. I’ll think for myself, thanks.

Reply to  Pat Frank
February 19, 2020 10:00 am

Pat, a few hours ago I posted a theorem which apparently disproves your interpretation of the JCGM. But it appears to have got lost in moderation, which will suit you!

Rich.

Reply to  See - owe to Rich
February 19, 2020 7:58 pm

Rich, “But it appears to have got lost in moderation, which will suit you!”

Implicit ad hominem, Rich.

You implied I’m opportunistically unfair. A remarkably unfair view itself considering how thoroughly I’ve engaged your arguments.

You’ve lost the argument by your own interpretation of Tim Gorman’s standard.

I’ll reply to the rest later. Consider this for context, though: all of mathematics, including statistics, is a complicated way of saying a = a.

Reply to  Pat Frank
February 20, 2020 1:12 am

Pat, any such implication was not intended. I know from my end that engaging with arguments costs time and effort. So I imagined that if you had one fewer argument to address, then that would suit you. As for “Tim Gorman’s standard”, in my comment which you link I did say “But fortunately I don’t belong to that school”. In other words, though it is better to avoid ad hominem remarks, and you claimed you avoided them in your own remarks about me, the argument is still the argument and has to stand on its own merits.

I hope that clears things up a bit, and (ad hominem) I retain a good deal of respect for you. Whilst retaining some healthy disagreement…such as on the “a = a” remark.

Rich.

Reply to  See - owe to Rich
February 19, 2020 6:25 am

Pat Feb 18 4:21pm

“Why do we need an arbiter?” You’re right that we don’t, but for the wrong reason. And the reason is that I have a proof that your interpretation of the JCGM is wrong.

Theorem: Assume that prior to a measurement, the error which will occur may be considered to be a random variable with mean b and variance s^2. Let v = s^2 and let g(v,b) be a differentiable function which defines the “uncertainty” of the measurement. Assume that the correct formula for the uncertainty of a sum of n independent measurements with respective means b_i and variances v_i is that given by JCGM 5.1.2 with unit differential: g(v,b) = sqrt(sum_i g(v_i,b_i)^2), where v = sum_i v_i and b = sum_i b_i. Then if g(v,b) is consistent under rescaling, i.e. multiplying all measurements by k multiplies g by k, g(v,b) is independent of b.

Proof: Let n = 2. (Shorten v_1 to v1 etc.) Then v = v1+v2, b = b1+b2,

g(v,b)^2 = g(v1,b1)^2 + g(v2,b2)^2

We now differentiate this with respect to v1, using rules of the form (d/dv1)(h(x)) = (d/dx)(h(x)) dx/dv1, with h(x) replaced by g(x,y)^2, x replaced by v = v1+v2, and y replaced by b = b1+b2. Since d(v1+v2)/dv1 = dv1/dv1 = 1, and dv2/dv1 = 0, overall we get

2g(v1+v2,b1+b2) dg(v1+v2,b1+b2)/dv = 2g(v1,b1)dg(v1,b1)/dv

Because this is true for any values of the arguments, the function g(v,b)dg(v,b)/dv is a constant, say k_v, where the v is a label. Then integrating with respect to v,

int k_v dv = l(b)+vk_v = int g(v,b)dg(v,b) = g(v,b)^2/2

Note that because we integrated over v, the constant of integration is independent of v but might depend on b, hence the term l(b).

In the same way from differentiating with respect to b1, we deduce that

m(v)+bk_b = g(v,b)^2/2

The only consistent solution is g(v,b) = sqrt(2vk_v + 2bk_b). We may arbitrarily choose the scaling constant k_v to be ½ (dimensionless). Let the units of the measurement be called the ‘tinu’. Then v has dimension tinu^2, and if k_b=0 then g(v,b) has dimension tinu. But because b has dimension tinu, there is a problem if k_b is nonzero. To preserve dimension k_b must have dimension tinu. Suppose we choose k_b = i/2 where i is 1 tinu. Then

g(v,b) = sqrt(v+ib)

Let g’ = g(v,b) for a particular v and b. Now change scale to milli-tinus. g should become 1000g’ in these units. But v is a million times as big whilst b is a thousand times as big in these units, so the right hand side is now sqrt(10^6 v + 10^3 ib) which is not 10^3 g’. Therefore g(v,b) is not invariant under scale.

QED

I did not find that theorem in a book, but had the idea for it overnight. The “dimension” problem means that one might like to use sqrt(s^2+b^2) for the uncertainty, which is at least commensurate, but it fails the other mathematical requirements. The conclusion is that the JCGM uncertainty formula is what it says it is, which is the square root of a variance. A variance cannot include any terms from bias, which is the neat four letter word for “systematic error”. Only when a correction term is used to reduce the bias, and this term perforce is based on data which have an associated uncertainty (i.e. variance), does an uncertainty term associated with systematic error come into play.
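A small numerical check of the scaling step, under the proof’s own assumptions (k_v = 1/2 and k_b = i/2 with i equal to one tinu) and with illustrative values of v and b: rescaling to milli-tinus multiplies v by 10^6 and b by 10^3, and the candidate g = sqrt(v + i*b) then fails to come out 1000 times larger.

import math

v, b = 4.0, 3.0                  # an illustrative variance and bias, in tinu^2 and tinu
i = 1.0                          # the constant of dimension one tinu, as in the proof

g = math.sqrt(v + i * b)         # candidate uncertainty, in tinus

# Re-express in milli-tinus: v scales by 1000^2, b by 1000, i held fixed.
g_milli = math.sqrt(v * 1000**2 + i * b * 1000)

print(1000 * g)                  # what scale invariance would require: ~2645.8
print(g_milli)                   # what the formula actually gives:     ~2000.7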

So, Pat, are you going to disprove my theorem, prove its assumptions are unjustified, or merely ignore it? More to the point, if you have data on random error and systematic error, how exactly are you going to combine them into an uncertainty value? What is your magic function?

Rich.

(Rescued from spam bin) SUNMOD

Reply to  See - owe to Rich
February 20, 2020 2:12 am

Rich Feb 19 6:25am

Following up on my theorem, I shall examine some practical consequences. Suppose that some radiative forcing is in error each year by an average of 1 W/m^2, and that the error accumulates from one year to the next. And suppose that that is RMSE (Root Mean Squared Error) rather than standard deviation (numerous papers do quote RMSE). Now RMSE, which I’ll call r, equals sqrt(s^2+b^2), where s is standard deviation (uncertainty) and b is bias (systematic error). What are the consequences of different proportions of s^2 and b^2 going into r^2 = s^2+b^2?

Suppose b = 0. Then s = 1, and after, say, 81 years, the square root law of propagation of uncertainty means that the uncertainty is +/-9 W/m^2.

Now suppose that s and b are each 1/sqrt(2) = +/-0.707. Then after 81 years the uncertainty is +/-0.707*9 ~ +/-6.4 W/m^2. But the bias accumulates linearly. b is either 0.707 or -0.707, and the calibration should have told us which; let’s assume positive. Then after 81 years the systematic error will have accumulated to +0.707*81 = +57.3 W/m^2. That is huge!
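A throwaway sketch of those two growth rates, with the split r^2 = s^2 + b^2 = 1 just used: the random part grows as the square root of the number of years, while an uncorrected bias accumulates linearly.

import math

years, r = 81, 1.0                        # years of accumulation; assumed annual RMSE in W/m^2

for s2_frac in (1.0, 0.5):                # fraction of r^2 that is random variance
    s = math.sqrt(s2_frac) * r            # random part (uncertainty)
    b = math.sqrt(1.0 - s2_frac) * r      # systematic part (bias), taken positive
    u = s * math.sqrt(years)              # uncertainty after `years`, root-sum-square growth
    drift = b * years                     # uncorrected bias accumulates linearly
    print(f"s={s:.3f}, b={b:.3f}: uncertainty +/-{u:.1f} W/m^2, accumulated bias {drift:+.1f} W/m^2")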

Before calibration, in the absence of external information, very little would be known about s and b. The process of calibration allows estimates of s and b to be determined, never perfectly, but often good enough to be useful. The difference between s and b, though, is that s cannot be corrected, in the sense that a single future measurement, in a scenario where s and b are still valid, is still subject to the (standard) uncertainty +/-s. But b can be corrected, and if it is anywhere near as large as s, and many measurements are going to be meaningfully summed, then the above example shows that it is vital that this be done, since otherwise error grows way beyond the limits predicted by uncertainty calculations.

If bias is corrected by subtracting the mean error z over n observations, then the variance of z is s^2/n and so the extra uncertainty induced by the correction is +/-s/sqrt(n). My interpretation of the JCGM where it talks about uncertainties associated with systematic error is exactly this – correction of bias, not bias itself.

I am, as usual, open to other rational explanations and formulae, provided that they stand up to mathematical scrutiny. I believe that the JCGM is based on sound mathematics, insofar as mathematics can apply at all to measurement, but that some of its statements could be tightened up to reduce confusion among practitioners.

Rich.

Reply to  See - owe to Rich
February 19, 2020 11:31 am

Rich, “since the systematic error is a fixed unknown value”

Not when it is caused by uncontrolled variables. Variables, Rich, as in changing over time.

Varying environmental impacts do produce a dispersion of error, which has an empirical standard deviation even though not normally distributed.

Rich, “because it is very unlikely …”

No it’s not. How are you able to pronounce on the likelihood of cases?

Rich, “and usually suggests that information has been wantonly thrown away,”

And you know that, how? What is your experience carrying out physical experiments?

In X-ray spectroscopy, the x-ray beam can heat the monochromator crystals locally, causing a slow unknown drift in energy. It’s known to happen, and one usually has to live with it. Samples change in the beam. There are small unknown errors in spectrum calibration and normalization. There is the resolution limit of the spectrometer itself. All of that combines into a rectangular uncertainty in the energy position of an observed absorption feature. That sort of thing is a common part of measurement.

Trapezoidal uncertainties require unfounded assumptions about the error distribution. Unprofessional wishful thinking.

I consider them to be dishonest when they are objectively unjustified.

1sky1
February 18, 2020 3:23 pm

What becomes unmistakably clear here is the chronic inability to grasp the essential difference between the laboratory case of making a chain of independent measurements or estimates of a single FIXED quantity and the in situ case of measuring or modeling the auto-correlated and spatially coherent time-series of any particular geophysical VARIABLE. Only that can explain such totally fantastic claims as:

The measurements are not time coherent…How do you get a station mean? That would require multiple measurements of the same thing and no weather data collection station that I know of does that…
The empirical fact is that stations take one measurement at a time, separated in time from each other. Each station measures a different thing…The temperature reading at my weather station is totally independent of the temperature reading at another weather station 5 miles away…etc. etc.

In reality, the measurement uncertainties of physically sheltered thermometers, which have been investigated for more than a century by meteorologists, indeed produce sporadic episodes of bias, dependent upon wind speed and insolation. So does the practice of taking the mid-range daily reading (Tmax + Tmin)/2 as the daily “mean”. Nevertheless, these well-known shortcomings do not substantially affect the utility of century-long TIME-SERIES obtained at well-maintained met stations in studying the variations of the “climate signal.”

The very fact that cross-spectral coherency is typically high (>0.75) over hundreds of kilometers at the important multidecadal frequencies shows that, contrary to Frank’s claim, the signal-to-noise ratio is more than adequate for the intended purpose. His notion that the “uncertainty” compounds with every annual time-step, like a random-walk of independent increments, is simply contradicted in every corner of the globe where quality station data are available. It’s on that basis, not on H&L’s superficial zero-lag correlation analysis, that I comment here. It takes, however, a modicum of knowledge of stochastic processes in the geophysical setting to comprehend that.

Reply to  1sky1
February 18, 2020 3:55 pm

1sky1, “The very fact that cross-spectral coherency is typically high (>0.75) over hundreds of kilometers at the important multidecadal frequencies shows that, contrary to Frank’s claim, the signal-to-noise ratio is more than adequate for the intended purpose.”

As mentioned above, 1sky1, you’re in for a surprise on this claim.

“His notion that the “uncertainty” compounds with every annual time-step, like a random-walk of independent increments, is simply contradicted in every corner of the globe where quality station data are available.”

I have made no claims about measurement uncertainty compounding with any time-step. Whoever wrote that is improperly conflating my work on propagation of GCM calibration error with air temperature measurements, thus demonstrating a lack of understanding of both.

quality station data” That’s rich. There have been none, from a climatological perspective, until the CRN system came on line.

“It’s on that basis, not on H&L’s superficial zero-lag correlation analysis, that I comment here. It takes, however, a modicum of knowledge of stochastic processes in the geophysical setting to comprehend that.”

Use of inflammatory language — superficial — does not make an argument. It makes a polemic.

H&L computed a GCM annual average long wave cloud forcing calibration error. Global cloud fraction error is inherent in the model and LWCF error is therefore present in every single time step. Its propagation through GCM air temperature projections follows immediately.

I’ve yet to encounter a climate modeler who knows the first thing about physical error analysis. That diagnosis includes you, 1sky1.

1sky1
Reply to  1sky1
February 18, 2020 5:51 pm

“I have made no claims about measurement uncertainty compounding with any time-step.”

I was clearly addressing Gorman’s claim, which mimics Frank’s academic notion of modeling uncertainty. The only surprise here is the total lack of comprehension of what high cross-spectral coherency between time-series implies vis-à-vis S/N ratio. It’s not polemics, but the incisiveness of that metric compared to the zero-lag “correlation of air temperature, with r averaging ~0.5 at 1200 km for the northern hemisphere,” as reported by H&L, that renders the latter quite superficial. And their paper was about “Measured Surface Air Temperature,” not model results.

BTW, cross-spectral analysis between output and input also provides an analytically superior way of calibrating or determining the frequency response characteristics of instruments as well as of mathematical models. Alas, that is terra incognita to those who pitch their academic preconceptions about real-world physics without any serious empirical study.

February 18, 2020 7:39 pm

1sky1, “The only surprise here is the total lack of comprehension of what high cross-spectral coherency between time-series implies vis-à-vis S/N ratio.”

There’s no lack of comprehension, 1sky1. I know exactly what you mean. I’d agree, too, if I didn’t know better. As mentioned above, you have a surprise in store regarding that claim.

It’s not polemics, but the incisiveness of that metric compared to the zero-lag “correlation of air temperature, with r averaging ~0.5 at 1200 km for the northern hemisphere,” as reported by H&L, that renders the latter quite superficial.

As it has turned out, the coherence doesn’t make the error analysis superficial. I’ve done further analysis, 1sky1. It’s just as yet unpublished.

And their paper was about “Measured Surface Air Temperature,” not model results.

Apologies for my mistake. I mistook H&L to mean Lauer & Hamilton, rather than your intended, Hubbard and Lin.

February 18, 2020 7:41 pm

1sky1, “BTW, cross-spectral analysis between output and input also provides an analytically superior way of calibrating or determining the frequency response characteristics of instruments as well as of mathematical models.”

BTW, cross comparison among model outputs reveals nothing about accuracy.

Comparison of models with observations reveals nothing about accuracy either, given model tuning.

1sky1
February 20, 2020 5:15 pm

While typing a final comment that required much cross-referencing, WUWT spontaneously took me off the page. Upon returning to it, I was dismayed to find the comment box empty. Sadly, this is not the first such experience here. Since I value my time, I’ll simply point to the latest non sequitur employed here to deflect substantive scientific criticism of Frank’s extravagant claims:

[C]ross comparison among model outputs reveals nothing about accuracy.

Such cross-comparisons were never even raised here in regard to model accuracy, which must necessarily be judged (pre-tuning or not) against the best available observations, using the most incisive analytic methods.
His claim that “I have a surprise in store regarding that claim” remains an empty piece of rhetoric.

February 21, 2020 3:26 am

I believe this is the last day on which comments are open here. So I’ll make some valedictory remarks. Since last August I have spent more time than I would have intended in trying to understand the theory behind Pat Frank’s paper, but for the most part it has been rewarding.

It has been quite a challenge to stay abreast of the various comments on my essay and matters arising, and I thank many commenters for their help and insights. It is perhaps worth mentioning some of the successes and failures which I feel I have had here.

I have failed to persuade Tim Gorman that the uncertainty of a mean is smaller than the uncertainty of the sum from which it was derived. You can lead a horse to water, but you can’t make it drink.

I have failed to persuade Pat Frank that his view is wrong about the way the JCGM allows systematic error to enter into the formula for combining uncertainties. Though he has not yet replied to my comments of Feb 19 6:25am and Feb 20 2:12am, so it is possible that he is thinking hard about his position on that.

I have failed to get any information on whether my parameter ‘a’ in Equation (1) may be greater than zero. The consequences are rather important for climate and model stability.

I have succeeded in eliciting from Nick Stokes some details on GCM corrections for conservation of energy. I think this is mainly a mathematical computation issue, with modest errors being corrected, and not a big issue.

I have succeeded in getting help from David Dibbell on how the greater systematic errors in components of the GCMs are effectively bound to get approximately cancelled during model calibration/fitting. Nevertheless it appears that GCMs with significantly different values of sensitivity to CO2 can be made to fit, which is why AR5 was not able to narrow the uncertainty range of sensitivity. Because I have criticized Pat Frank’s treatment of cloud uncertainty and its propagation, some readers have concluded that I do this to support the GCM modellers. Nothing could be further from the truth, and I remain extremely sceptical about them.

I have succeeded in learning more, and I hope teaching more, about the science of uncertainty, but it has more nuances than I expected and I do not claim to comprehend all the problems which can arise in its practical use.

To conclude, I continue to believe that emulators of GCMs are a good idea, but that in order to address GCMs’ uncertainty they need to reproduce the error regime of the GCMs as well as the mean, which is no doubt an extra challenge. Just as GCMs need to be compared with each other (especially how does a high sensitivity GCM approximately match a low sensitivity one), emulators need to be compared with each other and with GCMs.

Best wishes to all, and I am going on a well-timed holiday to the Alps!

Rich.