Tendency, Convenient Mistakes, and the Importance of Physical Reasoning.

Last February 7, statistician Richard Booth, Ph.D. (hereinafter, Rich) posted a very long critique titled, What do you mean by “mean”: an essay on black boxes, emulators, and uncertainty” which is very critical of the GCM air temperature projection emulator in my paper. He was also very critical of the notion of predictive uncertainty itself.

This post critically assesses his criticism.

An aside before the main topic. In his critique, Rich made many of the same mistakes in physical error analysis as do climate modelers. I have described the incompetence of that guild at WUWT here and here.

Their view is wrong.

Unknown physical competence and accuracy describes the current state of climate models (at least until recently. See also Anagnostopoulos, et al. (2010), Lindzen & Choi (2011), Zanchettin, et al., (2017), and Loehle, (2018)).

GCM climate hindcasts are not tests of accuracy, because GCMs are tuned to reproduce hindcast targets. For example, here, here, and here. Tests of GCMs against a past climate that they were tuned to reproduce is no indication of physical competence.

When a model is of unknown competence in physical accuracy, the statistical dispersion of its projective output cannot be a measure of physical error or of predictive reliability.

Ignorance of this problem entails the very basic scientific mistake that climate modelers evidently strongly embrace and that appears repeatedly in Rich’s essay. It reduces both contemporary climate modeling and Rich’s essay to scientific vacancy.

The correspondence of Rich’s work with that of climate modelers reiterates something I realized after much immersion in published climatology literature — that climate modeling is an exercise in statistical speculation. Papers on climate modeling are almost entirely statistical conjectures. Climate modeling plays with physical parameters but is not a branch of physics.

I believe this circumstance refutes the American Statistical Society’s statement that more statisticians should enter climatology. Climatology doesn’t need more statisticians because it already has far too many: the climate modelers who pretend at science. Consensus climatologists play at scienceness and can’t discern the difference between that and the real thing.

Climatology needs more scientists. Evidence suggests many of the good ones previously resident have been caused to flee.

Rich’s essay ran to 16 typescript pages and nearly 7000 words. My reply is even longer — 28 pages and nearly 9000 words. Followed by an 1800-word Appendix.

For those disinclined to go through the Full Tilt Boogie below, here is a short precis followed by a longer summary.

The very short take-home message: Rich’s entire analysis has no critical force.

A summary list of its problems:

1. Rich’s analysis shows no evidence of physical reasoning.

2. His proposed emulator is constitutively inapt and tendentious.

3. Its derivation is mathematically incoherent.

4. The derivation is dimensionally unsound, abuses operator algebra, and deploys unjustified assumptions.

5. Offsetting calibration errors are incorrectly and invariably claimed to promote predictive reliability.

6. The Stefan-Boltzmann equation is inverted.

7. Operators are improperly treated as coefficients.

8. Accuracy is repeatedly abused and ejected in favor of precision.

9. The GCM air temperature projection emulator (paper eqn. 1) is fatally confused with the error propagator (paper eqn. 5.2)

10. The analytical focus of my paper is fatally misconstrued to be model means.

11. The GCM air temperature projection emulator is wrongly described as used to fit GCM air temperature means.

12. The same emulator is falsely portrayed as unable to emulate GCM projection variability, despite 68 examples to the contrary.

13. A double irony is that Rich touted a superior emulator without ever displaying a single successful emulation of a GCM air temperature projection.

14. Assumed away all the difficulties of measurement error or model error (qualifying Rich to be a consensus climatologist).

15. Uncertainty statistics are wrongly and invariably asserted to be physical error or an interval of physical error.

16. Systematic error is falsely asserted as restricted to a fixed constant bias offset.

17. Uncertainty in temperature is falsely and invariably construed to be an actual physical temperature.

18. Empirically unjustified invariably ad hoc assumptions of error as a random variable.

19. The JCGM description of standard uncertainty variance is self-advantageously misconstrued.

20. The described use of rulers or thermometers is unrealistic.

21. Readers are advised to record and accept false precision.

A couple of preliminary instances that highlight the difference between statistical thinking and physical reasoning.

Rich wrote that, “It may be objected that reality is not statistical, because it has a particular measured value. But that is only true after the fact, or as they say in the trade, a posteriori. Beforehand, a priori, reality is a statistical distribution of a random variable, whether the quantity be the landing face of the die I am about to throw or the global HadCRUT4 anomaly averaged across 2020.”

Rich’s description of an a priori random variable status for some as-yet unmeasured state is wrong when the state of interest, though itself unknown, falls within a regime treated by physical theory, such as air temperature. Then the a priori meaning is not the statistical distribution of a random variable, but rather the unknown state of a deterministic system that includes uncontrolled but explicable physical effects.

Rich’s comment implied that a new aspect of physical reality is approached inductively, without any prior explanatory context. Science approaches a new aspect of physical reality deductively from a pre-existent physical theory. The prior explanatory context is always present. This inductive/deductive distinction marks a fundamental departure in modes of thinking. The first neither recognizes nor employs physical reasoning. The second does both.

Rich also wrote, “It may also be objected that many black boxes, for example Global Circulation Models, are not statistical, because they follow a time evolution with deterministic physical equations. Nevertheless, the evolution depends on the initial state, and because climate is famously “chaotic”, tiny perturbations to that state, lead to sizeable divergence later. The chaotic system tends to revolve around a small number of attractors, and the breadth of orbits around each attractor can be studied by computer and matched to statistical distributions.”

But this is not known to be true. On the one hand, an adequate physical theory of the climate is not available. This lack leaves GCMs as parameterized engineering models. They are capable only of statistical arrays of outputs. Arguing the centrality of statistics to climate models as a matter of principle begs the question of theory.

On the other hand, supposing a small number of attractors flies into the face of the known large number of disparate climate states spanning the entire variation between “snowball Earth” and hot house Earth. And supposing those states can be studied by computer and expressed as statistical distributions again begs the question of physical theory. Lots of hand-waving, in other words.

Rich went on to write that the problem of climate could be approached as “a probability distribution of a continuous real variable.” But this assumes the behavior of the physical system as smoothly continuous. The many Dansgaard- Oeschger and Heinrich events are abrupt and discontinuous shifts of the terrestrial climate.

None of Rich’s statistical conjectures are constrained by known physics or by the behavior of physical reality. In other words, they display no evidence of physical reasoning.

The Full-Tilt Boogie.

In his Section B, Rich set up his analysis by defining three sources of result:

1. physical reality ® X(t) (data)

2. black box model ® M(t) (simulation of the X(t)-producing physical reality)

3. model emulator ® W(t) (emulation of model M output)

I. Problems with “Black Box and Emulator Theory” Section B:

Rich’s model emulator W is composed to, “estimate of the past black box values and to predict the black box output.” That is, his emulator targets model output. It does not emulate the internal behavior or workings of the full model in some simpler way.

Its formal structure is given by his first equation:

W(t) = (1-a)W(t-1) + R₁(t) + R₂(t) + (-r)R₃(t), (1ʀ)

where W(t-1) is some initial value and W(t) is the final value after integer time-step ‘t.’ The equation number subscript “ʀ” designates Rich as the source.

As an aside here, it is not unfair to notice that despite its many manifestations and modalities, Rich’s superior GCM emulator is never once used to actually emulate an air temperature projection.

The eqn. 1ʀ emulator manifests persistence, which the GCM projection emulator in my paper does not. Rich began his analysis, then, with an analogical inconformity.

The factors in eqn. 1ʀ are described as: “R₁(t) is to be the component which represents changes in major causal influences, such as the sun and carbon dioxide. R₂(t) is to be a component which represents a strong contribution with observably high variance, for example the Longwave Cloud Forcing (LCF). … R₃(t) is a putative component which is negatively correlated with R₂(t) with coefficient -r, with the potential (dependent on exact parameters) to mitigate the high variance of R₂(t).”

Emulator coefficient ‘r’ is always negative. The R₃(t) itself is negatively correlated with R₂(t) so that R₃(t) offsets (reduces) the magnitude of R₂(t), and 0 £ a £ 1. The Rn(t) are defined as time-dependent random variables that add into (1-a)W(t-1).

The relative impact of each Rn on W(t-1) is R₁(t) > R₂(t) ³ |rR₃(t)|.

A problem with factor R₃(t):

The R₃(t) is given to be “negatively correlated” with R₂(t), “to mitigate the high variance of R₂(t).” However, factor R₃(t) is also multiplied by coefficient -r.

“Negatively correlated” refers to R₃(t). The ‘-r’ is an additional and separate conditional.

There are three cases governing the meaning of ‘negative correlation’ for R₃(t).

1) R₃(t) starts at zero and becomes increasingly negative as R₂(t) becomes increasingly positive.

2) R₃(t) starts positive and becomes smaller as R₂(t) becomes large, but remains greater than zero.

3) R₃(t) starts positive and becomes small as R₂(t) becomes large but can pass through zero into negative values.

If 1), then -rR₃(t) is positive and has the invariable effect of increasing R₂(t) — the opposite of what was intended.

If 2), then -rR₃(t) has a diminishing effect on R₂(t) as R₂(t) becomes larger — again opposite the desired effect.

If 3), then -rR₃(t) diminishes R₂(t) at low but increasing values of R₂(t), but increases R₂(t) as R₂(t) becomes large and R₃(t) passes into negative values. This is because -r(-R₃(t)) = rR₃(t). That is, the effect of R₃(t) on R₂(t) is concave upwards around zero, è₀ø.

That is, none of the combinations of -r and negatively correlated R₃(t) has the desired effect on R₂(t). A consistently diminishing effect on R₂(t) is frustrated.

With negative coefficient -r, the R₃(t) term must be greater than zero and positively correlated with R₂(t) to diminish the contribution of R₂(t) at high values.

Curiously, Rich did not designate what X(t) actually is (perhaps air temperature?).

Nor did he describe what process the model M(t) simulates, nor what the emulator W(t) emulates. Rich’s emulator equation (1ʀ) is therefore completely arbitrary. It’s merely a formal construct that he likes, but is lacking any topical relevance or analytical focus.

In strict contrast, my interest in emulation of GCMs was roused when I discovered in 2006 that GCM air temperature projections are linear extrapolations of GHG forcing. In December 2006, John A publicly posted that finding at Steve McIntyre’s Climate Audit site, here.

That is, I began my work after discovering evidence about the behavior of GCMs. Rich, on the other hand, launched his work after seeing my work and then inventing an emulator formalism without any empirical referent.

Lack of focus or relevance makes Rich’s emulator irrelevant to the GCM air temperature emulator in my paper, which was derived with direct reference to the observed behavior of GCMs.

I will show that the irrelevance remains true even after Rich, in his Section D, added my numbers to his invented emulator.

A Diversion into Dimensional Analysis:

Emulator 1ʀ is a sum. If, for example, W(t) represents one value of an emulated air temperature projection, then the units of W(t) must be, e.g., Celsius (C). Likewise, then, the dimensions of W(t-1), R₁(t), R₂(t), and -rR₃(t), must all be in units of C. Coefficients a and r must be dimensionless.

In his exposition, Rich designated his system as a time series, with t = time. However, his usage of ‘t’ is not uniform, and most often designates the integer step of the series. For example, ‘t’ is an integer in W(t-1) in equation 1ʀ, where it represents the time step prior to W(t).

Continuing:

From eqn. (1ʀ), for a time series i = 1®t and when W(t-1) = W(0) = constant, Rich presented his emulator generalization as:

(2ʀ)

Let’s see if that is correct. From eqn. 1ʀ:

W(t1) = (1-a)W(0) + R₁(t1) + R₂(t1) -rR₃(t1), 1ʀ1

where the subscript on t indicates the integer step number.

W(t2) = (1-a)W(t1) + R₁(t2) + R₂(t2) -rR₃(t2) 1ʀ2

Substituting W(t1) into W(t2),

W(t2) = (1-a)[(1-a)W(0)+ R₁(t1) + R₂(t1) – rR₃(t1)] +[R₁(t2) + R₂(t2)-rR₃(t2)]

= (1-a)²W(0)+(1-a)[R₁(t1) + R₂(t1) – rR₃(t1)] + (1-a)⁰[R₁(t2)) + R₂(t2) -rR₃(t2)]

(NB: (1-a)⁰ = 1, and is added for completion)

Likewise, W(t3)=(1-a){[(1-a)²W(0)+(1-a)[R₁(t1)+R2(t1)-rR3(t1)]+[R1(t2)+R₂(t2)-rR₃(t2)]} + (1-a)⁰[R₁(t3) + R₂(t3) -rR₃(t3)]

= (1-a)³W(0)+(1-a)²[(R₁(t1)+R2(t1)-rR3(t1)]+(1-a)[R1(t2)+R₂(t2)-rR₃(t2)]+(1-a)^⁰[R₁(t3)+R₂(t3)-rR₃(t3)]

Generalizing:

(1)

Compare eqn. (1) to eqn. (2ʀ). They are not identical.

In generalized equation 1, when i = t = 1, W(tt) goes to W(t₁) = (1-a)W(0) + R₁(t1) + R₂(t1) -rR₃(t1) as it should do.

However, Rich’s equation 2ʀ does not go to W(t₁) in the limiting case i = t = 1.

Instead 2ʀ becomes W(t₁) = (1-a)W(0)+(1-a)[R₁(0)+R₂(0) -rR₃(0)], which is not correct.

The R-factors should have their t₁ values, but do not. There are no Rn(0)’s because W(0) is an initial value that has no perturbations. Also, coefficient (1-a) should not multiply the Rn’s (look at eqn. 1ʀ).

So, equation 2ʀ is wrong. The 1ʀ®2ʀ transition is mathematically incoherent.

There’s a further conundrum. Rich’s derivation, and mine, assume that coefficient ‘a’ is constant. If ‘a’ is constant, then (1-a) becomes raised to the power of the summation e.g., (1-a)ᵗW(0).

But there is no reason to think that coefficient ‘a’ should be a constant across a time-varying system. Why should every new W(t-1) have a constant fractional influence on W(t)?

Why should ‘a’ be constant? Apart from convenience.

Rich then defined E[] = expectation value and V[] = variance = (standard deviation)², and assigned that:

E[R₁(t)] = bt+c

E[R₂(t)] = d

E[R₃(t)] = 0.

Following this, Rich allowed (leaving the derivation to the student) that, “Then a modicum of algebra derives

“E[W(t)] = b(at + a-1 + (1-a)^t+1)/a² + (c+d)(1 – (1-a)^t)/a + (1-a)W(0)” (3ʀ)

Evidently 3ʀ was obtained by manipulating 2ʀ (can we see the work, please?). But as 2ʀ is incorrect, nothing worthwhile is learned. We’re told that eqn. 3ʀ ® 4ʀ as coefficient ‘a’ ® 0.

E[W(t)] = bt(t+1)/2 + (c+d)t + W(0) (4ʀ)

A Second Diversion into Dimensional Analysis:

Rich assigned E[R₁(t)] = bt+c. Up through eqn. 2ʀ, ‘t’ was integer time. In E[R₁(t)] it has become a coefficient. We know from eqn. 1ʀ that R₁(t) must have the identical dimensional unit carried by W(t), which is, e.g., Celsius.

We also know R₁(t) is in Wm⁻², but W(t) is in Celsius (C). Factor “bt” must be in the same Celsius units as [W(t)]. Is the dimension of b, then, Celsius/time? How does that work? The dimension of ‘c’ must also be Celsius. What is the rationale of these assignments?

The assigned E[R₁(t)] = bt+c has the formula of an ascending straight line of intercept c, slope b, and time the abscissa.

How convenient it is, to assume a linear behavior for the black box M(t) and to assign that linearity before ever (supposedly) considering the appropriate form of a GCM air temperature emulator. What rationale determined that convenient form? Apart from opportunism?

The definition of R₁(t) was, “…the component which represents changes in major causal influences, such as the sun and carbon dioxide.”

So, a straight line now represents the major causal influence of the sun or of CO2. How was that decided?

Next, multiplying through term 1 in 4ʀ, we get bt(t+1)/2 = (bt²+bt)/2. How do both bt² and bt have the units of Celsius required by E[R₁(t)] and W(0)?

Factor ‘t’ is in units of time. The internal dimensions of bt(t+1)/2 are incommensurate. The parenthetical sum is physically meaningless.

Continuing:

Rich’s final equation for the total variance of his emulator,

Var[W(t)] = (s₁²+s₂²+s₃²-2r s₂s₃)(1 – (1-a)^2t)/(2a-a²) (5ʀ)

included all the Rn(t) terms and the assumed covariance of his R₂(t) and R₃(t).

Compare his emulator 4ʀ with the GCM air temperature projection emulator in my paper:

(2)

In contrast to Rich’s emulator, eqn. 2 has no offsetting covariances. Not only that, all the DT-determining coefficients in eqn. 2 except fCO₂ are givens. They have no uncertainty variance at all.

In short, both Rich’s emulator itself and its dependent variances are utterly irrelevant to any evaluation of the GCM projection emulator (eqn. 2). Utterly irrelevant, even if they were correctly derived, which they were not.

Parenthetical summary comments on Rich’s “Summary of section B:”

“A good emulator can mimic the output of the black box.“

(Trivially true.)

“A fairly general iterative emulator model (1) is presented.”

(Never once used to actually emulate anything, and of no focused relevance to GCM air temperature projections.)

“Formulae are given for expectation and variance of the emulator as a function of time t and various parameters.“

(An emulator that is critically vacant and mathematically incoherent, and with an inapposite variance.)

“The 2 extra parameters, a, and R₃(t), over and above those of Pat Frank’s emulator, can make a huge difference to the evolution.”

(Extra parameters in an emulator that does not deploy the formal structure of the GCM emulator, and missing any analytically equivalent factors. The extra parameters are ad hoc, while ‘a’ is incorrectly specified in 3ʀ and 4ʀ. The emulator is critically irrelevant and its expansion in ‘a’ is wrong.)

“The “magic” component R₃(t) with anti-correlation -r to R₂(t) can greatly reduce model error variance whilst retaining linear growth in the absence of decay.“

(Component R₃(t) is likewise ad hoc. It has no justified rationale. That R₃(t) has a variance at all requires its rejection (likewise rejection of R₁(t) and R₂(t)) because the coefficients in the emulator in the paper (eqn. 2 above) have no associated uncertainties.)

“Any decay rate a>0 completely changes the propagation of error variance from linear growth to convergence to a finite limit.“

(The behavior of a critically irrelevant emulator engenders a deserved, ‘so what?’

Further, a>0 causes general decay only by allowing the mistaken derivation that put the (1-a) coefficient into the Rn(t) factors in 2ʀ)

Section I conclusion: The emulator construction itself is incongruous. It includes an unwarranted persistence. It has terms of convenience that do not map onto the target GCM projection emulator. The -rR₃(t) term cannot behave as described.

The transition from eqn. 2ʀ to eqn. 3ʀ is mathematically incoherent. The derivations following that employ eqn. 3ʀ are therefore wrong, including the variances.

The eqn. 1ʀ emulator itself is ad hoc. Its derivation is without reference to the behavior of climate models and of physical reasoning. Its ability to emulate a GCM air temperature projection is undemonstrated.

II. Problems with “New Parameters” Section C:

Rich rationalized his introduction of so-called decay parameter ‘a’ in the “Parameter” section C of his post. He introduced this equation:

M(t) = b + cF(t) +dH(t-1), 6ʀ

where M = temperature, F = forcing, and H(t) is “heat content“.

The ‘b’ term might be the ‘b’ coefficient assigned to E(R₁(t)] above, but we are not told anything about it.

I’ll summarize the problem. Coefficients ‘c’ and ‘d’ are actually functions that transform forcing and heat flux (not heat content) in Wm⁻², into their respectively caused temperature, Celsius. They are not integers or real numbers.

However, Rich’s derivation treats them as real number coefficients. This is a fatal problem.

For example, in equation 6ʀ above, function ‘d’ transforms heat flux H(t-1) into its consequent temperature, Celsius. However, the final equation of Rich’s algebraic manipulation ends with ‘d’ inappropriately operating on M(0), the initial temperature. Thus, he wrote:

“M(t) = b + cF(t) + d(H(0) + e(M(t-1)-M(0)) = f + cF(t) + (1-a)M(t-1) (7ʀ)

where a = 1-de, f = b+dH(0)-deM(0). (my bold)”

There is no physical justification for a “deM(0)” term; d cannot operate on M(0).

Rich also assigned “a = 1-de,” where ‘e’ is an integer fraction, but again, ‘d’ is an operator function; ‘d’ cannot operate on ‘e’. The final (1-a)M(t-1) term is a cryptic version of deM((t-1), which contains the same fatal assault on physical meaning. Function ‘d’ cannot operate on temperature M.

Further, what is the meaning of an operator function standing alone with nothing on which to operate? How can “1-de” be said to have a discrete value, or even to mean anything at all?

Other conceptual problems are in evidence. We read, “Now by the Stefan-Boltzmann equation M [temperature – P] should be related to F^¼ …” Rather, S-B says that M should be related to H^¼ (H is here taken to be black body radiant flux). According to climate models M is linearly related to F.

We are also told, “Next, the heat changes by an amount dependent on the change in temperature: …” while instead, physics says the opposite: temperature changes by an amount dependent on the change in the heat (kinetic energy). That is, temperature is dependent on atomic/molecular kinetic energy.

Rich finished with, “Roy Spencer, who has serious scientific credentials, had written “CMIP5 models do NOT have significant global energy imbalances causing spurious temperature trends because any model systematic biases in (say) clouds are cancelled out by other model biases”. .”

Roy’s comment was originally part of his attempted disproof of my uncertainty analysis. It completely missed the point, in part because it confused physical error with uncertainty.

Roy’s and Rich’s offsetting errors do nothing to remove uncertainty from the prediction of a physical model.

Rich went on, “This means that in order to maintain approximate Top Of Atmosphere (TOA) radiative balance, some approximate cancellation is forced, which is equivalent to there being an R₃(t) with high anti-correlation to R₂(t). The scientific implications of this are discussed further in Section I.”

The only, repeat only, scientific implication of offsetting errors is that they reveal areas requiring further research, that the theory is inadequate, and that the predictive capacity is poor.

Rich’s approving mention of Roy’s mistake evidences that Rich, too, apparently does not see the distinction between physical error and predictive uncertainty. Tim Gorman especially, and others, have repeatedly pointed out the distinction to Rich, e.g., here, here, here, here, and here, but to no obvious avail.

Conclusions regarding the Parameter section C: analytically impossible, physically disjointed, wrongly supposes offsetting errors increase predictive reliability, wrongly conflates physical error with predictive uncertainty.

And once again, no demonstration that the proposed emulator can emulate anything relevant.

III. Problems with “Emulator Parameters” Section D:

In Section I above, I promised to show that Rich’s emulator would remain irrelevant, even after he added my numbers to it.

In his “Emulator Parameters” section Rich started out with, “Dr. Pat Frank’s emulator falls within the general model above.” This view could not possibly be more wrong.

First, Rich composed his emulator with my GCM air temperature projection emulator in mind. He inverted significance to say the originating formalism falls within the limit of a derivative composition.

Again, the GCM projection emulator is:

(2 again)

Rich’s emulator is W(t) = (1-a)W(t-1) + R₁(t) + R₂(t) + (-r)R₃(t) (1ʀ again)

(In II above, I showed that his alternative, M(t) = f + cF(t) + (1-a)M(t-1), is incoherent and therefore not worth considering further.)

In Rich’s emulator, temperature T₂ has some persistence from T₁. This dependence is nowhere in the GCM projection emulator.

Further, in the GCM emulator (eqn. 2-again), the temperature of time t-1 makes no appearance at all in the emulated air temperature at time t. Rich’s 1ʀ emulator is constitutionally distinct from the GCM projection emulator. Equating them is to make a category mistake.

Analyzing further, emulator R₁(t) is a, “component which represents changes in major causal influences, such as the sun and carbon dioxide,”

Rich’s R₁(t) describes all of, in the GCM projection emulator.

Rich’s R₁(t) thus exhausts the entire GCM projection emulator. What then is the purpose of his R₂(t) and R₃(t)? They have no analogy in the GCM projection emulator. They have no role to transfer into meaning.

The R₂(t) is “a strong contribution with observably high variance, for example the Longwave Cloud Forcing (LCF).” The GCM projection emulator has no such term.

The R₃(t) is, “a putative component which is negatively correlated with R₂(t)…” The GCM projection emulator has no such term. R₃(t) has no role to play in any analytical analogy.

Someone might insist that Rich’s emulator is like the GCM projection emulator after his (1-a)W(t-1), R₂(t), and (-r)R₃(t) terms are thrown out.

So, we’re left with this deep generalization: Rich’s emulator-emulator pared to its analogical essentials is M(tᵢ) = R(tᵢ),

where R(tᵢ) =.

Rich went on to specify the parameters of his emulator: “The constants from [Pat Frank’s] paper, 33K, 0.42, 33.3 Wm^-2, and +/-4 Wm^-2, the latter being from errors in LCF, combine to give 33*0.42/33.3 = 0.416 and 0.416*4 = 1.664 used here.”

Does anyone see a ±4 Wm⁻² in the GCM projection emulator? There is no such term.

Rich has made the same mistake as did Roy Spencer (one of many). He supposed that the uncertainty propagator (the right-side term in paper eqn. 5.2) is the GCM projection emulator.

It isn’t.

Rich then presented the conversion of his general emulator into his view of the GCM projection emulator: “So we can choose a = 0, b = 0, c+d = 0.416 F(t) where F(t) is the new GHG forcing (Wm^-2) in period t, s₁=0, s₂=1.664, s₃=0.“, and then derive

W(t) = (c+d)t + W(0) +/- sqrt(t) s₂” (8ʀ)

There are Rich’s mistakes made explicit: his emulator, eqn. 8ʀ, includes persistence in the W(0) term and a ±sqrt(t) s2 term, neither of which appear anywhere in the GCM projection emulator. How can eqn. 8ʀ possibly be an analogy for eqn. 2?

Further, including “ +/- sqrt(t) s₂“will cause his emulator to produce two values of W(t) at every time-step.

One value W(t) stems from the positive root of sqrt(s2) and the other W(t) from the negative root of sqrt(s2). A plot of the results will show two W(t) trends, one perhaps rising while the other falls.

To see this mistake in action, see the first Figure in Roy Spencer’s critique.

The “+/-” term in Rich’s emulator makes it not an emulator.

However, a ‘±’ term does appear in the error propagator:

±uᵢ(T) = [fCO₂ ´ 33 K ´ (±4 Wm⁻²)/F₀] — see eqns. 5.1 and 5.2 in the paper.

It should now be is obvious that Rich’s emulator is nothing like the GCM projection emulator.

Instead, it represents a category mistake. It is not only wrongly derived, it has no analytical relevance at all. It is conceptually adverse to the GCM projection emulator it was composed to critically appraise.

Rich’s emulator is ad hoc. It was constructed with factors he deemed suitable, but with no empirical reference. Theory without empiricism is philosophy at best; never science.

Rich then added in certain values taken from the GCM projection emulator and proceeded to zero out everything else in his equation. The result does not demonstrate equivalence. It demonstrates tendentiousness: elements manipulated to achieve a predetermined end. This approach is diametrical to actual science.

The rest of the Emulator Parameters section elaborates speculative constructs of supposed variances given Rich’s irrelevant emulator. For example, “Now if we choose b = a(c+d) then that becomes (c+d)(t+1), etc. etc.” This is to choose without any reference to any explicit system or any known physical GCM error. The b = a(c+d) is an ungrounded levitated term. It has no substantive basis.

The rest of the variance speculation is equally irrelevant, and in any case derives from an unreservedly wrong emulator.

Nowhere is its competence demonstrated by, e.g., emulating a GCM air temperature projection.

I will not consider his Section D further, except to note that Rich’s Case 1 and Case 2 clearly imply that he considers the variation of model runs about the model projection mean to be the centrally germane measure of uncertainty.

It is not.

The precision/accuracy distinction was discussed in the introductory comments above. Run variation supplies information only about model precision — run repeatability. The analysis in the paper concerned accuracy.

This distinction is absolutely central, and was of immediate focus.

Introduction paragraph 2:

“Published GCM projections of the GASAT typically present uncertainties as model variability relative to an ensemble mean (Stainforth et al., 2005; Smith et al., 2007; Knutti et al., 2008), or as the outcome of parameter sensitivity tests (Mu et al., 2004; Murphy et al., 2004), or as Taylor diagrams exhibiting the spread of model realizations around observations (Covey et al., 2003; Gleckler et al., 2008; Jiang et al., 2012). The former two are measures of precision, while observation-based errors indicate physical accuracy. Precision is defined as agreement within or between model simulations, while accuracy is agreement between models and external observables (Eisenhart, 1963, 1968; ISO/IEC, 2008). (bold added)

…

“However, projections of future air temperatures are invariably published without including any physically valid error bars to represent uncertainty. Instead, the standard uncertainties derive from variability about a model mean, which is only a measure of precision. Precision alone does not indicate accuracy, nor is it a measure of physical or predictive reliability. (added bold)

“The missing reliability analysis of GCM global air temperature projections is rectified herein.”

It is evidently possible to read the above and fail to grasp it. Rich’s entire approach to error and variance ignores it and thereby is misguided. He has repeatedly confused model precision with predictive accuracy.

That mistake is fatal to critical relevance. It removes any valid application of Rich’s critique to my work or to the GCM projection emulator.

Finally, I will comment on his last paragraph: “Pat Frank’s paper effectively uses a particular W(t;u) (see Equation (8) above) which has fitted m_w(t;u) to m_m(t), but ignores the variance comparison. That is, s₂ in (8) was chosen from an error term from LCF without regard to the actual variance of the black box output M(t).”

The first sentence says that I fitted “m_w(t;u) to m_m(t).” That is, Rich supposed that my analysis consisted of fits to the model mean.

He is wrong. The analysis focused on single projection runs of individual models.

Methodological SI Figure S3-2 shows each fit tested a single temperature projection run of a single target GCM plotted against a standard of GHG forcing (SRES, Meinshausen, or other).

SI Figure S3-2. Left: fit of cccma_cgcm3_1_t63 projected global average temperature plotted vs SRES A2 forcing. Right: emulation of the ccma_cgcm3_1_t63 A2 air temperature projection. Every fit had only one important degree of freedom.

Only Figure 7 showed emulation of a multi-model projection mean. All the 68 rest of them were single model projection runs. All of which Rich apparently missed.

There is no ambiguity in what I did, which is not what Rich supposed I did.

The second sentence, “That is, s₂ in (8) was chosen from an error term from LCF without regard to the actual variance of the black box output M(t).” is also factually wrong. Twice.

First, there is no LCF term in the emulator, nor any standard deviation. The “s2” is a fantasy.

Second, the long wave cloud forcing calibration error in the uncertainty propagator is the annual average error CMIP5 GCMs make in simulating annual global cloud fraction (CF).

That is, LWCF calibration error is exactly the actual [error] variance of the black box output M(t) with respect to observed global cloud fraction.

Rich’s “the actual variance of the black box output M(t).” refers to the variance of individual GCM air temperature projection runs around a projection mean; a precision metric.

The accuracy metric of model variance with respect to observation is evidently lost on Rich. He brought up inattention to bare precision as though it faulted an analysis concerned with accuracy.

This fatal mistake is a commonplace among the critics of my paper.

It shows a foundational inability to effectuate any scientifically valid criticism at all.

The explanation for entering the LWCF error statistic into the uncertainty propagator is given within the paper (p. 10):

“GHG forcing enters into and becomes part of the global tropospheric thermal flux. Therefore, any uncertainty in simulated global tropospheric thermal flux, such as LWCF error, must condition the resolution limit of any simulated thermal effect arising from changes in GHG forcing, including global air temperature. LWCF calibration error can thus be combined with 1Fi in equation 1 to estimate the impact of the uncertainty in tropospheric thermal energy flux on the reliability of projected global air temperatures.”

This explanation seems opaque to many for reasons that remain obscure.

Citations Zhang et al. (2005) and Dolinar et al. (2015) gave similar estimates of LWCF calibration error.

Summary conclusions about the Emulator Parameter Section D:

1) The proposed emulator is ad hoc and tendentious.

2) The proposed emulator is constitutively wrong.

· It wrongly includes persistence.

· It wrongly includes a cloud forcing term (or the like).

· It wrongly includes an uncertainty statistic.

3) Confused the uncertainty propagator with the GCM projection emulator.

4) Mistaken focus of precision in a study about accuracy.

Or, perhaps, ignorance of the concept of physical accuracy, itself.

5) Wrongly imputed that the study focused on GCM projection means.

6) Never once demonstrated that the emulator can actually emulate.

IV. Problems with “Error and Uncertainty” Section E:

Thus far, we’ve found that Rich’s emulator analysis is ad hoc, tendentious, constitutively wrong, dimensionally impossible, mathematically incoherent, confuses precision with accuracy, includes incongruous variances, and is empirically unvalidated. His analysis could almost not be more bolloxed up.

I here step through a few of his Section E mistakes, which always seem to simplify things for him. Quotes are marked “R:” followed by a comment.

R: “Assuming that X is a single fixed value, then prior to measurement, M-X is a random variable representing the error,…”

Except when the error is systematic stemming from uncontrolled variables. In that case M – X is a deterministic variable of no fixed mean, of a non-normal dispersion, and of an unknowable value. See the further analysis in the Appendix.

R: “+/-s_m is described by the JCGM 2.3.1 as the “standard” uncertainty parameter.”

Rich is being a bit fast here. He’s implying the JCGM Section 2.3.1 definition of “standard uncertainty” is limited to the SD of random errors.

The JCGM is the Evaluation of measurement data — Guide to the expression of uncertainty in measurement — the standard guide to the statistical analysis of measurements and their errors provided by the Bureau International des Poids et Mesures.

The JCGM actually says that the standard uncertainty is “uncertainty of the result of a measurement expressed as a standard deviation,” which is rather more general than Rich allowed.

The quotes below show that the JCGM includes systematic error as contributing to uncertainty.

Under E.3 “Justification for treating all uncertainty components identically” the JCGM says,

The focus of the discussion of this subclause is a simple example that illustrates how this Guide treats uncertainty components arising from random effects and from corrections for systematic effects in exactly the same way in the evaluation of the uncertainty of the result of a measurement. It thus exemplifies the viewpoint adopted in this Guide and cited in E.1.1, namely, that all components of uncertainty are of the same nature and are to be treated identically. (my bold)

Under JCGM E. 3.1 and E 5.2, we have that the variance of a measurement wᵢ of true value μᵢ is given by σᵢ² =E[(wᵢ – μᵢ)²], which is the standard expression for error variance.

After the usual caveats about [the] expectation of the probability distribution of each εi is assumed to be zero, E(εi) = 0, …, the JCGM notes that,

It is assumed that probability is viewed as a measure of the degree of belief that an event will occur, implying that a systematic error may be treated in the same way as a random error and that εᵢ represents either kind.(my bold).

In other words, the JCGM advises that systematic error is to be treated using the same statistical formalism as is used for random error.

R: “The real error statistic of interest is E[(M-X)²] = E[((M-m_m)+(m_m-X))²] = Var[M] + b², covering both a precision component and an accuracy component.”

Rich then referenced that equation to my paper and to long wave cloud forcing (LWCF;

Rich’s LCF) error. However, this is a fundamental mistake.

In Rich’s equation above, the bias, b = M-mm = a constant. Among GCMs however, step-wise cloud bias error varies across the global grid-points for each GCM simulation. And it also varies among GCMs themselves. See paper Figure 4 and SI Figure S6-1.

The factor (M-mm) = b, above, should therefore be (Mᵢ-mm) = bᵢ because b varies in a deterministic but unknown way with every Mᵢ.

A correct analysis of the case is:

E[(Mᵢ-X)²] = E[((Mᵢ-m_m)+(m_m-X))²] = Var[M] + Var[b]

Systematic error is discussed in more detail in the Appendix.

Rich goes on, “But the theory of converting variances and covariances of input parameter errors into output error via differentiation is well established, and is given in Equation (13) of the JCGM.”

Equation (13) of the JCGM provides the formula for the error variance in y, u²c(y), but describes it this way:

The combined variance, u²c(y), can therefore be viewed as a sum of terms, each of which represents the estimated variance associated with the output estimate y generated by the estimated variance associated with each input estimate xᵢ. (my bold)

That is, the combined variance, u²c(y), is the variance that results from considering all forms of error; not just random error.

Under JCGM 3.3.6:

The standard uncertainty of the result of a measurement, when that result is obtained from the values of a number of other quantities, is termed combined standard uncertainty and denoted by uc. It is the estimated standard deviation associated with the result and is equal to the positive square root of the combined variance obtained from all variance and covariance (C.3.4) components, however evaluated, using what is termed in this Guide the law of propagation of uncertainty (see Clause 5). (my bold)

Under JCGM E 4.4 EXAMPLE:

The systematic effect due to not being able to treat these terms exactly leads to an unknown fixed offset that cannot be experimentally sampled by repetitions of the procedure. Thus, the uncertainty associated with the effect cannot be evaluated and included in the uncertainty of the final measurement result if a frequency-based interpretation of probability is strictly followed. However, interpreting probability on the basis of degree of belief allows the uncertainty characterizing the [systematic] effect to be evaluated from an a priori probability distribution (derived from the available knowledge concerning the inexactly known terms) and to be included in the calculation of the combined standard uncertainty of the measurement result like any other uncertainty. (my bold)

JCGM says that the combined variance, u²c(y), includes systematic error.

The systematic error stemming from uncontrolled variables becomes a variable component of the output; a component that may change unknowably with every measurement. Systematic error then necessarily has an unknown and almost certainly non-normal dispersion (see the Appendix).

The JCGM further stipulates that systematic error is to be treated using the same mathematical formalism as random error.

Above we saw that uncontrolled deterministic variables produce a dispersion of systematic error biases in an extended series of measurements and in GCM simulations of global cloud fraction.

That is, the systematic error is a “fixed offset” = bᵢ only in the Mᵢ time-step. But the bᵢ vary in some unknown way across the n-fold series of Mᵢ.

In light of the JCGM discussion, the dispersion of systematic error, bᵢ, requires that any complete error variance include Var[b].

The dispersion of the bᵢ can be determined only by way of a calibration experiment against a known X carried out under conditions as identical as possible to the experiment.

The empirical methodological calibration error, Var[b] of X, is then applied to condition the result of every experimental determination or observation of an unknown X; i.e., it enters the reliability statement of the result.

In Example 1, the 1-foot ruler, Rich immediately assumed away the problem. Thus, “the manufacturer assures us that any error in that interval is equally likely[, but I will] write 12+/-_0.1 …, where the _ denotes a uniform probability distribution, instead of a single standard deviation for +/-.”

That is, rather than accept the manufacturer’s stipulation that all deviations are equally likely, Rich converted the uncertainty into a random dispersion, in which all deviations are no longer equally likely. He has assumed knowledge where there is none.

He wrote, “If I have only 1 ruler, it is hard to see how I can do better than get a table which is 120+/-_1.0”.” But that is wrong.

The unknown error in any ruler is a rectangular distribution of -0.1 to +0.1″, with all possibilities equally likely. Ten measurements with a ruler of unknown specific error can be anywhere from 1″ to -1″ in error. The expectation interval is (1-(-1)”/2 =1″. The standard uncertainty is then 1″/sqrt(3) = ±0.58″, thus 120±0.58″.

He then wrote that if one instead made ten measurements using ten independently machined rulers then the uncertainty of measurement = “sqrt(10) times the uncertainty of each.” But again, that is wrong.

The original stipulation is equal likelihood across ±0.1″ of error for every ruler. For ten independently machined rulers, every ruler has a length deviation equally likely to be anywhere within -0.1″ to 0.1″. That means the true total error using 10 independent rulers can again be anywhere from 1″ to -1″.

The expectation interval is again (1-(-1)”/2 = 1″, and the standard uncertainty after using ten rulers is 1″/sqrt(3) = ±0.58″. There is no advantage, and no loss of uncertainty at all, in using ten independent rulers rather than one. This is the outcome when knowledge is lacking, and one has only a rectangular uncertainty estimate — a not uncommon circumstance in the physical sciences.

Rich’s mistake is founded in his immediate recourse to pseudo-knowledge.

R; “We know by symmetry that the shortest plus longest [of a group of ten rulers] has a mean error of 0…” But we do not know that because every ruler is independently machined. Every length error is equally likely. There is no reason to assume a normal distribution of lengths, no matter how many rulers one has. The shortest may be -0.02″ too short, and the longest 0.08″ too long. Then ten measurements produce a net error of 0.3″. How would anyone know? One has no way of knowing the true error in the physical length of a shortest and a longest ruler.

The length uncertainty of any one ruler is [(0.1-(-0.1)/2]/sqrt(3) = ±0.058″. The only reasonable stipulation one might make is that the shortest ruler is (0.05±0.058)” too short and the longest (0.05±0.058)” too long. Then 5 measurements using each ruler yields a measurement with uncertainty of ±0.18″.

Complex variance estimates notwithstanding, Rich assumed away all the difficulty in the problem, wished his way back to random error, and enjoyed a happy dance.

Conclusion: Rich’s Section E is wrong, wherever it isn’t irrelevant.

1. He assumed random error when he should have considered deterministic error.

2. He badly misconstrued the message of JCGM concerning systematic error, and the meaning of its equation (13).

3. He ignored the centrally necessary condition of uncontrolled variables, and the consequent unknowable variation of systematic error across the data.

4. He wrongly treated systematic error as a constant offset.

5. His treatment of rectangular uncertainty is wrong.

6. He then wished rectangular uncertainty into a random distribution.

7. He treated assumed distributions as though they were known distributions — OK in a paper on statistical conjectures, a failing grade in an undergraduate instrumental lab course, and death in a real-world lab.

V. Problems with Section F:

The first part concerning comparative uncertainty is speculative statistics and so is here ignored.

Problems with Rich’s Marked Ruler Example 2:

The discussion neglected the resolution of the ruler itself, typically 1/4 of the smallest division.

It also ignored the question of whether the lined division marks are uniformly and accurately spaced — another part of the resolution problem. This latter problem can be reduced with recourse to a high-precision ruler that includes a manufacturer’s resolution statement provided by the in-house engineers.

It ignored that the smallest divisions on a to-be-visually-appraised precision instrument are typically manufactured in light of the human ability to resolve the spaces.

To achieve real accuracy with a ruler, one would have to calibrate it at several internal intervals using a set of high-accuracy length standards. Good luck with that.

VI. Problems with Rich’s thermometer Example 3 Section G:

Rich brought up what was apparently my discussion of thermometer metrology, made in an earlier comment on another WUWT essay.

He mentioned some of the elements I listed as going into uncertainty in the read-off temperature including, “the thermometer capillary is not of uniform width, the inner surface of the glass is not perfectly smooth and uniform, the liquid inside is not of constant purity, the entire thermometer body is not at constant temperature. He did not include the fact that during calibration human error in reading the instrument may have been introduced.”

I no longer know where I made those comments (I searched but didn’t find them) and Rich provided no link. However, I would never have intended that list to be exhaustive. Anyone wondering about thermometer accuracy can do a lot worse than to read Anthony Watts’ post about thermometer metrology.

Among impacts on accuracy, Anthony mentioned hardening and shrinking of the glass in LiG thermometers over time. After 10 years, he said, the reading might be 0.7 C high. A process of slow hardening would impose a false warming trend over the entire decade. Anthony also mentioned that historical LiG meteorology thermometers were often graduated in 2 ⁰F increments, yielding a resolution of ±0.5 ⁰F = ±0.3 ⁰C.

Rich mentioned none of that, in correcting my apparently incomplete list.

Here’s an example of a 19th century min-max thermometer with 2 ⁰F divisions.

Louis Cassella-type 19th century min-max thermometer with 2 ⁰F divisions.

Image from the Yale Peabody Museum.

High-precision Louis Castella thermometers included 1 ⁰F divisions.

Rich continued: “The interesting question arises as to what the (hypothetical) manufacturers meant when they said the resolution was +/-0.25K. Did they actually mean a 1-sigma, or perhaps a 2-sigma, interval? For deciding how to read, record, and use the data from the instrument, that information is rather vital.”

Just so everyone knows what Rich is talking about, pictured below are a couple of historical LiG meteorological thermometers.

Left: The 19th century Negretti and Zambara minimum thermometer from the Welland weather station in Ontario, Canada, mounted in the original Stevenson Screen. Right: A C.W Dixey 19th century Max-Min thermometer (London, after ca. 1870). Insets are close-ups.

The finest lineations in the pictured thermometers are 1 ⁰F and are perhaps 1 mm apart. The Welland instrument served about 1892 – 1957.

The resolution of these thermometers is ±0.25 ⁰F, meaning that smaller values to the right of the decimal are physically dubious. The 1880-82 observer at Welland, Mr. William B. Raymond, age about 20 years, apparently recorded temperatures to ±0.1 ⁰F, a fine example of false precision.

In asking, “Did [the manufacturers] actually mean a 1-sigma, or perhaps a 2-sigma, interval?“, Rich is posing the wrong question. Resolution is not about error. It does not imply a statistical variable. It is a physical limit of the instrument, below which no reliable data are obtainable.

The modern Novalynx 210-4420 Series max-min thermometer below is, “made to U.S. National Weather Service specifications.”

The specification sheet (pdf) of provides an accuracy of ±0.2 ⁰C “above 0 ⁰C.” That’s a resolution-limit number, not a 1σ number or a 2σ number.

A ±0.2 ⁰C resolution limit means the thermometers are not able to reliably distinguish between external temperatures differing by 0.2 ⁰C or less. It means any finer reading is physically suspect.

The Novalynx thermometers record 95 degrees across 8 inches, so that each degree traverses 0.084″ (2.1 mm). Reading a temperature to ±0.2 ⁰C requires the visual acuity to discriminate among five 0.017″ = 0.43 mm unmarked widths within each degree interval.

Historical thermometers were no better.

This leads to the question: — even though the thermometer is accurate to ±0.2 ⁰C, is it still reasonable to propose, as Rich did, that an observer should be able to regularly discriminate individual ±0.1 ⁰C intervals within merging 0.22 mm blank widths? Hint: hardly.

Rich’s entire discussion is unrealistic, showing no sensitivity to the meaning of resolution limits, of accuracy, of the graduation of thermometers, or of limited observer acuity.

He wrote, “In the present [weather thermometer] example, I would recommend trying for t² = 1/100, or as near as can be achieved within reason.” Rich’s t² is the variance of observer error, meaning he recommends reading to ±0.1 ⁰C in thermometers that are not accurate to better than ±0.2 ⁰C.

Rich finished by advising the manufacture of false data: “if the observer has the skill and time and inclination then she can reduce overall uncertainty by reading to a greater precision than the reference value. (my bold)”

Rich recommended false precision; a mistake undergraduate science and engineering students have flogged out of them from the very first day. But one that typifies consensus climatology.

His conclusion that, “Again, real life examples suggest the compounding of errors, leading to approximately normal distributions.” is entirely unfounded, based as it fully is on unrealistic statistical speculations. Rich considered no real-life examples at all.

The moral of Rich’s section G is that it’s not prudent to give advice concerning methods about which one has no experience.

The whole thermometer section G is misguided and is yet another example, after the several prior, of an apparently very poor grasp of physical accuracy, of its meaning, and of its fundamental importance to all of science.

VII. Problems with “The implications for Pat Frank’s paper” Section H:

Rich began his Section H with a set of declarations about the implications of his various

Sections now known to be over-wrought or plain wrong. Stepping through:

Section B: Rich’s emulator is constitutively inapt. The derivation is both wrong and incoherent. Tendentiously superfluous terms promote a predetermined end. The analysis is dimensionally unsound and deploys unjustified assumptions. No empirical validation of claimed emulator competence.

Section C: incorrectly proposes that offsetting calibration errors promote predictive reliability. It includes an inverted Stefan-Boltzmann equation and improperly treats operators as coefficients. As in other sections, Section B evinces no understanding of accuracy.

Section D: displays confusion about precision and accuracy throughout. The GCM emulator (paper eqn. 1) is confused with the error propagator (paper eqn. 5.2) which is fatal to Section D. No empirical validation of claimed emulator competence. Fatally misconstrues the analytical focus of my paper to be GCM projection means.

Section E: again, falsely asserted that all measurement or model error is random and that systematic error is a fixed constant bias offset. It makes empirically unjustified and ad hoc assumptions about error normality. It self-advantageously misconstrued the JCGM description of standard uncertainty variance.

Section F: has unrealistic prescriptions about the use of rulers.

Section G: displays no understanding of actual thermometers and advises observers to record temperatures to false precision.

Rich wrote that, “The implication of Section C is that many emulators of GCM outputs are possible, and just because a particular one seems to fit mean values quite well does not mean that the nature of its error propagation is correct.”

There we see again Rich’s fatal mistake that the paper is critically focused on mean values. He also wrote there are many possible GCM emulators without ever demonstrating that his proposed emulator can actually emulate anything.

And again here, “Frank’s emulator does visibly give a decent fit to the annual means of its target,…”

However, the analysis did not fit annual means. It fit the relationship between forcing and projected air temperature.

The emulator itself reproduced the GCM air temperature projections. It did not fit them. Contra Rich, that performance is indeed, “sufficient evidence to assert that it is a good emulator.”

And in further fact, the emulator tested itself against dozens of individual GCM single air temperature projections, not projection means. SI Figures S4-6, S4-8 and S4-9 show the decent fit residuals remain close to zero.

The tests showed beyond doubt that every tested GCM behaved as a linear extrapolator of GHG forcing. That invariable linearity of output behavior entirely justifies linear propagation of error.

Throughout, Rich’s analysis displays a thorough and comprehensively mistaken view of the paper’s GCM analysis.

The comments that finish his analysis demonstrate that case.

For example: “The only way to arbitrate between emulators would be to carry out Monte Carlo experiments with the black boxes and the emulators.” recommends an analysis of precision, with no notice of the need for accuracy.

Repeatability over reliability.

If ever there was a demonstration that Rich’s approach fatally neglects science, that is it.

This next paragraph really nails Rich’s mistaken thinking: “Frank’s paper claims that GCM projections to 2100 have an uncertainty of +/- at least 15K. Because, via Section D, uncertainty really means a measure of dispersion, this means that Equation (1) with the equivalent of Frank’s parameters, using many examples of 80-year runs, would show an envelope where a good proportion would reach +15K or more, and a good proportion would reach -15K or less, and a good proportion would not reach those bounds.”

First it was his Section E, not Section D, that supposed uncertainty is the dispersion of a random variable.

Second, Section IV above showed that Rich had misconstrued the JCGM discussion of uncertainty. Uncertainty is not error. Uncertainty is the interval within which the true value should occur.

Section D 6.1 of the JCGM establishes the distinction:

[T]he focus of this Guide is uncertainty and not error.

And continuing:

The exact error of a result of a measurement is, in general, unknown and unknowable. All one can do is estimate the values of input quantities, including corrections for recognized systematic effects, together with their standard uncertainties (estimated standard deviations), either from unknown probability distributions that are sampled by means of repeated observations, or from subjective or a priori distributions based on the pool of available information; ...

Unknown probability distributions sampled by means of repeated observations describes a calibration experiment and its result. Included among these is the comparison of a GCM hindcast simulation of global cloud fraction with the known observed cloud fraction.

Next, the ±15 C uncertainty does not mean some projections would reach “+15K or more” or “-15K or less.” Uncertainty is not error. The JCGM is clear on this point, as is the literature. Uncertainty intervals are not error magnitudes. Nor do they imply the range of model outputs.

The ±15 C GCM projection uncertainty is an ignorance width. It means that one has no information at all about the possible air temperature in year 2100.

Supposing that uncertainty propagated through a serial calculation directly implies a range of possible physical magnitudes is merely to reveal an utter ignorance of physical uncertainty analysis.

Rich’s mistake that an uncertainty statistic is a physical magnitude is also commonplace among climate modelers.

Among Rich’s Section H summary conclusion, the first is wrong, while the second and third are trivial.

The first is, “Frank’s emulator is not good in regard to matching GCM output error distributions.” There are two mistakes in that one sentence.

The first is that the GCM air projection emulator can indeed reproduce all the single air temperature projection runs of any given GCM. Rich’s second mistake is to suppose that GCM individual run variation about a mean indicates error.

Regarding Rich’s first mistake, the Figure below is taken from Rowlands, et al., (2012). It shows thousands of individual HadCM3L “perturbed physics” runs. Perturbed physics means the parameter sets are varied across their uncertainty widths. This produces a whole series of alternative projected future temperature states.

Original Figure Legend: “Evolution of uncertainties in reconstructed global-mean temperature projections under SRES A1B in the HadCM3L ensemble.”

This “perturbed physics ensemble” is described as “a multi-thousand-member ensemble of transient AOGCM simulations from 1920 to 2080 using HadCM3L,…”

Given knowledge of the forcings, the GCM air temperature projection emulator could reproduce every single one of those multi-thousand ensembled HadCM3L air temperature projections. As the projections are anomalies, emulator coefficient a = 0. The emulations would proceed by varying only the fCO₂ term. That is, the HadCM3L projections could be reproduced using the emulator with only one degree of freedom (see paper Figures 1 and 9).

So much for, “not good in regard to matching GCM output [so-called] error distributions.”

Second, the variance of the spread around the ensemble mean is not error, because the accuracy of the model projections remains unknown.

Studies of model spread, such as that of Rowlands, et al., (2012) reveal nothing about error. The dispersion of outputs reveals nothing but precision.

In calling that spread “error,” Rich merely transmitted his lack of attention to the distinction between accuracy and precision.

In light of the paper, every single one of the HadCM3L centennial projections is subject to the very large lower limit of uncertainty due to LWCF error, of the order ±15 C, at year 2080.

The uncertainty in the ensemble mean is the rms of the uncertainties of the individual runs. That’s not error, either. Or a suggestion of model air temperature extremes. It’s the uncertainty interval that reflects the total unreliability of the GCMs and our total ignorance about future air temperatures.

Rich wrote, “The “systematic squashing” of the +/-4 W/m^2 annual error in LCF inside the GCMs is an issue of which I for one was unaware before Pat Frank’s paper.

The implication of comments by Roy Spencer is that there really is something like a “magic” component R₃(t) anti-correlated with R₂(t), … GCM experts would be able to confirm or deny that possibility.”

Another mistake: the ±4 Wm⁻² is not error. It is uncertainty: a statistic. The uncertainty is not squashed. It is ignored. The unreliability of the GCM projection remains no matter that errors are made to cancel in the calibration period.

GCMs do deploy offsetting errors, but studied model tuning has no impact on simulation uncertainty. Offset errors do not improve the underlying physical description.

Typically, error (not uncertainty) in long wave cloud forcing is offset by an opposing error in short wave cloud forcing. Tuning allows the calibration target to be reproduced, but it provides no reassurance about predictive reliability or accuracy.

General conclusions:

The entire analysis has no critical force.

The proposed emulator is constitutively inapt and tendentious.

Its derivation is mathematically incoherent.

The derivation is dimensionally unsound, abuses operator mathematics, and deploys unjustified assumptions.

Offsetting calibration errors are incorrectly and invariably claimed to promote predictive reliability.

The Stefan-Boltzmann equation is inverted

Operators are improperly treated as coefficients.

Accuracy is repeatedly abused and ejected in favor of precision.

The GCM emulator (paper eqn. 1) is fatally confused with the error propagator (paper eqn. 5.2)

The analytical focus of the paper is fatally misconstrued to be model means.

The difficulties of measurement error or model error is assumed away, by falsely and invariably asserting all error to be random.

Uncertainty statistics are wrongly and invariably asserted to be physical error.

Systematic error is falsely asserted to be a fixed constant bias offset.

Uncertainty in temperature is falsely construed to be an actual physical temperature.

Ad hoc assumptions about error normality are empirically unjustified.

The JCGM description of standard uncertainty variance is self-advantageously misconstrued.

The described use of rulers or thermometers is unrealistic.

Readers are advised to read and record false precision.

Appendix: A discussion of Error Analysis, including the Systematic Variety

Rich also posted a comment under his “What do you mean by “mean” critique here attempting to show that systematic error cannot be included in an uncertainty variance.

Comments closed on the thread before I was able to finish a critical reply. The subject is important, so the reply is posted here as an Appendix.

In his comment, Rich assumed uncertainty to be the dispersion of a random variable with mean b and variance s². He concluded by claiming that an uncertainty variance cannot include bias errors.

Bias errors are another name for systematic errors, which Rich represented as a non-zero mean of error, ‘b.’

Below, I go through a number of relevant cases. They show that the mean of error, ‘b’, never appears in the formula for an error variance. They also show that the systematic errors from uncontrolled variables must be included in an uncertainty variance.

That is, the foundation of Rich’s derivation, which is:

“the uncertainty of a sum of n independent measurements with respective [error] means bᵢ and variances vᵢ is that given by JCGM 5.1.2 with unit differential: sqrt(sumᵢ g(vᵢ,bᵢ)²) where v = sumᵢ vᵢ, b = sumᵢ bᵢ.”

is wrong.

Given that mistake, the rest of Rich’s analysis there also fails, as demonstrated in the cases that follow.

Interestingly, the mean of error, ‘b,’ does not enter in the variance equation (10) in JCGM 5.1.2, either.

++++++++++++

For any set of n measurements xᵢ of X, xᵢ = X + eᵢ, where eᵢ is the total error in the xᵢ.

Total error eᵢ = rᵢ + dᵢ where rᵢ = random error and dᵢ = systematic error.

The errors eᵢ cannot be known unless the correct value of X is known.

In what follows “sumᵢ” means sum over the series of i where i = 1 ® n, and Var[x] is the error variance of x.

Case 1: X is known.

1.1) When X is known, and only random error is present.

The experiment is analogous to an ideal calibration of method.

Then eᵢ = xᵢ – X, and Var[x] = [sumᵢ(xᵢ – X)²]/n = [sumᵢ(eᵢ)²]/n. In this case eᵢ = rᵢ only, because systematic error = dᵢ = 0.

Then [sumᵢ (eᵢ)²]/n = [sumᵢ(rᵢ)²]/n.

For n measurements of xᵢ the mean of error = b = sumᵢ(eᵢ)/n.

When only random error contributes, the mean of error b tends to zero at large n.

So, Var[x] = sumᵢ[(xᵢ – X)²]/n = sumᵢ[(X + rᵢ) -X)²]/n = sumᵢ[(rᵢ)²]/n

and the standard deviation describes a normal dispersion centered around 0.

Thus, when error is a random variable, the mean of error ‘b’ does not appear in the variance.

In case 1.1, Rich’s uncertainty, sqrt[sumᵢ g(vᵢ,bᵢ)²] is not correct and in any event should have been written sqrt[sumᵢ g(sᵢ,bᵢ)²].

1.2) X is known, and both random error and constant systematic error are present.

When the dᵢ are present and constant, then dᵢ = d for all i.

The mean of error = ‘b,’ = sumᵢ [(xᵢ – X)]/n = sumᵢ [(X + rᵢ + d) – X]/n = sumᵢ [(rᵢ+d)]/n = nd/n + sumᵢ [(rᵢ)/n], which goes to ‘d’ at large n.

Thus, in 1.2, b = d.

And: Var[x] = sumᵢ[(xᵢ – X)²]/n = sumᵢ{[(X + rᵢ + d) – X]²}/n = sumᵢ [(rᵢ+d)²]/n, which produces a dispersion around ‘d.’

Thus, because X is known and ‘d’ is constant, ‘d’ can be found exactly and subtracted away.

The mean of the final error, ‘b’ never enters the variance.

That is, when b => d = a real number constant that can be known and can be corrected out of subsequent measurements of samples.

This last remains true in other laboratory samples where the X is unknown, because the method has been calibrated against a similar sample of known X and the methodological ‘d’ has been determined. That ‘d’ is always constant is an assumption, i.e., that experimenter error is absent and the methodology is identical.

Case 2: X is UNknown, and both random error and systematic error are present

Then the mean of xᵢ = [sumᵢ (xᵢ)/n] = x_bar.

As before, let xᵢ = X + eᵢ = X + rᵢ + dᵢ.

Var[x] = sumᵢ[(xᵢ – x_bar)²]/(n-1), and the SD describes a dispersion around x_bar.

2.1) Systematic error = 0.

If dᵢ = 0, then eᵢ = rᵢ is random, and x_bar becomes a good measure of X at large n.

Var[x] = sumᵢ[(xᵢ – x_bar)²]/(n-1) = sumᵢ[(X + rᵢ) – (X+r_r)]²/(n-1) = sumᵢ[(rᵢ – r_r)]²/(n-1), where r_r is the residual of error in x_bar over interval ‘n’.

As above, the mean of error ‘b’ = sumᵢ[(xᵢ – x_bar)]/n, = sumᵢ[(X+rᵢ) – (X+r_r)]/n = sumᵢ[(rᵢ – r_r)]/n = [n(r_r)/n + sumᵢ(rᵢ)/n], and b = r_bar + r_r, where r_bar is the average of error over the ‘n’ interval.

Then b is a real number, which again does not enter the uncertainty variance and which approaches zero at large n

2.2) If dᵢ is constant = c

Then xᵢ = X + rᵢ + c.

The error mean = ‘b’ = sumᵢ[(xᵢ – x_bar)]/n = sumᵢ{[(X + rᵢ + c) – (X + c + r_r)]}/n, and b = sumᵢ[(rᵢ -r_r)/]n = sumᵢ(rᵢ)/n – n(r_r)/n, and b = (r_bar – r_r), wherein the signs of r_bar and r_r are unspecified.

The Var[x] = sumᵢ[(xᵢ – x_bar)²]/(n-1) = sumᵢ [(X+rᵢ+c) – (X+r_r+c)]²/(n-1) = sumᵢ[(rᵢ – r_r)²]/(n-1).

The variance describes a dispersion around r_r.

The mean error, ‘b’ does not enter the variance.

Case 3; X is UNknown and systematic error, dᵢ, varies due to uncontrolled variables.

Uncontrolled variables mean that every measurement (or every model run) is impacted by inconstant deterministic perturbations, i.e., inconstant causal influences. These modify the value of each result with unknown biases that vary with each measurement (or model run).

Any measurement, xᵢ = X +rᵢ + dᵢ, and dᵢ is a deterministic, non-random variable, and usually non-zero.

Over two measurement sequences of number n and m, the mean error = b = sumᵢ(rᵢ + dᵢ)/n, = bn, and bm = sumj(rj + dj)/m and bn ≠ bm, even if interval n equals interval m.

Var[x]n = sumᵢ[(xᵢ – x_bar-n)²]/(n-1), where x_bar-n is x-bar over sequence n.

Var[x]m = sumj[(xj – x_bar-m)²/](m-1)

and Var[x]n = sumᵢ[(xᵢ – x_bar-n)²]/(n-1) = sumᵢ [(X+rᵢ+dᵢ)-(X+r_r+d_bar-n)]²/(n-1) = sumᵢ[(rᵢ – r_r + dᵢ – d_bar-n)²]/(n-1) = sumᵢ[rᵢ -(d_bar-n + r_r – dᵢ)]²]/(n-1).

Likewise, Var[x]m = sumj[(rj – (d_bar-m + r_r – dj)]²]/(m-1).

Thus, neither bn nor bm enter into either Var[x], contradicting Rich’s assumption.

The dᵢ, dj enter into the total uncertainty of the x_bar-n, x_bar-m. Further, the variation of dᵢ, dj with each i, j means that the dispersion of Var[x]n,m will include the dispersion of the dᵢ, dj. The deterministic cause of dᵢ, dj will very likely make their distribution non-normal.

That is, when systematic error is inconstant due to uncontrolled variables, dᵢ will vary with each i, and will produce a dispersion represented by the standard deviation of the dᵢ.

This negates the claim that systematic error cannot contribute an uncertainty interval.

Also, x_bar-n ≠ x_bar-m, and [dᵢ – (d_bar – n] ≠ [dj – (d_bar-m].

Therefore Var[x]n ≠ Var[x]m, even at large n, m and including when n = m over well-separated periods.

Case 4: X is known, and dᵢ varies due to uncontrolled variables.

This is a case of calibration against a known X when uncontrolled variables are present, and mirrors the calibration of GCM-simulated global cloud fraction against observed global cloud fraction.

4.1) A series of n measurements.

Here, eᵢ = xᵢ – X, and Var[x] = sumᵢ[(xᵢ – X)²]/n = sumᵢ(eᵢ)²/n = sumᵢ[(rᵢ + dᵢ)²]/n = [u(x)²].

As eᵢ = rᵢ + dᵢ, then Var[x] = sumᵢ(rᵢ + dᵢ)²]/n, but the values of each rᵢ and dᵢ are unknown.

The denominator is ‘n’ rather than (n+1) because X is known and degrees of freedom are not lost to a mean in calculating the standard variance.

For n measurements of xᵢ the mean of error = b = sumᵢ(eᵢ)/n = sumᵢ(rᵢ + dᵢ) = variable depending on ‘n,’ because dᵢ varies in an unknown but deterministic way across n.

However, X is known, therefore (xᵢ – X) = eᵢ is known to be the true and complete error in the i-th measurement.

At large n, the sumᵢ(rᵢ) becomes negligible. and Var[x] = sumᵢ[(eᵢ)²/n] = sumᵢ[(dᵢ)²/n] = [u(x)²] at the limit, which is very likely a non-normal dispersion.

The systematic error produces a dispersion because the dᵢ vary. At large n, the uncertainty reduces to the interval due to systematic error.

Th mean of error, ‘b,’ does not enter the variance.

The claim that systematic error cannot produce an uncertainty interval is again negated.

5) X is UNknown and dᵢ varies due to uncontrolled variables. The experimental sample is physically similar to the calibration sample in 4.

5.1) Let xᵢ’ be the i-th of n measurements of experimental sample 5.

The estimated mean of error = b = sumᵢ(x’ᵢ – x’_bar)/n

When x’ᵢ is measured, and X’ is unknown, Var[x’] = sumᵢ[(x’ᵢ – x’_bar)²]/(n-1).

= sumᵢ[(X’+r’ᵢ+d’ᵢ) – (X’ + d’_bar + r’-r)]²/(n-1).

and Var[x’] = sumᵢ[r’ᵢ + (d’ᵢ – d’_bar – r’_r))²]/(n-1).

Again, the mean of error b’ does not enter into the empirical variance.

And again, the dispersion of the implicit dᵢ contributes to the total uncertainty interval.

The empirical error mean ‘b’ is not an accuracy metric because the true value of X is not known.

The empirical dispersion, Var[x’], is an uncertainty interval about x’_bar within which the true value of X is reckoned to lay. The [Var[x’] does not describe an error interval, because the true error is unknown.

In the event of an available calibration, the uncertainty variance of the mean of the xᵢ’ can be assigned as the methodological calibration variance [(u(x)]² as in 4.1, if the multiple of measurements is close to the conditions of the calibration experiment.

The methodological uncertainty then describes an interval within which the true value of X is expected to lay. The uncertainty interval is not a dispersion of physical error.

Modesty about uncertainty is deeply recommended in science and engineering. So, if measurement n₅ < calibration n₄, we choose the conservative empirical uncertainty, [u(x’)²] = Var[x’] = sumᵢ[(x’ᵢ – x’_bar)²]/(n-1).

The estimated mean of error, b’, does not enter the variance.

The presence of the d’ᵢ in the variance again ensures that the uncertainty interval includes a contribution from the interval due to variable systematic error.

5.2) A single measurement of x’. The experimental sample is again similar to 4.

One measurement does not have a mean, so (x’_i – x’_bar) is undefined.

However, from 4.1, we know the methodological [u(x)²] from the calibration experiment using a sample of known X.

Then, for a single measurement of x’ in an unknown but analogous sample, we can indicate the reliability of x’ by appending the standard deviation of the known calibration variance above, sqrt(u(x)²) = ±u(x).

Thus, the measurement of x’ is conveyed as x’±u(x), and ±u(x) is assigned to any given single measurement of x’.

Single measurements do not have an error mean, ‘b.’ which in any case cannot appear in the error statement of x’.

However, the introduced calibration variance includes the uncertainty interval due to systematic error.

The uncertainty interval again does not represent the spread of error in the measurement (or model output). It represents the interval within which the physically true value is expected to lay.

Conclusions:

In none of these standard cases, does the error mean, ‘b,’ enter the error variance.

A constant systematic error does not produce a dispersion.

When variable systematic error is present and the X is known, the uncertainty variance of the measured ‘x’ represents a calibration error statistic.

When variable systematic error is present and X is unknown, the true error variance in measured x is also unknown, but is very likely a non-normal uncertainty interval that is not centered on the physically true X.

That uncertainty interval can well be dominated by the dispersion of the unknown systematic error. A calibration uncertainty statistic, if available, can then be applied to condition the measurements of x (or model predictions of x).

Rich’s analysis failed throughout.

This Appendix finishes with a very relevant quote from Vasquez and Whiting (2006):

“[E]ven though the concept of systematic error is clear, there is a surprising paucity of methodologies to deal with the propagation analysis of systematic errors. The effect of the latter can be more significant than usually expected. … Evidence and mathematical treatment of random errors have been extensively discussed in the technical literature. On the other hand, evidence and mathematical analysis of systematic errors are much less common in literature.”

My experience with the statistical literature has been the almost complete neglect of systematic error as well. Whenever it is mentioned, systematic error is described as a constant bias, and little further is said of it. The focus is on random error.

One exception is in Rukhin (2009), who says,

“Of course if [the expectation value of the systematic error is not equal to zero], then all weighted means statistics become biased, and [the mean] itself cannot be estimated. Thus, we assume that all recognized systematic errors (biases) have been corrected for …”

and then off he goes into safer ground.

Vasquez and Whiting go on: “When several sources of systematic errors are identified, ‘β’ is suggested to be calculated as a mean of bias limits or additive correction factors as follows:

β = sqrt{sumᵢ[u(x)ᵢ]²}

where i defines the sources of bias errors, and [dᵢ] is the bias range within the error source i . Similarly, the same approach is used to define a total random error based on individual standard deviation estimates,

ek = sqrt{sumᵢ[σ(x)ᵢ]²}”

That is, Vasquez and Whiting advise estimating the variance of non-normal systematic error using exactly the same mathematics as is used for random error.

They go on to advise combining both into a statement of total uncertainty as,

u(x)total = sqrt[β² +(ek)²].

The Vasquez and Whiting paper completely justifies the method of treating systematic error employed in “Propagation…”

++++++++++++

References:

V. R. Vasquez and W.B. Whiting (2006) Accounting for Both Random Errors and Systematic Errors in Uncertainty Propagation Analysis of Computer Models Involving Experimental Measurements with Monte Carlo Methods Risk Analysis 25(6),1669-1681 doi: 10.1111/j.1539-6924.2005.00704.x.

A. L. Rukhin, (2009) Weighted means statistics in interlaboratory studies Metrologia 46, 323-331; doi: 10.1088/0026-1394/46/3/021

0 0 votes

Article Rating

229 Comments

Stephen Richards

March 1, 2020 10:25 am

My head just exploded and I was a physicist

KcTaz

Reply to Stephen Richards

March 1, 2020 11:27 am

I am so glad to hear that! I was not a physicist so, I expected my head to explode, which it did. Glad I’m in good company. I did get the gist of it, I think, which is that climate models are predictive of nothing but whatever the modeler wants. Is that close enough?

Jeff Alberts

Reply to KcTaz

March 1, 2020 12:45 pm

I have a built-in safety. My eyes glaze over, preventing cranial explosion.

Steven Mosher

Reply to KcTaz

March 2, 2020 6:51 pm

“I did get the gist of it, I think, which is that climate models are predictive of nothing but whatever the modeler wants. Is that close enough?”

wrong. the models dont match observations in some key areas.

skeptics make 2 contradictory arguments:
A) models output what the scientists want
B) models fail becauee they dont match observations

Clyde Spencer

Reply to Steven Mosher

March 2, 2020 8:58 pm

Mosher
Your statement is illogical. For A), the implication is that scientists want lots of warming; the models provide that. For B), observations show little warming. There is nothing contradictory there. Move along, there is nothing to see here.

-1

Robert Kernodle

Reply to Clyde Spencer

March 3, 2020 9:03 am

skeptics make 2 contradictory arguments:
A) models output what the scientists want
B) models fail because they don’t match observations

I’m with Clyde S’s reply on that, which basically means that I do not see a contradiction either.

(A) Models output quite a bit of warming. Climate “scientists” WANT warming.
(B) Such output of quite a bit of warming that scientists want does not match observations.

Where’s the contradiction?

Climate scientists want something that is not there.
Climate models output something that is not there.
Models and scientists, thus, fail.

-1

Pat Frank

Reply to Steven Mosher

March 3, 2020 12:24 pm

Steve Mosher, “wrong. the models dont match observations in some key areas.”

Wrong. Models are forced to match observations in some key areas.

Their match of observations is no indication of physical accuracy. Their dis-match of observations is a positive indication of physical inadequacy.

AGW asserters make two contradictory arguments:
1) model air temperature projections are just story lines, not predictions
2) inter-model consistency means air temperature projections are predictions

-1

Greg

Reply to Stephen Richards

March 1, 2020 11:44 am

I’m not even prepared to attack something that long, especially after not being particularly impressed with previous offerings.

Pat Frank

Reply to Greg

March 1, 2020 1:54 pm

Greg, your impressions are not important to the conversation.

-1

Robert Kernodle

Reply to Pat Frank

March 2, 2020 2:23 pm

How about MY impression of readers like Greg? — not important scientifically, of course, but in terms of willingness to intellectually engage deeply, … perhaps.

Does this count as a passive-aggressive ad hominem? Sorry, I couldn’t resist.

Carry on with the real discussion, intact-head folk. I will, at least, try to tune in, without being dissuaded by my previous impressions, which went something like, “Good God, I’ll never be able to understand that level of technical detail!”

… yet here I am — I WILL read it all, and take away what I can.

gbaikie

Reply to Stephen Richards

March 1, 2020 4:06 pm

I just skipped 99% of it.
The problem is the model.

I used to say if one model Earth as world completely covered with an Ocean,
then you might get somewhere.
But it seems that is too hard. Now I have simpler idea:
the average temperature of the entire volume of the Ocean determines
global climate, or global temperature.
And Earth’s average ocean temperature is currently about 3.5 C.
And having ocean entire volume temperature of 3.5 , means we have a cold ocean. And doing our current Ice Age, the average temperature of the ocean has been in the range of about 1 to 5 C.

Whenever the ocean temperature is nearer 5 C, Earth is in the warmest periods of an interglacial period.
Increasing the ocean temperature from about 3.5 C to 4 C will result an significant amount of “global warming”. But such significant amount of global warming has nothing to do with the Earth becoming “too hot”. Though one could characterize the world as becoming more tropical.
And if or when ocean warms to 4 C, we still in an Ice Age.
Likewise there would a huge effect if ocean temperature were to cool from about 3.5 C to about 3 C.
Though just .5 C increase doesn’t cause Earth to become “too hot”, a .5 C decrease doesn’t cause Earth to become too cold. A .5 C drop in ocean temperature is an indication that we could be entering a glacial period. Or if we can’t predict when going to enter a glacial period {and apparently we currently can’t do this] .5 C drop in ocean temperature indicates we are somewhere near the “brink” of entering a glacial period.
And if we entering a glacial period, it still doesn’t mean Earth is “too cold”, but effects would be a tendency of more desertification and/or Earth becoming less “tropical”. Of course another aspect of entering a glacial period is growth of temperate glaciers. Another aspect of cooling oceans is more violent weather in the Temperate Zones. And you can get warmer winters, but most notable is the colder winters- or generally you get more extreme weather.

Robert Kernodle

Reply to gbaikie

March 3, 2020 9:18 am

I can give a concise, ten-word summary:

As serious predictive tools for policy makers, climate models suck.

Yes, I am proposing that the last word in my summary be instituted as standard language in modern scientific writing.

Robert Kernodle

Reply to Robert Kernodle

March 3, 2020 10:49 am

Seriously, though, the thing about this article that attracted me was the phrase, “physical reasoning” in the title.

I’m getting the impression that this phrase, “physical reasoning” has a more in-depth, defined meaning in computing than I associate it with my plain-language understanding of the phrase. But I also get the impression that my plain-language understanding and the possibly more in-depth formal meaning might not be so separated.

During my intense, although brief, encounter with the study of mathematics, I was always amazed by students who could crank out the calculations flawlessly at the highest level of complexity. But I always wondered whether they really understood the meaning of what they were doing. I always needed to understand the deep meaning of what I was doing, and I felt as though people were not taught this — there was no time for this. I needed to go more slowly, connecting to first principles, first definitions, and so forth. I could not advance as fast as my robotic-minded comrades. Teaching math does not seem to be oriented in this way at universities. I could not do it the way it was being done, and so I left the scene.

It seems easy to be captivated by the sheer power of math, so much so that the meaning of it can get lost, and hence the person gleefully cranking it out can also get lost, because he/she has lost touch with the meaning, which is what I would call “physical meaning”.

This is why I have to ask questions like, “What does a global average temperature really mean?” OR “What does an average solar flux really mean?” The math can be correct, sure, no doubt, BUT is the meaning correct? — I think not. That’s why I treat the concept of “global average temperature” with a great sense of caution, and why I flat-out reject the concept of “average solar flux”.

Resolving this argument between Pat and Richard, then, seems to be a very, very advanced exercise in pinpointing the minutia of differences between a strictly calculating approach and a calculating-with-meaning approach. I cannot understand the minutia, for sure, but I think I get the gist that implies that people can weave all manner of complex math justifications to cover their failures to understand the meaning behind their math.

4TimesAYear

Reply to Stephen Richards

March 2, 2020 2:53 am

I don’t feel so bad now either. It’s going to take me a few days to read it all, but I think the “short take-home message” was very helpful.

HD Hoese

March 1, 2020 10:36 am

Will read it later, but as a shining light of something that even us lesser trained knew about long ago.
Statisticians should enter research as in Smith, E.P. 2020. Ending Reliance on Statistical Significance Will Improve Environmental Inference and Communication. Estuaries and Coasts 43, 1–6. https://doi.org/10.1007/s12237-019-00679-y https://link.springer.com/article/10.1007/s12237-019-00679-y

From the abstract–“Numerous authors have commented and criticized its use as a means to identify scientific importance of results and have called for an end to using the term “statistical significance.” Recent articles in Estuaries and Coasts were evaluated for reliance on the use of statistical significance and reporting errors were identified.”

Pat Frank

Reply to HD Hoese

March 2, 2020 3:05 pm

Thanks for the reference HD Hoese.

That paper looks like an important corrective tonic, and went right into my Endnote library.

Pat Frank

March 1, 2020 10:51 am

My serious thanks to Anthony and Charles for posting my essay, and for all you.

Carlo, Monte

Reply to Pat Frank

March 1, 2020 1:10 pm

Thank you for taking the time to unwind Booth’s paper; for me it was obtuse, opaque, and hard-to-read, and now I know why. It was so obtuse that I completely missed how the dimensions in his main equation are unbalanced. I will note that when I pointed out his treatment of uncertainty did not follow the JCGM GUM, he had no real answer.

David L Hagen

Reply to Carlo, Monte

March 1, 2020 5:06 pm

Thanks Pat for your systematic explorations and Monte for JCGM GUM ref.
BIPM’s “GUM: Guide to the Expression of Uncertainty in Measurement”
See Evaluation of measurement data — Guide to the expression of uncertainty in measurement. JGCM 100:2008 PDF
Sadly this international guide standard is rarely if ever mentioned or applied by the IPCC or climate science authors. While they address statistical distributions, they hardly ever mention systematic uncertainties (Type B) that can be as large or larger then the statistical (Type A) errors.
The wide gap between climate model predictions of the Anthropogenic Signature and the reality of satellite and radiosonde data is exposed by McKitrick & Christy 2018 etc.
McKitrick R, Christy J. A Test of the Tropical 200‐to 300‐hPa Warming Rate in Climate Models. Earth and Space Science. 2018 Sep;5(9):529-36.
When will Booth etc. dare to address the massive systematic uncertainties between those? Where are they? How do we identify them? How large are they? etc.
https://www.bipm.org/en/publications/guides/gum.html
https://www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.pdf
https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.1029/2018EA000401

Pat Frank

Reply to Pat Frank

March 1, 2020 1:22 pm

A few special characters have mistranslated into HTML. Most especially the special character forward arrow, such as ->, translated as an R inscribed within a circle.

There are a few other special character glitches, including that none of the Word-defined super- or subscripts appear, which I hope do not cause anyone any trouble.

Clyde Spencer

Reply to Pat Frank

March 1, 2020 3:12 pm

Pat,
I suspect I have stumbled on another instance of mistranslation. Is the following correct, and if not, what did you intend? “…, and 0 £ a £ 1.”

Pat Frank

Reply to Clyde Spencer

March 1, 2020 3:34 pm

Hi Clyde, regrets about that.

The British pounds in the original were ‘less than or equal to,’ as in 0 ≤ a ≤ 1. Or, if that doesn’t come out either, 0 < or = a < or = 1.

Geoff Sherrington.

Reply to Pat Frank

March 1, 2020 8:16 pm

Hi Pat,
Looking forward to your thorough, expanded version of this rebuttal.
Over on Climate Etc I am showing examples of problems with temperature measurements and methods and getting blank stares in reply, plus a few diehard comments invoking authority.
Sooner or later, proper error treatment will sink in for climate researchers. It is so important. Geoff S

Matthew R Marler

March 1, 2020 10:53 am

Pat Frank, thank you again for your effort.

I think you are wrong there.

Unfortunately the universal presence of random measurement variation (variation that is neither exactly predictable nor reproducible) makes the last sentence untestable. There was much discussion of empirical (epistemological) random variability and fundamental (metaphysical) random variability in the quantum mechanical developments of the early 20th century. If you can’t tell whether there is in fact an underlying determinism, then you are unwise to assume to assume. Both before and after measurment, all that is known empirically about a measurable attribute is the range of its most likely values. and estimates of the parameters of the distribution. This is true of the center of mass of an aircraft and the center of lift of its wing — as well as your blood hemoglobin concentration and the O2 saturation of the blood. Without believing in the accuracy of the Bayesian mathematics, you can do well to think of measurement as reducing the variance of the measured quantity, and reducing the bias, but not of eliminating the random variation.

Random variation in measurement outcomes is the most thoroughly documented result in all of empirical science. Whatever you think is the outcome of a deterministic process, the next 3 measurements of it will, with high probability, not all be equal.

Admittedly, some measurement variation is extremely slight. Modern measurements of the speed of light and Avogadro’s number are so precise that the former is taken as a constant and used to redefine the meter; the second has been proposed as a constant to redefine mass. Cases like that are not common.

Matthew R Marler

Reply to Matthew R Marler

March 1, 2020 11:10 am

I don’t think my metaphysical/epistemological note affects any thing substantial in your response to Richard Booth.

Pat Frank

Reply to Matthew R Marler

March 1, 2020 11:41 am

Hi Matthew, thanks for your kind encouragement.

In your comment, it looks to me like you’ve first raised the inevitable appearance random error of some measurement, i.e., your “random measurement variation.” That appears to concern measurement error rather than the measurement itself.

In the bit you quoted, Rich wasn’t addressing error. He was addressing the unknown state as though it should be considered a random variable prior to measurement.

That is not true in science, where theory is always present. The measurement either validates the theory or refutes the theory.

Even when some new phenomenon is discovered, it is interpreted as unexpected, given a deficient explanatory scope of the existing theory. In science, as-yet unmeasured phenomena are neither viewed nor defined to be random variables until they are measured.

Quantum Mechanics is a fully deterministic theory, as the quantum state itself evolves in an entirely deterministic way described by the equations of the theory; in particular as governed by the Bell inequality (pdf).

The fact that quantum states emerge into a probabilistic distribution when they are scattered, e.g., on their measurement, does not imply that the unmeasured state itself is a random variable.

I don’t want to get into the philosophy of QM, which would be a distraction in this forum. And in any case, far too much air has been expended on it by others.

Jeff Alberts

Reply to Pat Frank

March 1, 2020 12:49 pm

I just want to know when my Infinite Improbability Drive will be ready.

Pat Frank

Reply to Jeff Alberts

March 1, 2020 3:35 pm

Flash Gordon had it in 1964, Jeff. I know because it was reliably reported in the comics. Somehow, we lost it. 🙂

Matthew R Marler

Reply to Pat Frank

March 1, 2020 11:58 pm

Pat Frank: That appears to concern measurement error rather than the measurement itself.

There is no “measurement itself” free from measurement error. It’s at best a conceptual distinction which disappears when actual measurements are undertaken.

Pat Frank

Reply to Matthew R Marler

March 2, 2020 4:12 pm

No problem with that, Matthew.

I had measurement as a category in mind.

Tim Gorman

Reply to Matthew R Marler

March 1, 2020 5:51 pm

Matthew:

“Unfortunately the universal presence of random measurement variation”

Think of it this way. When you go out to read your thermometer in the morning just how many measurements do you take? Do you read it once? Twice?

In order for there to be a “random measurement variation” you need multiple measurements. One or two don’t give you any kind of a probability distribution which you can use to determine a “variation”.

But there *is* an uncertainty associated with that morning measurement. For all the reasons that Pat discussed.

If you use that measurement to determine a daily average then how do you account for the uncertainty you have associated with that morning measurement in the daily average itself? The uncertainty doesn’t just disappear when you form the average. If that uncertainty interval is greater than the difference you are trying to discern, e.g. a +/-0.01deg difference from a +/-0.5deg uncertainty, then you are only fooling yourself. If you follow the rules of significant digits when you calculate the average, i.e. one decimal place, you won’t even see a 0.01deg difference from your average calculation!

There are too many computer scientists and statisticians associated with the CGM’s and too few engineers and physical scientists. The CGM’s should even be outputting temperature differences past the first decimal, their inputs just don’t justify any more significant digits than that! Any engineer or physical scientist can tell you that simple rule of significant digits!

AndyHce

Reply to Tim Gorman

March 2, 2020 5:47 pm

Many modern thermometers are electronic. As far as I know, the most common use a resistance measurement (voltage drop compared to a standard) of a bit of special metal that is (hopefully) calibrated to the temperature of said piece of metal, and which is hopefully in close agreement with the air temperature. I have read many comments about the ability of such thermometers to make very brief interval measurements. A frequent complaint is that in some places a momentary spike, which can be fairly large, will be recorded as the official high temperature (e.g. in Australia).

I have also read that the international metrological standard practice for such thermometers is to make one measurement per second for two and a half minutes, calculate the average over those 2.5 minutes, throw out any extreme values (greater than 2 SD?), calculate the average of the remaining values, and record that as the official temperature. I have also read that NOAA uses 5 minutes instead of 2.5 minutes.

It should be possible to empirically determine, in general, what the distribution and variation in air temperature is over an average measurement period. In this way one could determine what the reasonable uncertainty is that should be expressed. In the best case it would be like making multiple unbiased measurements of something that does not vary and thus getting a random distribution of variations that will reduce the uncertainty in the simplest way. This should include the variation of the instrument itself when the temperature of the air surrounding the instrument is held to a closer temperature than the precision of the instrument (i.e. how linear the thermometer is).

However, I have no idea if such measurements will usually produce anything like a normal distribution, or if any such validation is employed. Does anyone?

Clyde Spencer

Reply to AndyHce

March 2, 2020 9:06 pm

AndyHce
You asked, “if such measurements will usually produce anything like a normal distribution,”. Sometimes they will, sometimes they won’t. A common day will provide an approximately sinusoidal temperature change from sunrise to sunset. Now, imagine a day when a cold front moves through in the late morning and the temperatures drop 20 deg F. That will not be a normal distribution, and may not even be symmetrical.

Tim Gorman

Reply to AndyHce

March 3, 2020 1:15 pm

Andy,

Modern devices being electronic doesn’t do away with uncertainty in their readings. Even the thermistors used in the Argo float are non-linear over the range of temperatures they measure. and it gets worse when the thermistor is embedded in a device that has its own contribution to uncertainty.

Averaging over a period of time has its own uncertainty. In a two minute period clouds can put the device in shade, out of shade, and in shade. A five minute period makes it even worse. Even a one minute exposure to bare sun can change the temperature of the atmosphere surrounding the measurement device. So can one minute of shade. So what does that do to the uncertainty surrounding the temperature measurements? How do you determine what a “reasonable uncertainty” actually is?

It would seem that, as with so many, you are confusing error with uncertainty. You can take multiple measurements of the same thing using the same device and use the central limit theory to determine a more accurate mean. But spreading temperature measurements over a period of time is *not* taking multiple measurements of the same thing even though the same device is used. That’s like measuring 1000 8’x2″x4″ boards with the same measuring tape, averaging the results and then saying that average is the actual length of each of the 100 boards. Common sense should be all you would need to understand that just isn’t the case.

“In the best case it would be like making multiple unbiased measurements of something that does not vary and thus getting a random distribution of variations that will reduce the uncertainty in the simplest way.”

How does the temperature not vary over time? And even if it doesn’t vary how does that lessen the uncertainty associated with the measuring device? You seem to still be addressing error and not uncertainty.

If I take multiple measurements of an 8′ rod using a tape measure using a ruler I think is 12″ long but is actually 13″ long, how does taking multiple measurements and averaging them reduce the error in the mean you determine?

“This should include the variation of the instrument itself when the temperature of the air surrounding the instrument is held to a closer temperature ”

How do you hold the temperature of the outside air to a closer temperature than the precision of the thermometer? Don’t confuse calibration with either error or uncertainty. No calibration lab I know of pretends to be able to duplicate all environmental conditions a thermometer might endure when in the outside environment. They will duplicate certain, specified conditions and calibrate the instrument to those conditions. Everything else has an uncertainty associated with it. And one the instrument leaves the calibration lab the instrument’s calibration will begin to degrade, thus introducing uncertainty into any measurement. How many thermometers in the 19th and 20th centuries were regularly calibrated? Yet we use those measurements as a baseline to compare to today’s measurements.

The manipulations of data today adds in even *more* uncertainty into results. When they increase or decrease temperature data to “adjust” it, with no absolute knowledge of the calibration of each instrument whose measurement is thus adjusted how do we know they aren’t *increasing* uncertainty instead of decreasing it?

Jim Gorman

Reply to Matthew R Marler

March 2, 2020 12:46 pm

Matthew ->> “you can do well to think of measurement as reducing the variance of the measured quantity, and reducing the bias, but not of eliminating the random variation.”

Unless you make enough multiple readings of the temperature at a given time to be able to plot a distribution of measurements, you do not have anything but the one reading to work with. Therefore, you can not reduce the variance because there is no variance with one reading, and one reading only. The only uncertainty you have is the uncertainty involved with that one time measurement.

It is also impossible to understand how someone can think that the Central Limit Theory applies to temperature measurements and their averages. Each time a temperature measurement is made at a given device, that reading is the only measurement that you have and could ever have of that measurand at that point in time. You simply can not make another reading hours later, average the two and consequently increase the accuracy and precision, and also reduce the uncertainty by claiming CTL lets me divide by N. The same logic applies to averaging readings from multiple devices at multiple locations. Each measurand that goes into the average is a single unique population of 1 (one) with a given precision and a minimum uncertainty specified by the range of the next digit after the recorded value. There are multiple sources on the internet that tell you how to combine populations with different variances. Use them. Trying to say the CTL to divide the population variance by sqrt(N) only tells you how close a sample distribution is to the true mean. It doesn’t let you increase the precision and accuracy of the mean nor reduce the variance of the population. I’ve done it. Combine all your readings into one population and find the simple mean and the variance. Then go to the work of choosing samples, calculating the each sample mean, then find the mean of that sample mean distribution. The mean will be the same.

One should be careful about using the GUM in determining how to handle uncertainties from different devices, different times of reading, and different temperatures. The GUM basically deals with determining the uncertainty associated with the measurement of one thing (the measurand) at a time. It also deals with uncertainty in measuring devices and how their uncertainty should be included in an uncertainty budget.

The GUM has little about trending which is what we are really dealing with. To arrive at a GAT (global average temperature) stock traders probably have a better idea of how it should be done since they deal with indexes made up of unique individual measurements and trend them to resolve index pricing variations.

Krishna Gans

March 1, 2020 11:08 am

Sadly, it’s far beyond my horizon

jorgekafkazar

Reply to Krishna Gans

March 1, 2020 2:32 pm

There are many other things, possibly more important, within your horizon.

Vuk

March 1, 2020 12:03 pm

I trust all that maths has some meaning to it.
“estimate of the past black box values and to predict the black box output.” That is, his emulator targets model output. It does not emulate the internal behavior or workings of the full model in some simpler way.”
Black boxes for climate simulation are most likely waste of time.
In years past I have designed and build few ‘black boxes’ which were later manufactured for limited use for number of customers. At the start I knew what the input is and what the required output was meant to be, but I never ever bothered to start with an equation. When whole lot was done, I would do number of tests varying the input and recording the output. Data was used to produce graphs with useful working range clearly indicated, and then the maths of the transfer curve would be ‘constructed’ and numerically verified. Less wise were suitably impressed with the paperwork, but the engineers were only interested in a simple but the very essential ‘does it do the job’; and as it happens most of the time it did, but sadly there were one or two failures.

ferdberple

March 1, 2020 12:30 pm

This reminds me of the famous painting; ‘This is not a pipe’.

Climate modelling would have us understand a pipe by looking at a painting of a pipe. The first thing we would notice is that pipes are very thin and cannot actually hold much tobacco. The dimensions of the pipe have been corrupted to produce the painting.

A climate model is not climate. It is a painting of climate. The dimensions of the painting of climate are not the dimensions of the original.

ferdberple

March 1, 2020 12:35 pm

Rich and climate modelers both describe the probability distribution of the output of a model of unknown physical competence and accuracy, as being identical to physical error and predictive reliability.
===========
Mathematically, this is an error. The statistics of a painting of climate will not equal the statistics of actual climate, because your painting is not an exact replica. It is not a pipe. It is simply a picture of a pipe. Perhaps a complex and expensive picture, but a picture all the same.

Jeff Alberts

March 1, 2020 12:42 pm

What was the middle part again?

Climate Heretic

March 1, 2020 12:51 pm

The climate models have not been Validated or Verified. Hence they are useless.

Regards
Climate Heretic

commieBob

Reply to Climate Heretic

March 1, 2020 2:12 pm

Not all software is a model of a physical system. Rules have been created for the verification and validation of such software.

The obvious first requirement is that software that models a physical system should accurately reproduce that system’s behavior.

Since climate models fail at reproducing the climate’s behavior, the climate modelers try to use the verification and validation rules which apply to software that doesn’t model physical systems. It’s the old switcheroo. Their claim that their models are verified and validated is just bunk.

Paul Penrose

Reply to Climate Heretic

March 2, 2020 10:01 am

I disagree. As engineering process models they are useful in a limited way: for studying certain subsets of atmospheric physics and improving our understanding of them. But they are not fit for the purpose of projecting (predicting) future “global climate” states.

RickWill

March 1, 2020 1:31 pm

Climate models fail at a very basic level. The linked chart shows the precipitation minus evaporation for the mean of the CMIP5 models for RCP85:
http://climexp.knmi.nl/data/icmip5_pme_Amon_modmean_rcp85_0-360E_-90-90N_n_+++_2000:2020.png

I produced this chart to see how well the models tracked the measured atmospheric water vapour. The TPW data is not available on the KNMI site. I realised that if I integrated the the pme for each month I should get the TPW.

If you take a close look at the chart it is clear that there is considerably more precipitation than evaporation over each yearly cycle. Integrating over this 20 year period results in a very dry atmosphere. In fact the atmosphere needs to be manufacturing water because the integration results in MINUS 60mm TPW over the 20 years.

How ever the models are constructed, they do not bear any resemblance to the real physical world. Knowing the importance of atmospheric water vapour to the global weather and climate, it should be one of the key variable to get physically correct.

commieBob

Reply to RickWill

March 1, 2020 2:51 pm

Indeed. Increased water vapor is the mechanism which provides the positive feedback which increases the climate sensitivity beyond 2°C per doubling of CO2. As far as I can tell, the models ignore the energy it takes for the evaporation that results in water vapor.

It just occurred to me that it shouldn’t take increased CO2 to raise the temperature, thus causing enhanced evaporation which causes even more greenhouse effect. Just water, by itself, should be able to start the process of runaway global warming. Why doesn’t that happen?

Have I just found another fatal flaw in CAGW, or is that just my cold medication talking?

Tim Gorman

Reply to commieBob

March 1, 2020 5:58 pm

commieBob,

You aren’t the first one to notice this. It’s the old “don’t believe your lying eyes” magic. Supposedly water vapor by itself doesn’t provide a positive feedback loop which would, sooner or later, run away. It requires CO2 to make the water vapor into a positive feedback loop.

Which, of course, is garbage.

AGW is Not Science

Reply to Tim Gorman

March 2, 2020 12:59 pm

Yes, I pointed out that fallacy a long time ago. The ridiculous “positive feedback loop” wouldn’t need any CO2 to drive it.

Chaswarnertoo

Reply to commieBob

March 2, 2020 12:01 am

What? Another one? ( been done) . The science is settled! 😂

Pat Frank

Reply to RickWill

March 1, 2020 3:44 pm

Rick, my admiration for doing the detailed work, and congratulations for a striking result. It’s very worth publishing; most especially if the same result appears for other scenarios.

Your data are the CMIP5 mean, which means the random error in precipitation and evaporation should have averaged down to some small residual. So, the result you got then reveals a deterministic error.

You can demonstrate that as fact by assessing several models to see if they all make the same error.

RickWill

Reply to Pat Frank

March 1, 2020 8:44 pm

Pat
Other than mean, I have only looked at one model – the CSIRO. The KNMI site only provides a 10 run average for the CSIRO Mk3 model. This link is the 2000 to 2020 pme plotted for the RCP85:
http://climexp.knmi.nl/data/icmip5_pme_Amon_CSIRO-Mk3-6-0_rcp85_0-360E_-90-90N_n_+++_2000:2020.png
I have not integrated the actual data but by observation it appears that precipitation and evaporation are better balanced than the model mean.

The KNMI site has numerous runs for CMIP 3 and 5 models. This is the list of CMIP5 runs available:
http://climexp.knmi.nl/selectfield_cmip5.cgi?id=someone@somewhere
You can download the data from the site but it is tedious and any basic analysis demonstrates it is rubbish.

I have plotted the NASA earth observation data for TPW and OLR for the last 3 years:
https://1drv.ms/b/s!Aq1iAj8Yo7jNg1uzA-KKFEvD5BzX
The annual variation in TPW and OLR are positively correlated and in phase. This is the opposite of the “greenhouse effect”.

I was aiming to see how well the models related to what has been measured in the last three years. I didn’t bother looking at any more models once I found the model mean was so far “unphysical”. The models are fundamentally no better than an X order polynomial where X is chosen to to fit the number of slope reversals in the historical temperature record. Between CMIP3 and CMIP5, there must have been additional orders (more tunable factors) because there were a few more slope reversals that needed to be accounted for. There is no doubt your black box is as effective as any of these unphysical models in predicting some imagined future climate; and orders of magnitude simpler. (Could you imagine putting your hand out for billions to come up with a simple equation) It is no wonder climate modellers resent the concept of a black box with a single very simple equation.

WXcycles

Reply to RickWill

March 2, 2020 6:06 am

Congrats Rick, that’s an iceberg below the water line.

Tom

March 1, 2020 1:32 pm

“He then wrote that if one instead made ten measurements using ten independently machined rulers then the uncertainty of measurement = “sqrt(10) times the uncertainty of each.” But again, that is wrong.
The original stipulation is equal likelihood across ±0.1″ of error for every ruler. For ten independently machined rulers, every ruler has a length deviation equally likely to be anywhere within -0.1″ to 0.1″. That means the true total error using 10 independent rulers can again be anywhere from 1″ to -1″.
The expectation interval is again (1-(-1)”/2 = 1″, and the standard uncertainty after using ten rulers is 1″/sqrt(3) = ±0.58″. There is no advantage, and no loss of uncertainty at all, in using ten independent rulers rather than one. This is the outcome when knowledge is lacking, and one has only a rectangular uncertainty estimate — a not uncommon circumstance in the physical sciences.”

The above is not correct. The error distribution for ten different rulers is much different than using one ruler.

Tim Gorman

Reply to Tom

March 1, 2020 6:05 pm

Tom,

“The above is not correct. The error distribution for ten different rulers is much different than using one ruler.”

Ten *independent* rulers don’t have an error distribution. Ten independent rulers each have their own separate uncertainty interval. Their uncertainty intervals add.

Tom

Reply to Tim Gorman

March 3, 2020 3:46 am

If one picks a single ruler then its error distribution will be the same rectangular distribution of everything the manufacturer produces, that being an equal likelihood of error from -0.1” to +0.1 inches per foot. A repeated measurement with that one ruler just multiplies the error, so that in the case of 10 repetitions, the error would be from -1.0″ to +1.0″.

If instead, one picks 10 rulers all at once and measures by putting them end to end, the result is not going to be an equal likelihood of from -1.0″ to +1.0″. A selection of 10 rulers will result in a group with a normal (gaussian) distribution with a mean of zero. The probability of picking ten rulers all with the same error is vanishingly small. Its also as Rich said, if you just looked at all the rulers and picked the shortest one and the longest one you could find, you could be almost certain that the mean error of the two rulers would be zero. So, my understanding of the problem is that repeated measurement improves accuracy of the result when the error of the measuring device is not constant as would be the case with almost any real world situation that I can think of.

This has little to do with the problem of models which have errors of a different sort. The error in measuring todays temperature is one thing; an error in predicting tomorrow’s temperature is something else altogether.

Jim Gorman

Reply to Tom

March 3, 2020 6:35 am

Tom –> Let’s put it a different way. You assumed that “A selection of 10 rulers will result in a group with a normal (gaussian) distribution with a mean of zero”. You simply can’t make this assumption because you are uncertain what each ruler has for an error. Hence the term uncertainty. You could end up with ten rulers too long or ten that are too short or even ten rulers that are right on.

Consequently, you can’t assume any distribution. This is why WHEN you use the Central Limit Theory to predict a “true value” by averaging measurements there are two main assumptions, independent measurements of the same thing and a random distribution of measurements. In many cases not enough measurements of the same thing by the same device are made to insure a random distribution. This by itself results in additional uncertainty.

Ultimately, you can not reduce uncertainty by using different measuring devices (ten different rulers) to measure the same thing. Nor can you increase accuracy and precision. The uncertainties add whether you are using one ruler or ten.

Tom

Reply to Jim Gorman

March 3, 2020 7:20 am

If I pick ten of the rulers at random (of the stipulated rectangular error distribution), what are the chances of picking two in a row that both are short by 0.1″; ten in a row?

Tim Gorman

Reply to Jim Gorman

March 3, 2020 1:27 pm

Tom,

Let’s consider picking ten rulers at random from a run of 1000 from the same machine in a single day.

At the start of that day the operator should have calibrated the machine so the first few rulers churned out will be pretty close to the correct length. As the production run continues however, the cutting die will either start churning out rulers that get shorter and shorter or longer and longer. In other words you will no longer have a random distribution, it will be significantly skewed one way or the other.

If you pick from the machine that is progressively cutting the rulers shorter and shorter then your liklihood of picking ten rulers that are too short are pretty high. The odds of picking any that are too long are pretty low. If you pick from a machine that is progressively cutting the rulers longer and longer then your liklihood of picking ten rulers that are too long is pretty likely. The odds of picking any ruler that is too short is pretty low.

“stipulated rectangular error distribution”

And now we are back to the usual position of trying to equate error and uncertainty. They are not the same! That is what Pat’s whole point has been all along.

Tom

Reply to Jim Gorman

March 3, 2020 3:51 pm

Can you suggest an experiment that would help me understand?

Tim Gorman

Reply to Jim Gorman

March 4, 2020 3:00 pm

Tom,

An experiment?

Go to ebay and buy three, inexpensive analog multimeters. You can get them for about $5.50 each.

Take one reading each on your car battery (with car not running). Estimate the uncertainty associated with each reading. Average the readings together and estimate the uncertainty associated with that average.

Then, if you can, find a lab-grade, recently calibrated voltmeter and see what it gives you for a reading.

Pat Frank

Reply to Tom

March 3, 2020 8:29 am

Tom “A selection of 10 rulers will result in a group with a normal (gaussian) distribution with a mean of zero.”

You don’t know that, Tom. And the manufacturer’s specs do not say so. You’re just making a convenient assumption — rather like Rich did.

Every ruler has an equal chance of being anywhere between -0.1″ and 0.1″. Any distribution of lengths is equally possible for 10 rulers. The distribution is not Gaussian. It’s rectangular.

Tom

Reply to Pat Frank

March 3, 2020 9:45 am

The manufacture’s specs say any ruler has an equal chance of being in error by up to +/- 0.1″. If I get ten of their rulers, what are the chances they will all be in error by the same amount; I think the answer is approximately zero.

Pat Frank

Reply to Pat Frank

March 3, 2020 12:31 pm

Getting all the rulers with the same error has the same probability as getting any other set of lengths. There’s no way to know.

Pat Frank

Reply to Tom

March 1, 2020 6:39 pm

Following on from Tim Gorman, all we know is the manufacturer specification that any given ruler is within (+/-)0.1 inch of 12 inches. And every ruler can be anywhere in that interval with equal probability.

There is no known distribution of error.

Ten independent measurements yield an uncertainty interval of (+/-)1 inch, i.e., one could have 10 rulers all off by -0.1 inch or +0.1 inch. There’s no way to know.

Carlo, Monte

Reply to Pat Frank

March 2, 2020 7:03 am

If you submit a 9-meter tape measure (i.e. a long ruler) to a cal lab for calibration, what they will do is give you a report that states the distance indicated by the tape is within the manufacturer’s specifications at several discrete points, such as 10%, 50%, and 90% of full scale. They do not generate a calibration correction table for every 1-mm hash mark, it would be horrendously expensive. The uncertainty of a measurement using the tape can only be calculated from the specs as a Type B uncertainty interval. Averaging multiple measurements of the same distance cannot reduce the uncertainty.

Tom

Reply to Pat Frank

March 5, 2020 5:01 am

There is no known distribution of error? I understood that the error distribution was rectangular, and therefore equal probability of selecting a ruler with any error. This means that the mean error for the entire population of rulers is zero. I contend that as you increase your sample size, the mean of that sample will approach the mean of the total population, or zero, and therefore you are more likely to get an accurate reading if you use multiple rules, if the population is as I understand it to be.

Perhaps my understanding of the problem is incorrect?

Pat Frank

Reply to Tom

March 5, 2020 10:58 am

The uncertainty specification is about manufacturing deficiencies in accuracy, Tom. It’s not about the distribution of any population of manufactured rulers.

They could just as well have a run of rulers short by 0.03″ as anything else. But one never knows.

If someone wanted to get a proper handle on the problem, they’d have to do an accuracy study of methods and machines on the manufacturing floor. Is one operator or one machine more likely to produce shorter or longer rulers than another?

The point of the present exercise is that one never knows the length distribution of any given population, no matter how large. All one has is the (+/-)0.1″ uncertainty.

Tom

Reply to Pat Frank

March 5, 2020 4:13 pm

If one never knows, then why did you set up the problem on the basis of a rectangular distribution with the error range of +/- 0.1 in/ft? Doesn’t that have a very well defined meaning?

Pat Frank

Reply to Pat Frank

March 5, 2020 7:58 pm

Rich set up the rectangular distribution, not I.

We’ve been discussing the well-defined meaning. All length errors are equally possible. You continue to treat them as though they are not. So did Rich.

commieBob

March 1, 2020 1:44 pm

I think what Pat is describing is a kludge.

An ill-assorted collection of poorly-matching parts, forming a distressing whole.

Having evaluated a lot of student software, I am painfully familiar with kludges. Well written software is elegant and easy to understand. A kludge will make your brain explode. The writer of the kludge will assume that other peoples’ inability to understand his crap proves his superior intellect. The truth is that the writer of said crap, almost 100% guaranteed, understands it worse than the poor benighted person trying to grade it.

Just because someone can throw a bunch of stuff together, and it doesn’t actually crash, doesn’t mean it’s useful or valid in any way.

Kudos Pat. I suspect Hercules had an easier time cleaning the Augean Stables.

Nick Stokes

Reply to commieBob

March 1, 2020 1:48 pm

“Well written software is elegant and easy to understand.”
Well written articles are elegant and easy to understand. This was not one such.

jorgekafkazar

Reply to Nick Stokes

March 1, 2020 2:34 pm

Not helpful, Nick.

-1

commieBob

Reply to Nick Stokes

March 1, 2020 2:35 pm

To belabor my metaphor, we are presented with the image of muck flying in all directions rather than that of the shining edifice at the end of the process.

I’m sure you can point to examples where this kind of thing has been handled much better.

Nick Stokes

Reply to commieBob

March 1, 2020 4:55 pm

Do you think the article is elegant and easy to understand?

commieBob

Reply to Nick Stokes

March 1, 2020 5:36 pm

Not at all. Given what Pat is trying to do, I don’t think I could do better. I was hoping you could provide some useful guidance or an example or two.

-1

Nick Stokes

Reply to Nick Stokes

March 2, 2020 12:45 am

” I was hoping you could provide some useful guidance or an example or two.”

Well, take just the treatment of Rich’s recurrence relation, starting
“From eqn. (1ʀ), for a time series”
To the end of the 3R section is several pages. It’s just blundering about with a simple first order difference equation. Rich in fact got it right, in a few lines. 3R is the correct solution of 1R with his E{} values. The error with the starting point of the summation in 2R is evidently just a typo.

It’s true that 3R as shown here has the last term wrong; it should be (1-a)^t W(0), as Rich correctly wrote it.

Pat Frank

Reply to Nick Stokes

March 2, 2020 2:49 pm

Rich’s 3ʀ derives from his 2ʀ, which is wrong.

From Rich’s 2ʀ:
Case i = 0 = t
W(0) = (1-a)⁰[R₁(0)+R₂(0)-rR₃(0)]+ (1-a)⁰W(0)
which reduces to W(0) = W(0) because the Rn(0) are undefined.

Case i = 1 = t
W(t1) = (1-a)¹[R₁(t1)+R₂(t1)-rR₃(t1)]+ (1-a)¹W(0), which is wrong.

Case 1 should yield W(t1) = [R₁(t1)+R₂(t1)-rR₃(t1)]+ (1-a)¹W(0), which is Rich’s foundational “concrete time evolution of W(t)” eqn. 1ʀ, produced de novo, ex cathedra, and apropos of nothing.

The “blundering about” included the demonstration that 2ʀ is wrong.

The blundering also noticed the violation of dimensionality, the tendentious linearity, and the inappropriate variances.

Thank-you for pointing out the typo that left out the t-exponent in (1-a)^tW(0) in my rendition of 3ʀ.

-1

Janice Moore

Reply to Nick Stokes

March 2, 2020 3:16 pm

Mr. Stokes,

If you want to elucidate how Mr. Frank’s analysis is incorrect, you would do well to address the following:

None of Rich’s statistical conjectures are constrained by known physics or by the behavior of physical reality. In other words, they display no evidence of physical reasoning.

***

The eqn. 1ʀ emulator itself is ad hoc. Its derivation iswithout reference to the behavior of climate models and of physical reasoning. Its ability to emulate a GCM air temperature projection is undemonstrated.

Until you refute the above, Rich Booth’s attempt to criticize Frank fails and rather miserably, at that.

FYI: My highest level of math is college level Calculus and, thanks to his clear presentation and eloquent writing, I understood Mr. Frank quite well. He writes as we were taught to in Computer Science: accurately, completely, economically, and logically.

Janice

-1

Pat Frank

Reply to Nick Stokes

March 2, 2020 4:21 pm

Janice, really nice to see you. 🙂

I’m impressed with your advanced STEM education. I had no idea. 🙂 Congratulations!

Hope things are well with you, and best wishes 🙂

Janice Moore

Reply to Nick Stokes

March 2, 2020 6:19 pm

Aw, Pat. How kind of you … . One of my degrees was in Computer Science, but, I never “practiced” in that field. Sure am glad I took those courses, though. Well, heh, most of them…

You are such a fine scientist. A scientist’s scientist. I am currently reading Walter Isaacson’s biography of Albert Einstein. You remind me of him (Einstein – heh). A fine thinker AND an all-around fine human being. With an excellent sense of humor! 🙂

Thank you for your kind wishes (and for taking the time to write them — I was kind of hoping you would… it’s funny, but, just getting a friendly “Hi!” on WUWT can really make my day. And being ignored can make me sad …). Things are not exactly “well,” but, really, not so bad. After all, I can see. I can hear. I am healthy. Thus, I am rich, indeed.

And, now that I have had the pleasure of reading what you wrote to me — I am happy, too. 🙂

Take care,

Janice

Robert Kernodle

Reply to Nick Stokes

March 3, 2020 12:53 pm

Do you think the article is elegant and easy to understand?

Suppose you had to convince somebody that ingesting crap was not that good for them. Trouble is, this person has been eating crap for a lifetime.

You might first start by giving a detailed description of the human body, then a detailed description of its biochemistry, elucidating all levels of minutia about why the molecular structure of crap was not good for the body. This could go on for page after page, because, after all, the person you are trying to illuminate with better insight has a visceral, reflexive attachment to eating crap.

Breaking down crap-eating into its many, many faults is itself a crappy process. To assume that it would be elegant and easy is, perhaps, an inelegant starting assumption. (^_^)

sycomputing

Reply to Nick Stokes

March 1, 2020 3:55 pm

Well written articles are elegant and easy to understand. This was not one such.

Bite yer Nickers there budrow . . . ad hom Stokes understanding as much as it seeks it.

Phoenix44

Reply to Nick Stokes

March 2, 2020 1:58 am

But is it right? I assume you think it is, since your only criticism is of the style.

-1

Paul Penrose

Reply to Nick Stokes

March 2, 2020 10:18 am

Nick,
Cleaning up someone else’s mess is often a messy process in itself. Been there, done that.

Kevin kilty

March 1, 2020 1:51 pm

Pat,

Thanks for tackling such a massive analysis. I have only a couple of comments of a general nature.

First, you say…

Rich recommended false precision; a mistake undergraduate science and engineering students have flogged out of them from the very first day. But one that typifies consensus climatology.

I am unsure what gets flogged out of whom early in one’s education, but education is not very uniform. Lots of graduate physical science educations are deficient in probability and statistics. That is why our statistics department began to offer a course for new faculty hires, post-docs, etc. in statistics. I think they gave up most recently because our university is shorthanded across the “sciences”.

This was true of my own education. If it weren’t for a great deal of study on my own, plus one graduate course in probability, and then a willingness to teach statistics courses that no one else ever wanted to teach, I’d be pretty ignorant. I am sure most of my cohort stayed ignorant. The one course in my scientific discipline (physics/geophysics) where I was introduced to anything advanced was a course in “inverse theory” where our textbook was Philip Bevington’s book on data analysis. Yet, what was emphasized in this course were not the sections on propagation of error, but rather the algorithms for inverting data to find model parameter values. I.e. find an answer but be unable to articulate how insignificant it might be.

At any rate, the inability to do statistics or misunderstandings about what propagation of error is are understandable. You ought to see what ABET recommends that engineers know about probability or statistics or measurement uncertainty to see what a “low bar” means.

Second point — I see the reference to GUM and to JCGM which I was unfamiliar with, and could not find in the context of this work or in others referred to here. Perhaps others are puzzled about these acronyms too, so I will just mention that JCGM refers to the “Joint Committee for Guides in Metrology”. I have looked now at JCGM 100:2008 and see that it is very similar to the NIST Statistics Handbook. There is a PDF version of 100:2008 found here.

Tim Gorman

Reply to Kevin kilty

March 1, 2020 6:14 pm

Kevin,

“I am unsure what gets flogged out of whom early in one’s education, but education is not very uniform. Lots of graduate physical science educations are deficient in probability and statistics.”

Respectfully, this really isn’t an issue of probability and statistics. And engineering students *do* get false precision beat out of them pretty early.

You can take ten voltage measurements using a voltmeter that reads out to two decimal places, average the measurements together, and come up with an answer out to 16 decimal places on your calculator. At least when I was in the engineering lab the instructor would give you an F for such an answer.

I know they still teach the rules for significant digits in high school chemistry and physics. You are expected to know those rules when you get to college.

But it seems pretty apparent that the climate scientists and computer programmers doing the CGM’s either never learned the rules or are ignoring them out of convenience. Probability and statistics simply can’t create more precision than your inputs can provide. Yet the CGM’s do.

Kevin kilty

Reply to Tim Gorman

March 1, 2020 7:17 pm

Tim,

Respectfully, you have missed what I had to say, utterly.

1. I have had some 5,000 plus students in my one-half career as a college professor. I do my best to inform them about significant digits, but they persist. Through four years many of them. So, I am not sure who gets what beat out of whom. No one, to my knowledge, flunks out of engineering schools for reporting excessive precision. In fact, nowadays, you will lose a grade appeal for trying so.

2. I didn’t say this is a problem with statistics and probability per se, although it is hard to argue it is unrelated to it. Pat stated that there are too many statisticians in climate science already. My point was that there is a problem with inadequate preparation in ways through the science disciplines. Our statistics faculty recognized it among our scientist hires. I was never required to take a single course in probability and statistics through three scientific degrees. I did so on my own, but few others did. The problems with what one does not know runs all directions, and becomes critical in interdisciplinary studies, like climate science. I was reacting to something that Pat said with which someone like Nic Lewis, for example, might beg to differ.

3. I have no idea what is apparent with climate scientists. Perhaps they are not well versed in a broad range of topics, and tend to do what amateurs do. What I have noticed among the few climate science types I have met, but is just about universally true among people with science degrees and who believe deeply in climate change is this: They have mixed their work with a belief system and with their politics.

Pat Frank

Reply to Kevin kilty

March 1, 2020 7:02 pm

Thanks, Kevin. In my earliest undergrad lab courses, we were introduced to significant digits and the limits of resolution. Notions of measurement error were introduced.

At year 3, my Analytical Chemistry lab came down hard on experimental error and its propagation, as you would expect. Analytical chemists are fanatics about accounting for error.

Then the major went on to “Instrumental Methods of Analysis,” a strong physical methods lab course that required full treatment of error. Many years later, they still teach it as Chem 422.

I’d be surprised if engineering doesn’t teach treatment of error and its basic statistics in the context of the undergrad courses, even if not in a formal statistics class. Can that be true?

Thanks for posting the link to the JCGM. I should have done that.

If you’re not familiar with it, the NIST published B.N. Taylor and C.E. Kuyatt (1994) Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results, which is a good treatment as well.

Though there, too, the attention to systematic error is sparse.

Kevin kilty

Reply to Pat Frank

March 1, 2020 8:53 pm

Pat,

Thanks. I will have a look at the Taylor, Kuyatt publication. I learn something new with each treatment I examine.

I have the distinct impression that chemists do the best job with significant digits and propagation of error at present, although one of my uncles was a land surveyor and wore me out as his young assistant with his treatment of closure error. Thus, I think you are really over-estimating what other people are likely to know, even if well educated.

Engineering curricula, in my experience, vary quite a bit depending on discipline and instructor. Some may not ever see the topic, others may get a formulaic approach. For example, for a time I taught a laboratory fluids course. By the time students reached it some had gotten introduced to a sort of propagation of error through the “measurement equation.” What they did was to evaluate a measurement equation at a number of extreme values of its parameters to arrive at uncertainty envelope. The idea of a coverage factor was unknown to them. Others had no idea what I was speaking about.

The reference guide for the FE (fundamentals of engineering) exam just states an equation to handle measurement errors as what they call the Kline-McClintock eq.

$\sigma_y = \sqrt{(\frac{\partial y}{\partial x})^2 \cdot \sigma_x^2+ ... }$

without elaboration. I doubt most engineering programs teach anything especially rigorous.

Tim Gorman

Reply to Kevin kilty

March 2, 2020 5:31 am

Kevin,

I had uncertainty beaten into me in my electrical engineering courses, especially those using analog computers. You could only read the voltmeters and ammeters to a certain resolution. If you ran a simulation and got an answer, then cleared it all, and then reran the simulation you could get different answers – unless you took into consideration significant digits and uncertainty intervals. It’s why two different people could run the same simulation and get two different answers – close but not the same.

With the advent of digital computers the concept of significant digits and uncertainty has seem to gotten lost. Perhaps analog computing needs to be re-introduced into engineering labs!

Btw, it wasn’t just in analog computing that this all became apparent. Two different people measuring stall loads on a motor could come up with different answers because of the resolution of the voltmeters, ammeters, and load inducers and the inherent uncertainty of setting them. Same for using lecher wires to measure the wavelength of microwave signals (nowadays they use frequency counters but ignore their uncertainties because they are digital! ).

Carlo, Monte

Reply to Kevin kilty

March 2, 2020 7:54 am

This just happens to also be equation 10 from the JCGM GUM (Guide to the Expression of Uncertainty in Measurement)…

john cooknell

March 1, 2020 1:58 pm

Stephen Skinner

Reply to john cooknell

March 2, 2020 5:04 am

Excellent.

Rud Istvan

March 1, 2020 2:05 pm

Pat Frank, thanks for the long post rebutting the Booth critique of your work. I have a formal PhD level background in probability theory and statistics, all in the service of econometrics. I thought you were right originally, and said so with a brief non math explanation. I thought Booth was off point (and in some ways wrong by perhaps unwittingly misdefining your carefully delineated actual issues), and said so, but without explanation. You have provided an eloquent rigorous explanation. Kudos.

To state the core issues differently and logically, not mathematically (as posted here several times before), there is an inescapable basic model problem. The CFL constraint means grids are 6-7 orders of magnitude bigger than needed to properly solve phenomena like convection cells using CFD. That forces parameterization of key processes. That requires parameter tuning to best hindcast. That drags in the unknowable (yet) attribution problem. So models provably run hot in aggregate; the nonexistent tropical troposphere hotspot models produce is sufficient evidence. All the climate model kluging cannot erase that.
And you add a second inescapable problem, that the real physical uncertainty around this wrong result compounds greatly—something no amount of pseudostatistical math futzing with model results can ever fix.

Nick Stokes

Reply to Rud Istvan

March 1, 2020 2:31 pm

“The CFL constraint means grids are 6-7 orders of magnitude bigger than needed to properly solve phenomena like convection cells using CFD. That forces parameterization of key processes. That requires parameter tuning to best hindcast.”

Using CFD, as engineers do, grids are always too large to resolve turbulent eddies, including those generated by convection cells. That does not require parameter tuning to best hindcast. CFD practitioners don’t do that (they can’t). Nor do GCMs.

What it does require is an analysis of turbulent kinetic energy and its transport and dissipation.

Derg

Reply to Nick Stokes

March 1, 2020 3:34 pm

In other words Nick, “settled science” 😉

-1

Janice Moore

Reply to Derg

March 2, 2020 3:35 pm

Heh. Good one, Derg. 🙂

Dr. Chris Essex presents the actual state of the physics concisely and clearly here:

http://wattsupwiththat.com/2015/02/20/believing-in-six-impossible-things-before-breakfast-and-climate-models/

Dr. Christopher Essex, Chairman, Permanent Monitoring Panel on Climate, World Federation of Scientists, and Professor and Associate Chair, Department of Applied Mathematics, University of Western Ontario (Canada) in London, 12 February 2015
{video here on youtube: https://www.youtube.com/watch?v=19q1i-wAUpY} …

{25:17} 1. Solving the closure problem. {i.e., the “basic physics” equations have not even been SOLVED yet, e.g., the flow of fluids equation “Navier-Stokes Equations” — we still can’t even figure out what the flow of water in a PIPE would be if there were any turbulence}

-1

Rud Istvan

Reply to Nick Stokes

March 1, 2020 5:01 pm

Nick, why not post something on ‘engineering’ CFD, as you recommended to explain the validity of climate models. Come one, post what you falsely said exists. But before doing so, see my long ago technical post here on that subject.

-1

Rud Istvan

Reply to Nick Stokes

March 1, 2020 5:15 pm

Nick, so per your comment, why do all climate modelers parameter tune. You deny that they do?

-1

Nick Stokes

Reply to Rud Istvan

March 1, 2020 7:08 pm

“why do all climate modelers parameter tune. You deny that they do?”

I said they don’t parameter tune to best hindcast. And they don’t. One recent paper that describes tuning of a specific model was Mauritsen 2012. he says specifically
“The MPI‐ESM was not tuned to better fit the 20th century. In fact, we only had the capability to run the full 20th Century simulation according to the CMIP5‐protocol after the point in time when the model was frozen.”
They did use SST for a couple of decades post 1880 to tune something.

You have never shown, with evidence, that any specific GCM tuned to best hindcast. In your earlier writings, you seemed to rely on a paper by Taylor et al on CMIP5 experiment design. But that is just misunderstanding; the paper said nothing at all about tuning. The “design” just specified what the GCMs should actually investigate, so that results could be built up and compared.

Pat Frank

Reply to Nick Stokes

March 1, 2020 9:24 pm

The MPI-ESM didn’t need much further tuning because it is based upon the ECHAM5 model, which was extensively tuned (pdf).

In any case, Rud’s point is that climate models are tuned. The abstract of Mauritsen et al, (2012) fully supports Rud, by admitting that climate models are invariably tuned to produce target climates.

You were silent on that inconvenient contradiction. Your demand for a specific model as an example merely deflects from Rud’s point.

Here’s the abstract:

“During a development stage global climate models have their properties adjusted or tuned in various ways to best match the known state of the Earth’s climate system. These desired properties are observables, such as the radiation balance at the top of the atmosphere, the global mean temperature, sea ice, clouds and wind fields. The tuning is typically performed by adjusting uncertain, or even non‐observable, parameters related to processes not explicitly represented at the model grid resolution. The practice of climate model tuning has seen an increasing level of attention because key model properties, such as climate sensitivity, have been shown to depend on frequently used tuning parameters. Here we provide insights into how climate model tuning is practically done in the case of closing the radiation balance and adjusting the global mean temperature for the Max Planck Institute Earth System Model (MPI‐ESM). We demonstrate that considerable ambiguity exists in the choice of parameters, and present and compare three alternatively tuned, yet plausible configurations of the climate model. The impacts of parameter tuning on climate sensitivity was less than anticipated. (my bold)”

-1

Gerald Browning

Reply to Nick Stokes

March 1, 2020 9:50 pm

Nick,

A peer reviewed manuscript wil soon appear that mathematically proves that climate (and weathe) models are based on the wrong dynamical system of equations. And so one must ask how can they be said to be providing any result close to reality in a hindcast. The answer has been provided at climate audit using a simple exaample. If one is allowed to choose the forcing, the one can obtain any solution one wants for any time dependent system of equations even if that system is not the correct
dynamical system. That is exactly what the climate modelers have done.

Jerry

-1

Pat Frank

Reply to Rud Istvan

March 1, 2020 8:40 pm

Rud, as you discussed back in 2015, engineers run physical experiments so as to derive the parameters needed to computationally reproduce the measured behavior over the entire range of operational limits.

The parameters are incorporated into their engineering model. This calibrates the model to produce accurate simulations of system behavior within their specification region.

They interpolate to predict the behavior of their system between the experimental points.

I’d expect good engineers to run further experiments at selected interpolation points as well, to verify that their engineering model correctly predicted behavior in the calibration region.

But engineering models are not reliable outside their calibration region.

Leo Smith had a great post on this topic, on-the-futility-of-climate-models-simplistic-nonsense/ back in 2015, which engendered lots of intense commentary by bona-fide engineers. Pretty much all of them expressed considerable critical disapproval of climate models.

-1

Joe Campbell

Reply to Pat Frank

March 2, 2020 9:12 am

Frank: “But engineering models are not reliable outside their calibration region.” Which is why, as a modeler, I have said in several comment areas that, in general, one can trust well-anchored numerical models to interpolate between known data points, but extrapolated beyond only with extreme caution…

David Dibbell

Reply to Rud Istvan

March 2, 2020 4:40 am

Rud, there is recent support for your point, “That requires parameter tuning to best hindcast.” Here is an instance of modelers explicitly tuning to obtain and report improved hindcast results. In my view, it doesn’t get any clearer than this. This is a link to Zhao et al, 2018a, in which simulation characteristics of GFDL’s AM4.0/LM4.0 components of CM4.0 are reported. At this link, one may search “tuning” and see Figure 5 and nearby.

https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2017MS001208

“We emphasize especially that the reduction of global mean bias in OLR from AM2.1 (-2.5 W/m^2) and AM3 (-4.1 W/m^2) to AM4.0 (-0.6 W/m^2) is due to the explicit tuning of AM4.0 OLR toward the value (239.6 W m−2) in CERES‐EBAF‐ed2.8.” [dd re-formatted within the parentheses to read properly here]

john cooknell

March 1, 2020 2:05 pm

Jim Butts

March 1, 2020 2:14 pm

This was not even worth the scroll down to see the comments.

MPassey

March 1, 2020 2:29 pm

Pat Frank

Does the following get to the gist of what you are saying?

AR5 presents a graph of multiple model runs, with a mean and bands at the 95% boundary. They use the term “95 percent confidence interval” for the model runs within the bands. In ordinary study design the term “95 percent confidence interval” is used when sampling a population, such that the derived mean and standard deviation produces a confidence interval (i.e. probability density function) that indicates the statistical likelihood that the sample mean distribution matches the true population value.

Climate modelers erroneously use the term “95 percent confidence interval” to indicate the probability that the model mean matches future temperature, i.e. there is a 95% chance that future temperature will be within the bands. In reality, the only thing they have shown is that is that there is a 95% probability that the next model run will be within the bands.

Clyde Spencer

Reply to MPassey

March 1, 2020 4:36 pm

MPassey
I agree completely with your interpretation in your last paragraph.

Clyde Spencer

Reply to Clyde Spencer

March 2, 2020 8:41 am

While the 95% probability envelope characterizes the variance (and hence, precision — low) of the ensemble runs, it begs the question of the accuracy of the mean. That is, we can say something about the repeatability of the multiple runs, and probability of future runs; however, without comparison of the post-calibration mean to actual measurements, we don’t have a quantitative measure of accuracy. If the variance envelope is large enough, it will encompass reality. Unfortunately, that isn’t very useful for predicting what we can expect in the future unless we can say, with confidence, what the standard deviation is of the run(s) that do correspond to reality! It is necessary to calculate the bias and slope of the mean and correct the results accordingly. Climatologists routinely adjust historical temperature data, why not future data? It isn’t necessary to wait 30 years (Although, we do have Hansen’s predictions!). A decade should give us a fair idea of whether the bias and slope of the mean are even reasonable. Indeed, a decade out can be expected to be more accurate than three decades.

Tim Gorman

Reply to Clyde Spencer

March 4, 2020 3:15 pm

Clyde,

“That is, we can say something about the repeatability of the multiple runs, and probability of future runs; however, without comparison of the post-calibration mean to actual measurements, we don’t have a quantitative measure of accuracy. If the variance envelope is large enough, it will encompass reality.”

Sorry, I missed this earlier. It’s pretty astute.

Clyde Spencer

Reply to Tim Gorman

March 7, 2020 9:57 pm

Tim
Thanks! I was once astutent! 🙂

Carlo, Monte

Reply to MPassey

March 1, 2020 4:36 pm

Assuming the models are in fact merely linear extrapolations of CO2 content, the variation graphs versus time published in the IPCC reports are a reflection of the operators’ opinions about what future CO2 concentrations will be. And as such, they are most certainly not random samples of identical quantities. In addition, what they call a 95% CI is merely the standard deviation of the average of all the models at a certain point of time in the future, multiplied by 2. This assumes there is a normal distribution about a mean of the different lines. It is meaningless.

Tim Gorman

Reply to MPassey

March 1, 2020 6:18 pm

MPassey,

You are correct!

Pat Frank

Reply to MPassey

March 1, 2020 6:29 pm

MPassey, you’ve pretty much got it right.

Nick Stokes

Reply to MPassey

March 1, 2020 9:33 pm

“AR5 presents a graph of multiple model runs, with a mean and bands at the 95% boundary. They use the term “95 percent confidence interval” for the model runs within the bands.”

As so often, no quote, no link, and it just isn’t true. Doesn’t seem to bother anyone. The reference is presumably to Fig TS14, also 11.25. They show a spaghetti plot and refer to a 5 to 95% range, which is just by count of model runs. They don’t use the term “95 percent confidence interval”.

Clyde Spencer

Reply to Nick Stokes

March 2, 2020 8:55 am

Stokes
“The reference is presumably to Fig TS14, also 11.25.” How about a link or citation, which you complained that others rarely use?

Now, if 90% of the multi-thousand runs fall into the displayed range, and you deny that they represent uncertainty of the mean, just what do they represent? Why are they displayed? This is more of your typical sophistry. If you take thousands of sample measurements, and create a probability distribution graph, the +/- standard deviations are commonly accepted as the uncertainty of the central measure. What is different other than the mode of display?

-1

Paul Penrose

Reply to Nick Stokes

March 2, 2020 10:46 am

You keep hand waving like that, Nick, and you are going to injure yourself!

-1

MPassey

Reply to Nick Stokes

March 2, 2020 11:12 am

Nick Stokes

Figure SPM6 shows graphs of temperature anomaly vs models with the explanation: “Model results shown are Coupled Model Intercomparison Project Phase 5 (CMIP5) multi-model ensemble ranges, with shaded bands indicating the 5 to 95% confidence intervals.”

Maybe it’s not a big deal because it’s in the Summary for Policymakers?? But here is just one example of why this misuse of the phrase “confidence interval” is important. Steven Novella runs the Neurologica Blog where he criticizes all kind of anti-science—- vaccines, GMOs, global warming, etc. He calls himself a science communicator. When it comes to climate change, he constantly invokes the IPCC consensus against all criticisms. So, for example, in a 2/2/2018 post on carbon capture he writes:

“Climate scientists have gone beyond just establishing that AGW is happening. They are trying to quantify it and project the trend lines into the future. This type of effort is always fraught with uncertainty, with the error bars increasing with greater time into the future. However, we can take a 95% confidence interval and make reasonable extrapolations of what is likely to happen.”

So, here is a smart guy, with enough knowledge of statistics to know the term “confidence interval”, who believes that the IPCC can quantify temperature trends into the future inside a 95% confidence interval band.

-1

jorgekafkazar

March 1, 2020 2:40 pm

“Rich’s emulator equation (1ʀ) is therefore completely arbitrary. It’s merely a formal construct that he likes, but is lacking any topical relevance or analytical focus.”

We’d call that ex posterior.

Nick Stokes

March 1, 2020 2:53 pm

“Let’s see if that is correct.
…
Compare eqn. (1) to eqn. (2ʀ). They are not identical.”

They are identical, except for the start point of the summation. Rich has simply given the standard convolution solution to a first order linear recurrence relation. It is well understood than in such convolutions you can equally exchange the adding index.
Sum (i=0 to t) a(t-i)b(i)
is the same as
Sum (i=0 to t) a(i)b(t-i)
I think Rich should have written the summation starting from i=1. That is a minor point.

Tim Gorman

Reply to Nick Stokes

March 1, 2020 7:06 pm

Nick,

“They are identical, except for the start point of the summation.”

Huh?

W(t₁) = (1-a)W(0) + R₁(t1) + R₂(t1) -rR₃(t1)

W(t₁) = (1-a)W(0)+(1-a)[R₁(0)+R₂(0) -rR₃(0)]

are *not* identical. It’s not just an issue of starting point.

-1

Pat Frank

Reply to Tim Gorman

March 2, 2020 1:11 pm

You’re right, Tim.

Rich’s t = i = 1 includes (1-a)*(Rn), which is not correct.

It should match his eqn. 1ʀ, but does not.

-1

Prjindigo

March 1, 2020 3:08 pm

Sorry, Pat, you’ve failed at the start.

You called the fudged linear-progressions the warmists display “models” when they do not qualify as such, they are statistical constructs based on no functioning theory just an assumption that temperature increases follow CO2 concentration increases. We know the relationship between atmospheric CO2 and temperature is non-linear, which makes their charts garbage to begin with.

Basically they’re little more than charts of margin of error.

Rick C PE

March 1, 2020 3:12 pm

I’ve now read through Pat’s post twice and it is quite complicated in some areas, but it seems to be mostly necessary to respond fully to Richard Booth’s critique. I do find Pat’s defense to be sound. The underlying issue is clearly the difference between error and uncertainty. Error is, of course, a concept that statisticians should understand. However, Uncertainty of Measurement is a subject studied in metrology and engineering. Not many statisticians spend time designing or using measurement instruments.

I took a lot of math courses including probability and statistics. I did not encounter Measurement Uncertainty until I started working in laboratories. The fact is no one worried much about MU outside of national standards bureaus (e.g. NIST), physicists, electrical engineers and some Quality Control specialists. The global adoption of ISO 17025 as the basis for accrediting laboratories and calibration agencies in the 1990’s included a requirement to determine and clearly state Measurement Uncertainty for all certified measurements and calibrations. It took more than a decade for this to become common practice and it is still frequently misapplied or ignored.

Philo

Reply to Rick C PE

March 1, 2020 7:18 pm

That’s funny. My freshman Physics class had a lab that basically spent the whole semester learning about Uncertainty of Measurement. For example, using a 1 ft ruler measure a 10 ft bench. The result was obviously 10 ft +/1 the estimated error in the ruler. It was a requirement to report say 10 measures of the bemch and estimate how much of the error came from the ruler and how much came from how it was used- the errors always had to be stacked, adding together all the calculated error for each time the bench was measured. That was the total possible error.

It was a good demonstration of how hard reliable measurements can sometimes be.

Kip Hansen

Editor

March 1, 2020 3:46 pm

On Thermometers: I did a Station Survey for the Surface Stations Project. I did Santo Domingo, Dominican Republic. I spoke to the emeritus national meteorologist — he was no longer paid, but still turned up for work each day at the office of the national meteorological station. He took it upon himself to show my wife and I the currently in service Stevenson screen with its glass hi-lo thermometers. (They had had an electronic weather stations for a few years, but a hurricane had blown it away).

He showed me the system — each day at a certain time (1o am I think) one of the junior weathermen would come out to the screen, bringing the log book, open the screen, and read the thermometers — record the values, and push the reset button. The log book showed that all readings were done to 0.5 degrees. All looked straight-forward, until I noticed the concrete block on the ground next to the screen.

I asked about the block…”Oh, : he said, “that’s for the short guys…..” Me: “Huh?” Him: “Yes, the short ones have to stand on the block to be at eye level to read the thermometer….or they get a false reading…of course, none of them will … it is embarrassing, so many readings are off a degree or so…:

True story….

Tim Gorman

Reply to Kip Hansen

March 1, 2020 6:28 pm

Kip,

And yet we are to believe that the CGM’s can take inputs +/- 1deg and come out with a precision of 0.01deg or better for a global average!

-1

Nick Stokes

Reply to Tim Gorman

March 1, 2020 9:22 pm

“we are to believe that the CGM’s can take inputs +/- 1deg “
GCMs do not take surface temperatures as inputs at all. So these stories are irrelevant.

Phoenix44

Reply to Nick Stokes

March 2, 2020 2:04 am

So their starting conditions don’t bother to start with starting conditions? They generate the starting surface temperatures directly from physical processes somehow?

I think not.

-1

Tim Gorman

Reply to Nick Stokes

March 2, 2020 5:42 am

Nick,

Did you see the word “surface” in my reply anywhere?

If the inputs to CGM’s are not temperature related then exactly what are they? And what are their resolution in significant digits and what are their uncertainty interval?

-1

Kevin kilty

Reply to Kip Hansen

March 1, 2020 6:31 pm

Very reminiscent of Irving Langmuir’s discussion with Joseph Rhine about his experiments with ESP. There was a file in the office holding the results of people who didn’t like him and reported results too low….

Clyde Spencer

Reply to Kip Hansen

March 1, 2020 8:08 pm

Let it not be me who calls them block heads.

David Dibbell

March 1, 2020 3:53 pm

Pat Frank,
Your previous posts on this topic, and Richard Booth’s post on February 7th, attracted extensive comments referring to CFD (Computational Fluid Dynamics). The point of those comments, generally, seemed to be to describe GCM’s as a special case of CFD. Perhaps there will be similar comments here at this new post.

The purpose of my comment here is to make reference to this web site linked below. “Tutorial on CFD Verification and Validation”.
https://www.grc.nasa.gov/www/wind/valid/tutorial/tutorial.html

In this section of the tutorial, “Uncertainty and Error in CFD Simulations”, I find support for the distinction you consistently emphasize between uncertainty and error.
https://www.grc.nasa.gov/www/wind/valid/tutorial/errors.html

One more reference from this website, “Validation Assessment.”
https://www.grc.nasa.gov/www/wind/valid/tutorial/valassess.html

In this section, “Applying the code to flows beyond the region of validity is termed prediction.” This is the case for the use of GCM’s for projecting future air temperatures in response to greenhouse gas forcing, as the tuning is based on hindcasts (with historical estimated forcings) and on longer term “preindustrial control” simulations (with no change in greenhouse gas or other anthropogenic forcings.) Anything beyond those cases is prediction.

So as I see it, references to the accepted use of CFD simply reinforce the relevant questions about uncertainty and error. If a CFD simulation is proposed for prediction, a method for conditioning of the model output for calibration error would have to be applied, to estimate its reliability.

David Dibbell

Reply to David Dibbell

March 2, 2020 5:00 am

I’m replying to my own comment here to clarify that I don’t regard any GCM as having been “validated,” or in any sense confirmed valid for diagnosing or projecting the impact of greenhouse gas emissions, by hindcasting.

1 2 3 Next »

wpDiscuz

Share this:

Related Posts

The Model That Works

Unverified and Unvalidated

Are Climate Models “Just Physics”?

SSP5-8.5: Garbage In, Doomcasting Out