Why Roy Spencer’s Criticism is Wrong

12 October 2019

Pat Frank

A bit over a month ago, I posted an essay on WUWT here about my paper assessing the reliability of GCM global air temperature projections in light of error propagation and uncertainty analysis, freely available here.

Four days later, Roy Spencer posted a critique of my analysis at WUWT, here as well as at his own blog, here. The next day, he posted a follow-up critique at WUWT here. He also posted two more critiques on his own blog, here and here.

Curiously, three days before he posted his criticisms of my work, Roy posted an essay titled “The Faith Component of Global Warming Predictions,” here. He concluded that “[climate modelers] have only demonstrated what they assumed from the outset. They are guilty of ‘circular reasoning’ and have expressed a ‘tautology.’”

Roy concluded, “I’m not saying that increasing CO₂ doesn’t cause warming. I’m saying we have no idea how much warming it causes because we have no idea what natural energy imbalances exist in the climate system over, say, the last 50 years. … Thus, global warming projections have a large element of faith programmed into them.”

Roy’s conclusion is pretty much a re-statement of the conclusion of my paper, which he then went on to criticize.

In this post, I’ll go through Roy’s criticisms of my work and show why and how every single one of them is wrong.

So, what are Roy’s points of criticism?

He says that:

1) My error propagation predicts huge excursions of temperature.

2) Climate Models Do NOT Have Substantial Errors in their TOA Net Energy Flux

3) The Error Propagation Model is Not Appropriate for Climate Models

I’ll take these in turn.

This is a long post. For those wishing just the executive summary, all of Roy’s criticisms are badly misconceived.

1) Error propagation predicts huge excursions of temperature.

Roy wrote, “Frank’s paper takes an example known bias in a typical climate model’s longwave (infrared) cloud forcing (LWCF) and assumes that the typical model’s error (+/-4 W/m2) in LWCF can be applied in his emulation model equation, propagating the error forward in time during his emulation model’s integration. The result is a huge (as much as 20 deg. C or more) of resulting spurious model warming (or cooling) in future global average surface air temperature (GASAT).” (my bold)

For the attention of Mr. And then There’s Physics, and others, Roy went on to write this: “The modelers are well aware of these biases [in cloud fraction], which can be positive or negative depending upon the model. The errors show that (for example) we do not understand clouds and all of the processes controlling their formation and dissipation from basic first physical principles, otherwise all models would get very nearly the same cloud amounts.” No more dismissals of root-mean-square error, please.

Here is Roy’s Figure 1, demonstrating his first major mistake. I’ve bolded the evidential wording.

[Roy’s Figure 1]

Roy’s blue lines are not air temperatures emulated using equation 1 from the paper. They do not come from eqn. 1, and do not represent physical air temperatures at all.

They come from eqns. 5 and 6, and are the growing uncertainty bounds in projected air temperatures. Uncertainty statistics are not physical temperatures.

Roy misconceived his ±2 Wm-2 as a radiative imbalance. In the proper context of my analysis, it should be seen as a ±2 Wm-2 uncertainty in long wave cloud forcing (LWCF). It is a statistic, not an energy flux.

Even worse, were we to take Roy’s ±2 Wm-2 to be a radiative imbalance in a model simulation, one that produces an excursion in simulated air temperature (which is Roy’s meaning), we would then have to suppose the imbalance is both positive and negative at the same time, i.e., a ±radiative forcing.

A ±radiative forcing does not alternate between +radiative forcing and -radiative forcing. Rather it is both signs together at once.

So, Roy’s interpretation of LWCF ±error as an imbalance in radiative forcing requires simultaneous positive and negative temperatures.

Look at Roy’s Figure. He represents the emulated air temperature to be a hot house and an ice house simultaneously; both +20 C and -20 C coexist after 100 years. That is the nonsensical message of Roy’s blue lines, if we are to assign his meaning that the ±2 Wm-2 is radiative imbalance.

That physically impossible meaning should have been a give-away that the basic supposition was wrong.

The ± is not, after all, one or the other, plus or minus. It is coincidental plus and minus, because it is part of a root-mean-square-error (rmse) uncertainty statistic. It is not attached to a physical energy flux.

It’s truly curious. More than one of my reviewers made the same naive mistake of taking a ±C uncertainty to be a physically real +C or -C. This one, for example, which is quoted in the Supporting Information: “[The author’s error propagation is not] physically justifiable. (For instance, even after forcings have stabilized, [the author’s] analysis would predict that the models will swing ever more wildly between snowball and runaway greenhouse states. Which, it should be obvious, does not actually happen.)”

Any understanding of uncertainty analysis is clearly missing.

Likewise, this first part of Roy’s point 1 is completely misconceived.

Next mistake in the first criticism: Roy says that the emulation equation does not yield the flat GCM control run line in his Figure 1.

However, emulation equation 1 would indeed give the same flat line as the GCM control runs under zero external forcing. As proof, here’s equation 1:

ΔTi(K) = fCO₂ × 33 K × [(F0 + ΣΔFi)/F0] + a

In a control run there is no change in forcing, so ΔFi = 0. The fraction in the brackets then becomes F0/F0 = 1.

The originating fCO₂ = 0.42, so that equation 1 becomes ΔTi(K) = 0.42 × 33 K × 1 + a = 13.9 C + a = constant (a = 273.1 K, or 0 C).

When an anomaly is taken, the emulated temperature change is constant zero, just as in Roy’s GCM control runs in Figure 1.
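As a numerical check on that arithmetic, here is a minimal sketch of the control-run behavior, assuming eqn. 1 has the form reconstructed above (the function and variable names are mine, not the paper’s):

```python
# Illustrative sketch of the emulation equation (eqn. 1) in a control run.
# Assumed form: dT_i(K) = fCO2 * 33 K * (F0 + sum(dF_i))/F0 + a

def emulated_T(f_co2, F0, dF_sum, a=273.1):
    """Emulated temperature per the assumed eqn. 1 form; a = 273.1 K offset."""
    return f_co2 * 33.0 * (F0 + dF_sum) / F0 + a

F0 = 33.30                             # W/m^2, as used later in this post
T_base = emulated_T(0.42, F0, 0.0)     # control run: no change in forcing
anomaly = emulated_T(0.42, F0, 0.0) - T_base

print(round(T_base - 273.1, 2))  # 13.86 C above the 0 C offset -- constant
print(anomaly)                   # 0.0 -- the flat line of the GCM control runs
```

With zero change in forcing, every step returns the same constant, so the anomaly is identically zero, matching the flat GCM control runs in Roy’s Figure 1.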

So, Roy’s first objection demonstrates three mistakes.

1) Roy mistakes a rms statistical uncertainty in simulated LWCF as a physical radiative imbalance.

2) He then mistakes a ±uncertainty in air temperature as a physical temperature.

3) His analysis of emulation equation 1 was careless.

Next, Roy’s 2): Climate Models Do NOT Have Substantial Errors in their TOA Net Energy Flux

Roy wrote, “If any climate model has as large as a 4 W/m2 bias in top-of-atmosphere (TOA) energy flux, it would cause substantial spurious warming or cooling. None of them do.”

I will now show why this objection is irrelevant.

Here, now, is Roy’s second figure, again showing the perfect TOA radiative balance of CMIP5 climate models. On the right, next to Roy’s figure, is Figure 4 from the paper showing the total cloud fraction (TCF) annual error of 12 CMIP5 climate models, averaging ±12.1%. [1]

[Left: Roy’s figure of CMIP5 TOA radiative balance. Right: Figure 4 from the paper, showing the total cloud fraction error of 12 CMIP5 models.]

Every single one of the CMIP5 models that produced the average ±12.1% simulated total cloud fraction error also featured Roy’s perfect TOA radiative balance.

Therefore, every single CMIP5 model that averaged ±4 Wm-2 in LWCF error also featured Roy’s perfect TOA radiative balance.

How is that possible? How can models maintain perfect simulated TOA balance while at the same time producing errors in long wave cloud forcing?

Off-setting errors, that’s how. GCMs are required to have TOA balance. So, parameters are adjusted within their uncertainty bounds so as to obtain that result.

Roy says so himself: “If a model has been forced to be in global energy balance, then energy flux component biases have been cancelled out, …”

Are the chosen GCM parameter values physically correct? No one knows.

Are the parameter sets identical model-to-model? No. We know that because different models produce different profiles and integrated intensities of TCF error.

This removes all force from Roy’s TOA objection. Models show TOA balance and LWCF error simultaneously.

In any case, this goes to the point raised earlier, and in the paper, that a simulated climate can be perfectly in TOA balance while the simulated climate internal energy state is incorrect.

That means that the physics describing the simulated climate state is incorrect. This in turn means that the physics describing the simulated air temperature is incorrect.

The simulated air temperature is not grounded in physical knowledge. And that means there is a large uncertainty in projected air temperature because we have no good physically causal explanation for it.

The physics can’t describe it; the model can’t resolve it. The apparent certainty in projected air temperature is a chimerical result of tuning.

This is the crux idea of an uncertainty analysis. One can get the observables right. But if the wrong physics gives the right answer, one has learned nothing and one understands nothing. The uncertainty in the result is consequently large.

This wrong physics is present in every single step of a climate simulation. The calculated air temperatures are not grounded in a physically correct theory.

Roy says the LWCF error is unimportant because all the errors cancel out. I’ll get to that point below. But notice what he’s saying: the wrong physics allows the right answer. And invariably so in every step all the way across a 100-year projection.

In his September 12 criticism, Roy gives his reason for disbelief in uncertainty analysis: “All of the models show the effect of anthropogenic CO2 emissions, despite known errors in components of their energy fluxes (such as clouds)!

“Why?

“If a model has been forced to be in global energy balance, then energy flux component biases have been cancelled out, as evidenced by the control runs of the various climate models in their LW (longwave infrared) behavior.”

There it is: wrong physics that is invariably correct in every step all the way across a 100-year projection, because large-scale errors cancel to reveal the effects of tiny perturbations. I don’t believe any other branch of physical science would countenance such a claim.

Roy then again presented the TOA radiative simulations on the left of the second set of figures above.

Roy wrote that models are forced into TOA balance. That means the physical errors that might have appeared as TOA imbalances are force-distributed into the simulated climate sub-states.

Forcing models to be in TOA balance may even make simulated climate subsystems more in error than they would otherwise be.

After observing that the forced balancing of the global energy budget is done only once, for the “multi-century pre-industrial control runs,” Roy observed that models world-wide behave similarly despite a “WIDE variety of errors in the component energy fluxes…”

Roy’s is an interesting statement, given there is nearly a factor of three difference among models in their sensitivity to doubled CO₂. [2, 3]

According to Stephens [3], “This discrepancy is widely believed to be due to uncertainties in cloud feedbacks. … Fig. 1 [shows] the changes in low clouds predicted by two versions of models that lie at either end of the range of warming responses. The reduced warming predicted by one model is a consequence of increased low cloudiness in that model whereas the enhanced warming of the other model can be traced to decreased low cloudiness.” (original emphasis)

So, two CMIP5 models show opposite trends in simulated cloud fraction in response to CO₂ forcing. Nevertheless, they both reproduce the historical trend in air temperature.

Not only that, but they’re supposedly invariably correct in every step all the way across a 100-year projection, because their large-scale errors cancel to reveal the effects of tiny perturbations.

In Stephens’ example we can see the hidden simulation uncertainty made manifest. Models reproduce calibration observables by hook or by crook, and then on those grounds are touted as able to accurately predict future climate states.

The Stephens example provides clear evidence that GCMs plain cannot resolve the cloud response to CO₂ emissions. Therefore, GCMs cannot resolve the change in air temperature, if any, from CO₂ emissions. Their projected air temperatures are not known to be physically correct. They are not known to have physical meaning.

This is the reason for the large and increasing step-wise simulation uncertainty in projected air temperature.

This obviates Roy’s point about cancelling errors. The models cannot resolve the cloud response to CO₂ forcing. Cancellation of radiative forcing errors does not repair this problem. Such cancellation (from by-hand tuning) just speciously hides the simulation uncertainty.

Roy concluded that, “Thus, the models themselves demonstrate that their global warming forecasts do not depend upon those bias errors in the components of the energy fluxes (such as global cloud cover) as claimed by Dr. Frank (above).”

Everyone should now know why Roy’s view is wrong. Off-setting errors make models similar to one another. They do not make the models accurate. Nor do they improve the physical description.

Roy’s conclusion implicitly reveals his mistaken thinking.

1) The inability of GCMs to resolve cloud response means the temperature projection consistency among models is a chimerical artifact of their tuning. The uncertainty remains in the projection; it’s just hidden from view.

2) The LWCF ±4 Wm-2 rmse is not a constant offset bias error. The ‘±’ alone should be enough to tell anyone that it does not represent an energy flux.

The LWCF ±4 Wm-2 rmse represents an uncertainty in simulated energy flux. It’s not a physical error at all.

One can tune the model to show no observable error at all over its calibration period (simulation minus observation = 0). But the physics underlying the simulation is still wrong. The causality is not revealed. The simulation conveys no information. The result is no indicator of physical accuracy. The uncertainty is not dismissed.

3) All the models making those errors are forced to be in TOA balance. Those TOA-balanced CMIP5 models make errors averaging ±12.1% in global TCF.[1] This means the GCMs cannot model cloud cover to better resolution than ±12.1%.

To minimally resolve the effect of annual CO₂ emissions, they need about 0.1% cloud resolution (see Appendix 1 below).

4) The average GCM error in simulated TCF over the calibration hindcast time reveals the average calibration error in simulated long wave cloud forcing. Even though TOA balance is maintained throughout, the correct magnitude of simulated tropospheric thermal energy flux is lost within an uncertainty interval of ±4 Wm-2.

Roy’s 3) Propagation of error is inappropriate.

On his blog, Roy wrote that modeling the climate is like modeling pots of boiling water. Thus, “[If our model] can get a constant water temperature, [we know] that those rates of energy gain and energy loss are equal, even though we don’t know their values. And that, if we run [the model] with a little more coverage of the pot by the lid, we know the modeled water temperature will increase. That part of the physics is still in the model.”

Roy continued, “the temperature change in anything, including the climate system, is due to an imbalance between energy gain and energy loss by the system.”

Roy there implied that the only way air temperature can change is by way of an increase or decrease of the total energy in the climate system. However, that is not correct.

Climate subsystems can exchange energy. Air temperature can change by redistribution of internal energy flux without any change in the total energy entering or leaving the climate system.

For example, in his testimony before the Senate Environment and Public Works Committee on 2 May 2001, Richard Lindzen noted that, “claims that man has contributed any of the observed warming (ie attribution) are based on the assumption that models correctly predict natural variability. [However,] natural variability does not require any external forcing – natural or anthropogenic.” (my bold) [4]

Richard Lindzen noted exactly the same thing in his “Some Coolness Concerning Global Warming.” [5]

“The precise origin of natural variability is still uncertain, but it is not that surprising. Although the solar energy received by the earth-ocean-atmosphere system is relatively constant, the degree to which this energy is stored and released by the oceans is not. As a result, the energy available to the atmosphere alone is also not constant. … Indeed, our climate has been both warmer and colder than at present, due solely to the natural variability of the system. External influences are hardly required for such variability to occur.” (my bold)

In his review of Stephen Schneider’s “Laboratory Earth,” [6] Richard Lindzen wrote this directly relevant observation,

“A doubling of CO₂ in the atmosphere results in a two percent perturbation to the atmosphere’s energy balance. But the models used to predict the atmosphere’s response to this perturbation have errors on the order of ten percent in their representation of the energy balance, and these errors involve, among other things, the feedbacks which are crucial to the resulting calculations. Thus the models are of little use in assessing the climatic response to such delicate disturbances. Further, the large responses (corresponding to high sensitivity) of models to the small perturbation that would result from a doubling of carbon dioxide crucially depend on positive (or amplifying) feedbacks from processes demonstrably misrepresented by models.” (my bold)

These observations alone are sufficient to refute Roy’s description of modeling air temperature in analogy to the heat entering and leaving a pot of boiling water with varying amounts of lid-cover.

Richard Lindzen’s last point, especially, contradicts Roy’s claim that cancelling simulation errors permit a reliably modeled response to forcing or accurately projected air temperatures.

Also, the situation is much more complex than Roy described in his boiling pot analogy. For example, rather than Roy’s single lid moving about, clouds are more like multiple layers of sieve-like lids of varying mesh size and thickness, all in constant motion, and none of them covering the entire pot.

The pot-modeling then proceeds with only a poor notion of where the various lids are at any given time, and without fully understanding their depth or porosity.

Propagation of error: Given an annual average +0.035 Wm-2 increase in CO₂ forcing, the increase plus uncertainty in the simulated tropospheric thermal energy flux is (0.035±4) Wm-2. All the while simulated TOA balance is maintained.

So, if one wanted to calculate the uncertainty interval for the air temperature of any single annual step, the top of the temperature uncertainty interval would be calculated from +4.035 Wm-2, while the bottom of the interval would be calculated from -3.965 Wm-2.

Putting that into the right side of paper eqn. 5.2 and setting F0=33.30 Wm-2, the single-step projection uncertainty interval in simulated air temperature is +1.68 C/-1.65 C.

The air temperature anomaly projected from the average CMIP5 GCM would, however, be 0.015 C; not +1.68 C or -1.65 C.
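A quick numerical check of that single-step interval, assuming eqn. 5.2 scales a flux into a temperature as 0.42 × 33 K × (flux/F0) (my reading of the paper’s emulation form, used here only for illustration):

```python
# Single-step uncertainty interval from the numbers in the text (sketch).
F0 = 33.30                  # W/m^2
scale = 0.42 * 33.0 / F0    # K per W/m^2, assumed eqn. 5.2 scaling

dF = 0.035                  # annual average CO2 forcing increment, W/m^2
u_lwcf = 4.0                # LWCF calibration uncertainty, W/m^2

upper = scale * (dF + u_lwcf)    # top of the uncertainty interval
lower = scale * (dF - u_lwcf)    # bottom of the interval
anomaly = scale * dF             # the projected anomaly itself

print(round(upper, 2), round(lower, 2), round(anomaly, 3))  # 1.68 -1.65 0.015
```

The projected anomaly (0.015 C) is two orders of magnitude smaller than the uncertainty interval that surrounds it.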

In the whole modeling exercise, the simulated TOA balance is maintained. Simulated TOA balance is maintained mainly because simulation error in long wave cloud forcing is offset by simulation error in short wave cloud forcing.

This means the underlying physics is wrong and the simulated climate energy state is wrong. Over the calibration hindcast region, the observed air temperature is correctly reproduced only because of curve fitting following from the by-hand adjustment of model parameters.[2, 7]

Forced correspondence with a known value does not remove uncertainty in a result, because causal ignorance is unresolved.

When error in an intermediate result is imposed on every single step of a sequential series of calculations — which describes an air temperature projection — that error gets transmitted into the next step. The next step adds its own error onto the top of the prior level. The only way to gauge the effect of step-wise imposed error is step-wise propagation of the appropriate rmse uncertainty.

Figure 3 below shows the problem in a graphical way. GCMs project temperature in a step-wise sequence of calculations. [8] Incorrect physics means each step is in error. The climate energy-state is wrong (this diagnosis also applies to the equilibrated base state climate).

The wrong climate state gets calculationally stepped forward. Its error constitutes the initial conditions of the next step. Incorrect physics means the next step produces its own errors. Those new errors add onto the entering initial condition errors. And so it goes, step-by-step. The errors add with every step.

When one is calculating a future state, one does not know the sign or magnitude of any of the errors in the result. This ignorance follows from the obvious difficulty that there are no observations available from a future climate.

The reliability of the projection then must be judged from an uncertainty analysis. One calibrates the model against known observables (e.g., total cloud fraction). By this means, one obtains a relevant estimate of model accuracy; an appropriate average root-mean-square calibration error statistic.

The calibration error statistic informs us of the accuracy of each calculational step of a simulation. When inaccuracy is present in each step, propagation of the calibration error metric is carried out through each step. Doing so reveals the uncertainty in the result — how much confidence we should put in the number.

When the calculation involves multiple sequential steps each of which transmits its own error, then the step-wise uncertainty statistic is propagated through the sequence of steps. The uncertainty of the result must grow. This circumstance is illustrated in Figure 3.

Figure 3: Growth of uncertainty in an air temperature projection. The base-state climate has an initial forcing, F0, which may be zero, and an initial temperature, T0. The final temperature Tn is conditioned by the final uncertainty ±e_t, as Tn±e_t.

Step one projects a first-step forcing F1, which produces a temperature T1. Incorrect physics introduces a physical error in temperature, e1, which may be positive or negative. In a projection of future climate, we do not know the sign or magnitude of e1.

However, hindcast calibration experiments tell us that single projection steps have an average uncertainty of ±e.

T1 therefore has an uncertainty of ±e, i.e., T1±e.

The step one temperature plus its physical error, T1+e1, enters step 2 as its initial condition. But T1 had an error, e1. That e1 is an error offset of unknown sign in T1. Therefore, the incorrect physics of step 2 receives a T1 that is offset by e1. But in a futures-projection, one does not know the value of T1+e1.

In step 2, incorrect physics starts with the incorrect T1 and imposes new unknown physical error e2 on T2. The error in T2 is now e1+e2. However, in a futures-projection the sign and magnitude of e1, e2 and their sum remain unknown.

And so it goes; step 3, …, n add in their errors e3 +, …, + en. But in the absence of knowledge concerning the sign or magnitude of the imposed errors, we do not know the total error in the final state. All we do know is that the trajectory of the simulated climate has wandered away from the trajectory of the physically correct climate.

However, the calibration error statistic provides an estimate of the uncertainty in the results of any single calculational step, which is ±e.

When there are multiple calculational steps, ±e attaches independently to every step. The predictive uncertainty increases with every step because the ±e uncertainty gets propagated through those steps to reflect the continuous but unknown impact of error. Propagation of calibration uncertainty goes as the root-sum-square (rss). For ‘n’ steps that is sqrt(n × e²) = ±e√n. [9-11]
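The root-sum-square growth can be sketched numerically. Here the per-step uncertainty ±e is taken as the annual-step temperature uncertainty implied by the ±4 Wm-2 LWCF statistic, under the same assumed eqn. 5.2 scaling used elsewhere in this post:

```python
import math

# Root-sum-square propagation of a constant per-step uncertainty (sketch).
def propagated_uncertainty(e_step, n_steps):
    """sqrt of the sum of n equal squared terms = e_step * sqrt(n)."""
    return math.sqrt(sum(e_step ** 2 for _ in range(n_steps)))

e = 0.42 * 33.0 * 4.0 / 33.30   # ~1.66 C per annual step (assumed scaling)
for n in (1, 25, 100):
    print(n, round(propagated_uncertainty(e, n), 1))
# grows as sqrt(n): ~1.7 C after 1 step, ~8.3 C after 25, ~16.6 C after 100
```

Note that the statistic grows without bound as steps accumulate; it is a confidence bound, not a predicted temperature.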

It should be very clear to everyone that the rss equation does not produce physical temperatures, or the physical magnitudes of anything else. It is a statistic of predictive uncertainty that necessarily increases with the number of calculational steps in the prediction. A summary of the uncertainty literature was posted in the comments of my original post, here.

The growth of uncertainty does not mean the projected air temperature becomes huge. Projected temperature is always within some physical bound. But the reliability of that temperature — our confidence that it is physically correct — diminishes with each step. The level of confidence is the meaning of uncertainty. As confidence diminishes, uncertainty grows.

Supporting Information Section 10.2 discusses uncertainty and its meaning. C. Roy and J. Oberkampf (2011) describe it this way, “[predictive] uncertainty [is] due to lack of knowledge by the modelers, analysts conducting the analysis, or experimentalists involved in validation. The lack of knowledge can pertain to, for example, modeling of the system of interest or its surroundings, simulation aspects such as numerical solution error and computer roundoff error, and lack of experimental data.” [12]

The growth of uncertainty means that with each step we have less and less knowledge of where the simulated future climate is, relative to the physically correct future climate. Figure 3 shows the widening scope of uncertainty with the number of steps.

Wide uncertainty bounds mean the projected temperature reflects a future climate state that is some completely unknown distance from the physically real future climate state. One’s confidence is minimal that the simulated future temperature is the ‘true’ future temperature.

This is why propagation of uncertainty through an air temperature projection is entirely appropriate. It is our only estimate of the reliability of a predictive result.

Appendix 1 below shows that the models need to simulate clouds to about ±0.1% accuracy, about 100 times better than the ±12.1% they now achieve, in order to resolve any possible effect of CO₂ forcing.

Appendix 2 quotes Richard Lindzen on the utter corruption and dishonesty that pervades AGW consensus climatology.

Before proceeding, here’s NASA on clouds and resolution: “A doubling in atmospheric carbon dioxide (CO2), predicted to take place in the next 50 to 100 years, is expected to change the radiation balance at the surface by only about 2 percent. … If a 2 percent change is that important, then a climate model to be useful must be accurate to something like 0.25%. Thus today’s models must be improved by about a hundredfold in accuracy, a very challenging task.”

That hundred-fold is exactly the message of my paper.

If climate models cannot resolve the response of clouds to CO₂ emissions, they cannot possibly project the impact of CO₂ emissions on air temperature with any accuracy.

The ±4 Wm-2 uncertainty in LWCF is a direct reflection of the profound ignorance surrounding cloud response.

The CMIP5 LWCF calibration uncertainty reflects ignorance concerning the magnitude of the thermal flux in the simulated troposphere that is a direct consequence of the poor ability of CMIP5 models to simulate cloud fraction.

From page 9 in the paper, “This climate model error represents a range of atmospheric energy flux uncertainty within which smaller energetic effects cannot be resolved within any CMIP5 simulation.

The 0.035 Wm-2 annual average CO₂ forcing is exactly such a smaller energetic effect.

It is impossible to resolve the effect on air temperature of a 0.035 Wm-2 change in forcing, when the model cannot resolve overall tropospheric forcing to better than ±4 Wm-2.

The perturbation is roughly 114 times smaller than the lower limit of resolution of a CMIP5 GCM.

The uncertainty interval can be appropriately analogized as the smallest simulation pixel size. It is the blur level. It is the ignorance width within which nothing is known.

Uncertainty is not a physical error. It does not subtract away. It is a measure of ignorance.

The model can produce a number. When the physical uncertainty is large, that number is physically meaningless.

All of this is discussed in the paper, and in exhaustive detail in Section 10 of the Supporting Information. It’s not as though that analysis is missing or cryptic. It is pretty much invariably un-consulted by my critics, however.

Smaller strange and mistaken ideas:

Roy wrote, “If a model actually had a +4 W/m2 imbalance in the TOA energy fluxes, that bias would remain relatively constant over time.”

But the LWCF error statistic is ±4 Wm-2, not (+)4 Wm-2 imbalance in radiative flux. Here, Roy has not only misconceived a calibration error statistic as an energy flux, but has facilitated the mistaken idea by converting the ± into (+).

This mistake is also common among my prior reviewers. It allowed them to assume a constant offset error. That in turn allowed them to assert that all error subtracts away.

This assumption of perfection after subtraction is a folk-belief among consensus climatologists. It is refuted right in front of their eyes by their own results (Figure 1 in [13]), but that never seems to matter.

Another example includes Figure 1 in the paper, which shows simulated temperature anomalies. They are all produced by subtracting away a simulated climate base-state temperature. If the simulation errors subtracted away, all the anomaly trends would be superimposed. But they’re far from that ideal.

Figure 4 shows a CMIP5 example of the same refutation.


Figure 4: RCP8.5 projections from four CMIP5 models.

Model tuning has made all four projection anomaly trends close to agreement from 1850 through 2000. However, after that the models career off on separate temperature paths. By projection year 2300, they range across 8 C. The anomaly trends are not superimposable; the simulation errors have not subtracted away.

The idea that errors subtract away in anomalies is objectively wrong. The uncertainties that are hidden in the projections after year 2000, by the way, are also in the projections from 1850-2000 as well.

This is because the projections of the historical temperatures rest on the same wrong physics as the futures projection. Even though the observables are reproduced, the physical causality underlying the temperature trend is only poorly described in the model. Total cloud fraction is just as wrongly simulated for 1950 as it is for 2050.

LWCF error is present throughout the simulations. The average annual ±4 Wm-2 simulation uncertainty in tropospheric thermal energy flux is present throughout, putting uncertainty into every simulation step of air temperature. Tuning the model to reproduce the observables merely hides the uncertainty.

Roy wrote, “Another curious aspect of Eq. 6 is that it will produce wildly different results depending upon the length of the assumed time step.”

But, of course, eqn. 6 does not produce wildly different results, because the calibration uncertainty scales with the length of the GCM time step.

For example, we can estimate the average per-day uncertainty from the ±4 Wm-2 annual average calibration of Lauer and Hamilton.

So, for the entire year, (±4 Wm-2)² = the sum over 365 days of eᵢ², where eᵢ is the per-day uncertainty. This yields eᵢ = ±4/√365 = ±0.21 Wm-2 for the estimated LWCF uncertainty per average projection day. If we put the daily estimate into the right side of equation 5.2 in the paper and set F0=33.30 Wm-2, then the one-day per-step uncertainty in projected air temperature is ±0.087 C. The total uncertainty after 100 years is sqrt[(0.087)² × 365 × 100] = ±16.6 C.

The same approach yields an estimated 25-year mean model calibration uncertainty of sqrt[(±4 Wm-2)² × 25] = ±20 Wm-2. Following from eqn. 5.2, the 25-year per-step uncertainty is ±8.3 C. After 100 years (four 25-year steps) the uncertainty in projected air temperature is sqrt[(±8.3)² × 4] = ±16.6 C.
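The time-step invariance claimed here can be verified arithmetically. This sketch assumes the annual ±4 Wm-2 statistic partitions (to daily) and aggregates (to 25-year) in quadrature, as in the text, with the same assumed eqn. 5.2 scaling:

```python
import math

F0 = 33.30
scale = 0.42 * 33.0 / F0   # K per W/m^2 (assumed eqn. 5.2 scaling)
u_annual = 4.0             # W/m^2, annual LWCF calibration uncertainty
years = 100

def uncertainty_100yr(step_years):
    """100-year uncertainty using steps of step_years (>= 1 year)."""
    u_step_flux = u_annual * math.sqrt(step_years)  # quadrature aggregation
    n_steps = years / step_years
    return scale * u_step_flux * math.sqrt(n_steps)

# Daily steps: partition the annual statistic down, then propagate 36500 steps.
daily = scale * (u_annual / math.sqrt(365)) * math.sqrt(365 * years)

print(round(daily, 1))                  # 16.6
print(round(uncertainty_100yr(1), 1))   # 16.6
print(round(uncertainty_100yr(25), 1))  # 16.6
```

Daily, annual, and 25-year steps all propagate to the same ±16.6 C after 100 years, because the √(step length) aggregation and the √(number of steps) propagation cancel.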

Roy finished with, “I’d be glad to be proved wrong.

Be glad, Roy.

Appendix 1: Why CMIP5 error in TCF is important.

We know from Lauer and Hamilton that the average CMIP5 ±12.1% annual total cloud fraction (TCF) error produces an annual average ±4 Wm-2 calibration error in long wave cloud forcing. [14]

We also know that the annual average increase in CO₂ forcing since 1979 is about 0.035 Wm-2 (my calculation).

Assuming a linear relationship between cloud fraction error and LWCF error, the ±12.1% CF error is proportionately responsible for ±4 Wm-2 annual average LWCF error.

Then one can estimate the level of resolution necessary to reveal the annual average cloud fraction response to CO₂ forcing as:

(0.035 Wm-2 / ±4 Wm-2) × ±12.1% total cloud fraction = ±0.11% change in cloud fraction.

This indicates that a climate model needs to be able to accurately simulate a 0.11% feedback response in cloud fraction to barely resolve the annual impact of CO₂ emissions on the climate. For accurate simulation, the model resolution should be ten times smaller than the effect to be resolved. That means 0.011% accuracy in simulating the annual average TCF.

That is, the cloud feedback to a 0.035 Wm-2 annual CO₂ forcing needs to be known, and able to be simulated, to a resolution of 0.11% in TCF in order to minimally know how clouds respond to annual CO₂ forcing.

Here’s an alternative way to get at the same information. We know the total tropospheric cloud feedback effect is about -25 Wm-2. [15] This is the cumulative influence of 67% global cloud fraction.

The annual tropospheric CO₂ forcing is, again, about 0.035 Wm-2. The CF equivalent that produces this feedback energy flux is again linearly estimated as (0.035 Wm-2/25 Wm-2)*67% = 0.094%. That’s again bare-bones simulation. Accurate simulation requires ten times finer resolution, which is 0.0094% of average annual TCF.

Assuming the linear relations are reasonable, both methods indicate that the minimal model resolution needed to accurately simulate the annual cloud feedback response of the climate, to an annual 0.035 Wm-2 of CO₂ forcing, is about 0.1% CF.
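Both estimates are one-line arithmetic. The sketch below (Python) just restates them with the values quoted above:

```python
CO2_ANNUAL = 0.035  # W/m^2: annual average increase in CO2 forcing

# Method 1: scale the ±12.1% TCF error by the CO2-forcing / LWCF-error ratio.
LWCF_ERROR = 4.0    # ±W/m^2: annual LWCF calibration error (Lauer & Hamilton)
TCF_ERROR = 12.1    # ±%: annual total cloud fraction error
res_1 = (CO2_ANNUAL / LWCF_ERROR) * TCF_ERROR      # ≈ ±0.11 % CF

# Method 2: scale the 67% global cloud fraction by the
# CO2-forcing / total-cloud-feedback ratio.
CLOUD_FEEDBACK = 25.0  # W/m^2: magnitude of net tropospheric cloud feedback
GLOBAL_CF = 67.0       # %: global cloud fraction
res_2 = (CO2_ANNUAL / CLOUD_FEEDBACK) * GLOBAL_CF  # ≈ ±0.094 % CF

print(round(res_1, 3), round(res_2, 3))  # 0.106 0.094
```

Both routes land near 0.1% CF, which is the point of the appendix.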

To achieve that level of resolution, the model must accurately simulate cloud type, cloud distribution and cloud height, as well as precipitation and tropical thunderstorms.

This analysis illustrates the meaning of the annual average ±4 Wm-2 LWCF error. That error indicates the overall level of ignorance concerning cloud response and feedback.

The TCF ignorance is such that the annual average tropospheric thermal energy flux is never known to better than ±4 Wm-2. This is true whether forcing from CO₂ emissions is present or not.

This is true in an equilibrated base-state climate as well. Running a model for 500 projection years does not repair broken physics.

GCMs cannot simulate cloud response to 0.1% annual accuracy. It is not possible to simulate how clouds will respond to CO₂ forcing.

It is therefore not possible to simulate the effect of CO₂ emissions, if any, on air temperature.

As the model steps through the projection, our knowledge of the consequent global air temperature steadily diminishes, because a GCM cannot accurately simulate the global cloud response to CO₂ forcing, and thus the cloud feedback, at any step.

This is true at every step of a simulation. And it means that projection uncertainty compounds, because every erroneous intermediate climate state is subjected to further simulation error.

This is why the uncertainty in projected air temperature increases so dramatically. The model is step-by-step walking away from initial value knowledge, further and further into ignorance.

On an annual average basis, the uncertainty in CF feedback (±4 Wm-2) is about ±114 times larger than the 0.035 Wm-2 perturbation to be resolved.

The CF response is so poorly known, that even the first simulation step enters terra incognita.

Appendix 2: On the Corruption and Dishonesty in Consensus Climatology

It is worth quoting Lindzen on the effects of a politicized science. [16]”A second aspect of politicization of discourse specifically involves scientific literature. Articles challenging the claim of alarming response to anthropogenic greenhouse gases are met with unusually quick rebuttals. These rebuttals are usually published as independent papers rather than as correspondence concerning the original articles, the latter being the usual practice. When the usual practice is used, then the response of the original author(s) is published side by side with the critique. However, in the present situation, such responses are delayed by as much as a year. In my experience, criticisms do not reflect a good understanding of the original work. When the original authors’ responses finally appear, they are accompanied by another rebuttal that generally ignores the responses but repeats the criticism. This is clearly not a process conducive to scientific progress, but it is not clear that progress is what is desired. Rather, the mere existence of criticism entitles the environmental press to refer to the original result as ‘discredited,’ while the long delay of the response by the original authors permits these responses to be totally ignored.

A final aspect of politicization is the explicit intimidation of scientists. Intimidation has mostly, but not exclusively, been used against those questioning alarmism. Victims of such intimidation generally remain silent. Congressional hearings have been used to pressure scientists who question the ‘consensus’. Scientists whose views question alarm are pitted against carefully selected opponents. The clear intent is to discredit the ‘skeptical’ scientist from whom a ‘recantation’ is sought.”[7]

Richard Lindzen’s extraordinary account of the jungle of dishonesty that is consensus climatology is required reading. None of the academics he names as participants in chicanery deserve continued employment as scientists. [16]

If one tracks his comments from the earliest days to near the present, his growing disenfranchisement becomes painfully obvious.[4-7, 16, 17] His “Climate Science: Is it Currently Designed to Answer Questions?” is worth reading in its entirety.

References:

[1] Jiang, J.H., et al., Evaluation of cloud and water vapor simulations in CMIP5 climate models using NASA “A-Train” satellite observations. J. Geophys. Res., 2012. 117(D14): p. D14105.

[2] Kiehl, J.T., Twentieth century climate model response and climate sensitivity. Geophys. Res. Lett., 2007. 34(22): p. L22710.

[3] Stephens, G.L., Cloud Feedbacks in the Climate System: A Critical Review. J. Climate, 2005. 18(2): p. 237-273.

[4] Lindzen, R.S. (2001) Testimony of Richard S. Lindzen before the Senate Environment and Public Works Committee on 2 May 2001. URL: http://www-eaps.mit.edu/faculty/lindzen/Testimony/Senate2001.pdf Date Accessed:

[5] Lindzen, R., Some Coolness Concerning Warming. BAMS, 1990. 71(3): p. 288-299.

[6] Lindzen, R.S. (1998) Review of Laboratory Earth: The Planetary Gamble We Can’t Afford to Lose by Stephen H. Schneider (New York: Basic Books, 1997) 174 pages. Regulation, 5 URL: https://www.cato.org/sites/cato.org/files/serials/files/regulation/1998/4/read2-98.pdf Date Accessed: 12 October 2019.

[7] Lindzen, R.S., Is there a basis for global warming alarm?, in Global Warming: Looking Beyond Kyoto, E. Zedillo ed, 2006 in Press The full text is available at: https://ycsg.yale.edu/assets/downloads/kyoto/LindzenYaleMtg.pdf Last accessed: 12 October 2019, Yale University: New Haven.

[8] Saitoh, T.S. and S. Wakashima, An efficient time-space numerical solver for global warming, in Energy Conversion Engineering Conference and Exhibit (IECEC) 35th Intersociety, 2000, IECEC: Las Vegas, pp. 1026-1031.

[9] Bevington, P.R. and D.K. Robinson, Data Reduction and Error Analysis for the Physical Sciences. 3rd ed. 2003, Boston: McGraw-Hill.

[10] Brown, K.K., et al., Evaluation of correlated bias approximations in experimental uncertainty analysis. AIAA Journal, 1996. 34(5): p. 1013-1018.

[11] Perrin, C.L., Mathematics for chemists. 1970, New York, NY: Wiley-Interscience. 453.

[12] Roy, C.J. and W.L. Oberkampf, A comprehensive framework for verification, validation, and uncertainty quantification in scientific computing. Comput. Methods Appl. Mech. Engineer., 2011. 200(25-28): p. 2131-2144.

[13] Rowlands, D.J., et al., Broad range of 2050 warming from an observationally constrained large climate model ensemble. Nature Geosci, 2012. 5(4): p. 256-260.

[14] Lauer, A. and K. Hamilton, Simulating Clouds with Global Climate Models: A Comparison of CMIP5 Results with CMIP3 and Satellite Data. J. Climate, 2013. 26(11): p. 3823-3845.

[15] Hartmann, D.L., M.E. Ockert-Bell, and M.L. Michelsen, The Effect of Cloud Type on Earth’s Energy Balance: Global Analysis. J. Climate, 1992. 5(11): p. 1281-1304.

[16] Lindzen, R.S., Climate Science: Is it Currently Designed to Answer Questions?, in Program in Atmospheres, Oceans and Climate. Massachusetts Institute of Technology (MIT) and Global Research, 2009, Global Research Centre for Research on Globalization: Boston, MA.

[17] Lindzen, R.S., Can increasing carbon dioxide cause climate change? Proc. Nat. Acad. Sci., USA, 1997. 94: p. 8335-8342.

October 23, 2019 6:18 am

Hi Pat, it’s me again, thought you’d be pleased 🙂

Before writing anything substantive, I’ll ask a question about your statement:

“Off-setting errors, that’s how. GCMs are required to have TOA balance. So, parameters are adjusted within their uncertainty bounds so as to obtain that result.”

I think this may be close to the crux of the problem, so I want to understand your take on it. Let us assume, for the sake of argument, that there is one parameter ‘p’, whose adjustment gives the required TOA balance. Now, the GCM simulates across many years. Did p get adjusted at the start of the run, or on every year of the run? If the former, then how can its balancing act be effective at the end of the run, and if the latter then what mechanism in the GCM actually does that?

Rich.

Reply to  See - owe to Rich
October 23, 2019 7:11 pm

I can’t say how they do it, Rich, and it doesn’t matter to the uncertainty analysis.

October 23, 2019 6:27 am

This comment will have limited interest to most readers, and is addressed to kribaez.

kribaez, when I finally got round to studying properly your Oct 8 5:42am comment on the previous thread, I found a quicker way than yours to prove non-zero covariance in your Problem 4 (you stated, without proof, that the ΔX_i’s have lag-1 autocorrelation -0.5). It is:

Var(X_i) = 4
Var(ΔX_i) = Var(X_i-X_{i-1}) = Var(X_i) + Var(X_{i-1}) = 8
4 = Var(X_i) = Var(X_{i-1}+ΔX_i) = Var(X_{i-1}) + Var(ΔX_i) + 2Cov(X_{i-1},ΔX_i) = 12 + 2Cov(X_{i-1},ΔX_i)
Cov(X_{i-1},ΔX_i) = -4
Cor(X_{i-1},ΔX_i) = -4/sqrt(4*8) = -1/sqrt(2) = -0.7071
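The result can be checked with a quick Monte Carlo sketch (Python; it assumes, as one reading of Problem 4, that the X_i are i.i.d. with Var(X_i) = 4, e.g. Normal with s.d. 2):

```python
import math
import random

random.seed(42)
n = 200_000
# i.i.d. draws with Var(X_i) = 4 (an assumed reading of Problem 4)
x = [random.gauss(0.0, 2.0) for _ in range(n)]

# Pairs (X_{i-1}, dX_i) with dX_i = X_i - X_{i-1}
prev = x[:-1]
dx = [x[i] - x[i - 1] for i in range(1, n)]

def mean(v):
    return sum(v) / len(v)

mp, md = mean(prev), mean(dx)
cov = sum((a - mp) * (b - md) for a, b in zip(prev, dx)) / (n - 2)
var_p = sum((a - mp) ** 2 for a in prev) / (n - 2)
var_d = sum((b - md) ** 2 for b in dx) / (n - 2)
cor = cov / math.sqrt(var_p * var_d)

print(round(cov, 1), round(cor, 3))  # ≈ -4.0 and ≈ -0.707
```

The sample values reproduce Cov(X_{i-1},ΔX_i) = -4 and Cor = -1/sqrt(2) derived above.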

So the proper error propagation all comes down to how the models, in approximating reality, derive T_i from T_{i-1}. I previously wrote an equation

M(t) = a M(t-1) + B(t;z,s)

Pat has a = 1, and your Problem 4 has a = 0, which Pat can reasonably argue is hugely different. In a future comment I hope to explore intermediate values of a.

Reply to  See - owe to Rich
October 23, 2019 7:23 pm

Rich, “So the proper error propagation all comes down to how the models, in approximating reality, derive T_i from T_{i-1}. I previously wrote an equation …

Models do not derive T_i from T_{i-1}. They derive all T’s from the forcing inputs. Any final T_n is the linear sum T_n = T_0 + sum_{i=1}^{n} ΔT_i, where each ΔT_i is derived from its ΔF_i.

However models do it, T_0 -> T_n comes out as a linear extrapolation of fractional change in inputted forcing. Whatever models do inside is irrelevant. The outputs are linear with inputs. Proper uncertainty analysis does not require knowing more than that.

David Steele
October 24, 2019 10:39 am

I am absolutely not knowledgeable but I greatly appreciate your patience and your striving for clarity.

Ulises
Reply to  David Steele
October 25, 2019 2:54 am

David,

how, with your stated level of background, can you then assess clarity ?

Reply to  Ulises
October 25, 2019 7:17 am

It might be a bit like art: “I know what I like when I see it”.

Am I allowed to believe that his appreciation is directed towards me? He could be referring to Pat, as his 7:23pm comment falls between mine and David’s.

October 25, 2019 7:12 am

I have something very important (I believe) to say about model emulators and their error propagation, but first I want to agree with kribaez where he writes “sampled error from the input space yields the uncertainty spread in the output space…there is no magic uncertainty which is not rendered visible by M[onte]C[arlo] sampling”. Tim Gorman keeps banging on about uncertainty being a number which can therefore have no covariance with anything, but as I wrote on the previous thread with various examples to justify, uncertainty is properly a distribution of a random variable which sometimes but not always describes an “error”, and the spread, or variance, or standard deviation, of the uncertainty is sometimes taken, as in the +/-u notation, to be the “uncertainty” itself. But this is to throw away information, because having done that different u’s cannot be properly compounded except by magic.

However, in this comment, I am not going to dwell on covariance, and for now I am going to give Pat Frank a free pass on that. Instead I am going to dwell on the nature, value, and fidelity, of any climate model emulator, and introduce a new one to you for comparison. Pat has written “Models do not derive T_i from T_{i-1}. They derive all T’s from the forcing inputs.”, and while that is true for GCMs, Pat’s own emulator effectively does derive T_i from T_{i-1}.

His emulator can fairly be written, I believe, as

(1): T(t) = T(t-1) + b(f(t)-f(t-1)) + U(t)

where T(t) is a good emulator of the mean of an ensemble of GCMs, b is a constant, f(t) is the total and known anomaly in GHG forcing at time t, and U(t) is an error term. Though setting U(t) = 0 gives a good fit overall, T(t) does not then exactly match the ensemble mean, so U(t) is a necessary correcting error term. Pat conflates uncertainty distribution with uncertainty value and therefore writes +/-u_t in place of U(t), but this notational difference is not problematic. I shall assume that U(t) has zero mean and a variance of s^2 independent of t (and Pat has derived credible estimates of s from the +/-4 W/m^2 cloud forcing errors). (1) then implies that:

T(t) = sum_0^{t-1} U(t-i) + T(0) + b(f(t)-f(0))

We can choose our anomaly baselines so that T(0)=0 and f(0)=0, so

(2): T(t) = sum_0^{t-1} U(t-i) + b f(t)

Now, under the assumption of no covariance between different U(j)’s we can derive

(3): E[T(t)] = b f(t), Var[T(t)] = sum_0^{t-1} Var[U(t-i)] = ts^2

Later on I’ll use the simplification f(t) = td for some constant d, with the implication E[T(t)] = bdt. I don’t believe that Pat should have a problem with the above, as it verifies his results under the given assumptions.

Now I present to you a new emulator:

(4): X(t) = (1-a)X(t-1) + c f(t) + G(t)

What is the purpose of this? Why can’t I just let Pat’s emulator be? Well, suppose in some sense it is a better emulator: wouldn’t we want to use it instead?

Comparing (4) with (1), my X, c, G replace T, b, U respectively, merely so that we know which model we are referring to. ‘a’ is a new number between 0 and 1, and it reflects the saying “today’s warmth is due to today’s sunshine, not yesterday’s”. (OK, I just made that one up.) The point is that temperatures don’t really add together directly. Radiative forcing influences temperature, temperature influences storage of heat, and storage of heat influences radiative forcing. So some of yesterday’s sunshine gets retained in the earth or the sea, and some of that may be returned to augment sensible temperature today. But if the sun ain’t out today, yesterday’s heat only helps a little.

My G will generalize U by allowing a non-zero mean z as well as variance s^2. Now, assuming X(0) = 0,

(5): X(t) = sum_0^{t-1} (1-a)^i (cf(t-i) + G(t-i))

Then with the simplifying f(t) = dt and some algebra it can be shown that:

(6): E[X(t)] = cd(at + a – 1 + (1-a)^(t+1))/a^2 + z(1 – (1-a)^t)/a,
Var[X(t)] = s^2(1-(1-a)^(2t))/(2a-a^2)

Assuming that a > 0, the following asymptotics hold as t increases:

(7): E[X(t)] = cdt/a + O(1), Var[X(t)] = s^2/(2a-a^2)

If we choose c = ab then we get:

(8): E[X(t)] = bdt + O(1)

and that is identical to E[T(t)] = bdt derived below (3), apart from the O(1) term.

How can we tell these two fine emulators apart, since they both match the GCM ensemble mean very well? The answer is the variance, which when square rooted gives the standard deviation alias “uncertainty bound”. T(t) has s.d. s sqrt(t), X(t) has s.d. tending upwards to the limit s/sqrt(2a-a^2).

An emulator is of no value unless, as well as fitting past GCM runs, it reasonably predicts the spread from running them into the future – that, after all, is surely what an uncertainty spread means? Fresh GCM runs into the future, or archived results of past runs into the future, would surely settle the question of whether T(t) or X(t), or neither, is a faithful emulator of GCMs. If climate science thinks this is an important question, then I’m sure that amongst the billions spent on it some computer time could be devoted to answering it.

To recap, the emulator (by taking c = ab in (4)) which is

(9): X(t) = (1-a)X(t-1) + ab f(t) + G(t)

has a better physical justification than Pat’s (1), emulates model temperatures almost identically, and has a much smaller “uncertainty bound”.
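The contrast between the two spreads can be illustrated with a short Monte Carlo sketch (Python; s, a, and the step count are illustrative values, not fitted ones). The forcing terms are identical in both emulators and cancel out of the spread, so only the noise recursions are simulated:

```python
import math
import random

random.seed(1)
S = 0.1        # per-step uncertainty s.d. (illustrative)
A = 0.2        # 'a' parameter of emulator (9) (illustrative)
STEPS = 100    # projection steps
RUNS = 5_000   # Monte Carlo runs

final_T, final_X = [], []
for _ in range(RUNS):
    T = X = 0.0
    for _ in range(STEPS):
        u = random.gauss(0.0, S)
        T += u                   # random-walk emulator (1): T(t) = T(t-1) + U(t)
        X = (1 - A) * X + u      # mean-reverting emulator (9), noise part only
    final_T.append(T)
    final_X.append(X)

def stdev(v):
    m = sum(v) / len(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / (len(v) - 1))

print(stdev(final_T))  # ≈ S*sqrt(STEPS) = 1.0: grows without bound
print(stdev(final_X))  # ≈ S/sqrt(2A - A^2) ≈ 0.167: saturates
```

The random walk’s spread keeps growing as sqrt(t), while the a > 0 emulator’s spread levels off at s/sqrt(2a - a^2), matching (3) and (7) above.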

Reply to  See - owe to Rich
October 25, 2019 8:19 am

Rich,

“uncertainty is properly a distribution of a random variable which sometimes but not always describes an “error”,”

Sorry, this is just plain wrong. You still have not bothered to read the JCGM all the way through, have you?

From the JCGM:

“6.1.2 Although uc(y) can be universally used to express the uncertainty of a measurement result, in some commercial, industrial, and regulatory applications, and when health and safety are concerned, it is often necessary to give a measure of uncertainty that defines an interval about the measurement result that may be expected to encompass a large fraction of the distribution of values that could reasonably be attributed to the measurand. The existence of this requirement was recognized by the Working Group and led to paragraph 5 of Recommendation INC‑1 (1980). It is also reflected in Recommendation 1 (CI‑1986) of the CIPM.”

“6.2.1 The additional measure of uncertainty that meets the requirement of providing an interval of the kind indicated in 6.1.2 is termed expanded uncertainty and is denoted by U. The expanded uncertainty U is obtained by multiplying the combined standard uncertainty uc(y) by a coverage factor k:

U = ku_c(y) (18)

The result of a measurement is then conveniently expressed as Y = y ± U, which is interpreted to mean that the best estimate of the value attributable to the measurand Y is y, and that y − U to y + U is an interval that may be expected to encompass a large fraction of the distribution of values that could reasonably be attributed to Y. Such an interval is also expressed as y − U ≤ Y ≤ y + U.”

Please note carefully that the guide is talking about an INTERVAL and not a random variable with a mean and a standard deviation.

“the spread, or variance, or standard deviation, of the uncertainty is sometimes taken, as in the +/-u notation, to be the “uncertainty” itself. But this is to throw away information, because having done that different u’s cannot be properly compounded except by magic.”

Nothing is being thrown away. See the JCGM. And of course uncertainty can be compounded in an iterative process. You claim they can’t but you give no proof.

From the University of North Carolina physics department “Introduction to Measurements and Error Analysis”: “Note that the relative uncertainty in f, as shown in (b) and (c) above, has the same form for
multiplication and division: the relative uncertainty in a product or quotient is the square root of the
sum of the squares of the relative uncertainty of each individual term, as long as the terms are not
correlated. ”

Pat also referenced Bevington and Robinson, 2003 in his statement “The final change in projected air temperature is just a linear sum of the linear projections of intermediate temperature changes. Following from equation 4, the uncertainty “u” in a sum is just the root-sum-square of the uncertainties in the variables summed together, i.e., for c = a + b + d + … + z, then the uncertainty in c is ±uc=sqrt(ua^2+ub^2+ud^2+…+uz^2) (Bevington and Robinson, 2003). The linearity that completely describes air temperature projections justifies the linear propagation of error. Thus, the uncertainty in a final projected air temperature is the root-sum-square of the uncertainties in the summed intermediate air temperatures.”
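The root-sum-square rule quoted there is a one-liner; here is a minimal sketch (Python), checked against the head post’s daily-step numbers:

```python
import math

def rss(*u):
    """Root-sum-square: for c = a + b + ... + z,
    u_c = sqrt(u_a^2 + u_b^2 + ... + u_z^2) (Bevington & Robinson)."""
    return math.sqrt(sum(x * x for x in u))

print(rss(3.0, 4.0))                        # 5.0
# 36,500 daily steps of ±0.087 C, as in the head post:
print(round(rss(*[0.087] * 365 * 100), 1))  # 16.6
```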

“My G will generalize U by allowing a non-zero mean z as well as variance s^2. Now, assuming X(0) = 0,”

You are still falling into the trap of trying to make an uncertainty interval into a random variable with a probability function, a mean, and a standard deviation. The uncertainty interval is *NOT* a random variable, it has no probability function, nor does it have a mean and a standard deviation. If it *did* have these then based on the central limit theorem you could predict the most accurate result from the model.

Would it help if we started talking about a confidence interval instead of an uncertainty interval?

Ulises
Reply to  Tim Gorman
October 27, 2019 11:22 am

Tim, (October 25, 2019 at 8:19 am )
first of all thanks to refer to the JCGM document, such that we have something palpable to discuss about.

Tim:”Please note carefully that the guide is talking about an INTERVAL and not a random variable with a mean and a standard deviation.”

Please read carefully: “6.1.2 Although uc(y) can be universally used to express the uncertainty of a measurement result,……..”
So there IS a distribution of the measurand “y”, combined (“c”) in the conventional way from parent distributions, and given as the standard deviation “u”. This is necessary and sufficient to describe a Normal Distribution.

Then it goes on there: “….in SOME commercial, industrial, and regulatory applications, and when health and safety are concerned, it is OFTEN necessary to give a measure of uncertainty that defines an interval about the measurement result that may be expected to encompass a large fraction of the distribution of values that could reasonably be attributed to the measurand. ” [uppercase mine U.]
In selected contexts, “some”, “often” : You see this not a universal recipe, as you would like to have it.
It is, in my view, a recommendation HOW TO PRESENT the analysis to the customer/public/decision takers, given in the precautionary sense that the thin ends of the tails of the distribution are not to be ignored in a careless attempt, while extreme events, as low as their probability of occurrence may be, can have severe consequences .

The cited following Section 6.2.1 in JCGM recommends to calculate the range as multiples of sd=u (sic! distribution!) and call it “expanded uncertainty”, and to give the range of it.
Nothing substantial, just presentation.

Rich (cited by Tim): “the spread, or variance, or standard deviation, of the uncertainty is sometimes taken, as in the +/-u notation, to be the “uncertainty” itself. But this is to throw away information, because having done that different u’s cannot be properly compounded except by magic.”

Tim :” Nothing is being thrown away.”
Right. Even if the recommendation is followed (not relevant in our context), the respective distribution is there, just deliberately not presented to the public, and, if not lost by accident, can be referenced or processed subsequently.

Tim : “And of course uncertainty can be compounded in an iterative process.”

Yes, compounded as summing the variances of the means being summed (distributions!). Note that it is the variance of the MEAN which is processed, i.e. u(mean)^2 = u^2/n, u^2 being the variance of the sample or experiment . The root of the sum then delivers a sd=u of the mean. No iteration needed.
(The above is the simplest case. Generally, covariances and weighting factors may have to be considered.)

Tim : “You are still falling into the trap of trying to make an uncertainty interval into a random variable with a probability function, a mean, and a standard deviation. The uncertainty interval is *NOT* a random variable, it has no probability function, nor does it have a mean and a standard deviation. ”

I hope I have shown that the opposite is true. There is nothing particular with the u-notation but for the naming; all uncertainty evaluations are identical with those you would do in the error-deviation world. Do you think it is a wise concept to sum up variances, just to obtain an empty range for the final result? That could be more economically done with just ranges as input.

Tim : “If it *did* have these [mean and a standard deviation] then based on the central limit theorem you could predict the most accurate result from the model. ”

Yes, in principle, but call it precise !!! (Pat, as you know, has no respect for those who don’t get this distinction right! Whereas he himself is rather relaxed in the usage.)
And no, in practice it would be ridiculous to drive precision toward zero sd, apart from the costs, when there is lots of bias involved. I think they are still fairly busy now with the hiatus issue.

Tim : “Would it help if we started talking about a confidence interval instead of an uncertainty interval?”

The question goes to Rich, but my suggestion is that you try to get rid of this burnt-in reflex of “uncertainty is not error !” (I really wonder where you got it from). Take another pass over the JCGM, and you’ll see that it’s all about means and variances : distributions. And do we agree that a +/- k*sd is *not* a confidence interval according to the literature ? The CI is based on the sd of the mean, the one that shrinks with n.

Looking forward to your progress !
U.

Reply to  Ulises
October 27, 2019 1:22 pm

“So there IS a distribution of the measurand “y”, combined (“c”) in the conventional way from parent distributions, and given as the standard deviation “u”. This is necessary and suffficent to describe a Normal Distribution.”

You keep missing that this is talking about a MEASUREMENT and not the input to or the output of a mathematical model!

When the inputs to a mathematical model contain uncertainty, as Pat Frank has shown, the output then has uncertainty. The JCGM is useful in talking about some aspects of uncertainty in a mathematical model, but it is not definitive on that subject; it is aimed at something else. Look at the examples given in the document: not a single one speaks to determining the uncertainty of a mathematical model, but instead to how to measure a temperature or voltage.

“It is, in my view, a recommendation HOW TO PRESENT the analysis to the customer/public/decision takers, given in the precautionary sense that the thin ends of the tails of the distribution are not to be ignored in a careless attempt, while extreme events, as low as their probability of occurrence may be, can have severe consequences .”

Once again we see you conflating the concepts of a random variable with a probability distribution with the concepts of uncertainty. When I tell you the model inputs have an uncertainty of +/- 4 W/m^2 exactly what does that tell you about any supposed probability distribution of the output, e.g. the “tails of the distribution”?

The uncertainty interval gives you a range in which the output might lie; it does not tell you the probability of each point in the interval, which is what a probability distribution would do.

“u^2/n”

Uncertainty is not summed and then divided by n. It is a root-sum-square. No denominator. It is a vector addition of independent, orthogonal values, not some kind of convolution of probability distributions.

“Yes, in principle, but call it precise !!!”

This can be done with MEASUREMENTS, i.e. using a micrometer to measure the thickness of a sheet of media. It simply cannot be done with uncertainty because uncertainty is not a probability distribution.

“And no, in practice it would be ridiculous to drive precision toward zero sd, apart from the costs, when there is lots of bias involved. I think they are still fairly busy now with the hiatus issue.”

It is impractical because uncertainty isn’t subject to the central limit theorem since it isn’t a probability distribution. Please, *please* keep in mind the difference between taking the measurement of a voltage with a digital meter and determining the uncertainty interval for the output of a mathematical model. They are *not* the same.

“The question goes to Rich, but my suggestion is that you try to get rid of this burnt-in reflex of “uncertainty is not error !” (I really wonder where you got it from). ”

It is a burnt-in reflex because it is based on reality. I got it from reality. I got it from designing fish plates to connect steel girders. I can measure the length of the girders down to a gnat’s behind using the techniques in the JCGM. But when mixing different runs of girders, each of which I can measure down to the gnat’s behind, in an iterative span of any specific number of girders, then the connecting fish plates had better be designed to handle the uncertainty in length that mix of girders will produce. There *is* a difference between measurement precision and outcome uncertainty. In the physical world this becomes quite obvious very quickly.

“Take another pass over the JCGM, and you’ll see that it’s all about means and variances : distributions.”

MEANS AND VARIANCES OF MEASUREMENTS!. Not of uncertainty of the output of iterative runs of a mathematical model.

“And do we agree that a +/- k*sd is *not* a confidence interval according to the literature ? The CI is based on the sd of the mean, the one that shrinks with n.”

No, I do *not* agree. When I see the output of a mathematical model expressed as X degC +/- u degC I see a confidence interval which tells me where the true value might lie. That interval has no probability distribution, no mean, no standard deviation and therefore no n. The output of the model is *not* a measurement whose error can be driven to zero using the central limit theorem. If the central limit theorem doesn’t apply then it is not a probability distribution.

What that uncertainty interval tells me is that when they speak of the model being able to resolve differences over a number of iterations where the differences are smaller than the uncertainty interval then someone is blowing smoke up your butt! A model trying to resolve 0.1 degC differences with an input of +/- 1 degC uncertainty and an uncertainty in the output that is greater than the input uncertainty is a joke.

Reply to  See - owe to Rich
October 25, 2019 11:36 am

Rich,
I think my proof in the appendix here is relevant. You can construct a differential equation of arbitrary uncertainty propagation characteristic which includes any prescribed solution, which could be your past values.

Reply to  Nick Stokes
October 25, 2019 8:30 pm

An appendix that angech showed is misleading here, wherein he finished with this congratulatory statement, “Well done.
Particularly all the spiel while swapping the peas.

Also here, where he concluded, regarding your effort, Nick, “Well deflected.
All I can do is to point out your inconsistencies.

Reply to  See - owe to Rich
October 25, 2019 8:22 pm

Rich, “I want to agree with kribaez where he writes ‘sampled error from the input space yields the uncertainty spread in the output space…there is no magic uncertainty which is not rendered visible by M[onte]C[arlo] sampling’.”

He’s wrong, and so are you, Rich. Tim Gorman refuted kribaez here, where he wrote, “Again, the MC analysis can only tell you the sensitivity of the model to changes in the input. If I put in A and get out B and then put in C and get out D then each of the outputs, B and D, *still* have an uncertainty associated with each.

The uncertainty derived from a calibration experiment is entirely independent from the model variation due to variations in inputs.”

I also refuted the idea, here, writing, “[The input space] metric gives us no more than the output coherence of models forced into calibration similarity. That has nothing whatever to do with model predictive accuracy.”

Also here, “The method you’re describing is not uncertainty propagation. It is merely variation about an ensemble mean. It’s not even error, because no one has any idea where the correct value lies.”

Honestly, I think it’s a bit disingenuous of you to proceed as though kribaez’ view had been unexamined.

You wrote, “uncertainty is properly a distribution of a random variable” Not when it’s derived from an empirical calibration error. You continue to treat error in science with the closed form ideas of statistics. They are useful only as a guide, not as a bound.

I’m going to paraphrase Einstein to try to get the point across. “Statistics without contact with science becomes an empty scheme. Science without statistics is—insofar as it is thinkable at all—primitive and muddled. However, no sooner has the statistician, who is seeking a clear system, fought his way through to such a system, than he is inclined to interpret the thought-content of science in the sense of his system and to reject whatever does not fit into his system. The scientist, however, cannot afford to carry his striving for statistical systematic that far. He accepts gratefully the statistician’s conceptual analysis; but the external conditions, which are set for him by the facts of experience, do not permit him to let himself be too much restricted in the construction of his conceptual world by the adherence to a statistical system. He therefore must appear to the systematic statistician as a type of unscrupulous opportunist…

Your continued strict recourse to statistical ideas is inappropriate, Rich. They limit your thinking.

Physical science deals with a messy physical world; a world that is much more messy than statistics allows. Approximations and estimates are central to success in science.

Calibration uncertainty is not a random variable. Statistical methods are used to determine calibration error, but the structure of the error itself violates statistical axioms.

Physical scientists don’t care about that violation because the uncertainty estimate is useful, indeed central, to an appraisal of predictive reliability.

You wrote, “I am going to give Pat Frank a free pass on [covariance].”

You’re not giving me a free pass on anything. There is no covariance in a constant calibration uncertainty. It doesn’t vary.

You wrote, “His emulator can fairly be written, I believe, as

(1): T(t) = T(t-1) + b(f(t)-f(t-1)) + U(t)

But my emulator is eqn. 1, and eqn. 1 has no error term, Rich.

Let’s compare your equation with eqn. 1: ΔT = f_CO2 x 33K x {[F_0+(sum over ΔF_i)]/F_0}.

That’s nothing like your equation. No uncertainty term. No T on the right side at all, except the greenhouse 33 K.

Let’s look at an individual emulation step, added to an intermediate term of step “i”:

T(i) = T(i-1) + {f_CO2 x 33K x [F_0+ΔF_(1->(i-1))+ΔF_i]/F_0}.

That’s nothing like what you wrote. Your b(f(t)-f(t-1)) is nothing like {f_CO2 x 33K x [F_0+ΔF_(1->(i-1))+ΔF_i]/F_0}

Why you think yours is “fairly written” is anyone’s guess, when mere inspection shows that it is not.

Now, let’s compare your eqn with eqn. 5, which actually does include an uncertainty term (positionally equivalent to, but not identical with your U(t)).

ΔT_i ± u_i = [f_CO2 x 33K x (F_0+ΔF_i)/F_0] ± [(f_CO2 x 33K x 4 W/m^2)/F_0].

Eqn. 5 looks like eqn. 1, doesn’t it, except for the addition of the uncertainty due to model calibration uncertainty.

So, eqn. 5 doesn’t look like your equation, either.
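For readers who want to see the arithmetic, here is a minimal numerical sketch of an eqn. 1 step, the eqn. 5 per-step uncertainty, and the quadrature combination of per-step uncertainties. The values of f_CO2 and F_0 below are illustrative placeholders, not the paper’s fitted numbers; only the ±4 W/m^2 calibration figure comes from the discussion above.

```python
import math

# Illustrative constants: placeholder values, NOT the paper's fitted numbers.
F_CO2 = 0.42    # assumed CO2 fraction of greenhouse forcing (dimensionless)
F_0 = 33.946    # assumed total greenhouse forcing, W/m^2
LWCF_U = 4.0    # the +/- 4 W/m^2 LWCF calibration uncertainty quoted above

def emulator_step(dF_i):
    """One step in the form of eqn. 1: f_CO2 x 33K x (F_0 + dF_i)/F_0."""
    return F_CO2 * 33.0 * (F_0 + dF_i) / F_0

def step_uncertainty():
    """Eqn. 5 uncertainty term: +/- f_CO2 x 33K x (4 W/m^2) / F_0."""
    return F_CO2 * 33.0 * LWCF_U / F_0

def propagated_uncertainty(n_steps):
    """Per-step uncertainties combined in quadrature, growing as sqrt(n)."""
    return math.sqrt(n_steps * step_uncertainty() ** 2)

print(round(step_uncertainty(), 2))          # 1.63 K per step (with these placeholders)
print(round(propagated_uncertainty(100), 2)) # 16.33 K after 100 steps
```

Note how the emulation itself (emulator_step) never touches the uncertainty terms; the ±u envelope is computed alongside it, not fed back into it.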

Once again, you’re imposing a false model onto my work, yet again arguing from a straw man position.

You wrote, “Though setting U(t) = 0 gives a good fit overall, T(t) does not then exactly match the ensemble mean, so U(t) is a necessary correcting error term.”

Wrong analogy, then, Rich.

If your (1) does not fit an ensemble mean without your U(t), then it not only does not do as well as paper eqn. 1, but it also requires a term that has no counterpart in paper eqn. 1.

Eqn. 1 will nicely match any ensemble mean because f_CO2 varies with the individual projection. It will be different for an ensemble mean relative to its value for a single projection run.

Take a look at paper Figure 7: very nice emulations of the CMIP5 ensemble mean. With no uncertainty term to correct the emulation.

If one does a “perturbed physics” series using a single model, eqn. 1 can fit every single one of them.

And uncertainty, your U(t), makes no contribution to any of the emulations of paper eqn. 1.

You wrote, “Pat conflates uncertainty distribution with uncertainty value …”

A calibration uncertainty interval is not a distribution. It’s an empirical value. Yet another example of you continuing to impose your incorrect meanings onto my work.

You wrote, “I shall assume that U(t) has zero mean …”

An assumption without any relevance to an uncertainty from empirical calibration error. You’re just imposing assumptions that necessarily lead to your desired conclusion, Rich. That’s called a tendentious argument.

You wrote, “it reasonably predicts the spread from running them into the future – that, after all, is surely what an uncertainty spread means?”

No, it surely doesn’t mean that. The uncertainty interval is a spread within which the correct value somewhere lies (though we do not know where, and the interval mean is not the most probable value).

Once again, and like so many others, you’ve mistakenly supposed the uncertainty interval is equal to model projection spread — the spread of predicted outcomes.

You wrote, “[Your emulator] has a better physical justification than Pat’s (1), emulates model temperatures almost identically, and has a much smaller ‘uncertainty bound’.”

Your “abf(t)” employs the same forcing terms, which provides no better physical justification.

Your “G(t)” is an assumption — your invention, really — and has no physical justification at all. The ±4 W/m^2 is a known calibration uncertainty directly derived from GCM simulations.

That means the uncertainty bound calculated by propagating that calibration uncertainty is a physically true conditional of GCM air temperature projections.

Your argument is wrong throughout.

Reply to  Pat Frank
October 26, 2019 10:08 am

Pat, you have given a pretty full reply which will take me a little time to digest (I was busy today so far). But there is one thing I can ask you about immediately, to do with your “Once again, and like so many others, you’ve mistakenly supposed the uncertainty interval is equal to model projection spread — the spread of predicted outcomes.”

Now, if the model projection spread can include randomization of inputs within the confines of limits on errors in its parameters (calibration errors I suppose), are you still saying that at the far end your uncertainty interval does not relate to the model projection spread? If so, I am struggling to understand any useful meaning of your paper. So if you could clarify this point that would certainly help. If you could write some maths by way of example, that would help even more, because as you can see, I am trying to understand how the maths fits together to support your conclusions.

Reply to  See - owe to Rich
October 26, 2019 12:39 pm

Rich, “are you still saying that at the far end your uncertainty interval does not relate to the model projection spread?”

Yes.

In fact, every single one of the uncertainty intervals all along the projection, does not represent model air temperature projection spread at that point.

Instead, each interval represents the width of ignorance about the value (the position within the interval) of the physically correct temperature. The correct value is lost within that interval.

“If you could write some maths by way of example, that would help even more, because as you can see, I am trying to understand how the maths fits together to support your conclusions.”

Look at the papers extracted here.

They will give you the analytical approach to uncertainty, and its meaning.

My conclusion cannot be understood by strict reference to mathematics, Rich.

Physics is not about math. It’s about causality. It’s about objective knowledge about what we have observed. That means physical sciences must have a way to represent residual ignorance, so as to condition their conclusions.

That’s what an uncertainty analysis does. It provides an ignorance interval. One does not know, within that interval, where the physically correct answer lies.

Instrumental resolution is an example of such uncertainty. A claimed measurement magnitude that is smaller than the given instrument can resolve has no physical meaning.

If a classical liquid-in-glass (LiG) thermometer resolution is (+/-)0.25 C, it can produce no data more accurate than that: the interval is the pixel size, and everything inside it is a uniform blur. A temperature reading taken from that thermometer and written as, e.g., 25.1 C, is meaningless past the decimal.

In terms of practicalities, that (+/-)0.25 C acknowledges that the thermometer capillary is not of uniform width; that the inner surface of the glass is not perfectly smooth and uniform; that the liquid inside is not of constant purity; that the entire thermometer body is not at constant temperature.

All these things are uncontrolled variables and add up to produce errors of unknown size in a measurement. They vary with the thermometer, and with the age of a thermometer. Which is why thermometers must be periodically re-calibrated.

Even if one can visually estimate the distance between the inscribed lines to (+/-)0.1 C, the reading has no meaning because the position of the liquid is not at the correct spot for the external temperature.

Lin and Hubbard have discussed analogous sources of error and resolution limits in modern electronic thermometers: (2004) Sensor and Electronic Biases/Errors in Air Temperature Measurements in Common Weather Station Networks J. Atmos Ocean Technol. 21, 1025-1032.

All physical scientists deal with this — resolution limits, errors, and uncertainty — as a matter of course in their work. Uncertainty can be expressed using statistics, but is outside the realm where statistical mathematics applies exactly.

One can’t deal with uncertainty from a purely statistics perspective. One must approach uncertainty by way of empirical calibration experiments. And then the uncertainty interval violates all the rules of statistical inference.

Uncertainty does not represent random error, does not include iid values, ideas of stationarity do not apply. Uncertainty has no distribution and its mean is not the most probable physical value.

If you like, take a look at Rukhin (2009) Weighted means statistics in interlaboratory studies Metrologia 46, 323-331; doi: 10.1088/0026-1394/46/3/021.

Section 6 discusses Type B (systematic) errors. Along the way, Rukhin observes that if the type B error does not have a mean of zero, “ then all weighted means statistics become biased, and [the measurement mean] itself cannot be estimated.” That is exactly the case with a calibration uncertainty interval. Even worse, the interval mean has no discrete physical significance at all.

Rukhin’s interlaboratory analysis has real-world significance. Geoff Sherrington will tell you all about the terrifying outcome from tests of interlaboratory coherence of analytical results, when the moon rocks were being analyzed.

Your approach to the problem from inside statistics is inappropriate, Rich. It won’t lead you anywhere useful.

Reply to  Pat Frank
October 27, 2019 3:47 am

Pat, I have just read this and will make a very quick reply and then think harder. I see now that you are worrying about the accuracy of the models rather than (or as well as) their precision, and to be fair I believe you made a comment along those lines to kribaez, but I hadn’t noticed this strongly in your paper so I had in fact ignored that. Nevertheless, when I was formulating my model Equation (4) above, I had considered including a “reality” term, but rejected that; I can now reconsider.

But I should like to return to my original question, and ask you to answer how you would compare my emulator based on (4) with your emulator based on (1) – you have been discarding the error terms (U(t) for (1)) for the purpose of your emulator but then effectively reintroducing it later when considering uncertainty (e.g. equation 6 in your paper). This may possibly be justifiable. But in any case, I believe that if I treat my emulator in the same way as you have yours, I come up with a lower uncertainty interval. Do you accept that, and how would you distinguish between your emulator and mine?

I am going to continue to press mathematical models and statistics to the limit, and if in the end I have to give up I can always fall back, perhaps unfairly, on your intriguing quote from Einstein: “He therefore must appear to the systematic statistician as a type of unscrupulous opportunist…”.

Reply to  Pat Frank
October 27, 2019 12:43 pm

Further reply to Pat Oct26 12:39pm

Pat, here is a more considered reply, prepending your comments with P: and mine with R:.

P: Instead, each interval represents the width of ignorance about the value (the position within the interval) of the physically correct temperature. The correct value is lost within that interval.

R: I am happy with that, apart from a detail which will become apparent further down.

P(R): “If you could write some maths by way of example, that would help even more, because as you can see, I am trying to understand how the maths fits together to support your conclusions.”

Look at the papers extracted here.
They will give you the analytical approach to uncertainty, and its meaning.

R: I’m quite comfortable with that approach, because for example “X=X_i(measured) (+/-)dX_i (20:1)” clearly shows they are using statistical theory under the bonnet, where the 20:1 is defined as a probability that the “true” value will be in the interval, and there is reference to that being “2-sigma” which shows that the underlying distribution is normal (approximately).

P: My conclusion cannot be understood by strict reference to mathematics, Rich.

R: Then that’s sad – I don’t understand how you can expect any credibility in that case. I don’t care what Einstein may have said about statistics, because when it came to the crunch he was always very careful with his mathematics, to the point of learning new stuff like manifold theory.

P: Physics is not about math. It’s about causality. It’s about objective knowledge about what we have observed. That means physical sciences must have a way to represent residual ignorance, so as to condition their conclusions.

R: Yes, but that representation is through mathematics, which includes the possibilities that “uncertainties” are correlated. The null hypothesis is that they are not, and it may be that in your case you may have demonstrated it in your paper, or it may be true, or both. In any case I am not concerned about that right now.

P: That’s what an uncertainty analysis does. It provides an ignorance interval. One does not know, within that interval, where the physically correct answer lies.
Instrumental resolution is an example of such uncertainty. A claimed measurement magnitude that is smaller than the given instrument can resolve has no physical meaning.
If a classical liquid-in-glass (LiG) thermometer resolution is (+/-)0.25 C, it can produce no data more accurate than that: the interval is the pixel size, and everything inside it is a uniform blur. A temperature reading taken from that thermometer and written as, e.g., 25.1 C, is meaningless past the decimal.
In terms of practicalities, that (+/-)0.25 C acknowledges that the thermometer capillary is not of uniform width; that the inner surface of the glass is not perfectly smooth and uniform; that the liquid inside is not of constant purity; that the entire thermometer body is not at constant temperature.
All these things are uncontrolled variables and add up to produce errors of unknown size in a measurement. They vary with the thermometer, and with the age of a thermometer. Which is why thermometers must be periodically re-calibrated.

R: The fact that many errors add together to make up the total “uncertainty” is precisely why your statement “everything inside is a uniform blur” is false, because the sum of the errors is well approximated by a normal distribution. And that is why a +/-u (20:1) uncertainty is quoted. Not only is the true value not uniform inside the interval, it is not even guaranteed to be inside that interval (there’s a 5% chance it’s outside). This got discussed on the previous thread with Tim Gorman and his 12+/-1” rulers. To measure 10 feet I proposed using 10 independent rulers, and the uncertainty was then not +/-10” but a much smaller value.
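Rich’s ten-ruler figure is easy to check numerically. A sketch assuming each ruler’s error is independent with a fixed ±1-inch bound; the independence assumption is precisely the point Pat and Tim dispute:

```python
import math

n_rulers = 10
u_each = 1.0  # inches, for the hypothetical 12 +/- 1 inch rulers

# Worst case: every ruler errs in the same direction.
worst_case = n_rulers * u_each

# Rich's claim: independent errors combine in root-sum-square.
rss = math.sqrt(sum(u_each ** 2 for _ in range(n_rulers)))

print(worst_case)      # 10.0
print(round(rss, 2))   # 3.16
```

The disagreement in the thread is not over this arithmetic but over whether the independence (and zero-mean) assumptions behind the root-sum-square line are justified for calibration uncertainty.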

P: Even if one can visually estimate the distance between the inscribed lines to (+/-)0.1 C, the reading has no meaning because the position of the liquid is not at the correct spot for the external temperature.

R: This (“no meaning”) is not true because the uncertainty from the visual estimation has to get added to the instrumental uncertainty, and sqrt(0.25^2+0.1^2) is smaller than sqrt(0.25^2+0.25^2). Nevertheless it shows that it is futile to attempt too great a visual resolution, because the advantage rapidly diminishes in the +/-sqrt(0.25^2+e^2) as e is reduced.

P: Lin and Hubbard have discussed analogous sources of error and resolution limits in modern electronic thermometers: (2004) Sensor and Electronic Biases/Errors in Air Temperature Measurements in Common Weather Station Networks J. Atmos Ocean Technol. 21, 1025-1032.
All physical scientists deal with this — resolution limits, errors, and uncertainty — as a matter of course in their work. Uncertainty can be expressed using statistics, but is outside the realm where statistical mathematics applies exactly.

R: I agree that the mathematics cannot be applied exactly, because there is uncertainty in the uncertainties (e.g. how close to normal is the actual distribution of relevance). But the theory is generally GEFGU (Good Enough For Government Use). It saves people money.

P: One can’t deal with uncertainty from a purely statistics perspective. One must approach uncertainty by way of empirical calibration experiments. And then the uncertainty interval violates all the rules of statistical inference.

R: Please explain which rules of statistical inference it violates. The root-sum-of-squares rule fits in very nicely with statistical theory, provided that your next paragraph is contradicted.

P: Uncertainty does not represent random error, does not include iid values, ideas of stationarity do not apply. Uncertainty has no distribution and its mean is not the most probable physical value.

R: Tim Gorman gave some examples trying to support that thesis, but in every case I was able to demonstrate an underlying statistical model. For example I devised a scheme wherein I could, with sufficient purchasing power, test a manufacturer’s claim that their rulers were 12+/-x” (where x was 1 but could have been a different fixed number).

P: If you like, take a look at Rukhin (2009) Weighted means statistics in interlaboratory studies Metrologia 46, 323-331; doi: 10.1088/0026-1394/46/3/021.
Section 6 discusses Type B (systematic) errors. Along the way, Rukhin observes that if the type B error does not have a mean of zero, “ then all weighted means statistics become biased, and [the measurement mean] itself cannot be estimated.” That is exactly the case with a calibration uncertainty interval. Even worse, the interval mean has no discrete physical significance at all.

R: That conclusion depends on whether the mean was incorrectly assumed to be zero and whether any attempts were made to detect or estimate the bias through, as you say, interlaboratory analysis. A ruler manufacturer can (in theory) go to the NPL in Teddington, Middlesex (where my sister happens to live) to get a good estimate on the bias as well as mean error in his rulers. But certainly bad assumptions can lead to invalid results.

P: Rukhin’s interlaboratory analysis has real-world significance. Geoff Sherrington will tell you all about the terrifying outcome from tests of interlaboratory coherence of analytical results, when the moon rocks were being analyzed.
Your approach to the problem from inside statistics is inappropriate, Rich. It won’t lead you anywhere useful.

R: Well, we’ll see. I still think it already illuminated some features in the rulers problem, and my next posting will be on progress on emulator models. God willing – one must bear in mind the uncertainty of living to complete the work!

Reply to  See - owe to Rich
October 27, 2019 1:38 pm

“Tim Gorman gave some examples trying to support that thesis, but in every case I was able to demonstrate an underlying statistical model.”

Actually you didn’t, Rich. If you take a number of girders from different manufacturers you can measure the length between their connecting holes down to a gnat’s behind using statistical methods as described in the JCGM. And each of those girders will have small differences in length; perhaps only a small difference between girders from the same manufacturer, but differences nonetheless.

When you design the fishplates to connect those girders in an iterative process, you had better include an uncertainty factor to allow for the mixing of those precisely measured girders of different lengths. No amount of statistics, no calculation of means and standard deviations, will help you in such a process. You can calculate what the uncertainty interval is, and that is about all. That interval runs from all-short girders to all-long girders, and no amount of statistics will help you design those fishplates any better than that uncertainty interval. They had better be designed to handle anywhere in that interval.

You simply never showed how statistics could help in such a case. It just went ignored.

Pat is correct. In physical engineering there are uncertainties that are not subject to statistics. They just *are* and you need to recognize what they are or you run into big trouble.

Reply to  Pat Frank
October 27, 2019 6:23 pm

Rich, I’m not going to concern myself with your emulator.

The LWCF uncertainty interval I use derives directly from the calibration of climate models against measured observables.

That calibration uncertainty defines a resolution lower limit of climate models.

The uncertainty does not enter the emulation at all.

Your definition of an uncertainty bound as something that “reasonably predicts the spread from running [GCMs] into the future” is just a variation in model output. A predictive pdf.

Your U(t), G(t) is not a predictive uncertainty bound as understood in the physical sciences, unless one has a perfectly correct and complete physical theory. Which climate modelers do not. Not by far.

Any number of times, it’s been pointed out that yours is not the meaning of a predictive uncertainty bound derived from propagated calibration error.

And yet you continue to go back to it.

I don’t know how to say this except baldly, Rich, but every single one of your statistics-based analyses has been thoroughly malapropos.

I wish you would stop trying to force your ideas into an arena where they most thoroughly do not belong. Your push is pure square-peg-round-hole-ism.

Reply to  Pat Frank
October 27, 2019 7:06 pm

Rich, “clearly shows they are using statistical theory under the bonnet,”

Statistical methods, Rich. Not necessarily statistical theory. When error is not known to be normal, we still calculate an SD and report it as an uncertainty. Even though it violates the underlying statistical assumptions.

That’s what Einstein’s “unscrupulous opportunist” means.

“which includes the possibilities that ‘uncertainties’ are correlated.”

LWCF uncertainty is an unvarying constant, Rich. It doesn’t correlate with anything.

“I don’t understand how you can expect any credibility in that case.”

I used “strict reference,” didn’t I? I’m not worried about credibility among statisticians who must worry about closed-form niceties. I’m worried about knowing whether a result is reliable or not. That’s what “strict reference” means. It means I use the method because it tells me something I need to know — reliability — even though the use violates statistical assumptions. Unscrupulous opportunist that I am.

“The fact that many errors add together to make up the total ‘uncertainty’ is precisely why your statement ‘everything inside is a uniform blur’ is false, because the sum of the errors is well approximated by a normal distribution.”

You don’t know that is true. Tim Gorman has pointed out to you almost ad nauseam, that empirical uncertainty intervals have no known distribution. And here you ignore that.

It’s clear you do not understand the meaning of resolution. Resolution is the limit of measurable data or calculational accuracy. Everything smaller than that limit is a blur.

Models that have a resolution limit of (+/-)4 W/m^2 cannot resolve a smaller perturbation. They are blind to it. If I wanted to quote that resolution limit as a 20:1 statistic, and say the limit is (+/-)8 W/m^2, that would not imply a normal distribution. It would imply only that I am applying a stricter standard.

“And that is why a +/-u (20:1) uncertainty is quoted.”

That is not why a (+/-)20:1 uncertainty is quoted. A (+/-)20:1 uncertainty is quoted because it is a useful measure of reliability, even when the error distribution is unknown or skewed.

“Not only is the true value not uniform inside the interval, it is not even guaranteed to be inside that interval (there’s a 5% chance it’s outside).”

When the resolution limit is an empirical uncertainty interval, the statistical probability does not apply. Given an empirical calibration SD, one cannot say there is a 5% chance the correct answer is outside 2-sigma. Such a statement is meaningless — because the error distribution is not known to be normal.

The SD is a calculation of convenience. It is not statistically rigorous.

“This (‘no meaning’) is not true”

It is true, because the 0.1 C is physically meaningless, not merely uncertain. The thermometer is literally incapable of producing a reading to that accuracy.

Your criteria of judgement continue to be malapropos, Rich.

“I agree that the mathematics cannot be applied exactly, because there is uncertainty in the uncertainties (e.g. how close to normal is the actual distribution of relevance).”

The mathematics cannot be applied exactly because the instrument produces erroneous readings for reasons rooted in uncontrolled and unknown variables. It’s not just unknown distributions, though that is always an issue. It’s unknown sources of error.

Why do you think high-precision, high-accuracy instruments are made, but not deployed in numbers, Rich? It’s because such instruments are extremely expensive. They are used to calibrate field instruments.

Field instruments are subject to field environments that are not predictable. Field calibrations show all sorts of strange error profiles that can vary in time and space.

“Please explain which rules of statistical inference it violates. The root-sum-of-squares rule fits in very nicely with statistical theory.”

RSS is invariably used. Including when the uncertainty interval has no knowable distribution. Scientists’ unscrupulous opportunism again.

We are interested in useful indications of reliability. RSS of empirical calibration SDs are used without any care whether the SD meets the criteria of statistical purity, or not.

“test a manufacturer’s claim that their rulers were 12+/-x”

And how would you know a priori that “x” is normally distributed? And, if so in your instance, stays normally distributed?

“[Rukhin’s] conclusion depends on whether the mean was incorrectly assumed to be zero.”

No, it depends on the error not being stationary.

It’s around and around the same circle, Rich.

Reply to  Pat Frank
October 28, 2019 2:40 pm

“RSS of empirical calibration SDs are used without any care whether the SD meets the criteria of statistical purity, or not.”

RSS is an easily understood way to combine independent, orthogonal values. Since they are orthogonal they form a right triangle, and their combined magnitude is the hypotenuse: sqrt(a^2 + b^2).

It doesn’t require any statistical purity at all!
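The right-triangle picture generalizes to any number of orthogonal components; a small sketch (the numbers are invented purely for illustration):

```python
import math

def rss(components):
    """Root-sum-square of independent, orthogonal uncertainty components."""
    return math.sqrt(sum(c ** 2 for c in components))

# Two orthogonal components: the hypotenuse of a right triangle.
print(rss([3.0, 4.0]))   # 5.0

# n equal components grow as u * sqrt(n): the combined interval widens
# with each iteration; it does not average toward zero.
u = 4.0                  # e.g. a +/- 4 W/m^2 per-step figure
print(rss([u] * 25))     # 20.0
```

The second print is the point made above: iterating with a constant ±u per step gives a combined interval of u·sqrt(n), larger at every step than the step before.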

Reply to  Pat Frank
October 28, 2019 3:25 pm

Reply to Tim Gorman Oct27 1:38pm

Tim, I have just seen this. As you can see, I have been busy further downthread. So I may start to address your comment here, but this thread is moribund, so I shall wait to see if there is a new relevant thread in the near future. I think there may be some other comments of yours which I will also have to defer. In the meantime, best wishes.

October 25, 2019 11:19 am

Tim, I don’t want to spend too much more time on this aspect, as I am more interested in people’s thoughts on how to choose between competing emulator models.

Still, thanks for offering to use the term “confidence interval”, for then you have “fallen into the trap” of using statistical terminology. A confidence interval is part of the range of the distribution of a random variable such that if some parameter lies outside it could only have happened with some small given probability. So distributions and r.v.’s do come into play. As for what that JCGM is saying, I think they are paraphrasing for scientists and engineers in order to simplify matters which arise from deeper mathematical/statistical theory, and they do talk about correlation (or lack of it), which is a feature of joint probability distributions. I’d appreciate 3rd party insights on that.

Anyway, probably best to leave it at that, and thanks for your stimulating points, especially on rulers and your wife’s car in Topeka (previous thread for passers-by here).

Rich.

Reply to  See - owe to Rich
October 25, 2019 12:20 pm

“A confidence interval is part of the range of the distribution of a random variable such that if some parameter lies outside it could only have happened with some small given probability”

But it is *still* an interval and not a random variable nor is it a mean or standard deviation.

“So distributions and r.v.’s do come into play. ”

But not with the interval itself. The interval specifies no probability for any specific value. From the JCGM:
“The result of a measurement is then conveniently expressed as Y = y ± U, which is interpreted to mean that the best estimate of the value attributable to the measurand Y is y, and that y − U to y + U is an interval that may be expected to encompass a large fraction of the distribution of values that could reasonably be attributed to Y. Such an interval is also expressed as y − U ≤ Y ≤ y + U.”

“As for what that JCGM is saying,”

Most of what the JCGM talks about is MEASUREMENTS. Look at the title – “Guide to the expression
of uncertainty in measurement “. The examples in the document are about how to *measure* things and about the errors and uncertainty in those measurements. That is *not* what Pat’s thesis is about and it is not what the output of the CGM’s are about. What the climate alarmists need to begin paying attention to are the uncertainties associated with the inputs they use in their calculations – which is what Pat is addressing. If you read Pat’s thesis: “Propagation of error is a standard method used to estimate the uncertainty of a prediction, i.e., its reliability, when the physically true value of the predictand is unknown (Bevington and Robinson, 2003).”

Pat’s thesis is about the propagation of error, not about uncertainties in measurement. He uses the very statistical methods you speak of in order to determine the uncertainty of +/-4 W/m^2. Once that is done, the propagation of that error comes into play in the iterative process of the GCM.

The point about the uncertainty interval not being a random variable itself, and not having a mean or standard deviation, comes into play when you propagate the uncertainty through multiple iterations. Independent, orthogonal uncertainty *intervals* combine as root-sum-square, not root-mean-square. They don’t combine by convolving probability functions or anything else, since they don’t have a probability function, nor does the central limit theorem work to eliminate the uncertainty. The uncertainty grows with each iteration.

This is also why Monte Carlo runs don’t work to generate the uncertainty interval for the final output. If the inputs are uncertain then the output *has* to be uncertain – meaning an independent run in an MC analysis can’t define the uncertainty interval. If input A gives output B +/- u, then it will *always* produce the same relationship: input A + offset1 will always give B + offset2 +/- u. The MC simply cannot define the uncertainty.
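The determinism point above can be illustrated with a toy sketch (the function, its constants, and the run count are hypothetical illustrations, not anything from an actual GCM): repeated runs of a deterministic function on the same input show zero spread, so the calibration uncertainty has to be carried alongside the output rather than read off the runs.

```python
# Sketch (hypothetical toy model, not a GCM): a deterministic function
# returns the identical output for the identical input on every run,
# so a "Monte Carlo" over repeated runs reveals no uncertainty at all.

def toy_model(forcing):
    """Deterministic toy 'model': same input -> same output, always."""
    return 0.42 * forcing + 14.0  # arbitrary illustrative constants

u = 4.0  # calibration uncertainty statistic (W/m^2), carried separately

runs = [toy_model(3.7) for _ in range(1000)]  # 1000 repeated runs
spread = max(runs) - min(runs)

print(spread)             # 0.0 -- run-to-run spread says nothing here
print(runs[0], "+/-", u)  # the uncertainty must be stated separately
```

The design point is only that run-to-run spread of a deterministic computation is not a measure of its predictive uncertainty; it says nothing about whether real GCM ensembles behave this way.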

“they do talk about correlation (or lack of it), which is a feature of joint probability distributions.”

Again, uncertainty intervals don’t have a probability distribution – and neither does a standard deviation. Both are *values*, not probability distributions. You can’t convolve two standard deviation values any more than you can convolve uncertainties. The correlation the JCGM talks about is associated with how to deal with the measurement when two different probability distributions are involved.

Pat’s math is correct.

Reply to  See - owe to Rich
October 25, 2019 8:32 pm

Rich, “A confidence interval is part of the range of the distribution of a random variable…

Not when it is an empirical calibration error statistic.

Experimental physical science is not statistics, Rich. Somehow that realization invariably evades you.

Johann Wundersamer
October 25, 2019 5:45 pm

The question provokes an answer.

The question provokes a question.

http://www.ams.org/publicoutreach/feature-column/fcarc-tsp

October 27, 2019 3:14 pm

Moderator: I intend to post something here tomorrow. May I assume that this thread will be open for comments until sometime on October 29th?

Thanks,
Rich.

Reply to  See - owe to Rich
October 27, 2019 7:08 pm

Threads stay open 2 weeks, Rich. It’s a WordPress thing. CtM and Anthony have no control over it.

Your purely statistical approach is never, ever, going to cover the bases of an empirically based predictive uncertainty analysis, Rich.

Reply to  Pat Frank
October 30, 2019 7:37 pm

We do have control, but that’s the time period that has been chosen as our policy.

October 28, 2019 3:42 am

Below I have distilled the essentials of Pat Frank’s long and erudite paper, and my alternative emulator, into a dozen numbered paragraphs. I hope readers will find it useful to have the basic arguments summarized.

1. There exists a GASAT (Global Average Surface Air Temperature) which we wish to model, using values at past times to fit/calibrate the model, and future values which we wish to estimate and to know a probable value of the error of our estimate. A range of probable values may be called an “uncertainty interval”.

2. GCMs (Global Circulation Models) are a type of climate model favoured by the IPCC, and within those the CMIP5 models are an important subset.

3. The anomaly in radiative forcing due to GHGs (GreenHouse Gases) is assumed to be known to high accuracy in the past, and for the future a particular “scenario” is chosen to predict their forcing. At time t, f(t) denotes this value in W/m^2.

4. It is observed that graphs of GCM values of GASAT into the future, whilst having some wiggles, are well approximated by a constant times f(t).

5. In Pat Frank’s paper this relationship is described by his Equation (1), which can be rewritten in slightly different notation as:

(1) T(t) = b f(t) + A

where T(t) is the emulated value at time (year) t. The value of constant A, an offset, is not especially important, but the value of constant b is, and Pat supplies a value for this.

6. Though T(t) in (1) here approximates the GCM values well, it does not tell us about errors in those GCMs, which might lead them to be wildly inaccurate in the future. That is, the uncertainty over how far off the real GASAT it might be at the year 2100 might be great, either because the spread of probable GCM values might be great (the precision problem), or because the GCM exhibits a nonzero bias (the mean of its envelope minus the true GASAT) which is amplified over a period of 80 years (the accuracy problem), or both.

7. In addition to the GHG forcing f(t), the GCMs use other, much larger forcings, say F(t), to model temperature. Pat quotes other papers to show that for the LWCF (Long Wave Cloud Forcing) component of this, the RMSE (Root Mean Squared Error) is +/-4 W/m^2 when averaged over a year (relevant if T(t) is advanced with the increment of t being 1 year).

8. Therefore in any one year there is, as well as f(t), an uncertainty of +/-4 W/m^2 to be added in. When considering the change from one year to the next, the change in GASAT over that period is to be considered, so the model is for T(t)-T(t-1) and the uncertainty is applied to f(t)-f(t-1). This is the import of Pat’s Equation (5.1), which I rewrite here as

(2) T(t)-T(t-1) = b(f(t)-f(t-1)) +/- u

where u = 4 W/m^2.

9. By the RSS (Root Sum of Squares) method of combining independent uncertainties, and adding the telescoping terms in (2) for successive values of t, we get

(3) T(t)-T(0) = b(f(t)-f(0)) +/- u sqrt(t)

So, for example, the uncertainty after 81 years is +/-u*sqrt(81) = +/-9u = +/-36 W/m^2 in forcing terms, a large value indeed. Ergo, useless GCMs!
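The root-sum-square accumulation in (3) is easy to check numerically; as a sketch, only the u = 4 W/m^2 and the 81-year horizon from the paragraph above are used:

```python
import math

u = 4.0        # per-year LWCF calibration uncertainty statistic, W/m^2
years = 81

# Root-sum-square of 81 equal, independent per-step uncertainties:
# sqrt(u^2 + u^2 + ... + u^2) = u * sqrt(years)
total = math.sqrt(sum(u**2 for _ in range(years)))

print(total)   # 36.0, i.e. u*sqrt(81) = 9u
```

This only reproduces the arithmetic of (3); whether the independence assumption behind RSS holds for the per-year LWCF error is exactly the point debated in this thread.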

10. The head posting by Pat argues against Roy Spencer’s criticisms. From (3) it looks as if GCMs should wander by +/-36W/m^2, but they don’t. Pat writes “Models show TOA (Top Of Atmosphere) balance and LWCF error simultaneously”. This is certainly disturbing for the GCMs, as one wonders how they magically achieve balance in these circumstances, but it is also disturbing for Equation (3) above because it suggests that between different times the u’s might have correlation structure, contradicting the independence assumption required for (3). Again, as in my comment of Oct25 7:12am, I am not going to pursue this line for now.

11. (Whether Pat likes it or not, “uncertainty intervals” +/-u_i can be written as random variables U(i) and give the same results in the usual cases (normal, zero mean), and as I am more familiar with that notation I am going to use it here.) Consider the model

(4) T_k(t) = T_k(t-1) + b(f(t)-f(t-1)) + kU(t)

where k is 0 or 1 and T_0(0) = T_1(0) is an initial condition. If k=0 then (4) can be summed to give an emulator equation like (1) here and Pat’s (1). If k=1 then (4) is effectively the same as the “uncertainty” equation (2) or Pat’s equation (5). Using this recursion we can derive

(5) T_k(t) = T_k(0) + b(f(t)-f(0)) + k sum_1^t U(i)

It follows that T_1(t) = T_0(t) + sum U(i) and this links Pat’s equations (1) and (5) together. Now T_0(t) can be declared to be an emulator for anything, but a justification needs to be made. Pat declares his emulator to be for the ensemble mean of some CMIP5 models, and justifies this through Figure 1 of the paper. For T_1(t), the uncertainty equation, with sum_1^t U(i) replaced by +/-u sqrt(t) in his notation, he justifies it through analysis of TCF errors.
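That the recursion (4) telescopes to the closed form (5) can be verified numerically; in this sketch the values of b, the forcing series, the initial condition, and the U(i) draws are all arbitrary illustrations:

```python
import random

random.seed(0)
b = 0.42                                          # illustrative coefficient
f = [0.1 * t for t in range(101)]                 # assumed linear forcing f(t)
U = [random.gauss(0.0, 1.0) for _ in range(101)]  # per-step terms U(t)

# Recursion (4) with k = 1: T(t) = T(t-1) + b*(f(t) - f(t-1)) + U(t)
T = [15.0]                                        # arbitrary T_1(0)
for t in range(1, 101):
    T.append(T[-1] + b * (f[t] - f[t - 1]) + U[t])

# Closed form (5): T_1(t) = T_1(0) + b*(f(t) - f(0)) + sum_{i=1}^{t} U(i)
t = 100
closed = T[0] + b * (f[t] - f[0]) + sum(U[1:t + 1])

print(abs(T[t] - closed))   # ~0: the intermediate f terms telescope away
```

The agreement is exact up to floating-point rounding, whatever the sign or distribution of the U(i); only the independence of the U(i) matters for the subsequent RSS step.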

12. Now let’s turn to my new emulator again, as introduced in my Oct25 7:12am comment.

(6) X_k(t) = (1-a)X_k(t-1) + c f(t) + k G(t)

where G(t) has some distribution with mean z and variance s^2. Then

(7) X_k(t) = sum_0^{t-1} (1-a)^i (c f(t-i) + k G(t-i)) + (1-a)^t X_k(0)

Now assume that f(t) = dt for some constant d. Then

(8) E[X_k(t)] = cd(at+a-1)/a^2 + (1-a)^(t+1) cd/a^2 + (1-a)^t X_k(0) + kz(1-(1-a)^t)/a
(9) Var[X_1(t)] = (1-(1-a)^(2t)) s^2/(2a-a^2)

For 0<a<1, choosing c = ab makes X_0(t) follow a path very close to T_0(t), so it is an equally good emulator as T_0(t). But the variance of the uncertainty version X_1(t) does not grow without limit, as it tends to s^2/(2a-a^2), which is very different behaviour from T_1(t).
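The contrast between the two propagation behaviours can be seen in a small simulation (a sketch: the values of a, s, the horizon, and the trial count are arbitrary, and random.gauss stands in for G(t) with z = 0):

```python
import random

random.seed(1)
a, s, t_end, trials = 0.2, 1.0, 200, 5000

final_X = []   # damped recursion (6) with k = 1, deterministic part dropped
final_T = []   # plain random-walk sum, as in T_1(t) of para 11
for _ in range(trials):
    x = walk = 0.0
    for _ in range(t_end):
        g = random.gauss(0.0, s)
        x = (1.0 - a) * x + g   # variance tends to s^2/(2a - a^2)
        walk += g               # variance grows as t*s^2, without limit
    final_X.append(x)
    final_T.append(walk)

def var(xs):
    m = sum(xs) / len(xs)
    return sum((v - m) ** 2 for v in xs) / len(xs)

print(var(final_X))   # near s^2/(2a - a^2) = 1/0.36 ~ 2.78 (saturated)
print(var(final_T))   # near t_end * s^2 = 200 (still growing)
```

With a = 0.2 the damped version settles at a bounded spread while the random-walk version keeps widening with sqrt(t), which is precisely where the two emulators disagree about propagated uncertainty.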

I’ll write a separate comment to use this exposition to respond to some of Pat’s comments above.

Reply to  See - owe to Rich
October 28, 2019 2:51 pm

“11. (Whether Pat likes it or not, “uncertainty intervals” +/-u_i can be written as random variables U(i) and give the same results in the usual cases (normal, zero mean),”

If uncertainty is a random variable then it is subject to being made more accurate using the central limit theorem. This runs into the problem of: how does a model made up of differential equations use the central limit theorem to cancel out errors in its output?

Assuming that uncertainty is a random variable with a normal distribution means that its mean is also the highest-probability value – i.e. the most accurate. Thus the claim made by the climate alarmists that the GCMs are highly accurate because of the cancellation of errors should be considered to be true. But then this runs into the paradox that their outputs don’t match reality – so how can their outputs be the most accurate?

October 28, 2019 6:09 am

Replies to some of Pat’s comments upstream.

P: Rich, I’m not going to concern myself with your emulator.

R: Pat, I’m not surprised, because you show absolutely no interest in addressing the falsifiability question of your results. And it’s understandable, given how many years you have invested into this. But other readers will see that the existence of an equally good emulator (my para 12 above) which, combined with a method for calculating uncertainty which cannot be proven to be distinct from yours (i.e. RSS), gives much lower uncertainty bounds, means that your conclusion is not, as they say in the climate science trade, “robust”.

P: The LWCF uncertainty interval I use derives directly from the calibration of climate models against measured observables.

R: I agree; it’s my para. 7 above, and it’s a strong part of your paper.

P: That calibration uncertainty defines a resolution lower limit of climate models.

R: I agree; it’s not the initial uncertainty that bothers me, but the propagation.

P: The uncertainty does not enter the emulation at all.

R: Strictly speaking that is true, as it is T_0(t) in my para 11 above. But it enters into T_1(t), which differs from T_0(t) only in the inclusion of “error” or “uncertainty” terms, and it is from T_1(t) that the error/uncertainty propagation is derived. Therefore the general form of the emulation equation is important for the subsequent derivation of uncertainty.

P: Your definition of an uncertainty bound as something that, “reasonably predicts the spread from running [GCMs] into the future” is just an epidemiological variation in model output. A predictive pdf.

R: Yes, for a few days now I have been happy to take back that narrow view. This is for two reasons: first, that it only addresses “precision” and not “accuracy”; and secondly, that there seems to be some weird internal correction in the GCMs, deeply disturbing to me, which ensures that rough radiative balance is achieved at the top of the atmosphere.

P: LWCF uncertainty is an unvarying constant, Rich. It doesn’t correlate with anything.

R: It can’t be a constant because, as you have said yourself, it has a +/-, i.e. +/-4 W/m^2. That indicates a range, whether it be a standard-deviation portion of a distribution or a uniform inviolable interval. In this case it’s the difference between what the GCMs say LWCF should be and what it was actually observed to be, which varies from year to year. Sane mathematicians and most scientists are going to call that an error distribution. And then, of course, correlation is perfectly possible. I can see that this is a fundamental difference between us which I have just about given up on resolving.

P: I’m worried about knowing whether a result is reliable or not. That’s what “strict reference” means. It means I use the method because it tells me something I need to know — reliability — even though the use violates statistical assumptions.

R: I’m worried about that too. I’m worried about the reliability of your very wide estimates of GCM uncertainty in the year 2100! And I don’t mind a certain amount of pragmatism – ignoring the difference between the sum of 5 uniform intervals and a normal distribution, say – but one has to be very careful not to go too far.

P: When the resolution limit is an empirical uncertainty interval, the statistical probability does not apply. Given an empirical calibration SD, one cannot say there is a 5% chance the correct answer is outside 2-sigma. Such a statement is meaningless — because the error distribution is not known to be normal.

R: Well, I was merely regurgitating the sense of what you wrote in your “helpful screed”:
“* The odds are 20 to 1 against the uncertainty of X_i being larger than (+/-)dX_i.
The value of dX_i represents 2-sigma for a single-sample analysis, where sigma is the standard deviation of the population of possible measurements from which the single sample X_i was taken.
The uncertainty (+/-)dX_i Moffat described, exactly represents the (+/-)4W/m^2 LWCF calibration error statistic derived from the combined individual model errors in the test simulations of 27 CMIP5 climate models.”

But of course I agree that, depending on how far from normal the distribution is, the 5% will be subject to some error. But I’m pragmatic about that; the 5% still gives a general flavour.

Reply to  See - owe to Rich
October 28, 2019 2:57 pm

Rich,

” I’m also worried about that too. I’m worried about the reliability of your very wide estimates of GCM uncertainty in the year 2100!”

You shouldn’t be worried. Once the GCM’s iterative runs identify a temperature differential greater than the uncertainty interval, the iterative runs should be stopped. Their outputs are not reliable past that point. That’s true for your calculation of uncertainty as well.

Reply to  Tim Gorman
October 28, 2019 3:11 pm

Tim, no, my worry is not in that direction. It is in the direction that Pat has overestimated the uncertainty interval width, and the runs will never get close to those bounds. In fact, Pat has admitted that the GCMs have smaller spread than his uncertainty, but this appears to imply that they are therefore inaccurate, i.e. have a bias which may be unknowable until observations down the line.

Reply to  See - owe to Rich
October 29, 2019 9:36 am

“In fact, Pat has admitted that the GCMs have smaller spread than his uncertainty,”

What Pat has pointed out is that the models all produce about the same results. That really has nothing to do with the uncertainty of the results. The GCMs are deterministic, meaning if you put in A you *always* get out B. A single model doesn’t vary over a different number of runs as long as no changes are made in the input data, the fudge parameters, or the order of evaluating the differential equations.

“this appears to imply that they are therefore inaccurate, i.e have a bias which may be unknowable until observations down the line.”

It tells you that they are data-matching programs being used to extrapolate into the future. Each data-matching program is slightly different, thus giving different extrapolations. They are *not* comprehensive models of the overall physics associated with the Earth and its sub-components. If they were, they wouldn’t have different outputs. Take Gauss’ Law on electric charges. It *is* a comprehensive model of the physics. It gives exact, accurate answers every single time – as long as the inputs are exact and accurate. The problem is in measuring the input, i.e. the electric charge, exactly and accurately. Because that measurement has inexactness and inaccuracy, the output has an uncertainty interval. It’s impossible to eliminate it. Tell me *exactly* and accurately what the charge on an electron is, and I’ll ask you how you measured it at that resolution.

October 28, 2019 7:16 am

Final thoughts on Pat Frank’s paper.

Take the year 2100. A GCM in 2019 might, by some fluke, choose a “pathway” for GHGs and solar variations which closely matches what actually happens. It will predict a GASAT value M (for model) in 2100, and there will be a measured GASAT value A (for actual) then. (If there isn’t, it will either be because humans think it too boring or irrelevant by then, or we have been wiped out by a cataclysm which in my view certainly won’t be from CAGW.)

So there will be an error M-A. If Pat’s paper doesn’t say anything about that, then I have been wasting my time studying it these past several weeks. If it does, then I think it says that |M-A| will be of the order of 16K (say; the exact figure is not important here). Now M-A derives from two sources: variance (intra-model variation, or inter-model as well if an ensemble is used) and bias. We can see the variance of the models from their outputs, and it’s much smaller than a radiative uncertainty of +/-4 W/m^2 implies. Therefore, if the models are not biased, |M-A| will turn out to be much smaller than 16K. But Pat may be right, and |M-A| around 16K will actually occur. In this case the models must be biased, so the parameter z in my para 12 above (which could equally apply to U for Pat’s emulator) is non-zero. Hopefully we could detect z != 0 much earlier than 2100, and indeed there are already claims of bias in the models over the last 30 years. 16K divided by 80 years is “only” 0.2K per year, but after 10 years that becomes 2K, which is a significant departure.

Returning to T_1(t), Pat’s emulator plus error, we can change it so that it tracks GASAT rather than the GCMs. If U(t) = Z + V(t) then Z can match the unknown bias and V(t) with mean 0 and variance s^2 can match the intra-model spread. The total standard deviation sqrt( Var(Z)+s^2) can match the scaled LWCF error +/-4 W/m^2, and so Var(Z) can be deduced. Over a period of some years the single realized value Z=z can be estimated, and the intra-model spread prediction +/-s*sqrt(t) tested. In this way Pat’s emulator can be improved so as to become falsifiable (and validated in the happy event that Pat’s theory is correct).
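The decomposition U(t) = Z + V(t) has a directly checkable consequence (a sketch; the values of sd_Z, s, t, and the trial count are arbitrary choices): the accumulated term sum U(i) = t*Z + sum V(i) has variance t^2*Var(Z) + t*s^2, so a shared bias realization grows linearly with t while the independent noise only grows as sqrt(t).

```python
import random

random.seed(2)
sd_Z, s, t, trials = 0.5, 1.0, 100, 10000

totals = []
for _ in range(trials):
    Z = random.gauss(0.0, sd_Z)   # bias: drawn once per model 'world'
    noise = sum(random.gauss(0.0, s) for _ in range(t))  # sum of V(i)
    totals.append(t * Z + noise)  # accumulated sum U(i) = t*Z + sum V(i)

m = sum(totals) / len(totals)
v = sum((x - m) ** 2 for x in totals) / len(totals)

print(v)   # near t^2*Var(Z) + t*s^2 = 100^2*0.25 + 100 = 2600
```

This is why a bias term, if present, would dominate the long-horizon spread, and why estimating the single realized Z from observations is the natural falsifiability test proposed here.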

The same procedure can be done for my emulator X(t), with some particular value of ‘a’ specified, and results compared with T(t). Eventually it should be possible to discriminate between the two.

Pat will no doubt object, as usual, but the above is proper mathematical modelling which has some prospect of being validated.

Reply to  See - owe to Rich
October 28, 2019 3:10 pm

“But Pat may be right and |M-A| around 16K will actually occur. ”

Pat’s thesis doesn’t predict anything! *Any* value within the uncertainty interval is possible. Your M-A could be large or it could be small based on the uncertainty interval.

What his thesis *does* say is that once the uncertainty interval is larger than the anomaly the GCMs are trying to calculate, the GCMs become totally unreliable. There is no use in extending their iteration interval past that point. It’s no different than trying to read millivolts on a digital meter with only two significant digits: your uncertainty is greater than what you are trying to read! And no amount of statistical analysis can change that fact. As Pat said, the fuzziness of the pixel precludes knowing anything.

October 28, 2019 3:43 pm

Rich,

I’m pretty sure that no one who follows Pat’s analysis believes that |M-A| will be on the order of 16K by 2100. The argument, as I understand it, is that M-A ca. 2100 will be meaningless because M in 2100 (as of 2019) is meaningless. And this follows because M in 2099 (as of 2019) is meaningless, and so on. They are all meaningless because, for any forecast period, the uncertainty of the cloud physics, as amply evidenced by the results of the GCMs themselves, greatly exceeds the magnitude of the forecasts. The fact that the GCMs are somehow constrained to prevent realizations commensurate with the magnitude of the missing and/or misspecified physics is of no import, since Pat’s emulations indicate that the forecasted changes in GASAT are linear with the projected forcings. This means that the uncertainty of the forecasts should also accumulate accordingly. No heavy math needed, just logic.

Reply to  Frank from NoVA
October 29, 2019 1:33 am

Frank from NoVA, “just logic” eh? Very fuzzy logic to my mind, which is why I set out my paragraphs 1 to 12 to follow it, and succeeded provided I could use kosher statistical modelling rather than the concept of “uncertainty intervals” over which there has been so much disagreement in this thread.

Your logic includes “this means that the uncertainty of the forecasts should also accumulate accordingly”, and that is where the logic goes wrong. Mathematics shows that the accumulation depends on the nature of the emulation, and I produced an emulation (in fact infinitely many, across the spectrum 0<a<1) which agrees very closely with Pat’s and yet has much smaller growth in uncertainty over time. I don’t think at present we can distinguish between these two emulators, though I’d like to.