Guest post by Kevin Kilty
This short essay was prompted by a recent article regarding improvements to the uncertainty of a global mean temperature estimate. However, much bandwidth has been consumed lately on the related topic of error propagation [2, 3, 4], and so a small portion of this essay's concluding remarks is devoted to it as well.
Manufacturing engineers work to improve product design, make products easier to manufacture, lower costs, and maintain or improve product quality. Among the tools they use to accomplish this, many are statistical in nature, and these have pertinence to the topic of the surface temperature record and its interpretation in the light of climate model projections. One tool I plan to present here is statistical process control (SPC).
1. Ever Present Variation
Manufactured items cannot be made identically. Even in mass production under the control of machines, there are influences such as wear of the machine, variations in settings, skill of operators, incoming material property variations and so forth, which lead to variation in a final product. All precision manufacturing begins with an examination of two things. First, there is the customer specification. This includes all the important product parameters and the limits that these parameters must stay within. Functionality of a product suffers if these quality measures do not stay within limits. Second is the process capability. Any manufacturer worth the title will know how the process used to make products for a customer varies when it is in control. This leads the manufacturer to an estimate of how many products in a run will be outside tolerance, how many might be reworked and so forth. It is not possible to estimate costs and profits without knowing capability.
2. Process Capability and Control
If a manufacturer’s process routinely produces within the specifications, with perhaps only one item in a hundred, one in a thousand, or three in a million (six sigma) falling outside them (whatever is cost effective and achievable), then the process is capable. If it proves not capable, one might ask what investment in new machinery would make it capable; if the answer is not cost effective, one might pass on the manufacturing opportunity or have someone more capable handle it. When a process is in control, it is operating as well as is humanly possible given one’s capability. A process in control is an important concept for our discussion.
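The capability idea sketched above is often summarized with the indices Cp and Cpk. As a minimal illustration (the numeric values here are invented for the example, not taken from the essay):

```python
# Process capability indices, a minimal sketch with illustrative values.
# Cp compares the width of the customer specification to the natural
# process spread (6 sigma); Cpk also penalizes a mean that is off-center.

def capability(mean, sigma, lsl, usl):
    """Return (Cp, Cpk) for a process with the given mean and standard
    deviation, against lower/upper specification limits lsl/usl."""
    cp = (usl - lsl) / (6 * sigma)
    cpk = min(usl - mean, mean - lsl) / (3 * sigma)
    return cp, cpk

# A process centered at 10.0 with sigma 0.1, specs 9.5 to 10.5:
cp, cpk = capability(10.0, 0.1, 9.5, 10.5)
print(cp, cpk)  # both ~1.67: comfortably capable

# The same process drifted off-center to 10.2: Cp is unchanged, Cpk drops.
print(capability(10.2, 0.1, 9.5, 10.5))
```

A Cpk of about 1.33 or better is a common rule of thumb for calling a process capable; roughly 2.0 corresponds to the "three in a million" six-sigma regime mentioned above.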
3. Statistical Process Control
Statistical process control (SPC) is mainly a process of charting and interpreting measurements in real time. Various SPC charts become a tool through which an operator, potentially someone of modest training, can monitor a process and adjust it or stop it if indications are that it is drifting out of control. There are many different possible control charts, but a common one is the X-bar chart, so named because the parameter being monitored and recorded on the chart is the mean attribute of a sample of manufactured items. It is often paired with an R chart, which shows the range within the same samples. R is favored in manufacturing because it conveys much the same information about variation as, say, the standard deviation, but with far less calculation. Let’s discuss the X-bar chart. Figure 1 shows an example of a paired set of charts.
Figure 1. A pair of control charts for X-bar and range. The X-bar chart shows measurements exceeding the control limits above and below, while the range chart shows no increase in variability. We conclude an operator is unnecessarily changing machine settings. Source.
The chart begins with its construction. First, there is a specified target value for the process, and a process is designed to achieve this target. Then some number of measurements are taken from this process while it is known to be operating as well as is humanly possible, i.e. in control. Measurements are gathered into consecutive groups of a fixed size, N (five and seven are common), and the mean and range of each group are calculated, followed by the grand mean (the mean of the group means) and the mean range. Dead center horizontally across the chart is the target value; horizontal lines are then placed above and below at some multiple of the process standard deviation, estimated from the range or from the standard deviation itself. These are known as the process control limits (the upper and lower control limits, UCL and LCL respectively).
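The construction just described can be sketched in a few lines. This assumes subgroups of size N = 5 and uses the standard control-chart constants A2, D3, D4 for that subgroup size, as tabulated in SPC references such as the AT&T handbook [5]:

```python
# Constructing X-bar and R chart limits from in-control data: a sketch.
# A2, D3, D4 are the standard control-chart constants for subgroups of 5.

A2, D3, D4 = 0.577, 0.0, 2.114

def chart_limits(subgroups):
    """subgroups: list of in-control samples, each of 5 measurements.
    Returns ((LCL, center, UCL) for X-bar, (LCL, center, UCL) for R)."""
    xbars = [sum(g) / len(g) for g in subgroups]       # mean of each group
    ranges = [max(g) - min(g) for g in subgroups]      # range of each group
    grand_mean = sum(xbars) / len(xbars)               # X-bar center line
    rbar = sum(ranges) / len(ranges)                   # R center line
    xbar_limits = (grand_mean - A2 * rbar, grand_mean, grand_mean + A2 * rbar)
    r_limits = (D3 * rbar, rbar, D4 * rbar)
    return xbar_limits, r_limits

# Illustrative in-control data, not from the essay:
xl, rl = chart_limits([[9, 10, 11, 10, 10], [10, 11, 9, 10, 10]])
print(xl)  # (8.846, 10.0, 11.154)
print(rl)  # (0.0, 2.0, 4.228)
```

The A2·R-bar term is simply a convenient way of placing the limits at roughly three standard errors of the subgroup mean without ever computing a standard deviation.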
At this point one uses the chart to monitor an ongoing process. Think of charting as recording a continuing sequence of experiments. On a schedule, our fixed number (N) of manufactured items is removed from production. The mean and range of some important attribute are calculated for this sample and the results plotted on their respective charts. The null hypothesis in each experiment is that the process continues to run just as it did during the chart-creation period. As work proceeds, the sequence of measured and plotted samples shows either the pattern expected of a process in control, or a pattern of unexpected variations suggesting a process with problems. Observation by an operator of an unlikely pattern, such as cycles, drift across the chart, too many points plotting outside the control limits, or points hugging one side of the chart, is evidence of a process out of control. An out-of-control process can be stopped temporarily while the process engineer or maintenance staff find and rectify the problem. One thing worth emphasizing is that SPC is a highly successful tool for handling variation in processes and identifying problems.
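Two of the unlikely patterns named above can be stated as mechanical rules, in the spirit of the classic Western Electric run rules: a single point beyond a control limit, and a long run of points on one side of the center line (the "hugging" pattern). A sketch, with illustrative numbers:

```python
# Two simple out-of-control tests, in the spirit of the Western Electric
# run rules: (a) a point beyond the control limits, and (b) a run of
# `run_length` consecutive points on one side of the center line.

def out_of_control(points, center, ucl, lcl, run_length=8):
    """Return a list of (index, reason) alarms for the charted points."""
    alarms = []
    for i, x in enumerate(points):
        if x > ucl or x < lcl:
            alarms.append((i, "beyond control limit"))
        if i >= run_length - 1:
            window = points[i - run_length + 1 : i + 1]
            if all(p < center for p in window) or all(p > center for p in window):
                alarms.append((i, f"{run_length} in a row on one side"))
    return alarms

# Eight points hugging the low side of a chart centered at 10:
print(out_of_control([9.5] * 8, center=10.0, ucl=11.0, lcl=9.0))
```

Neither rule requires judgment from the operator, which is what makes SPC usable by someone of modest training.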
Figure 2. “…Comparison of a large set of climate model runs (CMIP5) with several observational temperature estimates. The thick black line is the mean of all model runs. The grey region is its model spread. The dotted lines show the model mean and spread with new estimates of the climate forcings. The coloured lines are 5 different estimates of the global mean annual temperature from weather stations and sea surface temperature observations….” Figures and description: Gavin Schmidt.
4. Ensemble of Models
Let’s turn attention to the subject of climate. The oft-cited ensemble of model projections is something like a control chart. It represents a spread of model projections, carefully initialized to represent what we believe is the future path of mean Earth temperature given credible additions of CO2. It is not a plot of the full variation that climate models might conceivably produce, but rather the more controlled variation of our expectations, given what we know of climate and the differential equations representing it, when it is in control. It is this in-control concept that makes the process control chart and the projection ensemble similar to one another. The resemblance is even more complete with an overlay of observed temperature.
Figure 3. The grey 95% bounds of Figure 2 redrawn in skewed coordinates (blue/orange) to look more like a control chart. The grey lines indicate the envelope of observations. Black line is target.
This ensemble became controversial once people began placing observed temperatures on it. Schmidt produced such a comparison in a blog post in 2015 [6]; Figure 2 shows it. What the comparison between observed and projected temperature showed, initially, was a trend of observed temperature across the ensemble. Some versions of similar graphs have observed temperatures departing from the projections entirely. Figures 3 and 4 show Figure 2 rotated into skewed coordinates to look more like a control chart monitoring a process. Schmidt states that Earth temperatures are well contained within the ensemble, especially so after accounting for some extraneous factors (Figure 4). Yet this misses an important point. The measurements in Figure 3 trend in an unlikely way across the ensemble, and have gone to running along the lower limit. After eliminating the trend, Figure 4 still shows observed temperatures hugging the lower end of the projections. Despite being told often that the departure of observations from the center of the ensemble is a non-issue, with each new comparison some unlikely feature remains to fuel doubt. It is difficult to avoid concluding that what is wrong is one of the following.
(1) The models do run too hot. They overestimate warming from increasing CO2, possibly because of a flawed parameterization of clouds or some other factor.
(2) The observations are running too cool. That is, factors external to the models are suppressing temperature in the real world; the models are not complete. Figure 2, from Realclimate.org, takes exogenous factors into account. Yet note that while including these factors reduces the improbable trend across the diagram, it leaves the improbable tendency to cling to the lower half of the diagram, which points back to item (1) in this list.
(3) The models and observations are of slightly different things. The observations mix unrelated things together, or contain corrections and processing not duplicated in the models.
Figure 4. The dashed (forced) 95% bounds of Figure 2 redrawn in skewed coordinates (blue/orange) to look more like a control chart. The grey lines indicate the envelope of observations. Black line is target.
These charts present data only through 2014, but while observed temperatures rose into the target region of the chart during the recent El Niño, they have more lately settled back to the lower part of the chart. It takes an extraordinary event to push observations toward the target region. One more observation about these graphs seems pertinent. If, as Lenssen et al. claim, the 95% uncertainty bounds of the global mean temperatures are truly as small as 0.05 °C, then the spread among the various observational estimates is, at times, itself unlikely.
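The "hugging" argument can be made quantitative with a rough back-of-envelope calculation. Assuming, for illustration only, that an in-control process would place each point above or below the center line independently with probability 1/2, the chance of a run of k consecutive points on one particular side is (1/2)^k:

```python
# Rough illustration of why points hugging one side of a chart are unlikely
# for an in-control process. Assumption (stated, not from the essay): points
# fall above or below the center line independently with probability 1/2.

for k in (5, 8, 10):
    print(k, 0.5 ** k)

# A run of 8 on a pre-specified side has probability 1/256, about 0.4%,
# which is why 8-in-a-row is a traditional out-of-control signal.
```

Observed temperatures running along the lower half of the ensemble for years at a stretch is, under this simple independence assumption, exactly the kind of low-probability pattern a chart operator is trained to flag; autocorrelation in climate data would soften the number but not the qualitative point.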
Our little experiment here cannot settle the question of whether models run too hot, but two of our three possibilities suggest they do. It ought to be important to find out whether this possibility is the truth.
The first draft of this essay concluded with the previous section. However, in the past few weeks there has been a lengthy discussion at WUWT about propagation of error, or what one could call propagation of uncertainty. The ensemble of model results is, in one point of view, an important monitor (like SPC) of the health of our planet. If we believe that the assumptions going into production of the models are a true representation of how the Earth works, and if we are certain that our measurements represent the same thing the ensemble represents, then we arrive at the following: a trend across our control chart toward higher values suggests a worrying problem; a trend across the chart toward lower values suggests otherwise. But without some credible measure of bounds and resolution, no such use is reasonable.
One response of the climate science community to the apparent divergence of observations from models is to argue that there really is no divergence, because the ensemble bounds could be widened to show the true variability of the climate, and once this is done the ensemble limits will happily enclose the observations. Or, they argue, there is no divergence if one takes exogenous factors into account ex post. But in my view arguing this way makes modeling pointless, because it removes one’s ability to test anything. There is certainly a conflict between the desire to make uncertainties small, thus making a definitive scientific statement, and the desire to make the bounds larger to include the correct answer. The same point Vasquez and Whiting [8] make here…
“…Usually, it is assumed that the scientist has reduced the systematic error to a minimum, but there are always irreducible residual systematic errors. On the other hand, there is a psychological perception that reporting estimates of systematic errors decreases the quality and credibility of the experimental measurements, which explains why bias error estimates are hardly ever found in literature data sources….”
is what Henrion and Fischoff [9] found to be so in the measurement of physical constants over 30 years ago. Propagation of error plays an important role in the interpretation of the bounds and resolution of models and data. It is more than just initialization errors being damped out in a GCM. But to discuss its pertinence would make this post too long. Perhaps we’ll return in a week or two when that topic cools off.
(1) Nathan J. L. Lenssen, et al., (2019) Improvements in the GISTEMP Uncertainty Model. JGR Atmospheres, 124, 6307-6326.
(2) Pat Frank https://wattsupwiththat.com/2019/09/19/emulation4-w-m-long-wave-cloud-forcing-error-and-meaning/
(3) R.C. Spencer, https://wattsupwiththat.com/2019/09/13/a-stovetop-analogy-to-climate-models/
(4) Nick Stokes, https://wattsupwiththat.com/2019/09/16/how-errorpropagation-works-with-differential-equations-and-gcms/
(5) AT&T Statistical Quality Control Handbook, Western Electric Co. Inc., 1985 Ed.
(6) RealClimate, NOAA temperature record updates and the ‘hiatus’ 4 June 2015, Accessed September 18, 2019.
(7) Ken Gregory, Epic Failure of the Canadian Climate Model.
(8) Victor R. Vasquez and Wallace B. Whiting, 2005, Accounting for Both Random Errors and Systematic Errors in Uncertainty Propagation Analysis of Computer Models Involving Experimental Measurements with Monte Carlo Methods, Risk Analysis, Volume 25, Issue 6, Pages 1669-1681.
(9) Henrion, M., & Fischoff, B. (1986). Assessing uncertainty in physical constants. American Journal of Physics, 54( 9), 791– 798.