Reposted from Dr. Judith Curry’s Climate Etc.
Posted on October 29, 2019 by curryja
by Judith Curry
“Letting go of the phantastic mathematical objects and achievables of model-land can lead to more relevant information on the real world and thus better-informed decision-making.” – Erica Thompson and Lenny Smith
The title and motivation for this post comes from a new paper by Erica Thompson and Lenny Smith, Escape from Model-Land. Excerpts from the paper:
“Model-land is a hypothetical world (Figure 1) in which mathematical simulations are evaluated against other mathematical simulations, mathematical models against other (or the same) mathematical model, everything is well-posed and models (and their imperfections) are known perfectly.”
“It also promotes a seductive, fairy-tale state of mind in which optimising a simulation invariably reflects desirable pathways in the real world. Decision-support in model-land implies taking the output of model simulations at face value (perhaps using some form of statistical processing to account for blatant inconsistencies), and then interpreting frequencies in model-land to represent probabilities in the real-world.”
“It is comfortable for researchers to remain in model-land as far as possible, since within model-land everything is well-defined, our statistical methods are all valid, and we can prove and utilise theorems. Exploring the furthest reaches of model-land in fact is a very productive career strategy, since it is limited only by the available computational resource.”
“For what we term “climate-like” tasks, the realms of sophisticated statistical processing which variously “identify the best model”, “calibrate the parameters of the model”, “form a probability distribution from the ensemble”, “calculate the size of the discrepancy” etc., are castles in the air built on a single assumption which is known to be incorrect: that the model is perfect. These mathematical “phantastic objects”, are great works of logic but their outcomes are relevant only in model-land until a direct assertion is made that their underlying assumptions hold “well enough”; that they are shown to be adequate for purpose, not merely today’s best available model. Until the outcome is known, the ultimate arbiter must be expert judgment, as a model is always blind to things it does not contain and thus may experience Big Surprises.”
The Hawkmoth Effect
The essential, and largely unrecognized, problem with global climate models is model structural uncertainty/error, which is referred to by Thompson and Smith as the Hawkmoth Effect. A poster by Thompson and Smith provides a concise description of the Hawkmoth effect:
“The term “butterfly effect”, coined by Ed Lorenz, has been surprisingly successful as a device for communication of one aspect of nonlinear dynamics, namely, sensitive dependence on initial conditions (dynamical instability), and has even made its way into popular culture. The problem is easily solved using probabilistic forecasts.
“A non-technical summary of the Hawkmoth Effect is that “you can be arbitrarily close to the correct equations, but still not be close to the correct solutions”.
“Due to the Hawkmoth Effect, it is possible that even a good approximation to the equations of the climate system may not give output which accurately reflects the future climate.”
From their (2019) paper:
“It is sometimes suggested that if a model is only slightly wrong, then its outputs will correspondingly be only slightly wrong. The Butterfly Effect revealed that in deterministic nonlinear dynamical systems, a “slightly wrong” initial condition can yield wildly wrong outputs. The Hawkmoth Effect implies that when the mathematical structure of the model is only “slightly wrong”, then even the best formulated probability forecasts will be wildly wrong in time. These results from pure mathematics hold consequences not only for the aims of prediction but also for model development and calibration, ensemble interpretation and for the formation of initial condition ensembles.”
“Naïvely, we might hope that by making incremental improvements to the “realism” of a model (more accurate representations, greater details of processes, finer spatial or temporal resolution, etc.) we would also see incremental improvement in the outputs. Regarding the realism of short-term trajectories, this may well be true. It is not expected to be true in terms of probability forecasts. The nonlinear compound effects of any given small tweak to the model structure are so great that calibration becomes a very computationally-intensive task and the marginal performance benefits of additional subroutines or processes may be zero or even negative. In plainer terms, adding detail to the model can make it less accurate, less useful.”
JC note: This effect relates to the controversy surrounding the very high values of ECS in the latest CMIP6 global model simulations (see section 5 in What’s the worst case?), which is largely related to incorporation of more sophisticated parameterizations of cloud-aerosol interactions.
Fitness for purpose
From the Thompson and Smith paper:
“How good is a model before it is good enough to support a particular decision – i.e., adequate for the intended purpose (Parker, 2009)? This of course depends on the decision as well as on the model, and is particularly relevant when the decision to take no action at this time could carry a very high cost. When the justification of the research is to inform some real-world time-sensitive decision, merely employing the best available model can undermine (and has undermined) the notion of the science-based support of decision making, when limitations like those above are not spelt out clearly.”
“Is the model used simply the “best available” at the present time, or is it arguably adequate for the specific purpose of interest? How would adequacy for purpose be assessed, and what would it look like? Are you working with a weather-like task, where adequacy for purpose can more or less be quantified, or a climate-like task, where relevant forecasts cannot be evaluated fully? How do we evaluate models: against real-world variables, or against a contrived index, or against other models? Or are they primarily evaluated by means of their epistemic or physical foundations? Or, one step further, are they primarily explanatory models for insight and understanding rather than quantitative forecast machines? Does the model in fact assist with human understanding of the system, or is it so complex that it becomes a prosthesis of understanding in itself?”
“Using expert judgment, informed by the realism of simulations of the past, to define the expected relationship of model with reality and critically, to be very clear on the known limitations of today’s models and the likelihood of solving them in the near term, for the questions of interest.”
My report, Climate Models for Laypersons, addressed the issue of fitness for purpose of global climate models for attribution of 20th century global warming:
“Evidence that the climate models are not fit for the purpose of identifying with high confidence the relative proportions of natural and human causes to the 20th century warming is as follows:
- substantial uncertainties in equilibrium climate sensitivity (ECS)
- the inability of GCMs to simulate the magnitude and phasing of natural internal variability on decadal-to-century timescales
- the use of 20th century observations in calibrating/tuning the GCMs
- the failure of climate models to provide a consistent explanation of the early 20th century warming and the mid-century cooling.”
From my article in the CLIVAR Newsletter:
“Assessing the adequacy of climate models for the purpose of predicting future climate is particularly difficult and arguably impossible. It is often assumed that if climate models reproduce current and past climates reasonably well, then we can have confidence in future predictions. However, empirical accuracy, to a substantial degree, may be due to tuning rather than to the model structural form. Further, the model may lack representations of processes and feedbacks that would significantly influence future climate change. Therefore, reliably reproducing past and present climate is not a sufficient condition for a model to be adequate for long-term projections, particularly for high-forcing scenarios that are well outside those previously observed in the instrumental record.”
With regards to 21st century climate model projections, Thompson and Smith make the following statement:
“An example: the most recent IPCC climate change assessment uses an expert judgment that there is only approximately a 2/3 chance that the actual outcome of global average temperatures in 2100 will fall into the central 90% confidence interval generated by climate models. Again, this is precisely the information needed for high-quality decision support: a model-based forecast, completed by a statement of its own limitations (the Probability of a “Big Surprise”).”
While the above statement is mostly correct, the IPCC does not provide a model-based forecast, since they admittedly ignore future volcanic and solar variability.
Personally I think that the situation with regards to 21st century climate projections is much worse. From Climate Models for Laypersons:
“The IPCC’s projections of 21st century climate change explicitly assume that carbon dioxide is the control knob for global climate. The CMIP climate model projections of the 21st century climate used by the IPCC are not convincing as predictions because of:
- failure to predict the warming slowdown in the early 21st century
- inability to simulate the patterns and timing of multidecadal ocean oscillations
- lack of account for future solar variations and solar indirect effects on climate
- neglect of the possibility of volcanic eruptions that are more active than the relatively quiet 20th century
- apparent oversensitivity to increases in greenhouse gases”
With regards to fitness for purpose of global/regional climate models for climate adaptation decision making, there are two particularly relevant articles:
- The Myopia of Imperfect Climate Models, by Frigg, Smith and Stainforth
- On the use and misuse of climate change projections in international development by Nissan et al.
“When a long-term view genuinely is relevant to decision making, much of the information available is not fit for purpose. Climate model projections are able to capture many aspects of the climate system and so can be relied upon to guide mitigation plans and broad adaptation strategies, but the use of these models to guide local, practical adaptation actions is unwarranted. Climate models are unable to represent future conditions at the degree of spatial, temporal, and probabilistic precision with which projections are often provided which gives a false impression of confidence to users of climate change information.”
Pathways out of model land and back to reality
Thompson and Smith provide the following criteria for identifying whether you are stuck in model land with a model that is not adequate for purpose:
“You may be living in model-land if you…
- try to optimize anything regarding the future;
- believe that decision-relevant probabilities can be extracted from models;
- believe that there are precise parameter values to be found;
- refuse to believe in anything that has not been seen in the model;
- think that learning more will reduce the uncertainty in a forecast;
- explicitly or implicitly set the Probability of a Big Surprise to zero; that there is nothing your model cannot simulate;
- want “one model to rule them all”;
- treat any failure, no matter how large, as a call for further extension to the existing modelling strategy.”
“Where we rely more on expert judgment, it is likely that models with not-too-much complexity will be the most intuitive and informative, and reflect their own limitations most clearly.”
“In escaping from model-land do we discard models completely? Rather, we aim to use them more effectively. The choice is not between model-land or nothing. Instead, models and simulations are used to the furthest extent that confidence in their utility can be established, either by quantitative out-of-sample performance assessment or by well-founded critical expert judgment.”
Thompson and Smith focus on the desire to provide probabilistic forecasts to support real-world decision making, while at the same time providing some sense of uncertainty/confidence about these probabilities. IMO once you start talking about the ‘probability of the probabilities,’ then you’ve lost the plot in terms of anything meaningful for decision making.
Academic climate economists seem to want probabilities (with or without any meaningful confidence in them), as do some in the insurance sector and the broader financial sector. Decision makers that I work with seem less interested in probabilities. Those in the financial sector want a very large number of scenarios (including plausible worst cases) and are less interested in the actual probabilities of weather/climate outcomes. In non-financial sectors, they mostly want a ‘best guess’ with a range of uncertainty (nominally the ‘very likely’ range); this is to assess to what degree they should be concerned about local climate change relative to other concerns.
As argued in my paper Climate Change: What’s the Worst Case?, model inadequacy and an inadequate number of simulations in the ensemble preclude producing unique or meaningful probability distributions from the frequency of model outcomes of future climate. I further argued that statistical creation of ‘fat tails’ from limited information about a distribution can produce very misleading information. I argued instead for creating a possibility distribution of scenarios, which can be generated in a variety of ways (including with global climate models), with a ‘necessity’ function describing the level and type of justification for each scenario.
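For concreteness, here is a toy sketch in standard possibility-theory terms. The scenario outcomes, the possibility values and the particular possibility/necessity arithmetic below (Dubois-and-Prade style) are illustrative assumptions only, not the specific ‘necessity’ construction in my paper:

```python
# Toy sketch of standard possibility-theory arithmetic, not the specific
# construction in the paper: each scenario outcome gets a possibility in
# [0, 1]; the possibility of an event is the max over scenarios in it, and
# its necessity is 1 minus the possibility of the complementary event.
scenarios = {          # warming by 2100 (C) -> possibility (illustrative numbers)
    1.0: 1.0,
    2.0: 0.8,
    3.0: 0.5,
    5.0: 0.2,
}

def possibility(event):
    """Possibility of an event (a predicate over scenario outcomes)."""
    return max((p for w, p in scenarios.items() if event(w)), default=0.0)

def necessity(event):
    """Necessity = 1 - possibility of the complementary event."""
    return 1.0 - possibility(lambda w: not event(w))

def at_least_3(w):
    return w >= 3.0

print(f"possibility(warming >= 3 C) = {possibility(at_least_3):.1f}")
print(f"necessity(warming >= 3 C)   = {necessity(at_least_3):.1f}")
```

The point of the sketch is only that such a framework ranks scenarios by how well justified they are, rather than pretending to a frequency-based probability distribution.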
Expert judgment is unavoidable in dealing with projections of future climates, but expert judgment on model adequacy for purpose is arguably more associated with model ‘comfort’ than with any rigorous assessment (see my previous post Culture of building confidence in climate models).
The ‘experts’ are currently stymied by the latest round of CMIP6 climate model simulations, where about half of them (so far) have equilibrium climate sensitivity values exceeding 4.7C – well outside the bounds of the long-established likely range of 1.5-4.5C. It will be very interesting to see how this plays out – do you toss out the climate model simulations, or the long-standing range of ECS values that is supported by multiple lines of evidence?
Application of expert judgment to assess the plausibility of future scenario outcomes, rather than assessing the plausibility of climate model adequacy, is arguably more useful.
Alternative scenario generation methods
An earlier paper by Smith and Stern (2011) argues that there is value in scientific speculation on policy-relevant aspects of plausible, high-impact scenarios, even though we can neither model them realistically nor provide a precise estimate of their probability. A surprise occurs if a possibility that had not even been articulated becomes true. Efforts to avoid surprises begin with ensuring there has been a fully imaginative consideration of possible future outcomes.
For examples of alternative scenario generation that are of particular relevance to regional climatic change (which is exceptionally poorly simulated by climate models), see these previous posts:
Historical and paleoclimate data, statistical forecast models, climate dynamics considerations and simple climate models can provide the basis for alternative scenario generation.
Given the level and types of uncertainty, efforts to bound the plausible range of future scenarios make more sense for decision making than assessing the probability of probabilities and statistically manufacturing ‘fat tails.’
Further, this approach is a heck of a lot less expensive than endless enhancements to climate models run on the world’s most powerful supercomputers, enhancements that don’t address the fundamental structural problems related to the nonlinear interactions of two chaotic fluids.
Kudos to Thompson and Smith for their insightful paper and for drawing attention to this issue.

My particular experience writing ‘models’ was in the financial world.
Even single purpose models projecting workload volumes had an extremely short shelf life. Literally days.
Which means I ran some models daily, using actual data through yesterday.
Chaotic variables, in my case decisions and experiences driven by human nature, drastically affect workloads. Changes due to human factors can be roughly estimated based upon historical data, but they are never simulated with any accuracy.
Workhour and workhour cost estimates were trash as they came off the printer. Workhour costs depend upon workloads and upon managers/supervisors applying labor to properly process the workload. A few bad decisions quickly magnify workhour costs.
All of value that a daily workhour cost model displayed was how bad workhour usage had been through yesterday. Even then, payroll adjustments took at least three days before a day’s workhour costs were roughly accurate.
Bad messages to deliver to the bosses, especially as they didn’t want to hear the caveats.
Simple models, using excellent highly detailed historical data.
Modeling climate is not simple. Available data is not highly detailed and often is of questionable accuracy.
Modifying (adjusting) historical data to feed claimed simulations of climate is a travesty.
Being proud of forcing an immensely complex model, one simulating an extremely large, near-infinitely complex situation, to meet the modeler’s assumptions is sheer hubris.
I am amused that the graphic’s central black spot resembles a pacman.
Very apropos!
A well-done Thompson and Smith article here, and Dr. Curry’s posting and description are masterful.
From my point-of-view, the climate models are just grandiose exercises in extrapolation (and there is good evidence they are nothing but first-order regression, see P. Frank) — scores of variables are empirically adjusted so that the output resembles available data, then they are run into the future to see what happens. As anyone familiar with basic statistical linear regression can tell you (dependent variable Y on independent variable X), a linear regression is pretty much useless except inside the interval of the X data. And extrapolations from higher-order polynomial regressions, such as 3rd, 4th, 5th, etc., are typically wildly wrong.
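To make the extrapolation point concrete, here is a minimal Python sketch on synthetic data (my own illustrative numbers, nothing to do with any actual climate model): polynomial fits that agree nicely inside the calibration interval diverge badly once you step outside it.

```python
# Synthetic illustration: fits of increasing polynomial degree agree inside
# the calibration interval [0, 1] but diverge when extrapolated to x = 3.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)                        # calibration interval
y = 1.0 + 0.5 * x + rng.normal(0.0, 0.05, x.size)    # mildly noisy "data"

for degree in (1, 3, 5):
    coeffs = np.polyfit(x, y, degree)
    inside = np.polyval(coeffs, 0.5)                 # interpolation: fits agree
    outside = np.polyval(coeffs, 3.0)                # extrapolation: fits diverge
    print(f"degree {degree}: f(0.5) = {inside:6.3f}   f(3.0) = {outside:9.3f}")
```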
The notion of looking at the standard deviation of an “ensemble” of different models and calculating a confidence interval of future results is absurd. Again, basic statistics tells us that the standard deviation of a single linear regression blows up outside of the X data interval. The various models are certainly not random samplings as they are completely independent of each other. Thus, no probability distribution can be determined (or even defined), which is required for transforming a standard deviation into a confidence interval, such as a 95% two-sigma interval.
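On the blow-up point, the textbook prediction standard error of a simple linear regression, s*sqrt(1 + 1/n + (x0 - xbar)^2/Sxx), grows with distance from the centre of the fitted X data; a small sketch with made-up numbers:

```python
# Illustration of how the prediction standard error of a simple linear
# regression widens with distance from the mean of the fitted X data.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 30)                       # fitted X interval
y = 2.0 + 0.3 * x + rng.normal(0.0, 0.5, x.size)

n, xbar = x.size, x.mean()
slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)
s = np.sqrt(resid @ resid / (n - 2))                 # residual standard error
sxx = ((x - xbar) ** 2).sum()

def prediction_se(x0):
    """Standard error of a new observation predicted at x0."""
    return s * np.sqrt(1.0 + 1.0 / n + (x0 - xbar) ** 2 / sxx)

for x0 in (5.0, 10.0, 20.0, 50.0):                   # centre, edge, extrapolation
    print(f"x0 = {x0:5.1f}   prediction s.e. = {prediction_se(x0):.2f}")
```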
Carlo –> “The various models are certainly not random samplings as they are completely independent of each other. Thus, no probability distribution can be determined (or even defined), which is required for transforming a standard deviation into a confidence interval, such as a 95% two-sigma interval.”
You have just hit upon something significant that I am working on with the actual temperature data. Each temperature reading is basically a stand alone population of 1. It has no probability distribution associated with it, so you cannot use it as a “sample” to create an uncertainty-of-the-mean calculation. At best, any averaging simply carries with it the uncertainty of each individual temperature reading taken. You can “create” a population from the individual readings, but calculating a standard deviation from this doesn’t remove any uncertainty either.
“Each temperature reading is basically a stand alone population of 1.”
I agree completely; in terms of the Guide to the Expression of Uncertainty in Measurement (the BIPM GUM), this kind of situation is handled by assuming a population distribution (Type B) based on “other than standard deviations”, using experience and engineering judgement. Many times these come down to a rectangular distribution between upper and lower limits, within which the result is estimated to lie anywhere with equal probability. For temperature, a Type B uncertainty could then be expressed as +/- 3C, for example (the GUM tells how to convert an interval like this into an uncertainty).
And the sigma/root(n) expression for an uncertainty is only valid if the n different measurements are all made under identical conditions. If the temperature being measured is constantly changing with time, n can only be equal to 1.
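For what it is worth, a minimal sketch of the Type B recipe I have in mind (the +/- 3C half-width is only an example, as above): a rectangular distribution of half-width a gives a standard uncertainty of a/sqrt(3), and the 1/sqrt(n) reduction applies only to genuinely repeated measurements of the same quantity.

```python
# GUM Type B sketch: a value assumed to lie anywhere within +/- a with equal
# probability (rectangular distribution) has standard uncertainty a / sqrt(3).
import math

def type_b_rectangular(half_width):
    """Standard uncertainty for a rectangular distribution of half-width a."""
    return half_width / math.sqrt(3.0)

u_single = type_b_rectangular(3.0)        # e.g. a reading quoted as +/- 3 C
print(f"u(single reading) = {u_single:.2f} C")

# sigma / sqrt(n) would only apply to n independent repeats of the *same*
# quantity under identical conditions; n readings of a changing temperature
# are each, in effect, a population of 1, so no such reduction is licensed.
n = 100
print(f"hypothetical u / sqrt(n) if repeats were valid: {u_single / math.sqrt(n):.3f} C")
```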
You’re repeating some of the things I am working on in a multipart post.
I shall be looking forward to reading them.
“It is comfortable for researchers to remain in model-land as far as possible…”
Of course it is. Just like a lot of video gamers prefer the comfort of their virtual world to working a real job. The unknowns of life are uncomfortable; virtual reality is fun.
It has just occurred to me that there is potential to expose the weakness of the AGW paradigm through an interesting exercise i.e. “prove” that increasing atmospheric CO2 levels result in global cooling.
Suppose for the moment that cooling of 1 C had occurred over the last century. Would the alarmists have followed the same path to substantiate their agenda? I believe they would have, and would have found as much evidence (complete with models) as that supporting their existing theory on warming.
Here is an interesting exercise for the physicists out there. I expect that you would work with water vapour, cloud, latent heat, plus vegetation, photosynthetic organisms and impact of marine temperature.
I am picking that, with some work, someone can come up with a theory just as robust and with just as much evidence as the AGW one, exposing the bunk for what it is. There is plenty of substantiating literature out there to cherry-pick. One would follow exactly the IPCC method but with different key search terms.
It requires a mind-set reversal. The climate is cooling and will continue to do so catastrophically due to increasing CO2. Save the planet!
Sometime in the future there may well be exactly this situation, after negative feedbacks really kick in and over-compensate. Patterns in natural systems indicate they will, at least for a period of time.
Consider the exercise a computer game 🙂
M
My comment here on WUWT is reposted from the remarks I made over on Judith Curry’s blog.
—————————————————-
Another year has gone by and it’s now the Fall of 2019. But winter will be here in another month. Having escaped the mountainous snow country of my youth for the dry boring flatlands of the US Northwest, I can’t say I miss it.
However, it’s time once again to put up ‘Beta Blocker’s Parallel Offset Universe Climate Model’, a graphical GMT prediction tool first posted on Climate Etc. and on WUWT in the summer of 2015.
Judith Curry’s blog post ‘Escape from model land’ seems like an appropriate place for my annual repost of this graph here on WUWT. So here it is:
Beta Blocker’s Parallel Offset Universe Climate Model
Referring to the illustration, three alternative GMT prediction scenarios for the year 2100 are presented on the same graphic.
— Scenario #1 predicts a +3C rise in GMT by the year 2100 from the year 2015, roughly equivalent to a +4C rise from the year 1860, which should be considered the pre-industrial baseline year for this graphical analysis.
— Scenario #2 predicts a +2C rise from 2015, roughly equivalent to a +3C rise from 1860.
— Scenario #3 predicts a +1C rise from 2015, roughly equivalent to a +2C rise from 1860.
The above illustration is completely self-contained. Nothing is present which can’t be inferred or deduced from something else also contained in the illustration.
For example, for Beta Blocker’s Scenario #1, the rise in GMT of + 0.35 Degrees C / Decade is nothing more than a line which starts at 2016 and which is drawn graphically parallel to the rate of increase in CO2 which occurs in the post-2016 timeframe. Scenario #1’s basic assumption is that “GMT follows CO2 from Year 2016 forward.”
Beta Blocker’s Scenario #2 parallels Scenario #1 but delays the start of the strong upward rise in GMT through use of an intermediate slower rate of warming between 2025 and 2060 that is also common to Scenario #3. Scenario #2’s basic assumption is that “GMT follows CO2 but with occasional pauses.”
Beta Blocker’s Scenario #3 is simply the repeated pattern of the upward rise in GMT which occurred between 1860 and 2015. That pattern is reflected into the 2016–2100 timeframe, but with adjustments to account for an apparent small increase in the historical upward rise in GMT which occurred between 1970 and 2000.
Scenario #3’s basic assumption is that “Past patterns in the rise of GMT occurring prior to 2015 will repeat themselves from 2016 on through 2100, but with a slight upward turn as the 21st Century progresses.”
That’s it. That’s all there is to it. What could be more simple, eh?
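For anyone who prefers arithmetic to graphics, here is a toy sketch of the implied average warming rates. The rates are simply back-calculated from the stated 2100 endpoints (so for Scenarios #2 and #3, which include pauses and repeated patterns, they are averages rather than constant slopes), and the numbers are illustrative only, not read off the original graphic.

```python
# Toy arithmetic, not the original graphic: the implied average warming rate
# for each scenario is its stated 2016-2100 rise divided by 8.4 decades.
scenarios = {
    "Scenario #1 (+3 C from 2015, ~+4 C from 1860)": 3.0,
    "Scenario #2 (+2 C from 2015, ~+3 C from 1860)": 2.0,
    "Scenario #3 (+1 C from 2015, ~+2 C from 1860)": 1.0,
}

decades = (2100 - 2016) / 10.0            # 8.4 decades of projection

for name, rise in scenarios.items():
    print(f"{name}: ~{rise / decades:.2f} C/decade on average")
# Scenario #1 works out to ~0.36 C/decade, consistent with the +0.35 C/decade
# line described above; #2 and #3 average lower because of the assumed pauses.
```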
All three Beta Blocker scenarios for Year 2100 lie within the IPCC AR5 model boundary range — which, it should also be noted, allows the trend in GMT in the 2000–2030 timeframe to stay essentially flat while still remaining within the error margins of the IPCC AR5 projections. (For all practical purposes, anyway.)
Scenario #3 should be considered as the bottom floor of the three scenarios, which is approximately a two degree C rise from pre-industrial CO2 concentration levels. It is also the scenario I suspect is most likely to occur.
The earth has been warming for more than 150 years. IMHO, the earth won’t stop warming just because some people think we are at or near the top of a long-term natural fluctuation cycle. The thirty-year running average of GMT must decline steadily for a period of thirty years or more before we can be reasonably certain that a long-term reversal of current global warming has actually occurred.
How did Beta Blocker’s Parallel Offset Universe Climate Model come about?
Back in 2015, I had been criticizing the IPCC’s climate models as being a messy hodge-podge of conflicting scientific assumptions and largely assumed physical parameterizations. Someone at work said to me, “If you don’t like the IPCC’s models, why don’t you write your own climate model?”
So I did. However, not having access to millions of dollars of government funding and a well-paid staff of climate scientists and computer programmers to write the modeling code, I decided to do the whole thing graphically. Back in 2015, the illustration you see above took about thirty hours to produce. In October, 2019, I updated its labeling to directly include the 1860 pre-industrial baseline datum.
If I’m still around in the year 2031, I will take some time to update the illustration to reflect the very latest HadCRUT numbers published through 2030, including whatever adjusted numbers the Hadley Centre might publish for the period of 1860 through 2015.
In the meantime, I’ll see you all next year in the fall of 2020 when the topic of ‘Are the IPCC’s models running too hot’ comes around once again.
And, given that the topic of climate change will be an important issue in the 2020 elections — unless it isn’t — nothing in this world is more certain than that in another year’s time the topic will in fact come around once again.
———————————–
Brilliant piece of work! The obvious question that arises when you see the projections is: when will warming accelerate to twice the current rate to match the predictions? It hasn’t happened yet. The longer we go at the current benign rate of warming, the higher the future acceleration in warming must be to match the predictions. The obvious conclusion: the models are wrong.
Stinkerp, you note correctly that the longer we go at the current rate of warming, the higher the future acceleration in warming must be to match the IPCC’s predictions.
It seems to me that this characteristic of the IPCC’s models is a factor which ought to be addressed in evaluating the uncertainties of those models, and hence their value and credibility for use in public policy decision making.
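A hedged back-of-envelope sketch of the point, with illustrative numbers of my own choosing (an assumed observed rate of 0.17 C/decade and a +3 C-by-2100 target measured from 2015, not IPCC figures): the longer the observed rate persists, the steeper the rate required thereafter.

```python
# Catch-up arithmetic: warming rate needed after 'now_year' to still reach a
# given total rise by 2100, if warming has so far run at 'observed_rate'.
def required_future_rate(target_rise, observed_rate, start_year, now_year, end_year=2100):
    """C/decade needed after now_year to reach target_rise (from start_year) by end_year."""
    decades_elapsed = (now_year - start_year) / 10.0
    decades_left = (end_year - now_year) / 10.0
    remaining_rise = target_rise - observed_rate * decades_elapsed
    return remaining_rise / decades_left

for now in (2020, 2040, 2060):
    rate = required_future_rate(target_rise=3.0, observed_rate=0.17,
                                start_year=2015, now_year=now)
    print(f"by {now}: need ~{rate:.2f} C/decade thereafter for +3 C by 2100")
```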
Please note as well that the trend lines for each projected temperature rise beyond 2016 are based upon assumed trends of peak hottest years, and are either partially or wholly linearized across the 2016 – 2100 time span.
Where have we heard a lot of discussions recently about how trend linearization affects the level of uncertainty associated with a climate model?
It’s been said that those who control the assumptions control the world.
Beta Blocker’s Parallel Offset Universe climate model is based 100% on assumptions. Change a few of its assumptions and the model changes accordingly, which is the means by which each of the three alternative scenarios is being produced.
And yet, the Parallel Offset Universe projections for the year 2100 lie within the boundaries of the IPCC model projections. Does this characteristic of the Beta Blocker model add to its credibility? I suppose that depends on who is looking at the model, and for what reasons.
It’s been my view for some time now that as long as the thirty year running average trend in GMT is above + 0.1 C / decade, then mainstream climate scientists will continue to claim that real-world temperature observations verify the IPCC models.
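For what it’s worth, such a thirty-year running trend is straightforward to compute from any annual GMT series; a minimal sketch on synthetic anomalies (the drift and noise below are made-up numbers, not HadCRUT data):

```python
# Sketch: fit a straight line to the most recent thirty years of an annual
# global-mean temperature anomaly series and report the trend in C/decade.
import numpy as np

rng = np.random.default_rng(2)
years = np.arange(1950, 2020)                            # 1950..2019
gmt = 0.015 * (years - years[0]) + rng.normal(0.0, 0.1, years.size)

window = 30
yr, ts = years[-window:], gmt[-window:]
slope_per_decade = np.polyfit(yr, ts, 1)[0] * 10.0       # trend in C/decade
print(f"{yr[0]}-{yr[-1]} running trend: {slope_per_decade:.2f} C/decade")
```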
From IPCC AR5 (2013), these pearls:
“Because the climate system is inherently nonlinear and chaotic, predictability of the climate system is inherently limited. Even with arbitrarily accurate models and observations, there may still be limits to the predictability of such a nonlinear system.” – Annex III, p. 1460
The IPCC AR5 Technical Summary, Box TS.3, p.64, displayed a graphic comparing model outputs to measured temperatures, showing how poorly the models perform and validating the statement above.
The foundation of all the apocalyptic claims of climate alarmists…er…”scientists” rests on the climate models. Measurements of temperature, sea level rise, ocean “acidity”, extreme weather, etc. contradict the model projections, contradict the findings of related research based on the CMIP models, and contradict all the claims of the alarmists. The climate models are the modern equivalent of haruspicy, though one could reasonably argue that a haruspex may be more accurate. At least the haruspex gets some tasty mutton out of the bargain. All the modelers get is existential angst.
“Using expert judgment, informed by the realism of simulations of the past, to define the expected relationship of model with reality and critically, to be very clear on the known limitations of today’s models and the likelihood of solving them in the near term, for the questions of interest.”
Does not parse. Perhaps the last phrase should be: “…form the questions of interest.”
The models are not completely useless; they make very good random number generators.
The failure of most models has been picking the wrong molecule. Since both water vapor and CO2 have been accurately measured worldwide, the increase in water vapor molecules has been about 37 times more effective at global warming than the increase in CO2 molecules.
Judith Curry is wonderful, and I greatly appreciate her insight. However, she is missing the real point. The world’s politicians do not want better informed decision making. They see an opportunity to gain absolute control over the unwashed masses, and they’re seizing it with zeal.