A Modest Proposal—Forget About Tomorrow

Guest Post by Willis Eschenbach

There’s a lovely 2005 paper I hadn’t seen, put out by the Los Alamos National Laboratory, entitled “Our Calibrated Model has No Predictive Value” (PDF).

Figure 1. The Tinkertoy Computer. It also has no predictive value.

The paper’s abstract says it much better than I could:

Abstract: It is often assumed that once a model has been calibrated to measurements then it will have some level of predictive capability, although this may be limited. If the model does not have predictive capability then the assumption is that the model needs to be improved in some way.

Using an example from the petroleum industry, we show that cases can exist where calibrated models have no predictive capability. This occurs even when there is no modelling error present. It is also shown that the introduction of a small modelling error can make it impossible to obtain any models with useful predictive capability.

We have been unable to find ways of identifying which calibrated models will have some predictive capacity and those which will not.

There are three results in there, one expected and two unexpected.

The expected result is that models that are “tuned” or “calibrated” to an existing dataset may very well have no predictive capability. On the face of it this is obvious—if tuning a model were that simple, someone would be predicting the stock market or next month’s weather with good accuracy.

The next result was totally unexpected: the model may have no predictive capability despite being a perfect model. The model may represent the physics of the situation perfectly and exactly in each and every relevant detail. But if that perfect model is tuned to a dataset, even a perfect dataset, it may have no predictive capability at all.

The third unexpected result was the effect of error. The authors found that if there are even small modeling errors, it may not be possible to find any model with useful predictive capability.

To paraphrase, even if a tuned (“calibrated”) model is perfect about the physics, it may not have predictive capabilities. And if there is even a little error in the model, good luck finding anything useful.

This was a very clean experiment, with only three tunable parameters. So it looks like John von Neumann was right: he famously claimed that with four parameters he could fit an elephant, and with five make him wiggle his trunk. Here, even three were enough to cause trouble.
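The paper’s core finding, that several calibrated models can match the history equally well yet disagree about the future, is easy to reproduce in miniature. Here is a toy sketch; it has nothing to do with the paper’s actual ECLIPSE reservoir experiment, and the three-parameter function and both parameter sets are invented purely for illustration:

```python
import math

def model(t, p1, p2, p3):
    """Toy 'reservoir': a linear trend plus a slow oscillation."""
    return p1 * t + p2 * math.sin(p3 * t)

history_t = [i * 0.05 for i in range(11)]   # "history" window, t in [0, 0.5]
truth = (1.0, 1.0, 0.5)                     # the "truth case"
rival = (1.1, 0.8, 0.5)                     # a different calibrated model

# Both parameter sets match the history to well below typical measurement noise...
mismatch = max(abs(model(t, *truth) - model(t, *rival)) for t in history_t)

# ...yet their forecasts at t = 10 disagree by more than a full unit.
forecast_gap = abs(model(10.0, *truth) - model(10.0, *rival))

print(f"worst history mismatch: {mismatch:.5f}")      # ~0.0005
print(f"forecast gap at t = 10: {forecast_gap:.2f}")  # ~1.19
```

Over the short history window the sine term is indistinguishable from a straight line, so only a combination of the parameters is constrained; both models are “calibrated”, but only one of them tells you anything about t = 10.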

I leave it to the reader to consider what this means for the various climate models’ ability to simulate the future evolution of the climate. They are definitely tuned (or, as the study’s authors say, “calibrated”) models, and they definitely have more than three tunable parameters.

In this regard, a modest proposal. Could climate scientists please just stop predicting stuff for, say, one year? In no other field of scientific endeavor is every finding surrounded by predictions that this “could” or “might” or “possibly” or “perhaps” will lead to something catastrophic in ten or thirty or a hundred years. Could I ask that, for one short year, climate scientists actually study the various climate phenomena, rather than try to forecast their future changes? We are still a long way from understanding the climate, so could we just study the present and past climate, and leave the future alone for one year?

We have no practical reason to believe that the current crop of climate models has predictive capability. For example, none of them predicted the current 15-year (or so) hiatus in the warming. And as this paper shows, there is certainly no theoretical reason to think they have predictive capability.

Models, including climate models, can sometimes illustrate processes or provide useful information about the climate. Could we use them for that for a while? Could we use them to try to understand the climate, rather than to predict it?

And 100- and 500-year forecasts? I don’t care if you call them “scenarios” or whatever the current politically correct term is. Predicting anything 500 years out is a joke. Those, you could stop forever with no loss at all.

I would think that after the unbroken string of totally incorrect prognostications from Paul Ehrlich and John Holdren and James Hansen and other failed serial doomcasters, the alarmists would welcome such a hiatus from having to dream up the newer, better future catastrophe. I mean, it must get tiring for them, seeing their predictions of Thermageddon™ blown out of the water by ugly reality, time after time, without interruption. I think they’d welcome a year where they could forget about tomorrow.

Regards to all,

w.

Frank K.
November 1, 2011 5:15 pm

Leif Svalgaard says:
November 1, 2011 at 4:44 pm
“Computing time. If we had 1000 layers and a spatial resolution of a few meters and 4096-bit floating point numbers, there is little doubt that we can go to a much smaller time step as we can go to, say, a 500th-order Runge-Kutta method, instead of the 4th order used now. There is nothing magical about the 5 minutes.”
Well, only partly true. Sure you can’t use a time step of 10^-20 seconds. But, as you probably know, one should do a mesh independence study (by running the code on a series of meshes with different resolutions) to determine how small the mesh and time step should be to produce a “mesh independent” solution (where you would choose a suitable metric to determine the sensitivity). I’m sure GCM modelers do this all the time.
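The independence study Frank K. describes can be sketched in a few lines: integrate with a given time step, compute a chosen metric, halve the step, and repeat until the metric stops changing. The ODE, the metric, and the tolerance below are all invented for illustration:

```python
import math

def integrate(dt, t_end=10.0):
    """Forward-Euler integration of dy/dt = -y + sin(t), y(0) = 1."""
    y, t = 1.0, 0.0
    while t < t_end - 1e-12:
        y += dt * (-y + math.sin(t))
        t += dt
    return y

# Halve the time step until the chosen metric (y at t = 10) stops changing.
dt, prev = 0.5, integrate(0.5)
while True:
    dt /= 2
    cur = integrate(dt)
    if abs(cur - prev) < 1e-4:   # "time-step independent" to this tolerance
        break
    prev = cur

print(f"metric converged to {cur:.4f} with dt = {dt}")
```

In real CFD practice the same loop is run over spatial meshes as well, and the tolerance is set by the accuracy actually needed from the solution, not picked arbitrarily as here.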
(Your comment about “500th order Runge-Kutta” makes no sense to me. What are you trying to say? By the way, don’t the GCM modelers all use leapfrog time marching? Of course, using an explicit Runge-Kutta approach would make a lot of sense given its wide use in conventional CFD applications).
Also, if you are required by stability to take a really small time step, thereby making the numerical solution intractable, then you have a stiff or ill-posed problem on your hands. We haven’t talked about mathematical ill-posedness, which is yet another unknown in the climate modeling world…
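For readers who haven’t met stiffness: a stiff problem is one where stability, not accuracy, forces the tiny time step. A minimal illustration, using the standard textbook test equation dy/dt = −1000·y rather than anything from an actual GCM:

```python
# Stiff test problem: dy/dt = -1000*y, y(0) = 1; the exact solution decays
# almost instantly, yet the explicit scheme blows up at a "reasonable" dt.
lam, dt, steps = -1000.0, 0.01, 50

# Explicit (forward) Euler: amplification factor 1 + lam*dt = -9, so |y| grows.
y_exp = 1.0
for _ in range(steps):
    y_exp += dt * lam * y_exp

# Implicit (backward) Euler: y_{n+1} = y_n / (1 - lam*dt), stable for any dt > 0.
y_imp = 1.0
for _ in range(steps):
    y_imp /= (1.0 - lam * dt)

print(f"explicit Euler after 50 steps: {y_exp:.3g}")   # astronomically large
print(f"implicit Euler after 50 steps: {y_imp:.3g}")   # essentially zero
```

The explicit scheme would need dt < 0.002 just to remain stable, which is the hallmark of stiffness: the step size is dictated by the fastest decaying mode even when that mode contributes nothing to the answer.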

Tom in South Jersey
November 1, 2011 5:18 pm

I’m late to the party, but I shall have my say anyway. You often see such predictions in the healthcare industry. For that I profoundly miss Junkfoodscience.com

Theo Goodwin
November 1, 2011 5:25 pm

Leif Svalgaard says:
November 1, 2011 at 4:27 pm
Theo Goodwin says:
November 1, 2011 at 3:36 pm
because the science is simply not there NOW.
“But it will be eventually, and that is my point. On parameters: if the models were constantly tuned to match observations they would always be correct ‘up to yesterday’. The parameters are constrained by other considerations and the models are thus not tuned to the data, but to reasonable physics, such as radiative balance [what goes in must come out].”
We have returned to a confusion between theory and model. Watch closely:
Our solar system is a model of the physical theory that describes it.
As a model of (part of) our (total) physical theory, our solar system is a set of objects (whose behavior) renders true all the universal generalizations in (that part of) our (total) physical theory.
Obviously, then, our physical theory is true (not just useful) because all of its universal generalizations are true of that special model which is our actual solar system.
So, Leif, using your words, the question is: how does “reasonable physics” constrain the parameters and, thereby, the models?
The first of two problems for your account is that you cannot tell us what that reasonable physics is; that is, you cannot tell us without citing a text book. But texts are not helpful here. The only thing that will serve here is a rigorously formulated set of hypotheses (part of a physical theory) that actually constrains these particular parameters. (My diagnosis is that when you take action to constrain the parameters you tell yourself that it is done on the basis of reasonable physics but you have never demanded of yourself that you write down a rigorous formulation of it. This is no put down. Einstein did the same thing on occasion.)
The second problem for your account is that you cannot specify some chain of reasoning that leads from the reasonable physics to the constraints on the parameters. If you had a physical theory, the relationship would be that the theory plus initial conditions implies the constraints. But what you have is your brilliant, highly educated gut. And you will not have anything more until you demand of yourself (and your colleagues) that you produce rigorously formulated chains of reasoning that lead from the reasonable physics to the parameters that they constrain.
The solution to your first problem is to continue doing what you are doing, and maybe some more things, until you have the necessary rigorously formulated physical hypotheses. Now, for the good of humanity, please announce that you do not have them now but even sceptics believe that you or your students will have them someday. For the solution to the second problem, see the solution to the first.

Steve in SC
November 1, 2011 5:26 pm

Wow! Willis, both you and Leif are overly touchy today.
I must comment for the both of you.
There is no model that is 100% all the time.
There are models that are based on sound principles and are quite useful, particularly in reducing the cost of something or estimating a result. For instance, you can run your simulation software and fine tune your design based on those runs. But, in the end you really need to test that nuclear bomb to make sure it goes KaBoom instead of just Boom.
Chill out, boys. Don’t be so crabby.

November 1, 2011 5:29 pm

Willis Eschenbach says:
November 1, 2011 at 5:13 pm
Pielke Sr. on how models are tuned …
I think he just proves my point:
“Some parameters (such as the von Karman “constant”) are assumed to be universal, but most are just values that provide the best fit of a parametrization with the observed data used in its construction. The second type of parametrization is the same as the first (their division into two types is artificial), except there is no observational data to make the tuning. “

November 1, 2011 5:35 pm

Theo Goodwin says:
November 1, 2011 at 5:25 pm
The first of two problems for your account is that you cannot tell us what that reasonable physics is; that is, you cannot tell us without citing a text book.
It takes a thick text book to tell you.

Theo Goodwin
November 1, 2011 5:53 pm

From the article about models of financial markets:
“he assumed, reasonably, that the process would simply produce the same parameters that had been used to produce the data in the first place. But it didn’t. It turned out that there were many different sets of parameters that seemed to fit the historical data. And that made sense, he realized–given a mathematical expression with many terms and parameters in it, and thus many different ways to add up to the same single result, you’d expect there to be different ways to tweak the parameters so that they can produce similar sets of data over some limited time period.”
OMG! Of course there are an indefinitely large number of ways to tweak a model that is not a model of a specific physical theory (or whatever kind of theory). All tweaks are based on common sense or enlightened common sense or enlightened genius about the objects that make up the model. But no tweak has any meaning unless the model being tweaked is a model of a physical theory. It is the physical theory and only the physical theory that gives a context to the model and that gives a specific meaning to each object in the model. Once there is a physical theory, each object is necessary to render true all the sentences in the physical theory. Prior to the existence of such a physical theory, the so-called model is just a collection of objects with no particular meaning.

Theo Goodwin
November 1, 2011 5:57 pm

Leif Svalgaard says:
November 1, 2011 at 5:35 pm
Theo Goodwin says:
November 1, 2011 at 5:25 pm
“The first of two problems for your account is that you cannot tell us what that reasonable physics is; that is, you cannot tell us without citing a text book.
It takes a thick text book to tell you.”
So, there is an “Axiomatization of the Theory of Climate Change?” That would not be a text book. Each of the axioms would be shown in its unique context in the overall theory that is the unique true account of climate change. In textbooks, you get principles with some explanation of how they are applied. Big difference.

November 1, 2011 7:04 pm

Leif writes “I think he just proves my point:”
WTF? Did you not read the first part?
“but most are just values that provide the best fit of a parametrization with the observed data used in its construction.”
So because some parameters might be “ok” then you’re right? Wow. What about all the other parameters that are coarse reflections of reality?

Theo Goodwin
November 1, 2011 7:35 pm

Willis Eschenbach says:
November 1, 2011 at 5:11 pm
From Environmental Research Letters:
At present, climate models are tuned to achieve agreement with observations. This means that parameter values that are weakly restricted by observations are adjusted to generate good agreement with observations for those parameters that are better restricted, with the TOA radiative balance belonging to the latter category.
“Yep. Sure ’nuff …”
Yep, unlimited rejiggering with nothing to supply context or give meaning to the individual rejiggers.

Eric Anderson
November 1, 2011 7:42 pm

In addition to whether all the physics are understood and properly included in the models (which is very possibly asking for more than can ever be achieved — but let’s set that aside now), we also have to include in the models precise information about the starting points. That has very little to do with physics calculations, and everything to do with physical facts, such as current temperatures, current aerosols, current vegetation and on and on, nearly ad infinitum. Unless we get the current state input properly, we can’t expect to run our physics calculations to generate a predictive view of a future state. Further, there are plenty of known unknowns, such as future volcanic eruptions, future changes in aerosols, vegetation, solar activity, just to name a few. By definition, it is impossible to include these in a model with precision.
The idea that the climate models will have good long-term predictive value if they can but manage to “get the physics right” is simply wishful thinking.
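Eric’s point about the starting state is the classic sensitive-dependence problem: even a literally perfect model loses its forecast if the initial conditions carry the tiniest error. A minimal sketch, using the chaotic logistic map as a stand-in for a “perfect” model; the map and the 1e-10 measurement error are invented for illustration:

```python
def step(x):
    """One tick of the chaotic logistic map, standing in for a perfect model."""
    return 4.0 * x * (1.0 - x)

x_true = 0.4             # the actual state of the system
x_meas = 0.4 + 1e-10     # our measurement, off by one part in four billion

n = 0
while abs(x_true - x_meas) < 0.01 and n < 200:
    x_true, x_meas = step(x_true), step(x_meas)
    n += 1

print(f"forecast parted company with reality after {n} steps")
```

The forecast diverges after a few dozen steps even though the model equations are exact, because the initial error roughly doubles with each iteration.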

November 1, 2011 9:17 pm

Willis Eschenbach says:
November 1, 2011 at 7:39 pm
Leif, models are tuned to observations. Apparently, everyone but you got the memo. Note that better-tuned climate models (those giving results closer to observations) are claimed (without substantiation) by the IPCC to give better predictions.
Perhaps it is time to make explicit what a tuned model is. I offer the following definition: A model M is tuned if some observable output X from the model depends on some parameter P, and if the actual observed value X’ being different from X causes P to be adjusted to P’ so that the difference D = |X-X’| is smaller than without the tuning of P to P’. For this kind of tuning to be useful it must be done continuously. I grant that I have not carefully examined the current top-of-the-line models, but I have studied in great detail the physical basis for atmospheric modeling [e.g. as given by Jacobson], and to my knowledge those models are not tuned as per the definition above. There are many parameters that encapsulate physics that can be expressed by empirical relationships [thus obviating the need for calculating them from the microphysics, which often cannot even be done], but those do not qualify. If you know specifically which parameters in current climate models are adjusted to minimize the measure D, I would be glad to be educated.
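Leif’s definition can be written down directly as a procedure: adjust P whenever the observed X’ differs from the model’s X, so that D = |X-X’| shrinks. The one-parameter “model” below is hypothetical, invented only to make the definition concrete:

```python
def tune(model, p, x_obs, eta=0.1, steps=100):
    """Adjust parameter p to shrink D = |model(p) - x_obs|, per the definition."""
    for _ in range(steps):
        d = model(p) - x_obs                            # signed version of D
        slope = (model(p + 1e-6) - model(p)) / 1e-6     # output sensitivity to p
        if slope != 0.0:
            p -= eta * d / slope                        # nudge p so D shrinks
    return p

# Hypothetical one-parameter "model": output X depends linearly on parameter P.
toy = lambda p: 2.0 * p + 1.0
p_tuned = tune(toy, p=0.0, x_obs=7.0)
print(f"tuned parameter: {p_tuned:.4f}")   # ~3.0, since 2*3 + 1 = 7
```

Note that nothing in this procedure consults the physics: it will happily drive any sufficiently flexible parameter toward agreement with the observation, which is exactly the distinction being argued over in this thread.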

November 1, 2011 9:31 pm

Eric Anderson says:
November 1, 2011 at 7:42 pm
Further, there are plenty of known unknowns, such as future volcanic eruptions, future changes in aerosols, vegetation, solar activity, just to name a few. By definition, it is impossible to include these in a model with precision.
Those you deal with by running ‘scenarios’, e.g. by assuming a large volcanic at a given time and see what its effect will be. That is where models actually can shine.

November 1, 2011 10:08 pm

Leif writes: “A model M is tuned if some observable output X from the model depends on some parameter P, and if the actual observed value X’ being different from X causes P to be adjusted to P’ so that the difference D = |X-X’| is smaller than without the tuning of P to P’.”
I’m thinking you really ought to define what a curve fit is too … because this describes curve fitting applied to an imperfect process compared to imperfect measurements.
Your underlying assumption appears to be that P-P’ (necessarily moving away from anything that may have started as a “pure” physical representation) is so small that even over many iterations the accumulated error is either negligible or cancels out.
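The accumulation Tim describes is easy to quantify in the two limiting cases: a systematic per-step bias compounds multiplicatively, while random-sign errors partly cancel and grow only like the square root of the step count. The 0.1% per-step figure below is invented for illustration:

```python
import random

eps, steps = 1e-3, 10_000   # a 0.1% per-step error over 10,000 model steps

# Worst case: the error is a systematic bias, so it compounds multiplicatively.
growth = (1.0 + eps) ** steps
print(f"systematic error compounds to a factor of {growth:.0f}")   # ~22000

# Best case: the sign is random each step, so errors partly cancel (random walk).
random.seed(0)
walk = sum(random.choice((-eps, eps)) for _ in range(steps))
print(f"random-sign errors accumulate to only ~{abs(walk):.3f}")
```

Whether a given model sits nearer the first case or the second is precisely the question: “negligible or cancels out” is an assumption, not a theorem.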

Eric Anderson
November 1, 2011 11:35 pm

Leif: “Those [known unknowns] you deal with by running ‘scenarios’, e.g. by assuming a large volcanic at a given time and see what its effect will be. That is where models actually can shine.”
Well, I don’t know about “shine.” The ‘scenarios’ may be interesting, perhaps even educational, but that they have any predictive value does not follow. Further, we have seen that in practice (not in the pristine idealized world of how “science” should operate) these ‘scenarios’ get pushed by those with an agenda as predictions or likely outcomes and then used to demand particular political/economic actions.
And we’re still not dealing with the formidable challenge of inputting the original parameters in a comprehensive enough way to generate valuable predictions.
Look, the models may have some value as tools to help us think about how the climate works and may even help us think about climate possibilities. But right now (1) we don’t have all the physics included, (2) we don’t have all the initial physical parameters included, and (3) we don’t know whether the ‘scenarios’ run with unknowns will bear any relation to future reality. I’m happy to let folks keep working on computer scenarios, but let’s acknowledge the very real limitations. They can keep improving things and then get back to us once they have a model that has actually been successful at forecasting a decade’s worth of climate (or was it 17 years that was needed to see an actual trend . . . or perhaps 30 :)). Until then, the onus is squarely and properly on those who market their scenarios/predictions to demonstrate why anyone else should take any of the scenarios/predictions seriously.

Richard S Courtney
November 2, 2011 1:27 am

Leif:
You have made several posts since my post at November 1, 2011 at 4:05 pm which was a response to your bluster at November 1, 2011 at 11:12 am but you have not replied to my post.
Therefore, in accordance with my post to which you have not replied, I assume your bluster is a clear admission by you that you know you are wrong.
Richard

Frank
November 2, 2011 3:34 am

Frank said:

Willis,
I’d say this only shows that their objective functions are not suitable.

Willis said:

Unless you can show where they are are wrong instead of just asserting that they must be wrong, I’m going with their conclusions.

Okay, here’s what I understand of it:
For generating the “measurements” / “truth cases”, they take 3 parameters h, kp and kg, plus a grid of porosity and permeability values, and feed them into the ECLIPSE simulator, getting a monthly set of 3 production rates during 3 years (36 sets in total, the “history set”) and a yearly set of the 3 production rates during 7 years (7 sets in total, the “future set”). For the “with error” measurements, they only modify the grid porosity and permeability values to within 1% error and run the same simulator.
Then, they use a genetic algorithm to try and find the optimum values for h, kp and kg that match the production rate sets by running the simulation over and over with different trial parameters. Figs 2 and 4 show more or less a likelihood that the tried parameters are correct. The objective functions are used to calculate this likelihood. For Figs. 2a and 4a, they use the history set, for Figs. 2b and 4b, they use the future set.
So far so good. But now comes the part where I don’t agree.
They say the model has no predictive power, but they don’t actually compare any predictions. In this case, what you would want to predict are the production rates. So to see if the model has predictive power, you would expect a comparison of the predicted and the “truth case” production rates. But this is not done! Instead, they argue that, because tuning to the future set gives different parameter values than tuning to the history set, the predictive power must be weak. However, it may well be that a set of parameters that’s not entirely correct (like those in Section 3.2) does in fact give a relatively good production-rate prediction.
What I see in comparing Fig. 2a and 2b is that the monthly production rates in the history set don’t have enough variability to unequivocally find the best parameters (multiple spikes), whilst in the future set the production rates do have enough variability (single clean spike). You can see the difference as trying to match the amplitude and frequency of a sine curve to values near x=0, or to values along its entire up-and-down domain. In other words, the measurement data doesn’t “stretch” the parameter space enough. Similarly, in Fig. 4, you only see that the history set with error definitely doesn’t have enough data, and the future set only marginally does.
So the logic of saying that a model has predictive power only if you get the same matched parameters when you tune to past and future sets is basically wrong. Take the extreme case of having very little past data and lots and lots of future data — the matched parameters will almost always be different.
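Frank’s sine-curve analogy can be made concrete. Near x = 0, a·sin(f·x) ≈ a·f·x, so data there constrain only the product a·f, while data spanning a full cycle pin down both parameters. A sketch, with both parameter pairs invented for illustration:

```python
import math

def wave(t, a, f):
    return a * math.sin(f * t)

truth = (2.0, 1.0)   # amplitude 2, frequency 1
rival = (1.0, 2.0)   # same product a*f, hence the same slope near t = 0

near_zero = [i * 0.01 for i in range(10)]   # samples with t in [0, 0.09]
full_cycle = [i * 0.5 for i in range(14)]   # samples with t in [0, 6.5]

mis_near = max(abs(wave(t, *truth) - wave(t, *rival)) for t in near_zero)
mis_full = max(abs(wave(t, *truth) - wave(t, *rival)) for t in full_cycle)

print(f"near t = 0 the two fits differ by at most {mis_near:.5f}")  # ~0.0007
print(f"over a full cycle they differ by up to {mis_full:.2f}")     # ~2.6
```

Two utterly different parameter pairs are indistinguishable on the near-zero data yet wildly different over the full domain, which is exactly the “parameter space not stretched enough” situation described above.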

November 2, 2011 6:03 am

Frank writes : “So the logic of saying that a model has predictive powers only if you get the same matched parameters when you tune to past and future sets, is basically wrong.”
I believe Willis is right. Carter et al say, “We conclude that for this model you can only obtain a good prediction from the truth case, and that good matches from the history matching phase have no predictive value.”
If I’m understanding their paper correctly, they are saying that there is only one model that has predictive power, and that is the truth model (i.e. the one that has the correct parameters, which by definition won’t be changing), and that all the other models with different parameters, even though they match the history, fail to have predictive power.
It’s true that they aren’t specific about what they mean by saying the models have no predictive power. I guess you’d like to argue that when they say “no” they really mean varying degrees of “some”. Taken at their word, though, no means no.

November 2, 2011 6:05 am

Richard S Courtney says:
November 2, 2011 at 1:27 am
Therefore, in accordance with my post to which you have not replied, I assume your bluster is a clear admission by you that you know you are wrong.
One more time: your post did not contain a question or anything for me to reasonably respond to. It just stated your opinion. Perhaps you could be specific and say again what is troubling you?

Frank K.
November 2, 2011 7:09 am

Leif – Given all of the questions/comments this thread has elicited, do you think your colleague at Stanford, Mark Jacobson, would be up for writing a short article for WUWT on the state-of-the-art in GCM modeling? I for one would welcome his expertise and insight into this topic area. As someone who has worked in modern industrial CFD for over 20 years, I would love to learn more about GCMs beyond the material in the textbooks. As you can tell, I am particularly interested in the entire topic of numerical stability and its relationship to the discretization of the underlying differential equations, which is extremely important for any time-dependent numerical model (you can’t just say a 5-minute time step seems to work OK without some rigorous justification).

Frank
November 2, 2011 8:05 am

TimTheToolMan said:

I believe Willis is right. When Carter et al say “We conclude that for this model you can only obtain a good prediction from the truth case, and that good matches from the history matching phase have no predictive value.”

I get the idea that they were getting at. Basically, a lack of peak in Figs. 2b and 4b at parameter values where there *is* a peak in Figs. 2a and 4a, means that that parameter set (tuned by the history set) actually has a large objective function delta_f, and with that, a large difference between the predicted and “measured” values — or bad predictive capacity.
However, they don’t specifically calculate the future objective function at *precisely* the predicted parameter values. They just generate another objective function map which *interpolates* values near the predicted parameter values that happened to be visited by the genetic algorithm, possibly missing a sharp peak at precisely the predicted parameter values.
Moreover, the fact that there may be larger peaks in the future objective function map actually also clouds the issue. So yes, there may be parameter values which happen to give a higher peak than the peak at the predicted values, but that doesn’t mean the prediction is bad. It just means that with *that particular* future data set (a mere 21 data points in our case), there are other solutions for the parameters that happen to give a higher peak.
Coming back to the size of the future set; when they say that in the perturbed case, “the spike at h ~ 10 has the wrong values for kp and kg” while in the unperturbed case, they do find the correct values, doesn’t that mean that they just don’t have enough data in the future set rather than that the model lacks predictive power?

Vince Causey
November 2, 2011 8:19 am

Leif,
If I understand you correctly, you are saying that these parameterizations amount to no more than making observations about things like albedo, then plugging those values into the models.
You are saying that this is not the same as meant by the article. If I could give an example of what that kind of parameterization is, it would be as if they put the model through thousands of runs, and each time varying the albedo until the model output more and more closely resembles observed temperature data. A bit like training a neural network. Is that about right?

November 2, 2011 9:49 am

Frank K. says:
November 2, 2011 at 7:09 am
do you think your colleague at Stanford, Mark Jacobson, would be up for writing a short article for WUWT on the state-of-the-art in GCM modeling?
I’ll try…
Vince Causey says:
November 2, 2011 at 8:19 am
A bit like training a neural network. Is that about right?
As far as I know, no, that is not about right. It would be right if the models were truly ‘tuned’, but they are not.

Vince Causey
November 2, 2011 1:31 pm

Leif,
Ok, I understand it now.
