Comparing Climate Models – Part Two

Guest Essay by Geoffrey H Sherrington

See Part One of June 20th at

http://wattsupwiththat.com/2013/06/20/comparing-climate-models-part-one/

In Part One, there was a challenge to find missing values for a graph that vaguely resembled the spaghetti graph of comparisons of GCMs. Here is the final result.

RESULTS

See Sherrington_PartTwo (PDF) for pictures & tables.

Each series follows the score of a football team over a season of 23 weeks. There are 18 teams (16 would make the schedule neater) and each team plays all of the other teams. The insertion of some rest weeks around time slots 11, 12 and 13 leads to a smaller average climb for those few weeks, since fewer points were allocated. Win = 4 points, draw = 2 points, loss = 0 points. There has to be a result, so there are no artificial interpolations in this number set. The data are not strictly independent, because in a given week pairs of teams play in a related way, so that the points shared by each pair can only sum to 4. Because there is some system in the numbers, there are ways to attack a solution that would not be available from random numbers.
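As a concrete illustration of that structure (my own sketch, not the actual fixture or results), the following generates a comparable set of cumulative score series; the only hard constraint is that every game allocates exactly 4 points between the two teams involved. It ignores the rest weeks and return matches of the real season.

```python
import random

def round_robin(teams):
    """Circle-method schedule: each round, every team plays exactly once."""
    teams = list(teams)
    n = len(teams)
    rounds = []
    for _ in range(n - 1):
        rounds.append([(teams[i], teams[n - 1 - i]) for i in range(n // 2)])
        teams.insert(1, teams.pop())        # rotate all teams except the first
    return rounds

random.seed(1)
teams = list(range(1, 19))                  # 18 teams
scores = {t: [0] for t in teams}            # cumulative points, starting at 0

for week in round_robin(teams):             # 17 playing weeks, 9 games each
    gained = {t: 0 for t in teams}
    for a, b in week:
        r = random.random()
        if r < 0.45:
            gained[a] = 4                   # team a wins
        elif r < 0.90:
            gained[b] = 4                   # team b wins
        else:
            gained[a] = gained[b] = 2       # draw: the pair still shares 4 points
    for t in teams:
        scores[t].append(scores[t][-1] + gained[t])

# 18 "spaghetti" series; the weekly total is fixed at 36 points whatever the results.
print([scores[t][-1] for t in teams])
```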

This set was chosen because the numbers were established in advance by a process that is not random, but merely constrained. (There are related sets from other years).

The exercise was done to see the responses of WUWT readers. It’s not easy to choose a number set to demonstrate what follows, without giving the game away. At first I thought that the solution, going from weeks 20 to 23, was difficult to impossible. However, the correct solution was cracked by Arnost, June 20 at 4.42 am. Congrats on a fine job.

Was the solution simply numeric, or was probability involved? I think the latter, as there appears to be no way to assign a 2-point draw in the last time slot to team 14 rather than to team 15. Arnost’s explanation at June 20, 6.56 pm mentions distributions, so probability was involved. Either that, or the scant information in his reply is a cover-up for a chat with a moderator who knew the answer, or Arnost guessed the data source, which is on the Internet (just joking).

……………………………..

ERROR

Now to the hard part. How to compare a sporting score layout with an ensemble of GCMs? We will assume that for the GCM comparison, X-axis is time and Y-axis is temperature anomaly, as shown on the spaghetti graph.

I’m mainly interested in error, which by training I split into two types: “bias” and “precision”. Bias measures the displacement of a number from the average of comparable estimates. Precision relates to the statistical probability that the number is firm and does not float about on replication. In the football example, the numbers have no bias or precision errors. They are as they are. Yet, on the other hand, knowing the rules in advance, one could say that the outcome of the top team playing the lowest team is likely to be a +4 for the former. This invokes probability, and probability invokes statistics. People bet on these games. So how to analyse?

Take Series 1 and fit a linear least-squares line to its progress over 23 weeks. Constrain the Y intercept to 0 and derive y = 2.75x with R^2 = 0.9793. Do these numbers mean anything physical? Probably not. The goodness of fit, and hence R^2, depends here on the sequence in which the games are played. If all the weaker teams were met before the stronger teams, we might have a curved response, but fitting a polynomial would not show much of interest either. I have seen it asserted (but do not think it likely) that comparisons of GCMs have sometimes involved just these approaches. They are not valid.
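For anyone wanting to reproduce that style of fit, here is a minimal sketch with made-up cumulative scores standing in for Series 1 (the real values are in the linked PDF). The intercept is forced to zero; note that R^2 conventions differ slightly for through-origin fits, and this uses the ordinary centred definition.

```python
import numpy as np

# Stand-in cumulative scores for one team over 23 weeks (not the real Series 1).
weeks = np.arange(1, 24)
score = np.array([4, 8, 8, 12, 16, 20, 20, 24, 28, 32, 32, 34, 36, 40,
                  44, 48, 48, 52, 56, 60, 60, 64, 68])

# Least squares with the Y intercept constrained to 0: minimise sum((y - b*x)^2).
slope = np.sum(weeks * score) / np.sum(weeks ** 2)
fitted = slope * weeks

ss_res = np.sum((score - fitted) ** 2)
ss_tot = np.sum((score - score.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"y = {slope:.2f}x, R^2 = {r_squared:.4f}")
```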

So, imagine that the football scores are GCM results. How does one compare the results of an ensemble? The football scores have an overall average that was plotted on the demo graph. But that was affected by the variable on the X-axis, when rest weeks were inserted. Is there a parallel in GCMs? Maybe random or unpredictable future events like a large volcanic eruption provide a parallel. Different modellers might program this to happen at different times, so the relation between results of runs is disturbed on the X-axis. Or, if such effects as volcanoes are not incorporated, the times at which step changes or inflexions pass through the computation might differ from modeller to modeller, with a similar outcome. We have become used to studying Y-axis values more often than X-axis values.

CONSTRAINTS

For the football example, the average curve can be computed, a priori – without a game being played.

One needs to know only the points available each week divided by the number of teams playing. These are in the pre-printed games schedule. The curve of the average is constrained. Similarly, if the global temperature increase was constrained by modellers – by unconscious choice, by deliberate choice, or because of guidelines – to range between 0 and 5 deg C/century, average 2.5, the average curve could be found without doing runs by dividing the sum of constrained temperatures at any time by the number of modellers inputting it and reporting it.
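A minimal sketch of that a priori calculation, against a hypothetical fixture (the game counts below are invented, not the real schedule): the average curve needs only the points on offer each week and the number of teams.

```python
# Constrained average curve, known before a single game is played.
n_teams = 18
# Hypothetical week-by-week game counts: 9 games in a full week, fewer around the rest weeks.
games_per_week = [9] * 10 + [6, 6, 6] + [9] * 10        # 23 "weeks" in all
points_per_week = [4 * g for g in games_per_week]        # every game allocates 4 points

average_curve, running = [], 0.0
for pts in points_per_week:
    running += pts / n_teams
    average_curve.append(running)

print(average_curve)   # climbs 2.0 per full week, and more slowly through the rest weeks
```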

Do such constraints enter GCM comparisons? On the Y-axis, the points available each period can be compared to the net energy balance of the globe (energy in – energy out) from time to time, which is translated to a temperature anomaly. On the X-axis, time is time unless the modellers place discontinuities at different times as discussed. The analogy would be valid, for example, if the net energy balance was constant; and differences between GCMs were the result of noise induced by the timing of the modelling assumptions. With thanks to Bob Tisdale, we can quote from an old and simple set of GIS FAQs –

Control runs establish the basic climate of the model. Control runs are long integrations where the model input forcings (solar irradiance, sulfates, ozone, greenhouse gases) are held constant and are not allowed to evolve with time. Usually the input forcings are held fixed either at present day values (i.e., for year 2000 or 2000 Control Run) or a pre-industrial values (i.e., for 1870 or 1870 Control Run). Note that in this context, “fixed” can have two different meanings. The solar forcing values are held fixed a constant, non varying number. The sulfate, ozone and greenhouse gases values, however, are fixed to continually cycle over the same 12-month input dataset every year. The CCSM is then run for an extended period of model time, 100s of years, up to about 1000 years, until the system is close to equilibrium (i.e., with only minor drifts in deep ocean temperature, surface temperature, top-of-the-atmosphere fluxes, etc).

Climate models are an imperfect representation of the earth’s climate system and climate modelers employ a technique called ensembling to capture the range of possible climate states. A climate model run ensemble consists of two or more climate model runs made with the exact same climate model, using the exact same boundary forcings, where the only difference between the runs is the initial conditions. An individual simulation within a climate model run ensemble is referred to as an ensemble member. The different initial conditions result in different simulations for each of the ensemble members due to the nonlinearity of the climate model system. Essentially, the earth’s climate can be considered to be a special ensemble that consists of only one member. Averaging over a multi-member ensemble of model climate runs gives a measure of the average model response to the forcings imposed on the model. http://web.archive.org/web/20090901051609/http://www.gisclimatechange.org/runSetsHelp.html

I stress that these control runs are constrained, as by the requirement to approach equilibrium. I don’t know if global climate is ever at equilibrium.

So, we can pose this question: Given that GCMs are constrained to a degree, does that constraint have adequate weight to influence the estimation of an ensemble average? Put another way, can we estimate the average curve of an ensemble from its constraints without even doing a run? Readers’ views are welcome.

Steven Mosher has already commented: “It matters little whether the ensemble mean has statistical meaning because it has practical skill, a skill that is better than any individual model. The reason why the ensemble has more skill is simple. The ensemble of models reduces weather noise and structural uncertainty.” Here, I have not even got into statistics. I’m just playing with numbers and wondering aloud.

Here’s how I’d prefer to approach some of the statistics for GCM modellers. Keep in mind that I have never done a run and so am likely to be naïve. First, let’s consider one group of modellers. The group will (presumably) do many model runs. Some of these will fail for known reasons, like typos or wrong factors in the inputs. We can exclude runs where the results are obviously wrong, even to an uninformed observer. However, after a while, a single model lab will have acquired a number of runs that look OK. These are what should be submitted for ensemble comparison, all of them, not a single run that is picked for its conformity with other modellers’ results or any other subjective reason. Do modellers swap notes? I’ve not seen this denied. In any event, there is an abundance of pressure to produce a result that is in line with past predictions and future wishes.

In my football example, the series were numbered 1-18. This was on purpose; they were numbered in order from highest to lowest final score, so that some of the ambiguities that Arnost encountered above were given a helping hand. Some readers might have been influenced by assuming (correctly) that the numbering was a leak to help them to the right result. Swapping results between modellers would be a leak of much the same kind.

If a modeller submitted all plausible run results, then a within-lab variance could be calculated via normal statistical methods. If all modellers contributed a number of runs, then the variance of each modeller could be combined by the usual statistical methods of calculating propagation of errors for the types of distributions derived from them. This would give a between-modeller variance, from which a form of bias can be derived. This conventional approach would remove some subjectivity that must be involved, to its detriment, if the modeller chooses but one run. It is likely that it would also broaden the final error bands. Both precision and bias are addressed this way.
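A minimal sketch of the bookkeeping being suggested, with invented run results (the numbers and lab names below are hypothetical): pool the within-lab variances for precision, and look at the scatter of the lab means as a crude handle on relative bias.

```python
import numpy as np

# Invented temperature-trend results (deg C/decade) from three modelling labs,
# each submitting all of its plausible runs rather than one hand-picked run.
runs_by_lab = {
    "lab_A": np.array([0.21, 0.24, 0.19, 0.22]),
    "lab_B": np.array([0.30, 0.27, 0.33]),
    "lab_C": np.array([0.15, 0.18, 0.17, 0.16, 0.19]),
}

# Pooled within-lab variance (precision), weighted by degrees of freedom.
dof = sum(len(r) - 1 for r in runs_by_lab.values())
pooled_within = sum((len(r) - 1) * r.var(ddof=1) for r in runs_by_lab.values()) / dof

# Between-lab variance of the lab means (a relative-bias term).
lab_means = np.array([r.mean() for r in runs_by_lab.values()])
between_lab = lab_means.var(ddof=1)

print(f"pooled within-lab variance   : {pooled_within:.5f}")
print(f"between-lab variance of means: {between_lab:.5f}")
print(f"grand mean of lab means      : {lab_means.mean():.3f}")
```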

However, the true measure of bias requires a known correct answer, not just an averaged answer, so that has to be obtained either by hindcasting or by waiting a few years after a comparison to re-examine the results against recent knowns.

The football example does not have bias, so I can’t use it here to show the point. But the statistical part I’ve just discussed has to be viewed in terms of the simple outcomes that the football model produced, especially the ability to give an average without even doing a run. If there is a component of that type of numerical outcome in the ensemble comparisons, then the average is a meaningless entity, skill or no skill.

………………………………………………………

Finally, on to projecting. The exercise was to take 20 points in a series and project to 23 points. The projection is constrained, as you can deduce. In terms of GCMs, I do not know whether they advance step by step, but if they do, then they are somewhat similar to the football exercise. Each new point you calculate is pegged to the one before it, as it would be in a hypothetical serial GCM. (Here, I confess to not reading enough of the background to the spaghetti graph.)

I’m used to the lines being normalised at a particular origin, so they fan out from a point at the start, or somewhere. If they don’t come together at some stage, it is hard to know how to compare them. A bias in an initial condition will affect bias of subsequent points and their relation to other runs. This places much stress on the initial condition. The football analogy starts all runs at zero and avoids this problem. It is possible to project forward from a point by deriving a statistical probability from previous points, as shown by some reader solutions of the football example. It is also possible to project forward by doing a numerical analysis of the parameters that are used to create a point. The methods of error analysis need not be similar for these two approaches.
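The first of those two approaches can be sketched very simply (hypothetical numbers again): fit the first 20 points of a series, extrapolate to weeks 21-23, and attach a rough spread from the residuals. It ignores the 0/2/4 point constraint, which is exactly why a purely statistical projection can miss structure that the constraints would supply.

```python
import numpy as np

# Hypothetical cumulative series for the first 20 weeks.
rng = np.random.default_rng(0)
weeks = np.arange(1, 21)
series = np.cumsum(rng.choice([0, 2, 4], size=20, p=[0.4, 0.1, 0.5]))

# Straight-line fit to the known points, then project forward with a crude interval.
slope, intercept = np.polyfit(weeks, series, 1)
resid_sd = np.std(series - (slope * weeks + intercept), ddof=2)

for w in (21, 22, 23):
    print(f"week {w}: projected {slope * w + intercept:.1f} +/- {2 * resid_sd:.1f}")
```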

In concluding, it is possible to find number sets generated in different ways that are quite dissimilar in the ways in which error estimation, averaging and projection are applicable. It can be difficult to discern the correct choice of method. With GCM ensembles, there is an evident problem of divergence of models from measured values over the past decade or more. Maybe some of the divergence comes from processes shown by the football example. There is no refuge in the Central Limit theorem or the so-called Law of Large Numbers. They are not applicable in the cases I have seen.

……………………………………………………

IN CONCLUSION

It has long been known that a constrained input can lead to constrained answers.

What I am missing is whether the constraints used in GCMs constrain the answers excessively. I have seen no mention of this in my limited reading. Therefore, I might well be wrong to raise these matters in this way. I hope that readers will show that I am wrong, because if I am right, then there are substantial problems.

………………………………………………….

COMMENTS
Matthew
June 24, 2013 1:21 pm

Are we talking about football or soccer?

Steven Mosher
June 24, 2013 1:23 pm

“Here’s how I’d prefer to approach some of the statistics for GCM modellers. Keep in mind that I have never done a run and so am likely to be naïve. First, let’s consider one group of modellers. The group will (presumably) do many model runs. Some of these will fail for known reasons, like typos or wrong factors in the inputs. We can exclude runs where the results are obviously wrong, even to an uniformed observer. However, after a while, a single model lab will have acquired a number of runs that look OK. These are what should be submitted for ensemble comparison, all of them, not a single run that is picked for its conformity with other modellers’ results or any other subjective reason. Do modellers swap notes? I’ve not seen this denied. In any event, there is an abundance of pressure to produce a result that is in line with past predictions and future wishes.”
so on the basis of no idea how long it takes to do a run, based on no idea how many runs are actually completed, based on a supposition that runs are selected only if they “look ok”
you insinuate some sort of collusion. Really. You could write an email to any number of modelling guys. You could look at the errata for CMIP5.
before folks start thinking that a few hours work is enough to understand or critique models, you’d best get a sense of the thing you are dealing with

Steve Jones
June 24, 2013 1:38 pm

Impressive video. Shame the models disagree so widely amongst themselves and are diverging from reality. Otherwise, they are very good.

thelastdemocrat
June 24, 2013 1:58 pm

The overall constrained model is one thing. When I first started reading AGW stuff and learning how models were constructed, it occurred to me that the devil is in the feedback details. Think about modeling a negative feedback in nature. Negative feedbacks are obviously in play. Hotter air holds more moisture, leading to an atmosphere more conducive to allowing planetary temp to be released to space, and providing more “shade” to incoming radiation.
There must be more than one feedback in a decent climate model.
Each feedback has its slow-down point and its turn-around point.
If you get just one of those off a little, the entire simulation vectors off-kilter – unless you otherwise put a constraint/feedback loop on the endpoint, global temp.

David L. Hagen
June 24, 2013 2:14 pm

Interesting exploration on constraints.
See S. Fred Singer Overcoming Chaotic Behavior of Climate Models, Science & Environmental Policy Project July 2011.
Singer finds published GCM temperature trend outputs can vary by an order of magnitude. He advises that about 400 model-run years are needed to reduce most of the chaotic variation, e.g. 20 runs for 20 year horizons, 40 runs for 10 years, 10 runs for 40 years, etc. That suggests convergence to input parameters.
Willis Eschenbach finds GCM output being a direct function of the input CO2, a climate sensitivity to CO2 parameter and a time lag. See Model Climate Sensitivity Calculated Directly From Model Results
Those results by Singer and Eschenbach appear to indicate that a large number of runs for a given GCM will give an output constrained to the projected CO2 based on an input climate sensitivity and a time lag. i.e. without major natural driven variation.
Contrast the findings of Murry Salby, presented April 18th in Hamburg, where the CO2 lags temperature at all frequencies, and the rate of change of CO2 is due to surface parameters of temperature and moisture. Some slides and summaries at The Hockey Schtick. Graph available in Salby’s Physics of the Atmosphere and Climate (2012) Sect. 1.6, Fig. 1.43 p 67 at Amazon preview.
The Right Climate Stuff team finds that climate science is not settled.
“Houston, we have a problem.”

Gary Hladik
June 24, 2013 2:42 pm

Steven Mosher says (June 24, 2013 at 1:23 pm): “before folks start thinking that a few hours work is enough to understand or critique models, you’d best get a sense of the thing you are dealing with”
Wow. Looks like the developers of that particular model have done a great job creating a program that paints lovely, if overly reddish, animations that resemble ocean currents! Kudos! This could be my new screensaver.
Have these guys considered careers in Hollywood? James Cameron, Dreamworks, or maybe Disney could use their talents, and then the taxpayers wouldn’t be forced to support them.

Nick Stokes
June 24, 2013 2:53 pm

David L. Hagen says: June 24, 2013 at 2:14 pm
“Willis Eschenbach finds GCM output being a direct function of the input CO2, a climate sensitivity to CO2 parameter and a time lag.”

Not so. He uses total forcing, and the volcanic and ENSO parts are much more important in tracking. But Mosh’s video shows how wrong it is to say that GCM output is a simple function of forcing. GCMs model (and output) many aspects of the Earth’s physics. And while their average surface air temperature may approximately follow a fairly simple forcing/response relation, the same is true of the Earth itself.

Richard LH
June 24, 2013 3:30 pm

Steven Mosher says:
June 24, 2013 at 1:23 pm
“before folks start thinking that a few hours work is enough to understand or critique models, you’d best get a sense of the thing you are dealing with”
How about this for a challenge to the models or the World at yearly periods??
Temperature data on a 4 year Scytale rather than a 1 year one?
Scytale c.f. http://en.wikipedia.org/wiki/Scytale
Evidence from the CET daily temperature series
Ref: http://www.metoffice.gov.uk/hadobs/hadcet/
Data : http://www.metoffice.gov.uk/hadobs/hadcet/data/download.html
http://www.metoffice.gov.uk/hadobs/hadcet/cetdl1772on.dat
shows that if the data is wrapped on a 1461 day Scytale rather than the normal 365 day one then distinct cyclic patterns emerge.
This is using the whole, true solar year of 1461 days, not the often used and rather human convenient one of “365 and carry the remainder to make an underlying 4 year cycle” of Leap Years.
365 is all very well but in doing so any 4 year pattern in the data will be destroyed or at the best degraded.
Then there is the sampling methodology. Again, the human convenient one of Months suffers from the problem of not being a regular sampling period. Sampling at varied 31, 30 and 28/29 day periods will add unnecessary digital sampling ‘noise’ to any underlying signal. Therefore it is useful that computers allow us to continue to use data at daily resolution throughout without down sampling at all when examining the data.
The temperature data as recorded will comprise inputs from various well understood sources.
1. Daily Averaged out to produce this data series
2. Weather CET rarely has individual periods of Weather which are > a few weeks
3. Yearly The normal seasonal pattern of Spring, Summer, Autumn and Winter
4. Climate Related to geographical features and long term cycles
We can safely filter out the Weather from the underlying Yearly seasonal pattern by using a 28 day filter on the data which is also usefully close to the human Month (though we are not actually ‘locked’ to any real human Calendar) and long enough to cover most likely Weather that has occurred.
The simplest way to get a low pass filter that has reasonable digital characteristics is to use a cascaded 3 pole central output running average of the data. This can be achieved by using a multiplier of 1.3371 between stages to produce the series of 16, 21 and 28 day spans as the ones required to give minimal ‘square wave’ sampling errors in the final 28 day output.
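If the recipe is read as three centred running means applied in cascade (spans 16, 21 and 28 days), a minimal sketch on synthetic daily data, rather than the actual CET file, might look like this:

```python
import numpy as np

def centred_running_mean(x, span):
    """Centred moving average of width `span` (valid region only)."""
    return np.convolve(x, np.ones(span) / span, mode="valid")

def cascaded_filter(daily_temps):
    """Three running means in cascade (spans 16, 21, 28; ratio ~1.3371),
    used here as a stand-in for the 28 day low-pass filter described above."""
    out = np.asarray(daily_temps, dtype=float)
    for span in (16, 21, 28):
        out = centred_running_mean(out, span)
    return out

# Synthetic daily series: an annual cycle plus noise, four 1461-day wraps long.
days = np.arange(4 * 1461)
rng = np.random.default_rng(1)
temps = 9.5 + 6.5 * np.sin(2 * np.pi * days / 365.25) + rng.normal(0, 2, days.size)

smoothed = cascaded_filter(temps)
print(temps.size, "->", smoothed.size)   # the cascade trims 62 days of end effects
```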
Given this requirement the first output data will be for 1st Feb 1772 and the last for 27th April 2013 when using the series from 1st Jan 1772 to 31st May 2013.
Having filtered out the Weather we now have to wrap it round the Scytale to reveal any underlying patterns. If we use a 1461 day one rather than a 365 day one then we will preserve any 4 year patterns that may be present.
From 1772 to 2013 there are 61 four-year-long cycles available (in the last one we are only on day 450 at present).
We can average out all of the days in those 1461 day long sequences (such as the 1st Feb of a Leap Year) down those 61 periods to produce a full daily average for each day in the 4 year cycle with daily precision in the output and using the whole 241 year long data set to produce a ‘Normal’ 4 year pattern (if any).
The first thing to notice when you do is that not all years are the same. There is, indeed a 4 year pattern to the data. Some Summers and Winters in the 4 years ‘Leap Year Cycle’ are not the same as others.
The range of offsets to the ‘normal’ annual pattern at a daily resolution has values of +0.4C to -0.3C. The Annual pattern is from +16C to +3C around which this 4 year pattern then distributes and Weather then distributes around that.
http://s1291.photobucket.com/user/RichardLH/story/73127
http://i1291.photobucket.com/albums/b550/RichardLH/CET-Recent4YearlycyclestoMay201328dayFiltered_zpsa27e6b93.png
http://i1291.photobucket.com/albums/b550/RichardLH/CET-AverageDailyAnnualwholerecord_zps1452beec.png
http://i1291.photobucket.com/albums/b550/RichardLH/CET-4YearAverageDailyAnomolyfromAnnual_zpsa1bbe38c.png
Perhaps all other Global temperature data should also be interpreted by using a 4 year Scytale rather than a 1 year one?

X Anomaly
June 24, 2013 3:33 pm

woohoo, I got the average @ 23 by only using 10 % of the data. From dumb luck alone the IPCC should be kicking goals, but it isn’t.
Curve fitting gets it right most of the time, and yet the IPCC gets it wrong. What does that say?
Shouldn’t the models and their supercomputers be better at predicting the future than a simple curve fit? And while correlation does not equal causation, it appears that the models have failed to even reach the all-important correlation milestone. Therefore, you cannot reject the IPCC models because they have not even qualified.

Geoff Sherrington
June 24, 2013 5:12 pm

Can we please constrain comments more to the matter of whether subjective choice in GCM processes has adequate weight to affect outcomes and error estimation? The Australian Rules Football example shows it dominates the average curve. I’ve been close to scientific modelling for decades, but I do not know the inside detail for specific GCMs.
Hi, Nick, Mosh. Any thoughts?

Richard LH
June 24, 2013 5:17 pm

Geoff: Sorry. I apologise for interrupting the thread. Difficult to get heard in the crowd.

Philip Bradley
June 24, 2013 5:18 pm

A climate model run ensemble consists of two or more climate model runs made with the exact same climate model, using the exact same boundary forcings, where the only difference between the runs is the initial conditions.
And from RC
Initial Condition Ensemble – a set of simulations using a single GCM but with slight perturbations in the initial conditions. This is an attempt to average over chaotic behaviour in the weather.
I can criticize climate models for multiple reasons, but I find modelling climate variability by changing the weather at the model start point incomprehensible. What possible rationale could there be for doing this?

Gary Hladik
June 24, 2013 6:09 pm

Philip Bradley says (June 24, 2013 at 5:18 pm): “I can criticize climate models for multiple reasons, but I find modelling climate variability by changing the weather at the model start point incomprehensible. What possible rationale could there be for doing this?”
Just a guess: Even if we have the starting conditions in every single grid cell, the measurements are uncertain to some extent, and may represent a composite over variable microclimates. See page 10 here, for example:
“Conditions within a single cell are assumed uniform, but practical experience indicates that both the weather and climate can be very different over a distance of 200 miles, particularly in mountainous or coastal regions. Computer simulations have shown that for areas with highly diverse climate, such as Britain, it is necessary to reduce cell size by a factor of about 7, to about 30 miles on a side, to accurately simulate some aspects of climate. Reducing the length and width of cells by a factor of 7 requires an increase in the computing requirement by a factor of almost 50, assuming that no reduction is made in the height of the cells. This is beyond the current capacity of even the best supercomputers.”

James from Arding
June 24, 2013 6:12 pm

Nick Stokes says:
June 24, 2013 at 2:53 pm
“And while their average surface air temperature may approximately follow a fairly simple forcing/response relation, the same is true of the Earth itself.”
And you want us to believe that the physics modelled in the GCMs accurately match the physics of “the Earth itself”?
The pretty animation shown by Steve Mosher, while it superficially looks like the animations of real data that Bob Tisdale and others have shown us, is just that: superficially similar (and probably deliberately so). You want me to believe that each of those geographical boundaries and swirling vortices are not constrained by the modellers’ equations and starting parameters but rather are a true representation of the physical world? I need more evidence.
Until the modellers can lay out clearly for interested non-modellers the principles on which these things work and demonstrate much better skill at forecasting I will remain sceptical… sorry.
PS I don’t believe the “science is settled” – certainly not well enough to justify the carbon tax insanity that we have in Australia.

David L. Hagen
June 24, 2013 6:45 pm

Nic Lewis
Re: Willis “uses total forcing, and the volcanic and ENSO parts are much more important in tracking”
Thanks for correcting my cite to Willis’s paper. Mea culpa. Correcting:
“Willis Eschenbach finds GCM output being a direct function of the input total forcing [was: CO2], a climate sensitivity to total forcing [was: CO2] parameter and a time lag. See Model Climate Sensitivity Calculated Directly From Model Results
Those results by Singer and Eschenbach appear to indicate that a prescribed number of model-year runs for a given GCM will give an output constrained to the projected total forcing [was: CO2] based on an input climate sensitivity and a time lag, i.e. dominated by CO2 with minor [was: without major] natural driven variation.”
Willis states:

The equation I used has only two parameters. One is the time constant “tau”, which allows for the fact that the world heats and cools slowly rather than instantaneously. The other parameter is the climate sensitivity itself, lambda.
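A rough sketch of that kind of two-parameter emulator (not Willis’s actual code; the forcing history and parameter values below are made up) shows how little machinery lagging and resizing a forcing needs:

```python
import numpy as np

def lagged_response(forcing, lam, tau, t0=0.0):
    """One-box emulator: dT/dt = (lam * F - T) / tau, stepped yearly.
    lam plays the role of climate sensitivity, tau the lag (years)."""
    temps = np.empty(len(forcing))
    t = t0
    for i, f in enumerate(forcing):
        t += (lam * f - t) / tau
        temps[i] = t
    return temps

# Made-up forcing history: a slow ramp plus a brief volcanic dip (values illustrative only).
years = np.arange(1900, 2001)
forcing = 0.02 * (years - 1900)
forcing[years == 1963] -= 2.0

print(np.round(lagged_response(forcing, lam=0.8, tau=4.0)[-5:], 3))
```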

Willis digitized the global climate model run outputs graphed in Otto et al, “Energy budget constraints on climate response” Letter, Nature Geoscience Vol. 6, pp 415–416 (2013) doi:10.1038/ngeo1836 19 May 2013. Otto et al. obtained the forcings from Forster et al. 2013 Evaluating adjusted forcing and model spread for historical and future scenarios in the CMIP5 generation of climate models
Forster et al. had separated out historical into historical GHG, historical natural, and historical non-ghg.
The models weight highest the variation due to CO2, with natural contributions weighted much lower. The IPCC assumed almost all CO2 was due to fossil fuel combustion.
Using the model input forcings, their climate sensitivities and the starting temperature, one should be able to predict the consequent model output and thus that of the ensemble.
Geoff:
Re: “whether subjective choice in GCM processes has adequate weight to affect outcomes and error estimation”
Roy Spencer shows 73 models from 1979 to present are ALL running much hotter than actual global temperatures, with a wide spread. They appear to have a wide range of climate sensitivities and to seriously underestimate errors, especially Type II systematic errors; i.e. the models all overestimate climate sensitivity and are likely missing major physics, e.g. cloud variations.
Look forward to input from those more involved.

Philip Bradley
June 24, 2013 6:45 pm

Gary Hladik says:
June 24, 2013 at 6:09 pm

That’s my understanding as well, but my question was about how weather uncertainty, which I agree is substantial, morphs into climate variability.
I think part of the/my problem is that ‘ensemble’ is used to mean both a set of runs of a single model and a set of runs of multiple models. In the latter case the variability is due to differing assumptions in different models. Which comes back to my point that the variability in a multi-model ensemble isn’t climate variability at all. It’s a numerical quantification of the differences in opinions between climate modellers.

ferd berple
June 24, 2013 6:48 pm

Nick Stokes says:
June 24, 2013 at 2:53 pm
the same is true of the Earth itself.
=====
nonsense. your statement is not factual. It is belief. acceptance without understanding. explanation without value. the divergence between model and realities is the proof.

R2Dtoo
June 24, 2013 7:02 pm

This won’t add much to the discussion- but how about a simple “cadillac methods on horse and buggy data”!

Nick Stokes
June 24, 2013 7:28 pm

Geoff,
“Hi, Nick, Mosh. Any thoughts?”
Firstly, I’d take up your point about being run to equilibrium. There isn’t an equilibrium, and in any case they are usually run with time-varying forcing. What they do try to do is have an initial period where the effect of initial conditions (often not well known) fades. Then it’s a matter of running for long enough (decades) to get a climate average, with complication of changing forcing.
A lot of people are running models now. Model E, CCSM, facilitate download and use – I’m sure lots of students are running them. Forcing regimes tend to be carefully prescribed and stuck to for a long running history.
But I think people greatly overestimate the extent to which users can implement controls to reach a “desired” outcome. I hear people talk of prescribing RH, feedbacks etc. You just can’t do that. Again, Mosh’s video gives an idea of the complexity involved. It’s a minute fraction of a real sequence. You just can’t meddle with it.
I’ve done a lot of this in CFD; these programs solve the Navier-Stokes equations, and a big thing on any code-writer’s mind is first just to keep them running. If they don’t conserve energy, they will explode – pixels everywhere. If they don’t conserve mass, they may well collapse. The only thing you have going for you is the physics, and you just have to get it right.
Someone asked above, why don’t they go to Hollywood? Well, people do. Fluid motion is now much better implemented by solving the real physics than by artists. We did a pioneering demo quite a few years ago; things have advanced a lot since. An example is here.

June 24, 2013 7:41 pm

Mosh writes “before folks start thinking that a few hours work is enough to understand or critique models, you’d best get a sense of the thing you are dealing with” and then shows a nice video.
We’re quite good at weather but not so good at climate. The output certainly looks pretty, but over the longer term it’s a fantasy and misses the things it needs to accurately portray what is happening.

Nick Stokes
June 24, 2013 7:55 pm

James from Arding says: June 24, 2013 at 6:12 pm
“You want me to believe that each of those geographical boundaries and swirling vortices are not constrained by the modellers equations and starting parameters but rather are a true representation of the physical world? I need more evidence.”

The video showed, among other things, the ENSO jet. Here is a hi-res version of the real thing. You can see that the actual vortex spacing matches up etc. As I said above, you just can’t fiddle these things. If you don’t have correct physics, you have nothing.
I don’t have a video of the S Africa currents that they started with, but here is Antarctica with the Drake passage. Alternatively, here are stills where you can zoom on S Africa.

Robert Prudhomme
June 24, 2013 8:06 pm

The initial conditions are determined by manipulating data across grid areas to represent the total data value for that grid, which may or may not be representative. Then you assume that CO2 drives climate, contrary to observed real world data. Even if that were true, a butterfly in Brazil may blow your model results out of the water.

Editor
June 24, 2013 8:08 pm

Steven Mosher says: “before folks start thinking that a few hours work is enough to understand or critique models, you’d best get a sense of the thing you are dealing with…”
That was an animation of sea surface temperature anomalies based on the GFDL CM2.4. I don’t believe its outputs were archived in CMIP5.
Also, being able to simulate eddies is one thing. Being able to simulate ENSO, AMOC and teleconnections is another.
Regards

Geoff Sherrington
June 24, 2013 8:24 pm

Nick,
I understand that, but I’m looking into the quantitative effect of constraints on inputs (or outputs).
This thread was hatched some years back.
Paper: Douglass, D.H., J.R. Christy, B.D. Pearson, and S.F. Singer, 2007: A comparison of tropical temperature trends with model predictions. International Journal of Climatology, 27: doi:10.1002/joc.1651.
The paper gives results of a comparison of Global Circulation Models under the Coupled Model Intercomparison Project (CMIP-3). The paper is here – http://www.geoffstuff.com/DOUGLASS%20MODEL%20JOC1651.pdf Please refer to Table II and these comments. Your attention is drawn to the performance of the CSIRO Mark 3 model, coded 15, against the ensemble means at various pressure altitudes.
Pressure altitude    Ensemble average    CSIRO result
Surface              163                 156
1000                 213                  98
925                  174                 166
850                  181                 177
700                  199                 191
600                  204                 203
500                  226                 227
400                  271                 272
300                  307                 314
250                  299                 320
200                  255                 307
150                  166                 268
100                   53                  78
The outcomes at altitudes of 600, 500 & 400 are very close. The CSIRO model is so good here that a person not used to working with numbers might be misled to believe that this type of performance is to be expected. The units are millidegrees C per decade.
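For anyone wanting to eyeball that level by level, the differences can be listed directly from the table above (a trivial transcription, no new data):

```python
levels   = ["Surface", "1000", "925", "850", "700", "600", "500",
            "400", "300", "250", "200", "150", "100"]
ensemble = [163, 213, 174, 181, 199, 204, 226, 271, 307, 299, 255, 166, 53]
csiro    = [156,  98, 166, 177, 191, 203, 227, 272, 314, 320, 307, 268, 78]

# Millidegrees C per decade; CSIRO minus ensemble average at each pressure altitude.
for lev, e, c in zip(levels, ensemble, csiro):
    print(f"{lev:>7}: ensemble {e:4d}, CSIRO {c:4d}, difference {c - e:+4d}")
```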
On the other hand, the outcome might be a result of what might be termed “pal-assisted”. I am not saying that it is, I am merely giving but one example of what appearance might be seen by sharing information – if that is what was done – before the appropriate time. Persons used to calculating probabilities might wish to compute the odds of the three values in a row being so close to an average of nearly 20 other models, if they are unconstrained. Of course, science often progresses by close collaboration of participants with a common interest, but once data are affected by swapping notes, they are compromised and largely invalid for later stats comparison. They are no longer i.i.d.
Does constraint invalidate them?

X Anomaly
June 24, 2013 8:25 pm

Look at the GCM ingredients:
Arctic Oscillation
North Atlantic Oscillation
Pacific/North American pattern
Antarctic Oscillation
Madden-Julian Oscillation
El Nino/ Southern Oscillation
Indo Pacific Warm Pool
Ocean Currents and Thru flow
Water Vapor /Clouds
Blocking
Sea Ice
Solar
Aerosols
GHGs
CFCs
Volcanoes
Cosmic rays?
All of which are constrained /estimated by historical variability and projected values. The output (or average) of temperature is the weighted result of all this stuff (and more).

Nick Stokes
June 24, 2013 8:59 pm

Geoff,
“On the other hand, the outcome might be a result of what might be termed “pal-assisted”. “
There is a range where it is close to average, but also a large range where it isn’t. And it’s not obvious why palship should be so limited. There is a lot of autocorrelation between levels; I think one good match will be seen across quite a spread.
If you look at the whole table, there is a lot of scatter. CSIRO would have reported without knowledge of these other results, and I don’t see how anyone could possibly have predicted the ensemble mean.
But as I say, you just can’t rig a program like that. If you distort to get one thing right, ten others will go wrong.
I don’t think the results are meant to be iid 🙂 maybe i

June 24, 2013 11:12 pm

Nick,
Maybe there is a program that can be downloaded, that students use to solve the Nick-Stokes equations. Why should I download it if it accidentally shows that 1+1 = 3? Good science, as you well know, is about delivering the goods. Mosh’s animation fits this category. Without a measure of how correct it is, of whether 1+1 does equal 2, it’s just graphics like I can make here, starting with very little knowledge of inputs.
I’m searching for why the goods are not being delivered by GCMs, starting with simple concepts & analogies. As is customary, all assumptions are held valid until shown not.
Remember that if the models are wrong in certain ways, the case for CO2 to explain the gap between model and actual disappears and it might be quite wrong to advocate cessation of fossil fuel burning.

June 24, 2013 11:31 pm

These are what should be submitted for ensemble comparison, all of them, not a single run that is picked for its conformity with other modellers’ results or any other subjective reason. Do modellers swap notes? I’ve not seen this denied. In any event, there is an abundance of pressure to produce a result that is in line with past predictions and future wishes.
There is no reason to invoke deliberate or conscious bias. Unconscious confirmation bias is sufficient to explain the fact that most model outputs that make their way into the IPCC ensemble are around the claimed consensus for predicted CO2-driven warming.
I’ve called climate models “confirmation bias on steroids”. There is also selection bias by the IPCC.

X Anomaly
June 25, 2013 12:46 am

Here is GFDL CM2.6, via Isaac Held
http://www.gfdl.noaa.gov/flash-video?vid=cm26_v5_sst&w=940
For all its binary finery… it’s BS. Give me next year’s SST mean for the Nino 3.4 region. A mere 1.8% of the total sea surface area.

RichardLH
June 25, 2013 1:35 am

Geoff: I don’t doubt that the models COULD be correct. The probable timescale required to PROVE they are (or not) is only a few years now.
There are good reasons to suggest that one of the reasons they are wrong is that they are modeling based on figures produced with a 365 day Scytale (the rod around which you wrap the record) rather than the correct 1461 day one of a true solar year.
This destroys any 4 year pattern in the record (and yes, there is a pattern, in the CET at least).
That is what I was trying to point out.

RichardLH
June 25, 2013 2:08 am

X Anomaly says:
June 25, 2013 at 12:46 am
“Give me next years SST mean for the Nino 3.4 region. A mere 1.8 % of the total sea surface area.”
How about a prediction of UAH Global for the next 18 months instead?
http://s1291.photobucket.com/user/RichardLH/story/70051

Frank K.
June 25, 2013 5:36 am

“A lot of people are running models now. Model E, CCSM, facilitate download and use…”
Model E is a piece of junk! Nobody knows what equations it’s solving…

David L. Hagen
June 25, 2013 5:53 am

Geoff
Re Climate sensitivity
Willis in his latest The Thousand-Year Model notes:

[4] One curious aspect of this result is that it is also well known [Houghton et al., 2001] that the same models that agree in simulating the anomaly in surface air temperature differ significantly in their predicted climate sensitivity. The cited range in climate sensitivity from a wide collection of models is usually 1.5 to 4.5C for a doubling of CO2, where most global climate models used for climate change studies vary by at least a factor of two in equilibrium sensitivity. . . .
Sensitivity (transient or equilibrium) is directly proportional to the ratio of the trend of the temperature to the trend of the forcing. . . .
Strange but true, functionally it turns out that all that the climate models do to forecast the global average surface temperature is to lag and resize the forcing. . . .
They’ve left out the most important part, the control mechanism composed of the emergent thermoregulatory phenomena like thunderstorms, so their models don’t work anything like the real climate, but the core physics is right. . . .
Over the last decades, the modelers will tell you that they’ve gotten better and better at replicating the historical world. And they have, because of evolutionary tuning.

From Fig. 2, the 1000 year Crowley model has much lower climate sensitivity than most of the other models.
So run tests for 1000 years instead of 100 years to weed out the worst performers?
Or expose the presuppositions!

David L. Hagen
June 25, 2013 6:01 am

Nick Stokes
Re: “If they don’t conserve mass, they may well collapse.”
CO2 Mass Conservation
Are there any models that accurately replicate the variations in CO2 correlating with surface temperature (2 year moving averaged) with CO2 mass conservation as shown by Murry Salby (links above)?
Are there any models that show the increasing CO2 variation and differences in phase from South Pole to North Pole as identified by Fred H. Haynie in The Future of Global Climate Change?

Robany Bigjobz
June 25, 2013 9:12 am

From the article: “The group will (presumably) do many model runs. Some of these will fail for known reasons, like typos or wrong factors in the inputs. We can exclude runs where the results are obviously wrong, even to an uniformed observer.”
First off, excluding “obviously wrong” runs is cherry picking data. If the model is accurately implementing the physics then it won’t produce any “obviously wrong” runs. “Obviously wrong” runs indicate the model is wrong and can be discounted.
Secondly, Roy Spencer demonstrated that all of those 73 models he examined are “obviously wrong” because they don’t match the data and thus are all wrong and can all be discounted.

Nick Stokes
June 25, 2013 1:03 pm

David L. Hagen says: June 25, 2013 at 6:01 am
“Are there any models that accurately replicate the variations in CO2 correlating…”

I think the answer is, I don’t know. But more precise links would help. I don’t try much with Salby’s stuff because he doesn’t seem to be able to write a straightforward document explaining it. And the other is a very long ramble.

eyesonu
June 25, 2013 5:31 pm

Nick Stokes says:
June 24, 2013 at 7:28 pm
===================
The link at the end of your comment models the sinking of the Titanic. What relevance does that have to the thread? Any observations there? Forgot to turn out the lights as the ship went down. I would guess this is about the caliber of the GCMs. The video of ocean temps you linked to is cute. Observations of assumptions? Please don’t bother to respond to this and disrupt the thread.

eyesonu
June 26, 2013 4:16 pm

In my post above I should have noted that the link/graphic on ocean temps would have been provided by Steven Mosher. e.g. …”The video of ocean temps you linked to is cute. Observations of assumptions? …”