One Model, One Vote

Guest Post by Willis Eschenbach

The IPCC, that charming bunch of United Nations intergovernmental bureaucrats masquerading as a scientific organization, views the world of climate models as a democracy. It seems that as long as your model is big enough, they will include your model in their confabulations. This has always seemed strange to me, that they don’t even have the most simple of tests to weed out the losers.

Through the good offices of Nic Lewis and Piers Forster, who have my thanks, I’ve gotten a set of 20 matched model forcing inputs and corresponding surface temperature outputs, as used by the IPCC. These are the individual models whose average I discussed in my post called Model Climate Sensitivity Calculated Directly From Model Results. I thought I’d investigate the temperatures first, and compare the model results to the HadCRUT and other observational surface temperature datasets. I start by comparing the datasets themselves. One of my favorite tools for comparing datasets is the “violin plot”. Figure 1 shows a violin plot of a random (Gaussian normal) dataset.

Figure 1. Violin plot of 10,000 random datapoints, with mean of zero and standard deviation of 0.12.

You can see that the “violin” shape, the orange area, is composed of two familiar “bell curves” placed vertically back-to-back. In the middle there is a “boxplot”, which is the box with the whiskers extending out top and bottom. In a boxplot, half of the data points have a value in the range between the top and the bottom of the box. The height of the box is known as the “interquartile range”, because the box runs from the first quartile to the third quartile of the data, and the “whiskers” extending above and below the box are of the same height as the box. The heavy black line shows, not the mean (average) of the data, but the median of the data. The median is the middle value of the dataset when the data are sorted by size. As a result, it is less affected by outliers than is the average (mean) of the same dataset.
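For readers who want to try this themselves, a plot like Figure 1 can be produced in R with the vioplot package. This is only a minimal sketch (the seed and colors are arbitrary), not the code behind the actual figure:

# Minimal sketch of a Figure 1 style violin plot (assumes the vioplot package is installed)
library(vioplot)
set.seed(42)                                  # arbitrary seed, for reproducibility
x <- rnorm(10000, mean = 0, sd = 0.12)        # 10,000 random normal datapoints
vioplot(x, col = "orange", names = "random")  # mirrored density plus embedded boxplot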

So in short, a violin plot is a pair of mirror-image density plots showing how the data is distributed, overlaid with a boxplot. With that as prologue, let’s see what violin plots can show us about the global surface temperature outputs of the twenty climate models.

For me, one of the important metrics of any dataset is the “first difference”. This is the change in the measured value from one measurement to the next. In an annual dataset such as the model temperature outputs, the first difference of the dataset is a new dataset that shows the annual CHANGE in temperature.  In other words, how much warmer or cooler is a given year’s temperature compared to that of the previous year? In the real world and in the models, do we see big changes, or small changes?

This change in some value is often abbreviated by the symbol delta, “∆”, which means the difference in some measurement compared to the previous value. For example, the change in temperature would be called “∆T”.
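In R, the first difference of an annual series is a single call to diff(). A minimal sketch, using a made-up temperature series in place of the real data files:

# First differences (∆T) of an annual temperature series; 'temps' is synthetic here,
# standing in for a real observational or model series.
set.seed(1)
temps  <- cumsum(rnorm(140, mean = 0.005, sd = 0.1))  # 140 years of made-up anomalies
deltaT <- diff(temps)                                  # year-on-year change, length 139
max(abs(deltaT))                                       # largest single-year swing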

So let’s begin by looking at the first differences of the modeled temperatures, ∆T. Figure 2 shows a violin plot of the first difference ∆T of each of the 20 model datasets, as numbers 1:20, plus the HadCRUT and random normal datasets.

Figure 2. Violin plots of 20 climate models (tan), plus the HadCRUT observational dataset (red), and a normal Gaussian dataset (orange) for comparison. Horizontal dotted lines in each case show the total range of the HadCRUT observational dataset.

Well … the first thing we can say is that we are looking at very, very different distributions here. I mean, look at GFDL [11] and GISS [12], as compared with the observations …

Now, what do these differences between, say, GFDL and GISS mean when we look at a timeline of their modeled temperatures? Figure 3 shows the two datasets, GFDL and GISS, along with my emulation of each result.

Figure 3. Modeled temperatures (dotted gray lines) and emulations of two of the models, GFDL-ESM2M and GISS-E2-R. The emulation method is explained in the first link at the top of the post. Dates of major volcanoes are shown as vertical lines.

The difference between the two model outputs is quite visible. There is little year-to-year variation in the GISS results, half or less of what we see in the real world. On the other hand, there is very large year-to-year variation in the GFDL results, up to twice the size of the largest annual changes ever seen in the observational record …

Now, it’s obvious that the distribution of any given model’s result will not be identical to that of the observations. But how much difference can we expect? To answer that, Figure 4 shows a set of 24 violin plots of random distributions, with the same number of datapoints (140 years of ∆T) as the model outputs.

Figure 4. Violin plots of different random datasets with a sample size of N = 140, and the same standard deviation as the HadCRUT ∆T dataset.

As you can see, with a small sample size of only 140 data points, we can get a variety of shapes. That’s one of the problems in interpreting results from small datasets: it’s hard to be sure what you’re looking at. However, some things don’t change much. The interquartile distance (the height of the box) does not vary a lot. Nor do the locations of the ends of the whiskers. Now, if you re-examine the GFDL (11) and GISS (12) modeled temperatures (as redisplayed in Figure 5 below for convenience), you can see that they are nothing like any of these examples of normal datasets.
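A panel like Figure 4 can be generated along these lines; the standard deviation below is only a placeholder, where the real exercise would use the standard deviation of the HadCRUT ∆T series:

# Sketch of a Figure 4 style panel: 24 violin plots of random normal data, each N = 140.
library(vioplot)
set.seed(7)
sd_obs  <- 0.1                              # placeholder; use sd(diff(hadcrut)) with real data
samples <- replicate(24, rnorm(140, sd = sd_obs), simplify = FALSE)
do.call(vioplot, c(samples, list(col = "orange", names = as.character(1:24))))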

Here are a couple of final oddities. Figure 5 includes three other observational datasets: the GISS global temperature index (LOTI), and the BEST and CRU land-only datasets.

Figure 5. As in Figure 2, but including the GISS, BEST, and CRUTEM temperature datasets at lower right. Horizontal dotted lines show the total range of the HadCRUT observational dataset.

Here, we can see a curious consequence of the tuning of the models. I’d never seen how much the chosen target affects the results. You see, you get different results depending on what temperature dataset you choose to tune your climate model to … and the GISS model [12] has obviously been tuned to replicate the GISS temperature record [22]. Looks like they’ve tuned it quite well to match that record, actually. And CSIRO [7] may have done the same. In any case, they are the only two that have anything like the distinctive shape of the GISS global temperature record.

Finally, the two land-only datasets [23, 24 at lower right of Fig. 5] are fairly similar. However, note the differences between the two global temperature datasets (HadCRUT [21] and GISS LOTI [22]), and the two land-only datasets (BEST [23] and CRUTEM [24]). Recall that the land both warms and cools much more rapidly than the ocean. So as we would expect, there are larger annual swings in both of those land-only datasets, as is reflected in the size of the boxplot box and the position of the ends of the whiskers.

However, a number of the models (e.g., 6, 9, and 11) resemble the land-only datasets much more than they do the global temperature datasets. This would indicate problems with the representation of the ocean in those models.

Conclusions? Well, the maximum year-to-year change in the earth’s temperature over the last 140 years has been 0.3°C, for both rising and falling temperatures.

So should we trust a model whose maximum year-to-year change is twice that, like GFDL [11]? What is the value of a model whose results are half that of the observations, like GISS [12] or CSIRO [7]?

My main conclusion is that at some point we need to get over the idea of climate model democracy, and start heaving overboard those models that are not lifelike, that don’t even vaguely resemble the observations.

My final observation is an odd one. It concerns the curious fact that an ensemble (a fancy term for an average) of climate models generally performs better than any model selected at random. Here’s how I’m coming to understand it.

Suppose you have a bunch of young kids who can’t throw all that well. You paint a target on the side of a barn, and the kids start throwing mudballs at the target.

Now, which one is likely to be closer to the center of the target—the average of all of the kids’ throws, or a randomly picked individual throw?

It seems clear that the average of all of the bad throws will be your better bet. A corollary is that the more throws, the more accurate your average is likely to be. So perhaps this is the justification in the minds of the IPCC folks for the inclusion of models that are quite unlike reality … they are included in the hope that they’ll balance out an equally bad model on the other side.

HOWEVER … there are problems with this assumption. One is that if all or most of the errors are in the same direction, then the average won’t be any better than a random result. In my example, suppose the target is painted high on the barn, and most of the kids miss below the target … the average won’t do any better than a random individual result.
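The mudball argument, and the shared-bias caveat, are easy to check with a toy simulation. All of the numbers here are invented purely for illustration:

# Toy check: is the average of 20 bad throws usually closer to the target than one
# throw picked at random? And what happens when every throw shares the same bias?
set.seed(3)
trials      <- 10000
avg_err     <- replicate(trials, abs(mean(rnorm(20, mean = 0, sd = 1))))  # unbiased throws
single_err  <- replicate(trials, abs(rnorm(1, mean = 0, sd = 1)))
mean(avg_err < single_err)            # the average wins most of the time
# Now paint the target too high, so every kid misses low by the same amount (bias = -2):
avg_bias    <- replicate(trials, abs(mean(rnorm(20, mean = -2, sd = 1))))
single_bias <- replicate(trials, abs(rnorm(1, mean = -2, sd = 1)))
mean(avg_bias); mean(single_bias)     # both errors stay near 2; averaging cannot remove a shared bias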

Another problem is that many models share large segments of code, and more importantly they share a range of theoretical (and often unexamined) assumptions that may or may not be true about how the climate operates.

A deeper problem in this case is that the increased accuracy only applies to the hindcasts of the models … and they are already carefully tuned to create those results. Not the “twist the knobs” kind of tuning, of course, but lots and lots of evolutionary tuning. As a result, they are all pretty good at hindcasting the past temperature variations, and the average is even better at hindcasting … it’s that dang forecasting that is always the problem.

Or as the US stock brokerage ads are required to say, “Past performance is no guarantee of future success”. No matter how well an individual model or group of models can hindcast the past, it means absolutely nothing about their ability to forecast the future.

Best to all,

w.

NOTES:

DATA SOURCE: The model temperature data is from the study entitled Evaluating adjusted forcing and model spread for historical and future scenarios in the CMIP5 generation of climate models, by Forster, P. M., T. Andrews, P. Good, J. M. Gregory, L. S. Jackson, and M. Zelinka, 2013, Journal of Geophysical Research, 118, 1139–1150, provided courtesy of Piers Forster. Available as submitted here, and worth reading.

DATA AND CODE: As usual, my R code is a snarl, but for what it’s worth it’s here , and the data is in an Excel spreadsheet here.

88 Comments
cnxtim
November 21, 2013 6:10 pm

Oh dear, the hypocrisy is rife. These graphs are so out of kilter with each other as to make any findings from using them collectively totally absurd, and surely the only thing that matters is accuracy, proven over time – anything else is just plain old GIGO – thanks, great post…

DocMartyn
November 21, 2013 6:33 pm

What graphics package did you use for the violin plots?
They look lovely BTW

OssQss
November 21, 2013 6:34 pm

Willis, remember those analog kids will learn and get better as time goes by.
By comparison, our digital models can’t.
They currently have a fundamental CO2 issue that their analog masters cannot overcome with adding code. Stuck in the mud, if you will.
Just my take.
As always, thanks for the good read.
PS= I will never look at old Christmas decorations the same way again. Tis the season >

Luke Warmist
November 21, 2013 6:35 pm

Thanks Willis. I always enjoy your take on data sets, and despair at my own lack of imagination.

Jquip
November 21, 2013 6:39 pm

OssQss: “By comparison, our digital models can’t.”
Exactly the reason underlying the self-correction of science. Given two things, or more, toss out the worst and try again. If we aren’t doing that, then there’s no point to the endeavour.

jorgekafkazar
November 21, 2013 6:40 pm

Ja, another great post. Well done. The models don’t simulate climate; they only emulate it, in that they wiggle up and down, just like climate. They don’t add to our knowledge; they take away. And they cost millions, so far, trillions, ultimately.

ferdberple
November 21, 2013 7:05 pm

lots and lots of evolutionary tuning
================
Exactly. Any model that wants to survive must tell the model builder what the model builder expects to hear, or the model will be replaced by a modified model. Over time this evolutionary process (survival of the fittest) results in models that are very good at predicting what the model builder expects to hear. However, they have no more ability to tell the future than the model builder.

ferdberple
November 21, 2013 7:21 pm

increased accuracy only applies to the hindcasts of the models … and they are already carefully tuned to create those results
==========================
the climate models confuse hindcasting with training. When you let the model see the past you are training. Of course the model can memorize the past and repeat it. A parrot can do the same.
Hindcasting occurs when you don’t show the model the past and it can predict it anyways. This proves that the model likely has some skill. To date no climate model has demonstrated this ability. Let me repeat, no climate model to date has demonstrated any skill at hindcasting.
For example, given the current position of a planet and its current motion, gravitational models can predict with some accuracy its past position. We can verify this using historical records, which gives us great confidence in the accuracy of gravitational models. We don’t need to wait to see if the model is accurate in the future, because it has correctly predicted the past without knowing the past. This gives us confidence the models can predict the future.
However, what if we trained the models by telling them the past position of the planet. Would this give us any confidence in the ability of the model to make predictions about the past or future? No, because all the model need do is parrot what it has learned. This requires no ability to predict, it requires the ability to mimic. That is what we see with climate models, they mimic the builders, they don’t have any skill to predict.

old engineer
November 21, 2013 7:41 pm

Willis, thought provoking post as usual. I enjoyed the education on violin plots.
Your comment about the kids throwing mud balls at target on barn reminded me of the poem “Hiawatha Designs an Experiment” by Maurice G. Kendall, quoted in part below:
“Hiawatha, mighty hunter
He could shoot ten arrows upwards
Shoot them with such strength and swiftness
That the last had left the bowstring
Ere the first to earth descended.
This was commonly regarded
As a feat of skill and cunning.
One or two sarcastic spirits
Pointed out to him, however,
That it might be much more useful
If he sometimes hit the target.
Why not shoot a little straighter
And employ a smaller sample?
Hiawatha, who in college
Majored in applied statistics
Consequently felt entitled
To instruct his fellow men on
Any subject whatsoever,
Waxed exceedingly indignant
Talked about the law of error,
Talked about truncated normals,
Talked of loss of information,
Talked about his lack of bias
Pointed out that in the long run
Independent observations
Even though they missed the target
Had an average point of impact
Very near the spot he aimed at
(with possible exception
of a set of measure zero)…..”
Could it be that the IPCC has hired Hiawatha?

FrankK
November 21, 2013 7:42 pm

Interesting post W.
As with most models the “tuning” (as some suggest – fudging) means very little as the result is not unique and a good hindcast “fit” doesn’t mean the model is valid. With climate models its worse because they are all based on the premise that CO2 is the prime driver of temperature. Hence their predictions are no better than a guess.

November 21, 2013 7:57 pm

The violin plots are done with the vioplot package (assuming Willis uses what I use).
Willis:
“So should we trust a model whose maximum year-to-year change is twice that, like GFDL [11]? What is the value of a model whose results are half that of the observations, like GISS [12] or CSIRO [7]?”
I’m glad you plotted the variability.
Now folks need to go re look at Santer and the 17 year goal posts.

Tim
November 21, 2013 8:00 pm

FEA was also originally assumed to be an unreliable tool, but now everybody and his dog uses it when it comes to product design.
The problem with climate models (and often FEA too) is the inputs and the constraints. I don’t believe we know of half of them and without that knowledge there’s no point in doing the analysis. It’s like designing a bridge without knowing what material you are building it out of, how long it needs to be and what weight it needs to support.

Nick Stokes
November 21, 2013 8:08 pm

“The difference between the two model outputs is quite visible. There is little year-to-year variation in the GISS results, half or less of what we see in the real world. On the other hand, there is very large year-to-year variation in the GFDL results, up to twice the size of the largest annual changes ever seen in the observational record …”
GISS Model E wasn’t in the Forster et al data. I see that when you have used it before, you got results from this site. That’s an ensemble mean of five runs. So variation is down.

don
November 21, 2013 8:12 pm

Interesting. I don’t see a violin plot (the classical full figured woman as it were). I see diamonds. I see a few flying saucers. I even see one dumpy pear and a few spinning tops. It’s a veritable Rorschach test and too absurd to really exist. I must be projecting. If it were not for humans, the absurd wouldn’t exist.

November 21, 2013 8:34 pm

Well, the maximum year-to-year change in the earth’s temperature over the last 140 years has been 0.3°C, for both rising and falling temperatures.
Is it possible some models tried to model the satellite data?
With RSS for example, 1997 was 0.103, 1998 was 0.549 and 1999 was 0.103 again. So it rose and dropped 0.446.

ferdberple
November 21, 2013 8:36 pm

old engineer says:
November 21, 2013 at 7:41 pm
Your comment about the kids throwing mud balls at target on barn reminded me of the poem “Hiawatha Designs an Experiment
================
a dart board shows the same pattern. Throw enough darts at the board and the average will be a bulls-eye. Throw enough and some may even hit the bulls eye. It doesn’t mean you have any skill at throwing bulls-eyes.
throw enough climate models at the bulls eye and on occasion some will accidentally hit the bulls eye, and the rest will be scattered about. and all the models will demonstrate the same skill as a dart board at predicting future climate.
There is a 1/3 chance the future will be hotter, 1/3 chance it will be colder, and 1/3 chance it will be unchanged. Randomly pick any set of points in the past and this will be true. Thus, if you were to forecast any time in the future, this would also hold true and it would be a foolish bet to forecast otherwise.
To argue that humans are “different” is a nonsense. The earth has suffered much worse catastrophes, yet the 1/3 rule holds. The problem is that we believe the future to be deterministic, but it isn’t. God has a sense of humor and is the biggest practical joker in the Universe.

November 21, 2013 9:30 pm

cool willis a couple of folks have posted mods to the vioplot package, you might consider it

November 21, 2013 9:35 pm

Willis, as I recall FGOALS actually includes volcanic forcing summed into its TSI forcing. Did you get each component or just the sum of forcings?

dalyplanet
November 21, 2013 10:09 pm

I always enjoy your graphical posts Willis, thank you again.

Alan Smersh
November 21, 2013 10:23 pm

For the unenlightened the full
distribution packages are here
for MAC, Windows, & Linux …..
http://cran.r-project.org/
But “R” is quite complex, and so maybe we need a tool to learn “R”,
one capable of operating a remote “R” server to do complex plots.
R Instructor is an Android and iPhone, iPad and iPod Touch
application that uses plain, non-technical language and over
30 videos to explain how to make and modify plots, manage data
and conduct both parametric and non-parametric statistical tests.
(other instructional packages available) Costs less than 5 Bucks !
http://www.rinstructor.com/
Or Read The Manuals here, but there’s over 3,500 pages …
http://cran.r-project.org/manuals.html
Have fun with all that, people !

AndyG55
November 21, 2013 10:31 pm

Again I will say..
Any model that hindcasts to fit pre-1979 Giss or HadCrud.. will ALWAYS create an overestimate of future temperatures.
They are stuck in a Catch 22 situation.
Either get rid of all the pre-1979 adjustments and have some sort of hope of some sort of realistic projection (but abolishing the warming trend that the climate bletheren rely on) … or
Leave it as it is and keep producing models that greatly overestimate.

george e. smith
November 21, 2013 11:02 pm

I didn’t catch the reason for the “violin” plots. Isn’t all of the information contained in either half of the drawing, and in the usual probability graph orientation ?

Alan Smersh
November 21, 2013 11:27 pm

e. smith
With the violin plot, my take is that, for the Human observer, discrepancies are immediately made most obvious to the naked eye at a glance. Whereas with the usual graphical plot of a curve or series of curves, they all begin to look very similar. The Human brain, being a pattern recognition machine, in a sense, is very adept at seeing slight differences in a pattern, such as the Violin Type of Plot, but not so good at seeing differences in the very similar curves, or sets of curves.
I could be wrong, but this is my take.

tty
November 21, 2013 11:44 pm

Willis, have you done a plot of all the models together? The reason I ask is that the “ensemble” is usually shown with a two-sigma envelope that is said to be equal to 95% probability.
Now this is only true if the underlying data is normally distributed, and it would be interesting to know how true this assumption is, particularly as climate data usually does not follow a normal distribution.

November 21, 2013 11:53 pm

Sorry to correct you, but the “whiskers” extend 1.5 times the height of the box. Data beyond that are classified as outliers.

tonyb
Editor
November 22, 2013 12:16 am

Volcanic activity may cause warming OR cooling depending on the location of the volcano and the distance from it where the temperature effect is observed. See figure 1
http://www.pages.unibe.ch/products/scientific_foci/qsr_pages/zielinski.pdf
How long the effect is going to last is the subject of continued debate. In most cases the effect appears to be pretty short-lived.
tonyb

Jean Parisot
November 22, 2013 12:57 am

“DATA AND CODE: As usual, my R code is a snarl, but for what it’s worth it’s here , and the data is in an Excel spreadsheet here.”
Thanks, that sentence looks like something “professional” scientists should write more often.

Jimbo
November 22, 2013 1:33 am

Would it not be a good idea for the IPCC to look at say 5 of the models that came closest to temperature observations, then compare them to the 5 models that most diverged with temperature observations; then look under the hood for the differences in code / assumptions / inputs etc. Might this not indicate why most of the models fail? Has this already been done by the IPCC? If yes what was the result?

EternalOptimist
November 22, 2013 1:38 am

I always feared this would happen. The CAGW debate has erupted into Violins

knr
November 22, 2013 1:47 am

that they don’t even have the most simple of tests to weed out the losers.
Well, they have to publish something, even if it’s rubbish, so to be fair, if they were to do this they would have nothing left. So you can see why they don’t.

climatereason
Editor
November 22, 2013 1:47 am

Further to my 12.16 above.
Am I reading the two graphs correctly that show major eruptions and temperatures?
Surely there was already a temperature downturn BEFORE the eruptions?
I did some research previously on the 1258 ‘super volcano’ which was the subject of much discussion by Michael Mann and is said to have precipitated the LIA.
However, the temperature/weather had already deteriorated in the decade prior to the eruption but warmed up again a year after.
tonyb

jimmi_the_dalek
November 22, 2013 2:42 am

you get different results depending on what temperature dataset you choose to tune your climate model to
In another recent thread Nick Stokes said that the models were not fitted to the temperature record,
GCMs are first principle models working from forcings. However, they have empirical models for things like clouds, updrafts etc which the basic grid-based fluid mechanics can’t do properly. The parameters are established by observation. I very much doubt that they fit to the temperature record; that would be very indirect. Cloud models are fit to cloud observations etc.
so which is it?

David A
November 22, 2013 4:20 am

Jimbo says:
November 22, 2013 at 1:33 am
Would it not be a good idea for the IPCC to look at say 5 of the models that came closest to temperature observations, then compare them to the 5 models that most diverged with temperature observations; then look under the hood for the differences in code / assumptions / inputs etc. Might this not indicate why most of the models fail? Has this already been done by the IPCC? If yes what was the result?
============================================
Your suggestion makes way to much since. RGB has posted on the inane practice of using the ensemble model mean for the IPCC predictions, when all the models run wrong in the warm direction. In that since they are informative of something very basic they have wrong. If they follow your suggestion they will likely find that by tuning way down “climate sensitivity” to CO2, they can produce far more accurate predictions.

November 22, 2013 4:25 am

I always liked calling them manta plots myself, but I’m curious why anyone thinks statistical examinations have any place in modern climate science?

David A
November 22, 2013 4:36 am

I hate it when a typo ruins a commonsense comment. “Your suggestion makes way to much since”
Drat- sense. David A says:
November 22, 2013 at 4:20 am

Billy Liar
November 22, 2013 4:38 am

In Figure 2 it looks like HadCRUT is a normal distribution by design – could that be true?

November 22, 2013 4:47 am

jimmi_the_dalek says: November 22, 2013 at 2:42 am
“In another recent thread Nick Stokes said that the models were not fitted to the temperature record,…
so which is it?”

Well, I’d invite people who think they are so fitted, or tuned, to say how they think it is done, and offer their evidence.
Here’s my perspective. I’ve never been involved in writing a GCM. But I’ve spent a lot of time trying to get my CFD programs to do the right thing. Here’s how it goes:
1. Half the battle is just getting them to run to completion without going haywire. They are very complicated, and really the only thing you have going for you is the physics. It has to be consistent. Not, at that stage, necessarily totally right. But there’s a very narrow path to thread; if it runs, then you’ve surely got a lot of things right.
2. Then you check to see if various conservation relations hold. You’ve probably been doing this anyway. Again, if you’ve got that right, that’s reassuring about lots of physics. And if you haven’t, you probably haven’t got this far.
3. Then you check all boundary conditions to see if they are doing the right things. You’ll explicitly calculate stresses and fluxes etc to see if they are a, reasonable, and b, satisfy the equations you meant them to.
4. You check all the sub-models. That includes things like clouds, updrafts, gas exchange, precipitation. That’s when you’ll vary their parameters – not in response to something diffuse like average temperature, but something specifically responsive – ie cloud properties etc.
5. Then you might look at how they match global averages etc. But there’s little chance of tuning. There are not many parameters left – maybe things like thermal conductivity. Diffusive properties. And there’s not much you can tinker with without risking going back into collapse mode. Remember, the physics is the only thing keeping your program functioning. You’ve either got it right or not.
CFD programs have a mind of their own. You end up just trying to negotiate with them..

Brian H
November 22, 2013 4:47 am

My eyes aren’t so good, but I should be able to find these “horizontal dotted lines” you keep referring to. But I can’t. Where are they? To help you search, this is what I’m expecting to see: – – – – – – or . . . . . . or ………

Speed
November 22, 2013 4:48 am

Two pyramids, two masses for the dead, are twice as good as one; but not so two railways from London to York.
John Maynard Keynes
I get his point but he ignores the value of competition. With railroads, competition produces faster, more frequent and cheaper trips. With models, competition to produce the best model should improve the products but for competition to work it needs to produce winners and losers. The field is only improved when the losers are sent home.
ferdberple wrote, “a dart board shows the same pattern. Throw enough darts at the board and the average will be a bulls-eye.”
Isn’t this what the model aggregators are doing — throwing models at the board and claiming that the average is the bulls-eye? Or close enough? Superficially, this is a compelling argument and it doesn’t force anyone or any organization to make a decision or withdraw funding or work hard or quickly or take risks to get better. No one gets sent home.

David L. Hagen
November 22, 2013 4:55 am

Thanks Willis
Always insightful.
Digging deeper, Prof. Demetris Koutsoyiannis finds that natural climate persistence results in statistics that are dramatically different from random fluctuations. e.g. Modeling using Hurst Kolmogorov dynamics with a Hurst coefficient of 0.89 in a log climacogram compared to a 0.50 coefficient for random fluctuations. e.g. see Fig 9. in Hurst-Kolmogorov dynamics and uncertainty 2011
or Fig. 9-11 in Climatic variability over time scales spanning nine orders of magnitude: Connecting Milankovitch cycles with Hurst-Kolmogorov dynamics
Or in further publications & presentations by Koutsoyiannis on Hurst-Kolmogorov dynamics.
Best

Just an engineer
November 22, 2013 5:20 am

Hi Willis,
In your analogy, I wonder if it would not be more accurate if the boys were blindfolded?

David
November 22, 2013 6:17 am

Once upon a time as an engineer at a big company I was asked to evaluate a model someone had made. They had taken 40 measurements from the line during the manufacturing process and regressed them against 20 resultant yields. They then took the best *11* of the results, and did a multi-linear regression against the same 20 resultant yields, and came up with what they called an 80% correlation. To prove this was nonsense, I took 40 random number generators, regressed them against the same 20 yields, took the 11 best results, did a multi-linear regression and came up with an 83% “correlation”.
When I presented my data to management, they wanted to know what distribution I took my random numbers from. I told them “uniform”. *They then wanted me to see if I could improve my correlation by picking a different distribution.* I came really close to resigning.
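For what it’s worth, the effect described above is easy to reproduce in R. This is a toy sketch with invented numbers, not the original data: regress 20 “yields” on the best 11 of 40 purely random predictors and the fit looks impressive anyway.

# Spurious fit from selecting the best predictors out of pure noise
set.seed(1)
yields <- rnorm(20)                             # 20 resultant yields
preds  <- matrix(rnorm(20 * 40), nrow = 20)     # 40 random "measurements"
r      <- abs(cor(preds, yields))               # single-predictor correlations
best   <- order(r, decreasing = TRUE)[1:11]     # keep the 11 strongest
fit    <- lm(yields ~ preds[, best])            # multi-linear regression on the survivors
summary(fit)$r.squared                          # high R-squared, despite pure-noise predictors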

ferdberple
November 22, 2013 6:32 am

David L. Hagen says:
November 22, 2013 at 4:55 am
Hurst-Kolmogorov dynamics and uncertainty 2011
===========
A very interesting paper, pointing out that Climate Change is a redundant and thus misleading term. Climate = Change .
The paper goes on to demonstrate why the treatment of climate as a non-stationary deterministic process leads to an underestimate of climate uncertainty. The paper goes on to argue that the problem is the underlying statistical assumption that climate (the future) is deterministic.
19th century Victorian era physics considered the future to be deterministic: the universe was a clockwork. Wind it up and the future is predetermined by the past. However, since that time physics has come to understand the future much differently.
Consider a planet orbiting a star. An electron orbiting the hydrogen nucleus, if you will. This is a deterministic system. It is the lowest order of chaos – no chaos. Now add another planet in orbit. Another electron around the nucleus. Suddenly you have the three body problem. You cannot be sure where the orbits will take the planets/electrons. Instead, you are left with a probability function. The Schrodinger equation, the Uncertainty Principle, Chaos.
The clockwork breaks down, our nice orderly view of the future does not exist. You cannot average the future and arrive at a meaningful result. On average if you have one foot in the oven and the other in the freezer you are comfortable.

ferdberple
November 22, 2013 6:46 am

The spaghetti graphs the IPCC publishes for the climate models are much more revealing than the ensemble mean. The spaghetti graphs tell us that the models already know that the future is uncertain. That many different futures are possible from a single set of forcings.
This is a critical issue because many scientists and policy makers are unaware of the problem inherent in time series analysis. They believe that for a single set of forcings only a single outcome is possible. Thus they talk about “climate sensitivity” as a single value. They argue that it is 1.5 or 2.5 or 3.5, when in fact it is all those values and none of those values at the same time.
If it wasn’t, if climate sensitivity was indeed a single value, then a single model would always deliver the same prediction for a given set of forcings. But the models do not do this. Every time you run a model, if it is in the least bit realistic, it will deliver a different prediction for future temperatures, without any change in the forcings. How then can climate sensitivity be a single value?
the climate models are telling us what we don’t want to hear. they are screaming at us that climate sensitivity is not and cannot exist as a single value.
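A toy illustration of that point (nothing like a real GCM, just a forced random walk with invented numbers): the forcing is identical on every run, yet each run delivers a different 140-year temperature change.

# Identical forcing, different realizations, different 'warming' each run
set.seed(10)
run_once <- function() {
  forcing <- seq(0, 1, length.out = 140)                           # same forcing ramp every run
  noise   <- arima.sim(model = list(ar = 0.6), n = 140, sd = 0.1)  # internal variability
  temps   <- 0.75 * forcing + as.numeric(noise)                    # invented sensitivity of 0.75
  tail(temps, 1) - head(temps, 1)                                  # net change over the run
}
replicate(10, run_once())                                          # ten runs, ten different answers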

Resourceguy
November 22, 2013 6:59 am

The frequency of model bias on the high side would be better connected to real-world modeling issues if it was connected to the problem of extending such projections off the growth phase of multi-decadal cycles. A classification system might be possible to sort out which models are most prone to the last-cycle takeoff effect that produces ridiculous runaway projections.

H.R.
November 22, 2013 7:10 am

EternalOptimist says:
November 22, 2013 at 1:38 am
“I always feared this would happen. The CAGW debate has erupted into Violins”
==============================================================
Phweeeet! 15 minute time out in the corner, no recess, and no pudding for dessert!
.
.
.
.
As Willis wrote in the main post:
“Another problem is that many models share large segments of code, and more importantly they share a range of theoretical (and often unexamined) assumptions that may or may not be true about how the climate operates.”
If a model is really good, it should run long and well enough to predict the next glaciation, eh? One of my assumptions about climate is that there will be another glaciation.

ferdberple
November 22, 2013 7:11 am

In another recent thread Nick Stokes said that the models were not fitted to the temperature record,
“GCMs are first principle models working from forcings. However, they have empirical models for things like clouds, updrafts etc which the basic grid-based fluid mechanics can’t do properly. The parameters are established by observation. I very much doubt that they fit to the temperature record; that would be very indirect. Cloud models are fit to cloud observations etc.”
=============
If that was the case, then the models would all have the same values for aerosols within a very narrow margin. They don’t. Aerosols, not CO2, are the tuning knob on the models.
First principle models work fine in simple systems. In complex systems they are hopeless because of Chaos. Round off errors quickly accumulate in even the most precise models and overwhelm the calculation. This happens even in simple linear programming models with relatively few terms, such that you must back-iterate to minimize the error. As model size grows, the problem expands exponentially and overwhelms the result.
The classic example of the failure of first principles to predict complex systems is the ocean tides. You cannot calculate the tides with any degree of accuracy by first principle, yet we propose to calculate a much more complex problem, the climate of the earth using the same failed methodology. A methodology that has already proven itself to be hopeless at predicting the tides, the economy, the market, or any other complex time series.
Early humans already discovered how to predict complex time series. Without any knowledge of first principles they learned to predict the seasons. As advanced as we believe ourselves to be, we have forgotten this fundamental lesson.

MarkW
November 22, 2013 7:12 am

If they weeded out the losers, there wouldn’t be any left.

ferdberple
November 22, 2013 7:26 am

Average Climate Sensitivity does exist as a single number, but this is a meaningless term. If you took 1000 identical earths and played out the future, you would find that on some, climate sensitivity was X, and on others it was Y, and you could average this out and say average sensitivity was (X1+X2…+Xn)/n. However, this would be meaningless on each individual earth, because their sensitivity would still be X1, X2, etc. Now it could be argued that this difference is due to natural variability, but what is natural variability if not the difference between the effect that the identical forcings would have in the future? How do we assign this to an unseen, hidden variable, and allow this to affect the climate system, while assuming that the known variables are unaffected?

November 22, 2013 7:45 am

Willis, are you colorblind?
Anyway, lovin’ the mudballs.

November 22, 2013 8:22 am

“In another recent thread Nick Stokes said that the models were not fitted to the temperature record,
“GCMs are first principle models working from forcings. However, they have empirical models for things like clouds, updrafts etc which the basic grid-based fluid mechanics can’t do properly. The parameters are established by observation. I very much doubt that they fit to the temperature record; that would be very indirect. Cloud models are fit to cloud observations etc.”
so which is it?
######################
a while back I passed willis a paper on how calibration is done.
Tuning the climate of a global model
Thorsten Mauritsen,1 Bjorn Stevens,1 Erich Roeckner,1 Traute Crueger,1 Monika Esch,1
Marco Giorgetta,1 Helmuth Haak,1 Johann Jungclaus,1 Daniel Klocke,2 Daniela Matei,1
Uwe Mikolajewicz,1 Dirk Notz,1 Robert Pincus,3,4 Hauke Schmidt,1 and Lorenzo Tomassini
First a few definitions. When folks use the word tuning my sense is most think of something like this.
Tuning: you would take the entire historical 2 meter temperature (air and sea) and fiddle knobs (aerosols for example) until you matched the metric.
This kind of Tuning is not what is done and that should be pretty obvious. They dont for the most part Tune to match the entire land + ocean series. That should be clear when you see
A) they dont match absolute temperature very well
B) they miss hindcast peaks and valleys.
Instead, one adjusts parameters to achieve balance at the TOA and to match some subset of temperature anomalies.
Think of this as tuning to match known physics: energy out has to equal energy in along with tuning to match an initial state.
In the beginning this was done by heat flux adjustments
“The need to tune models became apparent in the
early days of coupled climate modeling, when the top of
the atmosphere (TOA) radiative imbalance was so large
that models would quickly drift away from the observed
state. Initially, a practice to input or extract heat and
freshwater from the model, by applying flux-corrections,
was invented to address this problem [Sausen et al.,
1988]. As models gradually improved to a point when
flux-corrections were no longer necessary [Colman et al.,
1995; Guilyardi and Madec, 1997;Boville and Gent, 1998;
Gordon et al., 2000], this practice is now less accepted in
the climate modeling community.”
Now its done like this
the radiation balance is controlled primarily by tuning cloud-related
parameters at most climate modeling centers… In addition some tune by adjusting ocean albedo and others tune by adjusting aerosols.
But not to match the entire historical surface temps, but rather something like this:
In one example they tune to hit the 1850 to 1880 temperature of 13.7C. Then they tune
so that the Energy out in the satellite era matches satellite observations.
The notion that they tune to match the entire historical series is somewhat of an urban legend. Real skeptics would question that legend. Ever see a skeptic challenge that assertion? Nope. Selective skepticism.
So. The initial state for the first 30 years is 13.7C. Parameters are adjusted so that the temperature matches at the beginning. Then the process is run forward and you make sure that you match a different parameter at the end: Energy out.
Got that? So you’re not tuning to match the hump in the 30s or the rise since 1976.
You are tuning to make sure that the global average of 13.7C is set correctly at the start
and then closing on a different parameter at the end: Energy out.
in between the temperature is what it is; a function of forcings.
now scream and shout, but by all means dont read the papers or back up the urban legend with some actual evidence.
The urban legend: all GCMs are tuned to match the entire historical series.
now quickly go google tuning GCMs. find a quote and use that quote as proof. but by no means should you actually read papers or write to scientists who work on models or visit
a modeling center to actually watch a tuning process. Dont do that your story might get very complicated very quickly. Stick with the simple legend.

G P Hanner
November 22, 2013 8:22 am

I knew a financial economist who was very good at tuning his investment model to accurately reflect past results. Still could not forecast investment results — at all. Knowing the past in no way gives the ability to see the future.

Epigenes
November 22, 2013 8:29 am

Eschenbach is tedious. This post is boring and detracts from the debate. How many of you studied his pictograms?
None of the warmists will give a damn. Eschenbach is preaching to the converted and on an ego trip. He needs to adopt a more political approach. Do not hold your breath ’cos he has no political nous whatsoever.

November 22, 2013 8:34 am

Epigenes,
I like Willis. He is not tedious, he is interesting. Your criticism is tedious.
Regarding ‘political’, you have no concept of that concept. How is your comment politically wise? Willis often gets hundreds of comments — almost all of them favorable — under his articles. So far, you have one comment. Mine. And it is anything but favorable.

mpainter
November 22, 2013 8:52 am

“This has always seemed strange to me, that they don’t even have the most simple of tests to weed out the losers.”
<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
That could be because they are all losers.

Alan Robertson
November 22, 2013 9:07 am

Epigenes says:
November 22, 2013 at 8:29 am
“blah blah blah”
____________________________
I looked over the graphs as I read Willis’ text.
Epigenes, you should know better…
Where is your contribution?
You say Willis is on a trip, but you sound as if you’re on Lithium, but forgot your last few doses.

Jim G
November 22, 2013 9:12 am

Interesting, but methinks we too often try to out-statistically masturbate the warmistas. A simple plot of actual temperatures over time, as opposed to anomalies, says volumes about the over-exaggerations being employed to scare folks. Plotting anomalies, to me, is simply another way of rescaling to exaggerate what is actually going on. And yea, yea, I know how little change in temperature is needed to substantially affect our way of life. But in reality that is mostly if the change is to the colder temps which would put the hurt on our growing seasons, food supplies, disease, and probably some sabre rattling and war in the end.

Alan Robertson
November 22, 2013 9:13 am

Nick Stokes says:
November 22, 2013 at 4:47 am
jimmi_the_dalek says: November 22, 2013 at 2:42 am
“In another recent thread Nick Stokes said that the models were not fitted to the temperature record,…
so which is it?”
Well, I’d invite people who think they are so fitted, or tuned, to say how they think it is done, and offer their evidence.
____________________________________
I won’t mention harry.readme, because I’m not much concerned with the practice of tuning models to fit the data. What I have a problem with, is your crowd’s practice of tuning the data to fit the models.

November 22, 2013 9:19 am

Willis: Did you get the book I sent you, through Anthony (“Rapid Interpretation of EKG’s”)? I sent it to General Delivery in Chico CA. Should be there (according to Amazon). Let me note, this isn’t just a book on reading EKG’s, it’s the most concise cardiac education you can get in one book, and (with your abilities) go through in about 3 days!

David A
November 22, 2013 9:34 am

Mosher ends a long post with, “Stick with the simple legend.” (a snide perspective that the skeptics do not understand the real work done by the modelers). Yet the truth is it is the models that “stick with the simple legend.”
They have a simple legend about the power of CO2 and the C.S. to that anthropogenic increase. So, no matter how they “tune” the past, and most skeptics I know never said they attempted to match the entire and uncertain historic record, their forward runs from their tuning point uniformly run considerably higher than observations.
I predict that the one factor that would improve all the models is tuning down climate sensitivity to CO2. Yet the IPCC models stick with the “simple legend” of CAGW, and have the nerve to run their disaster scenarios based on the ensemble model mean of already failed projections.
Mr. Mosher, step away from the trees, so you can see the forest.

Duster
November 22, 2013 10:40 am

DocMartyn says:
November 21, 2013 at 6:33 pm
What graphics package did you use for the violin plots?
They look lovely BTW

In R you can load specific libraries for specialized tasks. If you look at Willis’ code, he has a line “require(vioplot)” . That indicates he loads the “vioplot” package, which is – more or less – an extension to R downloadable from CRAN and has to be installed in the local R installation to be used. There are two other related packages that do similar jobs. If the data points available are so limited that a violin or boxplot is liable to be misleadingly generalized, you can use viopoints instead. They are an underused form of graphic.

jimmi_the_dalek
November 22, 2013 10:47 am

The comments by Nick Stokes and Steven Mosher on tuning GCMs are interesting. I knew that they could not be fitting the entire temperature record – this is what the cyclomaniacs do. Perhaps someone should write a more comprehensive account. David A above says “one factor that would improve all the models is tuning down climate sensitivity to CO2.” but I thought that the sensitivity was one of the results not one of the inputs. Is this correct? I can see why, as Mosher says, they initialise to start in the ~1850s or thereabouts (no accurate data prior to that), but the problem with starting so recently is that, if there are long term cycles in ENSO or the like, which are believed to be important, they would not be detected. Long term cycles would have to emerge from the model – they cannot be an input – if the physics is right. Has anyone ever started a GCM at a point thousands of years back, even though the initial conditions may be poorly defined, just to see if any cyclical behaviour emerges? Or are the models not sufficiently numerically stable to allow that?

Duster
November 22, 2013 10:50 am

Epigenes says:
November 22, 2013 at 8:29 am
Eschenbach is tedious. This post is boring and detracts from the debate. How many of you studied his pictograms?
None of the warmists will give a damn. Eschenbach is preaching to the converted and on an ego trip. He needs to adopt a more political approach. Do not hold your breathe ‘cos he has no political nous whatsoever.

What precisely is your point, other than that you skipped the introductory statistics courses and never learned about the process of exploratory data analysis, and don’t want to? A “political approach” is what distinguishes “consensus” science from the real thing. It is irrelevant how many “believe” something and far more important whether belief is true to fact or not. Willis is comparing products of belief with fact here. You want to be political, he’s offering you the ammunition.

Duster
November 22, 2013 11:03 am

Otto Weinzierl says:
November 21, 2013 at 11:53 pm
Sorry to correct you but the “whiskers” extend 1.5 times of the height of the box. Data beyond that are classified as outliers.

It is not that simple. There are several types of boxplot. Even Wikipedia notes that the ends of the whiskers can represent several different things, only one of which would match your description.
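Base R exposes that choice directly through the range argument of boxplot(); a quick sketch of two common conventions:

# Two whisker conventions for the same data
set.seed(2)
x <- c(rnorm(100), 4)                                     # normal data plus one outlier
par(mfrow = c(1, 2))
boxplot(x, range = 1.5, main = "whiskers to 1.5 x IQR")   # Tukey's rule; the 4 plots as an outlier
boxplot(x, range = 0,   main = "whiskers to extremes")    # whiskers run to min and max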

Jimbo
November 22, 2013 11:34 am

David A says:
November 22, 2013 at 4:20 am
Jimbo says:
November 22, 2013 at 1:33 am
Would it not be a good idea for the IPCC to look at say 5 of the models that came closest to temperature observations…………………….
============================================
………………………If they follow your suggestion they will likely find that by tuning way down “climate sensitivity” to CO2, they can produce far more accurate predictions.

That is what I was hinting at. 🙂 If the IPCC did this their scary projections would no longer be scary and we would not have to act now.

November 22, 2013 12:02 pm

For those critics of Willis on being overly detailed and arcane. You couldn’t be more correct. Let’s take a look at this 14 page classic:
http://people.csail.mit.edu/bkph/courses/papers/Exact_Conebeam/Radon_English_1917.pdf
This is Johann Radon’s original paper on his “Transform”. Clear, transparent, and the root mathematical basis of all CT work, MRI or X-ray based.
It should be obvious that Willis spends FAR too much time on WORDS, actually describing things, when he could be more direct with just the equations.
Max
PS: Am I supposed to put a \sarc tag in to indicate sarcasm is now off?
[Reply: Always use a /sarc tag when being sarcastic. Some folks take everything literally. — mod.]

3x2
November 22, 2013 12:15 pm

My final observation is an odd one. It concerns the curious fact that an ensemble (a fancy term for an average) of climate models generally performs better than any model selected at random. Here’s how I’m coming to understand it.
Suppose you have a bunch of young kids who can’t throw all that well. You paint a target on the side of a barn, and the kids start throwing mudballs at the target […].

Worse than that perhaps. It is like betting on every horse in a particular race. On paper it looks as though you have a ‘sure fire’ betting system because you always collect on the winner. Potential investors in your scheme only get wise once they figure out that, although they collect on every race, it is costing them, on average, a million each race for every 40,000 they collect.
Of course you could always refine your ‘sure fire’ scheme over time such that more money is placed on the current favourites in the race.
The horse racing analogy falls down with the IPCC simply because, if we were to treat the various AR’s (model elements) as a horse race, the IPCC swaps out horses mid race depending upon their performance. While, of course, claiming that it is still the same race. Were LT temps to fall over the coming decades then one could bet that AR9 will be bang on the money when it came to foreseeing that development (AR1-8 ‘model ranges’ conveniently forgotten).

November 22, 2013 3:46 pm

Willis, your observations about the average suggest to me the comparison between “accuracy” and “precision”: in that regard, the models may be quite precise, but the accuracy sucks. If the modelers have “tuned” (i.e., targeted) the model on bad observational data, it may very well replicate that data set nicely, but if the observational data stink, well, so will the predictions by the model. Of course, if the models poorly replicate the annual, year-to-year changes, then that’s a big problem in its own right.
Thanks for taking so much time to perform and present this analysis.

Scottish Sceptic
November 22, 2013 5:03 pm

At school they said we were to have a surprise fire bell at 2pm one day the next week.
They couldn’t have it on Friday – because being the last day it would not be a surprise.
So … they couldn’t have it on Thursday either, because they couldn’t have it on Friday, and so Thursday would not be a surprise.
Likewise, Wednesday, Tuesday and Monday.
So, they couldn’t have a surprise fire drill.
It’s very similar with the climate models. They set them up believing they model natural variability in the climate – in practice the degrees of chaos are constrained so that they are not.

November 22, 2013 6:12 pm

Willis writes “My main conclusion is that at some point we need to get over the idea of climate model democracy, and start heaving overboard those models that are not lifelike, that don’t even vaguely resemble the observations.”
You can’t post-hoc select based on the variable you’re measuring or you will most certainly select for models which produce hockey sticks. This is precisely the same (bad) reasoning as throwing out tree rings because they don’t play ball. You’re not allowed to do it.
Basically if the model is believed to represent temperature then it must stay.

David A
November 22, 2013 11:06 pm

Jimini says, “David A above says ‘one factor that would improve all the models is tuning down climate sensitivity to CO2.’ but I thought that the sensitivity was one of the results, not one of the inputs.”
——————————————————————————————————————-
I am fairly certain CO2 is the dominant forcing. Many of the feedbacks are based on the claimed, but not observed, effect of additional CO2; i.e. increased water vapor, reduced albedo at the poles, etc. When ALL the models run wrong in a uniform direction, it is likely that something fundamental is wrong. CO2 forcing via direct radiative effects, and feedbacks, is the common thread from which all the WRONG models are woven.
When, without so much as a “How do you do”, the Mannian hockey stick overturned decades of scientific thought and research, the catastrophists created a dilemma for themselves. They claimed a flat past with flat CO2. It was necessary to ensure that the historical record matched the newly revealed flatness. Hansen & company went to work on the historical record to support Mann. See here… http://stevengoddard.wordpress.com/hansen-the-climate-chiropractor/ and here… http://stevengoddard.wordpress.com/2013/10/09/hansen-1986-five-degrees-warming-by-2010-2/
Jimbo, ya, (-; I knew what you were getting at, I just wanted to see if I could draw Mosher out of the trees to see if he could maybe acknowledge the big picture of the forest. No luck, once again his Royalness did not acquiesce to an actual conversation after his condescending lecture to the proletariat. Alas, Mr Mosher remains stuck in a world of numbers he knows far better than most, but the foundation they rest on, of that, he is unaware.

David A
November 23, 2013 6:03 am

@ Tim the Tool man. Why? Inconvenient tree rings are, after all, a real-world observation. The models are a computer simulation of an opinion. There are wrong answers in the models. There are only wrong interpretations of tree rings, how they were formed, what they mean. There is a dramatic difference between a model and a real-world observation. We must reject the models that do not match our real-world observations.

Wes
November 23, 2013 6:07 am

Jim G says:
November 22, 2013 at 9:12 am:
I agree with Jim. I really don’t understand why they use and “anomalies” as there is so little intelligence in this metric. I don’t ever recall using this in my 40 year career as an engineer and physicist.
WP.

Jim G
November 23, 2013 7:59 am

Wes says:
November 23, 2013 at 6:07 am
Jim G says:
November 22, 2013 at 9:12 am:
“I agree with Jim. I really don’t understand why they use and “anomalies” as there is so little intelligence in this metric. I don’t ever recall using this in my 40 year career as an engineer and physicist.
WP.”
Try to even find any historical plots of actual temps. Everything is in “anomalies”. We are playing into the hands of the warmers when we use their methods without, at least, showing a few plots of actual temps to show how little change is actually occurring.

Gary Pearse
November 23, 2013 12:04 pm

As expected, an excellent post. One more graph would round it out, perhaps. Actually put all the data together and plot a composite violin (or stingray?) plot to illustrate the effect of averaging all.

Tom van der Hoeven
November 23, 2013 12:20 pm

Hadcrut annual.csv
Willis, can you please give the Hadcrut annual.csv file as well?
Tom

November 23, 2013 2:16 pm

David A writes “Why?”
How does your reasoning stand up if none of the models are modelling climate? After all, you did say “When ALL the models run wrong in a uniform direction, it is likely that something fundamental is wrong.”

Crispin in Waterloo but really in Ulaanbaatar
November 23, 2013 5:53 pm

@AndyG55 and Just an engineer
This is my verbal ‘Like’ for both your comments.
Andy sez: Any model that hindcasts to fit pre-1979 Giss or HadCrud.. will ALWAYS create an overestimate of future temperatures.
This is kinda obvious, n’est-ce pas? If a model is built or trained or compared or correlated or checked against a temperature set that has been fiddled to make the past look colder than it was, even if it is doing ‘really well’ on the basis of its internal mechanisms, it is going to over-estimate the future temperatures both near and far. It has been created to model a non-reality. And future reality bites hard.
That the divergence starts immediately is a really good indication that the whole of the past needs to be tipped up until the dart throwing mechanism shoots at lower targets. That tilting might best be accomplished by reducing the net forcing effect of additional CO2 and elevating something else. The internal mechanisms of a functionally useful model don’t have to be correct for short term predictions. Save for one or two models, the rest seem not to meet even this puny standard.

David A
November 23, 2013 10:15 pm

TimTheToolMan says:
November 23, 2013 at 2:16 pm
David A writes “Why?”
How does your reasoning stand up if none of the models are modelling climate? Afterall you did say “When ALL the models run wrong in a uniform direction, it is likely that something fundamental is wrong.”
———————————————————————————
Sorry, but I do not see the contradiction. I am simply asserting the simple common sense approach that the projections which least match real world observations be discarded, and the GCMs closest to the observations be analyzed to see what makes them better. I have yet to find a discussion on why certain models run closer to R.W.O. AndyG55 and Crispin above do make an excellent observation.
My links previously posted in this thread begin to address how the historic surface record has been adjusted to support the Mannian hockey stick. See here… http://stevengoddard.wordpress.com/hansen-the-climate-chiropractor/ and here… http://stevengoddard.wordpress.com/2013/10/09/hansen-1986-five-degrees-warming-by-2010-2/
Of course that is just a beginning and over the last decade plus, RSS has diverged from GISS more than ever. (Steven Mosher does not like to talk about this. Actually he does not like to talk much at all, he prefers to lecture; something I find a bit sad as he has a great deal of detailed expertise.)

mbur
November 24, 2013 9:25 am

One Model, One Vote…. And my vote goes for… the normal Gaussian dataset (orange) in Figure 2.
Because, IMO, it most closely resembles… the HadCRUT observational dataset (red), also in Figure 2.
Don’t those violin plots kinda show averages anyway? An average of an average is just an average to me.
Thanks for the interesting articles and comments.

dscott
November 24, 2013 10:52 am

The IPCC game is based upon exclusion of data and methods. Why not beat them at their own game? Produce 20 climate models, each a minor iteration of the next and then independently submit each result. Presto, you have out voted their closed voting block and more importantly introduced a negative confidence factor against the established models. They will no longer be able to claim unanimous agreement or 95% confidence level in the models when half of them are showing cyclical or decreasing trends.
Wait for liberal shrieking to begin in 3…., 2….., 1…..

Brian H
November 25, 2013 2:03 am

Wes says:
November 23, 2013 at 6:07 am
Jim G says:
November 22, 2013 at 9:12 am:
I agree with Jim. I really don’t understand why they use and “anomalies” as there is so little intelligence in this metric. I don’t ever recall using this in my 40 year career as an engineer and physicist.
WP.

Yes, and another unique-to-CS term is “forcings”. With forcings of anomalies, CS has created a mental playground where anything goes.

November 25, 2013 9:18 am

Good work to identify a serious limitation of the models but also climate science. Climatology was of little interest until it was chosen as a political vehicle. Prior to that it was all about averages not the concept of change.
In the 1970s trends became important in society in general and in climatology as global cooling posed a threat to world food production. Trends, especially simple linear trends, became the foundation of very simplistic computer models and politically fashionable because of their use in the Club of Rome work “Limits to Growth”.
In the 1980s the trend became global warming, especially after Hansen’s politically contrived 1988 hearing. It was just another simplistic trend, but now with a full political agenda, exemplified by the comment of Senator Wirth, who arranged Hansen’s appearance, that “We’ve got to ride the global warming issue. Even if the theory of global warming is wrong, we will be doing the right thing…”
Just as the IPCC kept all the focus on CO2, so it kept the focus on averages and trends. It diverted from the other important statistic in any data set namely the variation. As the climate transitions from one trend to another the variability changes. This is primarily due to the changing wave pattern in the Circumpolar vortex. This is accentuated in middle latitude records, which dominate the data sets used for the climate models.
In my opinion Willis’s article serves to accentuate this failure to consider variation, but also the failure of the models because they are built on completely inadequate data sets in space and time.
I wrote about the broader implications of a limited simplistic application of statistics to climatology in particular and society in general.
http://drtimball.com/2011/statistics-impact-on-modern-society-and-climate/