Models All The Way Down

Guest Post by Willis Eschenbach

A learned man was arguing with a rube named Nasruddin. The learned man asked, “What holds up the Earth?” Nasruddin said, “It sits on the back of a giant turtle.” The learned man knew he had Nasruddin then. He asked, “But what holds up the turtle?”, expecting Nasruddin to be flustered by the question. Nasruddin simply smiled. “Sure, and as your worship must know, being a learned man, it’s turtles all the way down …”

I’ve written before of the dangers of mistaking the results of the ERA-40 and other “re-analysis” computer models for observations or data. If we just compare models to models and not to data, then it’s “models all the way down,” not resting on real world data anywhere.

I was wondering where on the planet I could demonstrate the problems with ERA-40. I happened to download the list of stations used in the CRUTEM3 analysis, and the first one was Jan Mayen Island. “Perfect”, I thought. Middle of nowhere, tiny dot, no other stations for many gridcells in any direction.

Figure 1. Location of Jan Mayen Island, 70.9°N, 8.7°W. White area in the upper left is Greenland. Gridpoints for the ERA-40 analysis shown as red diamonds. Center gridpoint data used for comparisons.

How does the ERA-40 reanalysis data stack up against the Jan Mayen ground data?

Figure 2. Actual temperature data for Jan Mayen Island and ERA-40 nearest gridpoint reanalysis “data”. NCAR data from KNMI. Jan Mayen data from GISS.

It’s not pretty. The ERA-40 simulated data runs consistently warmer than the observations in both summer and winter. The 95% confidence intervals of the two means (averages) don’t overlap, indicating that the two records are statistically distinct. Often the ERA-40 data is two or more degrees warmer in the winter, but occasionally and unpredictably it is 3 to 5 degrees cooler. Jan Mayen’s year-round average is below freezing; the ERA-40 average is above freezing. The annual cycle of the two, shown in Figure 3 below, is also revealing.
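A minimal sketch of that confidence-interval check, for anyone who wants to try it on their own series (the numbers below are hypothetical, not the actual Jan Mayen data):

```python
import math

def ci95(values):
    """95% confidence interval (lo, hi) of the mean, using the
    normal approximation: mean +/- 1.96 * s / sqrt(n)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    sem = math.sqrt(var / n)
    return (mean - 1.96 * sem, mean + 1.96 * sem)

def intervals_overlap(a, b):
    """True if two (lo, hi) intervals share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical monthly means: the station runs colder than the reanalysis
station = [-1.2, -0.8, -1.5, -0.9, -1.1, -1.3, -0.7, -1.0]
era40   = [ 0.4,  0.2,  0.1,  0.5,  0.3,  0.6,  0.2,  0.4]

print(intervals_overlap(ci95(station), ci95(era40)))  # False: the means are distinct
```

Note that non-overlapping 95% intervals are a conservative indicator of a real difference; a proper two-sample test can detect a difference even when the intervals overlap slightly.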

Figure 3. Two annual cycles (Jan-Dec) of the ERA-40 synthetic data and Jan Mayen temperature. Photo Source

The ERA-40 synthetic data runs warmer than the observations in every single month of the year. On average, it is 1.3°C warmer. In addition, the distinctive winter signature of Jan Mayen (February averages warmer than either January or March) is not captured at all in the ERA-40 synthetic data.

So that’s why I say, don’t be fooled by people talking about “reanalysis data”. It is a reanalysis model, and from first indications not all that good a reanalysis model. If you want to understand the actual winter weather in Jan Mayen, you’d be well-advised to avoid the ERA-40, or February will bite you in the ice.

The use of “reanalysis data” has some advantages. Because the reanalysis data is gridded, it can be compared directly to model outputs. It is mathematically more challenging to compare the model outputs to point data.

But that should be a stimulus to develop better mathematical comparison methods. It shouldn’t be a reason to interpose a second model in between the first model and the data. All that can do is increase the uncertainty.
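One such method is to interpolate the gridded field to the station’s exact coordinates rather than just taking the nearest gridpoint. A sketch of plain bilinear interpolation (the grid spacing and values below are hypothetical, chosen only to bracket Jan Mayen’s location):

```python
def bilinear(grid, lats, lons, lat, lon):
    """Bilinear interpolation of a 2-D field to (lat, lon).
    grid[i][j] holds the value at (lats[i], lons[j]); lats and lons
    must be ascending and must bracket the target point."""
    # find the indices of the grid cell containing the target
    i = max(k for k in range(len(lats) - 1) if lats[k] <= lat)
    j = max(k for k in range(len(lons) - 1) if lons[k] <= lon)
    # fractional position of the target within that cell
    t = (lat - lats[i]) / (lats[i + 1] - lats[i])
    u = (lon - lons[j]) / (lons[j + 1] - lons[j])
    # weighted average of the four surrounding gridpoints
    return ((1 - t) * (1 - u) * grid[i][j]
            + (1 - t) * u * grid[i][j + 1]
            + t * (1 - u) * grid[i + 1][j]
            + t * u * grid[i + 1][j + 1])

# Hypothetical 2.5-degree grid around Jan Mayen (70.9N, 8.7W)
lats = [70.0, 72.5]
lons = [-10.0, -7.5]
grid = [[-2.0, -1.0],
        [-3.0, -2.5]]
print(bilinear(grid, lats, lons, 70.9, -8.7))
```

This removes one objection (comparing a point against a whole cell average) while still leaving the representativeness question open.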

In addition, because both models involved (various GCMs and the ERA-40) are conceptually related, being current-generation climate models, we would expect the correlations to be artificially high. In other words, a model’s output is likely to fit another related model’s output better than it fits observational data. Data is ugly and has sudden jumps and changes. Computer model output is smooth and continuous. Which will fit better?

My conclusion? The ERA-40 is unsuited for the purpose of validating model results. Compare model results to real data, not to the ERA-40. Comparing models to models is a non-starter.

Regards to everyone,

w.

[UPDATE] Several people have asked about the sea surface temperatures in the area. Here they are:

Figure 4. As in Figure 2, but including HadSST sea surface temperature (SST) data for the gridcell containing Jan Mayen. SST data from KNMI

Figure 5. As in Figure 3, but including HadSST sea surface temperature (SST) data for the gridcell containing Jan Mayen. SST data from KNMI

Note that SST is always higher than the Jan Mayen temperature. This is not true for the ERA-40 reconstruction model output.

Editor
March 8, 2011 9:46 am

Willis, you cannot compare reanalysis data to station data and expect it to match up one-to-one. Your post is a straw man argument. Furthermore, you are comparing against first-generation reanalysis models, which have been significantly improved since the 1990s when the NCEP Reanalysis and the ERA-40 were implemented. The spectral resolution of the models does not come close to resolving the very complicated topography (tiny islands) at the station you chose to prove your point.
You are proving nothing with your analysis. I work with operational and reanalysis numerical weather prediction models on a daily basis — and what you just showed is a model bias in the ERA-40, which is a function of the model itself. Try comparing the station data to the newest ECMWF T1279 operational grids, which are readily available from the TIGGE archive.

SteveE
March 8, 2011 10:01 am

Willis Eschenbach says:
March 8, 2011 at 9:24 am
Hi Willis,
I apologise for accusing you of a big fail; you are correct in your comment.
However, your new graph does add weight to my point. The summer SSTs are higher than the point temperatures measured on land, and also warmer than the winter temps.
Surely this explains why the grid cell data is higher than the point data measured on the island.

tty
March 8, 2011 10:15 am

Ryan Maue says:
“The spectral resolution of the models does not come close to resolving the very complicated topography (tiny islands) at the station you chose to prove your point.”
A single tiny island in the middle of a big ocean is “a very complicated topography”? I would think that is about as simple as topography can possibly be.
In that case, just how bad is ERA-40 in areas with complicated topography, e.g. Central Asia or Europe?

eadler
March 8, 2011 10:39 am

As I understand it, the objective of the modeling used to fill in data is to produce an estimate of the temperature anomaly in the area. That is a different thing from trying to determine the exact temperature. A consistent warm or cold bias doesn’t make a difference under those circumstances.
I doubt that the people doing the modeling expect their results to be spot on. As Anthony himself points out, the local environment can affect the station temperature.
If you could show that the temperature anomaly was significantly affected, you might have an argument that a significant error is created.
Of course, the alternative is to omit the temperature from a grid when there is no data. In fact, the HADCRUT data does that, and seems to underestimate the global anomaly increase because it leaves out a lot of the Arctic region.
It is pretty clear that the temperature anomaly of an island surrounded by hundreds of miles of ocean would not be representative of the anomaly of the territory inside the grid point.
It has also been pointed out that there are better models out there than the one you used and found to be less accurate than you would like to see.
So I don’t consider your analysis very telling at all, despite the applause you have gotten from so many posters.
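eadler’s point that a constant bias drops out when each series is converted to anomalies against its own baseline is easy to illustrate (hypothetical numbers):

```python
def anomalies(series, baseline_len):
    """Anomaly = value minus the mean of the first baseline_len values."""
    base = sum(series[:baseline_len]) / baseline_len
    return [v - base for v in series]

station = [-2.0, -1.5, -1.0, -0.5]      # hypothetical observations
biased  = [v + 1.3 for v in station]    # same series with a constant +1.3 bias

# After converting each series to its own anomalies, the bias vanishes
a = anomalies(station, 2)
b = anomalies(biased, 2)
print(all(abs(x - y) < 1e-9 for x, y in zip(a, b)))  # True
```

The catch, per Figure 3 in the post, is that the ERA-40 bias at Jan Mayen varies with the season rather than being constant, so it would not fully cancel in monthly anomalies.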

Editor
March 8, 2011 10:47 am

tty: yes, a little piece of land surrounded by ocean is a difficult analysis or forecasting situation for a model where a grid cell is 100 km x 100 km.
ERA-40 is not meant to represent every square kilometer of the Earth at street-level resolution. It is a large-scale model, just like the climate models. No one should attempt to compare station data to a grid point and expect the issue of representativeness to automatically disappear.
Similarly, when a forecast model is run for the next 7 days, it is often verified afterward against radiosondes at given locations. However, it is not verified against individual radiosondes, but usually against a composite or collection of them, to determine any vertical biases in the model analysis and forecast.

Kev-in-Uk
March 8, 2011 10:47 am

I am in total agreement with Willis’s apparent main argument – that modelling based on ‘other’ model data is essentially unrealistic (my summation!).
As I see it – and I am sure if I am wrong, someone will correct me – a bunch of station data is averaged (in a given grid); if there are limited (or no) stations, the grid is extrapolated from adjacent grids (?); then the grids are averaged together (so we have an average of an average of an extrapolation) to give us a gridded dataset ‘summed’ together to give us a potential ‘global’ (or regional) anomaly… is that right?
Then some comedian decides to reanalyse this, but instead of comparing the reanalysis model outputs to actual recorded station data, they compare the output to the gridded average? If the model doesn’t fit, either the gridded averaging is wrong or the model is wrong, so they ‘tweak’ either (or both?)…
Whichever way you cut it, I cannot see that as making good science or even sense. It strikes me that more and more, we find the use of ‘actual’ (as in REAL observed) data further and further removed from the methodology…

March 8, 2011 10:50 am

NikFromNYC,
Good graphs, I wonder if Izen looked at them? Izen commented:
“The actual data certainly confirms the warming and its magnitude over the global increase as predicted from AGW theory (sic) that higher latitudes will show greater effect.”
The basis for AGW is models. [The trend is simply emergence from the Little Ice Age.] But lots of folks still believe that runaway global warming is right around the corner. Convincing them otherwise is going to take time.
Men, it has been well said, think in herds; it will be seen that they go mad in herds, while they only recover their senses slowly, and one by one.
~ Charles Mackay, Extraordinary Popular Delusions and the Madness of Crowds

Tom_R
March 8, 2011 10:53 am

>> SteveE says:
March 8, 2011 at 10:01 am
However your new graph does add weight to my point. The Summer SST are higher than the point data temps measured on land and also warmer than the winter temps.
Surely this explains why the grid cell data is higher than the point data measured on the island. <<
I understand your concern about comparing apples and oranges. However, looking at the island data and the sea surface data, it appears that the grid cell was made from just the island data; otherwise the average would be much closer to the sea surface temperatures.
This raises the original question again: what transformed the only data used for that grid cell to make it result in a warmer, and warming, grid cell?

March 8, 2011 11:00 am

Hi, Willis, good post, though Ryan Maue has a point there.
But what I wanted to tell you is about Nasruddin, a “saint man” in the Persian dervish philosophy. He can be at the same time an extremely wise man and a very dumb one. One story I like, because it depicts what the whole AGW scare means, is:
A man saw Nasruddin throwing bread crumbs around his garden and asked him why. Nasruddin said: “It keeps tigers away from my house.”
“But there are no tigers in this region,” the man said.
“So, you see, it works!” said Nasruddin.

Editor
March 8, 2011 11:17 am

John F. Hultquist says:
March 8, 2011 at 8:23 am
Jim Sorenson says:
March 8, 2011 at 5:54 am
Re: Richard Feynman, o-rings, and ice water
> I saw a video years ago in which Dr. Feynman explained he was tipped off to the o-ring demonstration….
I read that in one of Feynman’s books. It confirmed my suspicion that it was staged, and that Feynman did it in part to use his standing as an authority figure to make an impression on people who might have ignored the same demonstration from a Morton-Thiokol engineer.
It kinda confirms to me that understanding and using office politics is important. I was also surprised that it seems to have boosted Feynman’s stature in the eyes of the beholders. I’m sure Feynman wasn’t looking for that, but it was something that the non-technical people apparently had never seen before but could understand.

March 8, 2011 11:20 am

If I wanted to pick a place to live I would pick the warmest of the three temperature profiles, but I can’t. One is water which I cannot live on and the other is make believe. So I am stuck with the coldest. Where people actually live matters.

March 8, 2011 11:25 am

Folks should note this.
1. DMI, the Arctic temps some people like to quote? Based on NCEP reanalysis.
2. Even “ground truth” “data” is filled with theory. Does anyone think that a thermometer records the physical property known as temperature?
It’s theory all the way down. All data is theory-laden. At the bottom, the theory that infuses the data is very, very hard to give up.

jim hogg
March 8, 2011 11:35 am

Seems to me that a little knowledge of statistics drags us away from the reality we’re attempting to identify and explain, and a lot of statistical knowledge too often (I didn’t say always!) serves only to obscure matters more.
I’m still hoping that someone with access to the necessary data will identify a few hundred stations – or more or less – around the world which have not been moved and not been subjected to environmental changes, and whose equipment over a lengthy period is consistent/has not been changed and can reasonably be assumed to be accurate.
Then the plan would be to plot the average (the only arithmetical processing) of the data – the raw data and only the raw data – for as long a period as is feasible – given such conditions – to see what the modern temperature record really looks like – so far as it’s possible to get an accurate representation. Only when we have that can we attempt to explain it and perhaps reach conclusions that are reliable, such as possibly – yes, it’s getting warmer, or surprisingly, no, it’s getting colder, or whatever, but we don’t know why exactly.
But that would be boring of course. It wouldn’t be sophisticated or adequately intellectual, and wouldn’t need its own arcane language. It would be easily within the grasp of the proles. And wouldn’t generate research funds. And would puncture one more pointless political football. But might help to restore some respect to the field of climate science. I’ve been waiting awhile, and I’m not optimistic.

greg holmes
March 8, 2011 11:43 am

The common sense approach shown here is quite breathtaking and I applaud it.
Here in the UK we have in the past been known for a common-sense approach; I know it scares the hell out of the “great and the good” (sarc) who rule over us. The solid explanation above is brilliant and cannot reasonably be denied.
Many thanks Willis.

izen
March 8, 2011 12:55 pm

@-Smokey says:
March 8, 2011 at 10:50 am
“Good graphs, I wonder if Izen looked at them? ….”
Yes, very pretty. It rather backs up the point that comparing station data with a model reanalysis of a grid cell is of limited value. And now it appears that the ERA-40 model reanalysis may not be recent enough or of sufficient resolution to be relevant, and has been superseded by better re-analyses.
“The basis for AGW is models. [The trend is simply emergence from the Little Ice Age.]”
No, the basis for AGW theory is measured physical quantities in the LWIR back radiation and outgoing spectra, and the known thermodynamics of the constituents of the atmosphere.
Whenever people claim that the observed trend is “simply emergence from the Little Ice Age” it surely raises the question: WHY are we emerging from the LIA? Why did it not continue, or get even colder, as it has around this point in the last 3 interglacial periods?
“But lots of folks still believe that runaway global warming is right around the corner. Convincing them otherwise is going to take time.”
Only the scientifically ignorant would think that RUNAWAY global warming is possible. The Stefan-Boltzmann relationship (energy emitted is proportional to temperature raised to the fourth power) means that any factor amplifying a warming effect would have to overcome that fourth-power damping to achieve a ‘runaway’ effect. As anyone with a modicum of knowledge in this field will be aware, CO2 increases have a logarithmic influence, so they are NEVER going to generate a runaway effect.
Of course, the fact that a runaway effect is virtually impossible does not preclude the probability of a measurable rise in global surface temperature from the measured rise in CO2.

Kev-in-Uk
March 8, 2011 12:57 pm

Steven Mosher says:
March 8, 2011 at 11:25 am
Valid points – but just because a thermometer is an imperfect way of measuring something does not make it useless (in real terms, taking a physical temperature measurement automatically affects the temperature of the medium being measured). Indeed, the whole basis of any physical measurement is that it is compared to another one; it is a comparable measurement. This is definitely not the case with modelled or ‘adjusted’ data – there, one is comparing processed data with other processed data; in other words, neither has a ‘fixed’ point of physical reference.
Even if you define historical records as ‘inaccurate’ by some degree (no pun intended), they are still actual measurements – they are not predicted/modelled/made-up figures. They may be erroneous for various reasons, but the error is a physical one, not a data-processing one!

Kev-in-Uk
March 8, 2011 1:09 pm

Am I missing something?
either we are dealing with real measurements, or we aren’t.
If I measure, say, the size of an orange, and then produce an average of x million oranges, I will get an average size. But if I take only a few hundred oranges and try to extrapolate/interpolate a ‘gaussian’ type distribution, will it be correct? To make it more correct (or realistic, if you prefer) you need more data points, but if I used the model curve from my few hundred samples to predict ‘new’ data points, I’d be stupid, because I’m using processed data to produce/verify more processed data. The only REAL way to get more points is to bloody measure more damned oranges(?).
Isn’t that the basic process that Willis is referring/objecting to?

Dave in Delaware
March 8, 2011 1:10 pm

SteveE says: March 8, 2011 at 8:16 am
I’m not sure how you could back calculate point data from an averaged grid cell.
————–
I am not suggesting that one should ‘back calculate point data’ from the climate model output.
The model output gives an averaged grid-cell value at a point in time. Compare the model averaged value directly with the point data at that place and time. Don’t make it more difficult than that. I guarantee that the climate modelers are doing this kind of comparison of climate model output to the grid-cell synthetic data. That is how they tweak the model parameters. Take it the next step and also compare the model output to the original measured data in that time window. If there are 5 measured values within the grid cell, then do the comparison 5 times.
We are not all that far apart. I believe the point you make is .. it won’t be exactly right. And I say, doesn’t matter. What matters is, if I do this comparison for 1000 or more point measurements, and compute a total for all of the Delta-squared values, I now have a yardstick measure of how well that particular climate model run performed. If the model inputs are tweaked, and re-run, does my yardstick get better or not? If a different climate model is run, and the yardstick computed against the same 1000+ points, is the yardstick value better in model A or model B? Does a particular climate model have ‘tendencies’ to run hot in summer or cold in winter, or vice versa? That would impact the yardstick value. Is the model stable – that is, are the year over year yardstick values about the same, getting bigger, getting smaller?
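The “yardstick” described here amounts to a sum of squared model-minus-observation differences over the comparison points. A minimal sketch (all values hypothetical):

```python
def yardstick(model_values, observed_values):
    """Sum of squared differences between model grid-cell values and
    the point observations they are compared against (lower is better)."""
    return sum((m - o) ** 2 for m, o in zip(model_values, observed_values))

# Hypothetical comparisons at a handful of stations
obs     = [-1.0, -2.0, 0.5, 3.0]
model_a = [-0.5, -1.0, 1.0, 3.5]   # runs consistently warm
model_b = [-1.1, -2.2, 0.4, 2.9]   # closer to the observations

print(yardstick(model_a, obs) > yardstick(model_b, obs))  # True: model B fits better
```

The same metric, computed year over year, would also answer the stability question raised above: is the yardstick value about the same, getting bigger, or getting smaller?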
And to the point made by Willis – if there is a big problem (like UHI) in the homogenization process that created the synthetic grid data, then the climate model has no chance, but unless there is a comparison back to the measured data, how would you know?
I’m off my soapbox now. Thanks for listening. And thanks once again Willis, for thought provoking articles.

March 8, 2011 1:24 pm

@Juice, your supposed Lord Rutherford of Nelson quote is a misquote:
“If your experiment needs statistics, you ought to have done a better experiment.”
The actual quote was: “If your result needs a statistician then you should design a better experiment.”
There is a huge difference. Rutherford did say the odd dumb thing (who the hell does not?), but this was definitely not one of them. Lord Rutherford was the New Zealand chemist and physicist who laid the groundwork for the development of nuclear physics by investigating radioactivity, and who first split the atom. He collaborated with Bohr in describing atomic structure, and won the Nobel Prize in 1908, when that award still meant something.
He knew that statistics could be used to show correlation and thus point the way to formulating theories and identifying subjects worthy of investigation. He also knew that they could never be used to prove the theories. Only data will serve as results.

Christian
March 8, 2011 2:06 pm

Put very (too) simply:
1. You collect the data from various sampling points.
2. You perform statistics on it to model it together with its neighbours, in an attempt to create a ‘regional’ model.
3. You generate values for areal subsets that represent averages, including a ‘global’ average for the entire dataset.
4. You use these values to model ‘regional’ patterns.
The problem is, the ‘regional’ average generated for a subarea can differ markedly from the original point value. There are many possible reasons, but the main one is that the errors are large because there are not enough sample points.
It’s all about the errors. ‘Regional’ air temperatures are also flawed e.g. because the sampling is not optimised for ‘regional’ purposes, the sampling points are too few and irregularly distributed, one can go on and on.
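The “too few sample points” problem can be made concrete with the standard error of a grid-cell mean, which shrinks only as the square root of the number of stations (numbers hypothetical):

```python
import math

def standard_error(values):
    """Standard error of the mean: s / sqrt(n)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(var / n)

# Hypothetical: the same spread of station readings, different sample counts
few  = [-1.0, 0.5, 2.0]
many = [-1.0, 0.5, 2.0] * 10   # thirty readings with the same spread

print(standard_error(few) > standard_error(many))  # the sparse cell is far more uncertain
```

A cell with three stations carries more than three times the uncertainty of one with thirty, which is why the sparse Arctic cells are the weak link.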
The key observation is that, for climate science and even modelling to be useful, the land temperature data and the sampling density and protocols need to be optimised. I have never read any study by the policy makers that discussed these critical issues, but they emerge from efforts like http://www.surfacestations.org.
In the meantime, we can do our best with the satellite and Argos datasets which are still young but hold out the hope that they will produce reliable regional and global datasets.