A courtesy note ahead of publication for Risbey et al. 2014

People send me stuff. In this case I received an embargoed paper and press release from Nature, forwarded by another member of the news media who wanted me to look at it.

The new paper is scheduled to be published in Nature and is embargoed until 10AM PDT Sunday morning, July 20th. That said, Bob Tisdale and I have been examining the paper, which oddly includes Dr. Stephan Lewandowsky and Dr. Naomi Oreskes as co-authors and is on the topic of ENSO and “the pause” in global warming. I say oddly because neither Lewandowsky nor Oreskes concentrates on physical science; they direct their work towards psychology and the history of science, respectively.

Tisdale found a glaring, potentially fatal oversight, which I verified, and as a professional courtesy I have notified two people who are listed as authors on the paper. It has been 24 hours, and I have had no response from either. Since it is possible that they have not received these emails, I thought it would be useful to post my emails to them here.

It is also possible they are simply ignoring the email. I just don’t know. As we’ve seen previously in attempts at communication with Dr. Lewandowsky, he often turns valid criticisms into puzzles and taunts, so anything could be happening behind the scenes here if they have read my email. It would seem to me that they’d be monitoring their emails ahead of publication to field questions from the many journalists who have been given this press release, so I find it puzzling there has been no response.

Note: for those who would criticize my action as “breaking the embargo,” I have not even named the paper’s title or DOI, nor used any language from the paper itself. If I were an author, and somebody spotted what could be a fatal blunder that made it past peer review, I’d certainly want to know about it before the press release goes out. There are still about 24 hours to publication, so they have time to respond, and hopefully this message on WUWT will reach them.

Here is what I sent (email addresses have been link disabled to prevent them from being spambot harvested):

===============================================================

From: Anthony

Sent: Friday, July 18, 2014 9:01 AM

To: james.risbey at csiro.au

Subject: Fw: Questions on Risbey et al. (2014)

Hello Dr. Risbey,

At first I had trouble finding your email address, which is why I sent the question to Ms. Oreskes first. I dare not send it to Professor Lewandowsky, since, as we have seen by example, all he does is taunt people who have legitimate questions.

Can you answer the question below?

Thank you for your consideration.

Anthony Watts

—–Original Message—–

From: Anthony

Sent: Friday, July 18, 2014 8:48 AM

To: oreskes at fas.harvard.edu

Subject: Questions on Risbey et al. (2014)

Dear Dr. Oreskes,

As a climate journalist running the most viewed blog on climate, I have been graciously provided an advance copy of the press release and paper for Risbey et al. (2014), which is being held under embargo until Sunday, July 20th. I am in the process of helping to co-author a rebuttal to Risbey et al. (2014). I think we’ve spotted a major blunder, but I want to check with a team member first.

One of the key points of Risbey et al. is the claim that the selected 4 “best” climate models could simulate the spatial patterns of the warming and cooling trends in sea surface temperatures during the hiatus period.

But reading and re-reading the paper we cannot determine where it actually identifies the models selected as the “best” 4 and “worst” 4 climate models.

Risbey et al. identifies the original 18 models, but not which of them are the 4 “best” or the 4 “worst”.

Risbey et al. presented histograms of the modeled and observed trends for the 15-year warming period (1984-1998) before the 15-year hiatus period in panel b of their Figure 1. So, obviously, that period was important. Yet Risbey et al. did not present how well or poorly the 4 “best” models simulated the spatial trends in sea surface temperatures for the important period of 1984-1998.

Is there some identification of the “best” and “worst” referenced in the paper that we have overlooked, or is there a reason for this oversight?

Thank you for your consideration.

Anthony Watts

WUWT

============================================================

UPDATE: as of 10:15AM PDT July 20th, the paper has been published online here:

http://www.nature.com/nclimate/journal/vaop/ncurrent/full/nclimate2310.html

Well-estimated global surface warming in climate projections selected for ENSO phase

Abstract

The question of how climate model projections have tracked the actual evolution of global mean surface air temperature is important in establishing the credibility of their projections. Some studies and the IPCC Fifth Assessment Report suggest that the recent 15-year period (1998–2012) provides evidence that models are overestimating current temperature evolution. Such comparisons are not evidence against model trends because they represent only one realization where the decadal natural variability component of the model climate is generally not in phase with observations. We present a more appropriate test of models where only those models with natural variability (represented by El Niño/Southern Oscillation) largely in phase with observations are selected from multi-model ensembles for comparison with observations. These tests show that climate models have provided good estimates of 15-year trends, including for recent periods and for Pacific spatial trend patterns.
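For readers trying to picture the selection step the abstract describes, here is a minimal, hypothetical sketch of ranking models by how in phase their Niño3.4 variability is with observations. The correlation metric, the 18-model pool, the 15-year window, and all variable names are my assumptions for illustration only; the paper’s actual selection procedure may differ.

```python
import numpy as np

def rank_models_by_enso_phase(model_nino34, obs_nino34):
    """Rank models by how well their Nino3.4 series is in phase with observations.

    model_nino34 : dict of model name -> 1-D array of monthly Nino3.4 anomalies
    obs_nino34   : 1-D array of observed monthly Nino3.4 anomalies, same length

    Uses a plain Pearson correlation as the "in phase" score; this is an
    illustrative guess at the kind of selection the abstract describes,
    not the published method.
    """
    scores = [(name, np.corrcoef(series, obs_nino34)[0, 1])
              for name, series in model_nino34.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Toy usage: random series stand in for 15 years of monthly Nino3.4 data.
rng = np.random.default_rng(0)
obs = rng.standard_normal(180)
models = {f"model_{i:02d}": rng.standard_normal(180) for i in range(18)}
ranked = rank_models_by_enso_phase(models, obs)
print("best 4 :", [name for name, _ in ranked[:4]])
print("worst 4:", [name for name, _ in ranked[-4:]])
```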

Of interest is this:

Contributions

J.S.R. and S.L. conceived the study and initial experimental design. All authors contributed to experiment design and interpretation. S.L. provided analysis of models and observations. C.L. and D.P.M. analysed Niño3.4 in models. J.S.R. wrote the paper and all authors edited the text.

The rebuttal will be posted here shortly.

UPDATE2: rebuttal has been posted

Lewandowsky and Oreskes Are Co-Authors of a Paper about ENSO, Climate Models and Sea Surface Temperature Trends (Go Figure!)

336 Comments
ren
July 20, 2014 8:42 am

Here you can see the effect of increased GCR ionization: the blockade of the vortex over the southern magnetic pole is stronger.
http://www.cpc.ncep.noaa.gov/products/intraseasonal/temp50anim.gif
http://arctic.atmos.uiuc.edu/cryosphere/antarctic.sea.ice.interactive.html

RACookPE1978
Editor
July 20, 2014 8:42 am

weather4trading says:
July 20, 2014 at 7:56 am (complaining/commenting about Mosher)
Why is Mosher given free rein to troll in the comments? Because that’s all he ever contributes here.

And the mod’s reply

[Because he contributes and doesn’t contravene the site rules . . . mod]

Even more important, no one can learn or expand past their own mind and their own prejudged conclusions UNLESS they are exposed to logical criticism and comment from a person who does not share their opinion. (Note: I did not say “correct” criticism and I did not say “correct” conclusions…) If I only wanted to hear things I agreed with, I would speak loudly and passionately in an empty room.

July 20, 2014 8:52 am

“How better might I have phrased the question, the point of which was to interrogate Mr. Mosher regarding how he could know an “average of the models” was more informative than any single model?”
Simple:
1. Read the literature.
2. Compare all the models to observations.
3. Compare the average of all models.
Let’s see:
http://berkeleyearth.org/graphics/model-performance-against-berkeley-earth-data-set
It’s pretty simple. You can use any performance metric you like.
Here is what you see:
1. Models that score well on one metric score poorly on others.
2. The average of all models wins.
It really isn’t that hard.
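To make the recipe above concrete, here is a minimal sketch (mine, not Mosher’s or BEST’s code) that scores each model against observations with RMSE and then scores the multi-model mean the same way. The arrays are random placeholders standing in for real model output and an observational series.

```python
import numpy as np

def rmse(a, b):
    """Root-mean-square error between two equal-length series."""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Placeholder data: rows are models, columns are years of global-mean anomaly.
rng = np.random.default_rng(1)
n_models, n_years = 18, 30
obs = np.linspace(0.0, 0.5, n_years)                        # stand-in "observations"
models = obs + rng.normal(0.0, 0.15, (n_models, n_years))   # each model = obs + its own error

per_model = [rmse(m, obs) for m in models]
print("best single-model RMSE :", round(min(per_model), 3))
print("worst single-model RMSE:", round(max(per_model), 3))
print("multi-model mean RMSE  :", round(rmse(models.mean(axis=0), obs), 3))
# With independent, zero-mean model errors the ensemble mean usually beats
# most (often all) individual models on RMSE; shared biases are a different story.
```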

RACookPE1978
Editor
July 20, 2014 8:57 am

Angech says:
July 20, 2014 at 3:42 am
My reading of the above comments is that if you average models and there is one halfway-right model in there, it will track better than any ensemble of anonymous incorrect models. Still not a very good model, though?

No.
If you average different models together, you HIDE the one (?) good model with garbage from the 3, 4, or 21 “bad” models. Sometimes. And sometimes you “hide” that one “almost good enough” model’s errors with garbage from the rest.
To exaggerate, for small values of “n”:
2 + 2 = 2 * 2 = 2 + 2 + n^2 = 2 * 2^n + 2^(n+1), right? But each “model” is “wrong” under different initial conditions.
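For anyone checking the arithmetic of that joke, the four expressions all equal 4 only at n = 0 and then diverge, which is the point; a few throwaway lines of Python (mine, just to verify) confirm it.

```python
# Evaluate the four joke "models" at a few small values of n.
for n in range(4):
    print(n, (2 + 2, 2 * 2, 2 + 2 + n**2, 2 * 2**n + 2**(n + 1)))
# n=0 -> (4, 4, 4, 4): all "models" agree; for n >= 1 the last two
# expressions diverge from 4 at different rates.
```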

July 20, 2014 8:58 am

Now, go do the work.
Start with the literature.
http://journals.ametsoc.org/doi/pdf/10.1175/2011JCLI3873.1

July 20, 2014 9:06 am

“You never looked at any data on how much “better” the average is than the individual model prediction, but somehow you just know the average is “better”?”
Yes. It’s pretty simple.
Noting that the average is better and CALCULATING how much better are two different things.
Basically, the work we did looking at the issue confirmed what has already been published.
So, nothing too interesting there.
Still, there might be some interesting work to be done. Folks here can get the data and see for themselves. It’s an active area of research, so you have to pick the metrics you want to look at, and then pick a performance or skill metric. RMSE is a good start, but there are others.
When you find the model that outperforms all others and the mean of all the models, then publish.
Or… you can avoid reading the literature and avoid looking at data. That works for blogs.

Matt
July 20, 2014 9:08 am

@Truthseeker
Regarding your dart board analogy, it seems that looking at the actual board to see where the bull’s eye is translates to checking what the PRESENT temperature is. Guess what, I do that every day. The purpose of the exercise is to learn something about the FUTURE, though, and looking at the actual board does not help in that case, now does it?

kadaka (KD Knoebel)
July 20, 2014 9:08 am

From Kate Forney on July 20, 2014 at 8:13 am:

How better might I have phrased the question, the point of which was to interrogate Mr. Mosher regarding how he could know an “average of the models” was more informative than any single model?

It’s a common fallacy about accuracy that nevertheless often works out. All the models are aiming at the same target. So if you average all the hits together you’ll be close to the bullseye.
But the models have a high degree of inbreeding, built on shared concepts that are incomplete, inaccurate, and possibly flat-out wrong. It’s as if there were a common school of thought in gunsmithing that the front sights of rifles needed to be mounted several hundredths of an inch to the right of the barrel axis while the rear sight sits directly over it. From there it doesn’t matter how many different rifles you use or how close together the holes are (how precise): the average of the holes will still be to the left of the bullseye (it will lack accuracy).
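A short simulation (my own sketch, with made-up numbers) illustrates the rifle-sight point: averaging many “models” shrinks their independent scatter, but any bias they all share passes straight through to the average.

```python
import numpy as np

rng = np.random.default_rng(42)
truth = 0.0          # the "bullseye"
shared_bias = 0.3    # systematic error common to every model (the offset front sight)
n_models = 30

# Each model = truth + shared bias + its own independent scatter.
model_values = truth + shared_bias + rng.normal(0.0, 0.5, n_models)

print("mean abs error of single models:", round(np.mean(np.abs(model_values - truth)), 3))
print("abs error of the model average :", round(abs(model_values.mean() - truth), 3))
# The average lands near truth + shared_bias: precision improves, accuracy does not.
```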

Jim Cripwell
July 20, 2014 9:14 am

You have to forgive Steven Mosher. He thinks that there is no categorical difference between an estimate and a measurement.

Bruce Cobb
July 20, 2014 9:20 am

Mark Stoval (@MarkStoval) says:
July 20, 2014 at 8:35 am
We need a good word for that stuff that should be data but is not data.
“Doodoo” comes to mind.

July 20, 2014 9:28 am

Steven Mosher says:

2. the average of all models wins.

What is the average of these models?

Roy UK
July 20, 2014 9:42 am

dbstealey poses the best question I have seen. So I wait for the answer from Steven Mosher.
(BTW, the mean of those models seems to be running hot to me!)

Admin
July 20, 2014 9:50 am

When he said “the average of all models wins,” I think Mosher meant funding, not goodness of fit with reality.

July 20, 2014 9:52 am

My apologies, I didn’t read it that way at first.

Harry Passfield
July 20, 2014 9:56 am

Surely, the average of the models is as accurate as a watch that has stopped: it is spot on twice a day.

NikFromNYC
July 20, 2014 9:59 am

Mosher here helps point out quite strongly that the models only work well when matched to his own outlier global average temperature data set, which fails to show any pause in warming at all. This is important since these same models fail when the much more comprehensive Space Age satellite data is used in place of the rickety old thermometer record. The two independent satellite products falsify his result, as do the oldest continuous thermometer records, whose recent warming forms not a hockey stick but fuzzy toothpicks, in utter defiance of claims of a super water-vapor-enhanced greenhouse effect:
http://s6.postimg.org/uv8srv94h/id_AOo_E.gif
There is simply no trend change in the bulk of the oldest records. Nor is there any trend change in similarly linear tide gauge records in which the full volume of the oceans acts as a liquid expansion thermometer. There is only a sudden upturn in his own and to a lesser extent Jim Hansen’s product that also only uses satellites to estimate urban heating while ignoring NASA satellites for direct temperature readings. All the while Hansen’s replacement Gavin Schmidt publishes a rationale for the pause as being just a crazy coincidence of little factors adding up, a publication that admits to the pause that falsifies BEST.
Mosher’s skyward plot:
http://static.berkeleyearth.org/graphics/figure9.pdf
Note strongly how his product also nearly erases the global cooling that led to a new ice age scare, which would have been impossible with the lack of mid-century cooling his product claims. Note also that, despite years of requests, no plots have ever been offered with his algorithm toned down so that it does not slice and dice so much, so ridiculously much, but only catches truly abrupt step changes, so we have no idea how sensitive his black box is to parameterization.
These guys are just shamefully tweaking parameters and adjustments and rationales towards an alarmist result rather than simply accepting a lower climate sensitivity in objective fashion. That Mosher’s boss at BEST was exposed as a brazen liar about being a newly converted skeptic means he has been exposed as being a dishonest man. So we know that only the temperature product of an unapologetic liar matches climate models. This fact alone now falsifies those models.

kadaka (KD Knoebel)
July 20, 2014 10:01 am

Jim Cripwell said on July 20, 2014 at 9:14 am:

You have to forgive Steven Mosher. He thinks that there is no categorical difference between an estimate and a measurement.

But the temperature numbers we get from the satellites are not measurements; they come from taking measurements of other things (the output of the optical sensors on the observing platform, and so on) and running them through models that use assumptions (best known values, i.e. educated guesses) to generate estimates we normally refer to as data (aka measurements).

July 20, 2014 10:02 am

10:01 AM. Where is it?

July 20, 2014 10:05 am

As someone only barely and tangentially connected to anything scientific, it seems to me that the average of many piles of garbage would still be, indeed, more garbage. Even the “best” 4 piles of garbage. And driving ahead at breakneck speed while looking out the rear of the car will never be a good idea.

July 20, 2014 10:13 am

Steven Mosher;
Yes. It’s pretty simple.
Noting that the average is better and CALCULATING how much better are two different things.
Basically, the work we did looking at the issue confirmed what has already been published.
So, nothing too interesting there.
>>>>>>>>>>>>>>>>>>
You are, in the end, fooling yourself. You’ve taken a bunch of models and averaged them, and noted that they get closer to observations as a consequence. As you yourself noted, no single model is correct. We can only assume, then, that all the models are wrong and that at this point in time the errors in each of the models offset each other to some extent. Since we know that the models are wrong, and for differing reasons, we have no way of knowing if averaging them will bring them closer to future observations, or farther away.
Averaging the output of models that are known to be incorrect is simply indefensible, and it matters not in the least that, for the tiny portion of the earth’s history for which we have instrumental data, doing so brings results closer in line with observations. This doesn’t even aspire to “correlation does not equal causation.” It is even less scientific than that. The fact that an average of a bunch of things known to be wrong happens to correlate for a short period of time with recent observations does not, I repeat NOT, I repeat NOT, make it a useful predictor of the future.
If I make predictions from chicken entrails and the average of my forecasts correctly predicts tomorrow’s weather, then I think you would agree that all I am presenting is a coincidence. That’s all you are presenting. It has no basis in science, no matter how many metrics you surround it with.

Go Home
July 20, 2014 10:20 am

While the missing model identification is a glaring omission, it does not in its own right invalidate the results. That said, do they recommend to their climate-science pals that all the other models be eliminated going forward, since they are now calling them failures? You knew they needed to find an answer to the pause. We will see if it holds up in the court of public opinion. Go get ’em, guys and gals.

Chuck Nolan
July 20, 2014 10:21 am

I tried that averaging thing at the horse track.
In 100 races the average winning number was horse #4.235, so I rounded it off and bet on horse #4 in every race the next time I went to the track.
I lost.
What went wrong?
I had good data.

July 20, 2014 10:25 am

Steven Mosher
July 19, 2014 at 11:28 am
says:
‘Let’s see.
We know there are 4 best and 4 worst.
It might not be an oversight to not name them.
Hint. Modelers and those who evaluate models generally don’t identify the best versus the worst.’
I figure if I don’t ask a stupid question at least once a month I’ll ruin my reputation, so here goes: Couldn’t the worst models be considered comparable to a control group in an evaluation of the validity of the conclusions reached through an analysis of what are considered the best models? Similar to the evaluation of, say, a pharmaceutical where a control group is not given the drug in question so as to determine the effectiveness of this drug in those to whom it is administered, wouldn’t the worst models function in a similar manner to the aforementioned pharmaceutical control group? Would it be a mere omission not to include them? And, should that not be standard practice?
(P.S. Judging from many of the replies you receive from your comments I’ve come to the conclusion, Mr. Steven Mosher, that you must have a thick skin. An admirable quality. I salute you.)

SIGINT EX
July 20, 2014 10:33 am

The anointed hour has come.
I checked the Nature web site: preview abstract and reduced figures, references and all.
So reassuring that Nature is printed on recyclable paper.
The modes of the distributions; interesting.
The “observations,” per se, I would not call observations, given what has gone into them, including various adjustments. Engineering bias (instrument drift and measurement offset) is one thing, but fudging (CRU and Hansen, for instance) to arrive at a preferred result is another.
Oh well. The authors paid the publishing fee and Nature accepted with glee.

July 20, 2014 10:53 am

Published, serious science must be replicable. Unless we are told which models were used, and which were good, and which were poor, there can be no replication. So there is no science in the latest Lew paper.