People send me stuff. In this case, another member of the news media has sent me an embargoed Nature paper and its press release, wanting me to look at them.
The new paper is scheduled to be published in Nature and is embargoed until 10AM PDT Sunday morning, July 20th. That said, Bob Tisdale and I have been examining the paper, which oddly includes Dr. Stephan Lewandowsky and Dr. Naomi Oreskes as co-authors and is on the topic of ENSO and “the pause” in global warming. I say oddly because neither Lewandowsky nor Oreskes concentrates on physical science; they direct their work towards psychology and science history, respectively.
Tisdale found a glaring, potentially fatal oversight, which I verified, and as a professional courtesy I have notified two people who are listed as authors on the paper. It has been 24 hours, and I have had no response from either. Since it is possible that they have not received these emails, I thought it would be useful to post my emails to them here.
It is also possible they are simply ignoring the email. I just don’t know. As we’ve seen previously in attempts at communication with Dr. Lewandowsky, he often turns valid criticisms into puzzles and taunts, so anything could be happening behind the scenes here if they have read my email. It would seem to me that they’d be monitoring their emails ahead of publication to field questions from the many journalists who have been given this press release, so I find it puzzling there has been no response.
Note: for those who would criticize my action as “breaking the embargo,” I have not even named the paper title or its DOI, nor used any language from the paper itself. If I were an author, and somebody spotted what could be a fatal blunder that made it past peer review, I’d certainly want to know about it before the press release goes out. There are about 24 hours left before publication, so they still have time to respond, and hopefully this message on WUWT will reach them.
Here is what I sent (email addresses have been link disabled to prevent them from being spambot harvested):
===============================================================
From: Anthony
Sent: Friday, July 18, 2014 9:01 AM
To: james.risbey at csiro.au
Subject: Fw: Questions on Risbey et al. (2014)
Hello Dr. Risbey,
At first I had trouble finding your email, which is why I sent it to Ms. Oreskes first. I dare not send it to Professor Lewandowsky, since, as we have seen by example, all he does is taunt people who have legitimate questions.
Can you answer the question below?
Thank you for your consideration.
Anthony Watts
—–Original Message—–
From: Anthony
Sent: Friday, July 18, 2014 8:48 AM
To: oreskes at fas.harvard.edu
Subject: Questions on Risbey et al. (2014)
Dear Dr. Oreskes,
As a climate journalist running the most viewed blog on climate, I have been graciously provided an advance copy of the press release and paper Risbey et al. (2014) that is being held under embargo until Sunday, July 20th. I am in the process of helping to co-author a rebuttal to Risbey et al. (2014). I think we’ve spotted a major blunder, but I want to check with a team member first.
One of the key points of Risbey et al. is the claim that the selected 4 “best” climate models could simulate the spatial patterns of the warming and cooling trends in sea surface temperatures during the hiatus period.
But reading and re-reading the paper we cannot determine where it actually identifies the models selected as the “best” 4 and “worst” 4 climate models.
Risbey et al. identifies the original 18 models, but not which 8 of them are the “best” or “worst”.
Risbey et al. presented histograms of the modeled and observed trends for the 15-year warming period (1984–1998) before the 15-year hiatus period in cell b of their Figure 1. So, obviously, that period was important. Yet Risbey et al. did not present how well or poorly the 4 “best” models simulated the spatial trends in sea surface temperatures for the important period of 1984–1998.
Is there some identification of the “best” and “worst” referenced in the paper that we have overlooked, or is there a reason for this oversight?
Thank you for your consideration.
Anthony Watts
WUWT
============================================================
UPDATE: as of 10:15AM PDT July 20th, the paper has been published online here:
http://www.nature.com/nclimate/journal/vaop/ncurrent/full/nclimate2310.html
Well-estimated global surface warming in climate projections selected for ENSO phase
Abstract
The question of how climate model projections have tracked the actual evolution of global mean surface air temperature is important in establishing the credibility of their projections. Some studies and the IPCC Fifth Assessment Report suggest that the recent 15-year period (1998–2012) provides evidence that models are overestimating current temperature evolution. Such comparisons are not evidence against model trends because they represent only one realization where the decadal natural variability component of the model climate is generally not in phase with observations. We present a more appropriate test of models where only those models with natural variability (represented by El Niño/Southern Oscillation) largely in phase with observations are selected from multi-model ensembles for comparison with observations. These tests show that climate models have provided good estimates of 15-year trends, including for recent periods and for Pacific spatial trend patterns.
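To make the abstract’s selection idea concrete, here is a minimal sketch of phase-based selection, assuming correlation with an observed index as the “in phase” test. The synthetic indices, the 0.5 threshold, and all names are illustrative inventions, not the paper’s actual method or data:

```python
# Hedged sketch: keep only ensemble members whose ENSO-like variability is
# "in phase" with observations, proxied here by correlation of synthetic
# indices. The threshold and all data are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(180)  # 15 years of monthly steps

# A synthetic "observed" ENSO index: roughly a 4-year cycle plus noise.
obs_enso = np.sin(2 * np.pi * t / 48) + rng.normal(0, 0.3, t.size)

# 18 synthetic "model" indices, each with a random phase offset.
members = [np.sin(2 * np.pi * (t + rng.integers(0, 48)) / 48)
           + rng.normal(0, 0.3, t.size)
           for _ in range(18)]

# "In phase" proxy: correlation with the observed index above a threshold.
in_phase = [i for i, m in enumerate(members)
            if np.corrcoef(m, obs_enso)[0, 1] > 0.5]
print("members selected as in phase:", in_phase)
```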
UPDATE2: rebuttal has been posted
As has been noted, different aspects of models can be looked at. However, if the four that are deemed “best” are the ones that show the smallest rise in global temperature over the last 18 years, would they also not rule out the C in CAGW? If so, they truly are the “best”.
Werner Brozek:
However, if the four that are deemed “best” are the ones that show the smallest rise in global temperature over the last 18 years, would they also not rule out the C in CAGW?
>>>>>>>>>>>>>>>
Since we have no information as to what those specific models say going forward, I wouldn’t make that assumption. In fact, my guess is that this is a one-two punch. Here’s four models that got the pause right… well, over the oceans anyway… skip that whole land thing… and ignore how accurate they were before the pause… just ignore all those factors… and look at what they predict for the future… it’s worse than we thought!
Jeff Alberts says: “One could say the same of McIntyre and McKitrick. A person’s title or background is irrelevant. The paper should stand or fall on its own merits or shortcomings.”
It depends on what the authors have contributed to the analysis. If the above is a paper focused on physical climate processes, the question would stand: what are the material contributions of Oreskes and Lewandowsky to the physical analysis? If the answer is “nothing”, it would devalue journal publication as a basis for researchers to assert their credentials.
I’m sure both M&M can give a satisfactory account of their respective contributions to their papers.
I’d probably have paid real money to see Anthony’s face if someone had told him a few days ago that he would be writing an email to Dr. Oreskes regarding a paper on ENSO/models + the pause™.
You can’t make that $#!^ up.
I wonder if/how the authors have addressed the issues discussed in this paper.
Hallelujah for Risbey, et al! I can’t tell you how much I thank God for this paper! Many years ago at a Grateful Dead concert, I had an incredible drug-induced epiphany revealing how particle physics and the time-space continuum could be harnessed to make deep fried Twinkies taste seven orders of magnitude more delicious. I’ve kept this secret to myself for decades, never dreaming I could publish such an idea in a prestigious scientific journal like ‘Nature’. (Truth be told, I’m just a lowly geologist who doesn’t know squat about particle physics, the time-space continuum, or deep fried Twinkies.) But apparently that’s irrelevant. Now that ‘Nature’ has published Oreskes’ and Lewandowsky’s ENSO hallucinations, they can’t possibly deny publishing mine.
schitzree says:
July 19, 2014 at 2:02 pm
I wouldn’t have posted this before the end of the embargo. It just leaves you open to criticism for no real benefit. Either they read your e-mail and take steps to check out any problems you point out, or they don’t. If they don’t, then you’ve got something worth writing about AFTER the embargo is lifted.
Well, except by posting this, now, we don’t have to take anyone’s word that there was an attempt to discuss the matter before the end of the embargo.
Steven Mosher says:
July 19, 2014 at 12:41 pm
hooofer
Simple fact is that the average of models is a better tool than any given one.
deal with it.
================
Do you mean that as a general observation, or is the scope of that remark confined to the 18 climate models in question here?
By “better tool” do you mean more consistent with observations? How do you judge performance? Do you account for differences in inflection points in your measurement?
Is not an average of a bunch of models simply another model? Does that imply that some kind of averaging process internal to a model makes it a better model? How so? Is it always the case that increasing the number of models in the “average” increases the accuracy? Is it a linear improvement or something else?
To make it a “better tool”, do you have to apply weights (non-unit)? How are these weights derived? What kind of average is it? Arithmetic? Geometric? Harmonic?
I’d be interested to know on what theory you base your assertion, because, for the life of me, I can’t see it.
NB: I’m not attempting to debate, as I’m just a dumb kid. I really want to learn.
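For what it’s worth, Jordan’s bias question can be checked numerically. Below is a minimal sketch, assuming a known “true” trend and synthetic model estimates (all numbers invented, not the CMIP models): with unbiased estimators the ensemble mean beats any single member, but a shared bias puts a floor under its error.

```python
# Sketch: ensemble-mean error vs. single-model error, with and without
# a shared bias. All values are synthetic, chosen only for illustration.
import numpy as np

rng = np.random.default_rng(42)
truth = 0.2        # hypothetical "true" trend
n_models = 18      # echoes the 18 models discussed above; values are made up
n_trials = 10_000

def rmse_pair(common_bias):
    noise = rng.normal(0.0, 0.1, size=(n_trials, n_models))
    estimates = truth + common_bias + noise   # each model = truth + bias + noise
    mean_rmse = np.sqrt(((estimates.mean(axis=1) - truth) ** 2).mean())
    single_rmse = np.sqrt(((estimates[:, 0] - truth) ** 2).mean())
    return mean_rmse, single_rmse

for bias in (0.0, 0.15):
    m, s = rmse_pair(bias)
    print(f"bias={bias:+.2f}  RMSE(ensemble mean)={m:.3f}  RMSE(single model)={s:.3f}")

# With zero bias the mean's RMSE shrinks roughly as 1/sqrt(n_models);
# with a shared bias it cannot fall below the bias itself.
```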
===========================================================
True, I’m just a layman here, but if the models aren’t identified, then “4 best and 4 worst” is a matter of subjective rather than objective evaluation.
The 4 projections that are closest to observations are the 4 best. The 4 that diverge the most from observations are the 4 worst. That seems pretty simple.
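If that simple reading is right, the selection reduces to a sort. A minimal sketch, with invented trend numbers rather than the paper’s:

```python
# Rank hypothetical model trends by distance from an observed trend.
# All numbers are invented for illustration.
observed = 0.05
projected = {"m01": 0.21, "m02": 0.07, "m03": 0.12, "m04": -0.02,
             "m05": 0.30, "m06": 0.04, "m07": 0.18, "m08": 0.09}

ranked = sorted(projected, key=lambda m: abs(projected[m] - observed))
print("best 4: ", ranked[:4])    # closest to observations
print("worst 4:", ranked[-4:])   # farthest from observations
```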
I haven’t read all the comments but has anyone asked just how long ago the models’ projections were made versus the real-time observations?
If I’m shooting a rifle but my aim is off a little bit, I might still get a bulls-eye if the target is only 5 feet away. If it’s a 100 yards away…..?
I’d be inclined to include one of Risbey’s bosses at CSIRO in the communications. Unlike the others, he earns the Queen’s shilling for doing directed research and is accountable internally, and to those funders, for the quality of what he produces.
I’m stunned you didn’t say “Mannian blunder” or “Phil Jones-like blunder”.
Suppose we have a trend line, and we attempt to compare it to a “drunkard’s walk”. We model the drunkard’s walk in three implementations: one with the toss, heads/tails, of a coin; one with red/black on a roulette wheel; and one with odd/even spots on a thrown die. The points of the “walk” zig-zag up and down: heads red odd, heads black even, tails red even, …
At some point, we stop. We get to choose when to stop. If the model looks close to our target line, we can stop earlier. If not, we can keep modeling…
One of the three models will, very likely, be closer to the target trend than the other two. It’s not likely all three will be close to the trend, or to each other. But given the choice to decide which model most closely matches the target, we can identify a winner. (If not, we can keep tossing coins, spinning the wheel, and throwing the die.)
Now, having modeled a random walk process and found at least one such model that matches the measured trend better than the others, what have we proved about the target trend of interest? Have we in fact provided evidence that the trend IS a drunkard’s (random) walk, or are we at least more sure it’s a random walk now than before we ran our models?
And does it advance our knowledge of the drunkard’s future path to specify that a throw of the dice is a better model of the past trend than a toss of a coin?
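That thought experiment is easy to run. A minimal sketch, with all three “implementations” reduced to fair ±1 steps for simplicity (an assumption; the odds differ slightly across coin, wheel, and die):

```python
# Simulate three random-walk "models", pick the one closest to a target
# trend, and note that the winner says nothing about the target's nature.
import numpy as np

rng = np.random.default_rng(0)
n_steps = 180
target = 0.01 * np.arange(n_steps)   # a hypothetical deterministic trend line

walks = {name: 0.05 * np.cumsum(rng.choice([-1, 1], size=n_steps))
         for name in ("coin", "roulette", "dice")}

# Rank the three "models" by mean squared distance from the target.
scores = {name: float(np.mean((walk - target) ** 2))
          for name, walk in walks.items()}
print(scores, "-> best match:", min(scores, key=scores.get))

# One walk usually sits closer to the target than the other two, but that
# is after-the-fact selection, not evidence the target is a random walk.
```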
dp says:
July 19, 2014 at 2:57 pm
Steven Mosher says:
July 19, 2014 at 10:34 am
Omission is a better word than blunder
I’m stunned you didn’t say “Mannian blunder” or “Phil Jones-like blunder”.
#####################
measured language is better
Lewandowsky is a social psychologist. The behavioral sciences now push the idea that it is beliefs about reality that guide future behavior. This paper is also designed to influence and confirm those beliefs. Very naughty to actually read carefully and peruse those footnotes and discover this omission.
I got last week’s FDEUF award. Footnote Diving and Extraction of Useful Facts Award. Looks like this will be next week’s. Good job.
Do you mean that as a general observation, or is the scope of that remark confined to the 18 climate models in question here?
1. general observation about all the models
By “better tool” do you mean more consistent with observations? How do you judge performance? Do you account for differences in inflection points in your measurement?
1. pick your skill metric… but more consistent, yes.
Is not an average of a bunch of models simply another model?
1. A+ answer
Does that imply that some kind of averaging process internal to a model makes it a better model?
1. no
How so? Is it always the case that increasing the number of models in the “average” increases the accuracy? Is it a linear improvement or something else?
1. Not always the case. I never looked at the improvement stats
To make it a “better tool”, do you have to apply weights (non-unit)? How are these weights derived? What kind of average is it? Arithmetic? Geometric? Harmonic?
1. weights are a big debate. currently no weights
I’d be interested to know on what theory you base your assertion, because, for the life of me, I can’t see it.
1. No theory. pure fact. If you take the mean of the models you get a better fit. why? dunno.
just a fact.
Jordan wins the thread. Thanks for pointing out the ignorance of expecting an average to be better “just because” it is an average. I also applaud you noting that if the estimators are all unbiased, then they should all be used in the average. Picking only “the best” implies there was no rigor in the selection process, merely an eyeball match. This is also a tacit admission the models are not unbiased, nor do they constitute an ensemble (which means their average is physically meaningless).
For that matter, how is “best” defined? This word is akin to “optimal,” which is meaningless without context. For example, “best with respect to minimum mean square error” actually sets forth the criterion by which “best” was determined; the sketch after this comment shows how the choice of metric changes which models come out on top.
Mosher, seriously, invest in a book on statistical signal processing. Then read it. Then ask questions.
Mark
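Mark’s point about context is easy to demonstrate: rank the same ensemble under two different skill metrics and the “best 4” lists generally disagree. A hedged sketch, with all series and both metrics invented for illustration (not the paper’s models or data):

```python
# "Best" depends on the metric: the same synthetic ensemble ranked by MSE
# and by trend agreement yields different top-4 lists. Numbers invented.
import numpy as np

rng = np.random.default_rng(3)
obs = np.cumsum(rng.normal(0.01, 0.1, size=60))     # hypothetical observed series
models = {f"m{i:02d}": obs + rng.normal(0, 0.3, 60) + rng.normal(0, 0.2)
          for i in range(18)}                        # 18 noisy, offset "models"

x = np.arange(60)
obs_trend = np.polyfit(x, obs, 1)[0]

def top4(score):
    return sorted(models, key=score)[:4]

by_mse = top4(lambda m: np.mean((models[m] - obs) ** 2))
by_trend = top4(lambda m: abs(np.polyfit(x, models[m], 1)[0] - obs_trend))
print("best 4 by MSE:  ", by_mse)
print("best 4 by trend:", by_trend)
# Unless the criterion is stated, "the 4 best models" is undefined.
```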
Any model that “predicted” the pause must be insensitive to CO2. Looking forward to finding out which input parameters were used and how much they were weighted.
Jordan says:
July 19, 2014 at 1:56 pm
“Simple fact is that the average of models is a better tool than any given one”
Only if the models are unbiased estimators for the variables of interest.
Not really. In fact they are biased, and weirdly, averaging them gives you the best answer. Just fact.
Steiner said Mr. Watts is interested in collecting more brownie points towards sainthood.
Wrong: Anthony achieved climate sainthood long ago.
Musher says (hey, he called me hoofer first!)
I never looked at the improvement stats
Followed by:
If you take the mean of the models you get a better fit. why? dunno.
just a fact.
You’ve never looked at the stats yet consider it a fact? LOL.
I am 5′ 10″ tall, can’t jump, can’t dribble, but I beat my wife at basketball. I am the best b’ball player in this house. Hey Lakers, when can I sign the contract?
Link didn’t show up; this paper was meant.
Steven Mosher:
At July 19, 2014 at 12:41 pm you say
Simple fact is that an average of wrong is wrong. Face it and live with it.
Richard
I notice [Mosher] avoids the statistical challenges. That is because he knows, deep down, that he is full of sh*t.
[Note: edited to fix a misspelling, Kosher to Mosher – Anthony]
No kidding. You are quite blind to any theory regarding statistics – that much we can all be sure of.
Of course, without any theory, this phrase is simply nonsense. Let us all just make our own facts and … Wait a minute, we already have enough climate scientists doing just that.
Except when you don’t. That is almost what Mann does with his reconstructions, hence we have divergence. Furthermore, “better” with respect to what? Eyeball wiggle matching?
Of course you don’t; you have no idea what you are doing, yet you seem unhindered by that truth when commenting on statistical processing methods (yes, an average is a statistical processing method). Guess what, I bet I DO know why, and it is identical to the reason Mann can find wiggles that match anything he wants in even ordinary tea leaves: spurious relationships.
Mark
Steven Mosher: “Simple fact is that the average of models is a better tool than any given one.”
Odd that it’s a technique that isn’t used for sunspot cycle prediction or, AFAIK, for anything else. Generally, the range of predictions is used as an indication of uncertainty, i.e. it is used as … the range of predictions.
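A minimal sketch of the distinction, with invented projection numbers: collapsing an ensemble to its mean discards exactly the spread that the range is normally used to communicate.

```python
# Ensemble mean vs. ensemble range for 18 hypothetical trend projections.
# All values are synthetic, for illustration only.
import numpy as np

rng = np.random.default_rng(7)
projections = rng.normal(0.2, 0.08, size=18)

print(f"ensemble mean : {projections.mean():.3f}")
print(f"ensemble range: {projections.min():.3f} to {projections.max():.3f}")
# The range conveys uncertainty; the mean alone conveys none of it.
```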