Uh oh, a significant error spotted in the just-released IPCC AR5 SPM

From the “pick one: 90%, 95%, 97% certainty” department comes this oopsie:

Via Bishop Hill:

=============================================================

Doug Keenan has just written to Julia Slingo about a problem with the Fifth Assessment Report (see here for context).

Dear Julia,

The IPCC’s AR5 WGI Summary for Policymakers includes the following statement.

The globally averaged combined land and ocean surface temperature data as calculated by a linear trend, show a warming of 0.85 [0.65 to 1.06] °C, over the period 1880–2012….

(The numbers in brackets indicate 90%-confidence intervals.)  The statement is near the beginning of the first section after the Introduction; as such, it is especially prominent.

The confidence intervals are derived from a statistical model that comprises a straight line with AR(1) noise.  As per your paper “Statistical models and the global temperature record” (May 2013), that statistical model is insupportable, and the confidence intervals should be much wider—perhaps even wide enough to include 0°C.

It would seem to be an important part of the duty of the Chief Scientist of the Met Office to publicly inform UK policymakers that the statement is untenable and the truth is less alarming.  I ask if you will be fulfilling that duty, and if not, why not.

Sincerely, Doug

============================================================
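For readers who want to see, in concrete terms, the kind of calculation Keenan is objecting to, here is a minimal sketch of a trend-plus-AR(1) confidence interval in Python. The temperature series below is synthetic (an assumption on my part; the SPM does not publish its exact procedure), so the numbers are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(1880, 2013)
n = len(years)

# Synthetic "temperature anomaly": a linear warming plus AR(1) noise.
true_trend = 0.0064  # deg C per year, roughly 0.85 C over 1880-2012
noise = np.zeros(n)
for t in range(1, n):
    noise[t] = 0.6 * noise[t - 1] + rng.normal(0, 0.08)
temp = true_trend * (years - years[0]) + noise

# OLS slope and its naive (white-noise) standard error.
x = years - years.mean()
slope = np.sum(x * temp) / np.sum(x ** 2)
resid = temp - temp.mean() - slope * x
se = np.sqrt(np.sum(resid ** 2) / (n - 2) / np.sum(x ** 2))

# AR(1) adjustment: widen the error by sqrt((1+rho)/(1-rho)), the usual
# effective-sample-size correction for lag-1 autocorrelated residuals.
rho = np.corrcoef(resid[:-1], resid[1:])[0, 1]
se_ar1 = se * np.sqrt((1 + rho) / (1 - rho))

span = years[-1] - years[0]
warming = slope * span
z90 = 1.645  # two-sided 90% normal quantile
print(f"warming {warming:.2f} C, 90% CI +/- {z90 * se_ar1 * span:.2f} C")
```

The whole dispute is over that last adjustment step: if the residuals have more structure than AR(1), the resulting interval can be far too narrow.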

To me, this is just more indication that the 95% number claimed by the IPCC wasn’t derived mathematically, but was a consensus of opinion, as was done last time.

Your article asks “Were those numbers calculated, or just pulled out of some orifice?” They were not calculated, at least if the same procedure from the fourth assessment report was used. In that prior climate assessment, buried in a footnote in the Summary for Policymakers, the IPCC admitted that the reported 90% confidence interval was simply based on “expert judgment”, i.e. conjecture. This, of course, raises the question of how any human being can have “expertise” in attributing temperature trends to human causes when there is no scientific instrument or procedure capable of verifying the expert attributions.

The IPCC's new certainty is 95%. What? Not 97%?

So it was either that, or it's a product of sleep deprivation, as the IPCC vice chair illustrated today:

[Image: tweet from the IPCC vice chair about being tired]

There’s nothing like sleep-deprived groupthink under deadline pressure to instill confidence, right?


144 Comments
Nullius in Verba
September 27, 2013 4:06 pm

Nick,
“Would you care to quote Julia Slingo saying AR(1) was rubbish?”
“However, considering the complex physical nature of the climate system, there is no scientific reason to have expected that a linear model with first order autoregressive noise would be a good emulator of recorded global temperatures, as the ‘residuals’ from a linear trend have varying timescales.”
You could have found that yourself. Doug linked to it.
“But the IPCC is not claiming that linear+AR(1) is the best model of temperature. They are simply using it as the basis for calculating temperature change over the period.”
They’re claiming that these are 90% confidence intervals on the actual temperature rise. And implying, to a statistically non-literate audience who wouldn’t recognise the issues with the AR(1) choice, that they would be justified in thinking these have a 90% probability of covering the value that is being estimated. (Which from a Bayesian point of view is not true, either. That would be a ‘credible interval’, not a ‘confidence interval’. A common error, that.)
I know there’s this thing about “not giving ammunition to sceptics”, but wouldn’t it be simpler, more straightforward, and a lot less desperate, on hearing that the IPCC was basing its confidence intervals on a linear+AR(1) model, to simply say: “That’s wrong; they ought to have either picked a better model, explained the difficulty, or not given confidence intervals at all”?

Nick Stokes
September 27, 2013 4:31 pm

Nullius in Verba says: September 27, 2013 at 4:06 pm
That’s not Julia Slingo saying AR(1) is rubbish. She’s saying it would not be a good emulator of global temperature. No one ever thought it would be. And that’s nothing to do with AR(1). No linear model would be a good emulator. Nor is Keenan’s model. When he says it is a thousand times more likely, that covers over the fact that it is still impossibly unlikely.
That’s not the point. It’s used as a basis for computing the difference between temperatures at two times. Regression fits are used for this in all kinds of fields, and they work well.

nevket240
September 27, 2013 4:41 pm

Whatever happened to 1850 to the present??
Or is that Inconvenient??
regards.

tom0mason
September 27, 2013 4:45 pm

Manny M says:
September 27, 2013 at 11:38 am
Ian W says:
September 27, 2013 at 11:51 am
Bill Parsons says:
September 27, 2013 at 1:21 pm
You all may be interested in this little snippet from NASA about the shrinking atmosphere and the cooling effects of CO2 in the thermosphere.
http://www.nasa.gov/topics/earth/features/coolingthermosphere.html

Nullius in Verba
September 27, 2013 5:11 pm

“It’s used as a basis for computing the difference between temperatures at two times.”
And reporting confidence intervals. It’s the confidence intervals that are the issue.
And it isn’t a computation of the difference in temperatures at two times. To do that, you would subtract the temperature at one time from the temperature at the other. It’s a much simpler process. What you’re trying to do is something a lot more complicated – by “temperature” you don’t mean the temperature, but an underlying equilibrium temperature due to forcing that has short-term weather superimposed on top of it – a purely theoretical concept that assumes that’s how weather works. You’re trying to estimate the change in the underlying equilibrium, and using a low-pass filter to cut out the high-frequency ‘noise’ – a process that requires accurate statistical models of both signal and noise to do with any quantifiable validity.
The mainstream constantly conflate these two concepts – the observed temperature and the underlying equilibrium temperature – because it gives the impression that the statements are about direct empirical observation, while actually being about an unobservable parameter in a question-begging assumed model.
Had they simply given the OLS trend, you could have argued that it was merely informally descriptive, a rough and unscientific indication of how much temperatures generally had gone up, without making any comment on its significance. However, they stuck a confidence interval on it. Worse, they said there was a 90% likelihood of it covering the quantity being estimated. That gives the impression of a scientifically testable statistical statement. But the “confidence interval” here is a meaningless pair of numbers, because it relies for its validity on an assumption known not to be true.
“Regression fits are used for this in all kinds of fields, and they work well.”
Sadly so. That doesn’t make it right, though.
You might well know what they’re doing and that such estimates are to be treated cautiously, but the intended readers of this report don’t. They read it as authoritative science, and if they see confidence intervals being written down, by scientists, they’re going to assume they’re meaningful. In this case, as in so many others, they’d be wrong.
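A small simulation makes Nullius in Verba's point concrete: if the noise has more memory than AR(1) credits it with, a nominal 90% interval covers the true trend noticeably less than 90% of the time. The AR(2) noise model and its parameters below are illustrative assumptions, not a claim about real climate noise.

```python
import numpy as np

rng = np.random.default_rng(1)
n, true_slope, trials, z90 = 133, 0.0064, 2000, 1.645
x = np.arange(n) - (n - 1) / 2.0
hits = 0
for _ in range(trials):
    # AR(2) noise: more memory than any AR(1) fit will credit it with.
    e = np.zeros(n)
    for t in range(2, n):
        e[t] = 0.5 * e[t - 1] + 0.3 * e[t - 2] + rng.normal(0, 0.05)
    y = true_slope * np.arange(n) + e
    slope = np.sum(x * y) / np.sum(x ** 2)
    resid = y - y.mean() - slope * x
    se = np.sqrt(np.sum(resid ** 2) / (n - 2) / np.sum(x ** 2))
    rho = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    se *= np.sqrt((1 + rho) / (1 - rho))  # AR(1)-style adjustment only
    hits += int(abs(slope - true_slope) <= z90 * se)
print(f"nominal 90% interval, actual coverage: {hits / trials:.0%}")
```

The printed coverage falls well short of 90% under these assumptions, which is exactly the sense in which an interval built on the wrong noise model stops meaning what it claims to mean.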

Nick Stokes
September 27, 2013 5:28 pm

Nullius in Verba says: September 27, 2013 at 5:11 pm
“But the “confidence interval” here is a meaningless pair of numbers, because it relies for its validity on an assumption known not to be true.”

They have given an estimate, and the basis on which it was calculated. And they have given confidence intervals for that calculation. That’s appropriate.
I agree that AR(1) is not the only basis for calculating confidence intervals, and there is a case for others (discussed here). But it’s not meaningless.

Brian H
September 27, 2013 6:32 pm

Not entirely meaningless, but surely deceptive. It is written in the Summary in such a way as to create a false impression. What it means is not stated clearly. SOP.

Michael Jankowski
September 27, 2013 6:36 pm

You’re right…it’s not “meaningless.” In fact, I’d say it means a lot that they selected an inappropriate basis to determine their estimate and confidence intervals.

Nullius in Verba
September 27, 2013 7:01 pm

“I agree that AR(1) is not the only basis for calculating confidence intervals, and there is a case for others (discussed here). But it’s not meaningless.”
AR(1) is the wrong basis for calculating confidence intervals.
It’s meaningless if it’s based on an untrue assumption. Policy makers need to know how accurately you can state the amount of global warming observed. This does not answer that question.
And the IPCC didn’t fully explain the basis on which it is calculated – that’s something we had to deduce from what they did last time around (and buried in an appendix to the main report), and the fact that the interval they report this time matches that method. What the IPCC say is that we can be confident there’s a 90% likelihood that this interval covers the amount of global warming there has actually been. That’s not true.

lund@clemson.edu
September 27, 2013 7:23 pm

(Fake ‘David Socrates’ sockpuppet ID -mod)

jorgekafkazar
September 27, 2013 7:50 pm

The 0.65 and 1.06 °C figures are obviously P.O.O.M.A. numbers. [That stands for Preliminary Order of Magnitude Approximation. Really it does.]

TomRude
September 27, 2013 8:56 pm

Thomas Stocker had the best line of the IPCC press conference, claiming in substance that we do not have enough data about the last 15 years to properly evaluate the “hiatus”. Really, not enough data in the past 15 years!!!! That has been the most instrumented, most observed period ever… except that it showed no warming.
This guy deserves an Ig Nobel prize, just for that one!!!

Colorado Wellington
September 27, 2013 10:26 pm

To me, this is just more indication that the 95% number claimed by IPCC wasn’t derived mathematically …

Maybe, but if so, it was at least the result of impeccable risk assessment logic:
The clients must receive what they specified. Otherwise they will defund the project.

rtj1211
September 28, 2013 12:06 am

Well, so long as the MSM censor dissent, does it matter??
The Guardian is back to censoring again – it censored a within-the-rules challenge to Liberal Democrat Tim Farron (the Lib Dems being the great Greenies of our major UK political parties) to face reality.
I do wonder whether they have the honesty to draw out historical coverage of the Duma under Brezhnev and compare it to how they write some tripe and get fawning Kommissar after Kommissar to say ‘oh wonderful benefactor, how wise you are!’?
It’s really getting beyond a joke.

Nullius in Verba
September 28, 2013 1:21 am

“I would say: An ARIMA(3,1,0)? Surely you jest in saying that is a thousand times more likely? I would sure like to see that likelihood comparison.”
Follow the earlier Met Office discussion at Bishop Hill. There are links back to Doug’s calculations, which Slingo confirms.
“Do prove me wrong, but the model you propose has a random walk component, meaning the variance increases linearly in time. That is clearly not the case with this data. What you propose isn’t even a stationary model, which should be the null hypothesis of any climate change argument.”
It’s an approximation for a subset of data, like a linear trend is.
It’s a standard procedure in time series analysis – if there are roots of the characteristic equation very close to the unit circle, any short enough segment of the series looks approximately as if it were on the unit circle (i.e. random-walk-like), and a lot of the standard tools don’t work or give invalid answers. So the standard approach in analysing a new time series is to first test for unit roots and, if they are “found”, take differences until the result is definitely stationary. It’s an approximate measure for handling situations where you don’t have a long enough sample to fully explore the data’s behaviour, and for avoiding misleading results because of that.
Think of it as like the situation you get with the series x(t+1) = 0.999999999 x(t) + rand(t) where rand(t) is a zero-mean Gaussian random number series. Technically it’s AR(1) and stationary, but over any interval short of massive it’s going to look indistinguishable from x(t+1) = 1 x(t) + rand(t), which is a random walk. You don’t have enough data to resolve the difference.
Usually, after testing for unit roots and taking differences, the next step is to test to find what ARMA process best fits the result. This is where the ARIMA(3,1,0) model came from – it is the ARIMA process that best fits the short-term behaviour of the data. The process is analogous to fitting a polynomial to a short segment of a function to model its curves. It’s a local approximation that is not expected to apply indefinitely.
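For readers who want to try the workflow described above, here is a minimal sketch using statsmodels. The near-unit-root series is the x(t+1) = 0.999999999 x(t) + rand(t) example from the comment; the sample length and innovation scale are assumptions chosen for illustration.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(2)

# The near-unit-root series from the comment above:
# x(t+1) = 0.999999999 x(t) + rand(t). Over 133 points it is
# indistinguishable from a random walk.
n = 133
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.999999999 * x[t - 1] + rng.normal(0, 0.1)

# Step 1: augmented Dickey-Fuller test. A large p-value means a unit
# root cannot be rejected, so the series gets differenced once.
print(f"ADF p-value: {adfuller(x)[1]:.2f}")  # typically well above 0.05

# Step 2: fit candidate ARIMA(p,1,0) models and compare by AIC, the
# same kind of selection that leads to an ARIMA(3,1,0) on real data.
for p in range(4):
    fit = ARIMA(x, order=(p, 1, 0)).fit()
    print(f"ARIMA({p},1,0): AIC = {fit.aic:.1f}")
```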

Brandon Shollenberger
September 28, 2013 2:12 am

While I agree the AR(1) model is lacking, I can’t understand why people would endorse Keenan’s letter when he seriously suggests the “correct” error margins might include zero. Does anyone actually think we shouldn’t be able to rule out the possibility of no warming in the last 100+ years?

September 28, 2013 2:28 am

“Nor is Keenan’s model. ”
Did not Keenan say repeatedly that he wasn’t advocating ‘his’ model, but merely using it to illustrate his point?

Nullius in Verba
September 28, 2013 3:14 am

Brandon,
Depends what you mean by “correct”.
My view is that all this talk about whether changes in temperature are “significant” or not is meaningless without a validated statistical model of ‘signal’ and ‘noise’ derived independently of the data, which we don’t have. We don’t know the statistical characteristics of the normal background variability precisely enough, so it is simply impossible to separate any ‘global warming signal’ from it. All these attempts where you make nice neat mathematical assumptions simply get out what you put in, and your conclusion depends on what you assumed. If you assumed a trend you’ll find a trend. If you assume no trend, you’ll find there’s no trend. Doug’s ARIMA(3,1,0) is merely a standard example derived by the textbook method to illustrate that point.
But it’s got no independent validation, either, so it’s no more “correct” than anything else we could do. It’s simply a better fit.
There are no correct error margins because we don’t have an independent, validated model of the errors. We cannot rule out, by purely statistical means, the possibility of no warming in the last 100+ years. And the IPCC’s confidence intervals are just the same sort of significance testing in disguise.
However, I don’t expect the mainstream is ready to accept that one, so I’ll let it pass. That you accept that linear+AR(1) is “lacking” is a good start, and sufficient for the time being.
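For the curious, the likelihood comparison mentioned upthread (Keenan’s “thousand times more likely”) can be set up as below. This sketch fits both models to a placeholder series (an assumption), so it does not reproduce Keenan’s number, which was computed on the real global-temperature record.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)
temps = np.cumsum(rng.normal(0, 0.1, 133))  # placeholder annual series

# (a) straight line with AR(1) noise: ARIMA(1,0,0) plus constant & trend.
trend_ar1 = ARIMA(temps, order=(1, 0, 0), trend="ct").fit()
# (b) Keenan's illustrative model: a driftless ARIMA(3,1,0).
arima310 = ARIMA(temps, order=(3, 1, 0)).fit()

print(f"log-likelihood, trend+AR(1):  {trend_ar1.llf:.1f}")
print(f"log-likelihood, ARIMA(3,1,0): {arima310.llf:.1f}")
print(f"likelihood ratio (b)/(a): {np.exp(arima310.llf - trend_ar1.llf):.1f}")
```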

JPeden
September 28, 2013 4:15 am

mwhite says:
September 28, 2013 at 3:22 am
http://stevengoddard.files.wordpress.com/2013/09/screenhunter_1013-sep-28-00-13.jpg
That’s pretty funny.

Craig
September 28, 2013 7:28 am

Maybe this is too simplistic a way to look at it, but say you have a system with several subsystems, and for each you are 99% confident that you understand it well enough to model it accurately. If there are 6 or more of these subsystems, can you be 95% confident of the accuracy of your model of the entire system? 0.99^6 = 94.1%, so no.
Given that the climate has well over six subsystems, few if any of which we are 99% confident we can model accurately, let alone any interactive effects, it would seem nonsensical simply from a probability standpoint to claim 95% confidence.
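Craig's arithmetic checks out, under his simplifying assumption that subsystem confidences simply multiply:

```python
# If confidence in a coupled model multiplies across subsystems
# (a simplifying assumption), six subsystems at 99% each already
# fall below 95% overall, and it compounds from there.
print(0.99 ** 6)   # 0.9415 -> about 94.1%
print(0.99 ** 20)  # 0.8179 -> about 81.8% at twenty subsystems
```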

steverichards1984
September 28, 2013 8:27 am

Did anyone notice, when the panel was questioned about the pause, that the models cannot be used to predict individual rain showers or storms but can supposedly be used to show you the trend?
It seems to have escaped them that their models are not good at predicting trends either.
If your trend-predicting model does not map onto recently gathered real measurements, but consistently produces higher temperature outputs, then your model is wrong and all of the predictions that come from it are wrong.
How many wrongs make a right?

steverichards1984
September 28, 2013 8:37 am

Nullius in Verba:
You make some interesting comments about the suitability of various statistical methods to be used on different occasions.
Could you state what you currently feel would be the best sequence to use to analyse the various global temperature datasets?
I would really like to know what a person skilled in statistical analysis would say is the correct method or sequence to use.
It’s a shame that most statistical methods give answers irrespective of whether the method should have been used in a particular case.

p@ Dolan
September 28, 2013 10:36 am

I have seen this before in my own profession, where I’ve been asked to provide a percentage of accomplishment on a project which involves research into unknowns. First, how can I put a finite number value on an open-ended research project? Secondly, since the research is hardly a linear process, how can any percentage of completion be anything but an ‘idiot meter’ indication of MY confidence that I’ll be done by the deadline I’m assigned?
Answers, respectively: I cannot, it cannot.
Everyone knows this, though several try to pretend it’s not the case. For many, who are stymied in a phase of a project, but wish to show that they’re actually making progress so as to not alarm someone higher up in the food-chain, starting with a very low number and leaving themselves lots of room to up the percentage as time goes by is a familiar strategy which allows them to show “progress” even when there is none.
Trouble occurs as the deadline looms, and you have to show progress but have less and less room before hitting 100%. Where weeks ago you could make 10% per week, at 85%, you can’t. The “idiot meter” indications now take on a definite asymptotic curve, approaching completion.
At what point do people stop and say, “You’re Bee Essing me, right?”
Apparently, the IPCC and their believers aren’t at that point yet, and think they still have room to increase the numbers…we also know that they create more room by lowering their starting point…
Anyone who puts confidence in an “idiot meter” indication…. Well, there’s a REASON we call them “idiot meters”…

Nullius in Verba
September 28, 2013 12:17 pm

“Could you state what you currently feel would be the best sequence to use to analyse the various global temperature datasets?”
The primary problem, like I said, is the lack of a validated model for the background noise. This is where effort needs to be concentrated. Until we have one, none of the methods are going to give reliable answers.
More sophisticated studies in detection and attribution (‘detection’ is what we’re looking at here) use the big climate models to generate statistics on the background variation. There’s a bit more reason to pay attention to these, since they are at least partially based on physics. But there are lots of approximations and parameterisations and fudge-factors galore, they’re not validated in the sense required, they don’t match observations of climate in many different aspects and areas, and their failure to predict the pause (or rather, pauses of a similar length and depth) falsifies them even at the game they were primarily designed and built to play – global temperature.
Because they’re not validated, the fact that they can’t produce a rise like 1978-1998 doesn’t mean anything. But because they’re not validated, the fact they can’t produce a pause like 1998-2013 doesn’t mean anything either, except that one way or another the models are definitely invalidated. The pause doesn’t show that global warming theory is wrong, because it could just be that the models underestimated the natural variation.
If, one day, they can build a climate model that can be shown to predict climate accurately over the range of interest, then the correct approach would be to use this to generate statistics on trend sizes over various lengths, and use that to perform the trend analysis and confidence intervals and so on. That would be the right way of doing it. But we’re not there yet.
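As a sketch of what that would look like, assuming (and it is a large assumption) that a validated model ensemble existed: its control runs would supply the null distribution of unforced trends, and the observed trend would be compared against it. Everything below is a placeholder; "control_trends" stands in for trends computed from long unforced runs of such a model.

```python
import numpy as np

rng = np.random.default_rng(4)
control_trends = rng.normal(0.0, 0.03, 1000)  # deg C/decade, placeholder
observed_trend = 0.064  # deg C/decade, roughly 0.85 C over 1880-2012

# Empirical 90% range of unforced trends from the "validated" ensemble.
lo, hi = np.percentile(control_trends, [5, 95])
print(f"90% of unforced trends lie in [{lo:.3f}, {hi:.3f}] C/decade")
print("observed trend outside that range:", not (lo <= observed_trend <= hi))
```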
“I would really like to know what a person skilled in statistical analysis would say is the correct method or sequence to use.”
Thanks! But I’d only describe myself as ‘vaguely competent’ not ‘skilled’. There are a lot of people far better at this stuff than me!