Extreme Times

Guest Post by Willis Eschenbach

I read a curious statement on the web yesterday, and I don’t remember where. If the author wishes to claim priority, here’s your chance. The author said (paraphrasing):

If you’re looking at any given time window on an autocorrelated time series, the extreme values are more likely to be at the beginning and the end of the time window.

“Autocorrelation” is a way of measuring how likely it is that tomorrow will be like today. For example, daily mean temperatures are highly auto-correlated. If it’s below freezing today, it’s much more likely to be below freezing tomorrow than it is to be sweltering hot tomorrow, and vice-versa.

Anyhow, being a suspicious fellow, I thought “I wonder if that’s true …”. But I filed it away, thinking, I know that’s an important insight if it’s true … I just don’t know why …

Last night, I burst out laughing when I realized why it would be important if it were true … but I still didn’t know if that was the case. So today, I did the math.

The easiest way to test such a statement is to do what’s called a “Monte Carlo” analysis. You make up a large number of pseudo-random datasets which have an autocorrelation structure similar to some natural autocorrelated dataset. This highly autocorrelated pseudo-random data is often called “red noise”. Because it was handy, I used the HadCRUT global surface air temperature dataset as my autocorrelation template. Figure 1 shows a few “red noise” autocorrelated datasets in color, along with the HadCRUT data in black for comparison.

Figure 1. HadCRUT3 monthly global mean surface air temperature anomalies (black), after removal of seasonal (annual) swings. Cyan and red show two “red noise” (autocorrelated) random datasets.

The HadCRUT3 dataset is about 2,000 months long. So I generated a very long string (two million data points) as a single continuous red noise “pseudo-temperature” dataset. Of course, this two million point dataset is stationary, meaning that it has no trend over time and that the standard deviation is stable over time.

Then I chopped that dataset into sequential 2,000 data-point chunks, and for each chunk I noted where the maximum and the minimum occurred within that chunk. If the minimum value was the third data point, I put it down as “3”, and correspondingly, if the maximum was the next-to-last data point, it was recorded as “1999”.

Then I made a histogram showing, across all of those chunks, how many of the extreme values fell in the first hundred data points, the second hundred, and so on. Figure 2 shows that result. Individual runs of a thousand chunks vary, but the general form is always the same.

Figure 2. Histogram of the location (from 1 to 2000) of the extreme values in the 2,000-datapoint chunks of “red noise” pseudodata.
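For anyone who wants to try this at home, here is a minimal R sketch of the procedure. It is not Willis’s actual code: his generator matched the HadCRUT3 autocorrelation structure, whereas the sketch simply uses integrated white noise (a random walk, the textbook “red noise”) as an assumed stand-in. The end-loading of the extremes shows up just as clearly.

set.seed(42)                                  # reproducible pseudo-random numbers
chunk_len <- 2000
n_chunks  <- 1000
x <- cumsum(rnorm(chunk_len * n_chunks))      # two million points of "red noise" (integrated white noise)

chunks  <- matrix(x, nrow = chunk_len)        # one sequential 2,000-point chunk per column
max_pos <- apply(chunks, 2, which.max)        # position (1 to 2000) of each chunk's maximum
min_pos <- apply(chunks, 2, which.min)        # position of each chunk's minimum

# histogram of extreme-value positions in bins of 100 points (compare Figure 2)
hist(c(max_pos, min_pos), breaks = seq(0, chunk_len, by = 100),
     xlab = "position in chunk", main = "Location of extremes in red-noise chunks")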

So dang, the unknown author was perfectly correct. If you take a random window on a highly autocorrelated “red noise” dataset, the extreme values (minimums and maximums) are indeed more likely, in fact twice as likely, to be at the start and the end of your window rather than anywhere in the middle.

I’m sure you can see where this is going … you know all of those claims about how eight out of the last ten years have been extremely warm? And about how we’re having extreme numbers of storms and extreme weather of all kinds?

That’s why I busted out laughing. If you say “we are living today in extreme, unprecedented times”, mathematically you are likely to be right, even if there is no trend at all, purely because the data is autocorrelated and “today” is at one end of our time window!

How hilarious is that? We are indeed living in extreme times, and we have the data to prove it!

Of course, this feeds right into the AGW alarmism, particularly because any extreme event counts as evidence of how we are living in parlous, out-of-the-ordinary times, whether hot or cold, wet or dry, flood or drought …

On a more serious level, it seems to me that this is a very important observation. Typically, we assume that the extremes are equally likely to fall anywhere in the time window. But as Fig. 2 shows, that’s not true. As a result, we incorrectly take the occurrence of recent extremes as evidence that the bounds of natural variation have recently been overstepped (e.g. “eight of the ten hottest years”, etc.).

This finding shows that we need to raise the threshold for what we are considering to be “recent extreme weather” … because even if there are no trends at all we are living in extreme times, so we should expect extreme weather.

Of course, this applies to all kinds of datasets. For example, currently we are at a low extreme in hurricanes … but is that low number actually anomalous when the math says that we live in extreme times, so extremes shouldn’t be a surprise?

In any case, I propose that we call this the “End Times Effect”, the tendency of extremes to cluster in recent times simply because the data is autocorrelated and “today” is at one end of our time window … and the corresponding tendency for people to look at those recent extremes and incorrectly assume that we are living in the end times when we are all doomed.

All the best,

w.

Usual Request. If you disagree with what someone says, please have the courtesy to quote the exact words you disagree with. This avoids misunderstandings.

 

218 Comments
observa
April 24, 2014 10:20 pm

Reminds one of looking up a street directory and pondering why the street you want is always overlapping the damn pages. Now that’s auto correlation for you producing extreme temperatures particularly in peak hour with the missus navigating.

Lloyd Martin Hendaye
April 24, 2014 10:25 pm

Suggest plotting random-recursive “auto-correlated” Markov Chains, wherein chance-and-necessity determine growth-and-change. For the record, global hedge funds have long adapted quant-model algorithms to Markov-generated series as proxies for trading volume.
As Benoit Mandelbrot noted in studying 19th Century New Orleans cotton futures, such “fractal” (fractional-geometric) patterns, self-similar on every scale, are totally deterministic yet absolutely unpredictable in detail. The same is true, of course, of Edward Lorenz’s celebrated Chaos Theory, whose “Strange Attractors” obey related protocols.
Like many features of population genetics, linguistics’ Zipf’s Law, and so forth, “statistics” is not the end but the beginning of a meta-analytical approach which puts correlation, distribution, and Standard Error (probability) in context of a far deeper mathematical reality. Among other exercises, Conway’s “cellular automata”, Group and Information Theory, high-level cryptographic systems, all dance around Emergent Order as a hyper-geometric reality over-and-above pro forma statistical emendations.

Seattle
April 24, 2014 10:26 pm

I think I may have found a mathematical explanation for this.
For a Wiener process (a random walk comprising infinitesimally small random steps), the “Arcsine laws” apply: http://en.wikipedia.org/wiki/Arcsine_laws_(Wiener_process)
Per that page, the arcsine law says that the distribution function of the location of the maximum on an interval, say [0,1], is 2 / pi * arcsin(sqrt(x)).
Differentiating that expression yields the probability density 1/(pi*sqrt(x)*sqrt(1-x)).
This yields a plot that looks quite like your histograms!
https://www.wolframalpha.com/input/?i=plot+1%2F%28pi*sqrt%28x%29*sqrt%281-x%29%29
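The same density can be drawn directly in R; this one-liner reproduces the U-shape of Figure 2 (it is the density of the standard arcsine distribution):

curve(1 / (pi * sqrt(x * (1 - x))), from = 0.001, to = 0.999,
      xlab = "fractional position of the extreme in the window", ylab = "density")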

David A
April 24, 2014 10:32 pm

davidmhoffer says:
April 24, 2014 at 8:01 pm
========================================
Thanks, and I am somewhat following. However, are not all series defined by an arbitrary start and end point? For instance, take the HadCRUT black series, from 1850 to 1840. From an eyeball perspective the extremes, low and high, are in the middle.
Yet I cannot debate the second graph of 1000 pseudo runs showing such extremes lumped at both ends. It would seem that in a truly random series the extremes would be as likely to appear anywhere, except for my earlier comment that the middle third would only be 1/2 as likely to have a minimum or maximum as both the first and last third of the series combined, as it clearly is only one third of the series vs the two thirds composing both ends.

David A
April 24, 2014 10:33 pm

Sorry, from 1850 to 1890.

charles nelson
April 24, 2014 10:34 pm

(From Wikipedia on Benford’s Law.)
In 1972, Hal Varian suggested that the law could be used to detect possible fraud in lists of socio-economic data submitted in support of public planning decisions. Based on the plausible assumption that people who make up figures tend to distribute their digits fairly uniformly, a simple comparison of first-digit frequency distribution from the data with the expected distribution according to Benford’s Law ought to show up any anomalous results.
Has anyone subjected Warmist Climate Data to the Benford Law test?
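For what it’s worth, the test itself is easy to run. Here is a small R sketch using made-up lognormal data as a stand-in for whatever dataset one wanted to check:

benford <- log10(1 + 1 / (1:9))                     # Benford's expected first-digit frequencies
first_digit <- function(v) floor(abs(v) / 10^floor(log10(abs(v))))
x <- rlnorm(10000, meanlog = 5, sdlog = 3)          # made-up positive data spanning many orders of magnitude
observed <- as.numeric(table(factor(first_digit(x), levels = 1:9))) / length(x)
round(rbind(benford, observed), 3)                  # the two rows should be broadly similar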

Cinaed Simson
April 24, 2014 10:39 pm

Willis Eschenbach says:
April 24, 2014 at 8:50 pm
Cinaed Simson says:
April 24, 2014 at 8:06 pm
First, you haven’t shown the data set is stationary – it’s simply an assumption or wild eyed guess.
Dear heavens, my friend, such unwarranted certainty. Of course I measured the mean, the trend, and the heteroskedasticity of the random data. As expected, the random data generator generates stationary data, no surprise there. And I was going to assume that, when I thought no, someone might ask me, and I’ve never checked it … so I did. Stationary.
——
Just glancing at the data, it looks like a random walk with a drift, which is known to be non-stationary.
Also, I missed the part where you indicated you were using R to do the auto-correlation calculations and the code used to generate the figures.

Seattle
April 24, 2014 10:42 pm

The arcsine law is pretty easy to use. For example, the chance of a maximum (or, equivalently, minimum) being in the first 1/3rd of the interval is
2 / pi * arcsin(sqrt(1/3)) = 39.2%
and it’s the same with the last 1/3rd of the interval, due to symmetry. The “middle” third only has a 21.6% chance (the remaining amount).
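A quick R check of that arithmetic, using the arcsine-law distribution function for the position of the maximum:

arcsine_cdf <- function(x) 2 / pi * asin(sqrt(x))   # CDF of the position of the extreme
arcsine_cdf(1/3)                       # first third:  ~0.392
arcsine_cdf(2/3) - arcsine_cdf(1/3)    # middle third: ~0.216
1 - arcsine_cdf(2/3)                   # last third:   ~0.392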

Geoff Sherrington
April 24, 2014 10:57 pm

While the general pattern derived from the statistics of red noise shows more extremes in the end bins, this is a generalisation. Can I surmise that the actual case, rather than a general or synthesised case, should be adopted for making statements about recent climate extremes?

Greg
April 24, 2014 10:59 pm

Bernie says: electronotes.netfirms.com/EN208.pdf
Excellent study. Very interesting. The few, knowledgeable commenters like you are what makes this site a gold mine.

David A
April 24, 2014 11:01 pm

I do not follow the logic in a true random series. When throwing a fair die, each of the six values 1 to 6 has the probability 1/6. Assume that each throw generates a different number for six throws. Is the 1 or the 6 any more likely to be the first or last throw?

Greg
April 24, 2014 11:06 pm

Seattle says
https://www.wolframalpha.com/input/?i=plot+1%2F%28pi*sqrt%28x%29*sqrt%281-x%29%29
Yes, I think we have a winner! Good work.

John
April 24, 2014 11:11 pm

In 1st year physics we learned about the “drunken walk” and soon understood that after n random steps on a one-dimensional line, one ended up a distance proportional to sqrt(n) from the starting point. Intuitively most people guess you would have traveled a distance of zero (on average), which is wrong. Is this not the same as saying the extremes are more likely to be at the beginning and the end of the series?
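A small R sketch of that square-root-of-n behaviour (the 500 walks of 10,000 steps are arbitrary choices):

set.seed(1)
n <- 10000
# final positions of 500 independent one-dimensional drunken walks of n steps each
finals <- replicate(500, sum(sample(c(-1, 1), n, replace = TRUE)))
sqrt(mean(finals^2))   # root-mean-square distance from the start, close to ...
sqrt(n)                # ... the theoretical sqrt(n) = 100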

Greg
April 24, 2014 11:18 pm

Cinaed Simson: “Third, the auto-correlation function is an even function, i.e….,”
Someone confusing an autocorrelated series with the autocorrelation function, or just not reading before sounding off.

Greg
April 24, 2014 11:23 pm

Geoff Sherrington says:
April 24, 2014 at 10:57 pm
While the general pattern derived from the statistics of red noise shows more extremes in the end bins, this is a generalisation. Can I surmise that the actual case, rather than a general or synthesised case, should be adopted for making statements about recent climate extremes?
====
The point is that in making statements about recent changes being “weird”, “unprecedented” or unusual, we should be making probability assessments against Seattle’s graph, not the layman’s incorrect assumption of a flat probability.
The point is to compare the actual case to the general synthetic case.

Seattle
April 24, 2014 11:28 pm

“When throwing a fair die, each of the six values 1 to 6 has the probability 1/6. Assume that each throw generates a different number for six throws. Is the one and the 6 any more likely to be the first or last throw?”
David A, you are right: if a time series works like that, then the maximum or minimum could occur anywhere within a given interval with equal probability. That kind of time series would be “white noise”.
But, for an autocorrelated “red noise” distribution, where each value is close to adjacent values, the arcsine laws apply as I mentioned above.
But which kind of power spectrum does the climate have?
To be red noise, the power would have to rise by 20 dB (100x more power) for each decrease of one decade (log scale) in frequency (i.e. 0.1x frequency). On a log-log graph of the power spectrum, the slope would be -2.
If we trust this “artist’s rendering of climate variability on all time scales” – http://www.atmos.ucla.edu/tcd/PREPRINTS/MGEGEC.pdf – it looks relatively flat like white noise.
This graph looks quite a bit more “reddish” – https://www.ipcc.ch/ipccreports/tar/wg1/446.htm
Which one is to be trusted?
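One rough way to check which regime a series is in, sketched in R with an AR(1) process (the lag-one coefficient of 0.95 is an assumption, just a convenient stand-in): above the corner frequency set by the decorrelation time, the raw periodogram of such a series falls off at roughly 1/f^2, i.e. a slope near -2 on a log-log plot.

set.seed(1)
x <- arima.sim(model = list(ar = 0.95), n = 1e5)   # assumed AR(1) stand-in for a reddish series
s <- spectrum(x, plot = FALSE)                     # raw periodogram
plot(s$freq, s$spec, log = "xy", type = "l",
     xlab = "frequency (cycles per step)", ylab = "power")   # slope near -2 at higher frequencies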

Neo
April 24, 2014 11:29 pm

Methinks that Richard Hamming and Julius von Hann might have something to say about this.
Most of the problems that cause the ends to look the worst are related to the ends of the window not syncing with the data set.

John Fleming
April 24, 2014 11:41 pm

My grandchildren may never know what hurricanes are….

Seattle
April 24, 2014 11:54 pm

So whenever I look for power spectral density graphs of temperature from different sources, I see similarities to red or pink noise, but basically never blue noise. So I think it’s quite plausible that the effect pointed out in this article is applicable to temperature time series.
I’m not sure about other kinds of time series.

Steve C
April 24, 2014 11:58 pm

A passing linguistic thought: rather than calling it the “Extreme Times Effect”, I’d suggest that it be called the “End Times Effect”. It just seems a shade more appropriate, considering the ever-popular use of the phenomenon to “prove” that we are now in the End Times and are therefore All Doomed!

Greg
April 25, 2014 12:13 am

Seattle: “But which kind of power spectrum does the climate have?”
This assumption that red noise (which is the integral of white noise) is the baseline reference against which all climate phenomena should be measured is erroneous. It is often used to suggest that long-period peaks in spectra are not “statistically significant”.
Since many climate phenomena are constrained by negative feedbacks (like the Planck feedback for temperature), the red noise assumption is not valid for long-term deviations, which tend to be less than what would be expected under a red noise model. It may work quite well for periods up to a year or two.
http://climategrog.wordpress.com/?attachment_id=897
The end effect will still be there but may be less pronounced than the simplistic red noise model.

April 25, 2014 12:24 am

Draw a sine curve with long period. Give it a completely random phase. Now that’s a beautiful stationary and highly autocorrelated time series for you.
Observe it through a short independently chosen window. What do you see? A line going up, or a line going down.
If the peaks mean a global baking Saharan desert and the valleys are massive ice-ages, then seeing the series go up and up and up could be cause for concern if you don’t like the heat.
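A minimal R sketch of that thought experiment (the period of 50 window-lengths and the window of 200 points are arbitrary choices):

set.seed(1)
t <- seq(0, 1, length.out = 200)                   # a short observation window
phase <- runif(1, 0, 2 * pi)                       # uniformly random phase
plot(t, sin(2 * pi * t / 50 + phase), type = "l")  # period of 50 window-lengths: nearly a straight line up or down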

Greg
April 25, 2014 12:27 am

“When throwing a fair die, each of the six values 1 to 6 has the probability 1/6……”
If you want to use the die as your random number generator, you need to build your time series from the cumulative sum of all the throws. Then if you plot the frequency distribution of all the values in that time series, you should (on average) see something like what Willis provided using R.
Go and log a million dice throws, add them cumulatively to get the time series, do the distribution plot, and let us know if you find something different ;).
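A minimal R sketch of that suggestion; note that the mean throw of 3.5 is subtracted here (an added tweak) so the cumulative sum is a drift-free random walk rather than a steadily rising ramp:

set.seed(1)
throws <- sample(1:6, 2e6, replace = TRUE)      # two million fair-die throws
series <- cumsum(throws - 3.5)                  # cumulative sum with the mean removed
chunks <- matrix(series, nrow = 2000)           # sequential 2,000-throw chunks
hist(c(apply(chunks, 2, which.max), apply(chunks, 2, which.min)),
     breaks = seq(0, 2000, by = 100))           # the extremes again pile up at the ends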

Greg
April 25, 2014 12:39 am

https://www.ipcc.ch/ipccreports/tar/wg1/446.htm
Bearing in mind this is log-log it should be perfectly straight for red noise.
It looks very straight from 2 to 10 years. Then breaks to a very different slope for 10-100 and is basically flat beyond 100y.
Bear in mind that much of this is quite simply artificially injected ‘red’ or other noise that is pumped into the models to make the output look a bit more ‘climate-like’. It is a total artifice, not a result of the physical model itself.
This is done to disguise the fact the models are doing little more than adding a few wiggles to the exaggerated CO2 warming they are programmed to produce.
They then do hundreds of runs of dozens of models, take the average (which removes most of the injected noise by averaging it out) and say LOOK, WE TOLD YOU SO! Our super computer models that cost billions to produce and a generation of dedicated scientists working round the clock PROVE that within 95% confidence it’s all caused by CO2.
We must act NOW before it’s too late ( when everyone realises it’s a scam).