Guest Post by Willis Eschenbach
I read a curious statement on the web yesterday, and I don’t remember where. If the author wishes to claim priority, here’s your chance. The author said (paraphrasing):
If you’re looking at any given time window on an autocorrelated time series, the extreme values are more likely to be at the beginning and the end of the time window.
“Autocorrelation” is a way of measuring how likely it is that tomorrow will be like today. For example, daily mean temperatures are highly auto-correlated. If it’s below freezing today, it’s much more likely to be below freezing tomorrow than it is to be sweltering hot tomorrow, and vice-versa.
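For anyone who wants to put a number on that, here is a minimal sketch of estimating the lag-1 autocorrelation of a made-up temperature-like series (the series itself is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up "temperature-like" series: a random walk, hence highly autocorrelated
temps = np.cumsum(rng.normal(size=365)) + 10.0

# Lag-1 autocorrelation: how strongly each day's value predicts the next day's
lag1 = np.corrcoef(temps[:-1], temps[1:])[0, 1]
print(f"estimated lag-1 autocorrelation: {lag1:.2f}")
```

A value near 1 means tomorrow is very likely to look like today; a value near 0 means the series is essentially white noise.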
Anyhow, being a suspicious fellow, I thought “I wonder if that’s true …”. But I filed it away, thinking, I know that’s an important insight if it’s true … I just don’t know why …
Last night, I burst out laughing when I realized why it would be important if it were true … but I still didn’t know if that was the case. So today, I did the math.
The easiest way to test such a statement is to do what’s called a “Monte Carlo” analysis. You make up a large number of pseudo-random datasets which have an autocorrelation structure similar to some natural autocorrelated dataset. This highly autocorrelated pseudo-random data is often called “red noise”. Because it was handy, I used the HadCRUT global surface air temperature dataset as my autocorrelation template. Figure 1 shows a few “red noise” autocorrelated datasets in color, along with the HadCRUT data in black for comparison.
Figure 1. HadCRUT3 monthly global mean surface air temperature anomalies (black), after removal of seasonal (annual) swings. Cyan and red show two “red noise” (autocorrelated) random datasets.
The HadCRUT3 dataset is about 2,000 months long. So I generated a very long string (two million data points) as a single continuous red noise “pseudo-temperature” dataset. Of course, this two-million-point dataset is stationary, meaning that it has no trend over time and that its standard deviation is stable over time.
Then I chopped that dataset into sequential 2,000-data-point chunks and looked at each chunk to see where the maximum and the minimum data points occurred within it. If the minimum value was the third data point, I recorded the number “3”; correspondingly, if the maximum was the next-to-last data point, it was recorded as “1999”.
Then I made a histogram showing, across all of those chunks, how many of the extreme values fell in the first hundred data points, the second hundred, and so on. Figure 2 shows that result. Individual runs of a thousand chunks vary, but the general form is always the same.
Figure 2. Histogram of the location (from 1 to 2000) of the extreme values in the 2,000 datapoint chunks of “red noise” pseudodata.
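For readers who want to try this at home, here is a minimal sketch of the procedure, using a simple AR(1) generator for the red noise; the lag-1 coefficient of 0.9 is an illustrative assumption, not a value fitted to HadCRUT3:

```python
import numpy as np

rng = np.random.default_rng(42)

# One long AR(1) "red noise" series (phi = 0.9 is illustrative, not fitted to HadCRUT3).
# The plain Python loop takes a few seconds for two million points.
n_total, chunk_len, phi = 2_000_000, 2_000, 0.9
noise = rng.normal(size=n_total)
red = np.empty(n_total)
red[0] = noise[0]
for i in range(1, n_total):
    red[i] = phi * red[i - 1] + noise[i]

# Chop into sequential 2,000-point chunks and note where the min and max of each chunk fall
chunks = red.reshape(-1, chunk_len)
positions = np.concatenate([chunks.argmin(axis=1), chunks.argmax(axis=1)])

# Histogram the positions in bins of 100 points, as in Figure 2
counts, _ = np.histogram(positions, bins=np.arange(0, chunk_len + 1, 100))
print(counts)  # typically U-shaped: extremes pile up at both ends of each window
```

Any generator with a comparable autocorrelation structure should give the same U-shaped histogram; the AR(1) form is just the simplest choice.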
So dang, the unknown author was perfectly correct. If you take a random window on a highly autocorrelated “red noise” dataset, the extreme values (minimums and maximums) are indeed more likely, in fact twice as likely, to be at the start and the end of your window rather than anywhere in the middle.
I’m sure you can see where this is going … you know all of those claims about how eight out of the last ten years have been extremely warm? And about how we’re having extreme numbers of storms and extreme weather of all kinds?
That’s why I busted out laughing. If you say “we are living today in extreme, unprecedented times”, mathematically you are likely to be right, even if there is no trend at all, purely because the data is autocorrelated and “today” is at one end of our time window!
How hilarious is that? We are indeed living in extreme times, and we have the data to prove it!
Of course, this feeds right into the AGW alarmism, particularly because any extreme event counts as evidence of how we are living in parlous, out-of-the-ordinary times, whether hot or cold, wet or dry, flood or drought …
On a more serious level, it seems to me that this is a very important observation. Typically, we consider the odds of being in extreme times to be equal across the time window. But as Fig. 2 shows, that’s not true. As a result, we incorrectly consider the occurrence of recent extremes as evidence that the bounds of natural variation have recently been overstepped (e.g. “eight of the ten hottest years”, etc.).
This finding shows that we need to raise the threshold for what we consider to be “recent extreme weather” … because even if there are no trends at all, we are living in extreme times, so we should expect extreme weather.
Of course, this applies to all kinds of datasets. For example, currently we are at a low extreme in hurricanes … but is that low number actually anomalous when the math says that we live in extreme times, so extremes shouldn’t be a surprise?
In any case, I propose that we call this the “End Times Effect”, the tendency of extremes to cluster in recent times simply because the data is autocorrelated and “today” is at one end of our time window … and the corresponding tendency for people to look at those recent extremes and incorrectly assume that we are living in the end times when we are all doomed.
All the best,
w.
Usual Request. If you disagree with what someone says, please have the courtesy to quote the exact words you disagree with. This avoids misunderstandings.
Bernie
Yes, I am an engineer (engineering physics), so I understood your term.
Well, then you’re the expert.
I still do not see why you refer to an exponential decay. You write out “scalar = 1/(|w|^B)”, which would be a correct frequency weighting for red noise if B=1 (AND you are apparently thinking of a cosine transform, and not an FFT as you say).
In my field the production of fractal red noise is ubiquitous for simulating uncertainty. B is a non-integer (>1). It all depends on what you wish to use.
As for the cosine transform, that was a red herring from when I was trying to get a handle on what was going on.
Because w is the variable here, an exponential series would be, for example, B^w, not w^B.
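For reference, here is a rough sketch of the FFT-weighting approach being discussed, i.e. scaling the spectrum of white noise by 1/|w|^B and transforming back; the length and the exponent B = 1.5 are purely illustrative, not values from this exchange:

```python
import numpy as np

rng = np.random.default_rng(0)
n, B = 4096, 1.5               # B non-integer and > 1, purely illustrative

white = rng.normal(size=n)
spectrum = np.fft.rfft(white)
freqs = np.fft.rfftfreq(n)
freqs[0] = freqs[1]            # dodge the division by zero at DC
spectrum *= 1.0 / np.abs(freqs) ** B   # the 1/|w|^B frequency weighting
red = np.fft.irfft(spectrum, n)        # back to the time domain: "fractal" red noise
```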
Yes, I stand corrected. As I say, it seemed like a good way to get across the general idea, but it was inaccurate.
But again, do you agree that the symmetric histogram distribution of extremes is down to small sample windows likely sampling local drift?
Thanks for your time.
cd asks: “But again, do you agree that the symmetric histogram distribution of extremes is down to small sample windows likely sampling local drift?”
Essentially – YES! I think this was the understanding Willis had too, as did many other commenters.
What remains to be agreed is the notion of a “small sample window”. With red noise there is always a lower frequency of larger amplitude so I suspect that the window size does not matter – just the correlation properties. I am working on this.
Best wishes
Bernie
Bernie
I think this was the understanding Willis had too
This should have been stipulated more explicitly in the post as it was said that:
…any given time window…
This is what threw me, as this isn’t true for stationary autocorrelated series – is it? For example, I’ll try to be clearer here: if one were to use large windows (with lengths greater than the range of the autocorrelation function for the series), then I can’t imagine how the type of distribution shown could be reproduced. By range, I mean that for typical autocorrelated stationary series the autocorrelation decreases “exponentially” (you know what I mean) before stabilising about 0 for all lag distances thereafter.
cd-
I think it is pretty much fractal, independent of length. I just completed a program similar to my previous displays of 10 red signals, but here with one set of 10 length-100 signals and another set of 10 length-16,000 signals:
http://electronotes.netfirms.com/redguys-SL.jpg
The figure shows the same clustering at the ends as we have been observing, as in the histograms Willis posted. I considered a max/min to be at the end if it was within 0-10% or 90%-100%. By chance, it should have been 20% (4 of the possible 20), but it was 40% to 60% in the examples. For the short sequences, this was 8 of 20 for the figure (five other runs gave 11, 8, 4, 6, and 8 of the 20 possible extremes). For the long sequences, the figure shows 12 of 20 (in the five repeats, this was 8, 12, 6, 5, and 11).
This convinces me that window length does not matter.
I think it is just what red noise is.
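A rough sketch of that kind of check (not the actual program behind the figure), assuming simple random-walk red noise and the 0–10% / 90–100% end zones described above; the exact counts will of course vary from run to run:

```python
import numpy as np

rng = np.random.default_rng(1)

def end_fraction(length, n_signals=10, end_frac=0.10):
    """Fraction of min/max positions landing in the first or last 10% of the window."""
    hits, total = 0, 0
    for _ in range(n_signals):
        red = np.cumsum(rng.normal(size=length))   # random-walk red noise
        for idx in (int(red.argmin()), int(red.argmax())):
            total += 1
            if idx < end_frac * length or idx >= (1.0 - end_frac) * length:
                hits += 1
    return hits / total

# Both are usually well above the 20% expected by pure chance
print(end_fraction(100), end_fraction(16_000))
```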
Bernie
Bernie
You’ve done one of two things (correct me if I’m wrong):
1:
Your entire series is plotted in each plot? If so, then you have drift in most (a trend across the entire series – they’re not stationary). If you were to analyse the autocorrelation for these sets, I suspect they wouldn’t stabilise about 0 (or fit a regression line; I suspect that the t magnitude would be => 2; p = >0.05).
2:
You have sampled a larger series and plotted these sub-samples.
Now before you say that the effect is the same, can I say that you first need to derive the autocorrelation for your entire series to ensure the series is stationary. If the autocorrelation function does not converge on 0, and thereafter remain relatively constant, then your original series isn’t stationary! Which is one of the conditions of the experiment. If the series is stationary, then you need to run the experiment using sample windows greater than the range (the lag distance at which the correlation converges on 0).
To do this you need to:
1) create a stationary series
2) determine the series’ autocorrelation function
3) ensure that the series is stationary – the correlation converges on 0 (at the lag which is the range) and thereafter remains relatively constant (for all lags above the range). If not, then reject and go to 1.
4) sub-sample the series using windows with lengths greater than the range of the autocorrelation function (a rough sketch of these steps is given below).
However, if you did case 2 then I’m baffled and will, at some time in the future, repeat what I suggest.
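For what it’s worth, here is a minimal sketch of the four steps above, with an AR(1) process standing in for the stationary series and an illustrative cutoff of 0.05 for “converges on 0”; none of these specific choices come from the thread itself:

```python
import numpy as np

rng = np.random.default_rng(7)

# 1) a stationary autocorrelated series: AR(1) with |phi| < 1 is stationary by construction
phi, n = 0.9, 200_000
x = np.empty(n)
x[0] = rng.normal()
for i in range(1, n):
    x[i] = phi * x[i - 1] + rng.normal()

# 2) and 3) estimate the autocorrelation function and find its "range"
def acf(series, max_lag):
    s = series - series.mean()
    var = np.dot(s, s)
    return np.array([np.dot(s[:-k], s[k:]) / var for k in range(1, max_lag + 1)])

rho = acf(x, 200)
range_lag = int(np.argmax(rho < 0.05)) + 1   # first lag where the correlation is roughly 0

# 4) sub-sample with windows well beyond the range, and note where the extremes fall
window = 10 * range_lag
chunks = x[: (n // window) * window].reshape(-1, window)
positions = np.concatenate([chunks.argmin(axis=1), chunks.argmax(axis=1)]) / window
print(range_lag, window, np.histogram(positions, bins=10)[0])
```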
Oops:
(or fit a regression line; I suspect that the t magnitude would be => 2; p = >0.05).
Should be:
(alternatively fit a regression line, which I suspect would have a slope coefficient different to 0 with a t value => 2 (or 0.05).
Aargh:
(alternatively fit a regression line, which I suspect would have a slope coefficient different to 0 with a t value => 2 (or 0.05).
to
(alternatively fit a regression line, which I suspect would have a slope coefficient different to 0 with a t value => 2 (or p < 0.05).
cd –
My illustrations are simply red noise sequences of potentially indefinite durations, which I happen to start at n=0 and end at n=15999. Each of the 10 examples plotted is thus 16000 long and the length 100 examples are the same sequences from sample 4000 to 4099. The samples of red noise (xr) are simply obtained iteratively from the original white noise (xw) as: xr(m) = xr(m-1) + xw(m) [discrete integration]. I happened to start the sequences with xr(0)=xw(0) but this just adds a particular DC offset that does not affect the indices of min/max. (We could equally well have used your FFT method to get xr). The results seem to me to be fractal.
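In code, that iterative construction is just a cumulative sum; a minimal sketch (the lengths match the examples described above, everything else is illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

xw = rng.normal(size=16_000)     # white noise
xr = np.cumsum(xw)               # xr(m) = xr(m-1) + xw(m), with xr(0) = xw(0): discrete integration

long_example = xr                # one length-16000 signal
short_example = xr[4_000:4_100]  # the same sequence from sample 4000 to 4099
print(long_example.argmin(), long_example.argmax(),
      short_example.argmin(), short_example.argmax())
```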
These are classic “random walks” and while you believe you SEE drifts, it is only a matter of waiting longer to find a return to zero (proven 100 years ago or so). Remember that the “drunkard” finds his way home with probability 1 with a finite number of iterations (may be very large). See my length 16000 examples, second column, middle, which happens to be an outstanding example. Yes – it is counter-intuitive.
You would presumably not attempt to fit, in a meaningful way, a linear regression to white noise. Neither should you attempt to fit a linear regression to red noise – ultimately the slope is zero.
[ In fact, in a larger context, no polynomial (of which a first-order linear regression is an example) should be used to “model” a signal. Polynomials may be useful when applied to local segments of signals for interpolation and/or smoothing purposes (e.g. linear interpolation of tables of trig function values). But polynomial amplitudes all run to + infinity or to – infinity non-locally (they are vertical), and are inherently unsuited to signals that are infinite in time (even if zero at the ends), finite in amplitude, and thus essentially horizontal. For example, an apparent first-order upslope in global temperature could only be temporary. ]
I think that the one thing we believe, at least empirically, is that the extreme values trend strongly to the ends of any window chosen, due to correlation of successive samples. Perfect insight, as usual, is elusive!
Bernie
Bernie
My job involves producing commercial software. I am the principal designer and programmer of most of our statistical tools used for spatial analysis and modelling, ranging from Kriging (you may know it as a spatial linear regression or GPR), multivariate regression, principal components, stochastic modelling, highly optimised systems for solving large linear and non-linear systems (of the order of 100,000s) to name but a few. So please don’t assume just because I don’t express something that somehow I’m not as up-to-speed as you are. When I say there is likely to be drift in your series – there probably is! But again I don’t have the data.
Neither should you attempt to fit a linear regression to red noise – ultimately the slope is zero.
That’s slightly patronising and not what I said (I’m talking about local drift). That said, what you say will only stand if the series you’re testing is stationary; it doesn’t matter if the process will ULTIMATELY produce a stationary process given enough time to evolve as such.
These are classic “random walks” and while you believe you SEE drifts, it is only a matter of waiting longer to find a return to zero (proven 100 years ago or so).
You need to create a stationary series, not one that given enough time will prove to be stationary – remember, Willis is talking about sub-sampling a stationary series, not sub-sampling a sample of one that will eventually become stationary. By the way, you’re the guy with the series; just for fun, fit a simple regression line to all YOUR series.
Until you actually create only stationary series (not a portion of something that will undoubtedly become stationary), and then run the test on sub-samples from them, you can’t repeat the experiment.
cd-
Apologies if I stepped on toes – it’s difficult to guess what another person knows or does not know from a few online comments.
When you did not seem to recognize that an auto-correlation was a self-cross-correlation and thought them to be “algorithms”, I had to make an assumption. Then you berated me (twice at least) for not obviously understanding your FFT red process when this was because YOU misled me for two days when you yourself were confusing a “decaying exponential” with a series of reciprocals. Really!
Anyway, this thread is too old now.
Bernie
Apologies if I stepped on toes
Well, please don’t put words into my mouth; you did this with regard to fitting trends to local drift, which you then expanded to suggest I was talking about global trends for a stationary series. And then here again. I guess the problem here is that we’re talking at cross-purposes most of the time.
When you did not seem to recognize that an auto-correlation was a self-cross-correlation and thought them to be “algorithms”, I had to make an assumption.
First of all, you never SAID that; you said a cross-correlation WAS THE SAME AS an autocorrelation – it was just terminology! If you are saying that this holds in the general case of cross-correlation, then all cross-correlations must have the same properties as all autocorrelations for finite, discrete series. As I explained to you, this is not true:
1) To begin with, an autocorrelation is always symmetric, something that cannot be said of the cross-correlation in the general case.
2) For a stationary series, the normalized auto-correlation is equal to its normalized autocovariance, which cannot be assumed for the generalised cross-correlation.
And because of this you cannot just make the blanket statement that they are one and the same – BTW, I’m sure you know all this and this wasn’t what you meant by it being just “terminology”; I understood you to be saying that autocorrelation is just a special case of cross-correlation – I wasn’t disagreeing with that, I was disagreeing with…
The process of breaking a length 1000 random sequence into two length 100 sub-segments and correlating these, as I described, is a cross-correlation. But that is just terminology.
The use of two length-100 segments – by “segments” did you mean lag distance? This seemed very confused to me, probably because you use terminology I’m not familiar with.
No, I’ve described an autocorrelation. Your bivariate statistic comes from the same series. Cross-correlation samples two different series.
Now tell me, where did I say that autocorrelation was not a special instance of cross-correlation?
Then you berated me (twice at least) for not obviously understanding your FFT red process when this was because YOU misled me for two days when you yourself were confusing a “decaying exponential“ with a series of reciprocals.
Yes, I took that on the chin and I’m sorry for berating you (I’ve already said this). I was using the term far too liberally, which you corrected me on. Again…I stand corrected.
Bernie
For autocorrelation: by “special case”, I mean the cross-correlation of a series with itself; and with regard to symmetry, this holds for a finite series of real values.
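A quick illustrative check of that symmetry claim for a finite real series, using np.correlate as the cross-correlation:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=50)
y = rng.normal(size=50)

auto = np.correlate(x, x, mode="full")    # cross-correlation of a series with itself
cross = np.correlate(x, y, mode="full")   # cross-correlation of two different series

print(np.allclose(auto, auto[::-1]))      # True: the autocorrelation of a real series is symmetric
print(np.allclose(cross, cross[::-1]))    # generally False for two different series
```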
Willis, I found this by accident and it may support the point you made here.
This guy started a Twitter account in 2008:
twitter.com/jpbimmer
He has only posted one tweet per year, each tweet being on the same day of the same month. So what’s the big deal? Look at the number of retweets for each tweet. The very first and very last tweets have more retweets than the ones in between. The gap has narrowed since I last checked, though.