**Guest Post by Willis Eschenbach**

I read a curious statement on the web yesterday, and I don’t remember where. If the author wishes to claim priority, here’s your chance. The author said (paraphrasing):

If you’re looking at any given time window on an autocorrelated time series, the extreme values are more likely to be at the beginning and the end of the time window.

“Autocorrelation” is a way of measuring how likely it is that tomorrow will be like today. For example, daily mean temperatures are highly auto-correlated. If it’s below freezing today, it’s much more likely to be below freezing tomorrow than it is to be sweltering hot tomorrow, and vice-versa.
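One common way to put a number on "tomorrow will be like today" is the lag-1 autocorrelation: the correlation between each value and the one that follows it. The sketch below is only an illustration; the two synthetic series, the `phi = 0.9` setting and the helper names are assumptions, not anything taken from the temperature data.

```python
import random


def lag1_autocorrelation(series):
    """Sample correlation between each value and the one that follows it."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
    return cov / var


def white_noise(n, rng):
    """Independent draws: today tells you nothing about tomorrow."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def ar1_noise(n, phi, rng):
    """Each value is phi times the previous one plus a fresh random shock."""
    x = rng.gauss(0.0, 1.0)
    out = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


rng = random.Random(0)
r_white = lag1_autocorrelation(white_noise(5000, rng))
r_red = lag1_autocorrelation(ar1_noise(5000, 0.9, rng))  # phi = 0.9 is an assumed value
print(f"white noise: {r_white:+.2f}   AR(1), phi=0.9: {r_red:+.2f}")
```

A value near zero means consecutive points are unrelated; a value near one means each point sits close to its predecessor.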

Anyhow, being a suspicious fellow, I thought, *“I wonder if that’s true …”*. But I filed it away, thinking, *I know that’s an important insight if it’s true … I just don’t know why …*

Last night, I burst out laughing when I realized why it would be important if it were true … but I still didn’t know if that was the case. So today, I did the math.

The easiest way to test such a statement is to do what’s called a “Monte Carlo” analysis. You make up a large number of pseudo-random datasets which have an autocorrelation structure similar to some natural autocorrelated dataset. This highly autocorrelated pseudo-random data is often called “red noise”. Because it was handy, I used the HadCRUT global surface air temperature dataset as my autocorrelation template. Figure 1 shows a few “red noise” autocorrelated datasets in color, along with the HadCRUT data in black for comparison.

*Figure 1. HadCRUT3 monthly global mean surface air temperature anomalies (black), after removal of seasonal (annual) swings. Cyan and red show two “red noise” (autocorrelated) random datasets.*

The HadCRUT3 dataset is about 2,000 months long. So I generated a very long string (two million data points) as a single continuous long red noise “pseudo-temperature” dataset. Of course, this two million point dataset is stationary, meaning that it has no trend over time, and that the standard deviation is stable over time.

Then I chopped that dataset into sequential 2,000 data-point chunks, and I looked at each 2,000-point chunk to see where the maximum and the minimum data points occurred in that 2,000 data-point chunk itself. If the minimum value was the third data point, I put down the number as “3”, and correspondingly if the maximum was in the next-to-last datapoint it would be recorded as “1999”.

Then, I made a histogram showing in total out of all of those chunks, how many of the extreme values were in the first hundred data points, the second hundred points, and so on. Figure 2 shows that result. Individual runs of a thousand vary, but the general form is always the same.

*Figure 2. Histogram of the location (from 1 to 2000) of the extreme values in the 2,000 datapoint chunks of “red noise” pseudodata.*
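The procedure described above can be sketched in a few lines. This is a rough stand-in rather than the actual analysis: it assumes a plain AR(1) process with `PHI = 0.999` instead of a model fitted to HadCRUT3, and it uses far fewer points (400 chunks of 500) so that it runs quickly. All names and constants are illustrative.

```python
import math
import random


def red_noise(n, phi, rng):
    """AR(1) series: x[t] = phi * x[t-1] + white noise. No trend, stable variance."""
    # Start from the stationary distribution so there is no warm-up transient.
    x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi * phi))
    out = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def extreme_positions(series, chunk_len):
    """Index (0-based, within the chunk) of the max and the min of each chunk."""
    positions = []
    for start in range(0, len(series) - chunk_len + 1, chunk_len):
        chunk = series[start:start + chunk_len]
        positions.append(chunk.index(max(chunk)))
        positions.append(chunk.index(min(chunk)))
    return positions


def histogram(positions, chunk_len, n_bins):
    """Count how many extremes fall in each equal-width slice of the window."""
    counts = [0] * n_bins
    for p in positions:
        counts[p * n_bins // chunk_len] += 1
    return counts


CHUNK_LEN, N_CHUNKS, N_BINS, PHI = 500, 400, 10, 0.999   # all assumed values
rng = random.Random(42)
series = red_noise(CHUNK_LEN * N_CHUNKS, PHI, rng)
counts = histogram(extreme_positions(series, CHUNK_LEN), CHUNK_LEN, N_BINS)
for i, c in enumerate(counts):
    print(f"bin {i:2d}: {'#' * (c // 4)} {c}")
```

With `PHI = 0` the series is white noise and the positions should come out roughly uniform, so the size of any pile-up at the ends depends on how strongly autocorrelated the assumed process is.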

So dang, the unknown author was perfectly correct. If you take a random window on a highly autocorrelated “red noise” dataset, the extreme values (minimums and maximums) are indeed more likely, in fact twice as likely, to be at the start and the end of your window rather than anywhere in the middle.

I’m sure you can see where this is going … you know all of those claims about how eight out of the last ten years have been extremely warm? And about how we’re having extreme numbers of storms and extreme weather of all kinds?

That’s why I busted out laughing. If you say “we are living today in extreme, unprecedented times”, mathematically you are likely to be right, *even if there is no trend at all*, **purely because the data is autocorrelated and “today” is at one end of our time window!**

How hilarious is that? We are indeed living in extreme times, and we have the data to prove it!

Of course, this feeds right into the AGW alarmism, particularly because any extreme event counts as evidence of how we are living in parlous, out-of-the-ordinary times, whether hot or cold, wet or dry, flood or drought …

On a more serious level, it seems to me that this is a very important observation. Typically, we consider the odds of being in extreme times to be equal across the time window. But as Fig. 2 shows, that’s not true. As a result, **we incorrectly consider the occurrence of recent extremes as evidence that the bounds of natural variation have recently been overstepped** (e.g. “eight of the ten hottest years”, etc.).

This finding shows that we need to raise the threshold for what we are considering to be “recent extreme weather” … because even if there are no trends at all we are living in extreme times, so we should expect extreme weather.

Of course, this applies to all kinds of datasets. For example, currently we are at a low extreme in hurricanes … but is that low number actually anomalous when the math says that we live in extreme times, so extremes shouldn’t be a surprise?

In any case, I propose that we call this the “End Times Effect”, the tendency of extremes to cluster in recent times simply because the data is autocorrelated and “today” is at one end of our time window … and the corresponding tendency for people to look at those recent extremes and incorrectly assume that we are living in the end times when we are all doomed.

All the best,

w.

**Usual Request.** If you disagree with what someone says, please have the courtesy to quote the exact words you disagree with. This avoids misunderstandings.

Interesting. Also, how likely would it be that you would want to go into a career in climatology if you believed the climate isn’t changing much and won’t until a few thousand years after you retire? What would you write your thesis on? What would you do every day?

I think this goes to what Lindzen says – one would expect our times to be warmest in a warming climate.

So…any idea why this happens? It seems counter intuitive to say the least

What’s it they say – “So nat’ralists observe, a flea

Hath smaller fleas that on him prey;

And these have smaller fleas to bite ’em.

And so proceeds Ad infinitum.”

The same is true of landscape “hills hath smaller hills on them and these in turn have smaller hills … ad infinitum”.

And the same is true of red/pink noise … the small undulations we see from year to year are just small hillocks on the larger decadal variations, and those in turn are just pimples on the centuries … and when we get to the millennium, those are just small fluctuations on the interglacial, then the epochs.

sombeach!

The statement I had heard before….just never would have connected it to this!

This seems to be an example of Benford’s distribution, or Benford’s Law as it is sometimes called. If you take, say, Bill Clinton’s tax forms, or any of hundreds of such data sets, the digit 1 will occur most frequently as the leading digit of the numbers in the data set and 9 will be the least frequent. It is why, in the old days, the first several pages of a book of log tables got worn out and dog-eared.

http://en.wikipedia.org/wiki/Benford%27s_law
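For readers who have not met Benford’s Law, the expected share of leading digit d is log10(1 + 1/d). The sketch below compares that formula with the leading digits of the powers of two; that dataset is an assumption chosen because it is easy to generate, and it is not one of the examples mentioned in the comment.

```python
import math


def benford_probability(digit):
    """Benford's law: P(first digit = d) = log10(1 + 1/d), for d = 1..9."""
    return math.log10(1.0 + 1.0 / digit)


def first_digit(n):
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])


# Illustrative dataset (an assumption, not from the comment): 2**1 .. 2**1000.
counts = {d: 0 for d in range(1, 10)}
for k in range(1, 1001):
    counts[first_digit(2 ** k)] += 1

for d in range(1, 10):
    print(f"{d}: observed {counts[d] / 1000:.3f}   Benford {benford_probability(d):.3f}")
```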

weird stuff

By the way – I believe this was the article that discussed the tendency.

At What Time of Day do Daily Extreme Near-Surface Wind Speeds Occur?

Robert Fajber,1 Adam H. Monahan,1 and William J. Merryfield2

http://journals.ametsoc.org/doi/abs/10.1175/JCLI-D-13-00286.1?af=R&

That was very well explained Willis. Thanks.

Making much ado about many of the years within the most recent string of years being near the recent extremes was one of the first disingenuous tactics of the CAGW alarmists. Even when warming stops, they can continue that scam for many years to come.

When I grow up, I wanna be a statistician. Then I won’t have to tell my Mom I’m a piano player in a whorehouse (kudos to HSTruman).

You, Willis, are a man among men!

Could this be a manifestation of what in some circles is referred to as “the trend’s your friend”?

I suspect that the explanation might be as simple as follows: a) in a dataset such as you describe, it is generally true that there will be long-term variations with period longer than the time period of the dataset. That is, a Fourier analysis of the “full” data series (i.e. the data before a chunk was cut out) would not be band-limited to the period of the sample. b) When you cut a chunk from a long-time-period Fourier component, there is a good chance that you will cut a chunk that is either increasing or decreasing throughout the chunk. When that happens, the end-points of the chunk will be extrema relative to all other points in the chunk.

Sorry – not as simple to explain as I had hoped. A drawing would be easier.

Thanks for sharing your findings. Very relevant to many disciplines, but particularly in recent and current climate discussions.

Gary Bucher’s reference is exactly on-point. Thanks.

Willis: this is another very relevant and surprising observation from your fertile mind. I enjoy your work very much.

relative of Benford’s law

http://en.wikipedia.org/wiki/Benford%27s_law

Benford’s law may be just the tool to reveal fiddled data.

I disagree with the suggestion that this is related to Benford’s law.

You don’t even need to do a Monte Carlo experiment to see why this is the case. Draw a parabola. Now pick a random interval on the x-axis. No matter what interval you pick, at least one endpoint of that interval will be an extreme (if the vertex is not in your interval, then both endpoints will be extremes).

Realize any functional relationship that goes up, down, or both, will have subsets of that relationship that are somewhat parabolic in shape.

So, yeah, the endpoints tend to be extremes.
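The parabola argument is easy to check numerically. The sketch below assumes y = x², random intervals drawn from [-10, 10] and a 101-point grid; these are all arbitrary choices made for illustration.

```python
import random


def extreme_indices(values):
    """Indices of the largest and smallest entries."""
    i_max = max(range(len(values)), key=values.__getitem__)
    i_min = min(range(len(values)), key=values.__getitem__)
    return i_max, i_min


def sample_parabola(a, b, n_points=101):
    """Evaluate y = x**2 on an evenly spaced grid from a to b."""
    step = (b - a) / (n_points - 1)
    return [(a + i * step) ** 2 for i in range(n_points)]


rng = random.Random(1)
n_trials = 2000
max_at_end = both_at_ends = 0
for _ in range(n_trials):
    a, b = sorted(rng.uniform(-10.0, 10.0) for _ in range(2))
    i_max, i_min = extreme_indices(sample_parabola(a, b))
    ends = (0, 100)
    max_at_end += i_max in ends                        # highest point at an endpoint
    both_at_ends += (i_max in ends) and (i_min in ends)  # vertex outside the interval

print(f"maximum at an endpoint: {max_at_end / n_trials:.1%}")
print(f"both extremes at endpoints: {both_at_ends / n_trials:.1%}")
```

The maximum should land on an endpoint in every trial, because a parabola that opens upward is highest wherever |x| is largest; the minimum joins it at the other endpoint whenever the vertex falls outside the interval.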

Michael D says:

April 24, 2014 at 4:38 pm

Not at all. I thought your explanation was clear. I’m not sure if it’s right, certainly sounds reasonable, but either way it gives me something to grab onto. Thanks. 🙂

just providing another laugh:

24 April: Bloomberg: Julie Bykowicz: Steyer Nets $10,050 for $100 Million Climate Super-PAC

Billionaire Tom Steyer is trying to enlist other wealthy donors in a $100 million climate-themed political organization, pledging at least half from himself.

So far, he’s landed one $10,000 check.

Mitchell Berger, a Fort Lauderdale, Florida, lawyer and top Democratic fundraiser, was the lone named donor to NextGen Climate Action Committee in the first three months of the year, a U.S. Federal Election Commission filing shows…

The report notes another $50 in contributions so small that they didn’t need to be itemized.

“Well, if I’m the only donor, I guess it won’t be the last time I’m a donor,” said Berger, chuckling, in a telephone interview. “Although I certainly hope that I’m joined by others at some point.” …

Berger has spent much of his adult life raising political money and has worked for decades with former Vice President Al Gore, another advocate for addressing climate change. His assessment of Steyer’s goal of securing $50 million from others: “It’s not going to be easy.” …

The donor compares the climate issue to the Catholic Church’s condemnation of Galileo in the early 1600s after the astronomer disputed its pronouncement that the Sun orbits the Earth.

“Things that will appear to be obvious to us in 100 years are not as obvious now,” Berger said. He said he admires Steyer’s goal “to create an undercurrent on climate where it’s possible for politicians to say the Earth travels around the Sun without being excommunicated.”…

Steyer, a retired investor who lives in California, didn’t solicit the donation, Berger said. Rather, Berger volunteered the $10,000 while Steyer was visiting in Florida. Steyer and Berger’s wife, Sharon Kegerreis Berger, are high school and college classmates…

http://www.bloomberg.com/news/2014-04-24/steyer-nets-10-050-for-100-million-climate-super-pac.html

Thanks Willis, that’s pretty cool.

Any mathematical issue that depends upon an integral from minus to plus infinity (correlation, Fourier transform, etc.) is not accurate with a finite series. Hence the great interest in Window Functions: https://en.wikipedia.org/wiki/Window_function
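As a concrete example of a window function, here is a minimal sketch of the Hann window, one of the standard tapers described on that page. The nine-point flat record is an assumption for illustration only.

```python
import math


def hann_window(n):
    """Hann window: w[k] = 0.5 * (1 - cos(2*pi*k / (n - 1))), k = 0..n-1."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * k / (n - 1))) for k in range(n)]


def apply_window(samples, window):
    """Taper a finite record so it fades smoothly to zero at both ends."""
    return [s * w for s, w in zip(samples, window)]


n = 9
window = hann_window(n)
record = [1.0] * n                      # a flat record that starts and stops abruptly
tapered = apply_window(record, window)
print([round(v, 3) for v in tapered])
```

The taper removes the abrupt start and stop of the record, which is the part of a finite series that the infinite-integral formulas do not anticipate.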

Gary Pearse says:

April 24, 2014 at 4:21 pm

Thanks, Gary. I don’t think it’s related to Benford’s distribution, it’s another oddity entirely.

w.

Michael D – My thoughts exactly. It could perhaps be tested by chopping Willis’ data many times, using a different segment length each time, and seeing what pattern emerges. If you are right, some form of cycle should be seen in graph shape vs segment length.

gary bucher says:

April 24, 2014 at 4:22 pm

No, what I saw was a few-line comment on some blog, not a full article. But thanks for that, it’s interesting.

Michael D says:

April 24, 2014 at 4:42 pm

Thanks, Michael. I have a curious and wonderful opportunity, which is that I get to discuss my scientific research publicly here on WUWT pretty much in real-time. It’s great because I get kudos to keep me going, and brickbats to keep me going in the right direction. Plus I get to spring my latest bizarre insight on an unsuspecting public. What’s not to like?

All the best in these most extreme of times,

w.

Wonderful explanation of a wonderful insight, Willis. Just what we expect from you.

Michael D, agreed. This has nothing to do with Benford’s Law. But thanks for bringing my attention to it.

“For example, currently we are at a low extreme in hurricanes”

For additional things that are extreme now, see:

http://www.theweathernetwork.com/news/articles/extreme-weather-events-not-making-headlines/25948/

Extreme weather events NOT making headlines

Dr. Doug Gillham

Meteorologist, PhD

1. GREAT LAKES ICE COVERAGE

“Current ice coverage is over double the previous record for the date (April 23) which was set in 1996.”

2. SLOW START TO TORNADO SEASON

3. LACK OF VIOLENT TORNADOES

4. QUIET 2013 TROPICAL SEASON

My gut feeling is you have only proved your time series is band-limited both in low and high frequencies.

“Usual Request. If you disagree with what someone says, please have the courtesy to quote the exact words you disagree with. This avoids misunderstandings.”

=================

Please define a “misunderstanding” 🙂

u.k.(us) says:

April 24, 2014 at 5:12 pm

Me thinking you’re talking about one thing, while in fact you’re talking about something totally different.

Or me going postal because someone didn’t quote my words for the fiftieth time and is accusing me of something I never said …

Either one works for me,

w.

Steve from Rockwood says:

April 24, 2014 at 5:12 pm

Thanks, Steve, and you may be right about the cause. However, I wasn’t speculating on or trying to prove the underlying *causes* of the phenomenon.

Instead, I was commenting on the practical *effects* of the phenomenon, one of which is that we erroneously think we are living in extreme times.

w.

Maybe this can be understood in an inductive fashion. Suppose you have points 1 through N and N has, say, the highest value. Now add point N+1. If the series is autocorrelated, this new point has a 50% chance of being the new highest point.

So, compare the “chances” of point N staying the highest if we add another N points. If there’s no autocorrelation, it’s 50%. With autocorrelation it’s obviously lower.

I haven’t figured out a quantitative result (yet) but the result seems intuitive.

More to come (I hope).
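Pending a quantitative result, the inductive argument above can at least be probed by simulation. The sketch below estimates how often the newest point of a window is also its highest, for independent data and for a random walk. The window length of 50 and the Gaussian steps are assumptions, and the random walk is used as the extreme case of autocorrelation.

```python
import random


def prob_last_is_max(make_series, n_points, n_trials, rng):
    """Estimate how often the newest point is the highest in the window."""
    hits = 0
    for _ in range(n_trials):
        series = make_series(n_points, rng)
        if series[-1] >= max(series):
            hits += 1
    return hits / n_trials


def independent(n, rng):
    """No autocorrelation: every point is a fresh draw."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def random_walk(n, rng):
    """Strong autocorrelation: each point is the previous one plus a step."""
    x, out = 0.0, []
    for _ in range(n):
        x += rng.gauss(0.0, 1.0)
        out.append(x)
    return out


rng = random.Random(7)
N_POINTS, N_TRIALS = 50, 10_000        # assumed sizes, chosen to run quickly
p_iid = prob_last_is_max(independent, N_POINTS, N_TRIALS, rng)
p_walk = prob_last_is_max(random_walk, N_POINTS, N_TRIALS, rng)
print(f"independent: {p_iid:.3f} (1/N = {1 / N_POINTS:.3f})   random walk: {p_walk:.3f}")
```

For independent data every position is equally likely to hold the maximum, so the newest point should win about 1/N of the time; the random walk should put it on top noticeably more often.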

Thanks Willis, I am continually amazed by the things that you unearth that the rest of us sail on past. What Lindzen said is obviously true, but this goes well beyond and into unexpected territory.

@Willis. I’m with you on the extreme times bit and always look forward to your thought provoking posts.

Is the effect stronger for shorter series? Eg what about a 160 point long series (to reflect the hottest year on record claims), or 16 point long series (to reflect hottest decade)
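One way to explore this question is to repeat the experiment with several window lengths on the same kind of noise. The sketch below is not the post's HadCRUT-based model: it assumes an AR(1) process with `PHI = 0.95`, 500 windows per length, and it measures the share of maxima and minima that land in the first or last quarter of the window (0.5 would mean no end effect).

```python
import math
import random


def ar1(n, phi, rng):
    """Stationary AR(1) series: x[t] = phi * x[t-1] + white noise."""
    x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi * phi))
    out = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def outer_fraction(window_len, n_windows, phi, rng):
    """Share of window maxima and minima that land in the first or last quarter."""
    series = ar1(window_len * n_windows, phi, rng)
    quarter = window_len // 4
    hits = 0
    for start in range(0, len(series), window_len):
        w = series[start:start + window_len]
        for pos in (w.index(max(w)), w.index(min(w))):
            if pos < quarter or pos >= window_len - quarter:
                hits += 1
    return hits / (2 * n_windows)


PHI, N_WINDOWS = 0.95, 500             # assumed values
rng = random.Random(3)
fractions = {}
for window_len in (16, 160, 1600):
    fractions[window_len] = outer_fraction(window_len, N_WINDOWS, PHI, rng)
    print(f"window {window_len:5d}: {fractions[window_len]:.3f} in outer quarters")
```

In this setup what matters is the window length relative to how long the noise "remembers": a window much longer than the memory behaves more like independent data.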

Well that was a head scratcher as it seems counter intuitive until I thought it through a bit more. Ignoring the red noise for a moment and just considering HadCrut alone, this makes a lot of sense. Hadcrut is sort of an undulating wave. Cut it into pieces smaller than the entire wave form and you get four possible scenarios:

1. your sample is over all a negative trend, resulting in high extrema at one end and low extrema at the other,

2. your sample is over all a positive trend, and the reverse of 1 applies.

3. your sample spans a peak in the undulating wave, in which case you have low extrema at both ends

4. your sample spans a bottom in the undulating wave, so you have high extrema at both ends

In other words, assuming the data is an undulating wave, it doesn’t much matter how you cut it up into smaller segments, you’re pretty much guaranteed to have extrema at both ends of the segment.
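The four scenarios above can be counted directly on an idealized wave. The sketch below assumes a pure sine wave with a period of 1000 samples and segments of 250 samples (a quarter of the period); both numbers are arbitrary illustrative choices.

```python
import math
import random


def classify_segment(values):
    """Sort a segment into one of the four cases by where its extremes sit."""
    last = len(values) - 1
    i_max = values.index(max(values))
    i_min = values.index(min(values))
    max_at_end = i_max in (0, last)
    min_at_end = i_min in (0, last)
    if max_at_end and min_at_end:
        return "falling" if i_max == 0 else "rising"
    if min_at_end:
        return "spans a peak"
    if max_at_end:
        return "spans a trough"
    return "no extreme at either end"


PERIOD = 1000            # assumed wave period, in samples
SEGMENT_LEN = 250        # assumed: a quarter of the period
rng = random.Random(5)
tally = {}
for _ in range(2000):
    start = rng.uniform(0.0, PERIOD)
    segment = [math.sin(2.0 * math.pi * (start + i) / PERIOD) for i in range(SEGMENT_LEN)]
    label = classify_segment(segment)
    tally[label] = tally.get(label, 0) + 1

for label, count in sorted(tally.items()):
    print(f"{label}: {count}")
```

The guarantee relies on the segment being shorter than half the period, so that it can contain at most one turning point. A longer segment can hold both a peak and a trough in its interior.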

What an interesting analysis Willis. I’d never in a million years have thought this would be the case, but now that you’ve pointed it out, it makes sense!

Robert of Ottawa wrote; (CAPS added by myself)

“Any mathematical issue that depends upon an integral from minus to plus infinity (correlation, Fourier transform, etc.) is NOT ACCURATE WITH A FINITE SERIES. Hence the great interest in Window Functions:”

Exactly correct, this is one of the limits taught early in signal processing. Most signal processing (especially digital, versus analog computing) is an APPROXIMATION to a closed form equation (you know the ones with the integral sign). That is why there are dozens of windowing functions. These artifacts can easily be mistaken for real information, but they very rarely are.

As an interesting historical aside; the old Soviet Union was far behind the “West” in terms of digital computing power. But they had many quite good mathematicians. They solved many integrals with closed form equations (i.e. to get the accurate answer for the integral of function “abc”, plug the limits and the values into this closed form equation). The “West” just hammered it with digital signal processing. I have an old Russian reference text (translated version) from the 1970’s (long out of print) that has closed form equations for hundreds of integrals, 20 per page, 700 or so pages. And the closed form solutions are exact (up to the number of decimal places you use, of course). Finding the closed form solution for an integral is like a puzzle, there is no exact algorithm to follow, you just try hunches, I wonder if the derivative of function “qwy” is the answer?

The whole mathematical basis behind stealth radar technology applied to warplanes was done in closed form equations by the Russians and published in open math journals. The US defense industry found it and used it to create the F-117 and the B-2. The first time they took a scale mock up of the F-117 out to a radar test range they (US scientists) thought their instrumentation was broken; “how come we cannot see that metal object over there with our radar?”. The secret was the shape of the plane, then they applied radar absorbing coating and the plane virtually “disappeared” from the radar screen.

Another example of these artifacts is the use of digital random number generators (like the rand() function in Excel™). It does not produce a true random number sequence; it is good enough for most work, but you can see frequency components in the data that are artifacts from the random number generator algorithm. At one time there was a company that had an electronic circuit which digitized the noise across a resistor (designed to maximize the noise) and sold it as a “true random number generator”. The digital versions have become better with time (more bits to work with) so I think that device is no longer on the market.

Cheers, Kevin.

Is this related to John Brignall’s “Law of League Tables”?

http://www.numberwatch.co.uk/2004%20february.htm

“All measures used as the basis of a league table always improve.

Corollary 1 All other measures get worse to compensate.

Corollary 2 What you measure is what you get.”

I believe what the NumberWatch master intends to convey is that the top (or bottom) record reported tends to be taken as a standard against which subsequent measures are evaluated. I’d thought he was making a point about psychology but the analysis here makes me wonder if I overlooked something…

The “red noise” or “Brownian motion” assumption is essential to finding a closed form solution. In my example of adding the N+1th point, knowing the value of the Nth point needs to be complete knowledge. (This is sometimes called “memoryless.”) If there are longer autocorrelations (trends, periodicity, etc.) the problem gets harder, and all bets are off on the endpoint effect — it could grow or disappear.

Well I have heard this before, I can’t remember when or where, I think I just thought the idea was a crank. I didn’t gel with it at all.

When you study ‘highly correlated data or red noise’, I am fairly certain you will find that it exhibits all the characteristics of ‘highly correlated data or red noise’.

I suggest Willis you use pink noise, not red noise.

This is a tricky and contentious subject. Pink noise is 1/f noise, is very common in natural processes, related to chaos. A lot of opinions go on about red noise, beware.

Unfortunately pink noise is not so simple to produce.

I do not know what would happen if you try.
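For anyone who wants to try, one rough way to approximate pink noise is to add up several white-noise sources that are refreshed at successively halved rates, in the spirit of the Voss algorithm. The sketch below is only an approximation to a 1/f spectrum over a limited band; the eight rows and the series length are assumptions.

```python
import random


def pink_noise(n, n_rows=8, rng=None):
    """Approximate 1/f noise by summing white-noise sources refreshed at halving rates."""
    rng = rng or random.Random()
    rows = [rng.gauss(0.0, 1.0) for _ in range(n_rows)]
    out = []
    for t in range(n):
        for k in range(n_rows):
            # Row k is redrawn every 2**k samples, so higher rows vary more slowly.
            if t % (2 ** k) == 0:
                rows[k] = rng.gauss(0.0, 1.0)
        out.append(sum(rows))
    return out


def lag1_autocorrelation(series):
    """Sample correlation between each value and the one that follows it."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
    return cov / var


series = pink_noise(20_000, rng=random.Random(11))
print(f"lag-1 autocorrelation: {lag1_autocorrelation(series):.2f}")
```

The resulting series could then be chopped into windows and checked for where its extremes fall, in the same way as the red noise in the post.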

That was extremely interesting. Thanks

BTW, we just had another extreme solar event. X 1.3 flare (R3). Perhaps it is just me, but there have been many more of those in the last few months than I have viewed over the last several years I have been monitoring such (Perhaps Leif can comment on such). Also, have been watching the USGS pages and the ping pong of quakes across the Pacific. Chile, Nicaragua, Mexico and yesterday British Columbia, in that order. All significant events with similar sized events in between them across the Pacific. Additionally, a recent anomalous event, from what I can tell, between South Africa and Antarctica.

Just some novice observations.

Regards Ed

Actually, there is a direct relationship between Benford’s Law and convolution, and autocorrelation is just convolution of a sequence with itself. See a really good description of how and why here: http://www.dspguide.com/ch34/1.htm

tchannon says:

April 24, 2014 at 6:12 pm

“I suggest Willis you use pink noise, not red noise …”

Well 1/f noise is pretty common in analog MOS or CMOS circuits; and it is inherent. PMOS transistors, tend to have lower 1/f noise than NMOS, so analog designers (good ones), tend to use PMOS in input stages, even though NMOS gives higher gm values for a given gate area.

It is common to use very large area PMOS devices, in analog CMOS to reduce the 1/f noise corner frequency.

I designed (and built) an extremely low noise, and low 1/f corner frequency CMOS IC, using truly enormous PMOS transistors. It was a very high current gain feedback amplifier for an extremely sensitive high speed photo-detector.

1/f noise seems to defy logic, since it tends to infinity as f tends to zero. Actually it is not a catastrophe, since you can prove that the noise power is constant in any frequency octave (or decade), so the noise power doesn’t go to infinity, since the lower the frequency, the less often it happens.
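The constant-power-per-octave property follows from integrating 1/f between f and 2f, which gives ln 2 whatever f is. Here is a small numerical check, assuming an idealized spectral density of exactly 1/f with unit scaling; the frequencies chosen are arbitrary.

```python
import math


def band_power(f_low, f_high, n_steps=10_000):
    """Integrate a 1/f power spectral density from f_low to f_high (midpoint rule)."""
    df = (f_high - f_low) / n_steps
    return sum(df / (f_low + (i + 0.5) * df) for i in range(n_steps))


for f_low in (0.001, 1.0, 1000.0):
    octave = band_power(f_low, 2.0 * f_low)
    print(f"{f_low:g} to {2 * f_low:g} Hz: {octave:.6f}   (ln 2 = {math.log(2):.6f})")
```

The same calculation over a decade (f to 10f) gives ln 10 in every case.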

I have often claimed, that the “big bang” was nothing more than the bottom end of the 1/f noise spectrum. Get ready for the next one.

As to Willis’ new conundrum; is not a truncated data sequence akin to the transient portion of the startup of an inherently narrow band signal.

An ideal sine wave signal, only has zero bandwidth, if you disallow for turning it on, or switching it off. When you do either of those things, you get a transient, that dies out with time leaving the steady state signal.

So if your signal begins with an off to on step function, which it does in Willis’ chopped chunks, you are going to get the overshoot, of a brick wall filter response.

Is that not what is going on here ??

I suppose this explains how you can have a once-every-100-year storm, and then a second once-every-hundred-year-storm only a few weeks later. I recall this happening with a couple of snowstorms that hit Boston in February of 1978, and that I became rude and sarcastic towards the people who used the phrase “once-every-hundred-year-storm.”

Sorry about that, you people who used that phrase, and who are still alive 36 years later.

However I have to confess this doesn’t make a lick of sense to me. It seems to me that if you snipped a random 100 years from the history of weather, the once-every-100-year-storm might come on any year, and not be more likely to come in year-one or year-hundred.

Likely there is something I don’t understand. However, armed with my incomplete intellectual grasp, I am heading off to Las Vegas, convinced I can beat the odds.

Apologies, I neglected my citations.

http://earthquake.usgs.gov/

http://www.swpc.noaa.gov/ftpmenu/warehouse.html

In the original statement :

“If you’re looking at any given time window on an autocorrelated time series, the extreme values are more likely to be at the beginning and the end of the time window.”

Is “extreme values” referring to the extreme values of the input time series, or to the extreme values of the output ACF function? From your calculation, it would appear you are looking at the input time series, but in that case there is no need to calculate the ACF … or am I misunderstanding your calculation (or perhaps what you mean by “autocorrelated time series”) and you are looking at the extreme values of the ACF output, with the x axis on figure 2 being the lag times?

Thanks for the clarification.

I think the term “red noise” is throwing folks off here. Willis is talking about pure Brownian motion. That is known as red noise but thinking about this in terms of spectrum is a rabbit trail. Willis is speaking of a series with no periodicity.

“How hilarious is that? We are indeed living in extreme times, and we have the data to prove it!

Of course, this feeds right into the AGW alarmism, particularly because any extreme event counts as evidence of how we are living in parlous, out-of-the-ordinary times, whether hot or cold, wet or dry, flood or drought …”

Well, the way I see it is that we have a recent pattern that is in the 1/40 part of the extreme, if I understand your distribution.

So two points:

1. How often should we be in “extremes”. Three times in the past century?

2. If there is variability, why don’t you people accept the possibility that the “pause” is variation…

davidmhoffer says:

April 24, 2014 at 5:56 pm

“In other words, assuming the data is an undulating wave, it doesn’t much matter how you cut it up into smaller segments, you’re pretty much guaranteed to have extrema at both ends of the segment.”

Thanks, David … I think that was very well worded and well explained.

oops … sorry, above was me … WordPress has taken over here…

If you’re looking at any given time window on an autocorrelated time series, the extreme values are more likely to be at the beginning and the end of the time window.

=====================================

??

It appears to me that the “beginning” and “end” need to be defined. The last 5 years of the T record are less extreme than the preceding five years and they are not random. If the end and beginning are defined as the first third and the last third, then you are covering 2/3rds of the series, and so more likely to have extremes within those segments.

(Likely I do not follow this at all.)

David M…”In other words, assuming the data is an undulating wave, it doesn’t much matter how you cut it up into smaller segments, you’re pretty much guaranteed to have extrema at both ends of the segment.”

====================================

What if you start and end the undulation on the mean?

Even if we assume Brownian motion, a general closed form solution cannot be produced. It depends on the shape of the distribution of the difference between neighboring points. Because of the central limit theorem the effect isn’t huge, but there’s still an effect. I’ll try to work this out for a normal distribution but the integrals are a pain in the … (and I don’t have time).

What I like is that you can scale the autocorrelation (change the sigma of f(N)-f(N-1)) and it doesn’t make any difference.

David A;

What if you start and end the undulation on the mean?

>>>>>>>>>>>>>>>>.

Hmmmm. Well, in my thinking-out-loud thoughts above, I presumed that each segment was smaller than the over all cycle. So if I understand your question, you’d be using a segment size that equals the entire cycle rather than a segment size that is only part of the cycle. But you could only do that if the over all trend across an entire cycle was zero, and you’d be able to manipulate what extrema showed up at what end simply by choosing where the start point was (i.e. peak to peak or valley to valley). Or, assuming that there is an underlying trend that is positive (and in Hadcrut there is a positive trend), choosing a segment that starts and ends on the mean would be tough to do. You’d essentially have to choose an artificial start and end point over a part of the data where those end points are at the mean value, which would be less than a complete cycle. Could you find such a segment? Probably. But it would be very rare in comparison to other segments of the same length with random start and end points.

gary bucher says:

April 24, 2014 at 4:15 pm

So…any idea why this happens? It seems counter intuitive to say the least

================================================

Let me hazard a guess…

I would dare say that this is a manufactured illusion because TODAY is always considered an extreme time as it is ALWAYS at the end of the last data set. SO in its easiest form, this is a self fulfilling prophecy for the CAGW fear mongers. No matter when the cut is made in the data, the first and last will always be extreme.

Talk about creating your own perception of reality… (and all by accident for most)

Well Done Willis!

First, you haven’t shown the data set is stationary – it’s simply an assumption or wild eyed guess.

Second, it’s called a temperature anomaly because it’s neither the temperature nor the mean deviation – the mean was pulled from where the Sun doesn’t shine so it has a linear trend. In any case, for partial correlations you need to demean the data and throw out the end points.

Third, the auto-correlation function is an even function, i.e.,

int[ f(u)*g(u-x)*du] = int[ f(u)*g(u+x)*du]

and auto-correlation function should have a maximum at zero lag which should be in the center of the plot (not on the left hand side.)

Try using R to do the calculation.

MarkY says: “When I grow up, I wanna be a statistician. Then I won’t have to tell my Mom I’m a piano player in a whorehouse (kudos to HSTruman).”

I believe H. Allen Smith said it first. He told his newspaper friends not to let his parents, in town for the weekend, know he was a journalist, that he’d told them the above.

Jeff L says:

April 24, 2014 at 6:50 pm

It is the extreme values (max & min) of the time series data points within the time window.

I’m not calculating the ACF of anything but the HadCRUT3 data. (Actually, I calculate the AR and MA coefficients of an ARIMA model of the HadCRUT3 data.) Then I used those coefficients to generate the temperature pseudo-data, so that it would resemble the HadCRUT3 data (see Fig. 1).

Welcome,

w.
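The generation step described in that reply, feeding fitted AR and MA coefficients back into a noise generator, might look something like the sketch below. It assumes the simplest case of one AR and one MA coefficient, since the reply does not state the model order, and the coefficient values are hypothetical placeholders rather than the ones fitted to HadCRUT3.

```python
import random


def arma11(n, phi, theta, rng, sigma=1.0):
    """Simulate x[t] = phi * x[t-1] + e[t] + theta * e[t-1] with Gaussian shocks e."""
    x, e_prev = 0.0, 0.0
    out = []
    for _ in range(n):
        e = rng.gauss(0.0, sigma)
        x = phi * x + e + theta * e_prev
        e_prev = e
        out.append(x)
    return out


# Hypothetical coefficients for illustration only; not the values fitted to HadCRUT3.
PHI, THETA = 0.95, -0.4
BURN_IN = 500                      # discard the start-up transient
rng = random.Random(2014)
pseudo = arma11(2000 + BURN_IN, PHI, THETA, rng)[BURN_IN:]
print(len(pseudo), round(min(pseudo), 2), round(max(pseudo), 2))
```

Discarding a burn-in period is a design choice here: the series starts from zero, so the first stretch of values is not yet representative of the process.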

Another very interesting one, Willis.

Is it possible to tie this in with Dr Roy Spencer’s climate null hypothesis? Current parameters such as extreme weather events, global temperatures, etc., have all been exceeded in the past — and not just the deep geologic past, but within the current Holocene.

Anyway, a very interesting hypothesis. Thanks for sharing.

++++++++++++++++++++++++++

trafamadore says:

“why don’t you people accept the possibility that the “pause” is variation…”

Maybe that is because to be properly labeled a “pause”, global warming would have had to resume. It may resume, or it may not. We don’t know.

But unless global warming resumes, the proper way to express the past 17+ years is to say that global warming has *stopped*. Sorry about all those failed runaway global warming predictions.

Oops, I meant you need to remove the linear trend.

There is another reason for “it was the nth hottest of the instrumental record”. The instrumental record is an S form with the hottest years at the top. Any year in the last 17 is guaranteed to be one of the top 17.

Humans have a natural tendency to “autocorrelate”. It is a perennial search for portents.

Here is my logical explanation…

An autocorrelated time series is similar to a continuous function in mathematics, since neighbouring points are more likely to be near each other.

For a continuous function, all global maxima occur either at local maxima or at the endpoints.

All local maxima occur at critical points (places where a function is either non-differentiable or the derivative is zero).

If you consider the space of all continuous functions, all points in the domain are equally likely to be critical points.

So that means that endpoints are more likely to be global maxima. They are equally likely as all other points to be critical points, and in addition, there are classes of continuous functions where they are the maxima even when they are not critical points.

Michael D says:

April 24, 2014 at 4:43 pm

“I disagree with the suggestion that this is related to Benford’s law.”

Willis Eschenbach says:

April 24, 2014 at 4:57 pm

“Thanks, Gary. I don’t think it’s related to Benford’s distribution, it’s another oddity entirely.”

This from Wolfram: http://mathworld.wolfram.com/BenfordsLaw.html

One striking example of Benford’s law is given by the 54 million real constants in Plouffe’s “Inverse Symbolic Calculator” database, 30% of which begin with the digit 1. Taking data from several disparate sources, the table below shows the distribution of first digits as compiled by Benford (1938) in his original paper.

Scrolling down to the large table we find a broad range of eclectic data that fits, including populations of countries, areas of rivers, engineering/physics data such as specific heats of materials, etc. etc. I believe your extreme “high”s are the “1s” and the “lows” are the “9s” of the Benford distribution.

A similar idea is to look at the frequency of records – floods, temperatures, rainfall, snow… and the like as a random distribution of a set of numbers. In an N=200 (years) for example, counting the first year’s data point as a record, there will be approximately ln N records in the 200 yr stretch. Even though the distribution of such data is not in fact perfectly random, it is surprising that you get something close to the actual number of records for the data set (I’ve done this for Red River of the North floods). Maybe Briggs or McIntyre might weigh in on the topic.
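The record-count rule of thumb above can be checked with a short Python sketch. It assumes independent, identically distributed values (an idealisation that real flood or temperature data will not meet exactly), and the function names are illustrative only. For such data the expected number of records in N values is the harmonic number 1 + 1/2 + … + 1/N, which is close to ln N plus about 0.58.

```python
import random


def count_records(series):
    """Count running maxima, treating the first value as a record."""
    records = 0
    best = float("-inf")
    for value in series:
        if value > best:
            records += 1
            best = value
    return records


def expected_records(n):
    """Expected record count for n independent draws: 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


def mean_records(n, trials, seed=0):
    """Average record count over many simulated independent series."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += count_records([rng.random() for _ in range(n)])
    return total / trials


if __name__ == "__main__":
    n = 200  # e.g. 200 years of annual values
    print("harmonic-number expectation:", round(expected_records(n), 3))
    print("simulated mean record count:", round(mean_records(n, 2000), 3))
```

For autocorrelated data the count of records can differ from this baseline, so the sketch is only a reference point for the independent case.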

tchannon says:

April 24, 2014 at 6:12 pm

Neither did I. However, it’s easy to produce in R. I used the function TK95 from the package RobPer. Turns out it’s the same story. Here’s pink noise:

w.

Cinaed Simson says:

April 24, 2014 at 8:06 pm

Dear heavens, my friend, such unwarranted certainty. Of course I measured the mean, the trend, and the heteroskedasticity of the random data. As expected, the random data generator generates stationary data, no surprise there. And I was going to assume that, when I thought no, someone might ask me, and I’ve never checked it … so I did. Stationary.

However, instead of asking, you’ve made an unpleasant accusation that I’m either assuming it is stationary (wrong), or just guessing (also wrong).

Cinaed Simson. I’ll remember your name. Next time, rather than assuming bad faith, foolishness, or bad motives on my part, just ASK!

I’ll give you an example. You say “Willis, how do you know that your pseudo-data is stationary?”

See how easy it is?

Lay off the accusations. Not appreciated, not polite.

w.

Cinaed Simson says:

April 24, 2014 at 8:06 pm

For a person who doesn’t understand what I said, you certainly are unpleasant. Re-read my explanation. I don’t show the autocorrelation in Figure 2. Read the explanation of Figure 2 again, and the caption.

The whole post was prepared in R. What are you talking about?

w.

Willis – Good thinking, nice work! Following on from your post, I thought I would investigate the notion that nine of the last 10 years being the warmest “ever” was unprecedented. Answer : NO. It also happened back in 1945 and 1946. [I used Hadcrut4 from http://www.metoffice.gov.uk/hadobs/hadcrut4/data/current/time_series/HadCRUT.4.2.0.0.annual_ns_avg.txt]

The whole mathematical basis behind stealth radar technology applied to warplanes was done in closed form equations by the Russians and published in open math journals. The US defense industry found it and used it to create the F-117 and the B-2. The first time they took a scale mock up of the F-117 out to a radar test range they (US scientists) thought their instrumentation was broken; “how come we cannot see that metal object over there with our radar?”. The secret was the shape of the plane, then they applied radar absorbing coating and the plane virtually “disappeared” from the radar screen.

#####################

Wrong. wrong wrong.

First, the F-117 and B-2 used entirely different codes for RCS prediction. The F-117 was limited to flat plates because the radiative transfer codes were limited to flat objects. They used a program called echo1, yes, taken from an obscure Soviet math paper. Northrop at the time did not have access to echo1.

The B-2 was designed using an entirely different set of code far superior to echo1. It could handle curved surfaces (very specific surfaces). The algorithms were not from a Soviet paper. The Chinese gentleman who wrote them was in my group.

Autocorrelations are related to power spectra by the Fourier transform. And power spectra are the square of the magnitude spectrum (typically a Fourier transform of the time signal). What happens, even with just white noise? The power spectrum is flat? Not with the FFT!

Of course, no single white noise is flat. But if one takes the AVERAGE magnitude spectrum of a large set of white noise signals (millions say), it is supposed to trend more and more flat – a “schoolboy” exercise. If we take the magnitude spectrum as the magnitude of the FFT (the fast Discrete Fourier transform), it gets remarkably flat, save at one or two frequencies, where it is down to about 90% of the level elsewhere. One of the frequencies is 0. The other is fs/2 (half the sampling frequency) if we have an even number (N) of samples. The exact ratio seems to be 2^(3/2)/pi = 0.9003163. Astounding. IT SHOULD BE FLAT!

Well I had seen this for years and never found, or worked hard on an explanation until two years ago:

http://electronotes.netfirms.com/EN208.pdf

(For example, Fig. 2 there.)

In essence, (and I think I got it right) it is because the DFT X(k) of a time sequence x(n) is of course by definition:

X(k) = SUM { x(n) e^(-j*2*pi*n*k/N) }

This is a REAL random walk if k=0 (the exponential becomes just 1), and if k=N/2 and N is even, and has the “drunkard’s walk” normal distribution. For all other values of k (most values) we have a sum of vectors with COMPLEX random magnitudes (two dimensional random walk), and that’s a Rayleigh distribution (hence the different mean when we average FFTs at each k).
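The roughly 0.90 ratio can be reproduced with a small Python sketch. It assumes unit-variance Gaussian white noise and a deliberately tiny N = 16 so that a direct (slow) DFT is fast enough; the function names and sizes are illustrative and do not come from the linked note. It averages the magnitude of X(k) at k = 0, at k = N/2, and at one ordinary bin.

```python
import cmath
import math
import random


def dft_bin(x, k):
    """One DFT bin: X(k) = sum over n of x(n) * exp(-j*2*pi*n*k/N)."""
    n_points = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * n * k / n_points)
               for n in range(n_points))


def mean_magnitudes(n_points=16, trials=20000, bins=(0, 3, 8), seed=1):
    """Average |X(k)| over many white-noise realisations for the chosen bins."""
    rng = random.Random(seed)
    totals = {k: 0.0 for k in bins}
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
        for k in bins:
            totals[k] += abs(dft_bin(x, k))
    return {k: totals[k] / trials for k in bins}


if __name__ == "__main__":
    means = mean_magnitudes()
    # Bins 0 and N/2 are purely real sums; bin 3 is a genuinely complex sum.
    print("ratio at k=0  :", round(means[0] / means[3], 3))
    print("ratio at k=N/2:", round(means[8] / means[3], 3))
    print("2^(3/2)/pi    :", round(2 ** 1.5 / math.pi, 3))
```

The ratio follows from the two means: a real Gaussian sum has mean magnitude sqrt(2N/pi), while a complex (Rayleigh) sum has mean magnitude sqrt(pi*N)/2, and their quotient is 2^(3/2)/pi.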

Einstein I believe thought that Nature was subtle, but not malicious.

Walpurgis says: “Interesting. Also, how likely would it be that you would want to go into a career in climatology if you believed the climate isn’t changing much and won’t until a few thousand years after you retire. What would you write your thesis on? What would you do every day?”

Well you could do paleoclimatology. That’s a lot of fun. Or you could do prehistorical-historical climatology (Discontinuity in Greek civilization Paperback, Rhys Carpenter; also R.A. Bryson). Interesting summary here: http://www.varchive.org/dag/gapp.htm

Or you could do cosmoclimatology. Hendrik Svensmark, Nir Shaviv, Jan Veizer, Eugene Parker and Richard Turco manage to keep busy. http://www.thecloudmystery.com/The_Cloud_Mystery/The_Science.html

Or you could do anthropological climatology. Elizabeth Vbra found that interesting enough to edit Paleoclimate and Evolution, with Emphasis on Human Origins. http://yalepress.yale.edu/yupbooks/book.asp?isbn=9780300063486

During the last 20 years climatologists, geophysicists and other scientists have revealed a few pages of the book of Earth’s climate system.

Still, our ignorance is greater than our knowledge and will continue to languish until scientists free themselves from the view that the science is settled..

Good find Willis. This looks to be of fundamental importance. However, trying to explain this to some Joe down at the bar who is freaked out about “weird climate” is going to take some work.

“Of course, this applies to all kinds of datasets. For example, currently we are at a low extreme in hurricanes … but is that low number actually anomalous when the math says that we live in extreme times, so extremes shouldn’t be a surprise?”

I don’t see how that can apply. Your graph shows large magnitudes at the ends, not unusually small values.

This is all about ‘random walks’ and averaging.

The data is based on continual summing of a random (gaussian distributed) series. At the beginning the data is very short and that ‘random’ distribution has not been sufficiently sampled for the subset to accurately represent the gaussian distribution. Thus the chance of having a run of numbers in one direction or the other is greater.

A similar argument applies at the end. Since the middle of a reasonably long window has been well enough sampled to average out, the nearer you get to the end the stronger the chance is of a temporary run off to one side, i.e. the last few points are not a sufficient sample and can provide a non-average deviation.

I’m wondering what the profile of your graph is. My guess is 1/gaussian

Reminds one of looking up a street directory and pondering why the street you want is always overlapping the damn pages. Now that’s auto correlation for you producing extreme temperatures particularly in peak hour with the missus navigating.

Suggest plotting random-recursive “auto-correlated” Markov Chains, wherein chance-and-necessity determine growth-and-change. For the record, global hedge funds have long adapted quant-model algorithms to Markov-generated series as proxies for trading volume.

As Benoit Mandelbrot noted in studying 19th Century New Orleans cotton futures, such “fractal” (fractional-geometric) patterns, self-similar on every scale, are totally deterministic yet absolutely unpredictable in detail. The same is true, of course, of Edward Lorenz’s celebrated Chaos Theory, whose “Strange Attractors” obey related protocols.

Like many features of population genetics, linguistics’ Zipf’s Law, and so forth, “statistics” is not the end but the beginning of a meta-analytical approach which puts correlation, distribution, and Standard Error (probability) in context of a far deeper mathematical reality. Among other exercises Conway’s “cellular automata”, Group and Information Theory, high-level cryptographic systems, all dance around Emergent Order as a hyper-geometric reality over-and-above pro forma statistical emendations.

I think I may have found a mathematical explanation for this.

For a Wiener process (a random walk comprising infinitesimally small random steps), the “Arcsine laws” apply: http://en.wikipedia.org/wiki/Arcsine_laws_(Wiener_process)

Per that page, the arcsine law says that the distribution function of the time at which the maximum occurs on an interval, say [0,1], is 2 / pi * arcsin(sqrt(x)).

Differentiating that expression yields the probability density 1/(pi*sqrt(x)*sqrt(1-x))

This yields a plot that looks quite like your histograms!

https://www.wolframalpha.com/input/?i=plot+1%2F%28pi*sqrt%28x%29*sqrt%281-x%29%29

davidmhoffer says:

April 24, 2014 at 8:01 pm

========================================

Thanks, and I am somewhat following. However, are not all series defined by an arbitrary start and end point? For instance take the HadCRUT black series, from 1850 to 1840. From an eyeball perspective the extremes, low and high, are in the middle.

Yet I cannot debate the second graph of 1000 pseudo runs showing such extremes lumped at both ends. It would seem that in a truly random series the extremes would be as likely to appear anywhere, except for my earlier comment, that the middle third would only be 1/2 as likely to have a minimum or maximum as both the first and last third of the series combined, as it clearly is only one third of the series vs the two thirds composing both ends.

Sorry, from 1850 to 1890.

(From Wikipedia on Benford’s Law.)

In 1972, Hal Varian suggested that the law could be used to detect possible fraud in lists of socio-economic data submitted in support of public planning decisions. Based on the plausible assumption that people who make up figures tend to distribute their digits fairly uniformly, a simple comparison of first-digit frequency distribution from the data with the expected distribution according to Benford’s Law ought to show up any anomalous results.

Has anyone subjected Warmist Climate Data to the Benford Law test?

Willis Eschenbach says:

April 24, 2014 at 8:50 pm

Cinaed Simson says:

April 24, 2014 at 8:06 pm

First, you haven’t shown the data set is stationary – it’s simply an assumption or wild eyed guess.

Dear heavens, my friend, such unwarranted certainty. Of course I measured the mean, the trend, and the heteroskedasticity of the random data. As expected, the random data generator generates stationary data, no surprise there. And I was going to assume that, when I thought no, someone might ask me, and I’ve never checked it … so I did. Stationary.

——

Just glancing at the data, it looks like a random walk with a drift which is known to be non-stationary.

Also, I missed the part where you indicated you were using R to do the auto-correlation calculations and the code used to generate the figures.

The arcsine law is pretty easy to use. For example, the chance of a maximum (or, equivalently, minimum) being in the first 1/3rd of the interval is

2 / pi * arcsin(sqrt(1/3)) = 39.2%

and it’s the same with the last 1/3rd of the interval, due to symmetry. The “middle” third only has a 21.6% chance (the remaining amount).
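The arithmetic above can be reproduced, and compared with simulated random walks, in a few lines of Python. This is a sketch under stated assumptions: the walk uses Gaussian steps as a discrete stand-in for the Wiener process, and the step count, trial count and function names are illustrative choices.

```python
import math
import random


def arcsine_cdf(x):
    """Arcsine-law probability that the maximum occurs before fraction x."""
    return 2.0 / math.pi * math.asin(math.sqrt(x))


def thirds_from_arcsine():
    """Probability that the maximum falls in the first, middle, last third."""
    first = arcsine_cdf(1.0 / 3.0)
    last = 1.0 - arcsine_cdf(2.0 / 3.0)
    return first, 1.0 - first - last, last


def thirds_from_walks(steps=300, trials=4000, seed=2):
    """Simulate Gaussian random walks and tally which third holds the maximum."""
    rng = random.Random(seed)
    counts = [0, 0, 0]
    for _ in range(trials):
        position = 0.0
        best = 0.0
        best_index = 0  # the walk starts at 0, which is index 0
        for i in range(1, steps + 1):
            position += rng.gauss(0.0, 1.0)
            if position > best:
                best = position
                best_index = i
        counts[3 * best_index // (steps + 1)] += 1
    return [c / trials for c in counts]


if __name__ == "__main__":
    print("arcsine law:", [round(p, 3) for p in thirds_from_arcsine()])
    print("simulation :", [round(p, 3) for p in thirds_from_walks()])
```

By symmetry the same fractions apply to the minimum.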

While the general pattern derived from the statistics of red noise shows more extremes in the end bins, this is a generalisation. Can I surmise that the actual case, rather than a general or synthesised case, should be adopted for making statements about recent climate extremes?

Bernie says: electronotes.netfirms.com/EN208.pdf

Excellent study. Very interesting. The few, knowledgeable commenters like you are what makes this site a gold mine.

I do not follow the logic in a true random series. When throwing a fair die, each of the six values 1 to 6 has the probability 1/6. Assume that each throw generates a different number for six throws. Is the one and the 6 any more likely to be the first or last throw?

Seattle says

https://www.wolframalpha.com/input/?i=plot+1%2F%28pi*sqrt%28x%29*sqrt%281-x%29%29

yes, I think we have winner ! Good work.

In 1st year physics we learned about the “drunken walk” and soon understood that after n random steps on a one dimensional line, one ended up a distance proportional to sqrt(n) from the starting point. Intuitively most people guess you would have traveled a distance of zero (on average), which is wrong. Is this not the same as saying the extremes are more than likely at the beginning and the end of the series.
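A quick Python sketch of the drunkard's walk, assuming unit steps of plus or minus one with equal probability (the function name and trial count are illustrative). The average signed displacement is zero, but the root-mean-square distance from the start grows as sqrt(n), which is the point being made.

```python
import math
import random


def rms_distance(n_steps, trials=5000, seed=3):
    """Root-mean-square end-to-start distance of a +/-1 random walk."""
    rng = random.Random(seed)
    total_squared = 0
    for _ in range(trials):
        position = sum(rng.choice((-1, 1)) for _ in range(n_steps))
        total_squared += position * position
    return math.sqrt(total_squared / trials)


if __name__ == "__main__":
    for n in (25, 100, 400):
        print(n, "steps: RMS distance", round(rms_distance(n), 2),
              "vs sqrt(n)", round(math.sqrt(n), 2))
```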

Cinaed Simson: “Third, the auto-correlation function is an even function, i.e….,”

Someone is confusing an autocorrelated series with the autocorrelation function, or just not reading before sounding off.

Geoff Sherrington says:

April 24, 2014 at 10:57 pm

While the general pattern derived from the statistics of red noise shows more extremes in the end bins, this is a generalisation. Can I surmise that the actual case, rather than a general or synthesised case, should be adopted for making statements about recent climate extremes?

====

The point is that in making statements about recent changes being “weird”, “unprecedented” or unusual, we should be making probability assessments against Seattle’s graph, not the layman’s incorrect assumption of a flat probability.

The point is to compare the actual case to the general synthetic case.

“When throwing a fair die, each of the six values 1 to 6 has the probability 1/6. Assume that each throw generates a different number for six throws. Is the one and the 6 any more likely to be the first or last throw?”

David A, you are right, if a time series works like that, then the maximum or minimum could occur anywhere within a given interval with equal probability. That kind of time series would be “white noise”.

But, for an autocorrelated “red noise” distribution, where each value is close to adjacent values, the arcsine laws apply as I mentioned above.

But which kind of power spectrum does the climate have?

To be red noise, the power would have to rise by 20 dB (100x more power) for each decrease of one decade (log scale) in frequency (i.e. 0.1x frequency). On a log-log graph of power spectrum, the slope would be -2.

If we trust this “artist’s rendering of climate variability on all time scales” – http://www.atmos.ucla.edu/tcd/PREPRINTS/MGEGEC.pdf – it looks relatively flat like white noise.

This graph looks quite a bit more “reddish” – https://www.ipcc.ch/ipccreports/tar/wg1/446.htm

Which one is to be trusted?

Methinks that Richard Hamming and Julius von Hann might have something to say about this.

Most of the problems that cause the ends to look the worst are related to the ends of the window not syncing with the data set.

My Grand-children may never know what hurricanes are….

So whenever I look for power spectral density graphs of temperature from different sources, I see similarities to red or pink noise, but basically never blue noise. So I think it’s quite plausible that the effect pointed out in this article is applicable to temperature time series.

I’m not sure about other kinds of time series.

A passing linguistic thought: rather than calling it the “Extreme Times Effect”, I’d suggest that it be called the “End Times Effect”. It just seems a shade more appropriate, considering the ever-popular use of the phenomenon to “prove” that we are now in the End Times and are therefore All Doomed!

Seattle: “But which kind of power spectrum does the climate have?”

This assumption that red noise (which is the integral of white noise) is the base-line reference against which all climate phenomena should be measured is erroneous. It is often used to suggest that long-period peaks in spectra are not “statistically significant”.

Since many climate phenomena are constrained by negative feedbacks (like the Planck feedback for temperature) the red noise assumption is not valid for long term deviations which tend to be less than what would be expected under a red noise model. It may work quite well for periods up to a year or two.

http://climategrog.wordpress.com/?attachment_id=897

The end effect will still be there but may be less pronounced than the simplistic red noise model.

Draw a sine curve with long period. Give it a uniformly random phase. That’s a stationary, highly autocorrelated time series. Observe it through a short, randomly located window. What do you see? Mostly, a line going up, or a line going down.

Now, if every summit means the world is a baking Sahara desert, every valley is a major ice-age, I would say that seeing things go up and up and up might be just cause for concern.

Draw a sine curve with long period. Give it a completely random phase. Now that’s a beautiful stationary and highly autocorrelated time series for you.

Observe it through a short independently chosen window. What do you see? A line going up, or a line going down.

If the peaks mean a global baking Saharan desert and the valleys are massive ice-ages, then seeing the series go up and up and up could be cause for concern if you don’t like the heat.

“When throwing a fair die, each of the six values 1 to 6 has the probability 1/6……”

If you want to use the die as your random number generator you need to build your time series from the cumulative sum of all the throws. Then if you plot the frequency distribution of all the values in that time series, you should (on average) see something like what Willis provided using R.

Go and log a million dice throws, add them to get the time series and do the distribution plot and let us know if you find something different ;).
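The dice experiment can be sketched in Python as follows. Two assumptions are made that are not in the comment above: the mean throw of 3.5 is subtracted before summing (otherwise the running total only ever rises and the extremes sit at the ends trivially), and ties are broken at random so that repeated values do not favour the start of the window. The window length and edge width are arbitrary illustrative choices.

```python
import random


def random_argmax(values, rng):
    """Index of the largest value, picking at random among exact ties."""
    top = max(values)
    return rng.choice([i for i, v in enumerate(values) if v == top])


def outer_fraction(cumulative, window=60, edge=6, trials=4000, seed=4):
    """Fraction of maxima and minima landing within `edge` points of either end."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # De-meaned dice throws: values from -2.5 to +2.5
        steps = [rng.randint(1, 6) - 3.5 for _ in range(window)]
        if cumulative:
            series, total = [], 0.0
            for step in steps:
                total += step
                series.append(total)
        else:
            series = steps
        max_index = random_argmax(series, rng)
        min_index = random_argmax([-v for v in series], rng)
        for index in (max_index, min_index):
            if index < edge or index >= window - edge:
                hits += 1
    return hits / (2 * trials)


if __name__ == "__main__":
    print("raw throws (white noise):", round(outer_fraction(False), 3))
    print("running sum (red noise) :", round(outer_fraction(True), 3))
```

With these settings the outer 20% of the window should hold about 20% of the extremes for the raw throws, and noticeably more for the running sum.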

https://www.ipcc.ch/ipccreports/tar/wg1/446.htm

Bearing in mind this is log-log it should be perfectly straight for red noise.

It looks very straight from 2 to 10 years. Then breaks to a very different slope for 10-100 and is basically flat beyond 100y.

Bear in mind that much of this is quite simply artificially injected ‘red’ or other noise that is pumped into the models to make the output look a bit more ‘climate-like’. It is a total artifice, not a result of the physical model itself.

This is done to disguise the fact the models are doing little more than adding a few wiggles to the exaggerated CO2 warming they are programmed to produce.

They then do hundreds of runs of dozens of models, take the average (which removes most of the injected noise by averaging it out) and say LOOK, WE TOLD YOU SO! Our super computer models that cost billions to produce and a generation of dedicated scientists working round the clock PROVE that within 95% confidence it’s all caused by CO2.

We must act NOW before it’s too late ( when everyone realises it’s a scam).

Michael D says:

I suspect that the explanation might be as simple as follows: b) When you cut a chunk from a long-time-period Fourier component, there is a good chance that you will cut a chunk that is either increasing or decreasing throughout the chunk. … Sorry – not as simple to explain as I had hoped. A drawing would be easier.

Not a bad explanation, Michael.

But perhaps the simplest explanation is that because it is a random walk – it is extremely likely that the last point of the random walk will not be at the start – and that the longer the random walk, the more likely it is to be further away from the start. So the points most likely to be furthest apart are those at the beginning and those at the end of the random walk. (And the average will be in the middle)

Or … to use a simple analogy … if two people are lost in a desert, and they just set off to look for each other with no idea where they are going (i.e. random), there is far more chance of them ending further away from each other than moving closer.

So, in most real world situations (not science-labs where students only seem to be taught about a very unusual type of “white noise”), random fluctuations tend to make things get further apart.

It is true of Pooh sticks (sticks thrown in a river) which tend to diverge. Gases tend to diffuse, rivers tend to change their course over geologic times, evolution tends to make plant and animal species change over time. So, e.g. the chance of evolution spontaneously bringing about a diplodocus is vanishingly small.

So, it is the norm for the beginning and end of a plot of a natural system to tend to diverge and it is abnormal for them to stay the same.

The bigger question is not why does the climate vary – because all(?) natural systems vary, but why has the earth’s climate been so remarkably stable that we are here. And perhaps just as important, why are science students not taught about real world noise systems, and is this why climate scientists are incapable of understanding real world noise?

PS. I learnt about real world noise, not within the physics degree but from my electronics degree.

Willis, excellent thinking and very clear explanation. However, I am curious. Does this phenomenon hold true regardless of the size of the second sample? You chose 2000. What if you chose 1000? 1717? 3000? Does it hold true if the second datasets aren’t sequential, but rather overlap? or are chosen randomly? This is going to bug me now until someone comes up with the proof, and explains it to me.

Lloyd Martin Hendaye says:

April 24, 2014 at 10:25 pm

I doubt greatly whether that would make a difference, so I think I’ll leave that as an exercise. Too many more interesting things.

Best of luck, and thanks,

w.

Seattle says:

April 24, 2014 at 10:26 pm

Dang, Seattle, that is most impressive. It sure looks like you actually calculated a distribution function for the location of the extreme values. Sweet.

That’s valuable because that can give us exact expected values for numbers of extremes …

Onwards,

w.

Seattle says:

April 24, 2014 at 10:42 pm

Excellent, thanks for that.

w.

David A says:

April 24, 2014 at 11:01 pm

Good question, David. The answer is no. Remember that this is only true for autocorrelated series, not independent random series.

w.

Steve C says:

April 24, 2014 at 11:58 pm

Excellent. It has more weight, with all these folks claiming that’s where we are. With your permission I’ll change the head post. Thanks.

w.

Greg says:

April 25, 2014 at 12:13 am

Thanks, Greg. Actually, if you look above you’ll see that I used an ARIMA model. In this case, I used two levels of AR and MA coefficients (lag-1 and lag-2). Because the AR and MA coefficients are calculated from the HadCRUT dataset, this provides a good match to the statistical properties of that particular dataset.

However, if we were talking about doing a monte carlo analysis of e.g. river flows, we’d use different ARIMA coefficients, calculated from actual river data.

w.

W: Good question, David. The answer is no. Remember that this is only true for autocorrelated series, not independent random series.

So all he has to do is add the dice throws, as I replied above.

DonV says:

April 25, 2014 at 1:36 am

In general yes, it works no matter how long a window you choose.

To see why it’s true, consider a very simple system, where the window is only three data points long.

Let’s use a random walk, with a step of either plus one or minus one, plus some small error. We have only four possible walks:

0, 1, 2

0, 1, 0

0, -1, 0

0, -1, -2

In the first and last cases, we have extremes at the two ends.

In the two middle cases, we have one extreme at one end and one in the middle.

Total from the four equally probable outputs shows that six of the eight extremes end up at the endpoints, while only two of the eight extremes end up in the middle …

Seattle has pointed out the actual distribution math above. Even this very simplified example follows the math. From his math we’d expect 3.2 out of eight at each end (40%), and 1.6 out of eight in the middle (20%). Instead of 3.2 and 1.6, the results are 3 and 2, which are the closest whole numbers for the actual situation.
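The three-point tally can be checked by brute force. In this Python sketch, an exact tie between the two ends is split half and half, which is an assumption standing in for the “small error” that would otherwise decide it; the function name is illustrative.

```python
from fractions import Fraction
from itertools import product


def tally_three_point_walks():
    """Count where the max and min fall over all four +/-1 three-point walks."""
    names = ("start", "middle", "end")
    counts = {name: Fraction(0) for name in names}
    for first, second in product((1, -1), repeat=2):
        walk = [0, first, first + second]
        for extreme in (max(walk), min(walk)):
            tied = [i for i, v in enumerate(walk) if v == extreme]
            for i in tied:
                # A tie between positions is shared equally between them
                counts[names[i]] += Fraction(1, len(tied))
    return counts


if __name__ == "__main__":
    for name, count in tally_three_point_walks().items():
        print(name, count)  # start 3, middle 2, end 3
```

That is eight extremes in total, six at the endpoints and two in the middle, matching the count above.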

w.

“Thanks, Greg. Actually, if you look above you’ll see that I used an ARIMA model.”

Yes, I saw that, though you did not specify what sort of ARIMA. My point was that Seattle has hit on the right formula for straight red noise but this will be a bit different for a more complex model like you used.

It will certainly be a huge step in the right direction compared to the erroneous assumption of a flat distribution and I think viewed in that context late 20th c. will be within expected bounds. This is what Keenan was banging on about and eventually got an official admission into the parliamentary record via a House of Lords statement.

In fact I think his main point was that the usual red noise model was not the best choice.

Of course none of these random statistical models allow for the constraining effects of the Planck response but it’s a damn good step in the right direction.

“I used two levels of AR and MA coefficients (lag-1 and lag-2). ”

Just out of interest , could you post details of the actual model that you used?

Prompted by Seattle’s example, and slightly off topic but maybe interesting:

https://www.wolframalpha.com/input/?i=wattsupwiththat.com+vs+www.realclimate.org+vs+www.skepticalscience.com

I keep forgetting how much fun wolframAlpha is 🙂

And how much I learn from Anthony and Willis!

Willis, what you have found empirically seems counter-intuitive, therefore something interesting to try to understand.

If a random signal is stationary, my intuition leads me to expect (rightly or wrongly) that the middle of a segment chopped from the signal should have the same characteristics as the ends of the same segment. If I have understood what you said, you have found otherwise.

You said “this two million point dataset is stationary, meaning that it has no trend over time, and that the standard deviation is stable over time.”

(1) My recollection from playing with time series years ago, is that the definition of ‘stationary’ is that *all* statistics of the time series are independent of time. This is a bit different from your definition of no trend and std dev is time independent. For example, I could put white noise through a filter that removed a range of frequencies and then sweep the center frequency of the filter. This would result in a nonstationary time series by the definition I quoted (but stationary by the definition you give).

(2) I understand that you generate red noise by putting white noise through a linear filter with transfer function 1/s – ie a pole at the origin of the complex plane. Can the output of the filter be stationary (in the definition I give above)? [I don’t know – I’m asking.]

“… in fact twice as likely, to be at the start and the end of your window rather than anywhere in the middle.”

____________________________

I have to disagree with this statement. I did a simple graphical analysis of your graph:

http://i.imgur.com/X0ht7Ch.png

and my conclusion is that the chance that extremes will be within 15% of the length of the interval from either edge (covering 30% of the interval length; 6 out of 20 columns) is approximately 44%. It is definitely more than the 30% which would be the case if the distribution was flat. But that is about one and a half times as probable as under a uniform distribution, definitely not twice.

I was interested to see what would happen if you used perfectly trendless, noiseless data. I think some folk above have expressed it in words but I wanted to test it empirically. The IDL code for this is:

    pro extrema
      compile_opt idl2

      ; Noise-free, trendless sine wave
      nPoints = 2000000
      x = findgen(nPoints)
      period = 3000.0
      data = sin(2*!PI*x/period)

      ; Tally where the max and min fall within each sliding window
      sampleWidth = 2000
      extremes = lonarr(sampleWidth)
      for i=0, nPoints-sampleWidth-1 do begin
        sample = data[i:i+sampleWidth-1]
        sampleMax = max(sample, maxIdx, subscript_min=minIdx)
        extremes[maxIdx] += 1
        extremes[minIdx] += 1
      endfor

      plot, extremes, psym=1
    end

IDL because I happen to have a license for it at work and I don’t know R but I think it should be easy to translate.

For any sampleWidth less than the period, the most likely place for extremes is the beginning and end with a flat distribution between those points. The height of the flat part relative to the ends grows as the sampleWidth grows. Any sampleWidth over the period produces odd, stepped distributions that always start high. This is an artefact of the max() function returning the index of the first maximum/minimum values where more than one identical maximum/minimum value is found. I’m sure this could be proved mathematically by someone smarter than me.

However, it underlines Willis’ point about supposed climate extrema and living in extreme times. Even for perfectly cyclic, trendless, noiseless data, if the length of your sample is shorter than the cycle period, the most probable outcome is that the extremes fall at the ends of the sample.
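For readers without an IDL license, the same sine-wave test can be sketched in Python. This is a rough translation and not the original code: the period and window are scaled down to 300 and 200 points (assumed values that keep the same 3:2 ratio), and the window is slid across one full period only, since that already covers every phase.

```python
import math


def extreme_location_counts(period=300, width=200):
    """Tally where the max and min fall inside each window over a pure sine."""
    counts = [0] * width
    for start in range(period):  # one full period covers every phase
        sample = [math.sin(2 * math.pi * (start + j) / period)
                  for j in range(width)]
        counts[sample.index(max(sample))] += 1
        counts[sample.index(min(sample))] += 1
    return counts


if __name__ == "__main__":
    counts = extreme_location_counts()
    print("first position:", counts[0])
    print("last position :", counts[-1])
    print("largest interior count:", max(counts[1:-1]))
```

As in the IDL version, `index` returns the first occurrence of a repeated value, so a window wider than the period would show the same start-heavy artefact described above.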

I haven’t looked at what happens if you add shorter cycles but it would be trivial to add higher harmonics to the base sine wave.
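For anyone without an IDL license, here is a rough Python translation of the sine-wave experiment above. It is only a sketch: the sizes are scaled down (3,200 points, period 300, window 200 rather than the IDL values) so that plain Python finishes quickly, and the function name `extreme_position_counts` is my own, not anything from the original code.

```python
import math

def extreme_position_counts(data, width):
    """Slide a window over data and tally where each window's max and min fall."""
    counts = [0] * width
    for start in range(len(data) - width):
        window = data[start:start + width]
        # list.index returns the first occurrence, like the index from IDL's max()
        counts[window.index(max(window))] += 1
        counts[window.index(min(window))] += 1
    return counts

# Scaled-down stand-ins for the IDL values (2,000,000 points, period 3000, width 2000)
n_points = 3200
period = 300.0
sample_width = 200

data = [math.sin(2 * math.pi * x / period) for x in range(n_points)]
counts = extreme_position_counts(data, sample_width)

print("first position: ", counts[0])
print("middle position:", counts[sample_width // 2])
print("last position:  ", counts[-1])
```

With the window shorter than the period, the two end positions should collect far more hits than any interior position, which is the flat-middle, spiked-ends shape described above.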

That depends on the width of the window. Wiki gives the following definition for autocorrelation:

If the window is narrow compared to the period of the underlying signal then the window will usually not contain the signal’s peak values. In other words, the underlying signal will either have a positive or negative slope for the whole window.

On the other hand, if the window is wide enough we can easily find examples where the extreme values do not come at the beginning and end of the window.

Example – The window contains exactly one cycle of the underlying periodic signal. In that case the signal’s waveform at each end of the window will have the same value and, depending on the phase, the extreme values will be somewhere within the window.

Is Willis’s observation an example of:-

As soon as a scientist can measure something new; then:-

a) It is always bad!

b) it is always getting worse!

c) It is always caused by humanity!!

and

d) It could be fixed by throwing more money in the scientist’s direction!!!

I think it is partly inherent in the subject. People working in a subject where the low-frequency, long-duration variations last longer than a human’s attention span, career or life span probably will not invest the same effort in understanding noise as an electronic engineer does.

Similarly, I suspect a chemist is probably more accepting of, and alert to, the possibility of being wrong than a climate modeler, by virtue of experiments being much quicker. They get more experience of being wrong.

Physicians are also accustomed to having patients die on them.

The real world teaches at different speeds.

If one takes random time-series (X, Y) data and breaks it into two parts – first half and second half – the mean Y values for the two halves will generally NOT be identical. This will result in a non-zero slope (statistically and physically insignificant, but nonetheless present and a real property of the data set).

The extrema of high and low Y data will be more probably distributed appropriately between the high and low halves. We may repeat this argument for the data series broken into fourths (quartiles), eighths (octiles), and so on – ad infinitum.

The logic of this recursion accounts for the observation that the extreme high Y values will be found nearer the opposite end of the data from the end nearer the extreme low Y values.

Economist Eugen Slutsky showed that random processes can result in cyclic processes, such that Fourier analysis would find the cycle, but it is just random noise. The article’s title is The Summation of Random Causes as the Source of Cyclic Processes, translated into English around 1936. Here is a brief article discussing the impact on economics: http://www.minneapolisfed.org/publications_papers/pub_display.cfm?id=4348&

Perhaps this has something to say about all the oscillations connected to weather and all the speculation about future cycles in the weather.

davidmhoffer says:

April 24, 2014 at 5:56 pm

=========

This appears to be the correct solution. Cut a waveform into small enough segments and you are likely to have a min or max (extremes) at the ends.

charles nelson says:

April 24, 2014 at 10:34 pm

“…..Has anyone subjected Warmist Climate Data to the Benford Law test?…..”

That thought has occurred to me.

However, I doubt if any climate scientists simply sat down and invented the data. It’s not necessary. They can use cherry-picking and ignore any inconvenient data, or they can ‘adjust’ the data. I imagine that, while Benford can detect purely made-up numbers, it may not detect data that has been systematically adjusted (by systematic, I mean the same adjustment was applied to all the data).

I’ve thought about it, and I think that Benford probably doesn’t apply to Willis’ fascinating findings.

One (rather boring) explanation of Willis’s findings did occur to me, other posters may have arrived at a similar explanation:

If you cut out part of a long data series, the selected section will almost certainly have an overall positive or negative trend, even if it’s random (e.g. the drunkard’s walk). So, if the trend is positive, the early numbers will tend to be lower and the later numbers will tend to be higher, and vice versa.

I’m not sure if this real effect is needed to explain some of the recent claims about ‘records’. Although there has been no global warming in this century we’re still very near the top. Therefore it’s very easy for short-term temperature excursions (which are often large) to set new records. Of course, it’s a complete scam: they know that most people, when they hear about new records being set, will assume the climate is still warming, when of course it isn’t.

Records should have no place in science. The only thing that matters is the trend.

By the way, for anyone not familiar with Benford’s Law, I suggest you Google it – now.

Chris

tadchem says:

April 25, 2014 at 6:05 am

===========

Also the correct solution. As you slice the segments smaller you increase the odds of the extremes being at the ends.

@Willis – You really didn’t need to ask. Of course you have full permission, and thanks for the flowers!

Willis, the implications are indeed important. Extreme weather can be manufactured statistically simply by segmenting the time series. Intuitively it may not require auto-correlated data if the segments are small enough. However, auto-correlation should allow the effect to occur with larger segments as compared to true random data.

This is a surprising result because it goes against our common sense ideas of randomness. It does seem worthy of a larger, more formal paper as it does have wide implications for those wishing to draw conclusions from statistics.

Itocalc says:

April 25, 2014 at 6:20 am

http://www.minneapolisfed.org/publications_papers/pub_display.cfm?id=4348&

==============

a very interesting paper:

Slutsky had shown in dramatic fashion that stochastic processes could create patterns virtually identical to the putative effects of weather patterns, self-perpetuating boom-bust phases and other factors on the economy.

From a comment above…”Even for perfectly cyclic, trendless, noiseless data if your length of your sample is shorter than the cycle period you will find extremes at the ends of the sample is the most probable outcome.”

Thanks all for helping a layman begin to follow. Of course the natural earth cycle variance and period is quite the mystery, seeing as our climate is a function of many different cycle periods combining in ever changing variances, caused by many, many different inputs combining in unique ways.

I can see why CO2 gets lost in the noise.

Wow. Just when I think my troglodytic brain is beginning to get a handle on things, Willis comes along and says the inside of a table tennis ball is exciting. And then he explains it, and he’s right. Reading this blog continually humbles me, usually when I’m preening over what I had previously supposed was a clever thought of my own.

Once again Willis, thank you for opening up a new vista.

Makes sense. Autocorrelation with no trend is a random walk and random walks have an increasing standard deviation the longer the walk, so the last data points have a statistical tendency to be the most extreme vis a vis the beginning and vis a vis the rest of the sample. Equivalently, the beginning has a tendency to be the most extreme vis a vis the rest of the sample, and this would also hold for windows within the sample. SOUNDS right anyway.
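The claim that a random walk’s standard deviation keeps growing with its length is easy to check numerically. The sketch below assumes a pure random walk with unit-variance Gaussian steps (the feedback = 1.0 case; stationary red noise would behave differently), and the walk counts and checkpoints are arbitrary choices of mine. For such a walk the spread after n steps should be close to sqrt(n).

```python
import math
import random
import statistics

random.seed(0)  # arbitrary seed, only for repeatability

n_walks = 2000
n_steps = 400
checkpoints = [25, 100, 400]

# positions[c] collects where each walk sits after c steps
positions = {c: [] for c in checkpoints}
for _ in range(n_walks):
    level = 0.0
    for step in range(1, n_steps + 1):
        level += random.gauss(0.0, 1.0)
        if step in positions:
            positions[step].append(level)

spread = {c: statistics.pstdev(positions[c]) for c in checkpoints}
for c in checkpoints:
    print(c, "steps: std", round(spread[c], 2), "vs sqrt(n)", round(math.sqrt(c), 2))
```

The later a point sits in the walk, the wider its spread around the starting level, which is the intuition given above for why the far end tends to hold the extreme.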

To me this makes perfect sense considering the definition of autocorrelated data.

Consider: an event, n, where the measure T=f(n) is autocorrelated. By the definition we know that ΔT for f(n – (n+1)) is small therefore the probability of T0 = T1 is high. The same probability holds for each delta step, n+1 – n+2; n+2 – n+3; etc. However the probabilities drop with each successive event from the original event, n, such that the probability of T0=T10 is much lower or ΔT for f(n – (n+10)) is much greater. The function is symmetric about n so that the probability for T1 = T-1, T2 = T-2, etc.

So what we end up with is an inverse probability distribution centered about event n (the middle of the graph). All that is required is that the data be autocorrelated.

So can we apply this statistical result in the form of a correction to various weather related records being touted by alarmists as proof of climate caused weather “extremes”?

In any case, I propose that we call this the “End Times Effect”

Okay, but it comes under the sub-category of “End of CAGW Effects”.

http://bishophill.squarespace.com/blog/2013/5/27/met-office-admits-claims-of-significant-temperature-rise-unt.html

Excellent stuff Willis. This pins down a sort of a priori feeling about many claims surrounding CAGW, and why I am comfortable ignoring them without having a decent argument as to why they are irrelevant.

Willis Eschenbach says:

April 24, 2014 at 8:42 pm

(checked out pink noise)

Thanks for looking, not that then.

The result you have is nonsensical; why that is so is the question. I think several comments give clues, and it is about the validity of the test. I agree with those who point out that the known “non-stationary” will produce a convolution kind of result, aka Fourier transform, hence the large items at both ends.

Adding noise to something leaves something plus noise.

To me this suggests the result is dominated by the large slow excursions.

Recently I had something similar giving a strange statistics answer. Eventually I figured what, removed it from the “signal” and that left normal stats stuff. The point perhaps is non-stationary is more literal than it might seem.

It might be interesting to band-split the hadcrut at say 4 years and then do the same analysis on each portion. (low pass will do and subtract to produce the complement)

I recall another ploy involving chopping into sections and reordering.

Brilliant. Of course, it makes perfect sense, but you just don’t think of it that way every day.

I have a few I’ve done like this:

http://naturalclimate.wordpress.com/2014/03/31/ipcc-ar5-claims-in-review-last-decades-unusual/

http://naturalclimate.wordpress.com/2012/01/27/268/

http://naturalclimate.wordpress.com/2012/01/28/usa-run-and-rank-analysis/

It is not the least bit unusual for the last years to be ranked #1, top 10, or whatever, as you can see.

The effect may be due to the generated time series being red (as has been suggested in several of the comments). Have you tried it with a white time series to see if the result holds for that?

My first impression upon reading this was, “Nonsense! That guy’s been looking at too many cherry-picked alarmist time series.”

On further consideration, it actually makes sense to me. In autocorrelation, two adjacent points are likely to have values close to each other. If the points are separated by a third point, then the two outside points are correlated through a correlation. The further apart the points, the weaker the correlation between those two points. The end points, on average, are further from every other point in the series than any intermediate point would be. Further away implies less correlation so the likelihood is that this is where extremes might happen more often. (Just to make sure Willis, what would the results be if you performed the test again, but moving the start and end positions of each of your 2000 point intervals over by 1000 points?)

A question though. Since GISS is notorious for adjusting adjustments to the instrumental record, how should these adjustments affect autocorrelation? Assuming the adjustments are “correct” should we expect it to increase or decrease the autocorrelation? Or should it have zero effect? If we can determine what result to expect, it might be interesting to look at what effect the adjustments actually have had on the autocorrelation!

It’s been 34 years since I was involved in statistics, so I will avoid commenting on most of this discussion. I just want to note that people only tend to look at things when they are at an extreme. If there is an even distribution of cancer in a certain area, people don’t question it unless it is clustered, i.e. extreme. That perception of extreme is based first on their personal mental database (i.e. wow, I don’t remember having that much snow before), so the other extreme is likely to be pretty distant from the extreme that caught your attention. The fact that most things run in waves or cycles means that most observations in the cycle are going to be close to the norm and the extremes are going to be distant from each other. Why this works on random numbers I don’t know, but why it works in real life I get.

“Seattle says:

April 24, 2014 at 10:26 pm

I think I may have found a mathematical explanation for this.” arcsine law

You beat me to it. Here’s another link describing the arcsine law:

http://www.math.harvard.edu/library/sternberg/slides/1180908.pdf
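As a rough check on the arcsine law suggestion, here is a small simulation. It assumes the relevant form is Lévy’s arcsine law for the location of the maximum of Brownian motion on [0, 1], with CDF (2/π)·arcsin(√t), and it uses a pure Gaussian random walk as a stand-in. Willis’s red noise need not follow this exactly, and the helper name `arcsine_cdf` is mine.

```python
import math
import random

random.seed(1)  # arbitrary seed

def arcsine_cdf(t):
    """P(location of the maximum <= t) for Brownian motion on [0, 1]."""
    return (2 / math.pi) * math.asin(math.sqrt(t))

n_trials = 4000
n_steps = 200
edge_steps = n_steps // 10  # outer 10% of the window at each end

near_edges = 0
for _ in range(n_trials):
    level, best, best_at = 0.0, 0.0, 0
    for step in range(1, n_steps + 1):
        level += random.gauss(0.0, 1.0)
        if level > best:
            best, best_at = level, step
    if best_at <= edge_steps or best_at >= n_steps - edge_steps:
        near_edges += 1

simulated = near_edges / n_trials
predicted = 2 * arcsine_cdf(edge_steps / n_steps)  # the law is symmetric about the midpoint
print("simulated share of maxima in the outer 20%:", round(simulated, 3))
print("arcsine law prediction:                    ", round(predicted, 3))
```

Under these assumptions the two numbers should land close together, with roughly four in ten maxima falling in the outer fifth of the window.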

Greg says:

April 25, 2014 at 2:38 am

Of course, glad to.

Note that above I found the same distribution of extremes using straight pink noise (1/f noise).

w.

Kasuha says:

April 25, 2014 at 4:48 am

My apologies for the lack of clarity. What I meant was that the first and last hundred points of the dataset had twice the number of extremes as any hundred-point interval in the mid-range of the dataset. My meaning, obviously poorly expressed, was that about 200 extremes out of 2000 fall in each of the end 100-point intervals, but only about 100 extremes out of 2000 fall in each of the mid-range 100-point intervals.

Sorry for the misunderstanding,

w.

Willis, I think you made a small error on your simple example:

” Even this very simplified example follows the math. From his math we’d expect 3.2 out of eight at each end (40%), and 1.6 out of eight in the middle (20%). Instead of 3.2 and 1.6, the results are 3 and 2, which are the closest whole numbers for the actual situation.”

Unless I am missing something – the 0’s of the middle two possibilities each land at BOTH ends meaning that we have a total of 10 extremes (instead of 8) since the 0’s which are extremes are repeated. 4 extremes at each end (40%) and 2 extremes in the middle (20%). So we don’t even need to round to get exactly the same answer as the formula.
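The count of ten extremes can be checked by brute force. The sketch below assumes the simple example in question is the set of all four two-step walks of +1/-1 steps starting from 0 (the original example is not reproduced in this comment, so that is my guess), and it counts a tied extreme at every position where it occurs.

```python
from itertools import product

# Assumption: the "simple example" is every two-step walk of +1/-1 steps starting at 0
counts = [0, 0, 0]  # tallies for the first, middle and last position
for steps in product([1, -1], repeat=2):
    path = [0, steps[0], steps[0] + steps[1]]
    hi, lo = max(path), min(path)
    for position, value in enumerate(path):
        # a tied extreme is counted at every position where it occurs
        if value == hi:
            counts[position] += 1
        if value == lo:
            counts[position] += 1

total = sum(counts)
print(counts, total)                # [4, 2, 4] 10
print([c / total for c in counts])  # [0.4, 0.2, 0.4]
```

On that reading, the 40% / 20% / 40% split comes out exactly, with no rounding needed.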

I really have appreciated both the mathematical and common sense explanations that have made this seem much more intuitive after all.

Willis

This is really interesting and has had me scratching my head. Thanks!

I have tried to reason why this might be the case. The only thing I can think of is outlined below – and forgive me here for conjecture. So you say you’re using a simple red noise generator and excluding parameters that might create drift or locally varying mean (such as crazy Hausdorff exponents etc).

When creating red noise, the easiest way (IMO) is to:

1) create white noise

2) perform forward FFT on white noise

3) apply an exponential decay as function of wave number

4) back transform FFT

Now you have the red noise (correct?).
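Steps 1 to 4 can be sketched in a few lines of Python. This is only an illustration of the recipe as listed, not a claim about how Willis generated his data. The naive `dft` helper, the series length and the `decay_scale` value are all my own choices, and the filter is applied symmetrically to wavenumbers k and n - k so that the result stays real.

```python
import cmath
import math
import random

random.seed(2)  # arbitrary seed

def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform; fine for a short demo series."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(v * cmath.exp(sign * 2j * math.pi * k * t / n)
                    for t, v in enumerate(values))
        out.append(total / n if inverse else total)
    return out

def lag1_autocorrelation(series):
    mean = sum(series) / len(series)
    dev = [s - mean for s in series]
    return sum(a * b for a, b in zip(dev, dev[1:])) / sum(d * d for d in dev)

n = 256
decay_scale = 8.0  # hypothetical e-folding scale, in wavenumber units

white = [random.gauss(0.0, 1.0) for _ in range(n)]   # step 1: white noise
spectrum = dft(white)                                # step 2: forward transform
# step 3: wavenumbers k and n - k are mirror images, so damp them equally
damped = [c * math.exp(-min(k, n - k) / decay_scale)
          for k, c in enumerate(spectrum)]
red = [c.real for c in dft(damped, inverse=True)]    # step 4: back transform

print("lag-1 autocorrelation, white:", round(lag1_autocorrelation(white), 2))
print("lag-1 autocorrelation, red:  ", round(lag1_autocorrelation(red), 2))
```

One design note: a series built this way is periodic (its end wraps around to its start), which is worth keeping in mind when looking for end effects in it.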

But what I’m wondering is, that if you assume the series is autocorrelated (which it will be), then one might assume via the Wiener–Khinchin theorem, that there is an equivalence between the FFT of the series (step 3) and the FFT of the autocorrelation function (from series after step 4). Therefore, if your red noise generator is following a similar process (steps 1 to 4), then it may only use the real terms in steps 2 and 4; the cosine transform rather than the full FFT (as the autocorrelation function is symmetric). In this case, and given the nature of the red noise spectrum (step 3) the largest values (powers->amplitude) over many many runs will typically be present at lower wavenumbers. So that after transformation the start and end of your series (approximating the longest cosine wave) will typically have the highest values. I admit that your mid-series will have the lowest values and you state that you plot extremes, so this may only work if by extremes you mean highest values.

WayneM asked at April 25, 2014 at 2:54 pm

“The effect may be due to the generated time series being red (as has been suggested in several of the comments). Have you tried it with a white time series to see if the result holds for that?”

Good suggestion. If Willis is right (bite my tongue!!!) then the histogram for White should be flat. It is:

http://electronotes.netfirms.com/ac.jpg

The figure shows red, white, and white with a (pinkish but not strictly pink) feedback of 0.8 (feedback for red is a=1.0, for white a=0). Also on the figure is the Matlab code that produced the figures, and with other options, just for documentation and to show how simple this is.
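For readers who cannot view the figure, here is an independent Python sketch of the same kind of test. It is not the Matlab code from the figure. It assumes the “feedback” means a first-order recursion y[t] = a·y[t-1] + white noise, cuts the series into back-to-back 100-point windows, and reports the share of window maxima and minima falling in each tenth of the window. The function names and sizes are my own.

```python
import random

random.seed(3)  # arbitrary seed

def feedback_noise(n, a):
    """y[t] = a * y[t-1] + white noise; a=0 is white, a=1 is a random walk."""
    series, level = [], 0.0
    for _ in range(n):
        level = a * level + random.gauss(0.0, 1.0)
        series.append(level)
    return series

def extreme_shares(series, width, n_bins):
    """Share of window maxima and minima landing in each bin of the window."""
    counts = [0] * n_bins
    for start in range(0, len(series) - width + 1, width):
        window = series[start:start + width]
        for extreme in (max(window), min(window)):
            counts[window.index(extreme) * n_bins // width] += 1
    total = sum(counts)
    return [c / total for c in counts]

shares = {a: extreme_shares(feedback_noise(200_000, a), width=100, n_bins=10)
          for a in (0.0, 0.8, 1.0)}
for a, row in shares.items():
    print("a =", a, [round(s, 3) for s in row])
```

With a = 0 the ten shares should sit near 0.1 each, while with a = 1.0 the first and last bins should stand well above the middle ones.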

cd says in part at April 25, 2014 at 5:23 pm

“……When creating red noise, the easiest way (IMO) is to: ……”

This is an easy (easier?) problem in the time domain – it is just a random walk. It is a discrete integrator or accumulator. The first two pages here give my Matlab code and brief description:

http://electronotes.netfirms.com/AN384.pdf

I use the FFT to verify the spectrum.

I think it’s related to the logic of the TOBS adjustment. Suppose you had an autocorrelated random signal and you divide into equally spaced intervals. Where would you find interval maxima?

There will be local high and low points, and if they appear in the interior, they will be counted once. But if they occur near the cuts, it’s likely that a high point will provide the maxima for two intervals. (same with minima)

So it’s somewhere up to twice as likely that you’ll get extrema near the ends. Which seems to fit with Willis’ experiment.

Gary Pearse says:

April 24, 2014 at 4:21 pm

“This seems to be an example of Benford’s distribution, or Benford’s Law as it is sometime called.”

Gary,

I wrote a hub about Benford’s Law a couple of years ago. It includes an original theorem, which is based upon the BOGOF Principle. (Buy one, get one free.) It’s very difficult to resist the temptation of shameless self-promotion. Here’s a link.

http://larryfields.hubpages.com/hub/Frank-Benfords-Law

Bernie Hutchins

A random walk? What I know of random walks is that they can have a drift (as multiple Brownian motion simulations reveal). In this sense they are not necessarily stationary. This is not normally true of the algorithm I presented. But then I’m no expert on random walk algorithms/methods.

Also I can’t see how multiple random walks would give you typically more extreme values at the start and end. Nor can I see how this would be the case using the simple algorithm I presented above – unless all my assumptions are correct. In short, there is a bias in the algorithm – why?

No bias. It’s the very drift you mentioned. For example, consider only 3 points. Without loss of generality, you can start by generating the middle point, then generate a random walk “delta” to get the first point and another one for the last point.

Now, compute the probabilities of each point being the maximum. If these delta’s are real valued (so that, p(0)=0) then there’s a 50% chance that the last point is higher than the middle one and a 50% chance the first is higher (and they’re independent). Thus, the middle point has a 1/4 chance of being the max, while the endpoints each have 3/8.

That’s just an intuitive explanation but it should get you started.
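The three-point argument is quick to check by simulation. This sketch follows the construction described above (generate the middle point, then an independent delta to each side) and assumes Gaussian deltas, which is my choice; any continuous symmetric step should give the same answer.

```python
import random

random.seed(4)  # arbitrary seed

n_trials = 100_000
wins = [0, 0, 0]  # how often the first, middle, last point is the maximum
for _ in range(n_trials):
    middle = 0.0                              # generate the middle point first
    first = middle + random.gauss(0.0, 1.0)   # one random-walk delta back
    last = middle + random.gauss(0.0, 1.0)    # an independent delta forward
    points = [first, middle, last]
    wins[points.index(max(points))] += 1

shares = [w / n_trials for w in wins]
print(shares)  # expected to land near [0.375, 0.25, 0.375]
```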

It doesn’t have to be a stationary series. It works just as well with a random walk. Try this in R:

    rw1 <- cumsum(abs(rnorm(100000))*2*(runif(100000)-.5)) # Random walk with random step size
    extremelist <- NULL # Vector of positions of extreme values – max and min of window
    for (window in 1:1000) {
        vals <- rw1[(10*window-9):(10*window)] # Window of ten values
        extremelist <- c(extremelist, which.min(vals), which.max(vals)) # Add positions of new extrema
    }
    hist(extremelist)

Again you find the extrema at the ends.

Nick Stokes

“Suppose you had an autocorrelated random signal and you divide into equally spaced intervals”

This may be Willis’ experiment, but that is not what is stated in the opening quote, which suggested that even if one binned the extreme values for the entire series (the window length is equal to the series length) for a suite of runs, one would find the extreme values at the end and beginning. If the actual experiment is that the time window has to be less than the total time series, then by how much, and is the effect sensitive to window size?

“So it’s somewhere up to twice as likely that you’ll get extrema near the ends.”

And if you’re right then the same should be true for unremarkable results. Also, the extreme would be binned, for consecutive windows, at either end of the histogram range, so would be “counted” only once for end and beginning. In short, would they not need to be consistently at the beginning and ends to give the “symmetric” Fig. 2?

Frederick Michael

Thanks for the description, I may be being unfair here but describing how you might get such a result by using a specific Markov Process seems very contrived. More importantly, I don’t think that this provides a reason.

If we repeat Willis’ experiment for a window of, say, one tenth the length of the time series BUT move it continuously through the series (remember ANY window) and bin the extreme values as Willis has done, then why on Earth would one find the extremes at the ends?

Also, ANY window means full series too, which as suggested would give typically extreme values at the end.

BTW I haven’t tried to repeat the experiment.

CD, think about an extension of the three point example I gave. If you have N points and you add point N+1. If point N was the max, the new point just hit point N with a 50% chance it lost its title as the max. In the limit, as N gets large, the probability that the new N+1th point is the max should approach the probability that point N was the max before. So, in the limit, the probability that the endpoint is the max is twice the probability that the point next to it is the max.
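The “endpoint is twice as likely as its neighbour” claim can also be tested directly. The sketch below assumes a Gaussian random walk and a 21-point window (both arbitrary choices of mine) and compares how often the maximum lands on the last point versus the next-to-last point.

```python
import random

random.seed(5)  # arbitrary seed

n_points = 21     # hypothetical window length
n_trials = 50_000

wins = [0] * n_points  # how often each position holds the window maximum
for _ in range(n_trials):
    level, best, best_at = 0.0, 0.0, 0
    for position in range(1, n_points):
        level += random.gauss(0.0, 1.0)
        if level > best:
            best, best_at = level, position
    wins[best_at] += 1

ratio = wins[-1] / wins[-2]
print("max at last point:        ", wins[-1])
print("max at next-to-last point:", wins[-2])
print("ratio:", round(ratio, 2))
```

For a finite window the ratio should come out a little under two, approaching two as the window grows, in line with the limiting argument above.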

Willis spoke precisely of (in his opening paraphrase) “an autocorrelated time series”. I think we do not need the “auto” part of that – it is just correlated (a property – as opposed to uncorrelated). Red noise (brown, integrated white, random walk – all the same) is the first such example that comes to mind. Keep in mind that what Willis plots is NOT an autocorrelation, but a histogram of occurrences of the maximum of many sequences with a PRE-EXISTING correlated property. With a red noise sequence of any length, there is always a frequency of period longer than the chosen length that is not only present but stronger than any “wiggles” we suppose we are seeing in the particular window (instance). This, in many cases, trends the segment to tip up or down. Hence the extremes at the ends. Lovely! Obvious – when we have someone like Willis to point at it! Here was my repeat of the experiment as I posted above.

http://electronotes.netfirms.com/ac.jpg

I will attempt an intuitive explanation: Take the end points of any time series of any length and draw a line connecting the two end points. In the absence of other information like 2.3 cycles occur between the endpoints, every point on the line connecting the endpoints is both higher than one endpoint and lower than the other. The line serves as the expected value of the series at a point in time. Since every expected value is both higher than one endpoint and lower than the other, it is not surprising to find the extreme points at the endpoints. It is the most likely outcome by the definition of expectation, though by no means certain. The length of the interval and standard deviation of the process will influence the likelihood of getting endpoint extrema.

cd says: April 26, 2014 at 12:55 pm

“even if one binned the extreme values for the entire series (the window length is equal to the series length)”

The concept is that of a stationary random process. There isn’t a series length. You just observe windows. The series should be unaffected by the window you choose. And one way of choosing a window is to consider first a periodic dissection, then choose a single period. The frequency should not depend on how the window was chosen.

In a periodic dissection, you can see how maxima that occur near the cuts have a better chance (almost double – both sides) of appearing as an interval max. So when you then select one of the periodic intervals, the chance is thus biased.

“And if you’re right then the same should be true for unremarkable results.”

No. The point of maxima is that there is only one per interval. That restriction creates the probability difference. There is no restriction on the number of unremarkable results.

Bernie

Firstly, as should have been implicit in the algorithm as I stipulated, one can choose a range of exponents to create different red noise signals. All this changes is the range of the autocorrelation and red noise is autocorrelated (if created using the method being used). Secondly, as for your use of “just correlated” I’m not sure what you mean. Autocorrelation suggests that the degree of correlation between two sets (both derived from the same series) is a function of the lag (the distance between the sampled pairs used to compute the correlation/covariance). Beyond a certain lag distance, this correlation breaks down and the degree of correlation stops being a function of lag (where the autocovariance = 0.5*variance of the entire series: is this what you mean by beyond the “wiggles”).

Now what your point has to do with this I’m not sure. Can you answer, why would a continuous moving window (or all the data for series of runs) have predominance of extremes at the start and end?

itocalc

“end points of any time series”

Do you mean the entire set as well? Then…

“every point on the line connecting the endpoints is both higher than one endpoint and lower than the other”

Obviously not if the end points share the same value. But I take your point.

“The line serves as the expected value of the series at a point in time.”

Not if the series is stationary. If your two end points have different values then the line will have a gradient. The expected value of any point in a stationary set is the mean; it does not change across the series.

“since every expected value is both higher than one endpoint and lower than the other, it is not surprising to find the extreme points at the endpoints.”

This seems confused – I’m not saying it is, I’ve probably misunderstood.

For a stationary set, the expected value at all points is the same – the mean.

“The length of the interval and standard deviation of the process will influence the likelihood of getting endpoint extrema.”

Again this doesn’t follow, unless you’re suggesting that the autocorrelation function of the set shows that the correlation is dependent on the lag for all possible lags, in which case the series is not stationary! The standard deviation is immaterial in this respect.

cd, Thank you for the critique.

I was not clear about the level versus the changes in the level. If we know the difference in level from the first measurement to the end measurement, and have no other information, then the expectation would fall on a line connecting the points (all changes are presumed a result of noise). If the changes alone are plotted, and increments are independent (Markov process) then obviously there should be no relationship between beginning and end points. My thoughts were along the line of a Brownian bridge, which at one time I could discuss in all confidence, but even at my young age (not quite 50) my memory fades. The effects of autocorrelation just bring out the dumb in me and it is in my best interest to stay silent.

Assuming independent increments, the difference between the endpoint values as measured in standard errors, and the number of points between endpoints will influence the probability of crossing the endpoint values during the process. This is all looking backward, with expectations of interim levels being conditioned on the endpoints values (again thinking through the lens of a Brownian bridge).

cd –

You need to run some code of your own. It’s simple and you will know exactly what was done because you yourself did it.

You said as well: “Autocorrelation suggests that the degree of correlation between two sets (both derived from the same series) is a function of the lag (the distance between the sampled pairs used to compute the correlation/covariance).”

I think you have described a cross-correlation since you derive both as sub-sequences from the same (presumably much longer) time sequence. If I have a sequence of length 1000 and I correlate samples 200-299 with samples 550 to 649, this is a cross-correlation. The process does not “know” whether it is being correlated with a later part of its own self, or perhaps sunspot numbers. If you insist on autocorrelation, you must do the whole sequence.

And, I emphasize, correlation (or not) is a pre-existing property of the series we are looking at (like red, white, pink) and we are examining it in the time domain – just inspecting the samples. No one is computing any sort of correlation anyway.

This is much simpler than you are making it – I think.

Larry Fields says:

April 25, 2014 at 10:11 pm

“Gary Pearse says:

April 24, 2014 at 4:21 pm

“This seems to be an example of Benford’s distribution, or Benford’s Law as it is sometime called.”

Gary,

I wrote a hub about Benford’s Law a couple of years ago. It includes an original theorem, which is based upon the BOGOF Principle. (Buy one, get one free.) It’s very difficult to resist the temptation of shameless self-promotion. Here’s a link.

http://larryfields.hubpages.com/hub/Frank-Benfords-Law”

Thanks Larry, enjoyed it.

Here’s another version of the problem. Say weeks start on Sunday. What’s the chance of the warmest day of the week falling on a Sunday?

It’s up to twice the chance of a Wednesday, even though Nature cares nothing for calendars. It’s likely that the max was part of a warm spell. Over a year, some warm spells will occur midweek, and be counted as one max. But some will occur at weekends, and will provide maxima for two weeks. Sundays aren’t warmer per se, but will show up more often in the statistics.
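The weekday version is simple to simulate. The sketch below uses a made-up autocorrelated “temperature” (a first-order recursion with persistence `phi`, a value I picked arbitrarily) rather than real station data, chops it into Sunday-to-Saturday weeks, and counts which day holds each week’s maximum.

```python
import random

random.seed(6)  # arbitrary seed

phi = 0.9         # hypothetical day-to-day persistence of the temperature anomaly
n_weeks = 20_000

warmest = [0] * 7  # index 0 = Sunday ... index 6 = Saturday
temp = 0.0
for _ in range(n_weeks):
    week = []
    for _day in range(7):
        temp = phi * temp + random.gauss(0.0, 1.0)  # autocorrelated daily value
        week.append(temp)
    warmest[week.index(max(week))] += 1

days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
for day, count in zip(days, warmest):
    print(day, round(count / n_weeks, 3))
```

Nothing in the generator knows about the calendar, yet Sunday and Saturday should come out ahead of Wednesday simply because they sit at the cuts.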

Nick Stokes says:

April 26, 2014 at 2:22 pm

“isn’t a series length. You just observe windows”

There is in the above experiment; the series is dissected into sub-windows. Your explanation needs to address why one would get the same results if you use both discrete windows and continuous windows on the same series. I can’t see how it can.

“The series should be unaffected by the window you choose. And one way of choosing a window is to consider first a periodic dissection, then choose a single period. The frequency should not depend on how the window was chosen.”

Don’t follow but then I don’t know what you mean by periodic dissection.

“No. The point of maxima is that there is only one per interval. That restriction creates the probability difference. There is no restriction on the number of unremarkable results.”

No, this is wrong. For a continuous variable, there will be one value that is closest to the mean of the set. This provides an indicator statistic (0/1) as with min and max.

Bernie Hutchins says:

April 26, 2014 at 2:47 pm

“You need to run some code of your own. It’s simple and you will know exactly what was done because you yourself did it.”

Maybe. But I have very little time and was hoping for something more akin to a technical link (maths), as being able to reproduce the same kind of results does not explain why.

“I think you have described a cross-correlation since you derive both as sub-sequences from the same (presumably much longer) time sequence.”

No, I’ve described an autocorrelation. Your bivariate statistic comes from the same series. Cross-correlation samples two different series.

“And, I emphasize, correlation (or not) is a pre-existing property of the series we are looking at (like red, white, pink) and we are examining it in the time domain – just inspecting the samples. No one is computing any sort of correlation anyway.”

I never said they did. Autocorrelation is a product of any Markov Process such as a random walk, as each new value is conditioned on a previous result (did you not mention random walk?).

“This is much simpler than you are making it – I think.”

It may be, and if Willis is right, which I have no reason to doubt, no one has explained why yet. But thanks for your efforts.

cd,

“Don’t follow but then I don’t know what you mean by periodic dissection.”

Think of my Sunday example. You have at some location an essentially endless set of daily maxima. You ask: what is the chance of a 7-day period starting with a max for those seven days?

So you think of the records divided into weeks (periodic dissection). Might as well start Sunday. Pick a random week. It’s part of a population of weeks. And by my argument above, there will be more Sunday max’s than Wednesday max’s in that population. Also more Sunday min’s.
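The counting described above can be tried on synthetic data. The following is only a sketch: it swaps real daily maxima for an AR(1) series, and the coefficient `phi=0.8`, the seed and the helper names are assumptions made for illustration, not anything used in the thread.

```python
import random

def ar1_series(n, phi=0.8, seed=0):
    """Synthetic autocorrelated 'daily' values: x[t] = phi * x[t-1] + noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def weekly_max_counts(series, week_len=7):
    """Cut the series into consecutive weeks and count where each week's max falls."""
    counts = [0] * week_len
    for start in range(0, len(series) - week_len + 1, week_len):
        week = series[start:start + week_len]
        counts[week.index(max(week))] += 1
    return counts

days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
counts = weekly_max_counts(ar1_series(7 * 20000))
for day, count in zip(days, counts):
    print(day, count)
```

With positive autocorrelation the first and last positions of the week should collect more maxima than the middle, even though the generator knows nothing about weekdays.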

Nick Stokes

“But some will occur at weekends, and will provide maxima for two weeks. Sundays aren’t warmer per se, but will show up more often in the statistics.”

Sorry, if the week runs from Monday to Sunday, why would a warmest day on Sunday count as two, while one on Wednesday counts as one? If you’re suggesting the heat carries over (for an autocorrelated series this is a fair assumption), then the warmest day would be at the start, not the end, of the week. So unless your warmest days are more likely on a weekend or at the end/start of the window, your explanation doesn’t work.

In short, I don’t think that works at all. For example, your argument would mean that for the same year (and the exact same record), if we decided to start our dissections on a Wednesday, the result would change.

Personally, I think the result depends on the window size. If your first-order statistic (the mean) is only stationary for a given sub-sampled window length (i.e. the sample mean is invariant under translation for windows above a certain size), then windows smaller than this size will likely sit in a part of the series with a local trend. This drift will run across the window, so that the high and low values fall at either side of the window. Therefore, this result only holds for certain window sizes.

cd –

(1) My concern about auto- vs cross- is that these two terms apply to the way a correlation is done. There is only correlation. The process of breaking a length 1000 random sequence into two length 100 sub-segments and correlating these, as I described, is a cross-correlation. But that is just terminology.

(2) The reason you need to take the 20 minutes to write some code is that if you don’t, and the results look fishy (as they apparently do to you), you won’t be clear on (A) exactly WHAT was done by someone else or (B) what the results MEAN. So where is the fish? Writing your own experiment eliminates (A) and allows you to immediately explore the inevitable “What if we were to….” questions. The word “obviously” may then also come up in your mind. Doing this was very useful for me.

(3) If you look at red signals, I think you won’t have the slightest doubt about the essential “Why”. The exact mathematics is an issue beyond that.

Best wishes.

I happened to have on hand Melbourne daily max from May 1855 to Nov 2013. I counted the days on which the weekly max occurred (omitting 17 weeks with missing readings). The results were:

Sunday 1657

Monday 1185

Tuesday 896

Wednesday 814

Thursday 917

Friday 1224

Saturday 1581

Bernie

“My concern about auto- vs cross- is that these two terms apply to the way a correlation is done. There is only correlation. The process of breaking a length 1000 random sequence into two length 100 sub-segments and correlating these, as I described, is a cross-correlation. But that is just terminology.”

Sorry Bernie, this is just wrong (and confused)…you don’t have to take my word for it:

http://coral.lili.uni-bielefeld.de/Classes/Summer96/Acoustic/acoustic2/node18.html

“The reason you need to take the 20 minutes to write some code is that if you don’t, and the results look fishy (as they apparently do to you), you won’t be clear on (A) exactly WHAT was done by someone else or (B) what the results MEAN.”

I never said the results look fishy. I don’t need to write any code to understand what has been done. Willis has spelt out exactly what he did.

But since you’re getting quite “sanctimonious”: I have written code to do this sort of work (as part of my job). All it would need is a simple executable with a single wrapper function to put it all together and repeat his experiment. But I’m not at work; I’m at home now and don’t want to do it, particularly as I am assuming Willis is correct. And when I’m at work and have some down time, I don’t want to start writing code for every technical issue raised on a blog. And again, writing and building that executable will not explain why he’s getting his results without spending even more time on it.

“Doing this was very useful for me.”

In what respect? You haven’t been able to explain why this is the case.

“If you look at red signals, I think you won’t have the slightest doubt about the essential “Why”. The exact mathematics is an issue beyond that.”

No, that doesn’t help.

Nick

I happened to have on hand Melbourne daily max from may 1855 to Nov 2013. I counted the days on which the weekly max occurred (omitting 17 weeks with missing readings). The results were:

Sunday 1657

Monday 1185

Tuesday 896

Wednesday 814

Thursday 917

Friday 1224

Saturday 1581

That is a very interesting meteorological result, but it doesn’t prove your point, quite the opposite. If I were to dissect my time series from mid-week to mid-week then the result would look like this:

Wednesday 814

Thursday 917

Friday 1224

Saturday 1581

Sunday 1657

Monday 1185

Tuesday 896

The days with the most weekly maxima are then in the middle of the window, not at the ends.

cd,

“That is a very interesting meteorological result, but it doesn’t prove your point, quite the opposite. If I were to dissect my time series from mid-week to mid-week then the result would look like this:”

No, it has nothing to do with meteorology. It describes the Sun-Sat max. If you shifted to mid-week, it changes. In fact, counting with Wed as the first day:

Wed 1570

Thu 1084

Fri 885

Sat 867

Sun 1020

Mon 1287

Tue 1561
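The recount with a shifted week start can be mimicked on synthetic data. This is a rough sketch under stated assumptions: an AR(1) series with `phi=0.8` stands in for the Melbourne record, and the function names and the `offset` parameter are made up for the illustration.

```python
import random

def ar1_series(n, phi=0.8, seed=2):
    """Synthetic autocorrelated 'daily' values: x[t] = phi * x[t-1] + noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def max_position_counts(series, offset, week_len=7):
    """Count the position of each week's max when weeks begin `offset` days in."""
    counts = [0] * week_len
    for start in range(offset, len(series) - week_len + 1, week_len):
        week = series[start:start + week_len]
        counts[week.index(max(week))] += 1
    return counts

series = ar1_series(7 * 20000)
sun_first = max_position_counts(series, offset=0)  # weeks run Sun..Sat
wed_first = max_position_counts(series, offset=3)  # same record, weeks run Wed..Tue
print("Sun first:", sun_first)
print("Wed first:", wed_first)
```

Both counts come from the same series, yet each shows the excess at its own first and last day, which is the behaviour reported for the shifted Melbourne count.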

cd –

After I said that the issue of cross- vs auto- is a matter of terminology, you provide a link to definitions! Not only do I know the definitions, I know what they MEAN.

Please apply your definitions to the following two sequences:

w1 = -0.3708 0.8942 0.0703 0.4039 0.7501

w2 = 0.6957 0.0537 0.6148 -0.2130 0.9235

These may, or may not, be extracted from a longer sequence:

w3 = -0.3254 -0.3708 0.8942 0.0703 0.4039 0.7501 0.9725 0.7706 -0.1903 -0.2291 0.6957 0.0537 0.6148 -0.2130 0.9235 -0.9398 0.9075

Is the correlation between w1 and w2 auto- or cross-? Would the results be different, or tell you anything?

As for a computer package, you hardly need anything like that. Have you looked at my Matlab code? Not the details, you don’t even have to know Matlab to see that it is simple, short, and that just about anyone could read it (like BASIC). No fancy functions, only a dozen lines, since most of it is commented out or for display.

Sorry if I am sounding sanctimonious to you. Apologies if I have crossed the line between persistently trying to be helpful (a habit as an educator) and showing impatience with a lack of progress.

Ah, but your effort does prove Nick’s point, and – somewhat to my surprise, the infamous time-of-observation-bias (TOBS) that has so corrupted the surface station old records: The two ends of the data stream (Sunday’s high and Saturday’s high) ARE higher because they “pick up” hot days left over from a hot Friday preceding a hot Saturday, and a hot Sunday followed by a hot Monday.

So, what would your data look like if you plotted not a “weekly high” but a monthly high as a function of “day”? With a constantly changing length of month, and a constantly varying number and length of the hot and cold fronts across a long month, you will not see any difference in day-of-week.

“So, what would your data look like if you plotted not a “weekly high” but a monthly high as a function of “day”?”

Well, day of week is an unrelated cycle, so nothing expected there. There would be an end-of-month effect, but confounded with the annual cycle, with which months and weather are aligned.

TOBS hasn’t corrupted records. You can still get data without TOBS adjustment. But it’s clearly biased.

gary bucher says:

April 25, 2014 at 3:42 pm

Thanks, Gary. Regarding your question, I’ve assumed that there is some tie-breaking mechanism (say a small error in each measurement, or just flip a coin) to assure that there is only one minimum and one maximum.

w.

cd says:

April 26, 2014 at 1:17 pm

The question is not why, as the effect is assuredly real. You can use the R code provided by

Asher says:

April 26, 2014 at 12:41 pm

He’s stepping a full window at a time. Here’s the same code stepping one step at a time, as you specified:

Note that the step size in my loop (window = 1:1000) is one. This means the window steps just the way you specified, “continuously through the series”, one step at a time. And the extremes still land at the ends.

I think the problem in your logic is that you imagine a single maximum appearing at one end of the window, and then moving from one end of the window to the other as the window advances through the series.

But that’s not what happens. It is quite likely that before it makes it to the other end of the window, that maximum is superseded by a larger maximum that appears today …

In any case, it’s a real phenomenon, and as you illustrate, quite counterintuitive.
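For readers who want to try the one-step-at-a-time version themselves, here is a minimal sketch. It is not the R code referred to above: the AR(1) generator, `phi=0.95`, the window length of 50 and the function names are all assumptions chosen to keep the example small.

```python
import random

def ar1_series(n, phi=0.95, seed=4):
    """Synthetic red-ish noise: x[t] = phi * x[t-1] + white noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def sliding_max_positions(series, window):
    """Advance the window one sample at a time and tally where its max sits."""
    counts = [0] * window
    for start in range(len(series) - window + 1):
        chunk = series[start:start + window]
        counts[chunk.index(max(chunk))] += 1
    return counts

window = 50
counts = sliding_max_positions(ar1_series(50000), window)
print("first:", counts[0], "middle:", counts[window // 2], "last:", counts[-1])
```

A maximum that enters at the right-hand edge is counted there, and is then often displaced by a newer, larger value before it can be counted in the middle, which is the point made above.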

w.

cd says:

April 26, 2014 at 2:24 pm

Because statistically, the moving window is no different than randomly selected windows. Also, see above.

w.

Willis

“The question is not why, as the effect is assuredly real.”

As I said, I’m quite happy to take your word for it.

“Note that the step size in my loop (window = 1:1000) is one. This means the window steps just the way you specified, “continuously through the series” one step at a time.”

Thanks for doing that. But there was no need, as I took all your points. What strikes me though is the exact symmetry in your histogram. This is almost starting to look as if, in a roundabout way, you’re actually producing the series’ non-centred spectrogram (for the window’s temporal resolution). I know there are many methods of producing certain types of power spectra using moving windows, and since the series is red noise, more signal (i.e. exponential decay with wavenumber for w = 1 to N/2) is contained in the end members of the non-centred power spectra. I hasten to add that I only know of them, but have never used them and don’t know exactly what they do, so I could be talking out of my hat.

“Because statistically, the moving window is no different than randomly selected windows.”

I understand all that as stated; I don’t refute anything you say. And I’m not suggesting that it isn’t real. But what I want to know is why, mathematically, not just that it is.

Bernie Hutchins

“After I said that the issue of cross- vs auto- is a matter of terminology, you provide a link to definitions! Not only do I know the definitions, I know what they MEAN.”

Are you sure? They refer to two very different algorithms.

“As for a computer package, you hardly need anything like that. Have you looked at my Matlab code?”

As I say, I’ve written the math libraries to do this. I only need a simple executable, one main function and the gcc compiler. It costs absolutely nothing and I have the advantage of knowing what the business code is doing, which you don’t have with something like R.

As for Matlab, it is way overpriced for what it does. Scientists should be taught to code in C and Fortran. This move to scripting languages such as R concerns me.

“Sorry if I am sounding sanctimonious to you. Apologies if I have crossed the line between persistently trying to be helpful (a habit as an educator) and showing impatience with a lack of progress.”

I put the sanctimonious in quotes for a reason – didn’t mean it literally. Writing code will certainly help you understand certain algorithms when you have to code them up in something like C (code akin to proof). This is not true for R, which, while having a lot of powerful functionality, shields the user from doing any of the actual maths.

Maybe the best way to understand this is to think of how a point “competes” with its neighbors for the title of maximum (or minimum). Let’s just stick with maximum for now. Any point has a 50% chance of losing to either neighbor (if it has 2) and those two probabilities are independent. So, if you only have 3 points, the middle one only has a 25% chance of being the max. The remaining 75% is split between the other two (symmetry arguments are very helpful here). So, the dist is 3/8, 1/4, 3/8.

Now, let’s extend this. Since we know the 3/8 for an endpoint, we know the prob of a point being beaten by either its immediate neighbor on one side or the next neighbor on the same side. This raises the prob of losing from 1/2 to 5/8 because the conditional prob of the second neighbor winning (given the first didn’t) is 1/4 (50% chance the jump was in the opposite direction and 50% chance it was larger than the jump from our subject to the first neighbor).

Using this, we can derive the probs for 4 points (5/16, 3/16, 3/16, 5/16). Now we know, from the endpoints that the prob that a point isn’t beaten by its 3 nearest neighbors on one side is 5/16.

Using this, we can derive the dist for 5 points (35/128, 5/32, 9/64, 5/32, 35/128).

That’s enough for now.
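The fractions derived above agree with a known closed form for a random walk whose steps are independent, symmetric and continuous: the chance that the maximum of an n-step walk sits at point k is u(k)·u(n−k), with u(k) = C(2k, k)/4^k (the discrete arcsine law). Treating the setup in this comment as that kind of walk is an assumption made here, and the function names are illustrative only.

```python
from fractions import Fraction
from math import comb

def u(k):
    """Chance that a symmetric continuous-step walk stays on one side for k steps."""
    return Fraction(comb(2 * k, k), 4 ** k)

def max_position_distribution(points):
    """Exact probability that each of `points` consecutive values is the maximum."""
    steps = points - 1
    return [u(k) * u(steps - k) for k in range(points)]

for points in (3, 4, 5):
    dist = max_position_distribution(points)
    print(points, [str(p) for p in dist])
```

The three-, four- and five-point results reproduce the distributions listed in the comment, and larger values of `points` show the same U shape becoming more pronounced.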

Nick Stokes

Right that is interesting. It’s like bloody magic. What was your explanation again – slowly this time. A link to something describing your explanation would be good.

Above a number of comments seemed right on as to the essential reason for the clustering of max/mins of a correlated signal at the ends, but I did not see a picture. Histograms such as Willis used tell the correct story, but perhaps a medium sized set (10) of actual red-noise signals will be useful.

Here I generated (Matlab similar to comments above) 10 length 1000 white noise signals (uniform distributions between -1 and +1), integrated them, and kept the red signals from sample 500 to 599. These are plotted as a-j:

http://electronotes.netfirms.com/redguys.jpg

The red dots show the max and the blue dots the min. We see the extremes tending toward the ends, as was found in the histogram view. Note (for example) that 60% of the max values are outside the center 60% of the range, as are 70% of the min values.

It is instructive to look at the individual red signals. They are all over the place (see vertical scale). In 100 steps, there were 100 integration contributions, averaging 1/2 in magnitude. Most of the 10 examples moved considerably less than 10 of the possible 50, as positives cancelled negatives as expected. Yet we see as well some gratuitous trends (up or down for much of the 100 samples, like c, e, and h) of significance, which force extreme values to the ends.
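The experiment described here is easy to repeat. The sketch below is a Python stand-in, not the Matlab that produced the plots: the seed, the function names and the larger run of 1000 signals are assumptions added for illustration.

```python
import random

def red_segment(rng, total=1000, lo=499, hi=599):
    """Integrate uniform(-1, 1) white noise and keep samples 500..599 (1-based)."""
    level, walk = 0.0, []
    for _ in range(total):
        level += rng.uniform(-1.0, 1.0)
        walk.append(level)
    return walk[lo:hi]

def outside_center_fraction(n_signals, seed=42, center=0.6):
    """Fraction of segments whose max lies outside the central `center` share."""
    rng = random.Random(seed)
    outside = 0
    for _ in range(n_signals):
        seg = red_segment(rng)
        edge = len(seg) * (1.0 - center) / 2.0  # width of each outer band
        pos = seg.index(max(seg))
        if pos < edge or pos >= len(seg) - edge:
            outside += 1
    return outside / n_signals

print("10 signals:  ", outside_center_fraction(10))
print("1000 signals:", outside_center_fraction(1000))
```

Ten signals give a noisy answer, so the larger run is there to show what the fraction settles towards.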

cd,

I don’t have a link. But imagine looking at the daily sequence without knowing the weekday status. You see various clusters of warm days (2,3,4..), which will probably provide many of the week max’s when the division is known.

Clusters that are split by a week boundary may well provide two adjacent week max’s. Others, probably only one. The split clusters get overcounted. And the weekend days are the chief beneficiaries.

Or to see it another way. How can a Wed top the week? It has to beat Tue and Thu, at least. But if Wed is warm, they are likely to be warm too. Tough competition.

Sun has to beat Mon, but not the adjacent Sat. It has to beat the following Sat, but there’s much less reason to expect that to be warm.

Nick

Thanks for your patience. I see what you’re saying. I thought I had stated as much.

You see, I can imagine a sine wave series with a small amount of noise added to each point (+/- say 0.05 of the amplitude). This is both stationary and autocorrelated. Any sample window with a length that is not an integer multiple of the original sine wave wavelength will have a regression line with a slope that is not equal to zero (local drift); therefore the highest and lowest values are likely to lie at the ends of the windows (no matter where you sample).

This may seem a little contrived, but it satisfies the conditions of the experiment (and is easy to communicate) and should apply to red noise for windows with lengths less than the range of the red noise signal. So could you run your experiment again with very large (continuous) windows, say 1/4 the length of the series?
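The sine-wave thought experiment can be checked directly rather than argued. The following is only a sketch under assumed values: a wavelength of 100 samples, uniform noise of ±0.05, and two window lengths (30 and 250) picked for illustration; none of these numbers come from the thread.

```python
import math
import random

def noisy_sine(n, wavelength=100, noise=0.05, seed=3):
    """Unit-amplitude sine wave with small uniform noise added to each point."""
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * t / wavelength) + rng.uniform(-noise, noise)
            for t in range(n)]

def end_fraction(series, window):
    """Share of one-step sliding windows whose max is in the outer tenth at either end."""
    k = max(1, window // 10)
    hits = total = 0
    for start in range(len(series) - window + 1):
        chunk = series[start:start + window]
        pos = chunk.index(max(chunk))
        hits += pos < k or pos >= window - k
        total += 1
    return hits / total

series = noisy_sine(5000)
frac_short = end_fraction(series, 30)   # window shorter than one wavelength
frac_long = end_fraction(series, 250)   # 2.5 wavelengths, not an integer multiple
print("window 30: ", round(frac_short, 3))
print("window 250:", round(frac_long, 3))
```

If maxima were spread evenly, about 0.2 of them would land in the outer bands, so comparing the two printed values against that baseline shows for which window lengths the end effect actually appears in this toy series.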

Bernie

If I follow your link you prove my point.

Most of your samples have drift so it follows from (see above post to Nick):

“You see I can imagine a sine wave series with a small amount of noise added to the each point (+/- say 0.05 of the amplitude). This is both stationary and autocorrelated. Any sample window with a length that is not an integer multiple of the original sine wave wavelength, will have a regression line that is not equal to zero (local drift), therefore the highest and lowest values are likely to lie at the ends of the windows (no matter where you sample).

This may seem a little contrived but satisfies the conditions of the experiment (and easy to communicate) and should apply to red noise for windows with lengths less that the range of the red noise signal. So could you run your experiment again with very large (continuous) windows say (1/4 the length of the series).”

Nick/Bernie

“…should apply to red noise for windows with lengths less that the range of the red noise signal.”

should be:

“…should apply to red noise for windows with lengths less that the range of the red noise series.”

cd – replying to your April 27, 2014 at 2:21 pm – haven’t read your latest yet

cd-

A couple of points:

I am not sure what you mean by “two very different algorithms.” I don’t think of correlation as anything like an algorithm. Perhaps it is an operation. Once you decide what correlation is, you apply it to two signals. If the signals are different, it is cross-. If the signals are the same, it is auto-. But this is much as a multiplication such as AxB is called “squaring“ when A = B.

The only possible correlation algorithm I can even imagine would be the use of a fast convolution algorithm to do correlation. Perhaps this is where your ideas are coming from, because you originally proposed getting red noise by using an FFT. This bothered me because you said “3) apply an exponential decay as function of wave number” and I didn’t know what “apply” means. I assumed you meant to take the inverse FFT of the exponential decay and multiply it “bin-by-bin” (k values) by the FFT of the white noise. Because this uses an FFT algorithm (Fast) it can be fast for very large convolutions (used for correlations). But it is circular, so you have to live with the periodicity, or extensively zero-pad if you want linear convolution. It takes too long just to figure out unless you have immense amounts of data.

And as I have emphasized, it is terminology and no one is talking about running a correlation. Correlation is a pre-existing property which a sequence has or does not have. We can correlate white noise into red or pink noise, for example, but we use a filter (usually low-pass) to do this. The filtering correlates successive samples. [ Correlating is filtering. Computing a correlation is analysis. ]

I agree that software that involves complicated functions and scripts does not always teach you (show you) much in the sense that Fortran, C, or Basic can. Matlab has powerful functions and scripts, but it is interpretive and can be used simply. For example, here is the “core” of my white-to-red converter:

xr(1) = x(1) ; % Matlab indices start at 1, so seed the integrator with the first sample

for m=2:100

xr(m) = xr(m-1) + x(m) ;

end

It’s just a discrete integrator. You know what you did exactly. And you can ask, for example, what would happen IF you instead used xr(m) = (0.9)*xr(m-1) + x(m) , etc;
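For anyone without Matlab, the same integrator and its “what if” variant can be written in a few lines of Python. This is a sketch only: the function name `integrate` and its `leak` parameter are made up here, and the previous output is taken to be zero before the first sample.

```python
import random

def integrate(white, leak=1.0):
    """Discrete integrator xr[m] = leak * xr[m-1] + x[m], starting from zero.

    leak=1.0 gives the plain running sum (random walk);
    leak=0.9 gives the leaky variant asked about above.
    """
    out, prev = [], 0.0
    for x in white:
        prev = leak * prev + x
        out.append(prev)
    return out

rng = random.Random(7)
white = [rng.uniform(-1.0, 1.0) for _ in range(100)]
red = integrate(white)             # pure integration
leaky = integrate(white, leak=0.9) # leaky integration, stays closer to zero
print(round(max(abs(v) for v in red), 3), round(max(abs(v) for v in leaky), 3))
```

Plotting `red` and `leaky` side by side is a quick way to see how much the leak reins in the wandering.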

cd said April 27, 2014 at 4:09 pm

“ Bernie If I follow your link you prove my point.

Most of your samples have drift so it follows from (see above post to Nick):”

cd –

Nope, it can’t be “most” because either they ALL have “drifts” or NONE of them do. What do you mean by “drift”? Apparently something you SEE that you choose to call a drift?

You could have a white noise sequence -1 2 0 1 -1 -2 5 7 8 9 7 8 11 9 and claim it is drifting. Fooled by randomness. You will be quick to point out that it in all likelihood will come back down soon. Red noise sequences (random walk, drunkard walk) show features like this often, and much more extreme in durations and magnitudes. But they TOO always come back if you are patient. And then they “drift” again, and so on.

All of mine are pieces of sequences that are really infinite. We generally look at them only briefly!

But your quarrel should be with the mathematics of the random walk itself, not with me or Nick. No magic – Nature IS subtle.

Possibly the app note link I provided to my “Fun with Red Noise”

http://electronotes.netfirms.com/AN384.pdf

would be entertaining to you at some point. It really is Fun.

Bernie

cd says:

April 27, 2014 at 2:21 pm

…

As someone who speaks all three languages, the idea that anything but R should be used for this kind of scientific work is … well, not a brilliant plan. Of the three, R is the only functional language, meaning that functions are just another object.

The main advantage of R to me is this. To add 1 to a 3D array in either Fortran or C, you have to do something like this (in pseudo-code)

In R, on the other hand, you do this

For me, that’s a no-brainer. The R version is far easier to write, easier to read, and easier to debug. The opportunities for mis-typing in the first example, or of not actually putting in the correct limits and not adding 1 to every part of the object, is much greater. With R, none of those problems are present.

The other advantage of R is that I can select any section of code and run it by hitting command-enter. This lets me run a whole section, part of a section, a single line, or a single command. This is a huge advantage, it lets me step forwards and back and pick whatever I want to run.

Steve McIntyre talked me into learning R a few years ago. Best investment of time I’ve made in a long while.

w

Bernie

“I am not sure what you mean by “two very different algorithms.” I don’t think of correlation as anything like an algorithm”

This is getting tiresome (I’m sure for you too). All math procedures boil down to an algorithm. For a continuous series, the integrand for cross-correlation has two samples from two series; for the autocorrelation there are two samples from one series. You might think this is trivial as the expressions have a lot in common, but that’s not the case. For example, because the autocorrelation is from one series, one can assume that the variance of both sets is the same and that the value at lag = 0 is equal to that variance. You cannot make such assumptions with cross-correlation because the series are different.
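The two quantities being argued over can be written out side by side. The sketch below uses one common convention (deviations from each series’ own mean, divided by the full length); that normalisation and the function names are assumptions, and the numbers are the w1, w2 and w3 sequences posted earlier in the thread.

```python
from statistics import fmean, pvariance

w3 = [-0.3254, -0.3708, 0.8942, 0.0703, 0.4039, 0.7501, 0.9725, 0.7706, -0.1903,
      -0.2291, 0.6957, 0.0537, 0.6148, -0.2130, 0.9235, -0.9398, 0.9075]
w1 = w3[1:6]    # the first posted sub-sequence
w2 = w3[10:15]  # the second posted sub-sequence

def cross_covariance(x, y, lag=0):
    """Covariance between x[t] and y[t + lag] for lag >= 0, about each series' mean."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    return sum((x[t] - mx) * (y[t + lag] - my) for t in range(n - lag)) / n

def autocovariance(x, lag=0):
    """Autocovariance is the cross-covariance of a series with itself."""
    return cross_covariance(x, x, lag)

print("autocovariance of w3 at lag 0:", autocovariance(w3, 0))
print("variance of w3:               ", pvariance(w3))
print("cross-covariance of w1, w2:   ", cross_covariance(w1, w2))
print("autocovariance of w3 at lag 9:", autocovariance(w3, 9))
```

The first two printed values coincide, which is the lag-0 property mentioned above. The last two involve the same five sample pairs, since w2 starts nine places after w1 in w3, but they are centred and normalised differently, so they need not agree.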

“The only possible correlation algorithm I can even imagine would be the use of a fast convolution algorithm to do correlation. Perhaps this is where your ideas are coming from, because you originally proposed getting red noise by using an FFT. This bothered me because you said “3) apply an exponential decay as function of wave number” and I didn’t know what “apply” means. I assumed you meant to take the inverse FFT of the exponential decay and multiply it “bin-by-bin” (k values) by the FFT of the white noise.”

No. You misunderstand.

“And as I have emphasized, it is terminology and no one is talking about running a correlation.”

Jeeze…read the article and note the reference time and again to autocorrelation.

“What do you mean by “drift”?”

And yet you refute what I’m saying even though you don’t understand it.

“Apparently something you SEE that you choose to call a drift?”

Fit a simple linear regression line through your sample series, and I’m sure that for most you’ll get statistically significant trends that are not equal to zero! Drift is a term used widely to denote a trend; it is commonly used to denote the emergence of a trend in a process (such as a Markov Process).
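The regression check suggested here takes only a few lines. This sketch makes several assumptions: length-100 windows of a Gaussian random walk, an ordinary least-squares fit against time, and a naive t statistic that treats the residuals as independent. That last assumption does not hold for autocorrelated data, so the “significance” it reports is overstated.

```python
import math
import random

def slope_t_stat(y):
    """Naive OLS t statistic for the slope of y against time 0..n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    rss = sum((v - intercept - slope * t) ** 2 for t, v in enumerate(y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope / std_err

def significant_fraction(n_windows=500, length=100, seed=11):
    """Share of random-walk windows whose naive |t| exceeds 2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_windows):
        level, walk = 0.0, []
        for _ in range(length):
            level += rng.gauss(0.0, 1.0)
            walk.append(level)
        hits += abs(slope_t_stat(walk)) > 2.0
    return hits / n_windows

frac = significant_fraction()
print("share of windows with an apparently significant trend:", frac)
```

A large share of windows shows an apparent trend even though the generating process has none built in, which is compatible with both readings in this exchange: the trends are visible in any short sample, and they are also a product of the randomness itself.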

“But your quarrel should be with the mathematics of the random walk itself, not with me or Nick. No magic – Nature IS subtle.”

I’m not quarreling with anyone. I find the article truly interesting and want to know why. I don’t need to invoke a random walk, and shouldn’t do so, as I want to be 100% sure I’m sub-sampling a stationary process.

My own hunch is that below a certain window size, as with my sine wave example, there will be local drift (a trend) that will result in extremes values likely being at the end of each sampling window.
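This hunch about window size can be tested on a stationary series. The sketch below assumes an AR(1) process with `phi=0.95` (which is stationary, unlike a pure random walk), non-overlapping windows, and “ends” defined as the outermost tenth of the window on each side; all of these are illustrative choices.

```python
import random

def ar1_series(n, phi=0.95, seed=1):
    """Stationary autocorrelated series: x[t] = phi * x[t-1] + white noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def end_fraction(series, window):
    """Share of non-overlapping windows whose max is in the outer tenth at either end."""
    k = max(1, window // 10)
    hits = total = 0
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        pos = chunk.index(max(chunk))
        hits += pos < k or pos >= window - k
        total += 1
    return hits / total

series = ar1_series(400_000)
results = {window: end_fraction(series, window) for window in (10, 100, 1000)}
for window, frac in results.items():
    print(window, round(frac, 3))
```

With evenly spread maxima the value would be about 0.2 for every window length, so the printed numbers show how the excess at the ends changes as the window grows relative to the memory of the series.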

I think Willis has done a great service here. I like many others tend to assume that we know all there is to know about the methods we use routinely. But their nuances are far too great and varied that perhaps we need to reappraise how we use them all the time.

Willis

“the idea that anything but R should be used for this kind of scientific work is … well, not a brilliant plan”

That is a personal opinion.

“In R, on the other hand, you do this”

This highlights the issue. R is essentially a scripting language for a given environment, akin to VBA in Excel. It’s quick and easy to use (as you show) but it isn’t very efficient.

Take your addition for example, the R scripts are interpreted, so that under the hood, the R script will likely be interpreted into C so that in the end – for the CPU – it all looks the same. The only thing is that if you build and compile in the “native” C code, you can be guaranteed that it will be far, far more efficient in terms of memory and CPU. So if you’re dealing with big data sets and very complex problems R sort of runs out of gas quickly. Furthermore, with C you can exploit the power of the GPU which is tailored to very specific mathematical problems and incredibly efficient. On top of this, as you build up your catalogue of C functions, as with R, doing very complex operations takes a few lines of code but with all the benefits of greater speed and management of resources.

In short, it’s a balance between efficiency+power and ease. This is a common issue; the judgement depends on the situation at hand. So in your instance, for a small study, R might be best. Once the processing time takes longer than the development time, then it’s time to make the switch.

Personally, I see the use of R as something that can be useful at the design stage but in the end you should build all your number crunching in C.

Willis

“for the CPU – it all looks the same”

By that I mean a series of incremental steps; the “native” C code should be compiled more tightly.

Oops…

“Take your addition for example, the R scripts are interpreted, so that under the hood, the R script will likely be interpreted into C so that in the end…”

Sorry, that is just lazy and wrong, should be:

“…the R script will likely be interpreted by an interpreter and use math libraries (.so/.dll) written in C…”

cd said various things April 28, 2014 at 2:07 am:

“All math procedures boil down to an algorithm.”

If you wish, but using terms according to the common usage, in context, avoids confusion that occurs if you try to redefine or misuse as you go.

I said “This bothered me because you said ‘3) apply an exponential decay as function of wave number’ and I didn’t know what ‘apply’ means. I assumed you meant to take the inverse FFT of the exponential decay and multiply it “bin-by-bin” (k values) by the FFT of the white noise.”

to which cd replied:

“No. You misunderstand.” [That’s all cd wrote!]

Then what WERE you talking about. The term “Apply” does not mean anything in this context. (Paint can be applied!) Are you adding, multiplying, convolving? If so, what and how? How about some code or pseudo-code or a formula, – or at least something. You dodged the question.

cd then also said:

“I don’t need to invoke a random walk”

But this whole thing got started with your telling us (incorrectly or at least not with adequate information) how YOU proposed to generate red noise. Now – under the bus?

cd also said:

“My own hunch is that below a certain window size, as with my sine wave example, there will be local drift (a trend) that will result in extremes values likely being at the end of each sampling window.”

I should certainly think so! Try length 2. Even a length-2 of constants.

Last word is yours if you want it.

Bernie

“If you wish, but using terms according to the common usage, in context, avoids confusion that occurs if you try to redefine or misuse as you go.”

This sounds like waffle.

“I assumed you meant to take the inverse FFT of the exponential decay and multiply it “bin-by-bin” (k values) by the FFT of the white noise.”

The step is outlined. Do I need to spell out what an exponential decay is? Do I have to spell out how you’d apply it (and yes, it is apply, as in applying a scalar, a smooth etc. to any series) to the spectral information (both real and imaginary terms, or just real if a cosine transform is used)?

Anyway, this is all immaterial because it is now quite clear that this is not the reason for the given distribution of extremes for sub-windows. By the way, this is the standard method for creating stationary red noise.

“How about some code”

This is a blog, for heaven’s sake, get a sense of propriety. Look up “generating red noise (Brownian noise) using an FFT” and you’ll see exactly what I meant, as outlined.

“But this whole thing got started with your telling us (incorrectly or at least not with adequate information) how YOU proposed to generate red noise. Now – under the bus?”

Oh bloody hell, here you go:

http://en.wikipedia.org/wiki/Brownian_noise

I hate quoting Wikipedia but for some people needs must!

“Last word is yours if you want it.”

I already have. My explanation seems fine to me.

Bernie

Sorry that last post was terse and bordering on rude. My only excuse is that that this time of year (Spring here) I get sinus pain and puts me in an awful mood.

Anyway. Look, I thought that there was something quite remarkable going on here. I looked at the problem and I just couldn’t see why it was, and even invented some spurious theories. When I sat down and thought about it, with pen and paper (no need for code), it occurred to me (several posts up in fact) that this all just might be down to local drift and short sampling windows (capturing this drift). But this is just “commonsensical” and obvious to anyone analysing such sets. It seemed too easy, hence why all the fuss? Could still be wrong but don’t really care anymore.

There, the last word.

Willis,

Any chance that you could post the actual time series (somewhere/anywhere) that you used in the above analysis.

I know it’s two million points long, but all I need is a linear array of monthly y-values as a text file or any format that you would prefer.

Thanks

cd –

Making the certain transgression error of returning here after vowing to give you the last word, and taking the risk of aggravating your sinus condition, you did after all ask:

“Do I need to spell what an exponential decay is?”

No, but you still have not said WHAT you are doing WITH an exponential decay. If you are filtering to red in the frequency domain, you would multiply the FFT, point-by-point, with a RECIPROCAL of k, mirrored at the midpoint of course, etc. etc. This is the way Matlab programmers generate red noise using the FFT.

The rest I understand (and have for ages) but I haven’t a clue why you say “exponential decay”.

[ Any conceivable reddening or pinking filter would have an impulse response consisting of a sum of decaying complex exponentials, but this would involve the inverse FFT first.]

Thanks.

Bernie

Additional on Red Noise by FFT

If we use the conventional method of converting white—>red by the use of the FFT, something curious happens. Because for zero frequency (k=0) we would be multiplying by the reciprocal of k as 1/0, this is not allowed; and instead we multiply this one point by 0 (Matlab for example discards this as “NAN” – not a number). This removes the mean – automatically! In my experiment of generating length-1000 red noises, from which I then snipped a subset of 100, removing the mean from the length-1000 had a strong tendency to (of course) greatly reduce any dc offset or “drift” apparent in the length-100 subset. So it may automatically APPEAR superior:

http://electronotes.netfirms.com/redguysbyFFT.jpg

which can be compared to the original result redguys.jpg also there (time-series integration method). But of course, it is easy to directly remove the means intentionally, completely if we wish. Obviously – nothing we do with means changes the positions of max and min.

Bernie

“No, but you still have not said WHAT you are doing WITH an exponential decay.”

What I have presented should’ve been enough; I’ll state it again:

1) Take a white noise series (spatial/time domain)

2) Forward FFT

3) FFT series (real and imaginary terms)

-> for each wavenumber, apply a scalar to both the real and imaginary components: scalar = 1/(|w|^B), where B is arbitrarily chosen (depending on whether you want red or fractal noise, for example)

4) Back transform FFT -> red noise series.
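The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code either commenter actually ran: NumPy, the function name `power_law_noise`, the series length and the seed are all assumptions added here. The w = 0 term is set to zero rather than divided by, which also removes the mean of the output.

```python
import numpy as np

def power_law_noise(n, beta=1.0, rng=None):
    """Shape white noise by scaling its FFT with 1/|w|**beta (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    white = rng.standard_normal(n)            # step 1: white noise
    spectrum = np.fft.fft(white)              # step 2: forward FFT
    w = np.fft.fftfreq(n, d=1.0 / n)          # signed wavenumbers: 0, 1, 2, ..., then negatives
    scale = np.zeros(n)
    nonzero = w != 0
    scale[nonzero] = 1.0 / np.abs(w[nonzero]) ** beta   # step 3: applied only where w != 0
    shaped = spectrum * scale                 # scales real and imaginary parts alike
    return np.fft.ifft(shaped).real           # step 4: back transform, drop roundoff imaginary part

series = power_law_noise(2048, beta=1.0, rng=np.random.default_rng(0))
lag1 = np.corrcoef(series[:-1], series[1:])[0, 1]   # lag-1 correlation of the result
```

With `beta=1.0` the amplitude falls off as 1/|w| (power as 1/w^2), which is the red case discussed in this thread; other values of `beta` give other power-law spectra.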

But this is all immaterial now. I thought there was something more profound at the time going on.

“exponential decay”

Because it communicates what the power spectrum looks like without actually having to define a flipping equation. I had assumed most people would know what I meant.

“Because for zero frequency (k=0) we would be multiplying by the reciprocal of k as 1/0, this is not allowed; and instead we multiply this one point by 0”

Ah, by k you mean w (wavenumber w -N/2 to N/2)? We’re talking at cross purposes here. You’re the type of chap that uses j instead of i when writing complex numbers? You’re an engineer – right?

The noise in step 1 is always centred so that w = 0 should be immaterial, and yes I don’t do anything with it, so step 3 is only for all w != 0 (in C you typically get an arithmetic exception, so even more critical to ignore).

And finally, AND MOST IMPORTANTLY, is my explanation for the distribution of extremes correct (localised drift)? If so, then what the hell was all the fuss about? I thought there was something truly remarkable going on.

Bernie

I think I’ve realised where most of the confusion is coming from: I’m thinking spatially, hence the wavenumber (for the purposes of the experiment it doesn’t really matter as far as I’m concerned). I note you correctly refer to frequency (given that Willis refers to time series, but as I say this could be a cross section of altitude) but the “equivalence” is there.

cd –

We are getting close.

Yes I am an engineer (engineering physics) so I understood your term “wave number” to be my k, and I did explicitly define the DFT in a comment well above as involving n (time), k (frequency) and of course j. I taught signal processing for 35 years.

I still do not see why you refer to an exponential decay. You write out “scalar = 1/(|w|^B” which would be a correct frequency weighting for red if B=1 (AND you are apparently thinking cosine transform, and not FFT as you say). Because w is the variable here, an exponential series would be, for example, B^w, not w^B. That’s why we should write things in math language.

As a concrete example: If you have a length 7 signal x(n), n=0…6, and take its FFT X(k), k=0…6, then you would achieve a red filtering by multiplying each X(k) by 1/k, a series of reciprocals, reflected at the midpoint. Specifically for length 7 this would be the series 0 1/2 1/3 1/4 1/4 1/3 1/2 where the value of 0 for k=0 is necessary to avoid infinity, and does remove the DC term. When you take the inverse FFT following the multiply the result is real (you usually have to remove a tiny imaginary part that is due to roundoff).
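A small sketch of the length-7 example may help. It assumes NumPy and zero-based k as used by `np.fft.fft`; the helper name `mirrored_reciprocal_weights` and the random test signal are made up for illustration. With zero-based k the mirrored weights come out as 0, 1, 1/2, 1/3, 1/3, 1/2, 1. The series quoted above looks like the same pattern counted with one-based (Matlab-style) indices, though that reading is an assumption rather than something the comment states.

```python
import numpy as np

def mirrored_reciprocal_weights(n):
    """Weights 1/|k| with k folded about the midpoint; the k = 0 (DC) weight is 0."""
    k = np.abs(np.fft.fftfreq(n, d=1.0 / n))   # 0, 1, 2, ... then mirrored back down to 1
    weights = np.zeros(n)
    weights[k > 0] = 1.0 / k[k > 0]
    return weights

weights = mirrored_reciprocal_weights(7)

rng = np.random.default_rng(1)
x = rng.standard_normal(7)                       # arbitrary length-7 test signal
filtered = np.fft.ifft(np.fft.fft(x) * weights)  # multiply point-by-point, then invert
max_imag = np.max(np.abs(filtered.imag))         # roundoff-sized imaginary residue
red = filtered.real                              # real output with the DC term removed
```

Because the weights are symmetric about the midpoint, the product keeps the conjugate symmetry of a real signal's FFT, which is why only a roundoff-sized imaginary part is left to discard.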

Bernie

ICU says:

April 28, 2014 at 7:46 pm

Thanks, ICU, good question. It’s a random generation, so I can’t give you that particular series, but here’s a series with the same AR and MA … it’s 16 Mbytes. It is a zipped CSV file, 2000 rows by 1000 columns. It’s in column order, so each column is a separate realization of the pseudodata.

Here’s what I used to generate the data, although the details are unimportant: any autocorrelated dataset shows the phenomenon.

w.

Bernie

“Yes I am an engineer (engineering physics) so I understood your term”

Well then, you’re the expert.

“I still do not see why you refer to an exponential decay. You write out ‘scalar = 1/(|w|^B’ which would be a correct frequency weighting for red if B=1 (AND you are apparently thinking cosine transform, and not FFT as you say).”

In my field the production of fractal red noise is ubiquitous for simulating uncertainty. B is a non-integer (>1). It all depends what you wish to use.

As for cosine transform this was a red herring when I was trying to get a handle on what was going on.

“Because w is the variable here, an exponential series would be, for example, B^w, not w^B.”

Yes, I stand corrected. As I say, it seemed like a good way to get across the general idea, but it was inaccurate.

But again, do you agree that the symmetric histogram distribution of extremes is down to small sample windows likely sampling local drift?

Thanks for your time.

cd asks: “But again, do you agree that the symmetric histogram distribution of extremes is down to small sample windows likely sampling local drift?”

Essentially – YES! I think this was the understanding Willis had too, as did many other commenters.

What remains to be agreed is the notion of a “small sample window”. With red noise there is always a lower frequency of larger amplitude so I suspect that the window size does not matter – just the correlation properties. I am working on this.

Best wishes

Bernie

Bernie

“I think this was the understanding Willis had too”

This should have been stipulated more explicitly in the post as it was said that:

“…any given time window…”

This is what threw me, as this isn’t true for stationary autocorrelated series – is it? For example (I’ll try to be clearer here), if one were to use large windows (with lengths greater than the range of the autocorrelation function for the series), then I can’t imagine how the type of distribution shown could be reproduced. By range, I mean that for typical autocorrelated stationary series the autocorrelation decreases “exponentially” (you know what I mean) before stabilising about 0 for all lag distances thereafter.

cd-

I think it is pretty much fractal, independent of length. I just completed a program similar to my previous displays of 10 red signals, but here one set of 10 for length-100 signals and the other set of 10 for length-16,000 signals:

http://electronotes.netfirms.com/redguys-SL.jpg

The figure shows the same clustering at the ends as we have been observing, as in the histograms Willis posted. I considered a max/min to be at the end if it was within 0-10% or 90%-100%. By chance, it should have been 20% (4 of the possible 20), but it was 40% to 60% in the examples. For the short sequences, this was 8 of 20 for the figure (five other runs gave 11, 8, 4, 6, and 8 of the 20 possible extremes). For the long sequences, the figure shows 12 of 20 (in the five repeats, this was 8, 12, 6, 5, and 11).

This convinces me that window length does not matter.

I think it is just what red noise is.

Bernie

Bernie

You’ve done one of two things (correct me if I’m wrong):

1: Your entire series is plotted in each plot? If so then you have drift in most (a trend across the entire series – they’re not stationary). If you were to analyse the autocorrelation for these sets, I suspect they wouldn’t stabilise about 0 (or fit a regression line I suspect that the t magnitude would would be => 2; p = >0.05).

2: You have sampled a larger series and plotted these sub-samples.

Now before you say that the effect is the same, can I say that you need first to derive the autocorrelation for your entire series to ensure the series is stationary. If the autocorrelation function does not converge on 0, and thereafter remain relatively constant, then your original series isn’t stationary! Which is one of the conditions of the experiment. If the series is stationary, then you need to run the experiment using sample windows greater than the range (the lag distance at which the correlation converges on 0).

To do this you need to:

1) create a stationary series

2) determine the series autocorrelation function

3) ensure that the series is stationary – correlation converges on 0 (for lag which is the range), and thereafter remains constant (for all lags above range). If not then reject and goto 1.

4) sub-sample the series using windows with lengths greater than the range of the autocorrelation function.
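The four-step procedure above can be tried out with a short script. Everything specific in this sketch is an assumption chosen for illustration: an AR(1) process with phi = 0.9 stands in for "a stationary series", the range is taken as the first lag where the sample autocorrelation falls below 0.05, the long window is 20 times that range, and a 10-sample window is used for comparison. It is not the setup used in the post.

```python
import numpy as np

def ar1_series(n, phi, rng):
    """Stationary AR(1): x[t] = phi * x[t-1] + e[t], started from its stationary distribution."""
    e = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = e[0] / np.sqrt(1.0 - phi ** 2)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

def autocorrelation(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])

def acf_range(acf, threshold=0.05):
    """First lag where the autocorrelation drops below the threshold (None if it never does)."""
    below = np.flatnonzero(acf < threshold)
    return int(below[0]) if below.size else None

def end_fraction(x, window):
    """Fraction of window maxima and minima lying in the first or last 10% of each window."""
    edge = max(1, window // 10)
    hits = total = 0
    for start in range(0, len(x) - window + 1, window):
        seg = x[start:start + window]
        for idx in (int(np.argmax(seg)), int(np.argmin(seg))):
            hits += int(idx < edge or idx >= window - edge)
            total += 1
    return hits / total

rng = np.random.default_rng(2)
x = ar1_series(200_000, phi=0.9, rng=rng)    # step 1: stationary series
acf = autocorrelation(x, max_lag=200)        # step 2: autocorrelation function
lag_range = acf_range(acf)                   # step 3: the range; None means reject the series
if lag_range is None:
    raise ValueError("autocorrelation never fell below the threshold; reject this series")
frac_long = end_fraction(x, 20 * lag_range)  # step 4: windows much longer than the range
frac_short = end_fraction(x, 10)             # comparison: windows well inside the range
```

If extremes were placed uniformly, both fractions would sit near 0.2, so comparing `frac_long` and `frac_short` against that baseline is one way to see how much the window length matters for this kind of series.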

However, if you did case 2 then I’m baffled and will, at some time in the future repeat what I suggest.

Oops:

(or fit a regression line I suspect that the t magnitude would would be => 2; p = >0.05).

Should be:

(alternatively fit a regression line, which I suspect would have a slope coefficient different to 0 with a t value => 2 (or 0.05).

Aargh:

(alternatively fit a regression line, which I suspect would have a slope coefficient different to 0 with a t value => 2 (or 0.05).

to

(alternatively fit a regression line, which I suspect would have a slope coefficient different to 0 with a t value => 2 (or p < 0.05).

cd –

My illustrations are simply red noise sequences of potentially indefinite durations, which I happen to start at n=0 and end at n=15999. Each of the 10 examples plotted is thus 16000 long and the length 100 examples are the same sequences from sample 4000 to 4099. The samples of red noise (xr) are simply obtained iteratively from the original white noise (xw) as: xr(m) = xr(m-1) + xw(m) [discrete integration]. I happened to start the sequences with xr(0)=xw(0) but this just adds a particular DC offset that does not affect the indices of min/max. (We could equally well have used your FFT method to get xr). The results seem to me to be fractal.
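For anyone who wants to repeat this count with more than 10 sequences, here is a minimal sketch of the discrete integration described above. It assumes NumPy, Gaussian white noise, 500 trials and the same 0-10% / 90-100% end zones; the helper name `end_hits` is hypothetical and this is not the original Matlab program.

```python
import numpy as np

def end_hits(seg):
    """Count how many of (max, min) fall in the first or last 10% of seg."""
    n = len(seg)
    edge = max(1, n // 10)
    positions = (int(np.argmax(seg)), int(np.argmin(seg)))
    return sum(1 for i in positions if i < edge or i >= n - edge)

rng = np.random.default_rng(3)
n_trials = 500
hits_full = hits_sub = 0
for _ in range(n_trials):
    xw = rng.standard_normal(16000)     # white noise
    xr = np.cumsum(xw)                  # xr(m) = xr(m-1) + xw(m), with xr(0) = xw(0)
    hits_full += end_hits(xr)           # full length-16000 sequence
    hits_sub += end_hits(xr[4000:4100]) # the same sequence, samples 4000 to 4099

frac_full = hits_full / (2 * n_trials)  # chance level would be 0.2
frac_sub = hits_sub / (2 * n_trials)
```

For an ideal random walk, the arcsine law for the position of the maximum gives roughly 41% for the two 10% end zones combined, which is in line with the 40% to 60% counts reported earlier in the thread.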

These are classic “random walks” and while you believe you SEE drifts, it is only a matter of waiting longer to find a return to zero (proven 100 years ago or so). Remember that the “drunkard” finds his way home with probability 1 with a finite number of iterations (may be very large). See my length 16000 examples, second column, middle, which happens to be an outstanding example. Yes – it is counter-intuitive.

You would presumably not attempt to fit, in a meaningful way, a linear regression to white noise. Neither should you attempt to fit a linear regression to red noise – ultimately the slope is zero.

[ In fact, in a larger context, no polynomial (of which a first-order linear regression is an example) should be used to “model” a signal. Polynomials may be useful when applied to local segments of signals for interpolation and/or smoothing purposes (e.g. linear interpolation of tables of trig function values). But polynomial amplitudes all run to + infinity or to – infinity non-locally (they are vertical), and are inherently unsuited to signals that are infinite in time (even if zero at ends), finite in amplitude, and thus essentially horizontal. For example, an apparent first-order upslope in global temperature could only be temporary. ]

I think that the one thing we believe, at least empirically, is that the extreme values trend strongly to the ends of any window chosen, due to correlation of successive samples. Perfect insight, as usual, is elusive!

Bernie

Bernie

My job involves producing commercial software. I am the principal designer and programmer of most of our statistical tools used for spatial analysis and modelling, ranging from Kriging (you may know it as a spatial linear regression or GPR), multivariate regression, principal components, stochastic modelling, highly optimised systems for solving large linear and non-linear systems (of the order of 100,000s) to name but a few. So please don’t assume just because I don’t express something that somehow I’m not as up-to-speed as you are. When I say there is likely to be drift in your series – there probably is! But again I don’t have the data.

“Neither should you attempt to fit a linear regression to red noise – ultimately the slope is zero.”

That’s slightly patronising and not what I said (I’m talking about local drift). That said, what you say will only stand if the series you’re testing is stationary; it doesn’t matter if the process will ULTIMATELY produce a stationary process if given enough time to evolve as such.

“These are classic ‘random walks’ and while you believe you SEE drifts, it is only a matter of waiting longer to find a return to zero (proven 100 years ago or so).”

You need to create a stationary series, not one that given enough time will prove to be stationary – remember Willis is talking about sub-sampling a stationary series, not sub-sampling a sample of one that is stationary. By the way, you’re the guy with the series; just for fun, fit a simple regression line to all YOUR series.

Until you actually create only stationary series (not a portion of something that will undoubtedly become stationary), and then run the test on sub-samples from them you can’t repeat the experiment.

cd-

Apologies If I stepped on toes – it’s difficult to guess what another person knows or does not know from a few online comments.

When you did not seem to recognize that a auto-correlation was self-cross-correlation and thought them to be “algorithms” I had to make an assumption. Then you berated me (twice at least) for not obviously understanding your FFT red process when this was because YOU misled me for two days when you yourself were confusing a “decaying exponential“ with a series of reciprocals. Really!

Anyway, this thread is too old now.

Bernie

“Apologies If I stepped on toes”

Well, please don’t put words into my mouth; you did this with regard to fitting trends to local drift, which you then expanded to suggest I was talking about global trends for a stationary series. And then here again. I guess the problem here is that we’re talking at cross-purposes most of the time.

When you did not seem to recognize that a auto-correlation was self-cross-correlation and thought them to be “algorithms” I had to make an assumption.

First of all, you never SAID that; you said a cross-correlation WAS THE SAME AS an autocorrelation – it was just terminology! Are you saying that this holds in the general case of cross-correlation, so that all cross-correlations must have the same properties as all autocorrelations for finite, discrete series? As I explained to you, this is not true:

1) To begin with an autocorrelation is always symmetric, something that cannot be said in the general case of the cross-correlation.

2) For a stationary series the normalized auto-correlation is equal to its normalized autocovariance which cannot be assumed for the generalised cross-correlation.

And because of this you cannot just blankly say they are one and the same – BTW, I’m sure you know all this and this wasn’t what you meant by it is just “terminology”; I understood you to be saying that autocorrelation is just a special case of cross-correlation – I wasn’t disagreeing with that, I was disagreeing with…

“The process of breaking a length 1000 random sequence into two length 100 sub-segments and correlating these, as I described, is a cross-correlation. But that is just terminology.”

The use of two length segments: by segments did you mean lag distance? This seemed very confused to me, probably because you use a terminology I’m not familiar with.

“No I’ve described an autocorrelation. Your bivariate statistic comes from the same series. Cross-correlation samples two different series.”

Now tell me, where did I say that autocorrelation was not a special instance of cross-correlation?

“Then you berated me (twice at least) for not obviously understanding your FFT red process when this was because YOU misled me for two days when you yourself were confusing a ‘decaying exponential’ with a series of reciprocals.”

Yes, I took that on the chin and I’m sorry for berating you (I’ve already said this). I was using the term far too liberally, which you corrected me on. Again … I stand corrected.

Bernie

For autocorrelation: by “special case”, I mean the cross-correlation of a series with itself and with regard to symmetry this is for a finite series of real values.
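The symmetry point is easy to check numerically. The sketch below assumes NumPy and two unrelated white-noise series of length 50 (arbitrary choices for illustration); it uses `np.correlate` in "full" mode, which returns every lag from -(N-1) to +(N-1).

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(50)
y = rng.standard_normal(50)

# In 'full' mode the middle entry (index N-1) is lag 0.
auto = np.correlate(x, x, mode="full")    # a series correlated with itself
cross = np.correlate(x, y, mode="full")   # two different series

auto_is_symmetric = np.allclose(auto, auto[::-1])     # same value at lag +k and lag -k
cross_is_symmetric = np.allclose(cross, cross[::-1])
zero_lag_index = int(np.argmax(auto))                 # the autocorrelation peaks at lag 0
```

For a finite real series the autocorrelation is the cross-correlation of the series with itself and comes out symmetric in lag, while the cross-correlation of two different series generally does not.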

Willis, I found this by accident and it may support the point you made here.

This guy started a Twitter account in 2008:

twitter.com/jpbimmer

He has only posted one tweet per year, each tweet being on the same day of the same month. So what’s the big deal? Look at the number of retweets for each tweet. The very first and very last tweets have more retweets than the ones in between. The gap has narrowed since I last checked, though.