A Way To Calculate Effective N

Guest Post by Willis Eschenbach

One of the best parts of writing for the web is the outstanding advice and guidance I get from folks with lots of experience. I recently had the good fortune to have Robert Brown of Duke University recommend a study by Demetris Koutsoyiannis entitled “The Hurst phenomenon and fractional Gaussian noise made easy”. It is indeed “made easy”, and I recommend it strongly. In addition, Leif Svalgaard recommended another much earlier study of a similar question (using very different terminology) in a section entitled “Random series, and series with conservation” of a book entitled “Geomagnetism”. See p. 584 and Equation 91. While it is not “made hard”, it is not “made easy” either.

Between these two excellent references I’ve come to a much better understanding of the Hurst phenomenon and of fractional gaussian noise. In addition, I think I’ve come up with a way to calculate the equivalent number of independent data points in an autocorrelated dataset.

So as I did in my last post on this subject, let me start with a question. Here is the recording of the Nile River levels made at the “Roda Nilometer” on the Nile river. It is one of the longest continuous climate-related records on the planet, extending from the year 622 to the year 1284, an unbroken stretch of 633 years. There’s a good description of the nilometer here, and the nilometer dataset is available here.

Figure 1. The annual minimum river levels in Cairo, Egypt as measured by the nilometer on Roda Island.

So without further ado, here’s the question:

Is there a significant trend in the Nile River over that half-millennium plus from 622 to 1284?

Well, it sure looks like there is a trend. And a standard statistical analysis says it is definitely significant, viz:

Coefficients:

                      Estimate Std. Error t value Pr(>|t|)
(Intercept)          1.108e+03  6.672e+00 166.128  < 2e-16 ***
seq_along(nilometer) 1.197e-01  1.741e-02   6.876 1.42e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 85.8 on 661 degrees of freedom
Multiple R-squared:  0.06676, Adjusted R-squared:  0.06535
F-statistic: 47.29 on 1 and 661 DF,  p-value: 1.423e-11

That says that the odds of finding such a trend by random chance are about one in 70 BILLION (a p-value of 1.42e-11).

Now, due to modern computer speeds, we don’t have to take the statisticians’ word for it. We can actually run the experiment ourselves. It’s called the “Monte Carlo” method. To use the Monte Carlo method, we generate say a thousand sets (instances) of 663 random numbers. Then we measure the trends in each of the thousand instances, and we see how the Nilometer trend compares to the trends in the pseudodata. Figure 2 shows that result:

Figure 2. Histogram showing the distribution of the linear trends in 1000 instances of random normal pseudodata of length 663. The mean and standard deviation of the pseudodata have been set to the mean and standard deviation of the nilometer data.

As you can see, our Monte Carlo simulation of the situation agrees completely with the statistical analysis—such a trend is extremely unlikely to have occurred by random chance.
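For anyone who wants to repeat the experiment, here is a minimal Python sketch of that kind of Monte Carlo test. It is an illustration only, not the code behind Figure 2. The function names are made up for this example, and because the nilometer data itself is not reproduced here, the observed trend (0.1197 per year) is taken from the regression output above, while the standard deviation of about 88.7 is an assumption back-calculated from the residual standard error and R-squared. The mean is left at zero because it has no effect on the slope.

```python
import random


def ols_slope(y):
    """Least-squares slope of y against the time index 1, 2, ..., len(y)."""
    n = len(y)
    t_mean = (n + 1) / 2
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y, start=1))
    sxx = n * (n * n - 1) / 12  # closed form of sum((t - t_mean) ** 2)
    return sxy / sxx


def white_noise_trends(n_points, n_series, sd, rng):
    """Slopes of n_series independent Gaussian series, each n_points long."""
    return [
        ols_slope([rng.gauss(0.0, sd) for _ in range(n_points)])
        for _ in range(n_series)
    ]


if __name__ == "__main__":
    rng = random.Random(42)   # fixed seed so the run is repeatable
    observed_trend = 0.1197   # slope from the regression table above
    assumed_sd = 88.7         # ASSUMPTION: derived from the regression summary
    trends = white_noise_trends(663, 1000, assumed_sd, rng)
    n_as_large = sum(abs(s) >= observed_trend for s in trends)
    print(f"largest |trend| in the pseudodata: {max(abs(s) for s in trends):.4f}")
    print(f"instances with |trend| >= observed: {n_as_large} of {len(trends)}")
```

With independent normal pseudodata the observed trend sits far outside the simulated distribution, which is what the standard regression result says as well.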

So what’s wrong with this picture? Let me show you another picture to explain what’s wrong. Here are twenty of the one thousand instances of random normal pseudodata … with one of them replaced by the nilometer data. See if you can spot which one is nilometer data just by the shapes:

Figure 3. Twenty random normal sets of pseudodata, with one of them replaced by the nilometer data.

If you said “Series 7 is nilometer data”, you win the kewpie doll. It’s obvious that it is very different from the random normal datasets. As Koutsoyiannis explains in his paper, this is because the nilometer data exhibits what is called the “Hurst phenomenon”. It shows autocorrelation, where one data point is partially dependent on previous data points, on both long and short time scales. Koutsoyiannis shows that the nilometer dataset can be modeled as an example of what is called “fractional gaussian noise”.

This means that instead of using random normal pseudodata, what I should have been using is random fractional gaussian pseudodata. So I did that. Here is another comparison of the nilometer data, this time with 19 instances of fractional gaussian pseudodata. Again, see if you can spot the nilometer data.

Figure 4. Twenty random fractional gaussian sets of pseudodata, with one of them replaced by the nilometer data.

Not so easy this time, is it? They all look quite similar … the answer is Series 20. And you can see how much more “trendy” this kind of data is.
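The post does not say which generator produced the fractional gaussian pseudodata, so the following is only one possible approach: a minimal Python sketch that builds fractional gaussian noise from its theoretical autocovariance using the Durbin-Levinson recursion (sometimes called Hosking's method). The function names are hypothetical. The output has zero mean and unit variance, so it would still need to be rescaled to the mean and standard deviation of the nilometer data.

```python
import math
import random


def fgn_autocovariance(k, hurst):
    """Autocovariance of unit-variance fractional Gaussian noise at lag k."""
    h2 = 2.0 * hurst
    return 0.5 * (abs(k + 1) ** h2 - 2.0 * abs(k) ** h2 + abs(k - 1) ** h2)


def fractional_gaussian_noise(n, hurst, rng):
    """Generate n values of unit-variance fGn by the Durbin-Levinson recursion."""
    gamma = [fgn_autocovariance(k, hurst) for k in range(n)]
    x = [rng.gauss(0.0, 1.0)]
    phi = []       # coefficients that predict the next value from past values
    v = gamma[0]   # variance of the one-step prediction error (starts at 1)
    for t in range(1, n):
        # Partial autocorrelation at lag t, then update the coefficients.
        reflection = (
            gamma[t] - sum(phi[j] * gamma[t - 1 - j] for j in range(t - 1))
        ) / v
        phi = [phi[j] - reflection * phi[t - 2 - j] for j in range(t - 1)]
        phi.append(reflection)
        v *= 1.0 - reflection * reflection
        # New value = best linear prediction from the past + fresh noise.
        predicted = sum(phi[j] * x[t - 1 - j] for j in range(t))
        x.append(predicted + math.sqrt(v) * rng.gauss(0.0, 1.0))
    return x


if __name__ == "__main__":
    series = fractional_gaussian_noise(663, 0.85, random.Random(1))
    print(len(series), min(series), max(series))
```

With a Hurst Exponent of 0.5 every autocovariance beyond lag zero is zero, so the recursion simply returns independent normal values, which is a handy sanity check.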

Now, an internally correlated dataset like the nilometer data is characterized by something called the “Hurst Exponent”, which varies from 0.0 to 1.0. For perfectly random normal data the Hurst Exponent is 0.5. If the Hurst Exponent is larger than that, then the dataset is positively correlated with itself internally. If the Hurst Exponent is less than 0.5, then the dataset is negatively correlated with itself. The nilometer data, for example, has a Hurst Exponent of 0.85, indicating that the Hurst phenomenon is strong in this one …

So what do we find when we look at the trends of the fractional gaussian pseudodata shown in Figure 4? Figure 5 shows an overlay of the random normal trend results from Figure 2, displayed on top of the fractional gaussian trend results from the data exemplified in Figure 4.

Figure 5. Two histograms. The blue histogram shows the distribution of the linear trends in 1000 instances of random fractional gaussian pseudodata of length 663. The average Hurst Exponent of the pseudodata is 0.82. That blue histogram is overlaid with a histogram in red showing the distribution of the linear trends in 1000 instances of random normal pseudodata of length 663 as shown in Figure 2. The mean and standard deviation of the pseudodata have been set to the mean and standard deviation of the nilometer data.

I must admit, I was quite surprised when I saw Figure 5. I was expecting a difference in the distribution of the trends of the two sets of pseudodata, but nothing like that … as you can see, while standard statistics says that the nilometer trend is highly unusual, in fact it is not unusual at all. About 15% of the pseudodata instances have trends larger than that of the nilometer.

So we now have the answer to the question I posed above. I asked whether there was a significant rise in the Nile from the year 622 to the year 1284 (see Figure 1). Despite standard statistics saying most definitely yes, amazingly, the answer seems to be … most definitely no. That amount of trend in a six-century plus dataset is not enough to say that it is more than just a 663-year random fluctuation of the Nile. The problem is simple—these kinds of trends are common in fractional gaussian data.

Now, one way to understand this apparent conundrum is that because the nilometer dataset is internally correlated on both the short and long term, it is as though there were fewer data points than the nominal 663. The important concept is that since the data points are NOT independent of each other, a large number of inter-dependent data points acts statistically like a smaller number of truly independent data points. This is the basis of the idea of the “effective n”, which is how such autocorrelation issues are often handled. In an autocorrelated dataset, the “effective n”, which is the number of effective independent data points, is always smaller than the true n, which is the count of the actual data.

But just how much smaller is the effective n than the actual n? Well, there we run into a problem. We have heuristic methods to estimate it, but they are just estimations based on experience, without theoretical underpinnings. I’ve often used the method of Nychka, which estimates the effective n from the lag-1 auto-correlation (Details below in Notes). The Nychka method estimates the effective n of the nilometer data as 182 effective independent data points … but is that correct? I think that now I can answer that question, but there will of course be a digression.

I learned two new and most interesting things from the two papers recommended to me by Drs. Brown and Svalgaard. The first was that we can estimate the number of effective independent data points from the rate at which the standard error of the mean decreases with increasing sample size. Here’s the relevant quote from the book recommended by Dr. Svalgaard:

Figure 6. See paper for derivation and details. The variable “m” is the standard deviation of the full dataset. The function “m(h)” is the standard deviation of the means of the full dataset taken h data points at a time.

Now, you’ve got to translate the old-school terminology, but the math doesn’t change. This passage points to a method, albeit a very complex method, of relating what he calls the “degree of conservation” to the number of degrees of freedom, which he calls the “effective number of random ordinates”. I’d never thought of determining the effective n in the manner he describes.

This new way of looking at the calculation of neff was soon complemented by what I learned from the Koutsoyiannis paper recommended by Dr. Brown. I found out that there is an alternative formulation of the Hurst Exponent. Rather than relating the Hurst Exponent to the range divided by the standard deviation, he shows that the Hurst Exponent can be calculated as a function of the slope of the decline of the standard deviation with increasing n (the number of data points). Here is the novel part of the Koutsoyiannis paper for me:

Figure 7. The left hand side of the equation is the standard deviation of the means of all subsets of length “n”, that is to say the standard error of the mean for that data. On the right side, sigma (σ), the standard deviation of the data, is divided by “n”, the number of data points, to the power of (1-H), where H is the Hurst Exponent. See paper for derivation and details.

In the statistics of normally distributed data, the standard error of the mean (SEM) is sigma (the standard deviation of the data) divided by the square root of N, the number of data points. However, as Koutsoyiannis shows, this is a specific example of a more general rule. Rather than varying as a function of 1 over the square root of n (n^0.5), the SEM varies as 1 over n^(1-H), where H is the Hurst exponent. For a normal dataset, H = 0.5, so the equation reduces to the usual form.
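The scaling rule above suggests a simple way to estimate H: chop the series into blocks of n points, take the standard deviation of the block means, and fit a line to log(StD) against log(n). The slope of that line is H − 1. The sketch below, with hypothetical function names, does this for white noise, where the answer should come out near 0.5. It is a rough illustration under those assumptions, and this simple estimator is known to be noisy and somewhat biased for strongly persistent series.

```python
import math
import random
import statistics


def block_means(x, size):
    """Means of consecutive non-overlapping blocks of `size` points."""
    n_blocks = len(x) // size
    return [sum(x[i * size:(i + 1) * size]) / size for i in range(n_blocks)]


def hurst_from_block_means(x, sizes):
    """Estimate H from the slope of log StD(block mean) versus log(block size).

    StD = sigma / n**(1 - H)  implies  log StD = log sigma + (H - 1) * log n,
    so H = 1 + slope.
    """
    log_n = [math.log(s) for s in sizes]
    log_sd = [math.log(statistics.pstdev(block_means(x, s))) for s in sizes]
    mean_n = sum(log_n) / len(log_n)
    mean_sd = sum(log_sd) / len(log_sd)
    slope = sum(
        (a - mean_n) * (b - mean_sd) for a, b in zip(log_n, log_sd)
    ) / sum((a - mean_n) ** 2 for a in log_n)
    return 1.0 + slope


if __name__ == "__main__":
    rng = random.Random(3)
    noise = [rng.gauss(0.0, 1.0) for _ in range(8192)]
    print(f"estimated H: {hurst_from_block_means(noise, [1, 2, 4, 8, 16, 32, 64]):.2f}")
```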

SO … combining what I learned from the two papers, I realized that I could use the Koutsoyiannis equation shown just above to estimate the effective n. You see, all we have to do is relate the SEM shown above, \frac{\sigma}{n^{1-H}}, to the number of effective independent data points it would take to give you the same SEM. In the case of independent data we know that the SEM is equal to \frac{\sigma}{n_{eff}^{0.5}}. Setting the two expressions for the SEM equal to each other we get

\mathrm{StD}\left[\bar{X}_n\right]=\frac{\sigma}{n^{1-H}}=\frac{\sigma}{n_{eff}^{0.5}}

where the left hand term is the standard error of the mean, and the two right hand expressions are different equivalent ways of calculating that same standard error of the mean.

Inverting the fractions, cancelling out the sigmas, and squaring both sides we get

n_{eff}=n^{2-2H} \qquad \text{(Equation 1)}

Egads, what a lovely result! Equation 1 calculates the number of effective independent data points using only n, the number of datapoints in the dataset, and H, the Hurst Exponent.

As you can imagine, I was quite interested in this discovery. However, I soon ran into the next oddity. You may recall from above that using the method of Nychka for estimating the effective n, we got an effective n (neff) of 182 independent data points in the nilometer data. But the Hurst Exponent for the nilometer data is 0.85. Using Equation 1, this gives us an effective n of 663^(2-2* 0.85) equals seven measly independent data points. And for the fractional gaussian pseudodata, with an average Hurst Exponent of 0.82, this gives only about eleven independent data points.
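The arithmetic for the effective n is easy to check. Here is a small Python sketch (the function name is my own invention for illustration) that evaluates n^(2-2H) for the nilometer case and for the two limiting cases of H = 0.5 and H = 1.

```python
def effective_n(n, hurst):
    """Effective number of independent data points: n ** (2 - 2H)."""
    return n ** (2.0 - 2.0 * hurst)


if __name__ == "__main__":
    # Nilometer case from the text: 663 points, H = 0.85.
    print(f"n=663, H=0.85: {effective_n(663, 0.85):.2f}")
    # Independent data (H = 0.5) keeps every point.
    print(f"n=663, H=0.50: {effective_n(663, 0.5):.0f}")
    # As H approaches 1, the effective n collapses toward a single point.
    print(f"n=663, H=1.00: {effective_n(663, 1.0):.0f}")
```

For H = 0.85 this comes out at about 7, matching the figure quoted above.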

So which one is right—the Nychka estimate of 182 effective independent datapoints, or the much smaller value of 11 calculated with Equation 1? Fortunately, we can use the Monte Carlo method again. Instead of using 663 random normal data points stretching from the year 622 to 1284, we can use a smaller number like 182 datapoints, or even a much smaller number like 11 datapoints covering the same time period. Here are those results:

Figure 8. Four histograms. The solid blue filled histogram shows the distribution of the linear trends in 1000 instances of random fractional gaussian pseudodata of length 663. The average Hurst Exponent of the pseudodata is 0.82. That blue histogram is overlaid with a histogram in red showing the distribution of the linear trends in 1000 instances of random normal pseudodata of length 663. These two are exactly as shown in Figure 5. In addition, the histograms of the trends of 1000 instances of random normal pseudodata of length n=182 and n=11 are shown in blue and black. The mean and standard deviation of the pseudodata have been set to the mean and standard deviation of the nilometer data.

I see this as a strong confirmation of this method of calculating the number of equivalent independent data points. The distribution of the trends with 11 points of random normal pseudodata is very similar to the distribution of the trends with 663 points of fractional gaussian pseudodata with a Hurst Exponent of 0.82, exactly as Equation 1 predicts.
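There is also a quick way to see why fewer independent points spread over the same period give a wider distribution of trends. For independent data the standard deviation of a least-squares trend has a closed form, sigma divided by the square root of the sum of squared time deviations. The Python sketch below evaluates it, assuming the points are evenly spaced over the 662 years from 622 to 1284 (the post does not spell out the spacing, so that is an assumption, as are the function name and the value of 88.7 used for the standard deviation of the data). As a check, plugging in the residual standard error of 85.8 with all 663 points reproduces the 0.01741 standard error in the regression output.

```python
import math


def trend_sd(n_points, span, sigma):
    """StD of the least-squares slope for n_points independent values.

    Assumes the points are evenly spaced over `span` time units and each
    value has standard deviation `sigma`.
    """
    spacing = span / (n_points - 1)
    # Sum of squared deviations of the time axis from its mean.
    sxx = spacing ** 2 * n_points * (n_points ** 2 - 1) / 12
    return sigma / math.sqrt(sxx)


if __name__ == "__main__":
    # Check against the regression table: sigma = residual standard error.
    print(f"n=663, sigma=85.8: {trend_sd(663, 662, 85.8):.5f}")
    assumed_sd = 88.7  # ASSUMPTION: approximate StD of the nilometer data
    for n in (663, 182, 11):
        print(f"n={n:3d}: StD of trend = {trend_sd(n, 662, assumed_sd):.4f}")
```

The fewer the points, the larger the spread of trends, so a trend that looks extraordinary against 663 independent points can look ordinary against 11.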

However, this all raises some unsettling questions. The main issue is that the nilometer data is by no means the only dataset out there that exhibits the Hurst phenomenon. As Koutsoyiannis observes:

The Hurst or scaling behaviour has been found to be omnipresent in several long time series from hydrological, geophysical, technological and socio-economic processes. Thus, it seems that in real world processes this behaviour is the rule rather than the exception. The omnipresence can be explained based either on dynamical systems with changing parameters (Koutsoyiannis, 2005b) or on the principle of maximum entropy applied to stochastic processes at all time scales simultaneously (Koutsoyiannis, 2005a).

As one example among many, the HadCRUT4 global average surface temperature data has an even higher Hurst Exponent than the nilometer data, at 0.94. This makes sense, because the global temperature data is heavily averaged over both space and time. As a consequence the Hurst Exponent is high. And with such a high Hurst Exponent, despite there being 1,977 months of data in the dataset, the relationship shown above indicates that the effective n is tiny—there are only the equivalent of about four independent datapoints in the whole of the HadCRUT4 global average temperature dataset. Four.

SO … does this mean that we have been chasing a chimera? Are the trends which we have believed to be so significant simply the typical meanderings of high Hurst Exponent systems? Or have I made some foolish mistake?

I’m up for any suggestions on this one …

Best regards to all, it’s ten to one in the morning, full moon is tomorrow, I’m going outside for some moon viewing …

w.

UPDATE: Demetris Koutsoyiannis was kind enough to comment below. In particular, he said that my analysis was correct:

Demetris Koutsoyiannis July 1, 2015 at 2:04 pm

Willis, thank you very much for the excellent post and your reference to my work. I confirm your result on the effective sample size — see also equation (6) in

http://www.itia.ntua.gr/en/docinfo/781/

As You Might Have Heard: If you disagree with someone, please have the courtesy to quote the exact words you disagree with so we can all understand just exactly what you are objecting to.

The Data Notes Say:

###  Nile river minima
###
##  Yearly minimal water levels of the Nile river for the years 622
##  to 1281, measured at the Roda gauge near Cairo (Tousson, 1925,
##  p. 366-385). The data are listed in chronological sequence by row.
##  The original Nile river data supplied by Beran only contained only
##  500 observations (622 to 1121).  However, the book claimed to have
##  660 observations (622 to 1281).  I added the remaining observations
##  from the book, by hand, and still came up short with only 653
##  observations (622 to 1264).

### — now have 663 observations : years  622–1284  (as in orig. source)

The Method Of Nychka: He calculates the effective n as follows:

[Equation image not reproduced: Nychka’s formula for the effective n]

where “r” is the lag-1 autocorrelation.
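For reference, the lag-1 autocorrelation itself is straightforward to compute. This minimal Python sketch uses the common definition in which the lagged products are divided by the full sum of squared deviations. The function name is hypothetical, and other software may normalize slightly differently.

```python
def lag1_autocorrelation(x):
    """Lag-1 autocorrelation of a sequence of numbers."""
    n = len(x)
    mean = sum(x) / n
    # Products of each deviation with the next one in the series.
    lagged = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    total = sum((v - mean) ** 2 for v in x)
    return lagged / total


if __name__ == "__main__":
    print(lag1_autocorrelation([1, 2, 3, 4, 5]))    # steadily rising: positive
    print(lag1_autocorrelation([1, -1, 1, -1]))     # alternating: negative
```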


181 thoughts on “A Way To Calculate Effective N”

  1. It’s ten after one in the morning here. Alas, I can’t do much moon viewing – the monsoon has set in, and it’s just a big bright spot in the clouds.

    Truly fascinating – I will be following these papers up myself. This looks like an excellent approach to auto-correlated data (and for determining when you are actually dealing with it, rather than a normal set).

    • You can also do it easily as follows:

      Calculate Rt as the difference between observations X(t) – X(t-1). That is, create a new series of differences at lag 1. Use this Rt series in your Monte Carlo simulations: randomly shuffle it 10,000 times. Create 10,000 new series of X. Calculate the trends of the 10,000 new series. If the original series trend is more extreme than 95% of the generated series, then bingo, it’s significant (and you can tell how significant by the proportion of random series that are more extreme).

      The key point is to use the differences between observations, not the observations themselves. Caveats may apply.

      • I think I may have to leave my own caveat there ‘cos I think the series generated will all have the same slope. It’s been a long year since I did this and it was for a different purpose. (To separate random walks from bounded series, etc). I guess it holds but only for subsections of the data. Sorry ’bout that. Anyway I’ll now sit down with pencil and paper and work out a simple Monte Carlo way… if there is a way…

      • Final gasp:
        I can’t see an easy way to do it as I was so boldly claiming above. If you randomize the differences you don’t always get the same slope, but you do always get the same start and end value, which is not so hot. (‘cos the sum of differences is always the same no matter how you reorder them). It keeps the same persistence, but also the slope you want to test. If you randomize the observations themselves, you get a range of slopes centred on 0, but you lose the persistence.

        Any answers on a postcard welcome…
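The problem described in this comment is easy to demonstrate. The Python sketch below (the function name and the toy series are made up for illustration) shuffles the lag-1 differences of a series and rebuilds it. Because the differences always add up to the same total, every rebuilt series starts and ends at the same values as the original.

```python
import random


def rebuild_from_shuffled_differences(x, rng):
    """Shuffle the lag-1 differences of x and cumulate them from x[0]."""
    diffs = [b - a for a, b in zip(x, x[1:])]
    rng.shuffle(diffs)
    rebuilt = [x[0]]
    for d in diffs:
        rebuilt.append(rebuilt[-1] + d)
    return rebuilt


if __name__ == "__main__":
    original = [3, 1, 4, 1, 5, 9, 2, 6]   # toy data, not the nilometer record
    rng = random.Random(0)
    for _ in range(3):
        shuffled = rebuild_from_shuffled_differences(original, rng)
        # First and last values never change, whatever the shuffle order.
        print(shuffled[0], shuffled[-1], shuffled)
```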

    • Mann writes in his paper

      In the absence of any noise (i.e. modeling only the pure radiatively forced
      component of temperature variation), we obtained the following values of H for the
      full Period (AD 850–1999): 0.870, pre-instrumental period (AD 850–1849): 0.838,
      and instrumental period (AD 1850–1999): 0.903.

      And according to the neff formula, the total number of effective climate data points we have is

      ((1999-850) x 12) ^ (2 – 2 x 0.870) = ~12

      12 climate data points since before the MWP. Ouch.

      and

      ((1999-1850) x 12) ^ (2 – 2 x 0.903) = ~4

      So Mann agrees with you Willis. We have 4 climate data points in the modern era. Nice.

    • It would be funnier if there was nothing better the money could have been spent on, eh Mr. Worrall?

  2. Thank you Willis.
    Does the method of calculating the number of effective data points say anything about where they are, in particular?
    Do they correspond in any way to the decades to centuries-long fluctuations in the Nile data?
    If I look at the Nile data graph from across the room, I can discern several trends up and down. Maybe seven of them, maybe eleven, depending on where I stand.

    Thank you again, very interesting and thought provoking as always.

    BTW, Ten of five here, and I have to wake up in an hour for work :-)

  3. Thanks Willis, you found an excellent way to explain this relatively complicated topic.
    We have had a Climate Dialogue about Long Term Persistence, in which Koutsoyiannis participated, together with Armin Bunde and Rasmus Benestad. It’s fascinating to see how mainstream climate scientists (in this case Benestad) are a bit well let’s say hesitant to accept the huge consequences of taking into account Hurst parameters close to 1 :)
    See http://www.climatedialogue.org/long-term-persistence-and-trend-significance/

    Marcel

    • Thanks for that link, Marcel. It is a most fascinating interchange, with much to learn. I’m sorry I didn’t know about it at the time.

      You say:

      It’s fascinating to see how mainstream climate scientists (in this case Benestad) are a bit well let’s say hesitant to accept the huge consequences of taking into account Hurst parameters close to 1 :)

      When I realized how small the actual number of effective independent data points is in high-Hurst datasets, I had the same thought. I thought, mainstream climate scientists are not going to be willing to truly grasp this nettle … despite the fact that I was able to use the Monte Carlo analysis to verify the accuracy of the neff of 11 in the dataset in the head post.

      All the best to you,

      w.

      • Thanks, Willis, for an intriguing conundrum (and thanks, Marcel, for doing Climate Dialogue). I have a suggestion that might help to resolve the issue (or might just demonstrate that when it comes to statistics I’m totally ignorant). Suggestion : take a highly correlated data series where there are both known genuine trends and “random” variation, for example temperatures at one location throughout one day, or daily maximum temperatures at one location over a year, and see how your new method perceives it.

  4. …extending from the year 622 to the year 1284, an unbroken stretch of 633 years. …

    Er… 1284 – 622 = 662, not 633. Minor transposition error – your reference gives a correct 662.

    In fact, that Nilometer has several gaps in its record – there has been a Nilometer on that spot since around 715, but there were earlier ones (presumably stretching back to some 3000BC?). Shame we haven’t got all the data…

    • Er…No.
      Which year don’t you include, 662 or 1284. Include them both and it’s a span of 633 years.

      • Where does 662 come from? The OP says “from the year 622 to the year 1284, an unbroken stretch of 633 years”

        There is no ‘662’ mentioned at all…

  5. The basic assumption underlying the error estimates in regression techniques is that the deviations from whatever relationship you fit are: a) uncorrelated, that is the next deviation does not depend on the foregoing one, and b) they are (in this case) Gaussian distributed.

    One glance at the plot shows that both conditions are not satisfied. Hence any “error” estimates on the fitted parameters are meaningless. In fact, it is obvious from the graph that there is no long term trend.

    • Meeting these assumptions bothers me with most climate data, too. Here, as well, Willis has revealed the persistent memory within these sets. I’m not sure we’re entitled to make any inferences at all with these data, although I appreciate the temptation. That leaves only making observations/descriptions of data and attempting to find words to make “real world” sense of the observations.

      The post-normal whackos will say that this is another failure of classical scientific methods and instead these data clearly show we need to collect a bunch of taxes and damn the river.

    • Yes, basic statistics always starts with the assumption that data samples are representative of the population.
      Global surface temperature data certainly do not represent the globe. A few individual surface stations do have good data for their microclimate, eg some rural stations. Such data show zero warming.

      The statement “no long term trend” is not the same as saying that the slope of the trend line is insignificantly different from zero. This gets into hypothesis testing, and assumptions about the underlying nature of what is being measured. My observation of the same chart is that the trend line is irrelevant to the data. Or, the first year of data are insignificantly different from the last year. Or, etc, etc.

    • That’s my take as well. There appears to be a lot of signals other than Gaussian noise in the data. My experience from fitting data is that if increasing “N” doesn’t cause a decrease in the residual, then there is some sort of signal lurking in the original that the fitting routine is not fitting for.

      First thing I would do is to do a Fourier transform of some sort to see if any spikes show up.

      This is also why ignoring the PDO and AMO can really bite when trying to determine ECS.

  6. Willis,
    “Or have I made some foolish mistake?”
    If your analysis of HadCRUT4 indicates there is no significant trend then, regrettably, I think the answer to your question is yes.

    Certainly, the HadCRUT data has been doctored and “adjusted” to within an inch of its life, like all the other surface series, in order to get the desired warming trend.
    But the data, as it stands, clearly shows a significant trend. There must be something wrong with your analysis – but as a non-statistician, I’ve no idea what.

    In the first set of random examples, the real data does stand out: there’s less vertical random deviation. I find it slightly suspicious that you have to switch to another kind of random data to get the desired result.

    Also, the Nile data corresponds nicely to the climate of the time. It shows falling levels as the world descended into the colder climate of the Dark Ages and then rising and peaking spot-on on the Medieval Warm Period. In general, drought is less likely in a warmer climate because warmer air can carry far more moisture.

    You do have to be very careful about complex statistical analysis – the Internet is full of such warnings, and advice that analysis must be tempered by using the Mark One Eyeball as a sanity check. Your analysis on the South American river solar correlation was an excellent example of this: the scientists had used advanced statistics to arrive at a conclusion that was almost certainly completely wrong.

    I’m sorry, but both the Nile data and HadCRUT both show clear trends. Reminds me of that old saying about lies, damned lies and statistics….
    Chris

      • Excellent question and it takes more than statistics to answer – we frequently found statistical significance with very small measured differences (pretty well controlled biology studies) that held no behavioral significance.

    • @chris, Of course we see trends in these data. Any gambler in Vegas on a “streak” sees a trend. The question of statistical significance is: Can what we see be explained by random chance? Or is there a “system” at work?
      @willis, Another issue that should be considered is whether the underlying process randomness is Gaussian. You might consider running your Monte Carlo using non-normal pseudo random number generators. I have also found that people regularly underestimate the number of MC iterations required to achieve a converged result. The simple trick of computing the relevant statistic over each half of the sample set and increasing the sample size until they are in acceptable agreement is often helpful.

    • Yes, they exhibit clear trends. But the question is whether those trends are “significant.”

      If my end-of-month checking-account balance falls for three successive months, that’s a clear trend. But that trend is not strong evidence that over the long term (as evidenced by, say, my last sixty years of checking-account balances) my checking-account balance’s month-to-month change is more likely to be negative than to be positive: the trend is clear, but it’s not “significant.”

      Mr. Eschenbach is saying that the same may be true of the temperature trends.

    • Analysis of autocorrelated data is always fraught. The “normal” statistical methods all fail without drastic correction for autocorrelation. This was pounded home in laboratory methods coursework in the 80’s. There were many tests we had to perform on the datasets before we could even begin serious statistical work, and the professor was diabolical in giving data to analyze that would fail odd tests but would give totally bogus but great looking results to the wrong analysis if you skipped prequalification.

      I haven’t seen any courses like that since I have been back in academia. Maybe that is part of the problem in climate science – except most of the practitioners went to school back when I did or a decade before.

    • But the data, as it stands, clearly shows a significant trend.
      ====================
      nope. the data shows a trend. under the assumption that the data is normally distributed (that temperature acts like a coin toss), then the trend is significant.

      however, what the Hurst exponent tells us is that HadCRUT4 data does not behave like a coin toss. And as such, the statistical tests for significance that rely on the normal distribution need to account for this.

      In particular, “n”, the sample size. We all understand that a sample of 1 is not going to be very reliable. We think that a sample size of 1 million will be significant. that we can trust the result.

      However, what Willis’s result above tells us is that as H goes to 1, the effective sample size of 1 million samples also goes to 1. Which is an amazing result. H=1 tells us that 1 million samples is no more reliable than 1 sample.

      • Thanks Ferd, for a such a concise summary of the background to (yet another) fascinating article from Willis. Your comment lit the lamp of understanding as far as I am concerned.

      • ferd berple: “However, what Willis’s result above tells us is that as H goes to 1, the effective sample size of 1 million samples also goes to 1. Which is an amazing result. H=1 tells us that 1 million samples is no more reliable than 1 sample.”
        Excellent observation. Is that plausible? I think not. So I suspect that the method of estimating H fails as H goes to 1.

        I guess I will have to go and create a bunch of artificial data sets for which I know the answer, and see what I get.

    • Chris, you really need to reread Willis’ article with emphasis on methods and methodology. The Mk 1 eyeball is ideal for understanding Willis’ choice, if you actually follow his text.

  7. All Willis is saying is that when a system has a lag time that is comparable to or greater than the time spacing of the data points, neighboring data points are not independent. That is simply common sense. Willis is just going into the math. Consider AMO/PDO as just two examples of *known* long-timelength effects. Any measurement they affect must be recognized as one in which the point-to-point data are correlated. Suppose you have a variable that spends years in a “low” state, then an event happens that sends it to a “high” state, but the measurements are autocorrelated such that the measured value stays “high” for many years. The first points are lower than the last ones, but that does not make a trend.

    There are a multitude of timescales for just thermal inertia, to take only one example: air warms and cools quickly, the ground not so much. I live in an old, stone building, and the thermal lag seems to be a few days for the building itself to warm up or cool down during periods of extremely hot/cold weather. If the walls are warm to the touch at noon, they will still be warm at midnight and warm again tomorrow. I can tell that even without a PhD in Climate Science.

    • “There are a multitude of timescales”

      This is where I thought he was going but he didn’t.

      Once upon a time, when the earth was still young and warm, autocorrelations were frequently used. They were helpful in quickly showing the underlying timescales in the data, kind of like neff but not really. With the time scales somewhat estimated, people would then take a stab at fitting various functions around the data with asymptotic expansions or wavelet analysis, etc., i.e. make up a function and see if it could be wrapped around the data in some fashion. Everyone had their favorites. With the advent of digital signal processing and the Cooley-Tukey algorithm, the Fourier series kind of won out over the others.

  8. Would anyone like to run the Central England Temperature data set as a nice test of a long time series for U.K. temperatures?

    • Yes, I did this statistical test some years ago – just for my own edification. The result was unambiguous: the Hurst exponent, H, was significantly different from 0.5.

  9. Looks like the issue here is how one defines ‘trend’ and the time scales applied.

    The Nile clearly varies with latitudinal shifting of climate zones.

    This is a neat description of statistical techniques but is rather akin to arguing how many angels can fit on a pin.

    We need to know why the climate zones shift in the first place and the most likely reason is changes in the level of solar activity.

    • I think that the problem is we have politicians that are akin to Pharaohs arguing that the Nile is following a clear statistically significant rising trend in floods since those pyramids were built, so are planning a pyramid tax, but the statistics are being incorrectly applied due to their not taking note of the Hurst phenomenon. As with today’s politicians they are completely disinterested in what the reasons are for the Nile floods and will call you a Nile Flood d*nier should you question their ‘statistical trend’ and thus their reasons for filling their coffers.

  10. Why muck around with all the calculations?

    If you want to work out the effective N, why not simply do a fourier transform, look at the spectrum and work out how much it is depleted from normal white noise?

    In effect if you’ve only got 70% of the bandwidth, then information theory says you only have 70% of the information = statistical variation.

    • To find the white noise floor you need to know the variance of the noise component. How would you determine this in a dataset with significant autocorrelation? In other words, if we’re able to separate the noise from the signal, we’d be done and the FT would be superfluous.

  11. Thank you very much for this post; to me it was one of your more interesting.

    But I am embarrassed to confess that, despite intending to for some years, I have not allocated the time to master this derivation, and I won’t today, either. Nonetheless, this post and the references it cites have gone into the folder where I keep the materials (including a Nychka paper I think I also got from you) that I’ll use to learn about n_eff if a long enough time slot opens before I pass beyond this vale of tears.

    If it’s convenient, the code that went into generating Fig. 8 would be a welcome addition to those materials. (I’m not yet smart enough about Hurst exponents to generate the synthetic data.)
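
    For readers in the same position, here is a minimal sketch of one way to generate synthetic fractional Gaussian noise (fGn) with a chosen Hurst exponent. It is not the code behind Fig. 8. It assumes unit-variance fGn with the standard autocovariance 0.5 * (|k+1|^(2H) - 2|k|^(2H) + |k-1|^(2H)) at lag k, and draws a series by Cholesky-factoring the covariance matrix. The names fgn_autocov, cholesky and fgn_sample are illustrative, and the pure-Python factorization is only practical for short series.

```python
import math
import random


def fgn_autocov(k, H):
    """Autocovariance at lag k of unit-variance fractional Gaussian noise."""
    k = abs(k)
    e = 2.0 * H
    return 0.5 * ((k + 1) ** e - 2.0 * k ** e + abs(k - 1) ** e)


def cholesky(a):
    """Lower-triangular L with L * L^T == a (a must be positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def fgn_sample(n, H, rng):
    """Draw one length-n fGn series (assumes 0 < H < 1 and modest n)."""
    cov = [[fgn_autocov(i - j, H) for j in range(n)] for i in range(n)]
    L = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Correlated series = L times a vector of independent standard normals.
    return [sum(L[i][m] * z[m] for m in range(i + 1)) for i in range(n)]


rng = random.Random(42)          # fixed seed so the run is repeatable
series = fgn_sample(64, 0.85, rng)
print(len(series), fgn_autocov(1, 0.85))
```

    A useful sanity check on the covariance is that all n x n entries sum to n^(2H), which is the variance of the sum of the series; dividing by n^2 gives a variance of the mean of n^(2H-2), the scaling behind n_eff = n^(2-2H).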

  12. Dear Willis,

    The nilometer data, for example, has a Hurst Exponent of 0.85, indicating that the Hurst phenomenon is strong in this one …

    I have a reasonable doubt. How do you know what the Hurst Exponent is for the nilometer data? I have no idea, but I assume that it is something that you calculate by looking at the data itself, and not something based on any previous understanding of the underlying physics of the phenomenon. Please correct me if I am wrong.

    In case that I am correct, and you calculate the Hurst Exponent by looking at the data, then the Hurst Exponent is useful to DESCRIBE the data, but is not useful in any way to calculate the likelihood of having data like that. The only way to calculate such likelihood is based on the understanding of the underlying physics.

    Let’s say that I have a six-sided die and I roll it ten times, getting 1,1,2,2,3,3,4,5,6,6 for a fantastic upward trend. The likelihood of getting such a result is extremely low, because the distribution is indeed normal (rolling dice). But if you are going to calculate the Hurst Exponent by looking at the data and not at the underlying physics, you are going to get a high Hurst Exponent, because the data indeed looks like it is very autocorrelated. And if you then run a Monte Carlo analysis with random variables with the same Hurst Exponent, the Monte Carlo analysis will tell you that, well, it was not so unusual.

    I hope that you understand my point. If you compare something rare with other things that you force to be equally rare, the rare result will not look rare anymore.

    Best regards.
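
    On the question of how H is obtained: it is indeed estimated from the data. A minimal sketch of one common estimator, the aggregated-variance method, follows; it is not necessarily the method used in the head post. It assumes that for a series with Hurst exponent H the standard deviation of the means of blocks of length k scales as k^(H-1), so H is one plus the slope of log(sd) against log(k). The function name hurst_aggvar and the power-of-two block sizes are illustrative choices.

```python
import math
import random
import statistics


def hurst_aggvar(x, min_blocks=16):
    """Estimate H from how the sd of block means scales with block size."""
    n = len(x)
    log_k, log_sd = [], []
    k = 1
    while n // k >= min_blocks:
        m = n // k                      # number of complete blocks of length k
        means = [sum(x[i * k:(i + 1) * k]) / k for i in range(m)]
        log_k.append(math.log(k))
        log_sd.append(math.log(statistics.stdev(means)))
        k *= 2
    # Ordinary least-squares slope of log(sd) on log(k).
    kbar = sum(log_k) / len(log_k)
    sbar = sum(log_sd) / len(log_sd)
    sxy = sum((a - kbar) * (b - sbar) for a, b in zip(log_k, log_sd))
    sxx = sum((a - kbar) ** 2 for a in log_k)
    return 1.0 + sxy / sxx


rng = random.Random(0)
white = [rng.gauss(0.0, 1.0) for _ in range(4096)]
h_white = hurst_aggvar(white)
print(round(h_white, 2))   # white noise should come out near 0.5
```

    Because the estimate is itself a statistic of the sample, it carries sampling error, which is the concern raised in the replies below.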

      • Thanks a lot for that. As can be seen in your link, the Hurst Exponent is calculated from the data. Therefore it is good for describing the data, and probably also good for modeling possible behaviour in the future based on past behaviour (the original intention), but it tells you nothing about how rare the past behaviour was. Nothing at all. It can’t.

        When using it to model the future, it will do a good job only if the past behaviour was not rare in itself for any reasons, be it due to changes of the underlying physics or “luck”. So, the more normal and less influenced by luck your past data is, the better job the Hurst Exponent will do at characterising future behaviour.

        The Hurst Exponent is ASSUMING that the variations experienced in the past characterise the normal behaviour of the variable being studied. And this may be a correct assumption… or not. The longer the time series, the more likely it will be a safe assumption. But it will still tell you nothing about how rare the past behaviour was. It can’t be used for that, because it already assumes that the past was normal. It assumes so in order to use the information to predict the future. So if you take your data and compare it to other random data with the same Hurst Exponent, of course it will not look rare. You have made them all behave in the same way. This doesn’t mean that your data wasn’t rare by itself.

        Exercise: run 1000 random series of pure white noise with 128 data points each. Calculate the trend for each of them. Take the one series with the greatest trend (either positive or negative). We know it is a rare individual, it is 1 among 1000. But let’s imagine that we didn’t, and we wanted to find out whether it was rare or not. Let’s do Willis’ exercise: calculate its Hurst Exponent. It should not be any surprise that it departs somewhat from 0.5. Now we compare it with another 1000 series of Random Fractional Gaussian Pseudodata with the same Hurst Exponent, instead of white noise. We calculate the trends of those series. Surprise! The trend of our initial dataset doesn’t look rare at all when compared with the trends of the other Random Fractional Gaussian Pseudodata series. Following Willis’ exercise, we would conclude that our data was not rare at all. And we would be wrong.

        I wish I had the time to do the exercise myself.
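
        The first half of the exercise above is quick to sketch. The following assumes Gaussian white noise, an ordinary least-squares slope against the index 0..127, and a fixed seed; the names ols_slope, slopes and extreme are illustrative. It stops at picking the extreme series. Estimating its H and comparing it against fractional Gaussian pseudodata is left out.

```python
import math
import random
import statistics


def ols_slope(y):
    """Least-squares slope of y against its index 0, 1, ..., len(y) - 1."""
    n = len(y)
    tbar = (n - 1) / 2.0
    ybar = sum(y) / n
    sxy = sum((t - tbar) * (v - ybar) for t, v in enumerate(y))
    sxx = sum((t - tbar) ** 2 for t in range(n))
    return sxy / sxx


rng = random.Random(1)
n_series, n_points = 1000, 128
all_series = [[rng.gauss(0.0, 1.0) for _ in range(n_points)]
              for _ in range(n_series)]
slopes = [ols_slope(s) for s in all_series]

# The "rare individual": the series whose trend is largest in absolute value.
extreme = max(range(n_series), key=lambda i: abs(slopes[i]))

# Theoretical sd of the slope for unit-variance white noise.
sd_theory = math.sqrt(12.0 / (n_points * (n_points ** 2 - 1)))
print(slopes[extreme] / sd_theory, statistics.stdev(slopes) / sd_theory)
```

        The first printed number is how many white-noise standard deviations the selected trend sits from zero; the second should be close to 1 if the white-noise theory describes the full set of 1000 slopes.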

      • Nylo has made a very strong point: “Following Willis’ exercise, we would conclude that our data was not rare at all. And we would be wrong.”

        But I think he misses the fact that outside of the industrialized “CO2” period, the earth’s climate didn’t have an H of 0.5 at all like his white noise did. So what is a good value of H to use?

    • I second this notion, particularly the final statement

      “If you compare something rare with other things that you force to be equally rare, the rare result will not look rare anymore.”

    • Terrific point, Nylo. I might add that the usual statistical formulas for error estimates require an unknown quantity: the standard deviation of the population. Estimates of that quantity from the sampled data are typically quite poor, unless one has a very large number of data points. So I wonder how reliable is H, and therefore effective n, estimated from the data.

  13. Hi Willis,
    Interesting post. I think there may be a problem with the contribution of a real underlying trend to the value for the Hurst exponent. If you detrended the Nile data and then calculated the Hurst exponent, you might get more effective data points.

    Separating ‘causal’ long term variation from autocorrelation driven long term variation is no simple task. I suspect you could look at the variability of absolute values of slopes for different series lengths of real Nile data and synthetic autocorrelated data and see differences in the ‘distribution of slopes’ for different series lengths.

  14. I hate linear trend lines drawn on non-linear data. Climate scientists and other social scientists are particularly enamoured of them.

    • The secular trend could simply be slow subsidence on the island.
      It’s all relative, dependent on the reference frame assumed as constant.

      I doubt they had differential GPS surveyors back then to know which is the case. ;)

  15. Interesting stuff, but my intuition is that something has gone wrong here. The Nile data visually does have a clear trend and that trend looks significant.

    I wonder, if you took a timeseries consisting of the integers 1 through 1000 and applied the same methods to it, what would N(eff) for that be?

    • Nigel Harris July 1, 2015 at 5:38 am
      Interesting stuff, but my intuition is that something has gone wrong here. The Nile data visually does have a clear trend and that trend looks significant.

      Isn’t the comment “something has gone wrong here. The Nile data visually does have a clear trend…” equivalent to not accepting the climate data because it doesn’t agree with the theory? There may be many reasons to question the accuracy of this post, but the fact that it doesn’t agree with a “visually clear trend” is not one of them.

      • We’ll have to disagree on that.

        However, I’m pretty sure something is wrong here.

        The equation N[eff] = n^(2-2H) implies that for H between 0 and 0.5 (negative autocorrelation) the effective n would be larger than n, which is clearly impossible.

      • for H between 0 and 0.5 (negative autocorrelation) the effective n would be larger than n, which is clearly impossible.

        Wouldn’t such a case be indicative of a very well behaved time series where intermediate points between sampled points are highly predictable? When a time series function is known exactly, doesn’t the Neff approach infinity?

      • for H between 0 and 0.5 (negative autocorrelation) the effective n would be larger than n, which is clearly impossible.
        Wouldn’t such a case be indicative of a very well behaved time series where intermediate points between sampled points are highly predictable? When a time series function is known exactly, doesn’t the Neff approach infinity?

        I’m glad you wrote this, as I can’t explain it in proper terms, but this is exactly what I was thinking.
        And the 2n relates to the 2 samples required to perfectly sample a periodic function, think Nyquist sampling.

      • The original reference restricted the solution to H >= 0.5; however, since values of H < 0.5 imply negative correlation, I suspect but cannot prove that simply replacing 2-2H with 1-abs(2H-1) would make the equation cover the full range of H correctly.

      • Yes, a negative autocorrelation DOES mean that the effective n is greater than n.

      • Nigel Harris writes “the effective n would be larger than n, which is clearly impossible.”

        I don’t think so. The way I see it, 2 “data” points describe a line and neff is 2. 3 “data” points can lie on a line but neff is still 2…
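
        The claim a few replies up, that negative autocorrelation pushes the effective n above n, can be checked directly for a simple case. The sketch below assumes a stationary AR(1) process with lag-1 correlation rho (not fractional Gaussian noise, so it says nothing about the n^(2-2H) formula itself) and defines effective n as the number of independent points that would give the same variance of the mean. The function name n_eff_ar1 is illustrative.

```python
def n_eff_ar1(n, rho):
    """Effective sample size for the mean of n points of a stationary AR(1) series."""
    # Sum of the n x n correlation matrix whose entries are rho ** |i - j|.
    total = n + 2.0 * sum((n - k) * rho ** k for k in range(1, n))
    # Var(mean) = sigma^2 * total / n^2, so n_eff = sigma^2 / Var(mean).
    return n * n / total


for rho in (-0.5, 0.0, 0.5, 0.9):
    print(rho, round(n_eff_ar1(1000, rho), 1))
```

        For large n this approaches n * (1 - rho) / (1 + rho), which exceeds n whenever rho is negative: alternating errors cancel in the mean faster than independent ones do.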

  16. 4 data points, that sounds right to me, I need to figure out how to do this on the difference dataset I’ve created with no infilling and no homogenization.
    I know the bi-annual trend is strong, as it should be (also highly auto-correlated), and the annual max temp has almost no trend, not sure about min temps.
    But I’m still calculating solar forcing for each station for each day, in the second week of running, but I’ve redone it a couple times, and it’s now doing about 8,000 records/minute, it’s finished 54 million out of ~130 million.

    But to answer your fundamental question, yes as I’ve said for a few years now, all of the published temp series are junk, and the trend is a result of the process, and they’re all about the same because they all do the same basic process. And my difference process is different, and it gives a very different set of answers (it’s more than one answer).
    Max temps do not have an upward trend, what they do have is a different profile, ie the amount of time during the year we’re at the average max temp has changed, but this trend looks like it’s a curve that peaked ~2000.

    • On a more serious note, I am dubious of all conclusions about statistics that try to comprehend data from a system where physical function properties of that system are not understood. It’s like analysing the stock market graphs. Easy to do in hindsight. Apply whatever equation you like to it, it still won’t help you predict the future any better. Trends are only trends until they are trends no longer.

  17. Excellent article Willis and of course thanks to RGB and Dr S for the core material.

    It is not surprising that temperature data is strongly autocorrelated, since thermal inertia ensures it has to be. HadCruft$ [sic] is a bastard mix of incompatible data. IMO it should be completely ignored when doing anything about the physics of climate.

    It would be more relevant to look at land and sea separately. I would expect SST to be more strongly autocorrelated than land SAT.

    This simple AR1 behaviour can be removed by taking the first difference ( dT/dt ). This is also the immediate effect of an imbalance in radiative ‘forcing’.

    It may be more interesting to see how many separate points you get from that.

    This seems to echo the argument Doug Keenan was having with the Met Office ( that ended up in the House of Lords ) about what the relevant statistical model was to apply to the data when making estimations of significance of global warming. AFAIR he used a different model and showed the change was not significant.

    I’ve been saying for years that this obsession with “trends” is meaningless and simply reflects the technical limitations of those so keen on drawing straight lines through everything they find, with the naive belief that a “trend” is some inherent property which reveals the true nature of everything.

    Your point about processing is also a good one. How do the Hurst exponents differ for ICOADS SST and HadSST3 ( since 1946 when SST has been more consistently measured )?
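
    What first differencing does to an AR(1) series can be sketched numerically. The example assumes a stationary AR(1) recursion x[t] = a * x[t-1] + e[t] with unit-variance Gaussian noise; a = 0.9, the seed and the function names are illustrative. For such a series with a < 1, the differenced series is not white: theory gives it a lag-1 autocorrelation of -(1 - a) / 2, i.e. mildly anti-persistent. Only in the random-walk limit a = 1 does differencing leave pure white noise.

```python
import random


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


def ar1_series(a, n, rng):
    """x[t] = a * x[t-1] + e[t] with unit-variance Gaussian e, started at 0."""
    x, out = 0.0, []
    for _ in range(n):
        x = a * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


rng = random.Random(3)
a = 0.9
x = ar1_series(a, 20000, rng)
dx = [x[i + 1] - x[i] for i in range(len(x) - 1)]   # first difference

# Sample values for the raw and differenced series, then the theoretical value.
print(lag1_autocorr(x), lag1_autocorr(dx), -(1 - a) / 2)
```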

  18. A question that I think needs to be asked:

    If the volume of water that flows down the Nile River each year is measured by the height of the water each year over the span of 633 years then the comparison of annual water flows would only work if there were no changes in the river.

    But over the span of 633 years I think it would be unreasonable to think there would be no changes in the river and hence accurate comparisons of water flows as measured by heights of the water level would be questionable.

    • Remember this time series only records the year’s low point of the river. It says nothing about most of the year and nothing about flood.

      On a guided tour of Zion National Park, the guide informed us that the puny little river in the bottom of the canyon is a raging torrent in the spring melt such that 99% of the year’s flow is accounted for in just a couple of days.

    • Combotechie

      The data could also be showing that the Nile is silting up. That could be tested: if it is high one year (a peak flow cleans out silt) is it always lower the next? Is there a slow rise then a peak that lowers the river bed ‘suddenly’? How about three high years in a row? It is a river, after all.

      If there is a 2 ft rise over 6 centuries, the silting effect and a 1470 weather year cycle could easily produce it. The importance of your point is that one has to consider the system being examined, not just the numbers. Viewing the flood height as only caused by rainfall and therefore flow volume is to ignore the fact it is a river and that there is a ‘Nile Delta’ for a very good reason. Flood height at the measurement height could be dominated by the shifting sand bars downstream.

      Very interesting phenomenon to analyse.

  19. The four effective data points of global SST:

    1915,1934,1975,1995

    next effective data point 1919. ;)

  20. Oh dear earlier post went AWOL.

    Basically: great article Willis.
    HadCruft4 is a meaningless mix of incompatible data, irrelevant to any physical study.
    How about doing the same thing on the first difference ( dT/dt ) to remove the obvious autocorrelation in SST?

    cf H for icoads SST and hadSST3 since 1946 ?

  21. previous post probably in bit bin due to use of the B word. Which was used as a cuss word to indicate the illegitimate intermarriage of land and sea data.

  22. A very nice post, Willis.

    Some time ago, ca. 2008, with assistance from Tomas Milanovic and a brief exchange with Demitris Koutsoyiannis, I did a few Hurst coefficient analyses of (1) the original Lorenz 1963 equation system, (2) some data from the physical domain, and (3) some calculated temperature results from a GCM. In the second file, I gave a review of the results presented in the first file; probably too much review. The errors in the content of the files are all mine.

    The second file has the Hurst coefficient results for the monthly average Central England Temperature (CET). I said I would get to the daily average for the CET record, but I haven’t yet.

    The Lorenz system results, pure temporal chaotic response, showed that for many short periods the Hurst exponent has more or less H = 1.0. For intermediate period lengths the exponent was H = 0.652, and for fewer and longer periods the exponent was H = 0.50.

    I found some temperature data for January in Sweden that covered the years 1802 to 2002. The exponent was about 0.659. The monthly-average CET record gave H = 0.665. A version of the HadCRUT yearly-mean GMST data gave H = 0.934 for a 160 year data set.

    I got some 440 years of monthly-average GCM results for January from the Climate Explorer site:

    # using minimal fraction of valid points 30.00
    # tas [Celsius] from GFDL CM2.0, 20C3M (run 1) climate of the 20th Century experiment (20C3M) output for IPCC AR4 and US CCSP

    The Hurst exponent was basically 1.0, as in the case of the Lorenz system for many short segments.

    I have recently carried out the analysis for the logistic map in the chaotic region; the exponent is H = 0.50. I haven’t posted this analysis.

    Note: I do not build turnkey, production-grade software. My codes are sandbox-grade research codes. If you want a copy of the codes I’ll send them. But you’ll need a text editor and Fortran compiler to get results different from what is hard-wired in the version you get. I use Excel for plotting.

  23. The more I see “studies” and EPA estimates and such, the more I know the importance of statistical misunderstanding and misrepresentation. And the more I see that the subject is in a foreign language that I do not speak, despite the fact that I was able to get an A in my statistics classes taught by Indian instructors who really didn’t care to explain it all. So, it’s Willis and a random PhD against a world of activist experts. This does not end well.

  24. Chris Wright makes some comments I would like to pick up on:

    Certainly, the HadCRUT data has been doctored and “adjusted” to within an inch of its life, like all the other surface series, in order to get the desired warming trend.

    It is always a matter of concern when the adjustments to the data exhibit a clear linear trend which is around half of the linear increase in the final measured results after adjustment.

    But the data, as it stands, clearly shows a significant trend. There must be something wrong with your analysis – but as a non-statistician, I’ve no idea what.

    If you follow the link to Koutsoyiannis’ linked paper you will see a nice simple example of the problem of estimating a linear trend from a short data series which does not have a linear trend. For example, if your observation period is significantly less than any characteristic periodicity in the data, you will easily be fooled into seeing a trend where in fact what you have is a long term stationary, periodic process. Similarly, auto-correlated processes wander in consistent directions for quite significant periods, even when they are long term stationary. Just because it looks to your eye like a trend does not mean that there really is a trend. That’s what autocorrelated processes do to you – tricksy they are.

    Also, the Nile data corresponds nicely to the climate of the time. It shows falling levels as the world descended into the colder climate of the Dark Ages and then rising and peaking spot-on on the Medieval Warm Period. In general, drought is less likely in a warmer climate because warmer air can carry far more moisture.

    This comment has nothing to do with Willis’ conclusion. So if the Nile data links to climate, and if we believe there is a cause and effect between the two, then this correlation would logically lead us to believe that it is the climate that is auto-correlated and we see the result of this in the proxy data of the Nile. None of that comment or a conclusion arising from it would negate the point that identifying a trend from data, and especially auto-correlated data, is not trivial.

    When people here talk about long term trends they are essentially arguing for a non-stationary element in the time series. For example, in HadCRU, climatologists argue the time series is essentially a long term, non-stationary linear trend plus a random deviation. Even without introducing autocorrelation, such a decomposition into a trend + residual is largely arbitrary, and particularly so for natural phenomena where no physical basis for the driving forces is really known. Arguing that there is a trend and then saying CO2 is rising, therefore it caused it, is a tautology, not evidence. And it makes no difference how complex a model you design to try and demonstrate this, including a coupled GCM. It’s still a tautology.

    As someone myself who works in geostatistics and stochastic processes, I can say Koutsoyiannis’ work is highly regarded and of very high quality. It is absolutely relevant to the questions of climate “trends” (or not!). I am glad Willis has discovered this area and written a good quality summary here. I am also interested in his number of effective samples – there are a number of ways to compute this. I am somewhat surprised by the result that the autocorrelation in the temp series is such that Neff = 4. It would be interesting to see other methods compared.

  25. “Adjusted R-square = 0.06….”

    Stop right there. Everything after that point is runaway (and ineffectual) imagination, in other words mathematical noise (unproved statistical hypotheses and shallow observations) piled upon actual measurement noise. You HAVE to have a substantial R-square to keep your “feet on the ground” or your “head on straight”. And you don’t have it.

    Fire all statisticians, as they have obviously overstepped their abilities to model the real world. It’s just not that complicated.

    • I think it just means 6% of the variation in the dependent variable is explained by the trend, using assumptions of normality. This isn’t a multivariate model where Willis is trying to explain variation in the data series using standard assumptions of normality. If that were the case, then your rant would make sense.

  26. n_{eff}=n^{2-2H}
    Very simple. Too simple, surely?

    My question is, “What does H actually mean?”
    Is it just the likelihood of event n+1 being close to n?
    In which case, isn’t this in some ways a circular argument?

  27. What we are trying to do is make real world sense of the data. This type of statistical analysis over the whole data set doesn’t provide any useful information and in fact is a waste of time because it obscures the patterns which are obvious to the eye, i.e. clear-cut peaks, trends and troughs in the data. To illustrate real-world trends in this data for comparison with, say, various temperature proxy time series or the 10Be record, simply run e.g. a 20-year moving average.

      • All data is cherry picked one way or another. The competence of the cherry picker (i.e. whether the patterns exist or not) can only be judged by the accuracy of the forecast made or the outcome of experiment.

      • Auroral records have very interesting correlations with Nile River levels. Old work.
        ============

    • One of the functions of statistics is to provide an objective test of some of the patterns we *perceive* to be in data. It has long been recognised that the human brain, like that of most animals, has an uncanny ability to see patterns in things. Just because you can see a dog’s face in a cloud does not mean there are flying dogs up there.

      One thing you certainly don’t want to do is start distorting your data with a crappy filter that puts peaks where there should be troughs and vice versa.
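
      The warning about filters that put peaks where troughs should be applies to the plain running mean suggested above. As a sketch: the gain of an N-point centred running mean at frequency f (cycles per sample) is sin(pi f N) / (N sin(pi f)), which is negative for periods between N/2 and N, so cycles in that band come out inverted. The 21-point window (odd, so that it is centred on a sample, in place of the 20 years mentioned above), the 14-year cycle and the function names are illustrative choices.

```python
import math


def running_mean_gain(n_window, period):
    """Gain of an n_window-point centred running mean for a cycle of given period."""
    f = 1.0 / period
    return math.sin(math.pi * f * n_window) / (n_window * math.sin(math.pi * f))


def centred_running_mean(x, n_window):
    """Centred running mean (odd n_window); the ends are dropped."""
    half = n_window // 2
    return [sum(x[i - half:i + half + 1]) / n_window
            for i in range(half, len(x) - half)]


window, period = 21, 14.0
signal = [math.cos(2.0 * math.pi * t / period) for t in range(-70, 71)]
smoothed = centred_running_mean(signal, window)

centre_in = signal[70]                    # t = 0, a peak of the input
centre_out = smoothed[70 - window // 2]   # the same instant after filtering
print(centre_in, centre_out, running_mean_gain(window, period))
```

      The input peak at t = 0 comes out of the filter as a negative value equal to the predicted gain, which is the inversion being complained about.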

      • It is not immediately clear that statistics necessarily provide an ” objective ” test of anything in the real world. For example you can calculate the average or mean or probability of the range of outcomes of the climate models. The real world future temperature may well fall completely outside the model projections because all your models are wrongly structured.

  28. Willis, a fine piece of mathematical work, truly adding something new which you have an enviable propensity for. I am, however, troubled by the ‘fewness’ of the n_eff, although I can see nothing wrong with your math. The trouble, it seems to me, is 7 or 11 from 663 data points is too few for any ‘representativeness’ (can’t think of the right word) and robust conclusion on whether there is a trend or not.

    1) 622CE to 1284CE does span from the Dark Ages cool period to the Medieval Warm Period and so a trend for the river (of one kind or another) should be expected on climatological grounds.

    2) Is it possible that the math on this particular data in some way gave a spurious result? Perhaps the “trend” was downwards from the Roman Warm Period to the Dark Ages and the data unfortunately begins at the extreme low. I wonder if Egyptians were more miserable at the beginning of the graph and got happier with more abundant waters in the MWP.

    3) If you were to divide the data in half and redo both halves, it would be interesting to see what the n_eff would be – maybe a dozen or more n_eff would occur in one or both segments.

    • Thanks, Gary. Like you I’m troubled by the “fewness” of neff, but I can find no errors in my math. Not only that, but the low neff is backed up by the analysis of the actual fractional gaussian pseudodata … not conclusive, but definitely supportive of the low numbers.

      Always more to learn …

      w.

      • Willis,

        “Like you I’m troubled by the “fewness” of neff, but I can find no errors in my math.”
        Me three.

        But there was “no error in the math” showing that the trend was highly significant. The question, in both cases, is whether the formulas are being applied to data for which they are truly applicable. We have a clear “no” for the standard trend analysis, but not for the n_eff.

        Terrific post.

      • Not only that, if we take Mann’s value of H from his paper referenced near the top of the thread (i.e. 0.870) and calculate n_eff, we get 12 data points since the year 850. That means that it is actually 95 years between data points and not the 30 that is traditionally used. If you go with the 4 “modern data points” then it’s still 50 years.

        So if CO2 induced warming was to have started in around 1970 then we still don’t even have a data point to measure it by!

    • “I am, however, troubled by the ‘fewness’ of the n_eff, although I can see nothing wrong with your math.”

      A bit surprised but I think it is correct. As a sanity check I plotted the ACF vs time lag for the Hadcrut4 monthly series, the sample-to-sample correlation persists for about 824 months. Thus in the 1976 months in the dataset I had handy there are only 2.40 independent samples. Mathematica estimates the Hurst exponent to be .9503 giving an effective n (by Willis’ formula) of 2.13. Within shouting range of the BOE method above.

      Removing the AR(1) component by first difference, the Hurst estimate falls to .166, indicative perhaps of a cyclical (anti-persistent) component and an AR component in the data.

      I found that the Hurst exponent estimate depends greatly on the assumed underlying process. Using a Wiener process instead of Gaussian results in a Hurst exponent estimate (by Mathematica’s unknown algorithm) of .213, where here h=.5 indicates a Wiener (random walk) process.
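
      The two back-of-envelope figures in this comment can be reproduced in a few lines. This simply redoes the arithmetic with the numbers quoted above (1976 months, an 824-month correlation length, H = 0.9503); the variable names are illustrative.

```python
n_months = 1976
decorrelation_months = 824
hurst = 0.9503

# Independent spans in the record, from the autocorrelation length.
n_eff_acf = n_months / decorrelation_months

# Effective n from the head post's formula, n_eff = n^(2 - 2H).
n_eff_hurst = n_months ** (2 - 2 * hurst)

print(round(n_eff_acf, 2), round(n_eff_hurst, 2))
```

      Both routes land between 2 and 2.5 effective points, in line with the 2.40 and 2.13 quoted above.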

  29. Sidestepping the math for a moment, here’s a very interesting quote from the nilometer article:

    They found that the frequency of El Ninos in the 1990s has been greatly exceeded in the past. In fact, the period from 700 to 1000 AD was a very active time for El Ninos, making our most recent decade or two of high El Nino frequency look comparatively mild.

    • They found that the frequency of El Ninos in the 1990s has been greatly exceeded in the past. In fact, the period from 700 to 1000 AD was a very active time for El Ninos, making our most recent decade or two of high El Nino frequency look comparatively mild.

      My thoughts are that El Ninos are a response to warming, they’re a charge-pump oscillator; if this is true their rate would be based on warming: the more warming, the more often they cycle.

      • Or maybe more solar activity. Sun warms oceans which leads to more El Nino events which then leads to more atmospheric warming.

      • I gave no explanation to the source of the warming, but I do not believe the oceans are warming from the atmosphere, the atmosphere is warming from the oceans.

  30. In economics this is called picking stocks : ) Chart readers look for signs like a head and shoulders configuration or a double dip and if it isn’t a dead cat bounce it is time to buy or sell, depending on whether it is a full moon or not.

    Stocks are random, the weather is random, the climate is random, except when they aren’t : ) Then the fix is in.

  31. n_{eff}=n^{2-2H}
    =============
    Willis, there may be a problem with this in that it is not symmetric around H=0.5.

    At H=1 you get n_{eff} = n^{2-2H} = n^{2-2} = n^{0} = 1

    Which means that when H=1, your effective sample size is always 1, regardless of how many samples you have. This may well be correct, as the entire sample is perfectly auto-correlated, telling us that all samples are determined by the 1st sample.

    However, what bothers me is H=0

    At H=0 you get n_{eff} = n^{2-2H} = n^{2-0} = n^{2}

    I’m having a very hard time seeing how negative auto-correlation can increase the effective sample size beyond the sample size. I could well be wrong, but intuitively H=0 is exactly the same as H=1. The first sample will tell you everything about all subsequent samples.

    So while I think you are very close, it seems to me that you still need to look at 0<=H<0.5.

    • The original reference was restricted to H>=0.5; for negatively correlated values I suspect replacing 2-2H with 1-abs(2H-1) will do the trick.

      • 1-abs(2H-1) will do the trick
        ============
        yes. though it makes me wonder if perhaps the exponent is non-linear. Instead of a V shape, more of a U shape?

        I would like to see more samples. There is a small difference in the N=11 plot that suggests that in this case it might be slightly low, or the small difference may be due to chance.

        however, the simplicity of the method is certainly appealing. It certainly points out that as H approaches 0 or 1, you cannot rely on sample size to calculate expected error. Instead you should be very wary of reported trends in data where H is not close to 0.5.

        Especially the H=0.94 for HadCRUT4 global average surface temperature. This is telling you that pretty much all the data is determined by the data that came before it. As such, the missing data before 1850 likely has a bigger effect on today’s temperature than all current forcings combined.

    • Fred, it’s good to check things by looking at the extreme cases. Here I think H=1 matches 100% autocorrelation with no die off and no random contribution, i.e. x_{n+1} = x_n

      H=0.965 means a small random variation from that total autocorrelation.

      AR1 series are of the form:
      x_{n+1} = x_n + α * rand()

      There is an implicit exponential decay in that formula. A pulse in any one datum gets multiplied by α at each step and fades exponentially with time. This is also what happens in a system with a relaxation to equilibrium feedback, like the linearised Planck feedback.

      What this Hurst model of the data represents is a random input driving dT/dt ( in the case of mean temps ) with a negative feedback. That seems like a very good model on which to build a null hypothesis for climate variation. In fact it’s so obvious, I’m surprised we have not seen it before.

      OK tell me about three decades of wasted effort trying to prove a foregone conclusion instead of doing research. :(

      I’ll have to chase through the maths a bit but I suspect H is the same thing as alpha in such a model.

      α=0 gives you H=1.
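
For reference, the textbook AR(1) recursion puts the coefficient on the previous value, x_{n+1} = φ * x_n + ε_n, and that is the form under which a pulse is multiplied by the coefficient at each step and fades exponentially, as described above. A minimal sketch, assuming NumPy is available; the function names and the value φ = 0.8 are illustrative only.

```python
import numpy as np


def ar1_impulse_response(phi, steps):
    """Noise-free response of x[n+1] = phi * x[n] to a unit pulse at n = 0."""
    x = np.zeros(steps)
    x[0] = 1.0
    for i in range(1, steps):
        x[i] = phi * x[i - 1]
    return x


def simulate_ar1(phi, n, seed=0):
    """Simulate x[n+1] = phi * x[n] + eps[n] with standard normal eps."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    x = np.zeros(n)
    for i in range(1, n):
        x[i] = phi * x[i - 1] + eps[i]
    return x


def sample_autocorr(x, lag):
    """Sample autocorrelation at a positive integer lag."""
    d = x - x.mean()
    return float(np.dot(d[:-lag], d[lag:]) / np.dot(d, d))


phi = 0.8
print(ar1_impulse_response(phi, 5))        # the pulse shrinks by a factor phi each step
series = simulate_ar1(phi, 50_000)
print(sample_autocorr(series, 1), phi)     # lag-1 autocorrelation vs phi
print(sample_autocorr(series, 2), phi**2)  # lag-2 autocorrelation vs phi squared
```

The sample autocorrelation at lag k should come out close to φ^k, which is the exponential decay that distinguishes AR(1) from the slower, power-law decay associated with large H.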

      • AR1 series are of the form:
        x_{n+1} = x_n + α * rand()
        ==============
        something like?

        for H = 1
        x_{n+1} = 1 * x_n + 0 * α * rand()

        for H = 0.5,
        x_{n+1} = 0 * x_n + 1 * α * rand()

        for H = 0
        x_{n+1} = -1 * x_n + 0 * α * rand()

        solve:
        H=1 (1,0)
        H=0.5 (0,1)
        H=0 (-1,0)

  32. Willis, I think I’ve found the problem. Looking at figure 7 we find this statement:

    “where H is a constant between 0.5 and 1.”

    It appears the problem is that the equation from the Koutsoyiannis paper is only valid for H >= 0.5. Which explains why your formula doesn’t appear to work for H < 0.5

    • Data: (For definitions and equations see the methods section of Foster and Rahmstorf, 2011)
      http://iopscience.iop.org/1748-9326/6/4/044022/pdf/1748-9326_6_4_044022.pdf
      Hence for each data series, we used residuals from the models covering the satellite era to compute the Yule–Walker estimates \hat{\rho}_1, \hat{\rho}_2 of the first two autocorrelations. Then we estimate the decay rate of autocorrelation as
      \hat{\phi} = \hat{\rho}_2 / \hat{\rho}_1 . (A.7)
      Finally, inserting \hat{\rho}_1 and \hat{\phi} into equation (A.6) enables us to estimate the actual standard error of the trend rates from the regression.
      ============
      It is unclear to me why only the satellite era residuals were used. It seems like they failed to account for the auto-correlation in the satellite data, but rather calculated the error of the surface based readings under the assumption that the satellites have zero error?
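
The quoted step can be sketched in a few lines. If the autocorrelations decay geometrically after lag 1, so that ρ_j = ρ_1 * φ^(j-1), then the usual variance inflation factor 1 + 2 Σ ρ_j sums to 1 + 2ρ_1/(1-φ). I am assuming, not quoting, that this is what equation (A.6) amounts to, so check the paper before relying on it. The function names and the input numbers are illustrative only.

```python
import math


def decay_rate(rho1, rho2):
    """phi_hat = rho2_hat / rho1_hat, as in the quoted equation (A.7)."""
    return rho2 / rho1


def inflation_factor(rho1, rho2):
    """1 + 2*rho1/(1 - phi): assumed form, valid when rho_j = rho1 * phi**(j-1)."""
    phi = decay_rate(rho1, rho2)
    return 1.0 + 2.0 * rho1 / (1.0 - phi)


# Illustrative autocorrelations, not taken from any real residuals
rho1, rho2 = 0.6, 0.36
nu = inflation_factor(rho1, rho2)
print(nu)             # data points per effective degree of freedom
print(math.sqrt(nu))  # approximate widening factor for the naive standard error
```

Under this assumption the effective number of points is roughly n divided by the inflation factor, which is a different (and for large H, much milder) correction than n^{2-2H}.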

      • It’s worse than you think. Note that they are using the residuals calculated from the MODELS, not actual data.

    • http://www.tandfonline.com/doi/pdf/10.1080/02626660209492961

      “Hurst phenomenon has been verified in several environmental quantities, such as …
      global mean temperatures (Bloomfield, 1992) … but rather they use AR, MA and ARMA models, which cannot reproduce the Hurst phenomenon”

      Dr. Kevin Cowtan’s site is apparently based on ARMA(1,1) (Foster and Rahmstorf 2011; see Appendix, Methods), which suggests that the result does not account for the Hurst phenomenon.

      • Statistics says that you can reduce the measuring error by increasing the number of independent measurements. Here, statistical methods are discussed to determine the number of independent measurements. I prefer a discussion of the data quality. The HADCRUT4 data are not measurements but a calculated product: the monthly temperature anomaly. Now calculate the 60 yr trend (1955-2015) for the anomaly in °C/decade. Monthly : 0.124+-0.003; annual 0.124+-0.01; running annual mean 0.123+-0.002. The same procedure with the monthly temperature: 0.114+-0.03; 0.124+-0.01; 0.123+-0.003. Use your common sense to find out why the errors are different and what is the best error estimate. Unfortunately, this is not the total error because systematic errors (i.e. coverage error, change of measurement methods) are not taken into account.

  33. Seems to me that when the effective N is very different from the actual N, we are sampling faster than the natural time scale of the driver of the correlated variability. So if you divide the N=663 by the 11 that seems appropriate, the driver of the variability has about a 60 year time scale. The same might be true of Hadcrut4, where you could only sandwich in 3 complete cycles with 4 points. This is almost down at the noise level for Hadcrut4.

    Now, just where might one find some climate variable with a 60ish year period?

    • The 60 yr period is approximately the average life expectancy of mankind. (You can also use a longer life span.) If the global temperature strongly increases during this life span (for instance 5°C) then many people will die, although in the long run mankind will survive. I think the difference between mathematics and physics is that physics tries to describe the real world while mathematics defines its world.

  34. Cohn, T. A. and Lins, H.F., 2005, Nature’s style: Naturally trendy, Geophysical Research Letters, 32(L23402), doi:10.1029/2005GL024476 discussed very similar data sets and may be of interest. One of their conclusions:

    From a practical standpoint, however, it may be preferable to acknowledge that the concept of statistical significance is meaningless when discussing poorly understood systems.

    • I just re-read Cohn and Lins and was greatly troubled by that statement … it seems like just throwing up your hands and saying “Nope, too hard”.

      In addition, if my results on the small values for neff are correct, the problem seems to be that the concept of statistical significance is meaningless when improperly applied even to well understood systems.

      w.

      • True!! When dealing with time series selected from multiple quasi independent variables in complex systems such as climate, the knowledge, insight and understanding come before the statistical analysis. The researcher chooses the windows for analysis to best illustrate his hypotheses and theories. Whether they are “true” or not can only be judged by comparing forecast outcomes against future data. A calculation of statistical significance doesn’t actually add to our knowledge but might well produce an unwarranted confidence in our forecasts.

      • Since significance is necessarily in relation to a supposed statistical model assumed for that system, that is certainly the case.

        This is what Doug Keenan was arguing about with Hadley Centre. It is also at the heart of Nic Lewis’ discussions about having the correct ‘prior’ in Bayesian analysis.

        Cohn and Lins are probably correct in the strictest sense but “poorly understood” is very vague. Clearly surface temp is not a coin toss random process, and I think this article a great help in applying a more suitable model.

      • But that is still absolutely true, Willis. Sometimes things are just too hard, until they aren’t. I assume you’ve read Taleb’s “The Black Swan” (if not, you should). One of the most dangerous things in the world is to assume “normal” (in the statistical sense) behavior as a given in complex systems. IMO, the whole point of Koutsoyiannis’ work is that we can’t make much statistical sense out of data without a sufficient understanding of the underlying process being measured.

        This difficulty goes all the way down to basic probability theory, where it is expressed as the difficulty of defending any given set of priors in a Bayesian computation of some joint/conditional probability distribution. One gets completely different answers depending on how one interprets a data set. My favorite (and simplest) example is how one best estimates the probability of pulling a black marble out of an urn given a set of presumably iid data (a set of data representing marbles pulled from the urn). The best answer depends on many, many assumptions, and in the end one has to recompute posterior probabilities for the assumptions in the event that one’s best answer turns out not to work as one continues to pull marbles from the urn. For example, we might assume a certain number of colors in the urn in the first place. We might assume that the urn is drawn from a set of urns uniformly populated with all possible probabilities. We might assume that there are just two colors (or black and not-black) and that they have equal a priori probability. Bayesian reasoning is quite marvelous in this regard.

        Taleb illustrates this with his Joe the Cabdriver example. Somebody with a strong prior belief in unbiased coins might stick to their guns in the face of a long string of coin flips that all turn up heads and conclude that the particular series observed is unlikely, but that is a mug’s game according to Joe’s common wisdom. It is sometimes amusing to count the mug’s game incidence in climate science by this criterion, but that’s another story…

        It is also one of the things pointed out in Briggs’ lovely articles on the abuse of timeseries in climate science (really by both sides). This is what Koutsoyiannis really makes clear. Analyzing a timeseries on an interval much smaller than the characteristic time(s) of important secular variations is a complete — and I do mean complete — waste of time. Unless and until the timescale of secular variations is known (which basically means that you understand the problem) you won’t even know how long a timeseries you need to be ABLE to analyze it to extract useful statistical information.

        Again, this problem exists in abundance in both physics and general functional analysis. It is at the heart of the difficulty of solving optimization problems in high dimensional spaces, for example. Suppose you were looking for the highest spot on Earth, but you couldn’t see things at a distance. All you could do is pick a latitude/longitude and get the radius of that particular point on the Earth’s surface from its center of mass. It would take you a long, long time to find the tippy top of Mt. Everest (oh, wait, that isn’t the highest point by this criterion, I don’t think — I don’t know what is!) One has to sample the global space extensively before one even has an idea of where one MIGHT refine the search and get A very high point, and one’s best systematic efforts can easily be defeated if the highest point is the top of a very narrow, very high mountain that grows out of a very low plain.

        So often we make some assumptions, muddle along, and then make a discovery that invalidates our previous thinking. Developing our best statistical understanding is a process of self-consistent discovery. If you like, we think we do understand a system when our statistical analysis of the system (consistently) works. The understanding could still be wrong, but it is more likely to be right than one that leads to a statistical picture inconsistent with the data, and we conclude that it is wrong as inconsistency emerges.

        This is all really interesting stuff, actually. How, exactly, do we “know” things? How can we determine when a pattern is or isn’t random (because even random things exhibit patterns, they just are randomly distributed patterns!)? How can we determine when a belief we hold is “false”? Or “true”? These are mathematically hard questions, with “soft” answers.

        rgb

  35. A P value of 1.42e-11 corresponds to a probability of about 1 in 70 billion, not 1 in 142 trillion.
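
The arithmetic behind this correction is just the reciprocal of the p-value. A short Python check, using the p-value printed in the head post's regression table:

```python
p_value = 1.42e-11        # slope p-value from the regression output in the head post
odds = 1.0 / p_value      # "1 in odds"
print(f"1 in {odds:.3g}")  # about 7.04e10, i.e. roughly 70 billion
```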

  36. Hmmmm. Would this not also be the case for IPCC model runs? Would they not be just a silly series of autocorrelated insignificant runs…unless specifically made to fall outside this (elegantly demonstrated in the above post) metric?

  37. Willis:
    I went to the site to retrieve the data. There is another time series there which indicates that the time series analysed is not the original time series but is a logarithmic transformation. Can you clarify this for me?
    I also noticed that the serial correlation extends out to 60 lags.

  38. Willis: “Best regards to all, it’s ten to one in the morning, full moon is tomorrow, I’m going outside for some moon viewing …”

    Tonight get there at dusk and see a bright Venus very close to Jupiter. ;)

  39. Damn. W., you have made me think, and by surprise… Very well done.

    I’m now wondering what the implications are for Neff when data points are homogenized… That MUST increase the degree of auto-correlation and that MUST make any trends more bogus.

    GAK! I need more coffee…

    ;-)

  40. I looked at the graph presented by Mr. Eschenbach and the results of his linear least squares analysis.

    First, I would like to agree with the comments made by Owen bin GA, July 1, 5:49 am and by harrydhuffman, July 1, at 6:55 am in their responses to the presentation by Mr. Eschenbach.

    Second, I offer my reaction as follows:

    1. Looking at the graph and removing the red line, I see no clear visual trend. The Y axis has been “augmented” to make us think that there may be a trend, but in 600 years the increase is 10%. So I would not even think of linear regression when observing the huge variation.

    2. I looked at the linear regression results. Very low P value. Fine, so it is “statistically significant”.

    3. I looked at the R squared (R2) value obtained and listed in the results with the P value. Something that Mr. Eschenbach did not consider. This value is 0.066. Now, R2 values will vary between 0 and 1 (some authors will use 0 and 100% instead of 0 and 1, fine, obviously the closer to 1 or 100% the better). Why ignore this value? There is no “accepted” value for how low R2 must be before we dismiss the linear regression analysis as important (as will be explained in the Minitab blog below) but 0.066 is so low, I would simply dismiss any important linear regression for this data set.

    4. To have a statistically significant P value and an extremely low R2 value is nothing new.

    5. To understand what this means and how this can happen, I am attaching perfect examples from a very competent statistician at Minitab. This is a very widely used statistical analysis software, and the explanation of what P and R2 measure is perfect. You cannot take P only and ignore R2.

    6. The examples he gives, two separate graphs with explanations, are simply classic. They show what to do with a low P value and an extremely low R2 value before accepting that any trend is IMPORTANT and can PREDICT anything, or that linear regression can EVEN be used.

    7. After looking at the presentation, if you have time, start reading the blog. You will see a question by Mira Ad. She is having some problem with a time series of temperature anomalies! Interesting.

    Minitab blog Regression Analysis

    http://blog.minitab.com/blog/adventures-in-statistics/how-to-interpret-a-regression-model-with-low-r-squared-and-low-p-values
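
The point about a small P value coexisting with a small R2 can be illustrated on synthetic data. For a simple linear regression the two are tied together by R2 = t^2 / (t^2 + n - 2), so with several hundred points a slope can be many standard errors from zero while explaining little of the variance. A minimal NumPy sketch follows; the slope, noise level, seed and series length are arbitrary choices rather than the Nile data, and the noise here is independent, so the sketch does not touch the autocorrelation issue raised in the head post.

```python
import numpy as np


def slope_t_and_r2(y):
    """OLS fit of y against 0..n-1; returns (t statistic of the slope, R squared)."""
    n = len(y)
    x = np.arange(n, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    sxx = np.dot(xc, xc)
    slope = np.dot(xc, yc) / sxx
    resid = yc - slope * xc                 # residuals of the fitted line
    sse = np.dot(resid, resid)              # residual sum of squares
    sst = np.dot(yc, yc)                    # total sum of squares
    se_slope = np.sqrt(sse / (n - 2) / sxx)
    return slope / se_slope, 1.0 - sse / sst


rng = np.random.default_rng(1)
n = 663
y = 0.12 * np.arange(n) + rng.normal(0.0, 85.0, n)  # weak trend, large scatter
t_stat, r2 = slope_t_and_r2(y)
print(t_stat, r2)
print(t_stat**2 / (t_stat**2 + n - 2))  # equals r2, by the identity above
```

The identity makes the trade-off explicit: for fixed t, R2 shrinks as n grows, so a low R2 alone does not contradict a small P value, and neither number says anything about whether the points are independent.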

    • If you download the data into Excel, adding a linear trendline will give the same equation that Willis posts above. However, if you choose a polynomial of order 4, you get a much better fit. Statistics only summarize things. They do not add any new information. Chances are that if what you claim to be there is not obvious it is an artifact of the statistical analysis. What Willis shows in his excellent analysis is that in the case of autocorrelation, the number of degrees of freedom is substantially reduced. The normal confidence limits you assume from a normal distribution do not apply. Recall that t test and F test apply to small samples. When you have a large number of samples you effectively are using a normal distribution.

      • Yes, any software using linear least squares regression analysis will produce the same results presented by Willis. The problem is that he ignored the R2 value in drawing the conclusion. If the 95% prediction interval (not confidence interval) is plotted this interval would be huge because the R2 value is so low. This is what the statistician from Minitab is demonstrating with the two graphs he presented.
        If you remove the red line from the graph and you visually inspect the data, what you see is 3 “waves” with a minimum at 800, a minimum at 1000 and one at 1200. So indeed your polynomial of order 4 will yield a much better description of the data.

  41. Willis, thank you very much for the excellent post and your reference to my work. I confirm your result on the effective sample size — see also equation (6) in http://www.itia.ntua.gr/en/docinfo/781/

    I also thank all discussers for their kind comments and Marcel Crok for notifying me about the post. I wish I had time to comment on the several issues/questions put. But you may understand that here in Greece the political/economic situation is quite critical and thus we cannot concentrate right now on important scientific issues like this. Currently we have to struggle to keep our functions alive; you may perhaps imagine (or not?) how difficult it is. So, sorry for not participating more actively… I can only provide some additional references which may shed some light to issues discussed here:

    http://www.itia.ntua.gr/en/docinfo/537/

    http://www.itia.ntua.gr/en/docinfo/1001/

    http://www.itia.ntua.gr/en/docinfo/1351/

    (my collection related to Hurst-Kolmogorov is in http://www.itia.ntua.gr/en/documents/?authors=koutsoyiannis&tags=Hurst )

    I also confirm that H < 0.5 (antipersistence) results in higher effective length; you may understand the reasons thinking of a trivial example of a simple harmonic as indicated in Fig. 8 in http://www.itia.ntua.gr/en/docinfo/1297/

    • Thanks for your contribution here on this question generally.

      I’ve been following the last 6mth of “negotiations” closely. I wish your country all the best in shaking off the mantle of perpetual economic bondage and externally enforced asset stripping of the nation’s wealth. Clearly a large amount of what is laughably described as ‘aid’ has not gone to help the greek economy but to bail out private banks and investors in rest of Europe leaving Greece saddled with a level of debt from which it could never recover. That debt must be restructured. Something that is only just beginning to be accepted by the international loan sharks.

      It’s going to be difficult whatever way it goes. I wish you all, the courage and imagination to create a viable future for your country and people.

    • Ah, so my suggestion previously of 1-abs(2H-1) to produce symmetry about 0.5 is quite wrong. Thank you for the clarification. Please accept my best wishes and hopes for you and yours in the terrible situation in Greece. When governments battle, non-combatant civilians always seem to be the first to suffer.

    • The use of statistics to determine cause of variance in a chaotic random walk system is a problem that on the face of it screams impossibility. At least to me. Yet several have made the attempt and here is the one often cited.

      http://go.owu.edu/~chjackso/Climate/papers/Crowley_2000_Causes%20of%20Climate%20Change%20Over%20the%20Past%201000%20Years.pdf

      I have all kinds of issues with this kind of study. But the most egregious error is the use of models as the definitive “sieve”. At any point in time along the 2000 year time line, a random trend could have appeared in the “data” that in reality, was wholly unrelated to the perceived cause, and the models themselves were forced to “trend”, ergo the conclusions.

      This kind of study drives me nuts!

    • Thank you for commenting, Dr. Koutsoyiannis. If you have time, I’d like to hear more about your situation there.

  42. Hopefully I have got this wrong, because otherwise it is a shame that Mann M. never thought of this way of reasoning and method of argument, as he could easily have proved to anyone that he was always right when he claimed and implied that there has not been any natural climate change in the last 3K years, or any climate change at all until man started to ride cars instead of donkeys.

    It does not seem very hard to prove that in this particular way… but maybe I just missed the main point of Willis’s generalism; sorry if that is the case.

    cheers

  43. While the estimation framework developed by Kolmogoroff in the 1930s and subsequently popularized by Hurst is a step in the right direction, it’s entirely restricted to diffusion-like “red noise” processes with monotonically decaying acfs and power spectra. It fails to characterize far-more-distinctly oscillatory processes encountered in geophysics, such as temperature, whose power densities have strong peaks and valleys. The upshot is that the confidence intervals for ostensible “trends” in records of a century or less are far wider than anything shown here. Until climate science comes to grips with the spectral structure of geophysical signals, it will continue to flounder in antiquated delusions about what constitutes a secular trend.

    • Thank you, 1sky1. Your comment has the ring of truth and explains a lot of what others here (me included) have been puzzling over.

  44. H is defined from 1-2H being the frequency dependence of the data, i.e. white (or gaussian) noise that has a flat frequency spectrum has 1-2H=0 or H=0.5. Integrate that and you have red noise, which has a 1/f frequency spectrum: 1-2H=-1 or H=1. “Fractional” noise lies somewhere in between.

    So data with H close to 1 represent processes where it is the rate of change that is random, not the time series itself.

    • Then how should we interpret the time series data presented by Mr. Eschenbach? Is there a statistically significant increase? Or maybe there is nothing to conclude about a trend being present?

  45. You start with a question; I will start with a question: why would anybody ask if there is a linear trend to the Nile data? Apart from an example of what not to do of course. Continuing on that line, what does it say that the Nile flow would be n thousand years in the future? Something absurd, nobody can believe that Nile flow is a linear function of time with some randomness added, so why bother calculating it? The variations are far more interesting and more useful for explanation. There is a lot of short term change, yet there is also a lot of longer term stability. So maybe something happened to the weather and climate in that watershed, and maybe correlations could be found with other variables that might tell us something useful. Other factors besides time. Those factors appear here as autocorrelations, but that’s not so useful.

    Great discussion.

  46. Can this be explained by dominoes? As H goes to 1 N goes to 1. That would be like setting up one of these domino endeavors you can witness on YouTube. There are thousands of dominoes but they are all arranged to be auto-correlated. If the first one falls all of them will fall in succession so there is no randomness in the process… thousands of data points but none of them independent. Pushing the first one over is very similar to pushing all of them over. That would be an effective N of 1.

  47. That is what is needed. A simple way to explain the extremes. A coin flip is an excellent way to explain completely independent data of size N. And dominoes may be a simple way to explain the opposite end of the spectrum… an effective N of 1 no matter how big the data set.

    • Scott, my interpretation of N=1, was taking an average on a single value (this does imply I’m even in the right universe ). With Neff=2 enough to draw a line.

  48. “..not enough to say that it is more than just a 663-year random fluctuation of the Nile”.

    Isn’t that still a trend?

  49. Demetris Koutsoyiannis July 1, 2015 at 2:04 pm

    Willis, thank you very much for the excellent post and your reference to my work. I confirm your result on the effective sample size — see also equation (6) in http://www.itia.ntua.gr/en/docinfo/781/

    Demetris, thank you for your kind comments. I am not surprised at all that your Equation (6) is the same as my Equation (1) above, and that you had plowed this ground before I arrived. You’re the man in this arena. I’m always happy to find out that things I came up with on my own have been anticipated by others—it proves that I’m on the right track and that I do understand the issues.

    I also thank all discussers for their kind comments and Marcel Crok for notifying me about the post. I wish I had time to comment on the several issues/questions put. But you may understand that here in Greece the political/economic situation is quite critical and thus we cannot concentrate right now on important scientific issues like this. Currently we have to struggle to keep our functions alive; you may perhaps imagine (or not?) how difficult it is. So, sorry for not participating more actively…

    My friend, I can’t tell you how much I’ve learned in the last decade or so from your work. I have recommended it many times here on WUWT. Whether you contribute here on this thread or not, your contributions to my understanding are and continue to be great. As little or as much as you wish to add to my poor efforts is much appreciated.

    As to Greece and your struggles, I’m sure that I speak for many here when I say that I wish you well in what are very difficult conditions.

    You’ll likely enjoy what will likely be my next post, regarding an easy way to calculate the statistical significance of trends in situations with a small number of effectively independent data points.

    Stay well, I wish you safety in parlous times …

    In friendship,

    w.

  50. At a guess, this may be a non-normality issue. Is Hadcrut actually compatible with a fractional gaussian process? It is only gaussian processes where means and covariances tell you everything about the probabilities. It is not true for other random processes and the Hurst exponent seems unlikely to have an interpretation in terms of independence in that case. This general point is often illustrated by showing that any Hurst exponent can (more or less) be reproduced by a suitable Markov process (actually diffusion). Markov processes only remember the immediate past, while large values of H for a fractional Gaussian process are associated with long memory/persistence. In other words, once the assumed random dynamics does not apply, the associated interpretation of H does not apply.

  51. Hi Willis,
    I’m enormously glad that you enjoyed the paper(s) of Koutsoyiannis. This is the most ignored work in climate science, and it transforms the perpetual computation of short time linear trends into the utter statistical bullshit that it is.

    The problem is that even your estimate of 11 independent samples in 700 odd years may be excessive. We won’t know until we have data that is even longer, because the number of independent samples depends on the longest (not the shortest, the longest) timescales on which secular or random variation occurs. We cannot detect the longest timescales with only a dataset on the same order of number of samples any more than a Fourier transform can pick up long period Fourier components from a short time series. Indeed, if we discussed Laplace transforms instead of FTs, we’d actually be discussing the same phenomenon. It’s all a matter of convolution.

    rgb

    • The bad news is that the best we can hope for in mainstream climate science is bullshit. The good news is that there’s plenty of it and it’s peer-reviewed.

  52. Hurst’s work has been well-known to investment analysts for many years. Mandelbrot explained how he first came across this power law in 1963 while he was teaching at Harvard. There is an interesting chapter in his book “The (Mis)Behaviour of Markets” devoted to Hurst. In it he mentions Edgar E Peters who reported on the Hurst exponents that he found in the performance of leading shares in 1991. Independently a Scottish actuary, Robert Clarkson, used the Hurst exponent in several papers and seminars in the 1980s and 1990s to rubbish the then fashionable and now discredited “modern portfolio theory”.

  53. Willis,

    Perhaps I missed this, but have you tried creating an artificial data set, such as a straight line plus Gaussian noise, and computing n_eff to see if you get the correct value? My concern is that a pattern in the data, such as a trend or a sine wave, will introduce correlation, which will reduce n_eff.

    I will try it myself, once I figure out just how to do the calculations.

    • Willis,

      I have tentatively come to the conclusion that your method of analysis is fundamentally mistaken. I created a series of random numbers and analyzed them using the method of Koutsoyiannis. I got a Hurst exponent close to 0.5; so far so good. Then I added linear trends of increasing magnitude. Once the trend was large enough to be visible in a scatter plot, two things started to happen: the log-log plot of sigma vs. n started to curve and the fit value of the exponent started to increase towards unity. But clearly, from the way the data were constructed, we MUST have n_eff = n (of course, numerical analysis of a finite data set might give a bit of variation).

      A little reflection reveals the cause of the problem. Plots as in Figure 5 of Koutsoyiannis are testing whether the variance of the data is behaving as uncorrelated random (white) noise, with exponents near 0.5 indicating uncorrelated noise and larger values indicating something else. But the variance in my test comes from two sources: the noise and the linear trend. So once the trend is significant, the variance no longer behaves as uncorrelated random noise. But that does not mean that the noise is correlated, only that there are correlations in the data.

      The same thing will happen with any data containing a signal: whether linear, exponential, sinusoidal, etc. And you can’t detrend the data unless you know just what type of signal must first be removed.

      Cohn and Lins may not have been pessimistic. At best, what you might be able to do is to use this method to test residuals for uncorrelated random behavior.
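
The experiment described in this comment can be reproduced roughly as follows. This is my reading of the aggregated standard deviation approach (the standard deviation of block means at block size k varying as k^(H-1)), not necessarily the exact procedure used in the head post or by the commenter. The function name, the block sizes, the seed and the size of the trend are all illustrative assumptions, and NumPy is assumed to be available.

```python
import numpy as np


def hurst_aggregated(x, scales=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate H from the slope of log(sd of block means) vs log(block size)."""
    x = np.asarray(x, dtype=float)
    sds = []
    for k in scales:
        m = len(x) // k                                   # number of complete blocks
        block_means = x[: m * k].reshape(m, k).mean(axis=1)
        sds.append(block_means.std(ddof=1))
    slope = np.polyfit(np.log(scales), np.log(sds), 1)[0]
    return 1.0 + slope                                    # because sd ~ k**(H - 1)


rng = np.random.default_rng(42)
n = 4096
noise = rng.standard_normal(n)        # independent Gaussian noise
trend = np.linspace(0.0, 5.0, n)      # total rise of 5 noise standard deviations

h_noise = hurst_aggregated(noise)
h_trended = hurst_aggregated(noise + trend)
print(h_noise, h_trended)
```

With independent noise the estimate should come out near 0.5, while the trended series gives a markedly larger value even though the noise itself is unchanged, which is the effect described above.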

      • Hi Mike M.

        I think Willis has done this work, as reported in the head-post, using fractional gaussian pseudo data. But obviously, Willis’s response would be more credible and informative than anything I can provide.

        But on another note, while the low N-eff for climate data is clearly interesting, the real impact is the extent that this finding significantly increases the probabilistically determined uncertainty of global temperature data series extracted from the measurement record … and possibly the resulting contrast with those uncertainty bounds calculated and forecast by the climate models. This could lead to another quantitative and informative contrast between data and model.

        I read the Mann paper referenced early on in the comment above; he seemed to report credible work but with dismissive conclusion. I was surprised no on commented on this … I wonder how Willis reacted.

        So Willis, we await your next edition. . . wherever it takes you, and us!

        Thanks

        Dan

      • Dan,

        I think the pseudo-data had no actual trend, only spurious trends generated by random chance.

        I think that the real impact can not be decided until we know just what this can tell us. That is probably much less than most people (me included) would have guessed.

        I can tell you why I have not commented on the Mann paper: Haven’t got around to it.

      • Hi Mike –

        I really can’t comment about fractional gaussian pseudo data and associated trends and their impact, because I really don’t know what process(es) or code Willis used. I also acknowledge that my interest in the outcome of the results of the “Hurst Phenomenon” is biased by my disbelief that climatic global temperature uncertainty is proportional to n^(-0.5).

        Yet, on the other hand, I struggle about what the Hurst Phenomenon means and how the associated representation of the past bears on our ability to predict the future. Yes, it yields an attractive perspective for uncertainty calculation for the past … and maybe the future. But how do we know?

        Maybe if an honest broker(s) of climate modeling could truly understand, represent and implement the climatic shocks (associated with PDO, AMO, El-Nino, volcanic eruptions, solar irradiance etc.) in a series of stochastic predictions, the climate science community (on both sides) could more deeply differentiate between natural variation, measurement error, and exogenous influences such as aerosols, the sun, and GHGs. Maybe then we could begin to parse the key drivers of climate change (natural, anthropogenic, and erroneous) and then begin to understand what (if anything) is important for future policy.

        OH, shake me, oh wake me up …. I’m dreaming nonsense … or maybe I’m still just dreaming!!

        Best to all … I actually wonder about this stuff …. please help me to understand better …. that’s all I ask!!

        Thanks

        Dan

      • Mike writes “But clearly, from the way the data were constructed, we MUST have n_eff = n”

        n_eff = 2 describes a linear trend.

  54. This post gets my nomination for one of the “Posts of 2015” here at “SUP” – WhatS UP with that. There is so much to learn and then consider, and also so many on-topic responses that dive further into the subject, rather than degenerating into an endless back-and-forth between two or three posters, going on ad nauseam.

    Thanks for a great post! Perhaps I should get busy and try to compile a master table of all climate data sets, by name, and add information as to N, calculated by whom, when, and “notes”? The “notes” field could be a reference to a set of equations showing “how” N was derived by that author.

  55. Willis,
    I’ve seen many charts where lines have been added arbitrarily to plot various temperature trends.

    Would it be possible to automatically locate the approximate location of each “effective n” point? And its value? And then automatically draw the trend lines?

    Being a software developer but not a mathematician, I thought: could you iterate your “effective N” calculation using an increasing start date, or decreasing ending date, and note when the “effective N” changes? I have no idea how sharp the change would be in this circumstance, but it might be interesting.

    I very much enjoyed your article, very understandable. Thank you.

  56. As you would expect, R has a package for estimating Hurst exponents: the hurstexp function in the pracma package. I ran the time series of the Nile levels from the source in the article and got the following output.

    Simple R/S Hurst estimation: 0.7374908
    Corrected R over S Hurst exponent: 0.8877882
    Empirical Hurst exponent: 0.7302425
    Corrected empirical Hurst exponent: 0.733243
    Theoretical Hurst exponent: 0.5300408
    Warning message:
    In matrix(x, n, m) :
    data length [620] is not a sub-multiple or multiple of the number of rows [311]

    Some variation in the exponent. I would have to read a lot more before knowing what they mean and the warning message may mean something was not correct in my dataset copying or some other issue.

  57. Willis,
    A truly superb post. Thanks to you and the replies that follow, I believe I now have a much better understanding of autocorrelation.
    In particular your equation n_eff = n^(2-2H) is very instructive. When H = 0.5 (no correlation) the equation reduces to n_eff = n (because n^1 = n) while in the case of complete correlation, when H = 1, the equation reduces to n_eff = 1 (because n^0 = 1)
    So in the extreme case of H = 1, where the data series is totally correlated, it follows that with one known data point the value for all data in the series can be predicted. Similarly with H = 0.5 (no correlation) it is impossible to predict the next value (or any other value) in a time series of data points. This then leads me to wonder about scale invariance. With values of H close to 0.5 is it correct to assume that the autocorrelation distance is small? Data further away in the time series has a value that is independent of distant past data points. While for values of H that approach 1 the degree of scale invariant self-similarity (i.e. the fractal nature of the data) increases?
    Just asking.
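The two limiting cases in this comment are easy to check numerically. The sketch below evaluates n_eff = n^(2-2H) for a few values of H; the helper name `effective_n` is hypothetical, and n = 663 is an assumed record length (the years 622 through 1284 inclusive), used only for illustration.

```python
def effective_n(n, hurst):
    """Effective number of independent points, n_eff = n^(2 - 2H)."""
    return n ** (2.0 - 2.0 * hurst)

n = 663  # assumed record length: the years 622 through 1284 inclusive

for h in (0.5, 0.7, 0.9, 1.0):
    print(f"H = {h:.1f}  ->  n_eff = {effective_n(n, h):.1f}")
```

At H = 0.5 the exponent is 1 and n_eff equals n; at H = 1 the exponent is 0 and n_eff equals 1. In between, n_eff falls quickly as H rises, which is why even a moderately high Hurst exponent leaves a long record with few effectively independent points.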

  58. I thank Willis for an intriguing and important post.

    Verification:
    I have independently verified some of Willis’ results. Although Willis says “About 15% of the pseudodata instances have trends larger than that of the nilometer” my computations put the figure closer to 9.8%. Nevertheless, his histogram (Figure 5) compares well to my own, which you may find here: https://www.dropbox.com/s/dgpvt1pxkhonzfg/histogram%20for%20WUWT.png?dl=0. Only about 5% of my simulations yield a trend of 1.52 or more, so the actual Nilometer trend of 1.197 is not unusual.

    Transparency:
    True science requires enough transparency to allow examination/replication by others. My verification above is in a Mathematica notebook file here: https://www.dropbox.com/s/60dwtv5ekbsbhpw/Nilometer%20for%20WUWT.nb?dl=0.

    Hypothesis:
    Willis notes “you can see how much more “trendy” this kind of data is.” I have a vague suggestion why. I suggest that the “climate” is a collection of (perhaps loosely) coupled chaotic parts. Each part behaves “consistently” for a while, then “suddenly” switches to a different “consistent” behavior (i.e., change of attractor). Such a change in one part will act as a “forcing” on some other parts, which may (a) thereby trend in some direction for a while seeking a new “rough equilibrium” or (b) also change their attractor. Looking at any such chaotic part will be trendy-but-random, just what we often see in high-Hurst data.

    Question:
    What page in Geomagnetism does Willis’ Figure 6 come from?

    Interesting:
    The Koutsoyiannis paper notes “the assertion of the National Research Council (1991) that climate ‘changes irregularly, for unknown reasons, on all timescales’”.

    Another Resource:
    I found Ian L. Kaplan’s “Estimating the Hurst Exponent” enlightening: http://www.bearcave.com/misl/misl_tech/wavelets/hurst/.

    • Thanks, Needlefactory. I greatly appreciate folks who run the numbers themselves; it’s how I’ve gotten as far as I have. Until I can actually do the math myself, I don’t feel like I understand what is going on.

      Further notes:

      The Figure 6 comes from p. 584, as I mentioned at the start of the head post.

      Thanks also for the Mathematica notebook file and the Kaplan paper, I’ll take a look as soon as I get time.

      I’ve used the wavelet method, but I prefer the new method that Koutsoyiannis proposes, the log/log analysis of the rate of decline of the standard error of the mean. I like it in particular because it provides the connection between the Hurst Exponent and the number of degrees of freedom. This connection does not exist with, e.g., the wavelet analysis method.

      w.

    • I found a typo (662 for 622) in my Mathematica Notebook. This caused simulations to be 40 years too short, with mild changes to some statistics:
      (1) Probability of a trend ≥ nilometer trend is ~8.22% (Willis computes ~15%).
      (2) 5% of my simulations result in a trend of more than 1.414 cm/decade.
      Differences between Willis and me likely result from different codes for fractional Gaussian noise.

      Nevertheless, the main conclusion is unchanged: the historic trend is neither unusual nor significant.

      The revised Notebook has been uploaded; the link remains the same as above. Those without Mathematica may review the document (both code and output) with the free CDF player at https://www.wolfram.com/cdf-player/
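For readers without Mathematica, the kind of Monte Carlo check discussed in this thread can be sketched in Python. This is an illustration under stated assumptions, not a port of either Willis's code or the notebook above: it draws fractional Gaussian noise exactly via a Cholesky factor of the fGn autocovariance matrix, fits an ordinary least-squares trend to each realization, and counts how often the trend exceeds a threshold. The series length, the Hurst exponent of 0.85, the number of simulations, and the threshold (three white-noise standard errors, standing in for an observed trend) are all hypothetical.

```python
import numpy as np

def fgn_cholesky_factor(n, hurst):
    """Lower-triangular factor that turns white noise into unit-variance fGn."""
    k = np.arange(n, dtype=float)
    # fGn autocovariance at lag k (unit variance)
    gamma = 0.5 * ((k + 1) ** (2 * hurst) - 2 * k ** (2 * hurst)
                   + np.abs(k - 1) ** (2 * hurst))
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return np.linalg.cholesky(gamma[lags])      # gamma[lags] is the covariance matrix

def trend_slopes(n, hurst, n_sims, rng):
    """OLS trend of each of n_sims simulated fGn series of length n."""
    factor = fgn_cholesky_factor(n, hurst)
    series = factor @ rng.standard_normal((n, n_sims))   # one series per column
    t = np.arange(n) - (n - 1) / 2.0                     # centered time index
    return t @ series / (t @ t)                          # slope for each column

rng = np.random.default_rng(1)       # fixed seed so the run is repeatable
n, n_sims = 300, 2000                # hypothetical sizes, kept small for speed
slopes_white = trend_slopes(n, 0.5, n_sims, rng)    # H = 0.5: uncorrelated
slopes_hurst = trend_slopes(n, 0.85, n_sims, rng)   # hypothetical high-H case

threshold = 3.0 * slopes_white.std()  # stand-in for an "observed" trend
frac_white = np.mean(slopes_white >= threshold)
frac_hurst = np.mean(slopes_hurst >= threshold)
print(f"fraction of trends >= threshold, H = 0.50: {frac_white:.4f}")
print(f"fraction of trends >= threshold, H = 0.85: {frac_hurst:.4f}")
```

A trend that would be rare under white noise should turn up far more often in the high-H simulations. The exact percentages depend on the generator, the series length, and the fitted H, which is consistent with the differing figures reported in this thread.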

  59. It’s a bit like saying a square peg doesn’t fit a round hole because it has four corners. It is obvious from the HadCrut4 data that it fits a function like asin60/y.t + ct much better than a linear trend. I suspect that taking the first term from the data leaves you with something where H is close to 0.5.

    The real question is how certain can you be that something changed after 1950 (after the question of whether HadCRUT4 is worth analysing is addressed) as temperatures around the globe are likely to have warmed since the LIA. Does a larger value of c fit the data better after 1950? Is it significantly larger? Does the data fit the hypothesis that a ten-fold increase in use of fossil fuels warmed the globe (it fits better if the second term is t^2)?

    • The problem is that one can fit the data perfectly well with a lovely function that works excellently and still have the fit mean nothing at all! You really do need to follow the link(s) and read at least the first hydrodynamics paper I referenced. It contains a figure that makes the point better than I can ever make it with words, but I’ll try.

      If, in complete ignorance, you fit the temperature outside your front door from midnight to 6 am (when the temperature outside falls by maybe 10 C) to an assumed linear trend, you might be tempted to conclude that the entire planet will “catastrophically” reach 0 degrees absolute in less than a month. If you fit the temperature from midnight to noon (let’s imagine a warming of 5 C) and assume a linear trend, you might conclude that the Earth will become Venus in a matter of months. Only if you look at the temperature from midnight to midnight, and average over many days, can you discern from the temperature data only a systematic diurnal trend and infer, without ever seeing it, “the sun”.
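The front-door example can be put into numbers with a short sketch. Everything here is a hypothetical illustration: a pure sinusoidal diurnal cycle with a 15 C mean and a 10 C amplitude, coldest at 06:00, so that the temperature falls by 10 C between midnight and 6 am as in the paragraph above.

```python
import numpy as np

# Hypothetical diurnal cycle: 15 C mean, 10 C amplitude, coldest at 06:00.
hours = np.arange(0.0, 24.0 * 30, 0.5)                  # 30 days, half-hourly
temp_c = 15.0 - 10.0 * np.sin(2 * np.pi * hours / 24.0)

def fitted_slope(start_h, end_h):
    """OLS slope (deg C per hour) over the window [start_h, end_h)."""
    mask = (hours >= start_h) & (hours < end_h)
    return np.polyfit(hours[mask], temp_c[mask], 1)[0]

slope_night = fitted_slope(0, 6)         # midnight to 6 am: falling
slope_day = fitted_slope(6, 18)          # 6 am to 6 pm: rising
slope_month = fitted_slope(0, 24 * 30)   # whole record: close to flat

# Naive extrapolation of the night-time chord all the way down to 0 K.
start_kelvin = temp_c[0] + 273.15
days_to_absolute_zero = start_kelvin / abs(slope_night) / 24.0

print(f"midnight-6am slope: {slope_night:+.2f} C/hour")
print(f"6am-6pm slope:      {slope_day:+.2f} C/hour")
print(f"30-day slope:       {slope_month:+.4f} C/hour")
print(f"days to reach 0 K by extrapolating the night-time fit: {days_to_absolute_zero:.1f}")
```

The short windows give steep slopes of opposite sign, while the fit over the whole month is nearly flat. Extrapolating the six-hour chord reaches absolute zero well inside a month, which is the absurdity the analogy points at.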

      If you now take advantage of your knowledge of a diurnal variation to average over it and plot the temperatures day to day, if you start in January and run to June or August, you will be tempted to conclude (again) the Earth is warming dangerously and unless immediate action is taken heat will kill everybody on the planet in at most a few years. If you started in July and ran to December, you’d conclude the opposite. Only when you analyze data over many years can you clearly resolve a systematic annual trend on top of the diurnal trend in the data and deduce (without seeing it) the inclination of the planet and/or eccentricity in its orbit or some unknown cause with an appropriate period.

      OK so you know about the seasons. But when you plot the average temperature (however you define it) for a year and compare year to year, you will find (again) many periods where if you only look at 4 or 5 years, there is a strong linear trend going up or down. If you look at decadal trends, they can still be going up, going down, or in between. If you look at century scale trends, some centuries it appears to warm, some it cools. If you look at thousand year timescales, you find that the thousand year mean temperatures vary substantially and appear to “trend” for many thousands of years at a time. If you look at million year timescales, you see patterns within patterns within patterns, and some — most — of the patterns you see cannot be reliably predicted or hindcast. Sometimes we can propose the moral equivalent of the sun and rotating earth or the revolving tipped earth to explain some part of some of those patterns, but in general these proposed patterns are no longer consistent and systematic — they appear and seem to hold for a few cycles and then disappear, and maybe reappear later, only a bit different. In particular, they don’t appear to be pure Fourier components indicating a truly periodic “cause”, and in most cases we cannot point to something as simple as the sun and the Earth’s rotation and say “this is the cause”.

      How can you identify a change in the pattern due to a single variation in some underlying variable under these conditions? You cannot predict what the pattern should be without the variation. How can you predict it with it? You might see a linear trend, but how can you tell if the linear trend is random noise (which sometimes will generate a linear trend over at least some intervals) or the rising part of a sinusoid associated with a day, or the rising part of a sinusoid of sinusoids associated with “spring”, or with some other sinusoid such as the possible 67 year variation in global temperature — “possible” because maybe not, maybe it is just an accident, maybe it is part of some long time scale noise or a quasi-stable pattern that is just about to break up and reform in some entirely different way?

      The answer is that you can’t, not without the ability to independently see the sun and the earth’s rotation, not without an independent theory of radiation and heat transfer to explain the correlation in the data in a way consistent with physical causality. And in sufficiently complex or chaotic systems, it may well be that you just plain can’t. Not everything is reducible to simple patterns.

      I will conclude by restating something I’ve said many times on this list. There is some point in fitting a physically motivated model to data. There is much less of a point in fitting arbitrary functions to data (almost no point, but I will refrain from saying no point at all absolutely, as one often has to bootstrap an understanding from little more than this). A linear model is pretty much arbitrary, and in the case of climate data where we know we have secular variations on multiple timescales, fitting linear functions on a tiny chord of temporal data (compared to the longer known scales) is a waste of time. It is as silly as fitting morning time temperatures and extrapolating the rise six months into the future.

      rgb

      • Using your analogy. If someone shows you that the temperature today went up 10 deg today from 6am to noon and screamed Armageddon because we lit a fire, then even with only 24 h worth of data you can point out that it could just be cyclic. Then if they say that it’s warmer today than yesterday, you can ask where is the evidence that something suddenly changed when the fire was lit.

      • Using your analogy. If someone shows you that the temperature today went up 10 deg today from 6am to noon and screamed Armageddon because we lit a fire, then even with only 24 h worth of data you can point out that it could just be cyclic.

        Cyclic, cubic, polynomial in any function of order greater than three, or an apparently cyclic bit of noise on a curve whose true shape is — anything at all. Mickey Mouse ears. Exponential growth. Exponential decay. A meaningless squiggle.

        The problem is precisely with the word “evidence”. In order to statistically interpret the data, one first has to state one’s priors, the assumptions one is making that justify the use of axiomatic statistics (generally Bayes’ theorem, the central limit theorem, etc.) in some particular way. If you assume “the data are (approximately) periodic with period 24 h” with a single 24 h sample when, in fact, the data are approximately periodic with period 24 h, nobody can argue with the consistency of your assumption, but at the same time the data constitute almost zero evidence in favor of your prior assumption. To use a simpler case, if you pull a black marble out of a hat full of marbles, and then assume “all the marbles in this hat are black” nobody is going to be able to disprove your assertion, but the one black marble is only evidence for the existence of a single black marble in the entire hidden statistical universe inside of the hat and one learns nothing beyond the consistency of your assertion from the single black marble, especially with the assertion being made a posteriori (with the marble already in hand) and with no “theory of marbles in hats” that e.g. specifies the number of colors of marbles that could be in the hat, that specifies how many marbles might be in the hat, that specifies whether or not you put the black marble back and give the hat a shake before you draw out the next marble, that specifies whether or not the hat contains a malicious gnome who puts only certain marbles into your hand according to some hidden rule even though the hat contains a lot more marbles of a lot more kinds than you think.

        One gets an entirely different progression of successive “best estimate” probabilities for drawing a black marble from the hat on the basis of a succession of “experiments” of drawing additional marbles from the hat, with or without replacement for all of the different prior assumptions. In the long run, one accumulates enough evidence to eliminate, in the specific sense of reducing their posterior probabilities to zero or close(r) to zero, most of these prior assumptions (except for the malicious gnome or invisible fairies who are hidden and who hide “reality” from your experiments, who can literally never be eliminated in any theory and who thereby allow a hypothesis for the existence of god(s), demons, and pink unicorns to be added to any set of priors that you like without any possibility of refutation via evidence).
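The progression of “best estimate” probabilities under different priors can be illustrated with a toy Bayesian update. The sketch below assumes a deliberately simple “theory of marbles in hats”, all of it hypothetical: the fraction of black marbles is one of five values, draws are made with replacement, and each marble is either black or not. Two priors are compared, a uniform one and one that puts most of its weight on “all the marbles are black”.

```python
import numpy as np

# Hypothetical hypotheses: the possible fraction of black marbles in the hat.
fractions = np.array([0.0, 0.25, 0.5, 0.75, 1.0])

def update(prior, drew_black):
    """Bayes' theorem for one draw with replacement."""
    likelihood = fractions if drew_black else 1.0 - fractions
    posterior = prior * likelihood
    return posterior / posterior.sum()

def prob_next_black(belief):
    """Predictive probability that the next marble drawn is black."""
    return float(belief @ fractions)

priors = {
    "uniform": np.full(5, 0.2),
    "mostly all-black": np.array([0.025, 0.025, 0.025, 0.025, 0.9]),
}
draws = [True, True, False, True]   # hypothetical draws: black, black, not black, black

for name, prior in priors.items():
    belief = prior.copy()
    print(f"{name}: before any draw, P(next black) = {prob_next_black(belief):.3f}")
    for drew_black in draws:
        belief = update(belief, drew_black)
        color = "black" if drew_black else "not black"
        print(f"  drew {color:9s} -> P(next black) = {prob_next_black(belief):.3f}")
```

After the first black marble the two priors still give quite different predictions, so one marble settles little. The first non-black marble drives the posterior probability of “all black” to zero under both priors, which is the sense in which accumulated evidence eliminates prior assumptions.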

        This is “probability theory as the logic of science”, and it is well worth your time, if you want your worldview to be rocked, to invest in E. T. Jaynes’ book of this name and read it. In it he actually defines “evidence” in a sensible way, and shows how evidence can and should shift an entire coupled set of joint and conditional probabilities around as one accumulates it, lifting up the ones that remain consistent with the evidence and dropping down those that are not. Or (better yet, and) read Richard Cox’s monograph: “The Algebra of Probable Inference”. Or work through Shannon’s theorem, although it is harder going to make the connection to probability theory than it is to take Cox’s (slightly prior) result and obtain Shannon’s.

        This is by no means trivial stuff. In physics, we have (so far) observed three generations of both quarks and leptons. This has led many theorists to hypothesize that that’s all that there are. This makes the number “three” special, in some sense, in our physical universe — why not one, or two, or seven — and hence seems odd. But this is reasoning a posteriori, and hence has almost no force in and of itself, because the basis for that belief is that so far, pulling marbles out of the hat, one almost always obtains red marbles, rarely obtains green marbles, and only if one shakes the hat just right and holds it at a special angle can one sometimes pull out a blue marble. We’ve seen all three colors, so duh, the hat contains at least three colors of marble.

        But the day somebody pulls out an octarine marble (the color of magic, for Terry Pratchett fans) the “three color” theory of marbles in the hat has to be thrown out and the four color of marble theorists, who up to now have languished, unheralded and prone to excessive drinking, suddenly become world famous and the three color theorists add a color or take up basket weaving or work as Wal Mart greeters instead. And don’t forget the five color theorists, or the pesky infinite color theorists who can’t see any particularly good reason to limit the number of marble colors and who want to count marbles with polka dots and stripes as well.

        To conclude, the problem as posed constitutes almost no evidence at all for anything. Some days the temperature hardly varies at all and does NOT follow the diurnal cycle. If one happens to make one’s observations on that one day, one might be tempted to assume “temperature is a constant”, and of course the single day’s worth of data you have is consistent with that assumption. But what of the brave soul who sees that constant data and says, “no, temperature is a sinusoid with a period of one day”? Or one hour. Or two weeks. Or one whole year. Indeed, there is an infinite number of possible periods, and the very long periods are going to look nearly flat on the scale of a single day and then there is an absolutely unknown possibility of aperiodic variation that can be superimposed on any of these assumptions or all of them together.

        How can the data substantially favor one of these assumptions over any of the others, given only the one observation, no supporting theory with independent evidence (which can strengthen your priors substantially and make it much more difficult for them to be moved in the computation of posterior probabilities) and the fact that you are making your assumption a posteriori and not using it to make predictions that are subsequently either validated or refuted?

        rgb

      • Since I resemble this, let me see if I’m wasting my time.
        I started looking at the actual data because while taking astro pictures it became obvious how quickly it cools after sunset.
        So my goal was to see if the rate of cooling had changed over time. While looking at night time cooling, a) it wasn’t clear how to select only clear nights, b) the selection process was so ambiguous any results would be labeled “cherry picking”.
        So I started to include all days, comparing yesterday’s warming to last night’s cooling.

        I also realize that outside the tropics the ratio of day length to night length changes throughout the year, and you can use the day of the year it changes from warming to cooling as one measure of energy balance.
        So, I wanted to compare daily warming to the following night’s cooling, and the day of the year it went from warming to cooling.

        From others’ input on daily cooling I realized that, because of the effect of orbital tilt, I had to include a station’s daily records only if they covered a full year; any missing days would bias the results. That still left me with over 69 million daily records for 1940 to 2013 that I can directly compare. If CO2 were reducing night-time cooling, there should be a trend. I found two surprising things: when you average all 69 million records, it’s cooling, and I couldn’t figure out how it could be cooling when in general the temperature has gone up; and on an annual basis some years show more warming, other years more cooling.
        First, the cooling years seem to align well with warmer years, which makes sense, and the warming years match the cooler years. That passed my smell test, but I still couldn’t get why overall it showed some cooling. Then I realized that warm moist air is transported from the tropical waters to the land, where it cools and rains out excess moisture.

        The next thing I did was to use the same day-to-day difference in temperature, which on a daily basis shows warming (NH) in the spring and cooling in the fall. If CO2 were reducing the Earth’s ability to cool, the day it turns to cooling should be moving later in the year, as it would require a longer night to release that excess energy to space. The transition day moved around some, but there wasn’t a clear trend. I then thought that, as well as the day moving, the slope of the day-to-day change should be changing. It is, but it’s more a curve than a straight line (or a wiggly line; I avoid doing a lot of smoothing, but provide the data so that anyone who wants to smooth it can).

        Now this could all be orbital changes, and I carry far too many significant digits (again, really to allow others to decide where they should be stopped). If you round to the same single digit as the source data the difference is 0.0F +/-0.1F on 69 million samples from around the world, to me it looks like there’s no impact to cooling from CO2. Beyond this, there are many things that I could do, though most I don’t really know how to do. (I do include a lot more data when I do my processing, as it’s easy to add things like average dew point and min and max temp, and I’m adding solar forcing now, but most of it was investigating the day-to-day change over time.)

      • if you round to the same single digit as the source data the difference is 0.0F +/-0.1F on 69 million samples from around the world, to me it looks like there’s no impact to cooling from CO2.

        Ah, but did you remember to adjust your data? If you don’t adjust it to show warming, it can’t possibly be valid. That’s why we employ so many professional data adjusters worldwide in climate science.

        As was pointed out to the US Congress (who unfortunately didn’t understand what was being said), if you want cherry pie, you have to pick cherries.

        rgb

      • I like cherry pie !

        I think the way I do my calculations you can’t just change min or max, it would be really hard to make a warming trend where one didn’t exist.
