Dr. Michael Mann, Smooth Operator

Guest Post by Willis Eschenbach

People sometimes ask why I don’t publish in the so-called scientific journals. Here’s a little story about that. Back in 2004, Michael Mann wrote a mathematically naive piece about how to smooth the ends of time series. It was called “On smoothing potentially non-stationary climate time series”, and it was published in Geophysical Research Letters in April of 2004. When I read it, I couldn’t believe how bad it was. Here is his figure illustrating the problem:

Figure 1a. [ORIGINAL CAPTION] Figure 1. Annual mean NH series. (blue) shown along with (a) 40 year smooths of series based on alternative boundary constraints (1) – (3). Associated MSE scores favor use of the ‘minimum roughness’ constraint. 

Note the different colored lines showing different estimates of what the final averaged value will be, based on different methods of calculating the ends of the averages. The problem is how to pick the best method.

I was pretty naive back then. I was living in Fiji for one thing, and hadn’t had much contact with scientific journals and their curious ways. So I innocently thought I should write a piece pointing out Mann’s errors, and suggesting a better method. I append the piece I wrote back nearly a decade ago. It was called “A closer look at smoothing potentially non-stationary time series.”

The main insight in my paper was that I could actually test the different averaging methods against the dataset by truncating the data at various points. By doing that you can calculate what you would have predicted using a certain method, and compare it to what the true average actually turned out to be.

And that means that you can calculate the error for any given method experimentally. You don’t have to guess at which one is best. You can measure which one is best. And not just in general. You can measure which one is best for that particular dataset. That was the insight that I thought made my work worth publishing.

Now, here comes the story.

I wrote this, and I submitted it to Geophysical Research Letters at the end of 2005. After the usual long delays, they said I was being too hard on poor Michael Mann, so they wouldn’t even consider it … and perhaps they were right, although it seemed pretty vanilla to me. In any case, I could see which way the wind was blowing. I was pointing out feet of clay, and that was not allowed.

I commented about my lack of success on the web. I described my findings over at Climate Audit, saying:

Posted Oct 24, 2006 at 2:09 PM

[Mann] recommends using the “minimum roughness” constraint … apparently without noticing that it pins the endpoints.

I wrote a reply to GRL pointing this out, and advocating another method than one of those three, but they declined to publish it. I’m resubmitting it.

w.

So, I pulled out everything but the direct citations to Mann’s paper and resubmitted it basically in the form appended below. But in the event, I got no joy on my second pass at publishing it either. They said no thanks, not interested, so I gave up. I posted it on my server at the time (long dead), put a link up on Climate Audit, and let it go. I was just a guy living in Fiji and working a day job, what did I know?

Then a year later, in 2007, Steve McIntyre posted a piece called “Mannomatic Smoothing and Pinned End-points”. In that post, he also discussed the end point problem.

And now, with all of that as prologue, here’s the best part.

In 2008, after I’d foolishly sent my manuscript entitled “A closer look at smoothing potentially non-stationary time series” to people who turned out to be friends of Michael Mann, Dr. Mann published a brand new paper in GRL. And here’s the title of his study …

“Smoothing of climate time series revisited”

I cracked up when I saw the title. Yeah, he better revisit it, I thought at the time, because the result of his first visit was Swiss cheese.

And what was Michael Mann’s main insight in his new 2008 paper? What method did he propose?

“In such cases, the true smoothed behavior of the time series at the termination date is known, because that date is far enough into the interior of the full series that its smooth at that point is largely insensitive to the constraint on the upper boundary. The relative skill of the different methods can then be measured by the misfit between the estimated and true smooths of the truncated series.”

In other words, his insight is that if you truncate the data, you can calculate the error for each method experimentally … curious how that happens to be exactly the insight I wasted my time trying to publish.

Ooooh, dear friends, I’d laughed at his title, but when I first read that analysis of “his” back in 2008, I must admit that I waxed nuclear and unleashed the awesome power that comes from splitting the infinitive. The house smelled for days from the sulfur fumes emitted by my unabashed expletives … not a pretty picture at all, I’m ashamed to say.

But before long, sanity prevailed, and I came to realize that I’d have been a fool to expect anything else. I had revealed a huge, gaping hole in Mann’s math to people who were obviously his friends … and while for me it was an interesting scientific exercise, for him it represented much, much more. He could not afford to leave the hole unplugged or have me plug it.

And since I had kindly told him how to plug the hole, he’d have been crazy to try something else. Why? Because my method worked … hard to argue with success.

The outcome also proved to me once again that I could accomplish most anything if I didn’t care who got the credit.

Because in this case, the sting in the tale is that at the end of the day, my insights on how to deal with the problem did get published in GRL. Not only that, they got published by the guy who would have most opposed their publication under my name. I gotta say, whoever is directing this crazy goat-roping contest we call life has the most outré, wildest sense of humor imaginable …

Anyhow, that’s why I’ve never pushed too hard to try to publish my work in what used to be scientific journals, but now are perhaps better described as popular science magazines. Last time I tried, I got bit … so now, I mostly just skip getting gnawed on by the middleman and put my ideas up on the web directly.

And if someone wants to borrow or steal or plagiarise my scientific ideas and words and images, I say more power to them, take all you want. I cast my scientific ideas on the electronic winds in the hope that they will take root, and I can only wish that, just like Michael Mann did, people will adopt my ideas as their own. There’s much more chance they’ll survive that way.

Sure, I’d prefer to get credit—I’m as human as anyone, or at least I keep telling myself that. So an acknowledgement is always appreciated.

But if you just want to take some idea of mine and run, sell it under another brand name, I say go for it, take all you want, because I’ve learned my lesson. The very best way to keep people from stealing my ideas is to give them away … and that’s the end of my story.

As always, my best wishes for each of you … and at this moment my best wish is that you follow your dream, you know the one I mean, the dream you keep putting off again and again. I wish that you would follow that dream, because the night is coming and no one knows what time it really is …

w.

[UPDATE] In my above-mentioned comment on Steve McIntyre’s blog, I mentioned the analysis of Mannian smoothing by Willie Soon, David Legates, and Sallie Baliunas, entitled Estimation and representation of long-term (>40 year) trends of Northern-Hemisphere-gridded surface temperature: A note of caution. 

Dr. Soon has been kind enough to send me a copy of that study, which I have posted up here. My thanks to him, it’s an interesting paper.

=====================================================

APPENDIX: Paper submitted to GRL, slightly formatted for the web.

—————

A closer look at smoothing potentially non-stationary time series

Willis W. Eschenbach

No Affiliation

[1] An experimental method is presented to determine the optimal choice among several alternative smoothing methods and boundary constraints based on their behavior at the end of the data series. This method is applied to the smoothing of the instrumental Northern Hemisphere (NH) annual mean, yielding the best choice of these methods and constraints.

1. Introduction

[2] Michael Mann has given us an analysis of various ways of smoothing the data at the beginning and the end of a time series of data (Mann 2004, Geophysical Research Letters, hereinafter M2004).

These methods impose different constraints at those boundaries, and are called the “minimum norm”, “minimum slope”, and “minimum roughness” methods. They minimize, in order, the zeroth, first, and second derivatives of the smoothed average at the boundary. M2004 describes the methods as follows:

“To approximate the ‘minimum norm’ constraint, one pads the series with the long-term mean beyond the boundaries (up to at least one filter width) prior to smoothing.

To approximate the ‘minimum slope’ constraint, one pads the series with the values within one filter width of the boundary reflected about the time boundary. This leads the smooth towards zero slope as it approaches the boundary.

Finally, to approximate the ‘minimum roughness’ constraint, one pads the series with the values within one filter width of the boundary reflected about the time boundary, and reflected vertically (i.e., about the ‘‘y’’ axis) relative to the final value. This tends to impose a point of inflection at the boundary, and leads the smooth towards the boundary with constant slope.” (M2004)
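The three padding rules quoted above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not Mann's code: the 41-point window follows the choice made later in this paper, the Gaussian sigma of width/6 is assumed, the reflection excludes the boundary point itself (one reading of "reflected about the time boundary"), and the series must be longer than half the window.

```python
import numpy as np


def gaussian_weights(width=41, sigma=None):
    """Normalized Gaussian weights over an odd-width window."""
    half = width // 2
    if sigma is None:
        sigma = width / 6.0  # assumption: the window spans roughly +/- 3 sigma
    offsets = np.arange(-half, half + 1)
    w = np.exp(-0.5 * (offsets / sigma) ** 2)
    return w / w.sum()


def pad_series(x, half, constraint):
    """Pad both ends of x by `half` points using one of the three M2004 rules."""
    x = np.asarray(x, dtype=float)
    if constraint == "norm":
        # Minimum norm: pad with the long-term mean of the series.
        left = right = np.full(half, x.mean())
    elif constraint == "slope":
        # Minimum slope: mirror the nearby values about the time boundary.
        left = x[1:half + 1][::-1]
        right = x[-half - 1:-1][::-1]
    elif constraint == "roughness":
        # Minimum roughness: mirror in time, then flip vertically about the boundary value.
        left = 2 * x[0] - x[1:half + 1][::-1]
        right = 2 * x[-1] - x[-half - 1:-1][::-1]
    else:
        raise ValueError(f"unknown constraint: {constraint}")
    return np.concatenate([left, x, right])


def smooth(x, width=41, constraint="slope"):
    """Gaussian smooth of x with the chosen boundary padding; returns len(x) points."""
    w = gaussian_weights(width)
    padded = pad_series(x, width // 2, constraint)
    return np.convolve(padded, w, mode="valid")
```

Because the kernel is symmetric, the flip that `np.convolve` applies to it has no effect, so `mode="valid"` returns exactly one smoothed value per original data point.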

[3] He then goes on to say that the best choice among these methods is the one that minimizes the mean square error (MSE) between the smoothed data and the data itself:

“That constraint providing the minimum MSE is arguably the optimal constraint among the three tested.” (M2004)

2. Method

[4] However, there is a better and more reliable way to choose among these three constraints. This is to minimize the error of the final smoothed data point in relation, not to the data itself, but to the actual final smoothed average (which will only be obtainable in the future). The minimum MSE used in M2004 minimizes the squared error between the estimate and the data points. But this is not what we want. We are interested in the minimum mean squared error between the estimate and the final smoothed curve obtained from the chosen smoothing method. In other words, we want the minimum error between the smoothed average at the end of the data and the smoothed average that will actually be obtained in the future, when we have enough additional data to determine the smoothed average exactly.

[5] This choice can be determined experimentally, by realizing that the potential error increases as we approach the final data point. This is because as we approach the final data point, we have less and less data to work with, and so the potential for error grows. Accordingly, we can look to see what the error is with each method in the final piece of data. This will be the maximum expected error for each method. While we cannot determine this for any data nearer to the boundary than half the width of the smoothing filter, we can do so for all of the rest of the data. It is done by truncating the data at each data point along the way, calculating the estimated value of the final point in this truncated dataset using the minimum norm, slope, and roughness methods, and seeing how far they are from the actual value obtained from the full data set.

[6] In doing this, a curious fact emerges — if we calculate the average using the “minimum roughness” method outlined above, the “minimum roughness” average at the final data point is just the final data point itself. This is true regardless of the averaging method used. If we reflect data around both the time axis and the y-axis at the final value, the data will be symmetrical around the final value in both the “x” and “y” directions. Thus the average will be just the final data point, no matter what smoothing method is used. This can be seen in Fig. 1a of M2004:

ORIGINAL CAPTION: Figure 1. Annual mean NH series. (blue) shown along with (a) 40 year smooths of series based on alternative boundary constraints (1)–(3). Associated MSE scores favor use of the ‘minimum roughness’ constraint. (Mann 2004)

[7] Note that the minimum roughness method (red line) goes through the final data point. But this is clearly not what we want to do. Looking at Fig. 1, imagine a “smoothed average” which, for a data set truncated at any given year, must end up at the final data point. In many cases, this will yield wildly inaccurate results. If this method were applied to the data truncated at the high temperature peak just before 1880, for example, or the low temperature point just before that, the “average” would be heading out of the page. This is not at all what we are looking for, so the choice that minimizes the MSE between the data and the average (the “minimum roughness” choice) should not be used.
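The pinning in [6] can be checked algebraically. Assume only a symmetric filter with weights $w_{-h},\dots,w_h$, where $w_{-i}=w_i$ and the weights sum to one (this notation is introduced here, not taken from M2004), and write the minimum roughness padding beyond the final point $N$ as $x_{N+i} = 2x_N - x_{N-i}$ for $i = 1,\dots,h$. The smoothed value at the final point is then:

```latex
\begin{aligned}
\hat{s}_N &= w_0 x_N + \sum_{i=1}^{h} w_i \left( x_{N-i} + x_{N+i} \right) \\
          &= w_0 x_N + \sum_{i=1}^{h} w_i \left( x_{N-i} + 2x_N - x_{N-i} \right) \\
          &= x_N \Bigl( w_0 + 2\sum_{i=1}^{h} w_i \Bigr) = x_N .
\end{aligned}
```

The data values $x_{N-i}$ cancel term by term, which is why the result holds for any symmetric, normalized filter and not just a Gaussian.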

[8] Since the minimum roughness method leads to obvious errors, this leaves us a choice between the minimum norm and minimum slope methods. Fig. 2 shows the same data set with the point-by-point errors from the three methods (minimum norm, minimum slope, and minimum roughness) calculated for all possible points. (As mentioned, the minimum roughness estimate is identical to the final data point itself, so its error is simply that data point minus the true smoothed value.)

[9] To determine these errors, I truncated the data set at each year, starting with the year that is half the filter width after the start of the dataset. Then I calculated the value for the final year of the truncated data set using each of the different methods, and compared it to the actual average for that year obtained from the full data set. I am using a 41-year Gaussian average as my averaging method, but the underlying procedure and its results are applicable to any other smoothing method. I have used the same dataset as Mann, the Northern Hemisphere mean annual surface temperature time series of the Climatic Research Unit (CRU) of the University of East Anglia [Jones et al., 1999], available at http://www.cru.uea.ac.uk/ftpdata/tavenh2v.dat.
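The truncation test described above can be sketched as follows. The names `end_estimate` and `truncation_errors` are hypothetical, the Gaussian sigma of width/6 is an assumption, and only the end beyond the cut-off is padded, since that is the only boundary that affects the end-point error.

```python
import numpy as np


def gaussian_weights(width=41, sigma=None):
    half = width // 2
    sigma = width / 6.0 if sigma is None else sigma  # assumed sigma
    offsets = np.arange(-half, half + 1)
    w = np.exp(-0.5 * (offsets / sigma) ** 2)
    return w / w.sum()


def end_estimate(prefix, w, constraint):
    """Smoothed value at the last point of `prefix`, padding only beyond its end."""
    half = len(w) // 2
    tail = prefix[-half - 1:]            # the endpoint and the `half` points before it
    mirror = prefix[-half - 1:-1][::-1]  # those earlier points reflected about the endpoint
    if constraint == "norm":
        pad = np.full(half, prefix.mean())  # mean of the data available at the cut-off
    elif constraint == "slope":
        pad = mirror
    elif constraint == "roughness":
        pad = 2 * prefix[-1] - mirror
    else:
        raise ValueError(f"unknown constraint: {constraint}")
    return float(np.dot(w, np.concatenate([tail, pad])))


def truncation_errors(x, width=41, constraint="slope"):
    """Estimate minus true smooth when the series is cut off at each possible year."""
    x = np.asarray(x, dtype=float)
    w = gaussian_weights(width)
    half = width // 2
    errors = []
    # Start half a filter width in; stop where the true smooth still needs no padding.
    for t in range(half, len(x) - half):
        true_smooth = float(np.dot(w, x[t - half:t + half + 1]))
        errors.append(end_estimate(x[:t + 1], w, constraint) - true_smooth)
    return np.array(errors)
```

Applied to a NumPy array holding a series such as the CRU NH annual mean, `truncation_errors(series, constraint="slope").std()` gives the kind of standard deviation quoted in [10]. The values depend on the assumed sigma, so this sketch should not be expected to reproduce the paper's numbers exactly.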

Figure 2. Errors in the final data point resulting from different methods of treating the end conditions. The “minimum roughness” estimate for the dataset truncated at any given year is the same as the data point for that year.

3. Applications

[10] The size of the errors of the three methods relative to the smoothed line can be seen in the graph, and the minimum slope method is clearly superior for this data set. This is verified by taking the standard deviation of each method’s point-by-point distance from the actual average. Minimum roughness has the greatest deviation from the average, a standard deviation of 0.110 degrees. The minimum norm method has a standard deviation of 0.065 degrees from the actual average, while the minimum slope’s standard deviation is the smallest at 0.048 degrees.

[11] Knowing how far the last point in the average of the truncated data wanders from the actual average allows us to put an error bar on the final point of our average. Here are the three methods, each with their associated error bar (all error bars in this paper show 3 standard deviations, and are slightly offset horizontally from the final data point for clarity).

Figure 3. Potential errors at the end of the dataset resulting from different methods of treating the end conditions. Error bars represent 3 standard deviations. The minimum slope constraint yields the smallest error for this dataset.

[12] Note that these error bars are not centered vertically on the final data point of each of the series. This is because, in addition to knowing the standard deviation of the error of each end condition, we also know the average of each error. Looking at Fig. 2, for example, we can see that the minimum norm end condition on average runs lower than the true Gaussian average. Knowing this, we can improve our estimate of the error of the final point. In this dataset, the centre of the confidence limits for the minimum norm will be higher than the final point by the amount of the average error.
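The bias correction described above can be sketched as follows. It assumes errors are defined as estimate minus true smooth (as in step 2 of section 4) and uses the sample standard deviation, which the paper does not specify; the helper name `end_point_limits` and the example numbers are hypothetical.

```python
import numpy as np


def end_point_limits(end_estimate, errors, n_sd=3.0):
    """Bias-corrected confidence limits for the final smoothed point.

    `errors` are (estimate - true smooth) values from the truncation test.
    """
    errors = np.asarray(errors, dtype=float)
    bias = errors.mean()
    sd = errors.std(ddof=1)  # assumption: sample standard deviation
    centre = end_estimate - bias  # a method that runs low (negative bias) is shifted up
    return centre - n_sd * sd, centre, centre + n_sd * sd


# Toy numbers for illustration only.
errors = [-0.10, -0.05, 0.00, -0.05]
low, centre, high = end_point_limits(0.40, errors)
```

With these toy numbers the average error is -0.05, so the centre of the limits moves up from 0.40 to about 0.45, as the text describes for a method that runs lower than the true average.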

3.1 Loess and Lowess Smoothing

[13] This dataset is regular, with a data point for each year in the series. When data is not regular but has gaps, loess or lowess smoothing is often used. These are similar to Gaussian smoothing, but use a window that encompasses a certain number of data points, rather than a certain number of years.

[14] When the data is evenly spaced, both lowess and loess smoothing yield very similar results to Gaussian smoothing. However, the treatment of the final data points is different from the method used in Gaussian smoothing. With loess and lowess smoothing, rather than using less and less data as in Gaussian smoothing, the filter window stays the same width (in this case 41 years). However, the shape of the curve of the weights changes as the data nears the end.
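To make the fixed-count window concrete, here is a single-pass local-linear fit with tricube weights in the lowess style. It is a sketch under assumptions: the paper does not state the polynomial degree or the number of robustness iterations used for its loess and lowess runs, and `lowess_point` is a hypothetical helper. For real work a library implementation such as statsmodels' `lowess` would be the usual choice.

```python
import numpy as np


def lowess_point(times, values, t_idx, span=41):
    """Single-pass local-linear lowess-style estimate at times[t_idx].

    The window always holds the `span` nearest points, so near the end of the series
    it becomes one-sided while keeping the same number of points.
    No robustness iterations are performed.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    t0 = times[t_idx]
    dist = np.abs(times - t0)
    nearest = np.argsort(dist, kind="stable")[:span]
    h = dist[nearest].max()  # bandwidth: distance to the farthest point in the window
    tricube = (1 - (dist[nearest] / h) ** 3) ** 3
    # np.polyfit weights multiply the unsquared residuals, so pass the square root
    # of the tricube weights. Centring on t0 makes the intercept the estimate at t0.
    slope, intercept = np.polyfit(times[nearest] - t0, values[nearest], 1, w=np.sqrt(tricube))
    return intercept
```

At the last point the 41 nearest neighbours all lie in the past, so the window keeps its count but becomes one-sided and the weights are no longer symmetric about the target, which is the change in the shape of the weight curve described above.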

[15] The errors of the loess and lowess averaging can be calculated in the same way as before, by truncating the dataset at each year of the data and plotting the value of the final data point. Fig. 4 shows the errors of the two methods.

Figure 4. Lowess and loess smoothing along with their associated end condition errors.

[16] The end condition errors for lowess and loess are quite different, but the average size of the errors is quite similar. Lowess has a standard deviation of 0.062 from the lowess smoothed data, and loess has a standard deviation of 0.061 from the loess smoothed data. Fig 5 shows the Gaussian minimum slope (the least error of the three M2004 end conditions), and the lowess and loess smoothings, with their associated error bars.

Figure 5. Gaussian, lowess and loess smoothing along with their associated error bars. Both lowess and loess have larger errors than the Gaussian minimum slope error.

[17] Of the methods tested so far, the error results are as follows:

METHOD                        Standard Deviation of Error

Gaussian Minimum Roughness    0.111
Gaussian Minimum Norm         0.065
Lowess                        0.062
Loess                         0.061
Gaussian Minimum Slope        0.048

[18] Experimentally, therefore, we have determined that of these methods, for this data set, the Gaussian minimum slope method gives us the best estimate of the smoothed curve which we will find once we have enough additional years of data to determine the actual shape of the curve for the final years of data.

3.2 Improved and Alternate Methods

[19] At least one better method of dealing with the end conditions exists. I call it the “minimum assumptions” method, as it makes no assumptions about the future state of the data. It simply scales up the result of the Gaussian smoothing to make up for the weight of the missing data. Gaussian smoothing works by multiplying each data point within the filter width by a Gaussian weight. This weight is greatest for the central point of the filter. From there it decreases in a Gaussian “bell-shaped” curve for points further and further away from the central point. The weights are chosen so that the total of the weights summed across the width of the filter adds up to 1.

[20] Let us suppose that as the center of the filter approaches the end of the dataset, the final two weights do not have data associated with them because they are beyond the end of the dataset. The Gaussian average is calculated in the usual manner, by multiplying each data point with its associated weight and summing the weighted data. The final two points, of course, do not contribute to the total, as they have no data associated with them.

[21] However, we know the total of the weights for the other data points. Normally, all of the weights would add up to 1, but as we approach the end of the data there are missing data points within the filter width. The total of the weights for the existing data points might be only, say, 0.95 instead of 1. Knowing that we only have 95% of the correct weight, we can approximate the correct total by dividing the sum of the existing weighted data points by 0.95. The net effect of this is a shifted weighting which, as the final data point is approached, shifts the center of the weighting function further and further forwards toward the final data point.
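The "minimum assumptions" renormalization can be sketched as below. The sigma of width/6 is assumed as before and the function name is hypothetical; the key line divides the truncated weighted sum by the sum of the weights that still have data under them, which is the 0.95 example above.

```python
import numpy as np


def gaussian_weights(width=41, sigma=None):
    half = width // 2
    sigma = width / 6.0 if sigma is None else sigma  # assumed sigma
    offsets = np.arange(-half, half + 1)
    w = np.exp(-0.5 * (offsets / sigma) ** 2)
    return w / w.sum()


def smooth_min_assumptions(x, width=41):
    """Gaussian smooth that drops weights falling outside the data and renormalizes the rest."""
    x = np.asarray(x, dtype=float)
    w = gaussian_weights(width)
    half = width // 2
    n = len(x)
    out = np.empty(n)
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        wt = w[lo - t + half:hi - t + half]  # only the weights that still have data
        out[t] = np.dot(wt, x[lo:hi]) / wt.sum()  # e.g. divide by 0.95 if 5% is missing
    return out
```

Because the weights are renormalized, a constant series stays exactly constant right up to the end, whereas dropping the missing weights without dividing would drag the end of the smooth toward zero.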

[22] The standard deviation of the error of the minimum slope method, calculated earlier, was 0.048. The standard deviation of the error of the minimum assumptions method is 0.046. This makes it, for this data set, the most accurate of the methods tested. Fig. 6 shows the difference between these two methods at the end of the data set.

Figure 6. Gaussian minimum slope and minimum assumptions error bars. The minimum assumptions method provides the better estimate of the future smoothed curve.

[23] We can also improve upon an existing method. The obvious candidate for improvement is the minimum norm method. It has been calculated by padding the data with the average of the full dataset, from the start to the end of the data. However, we can choose an alternate interval on which to take our average. We can calculate (over most of the dataset) the error resulting from any given choice of interval. This allows us to choose the particular interval that will minimize the error. For the dataset in question, this turns out to be padding the end of the dataset with the average of the previous 5 years of data. Fig 7 shows the individual errors from this method, compared with the minimum assumptions method. Since the results from the two very different methods are quite similar, this increases confidence in the conclusion that these are the best of the alternatives.
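The interval search for the tuned minimum norm method can be sketched as follows: pad the end with the mean of the previous k points, run the truncation test, and keep the k whose errors have the smallest standard deviation. The sigma and the helper names are assumptions, and k should lie between 1 and half the window plus one so that every truncated series has at least k points.

```python
import numpy as np


def gaussian_weights(width=41, sigma=None):
    half = width // 2
    sigma = width / 6.0 if sigma is None else sigma  # assumed sigma
    offsets = np.arange(-half, half + 1)
    w = np.exp(-0.5 * (offsets / sigma) ** 2)
    return w / w.sum()


def tuned_norm_errors(x, k, width=41):
    """Truncation-test errors when the end is padded with the mean of the previous k points."""
    x = np.asarray(x, dtype=float)
    w = gaussian_weights(width)
    half = width // 2
    errors = []
    for t in range(half, len(x) - half):
        prefix = x[:t + 1]
        pad = np.full(half, prefix[-k:].mean())  # "tuned" minimum norm padding
        estimate = np.dot(w, np.concatenate([prefix[-half - 1:], pad]))
        true_smooth = np.dot(w, x[t - half:t + half + 1])
        errors.append(estimate - true_smooth)
    return np.array(errors)


def best_padding_interval(x, candidates, width=41):
    """Return the interval k with the smallest error SD, plus the SD for every candidate."""
    sds = {k: tuned_norm_errors(x, k, width).std() for k in candidates}
    return min(sds, key=sds.get), sds
```

For the CRU series the paper reports that this search settles on k = 5; the sketch only shows the mechanics and makes no claim about which k it would pick on other data.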

Figure 7. Smoothed data (red), minimum assumptions errors (green), tuned minimum norm (previous 5-year average) errors (blue)

[24] The standard deviation of the error from the minimum norm with a 5-year average is slightly smaller than from the minimum assumptions method, 0.045 versus 0.046.

4. Discussion

[25] I have presented a method for experimentally determining which of a number of methods yields the closest approximation to a given smoothing of a dataset at the ends of the dataset. The method can be used with most smoothing filters (Gaussian, loess, low-pass, Butterworth, or other filter). The method also experimentally determines the average error and the standard deviation of the error of the last point of the dataset. Although the Tuned Minimum Norm method yields the best results for this dataset, this does not mean that it will give the best results for other datasets. It also does not mean that the Tuned Minimum Norm method is the best smoothing method possible; there may be other smoothing methods out there, known or unknown, which will give a better result on a given dataset.

[26] The method for experimentally determining the smoothing method with the smallest end-point error is as follows:

1)  For each data point for which all of the data is available to determine the exact smoothed average, determine the smoothed result that would be obtained by each candidate method if that data point were the final point of the data. (While this can be done by truncating the data at each point, padding the data if required, and calculating the result, it is much quicker to use a modified smoothing function which simply treats each data point as if it were the last point of the dataset and applies the required padding.)

2)  For each of these data points, subtract the actual smoothed result of the given filter at that point from the smoothed result of treating that point as if it were the final point. This gives the error of the smoothing method for the series if it were truncated at that data point.

3)  Take the average and the standard deviation of all of the errors obtained by this analysis.

4)  Use the standard deviation of these errors to determine the best smoothing method.

5)  Use the average and the standard deviation of these errors to establish confidence limits at the final point of the smoothed data.
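Steps 1 to 5 can be wired together generically, with each candidate method supplied as a function that returns the end-point estimate for a truncated series. This is a sketch: `rank_end_methods` and the toy estimators are hypothetical, `true_smooth` is assumed to be the full-data smooth already computed at every index in `valid`, and the population standard deviation is used.

```python
import numpy as np


def rank_end_methods(x, true_smooth, valid, estimators, n_sd=3.0):
    """Steps 1-5: truncation errors per method, their mean and SD, the best method,
    and bias-corrected limits at the real final point."""
    x = np.asarray(x, dtype=float)
    report = {}
    for name, estimate in estimators.items():
        # Steps 1-2: treat each valid index t as the final point; error = estimate - true smooth.
        errs = np.array([estimate(x[:t + 1]) - true_smooth[t] for t in valid])
        # Step 3: average and standard deviation of the errors.
        mean, sd = errs.mean(), errs.std()
        # Step 5: confidence limits at the actual end, shifted by the average error.
        centre = estimate(x) - mean
        report[name] = {"mean": mean, "sd": sd, "limits": (centre - n_sd * sd, centre + n_sd * sd)}
    # Step 4: the best method is the one with the smallest error SD.
    best = min(report, key=lambda m: report[m]["sd"])
    return best, report


# Toy example: a quadratic series whose "true smooth" is taken to be the series itself.
x = np.arange(20, dtype=float) ** 2 / 10
true = x.copy()
estimators = {
    "mean_of_last_3": lambda p: float(p[-3:].mean()),
    "last_value": lambda p: float(p[-1]),
}
best, report = rank_end_methods(x, true, range(2, 18), estimators)  # best == "last_value"
```

The toy series only exercises the bookkeeping; with a real filter, `true_smooth` would come from smoothing the full dataset and each estimator would apply one of the padding rules to the truncated series.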

5. Conclusions

1)  The Minimum Roughness method will always yield the largest standard deviation of the endpoint error in relation to the smoothed data and is thus the worst method to choose.

2)  For any given data set, the best method can be chosen by selecting the method with the smallest standard deviation of error as measured on the dataset itself.

3)  The use of an error bar at the end of the smoothed average allows us to gauge the reliability of the smoothed average as it reaches the end of the data set.

References

Mann, M., 2004, On smoothing potentially non-stationary climate time series, Geophysical Research Letters, Vol. 31, 15 April 2004

Eliza
March 31, 2013 10:24 am

Re Willis v Mann. There is no need to do anything. The tide is turning, even The Economist is now doubting… once this starts, those guys are essentially finito.

TimC
March 31, 2013 11:55 am

Willis – I have essentially the same problem as BarryW. I accept (a) that one can postulate – and test – several alternative smoothing methods to determine their behaviours leading up to the end of the data series and (b) that one can chart each method with sensible (perhaps three sigma, as in your paper) end error bars. But (other than making the obvious check that the actual end-point lies within the error bars), while this perhaps might allow a view to be taken on what is likely to occur, I don’t see that this can be truly predictive of the future. For example, a paradigm shift (whether in absolute values, or perhaps in the first or second differential) might occur soon after the end of the series that it is just not possible to predict – other than saying (with the benefit of hindsight) that its occurrence was improbable in the light of the data previously known.
So: one must either truncate the series so as to end at the latest smoothed average point and be content with hindsight or, ultimately, acknowledge the possibility of a paradigm shift – possibly one of previously unknown or unanticipated origin which was not tested as it had never occurred previously. In plain terms – we humans can’t predict what the future might hold: the best we can do is make the assumption that everything will muddle along more or less in the same old way as it has always done before.
Am I missing something here?

Frank
March 31, 2013 3:26 pm

Willis: Your post left out some important details. Did the editor of GRL send your paper out for the usual anonymous peer review, or did he decide on his own that the material wasn’t suitable for publication? Perhaps I am too trusting, but even the Climategate emails don’t suggest that a busy editor – who is constantly refereeing disputes between competing scientists – would share your paper with Mann, but there is ample precedent that a peer reviewer could have. The easiest way would have been to forward an electronic copy of your paper.
It is difficult to prove that an idea has been plagiarized, but far easier to demonstrate that text or examples have been plagiarized. Have you used software to try to detect common passages between your rejected paper(s) and Mann’s published work?
You should consider sending the editor your draft papers and Mann’s published work and asking whether they used Mann to peer review your work. It would be difficult to ignore serious misconduct of this type.

Bart
March 31, 2013 4:22 pm

david moon says:
March 30, 2013 at 5:28 pm
“Fourier analysis does not “assume” sinusoidal components. “
I agree. See comment to Mark below.
“Gaussian averaging in the time domain- the frequency response as a low pass filter is not that great.”
It depends on how it is truncated, and what you are trying to accomplish. It can have an excellent rate of roll-off, but not a very good bandwidth to length relationship. So, it is good for suppressing a high frequency disturbance, but passing other stuff through. However, a bandstop filter designed for that purpose is generally better, if you have the tools to construct one.
“Infinite Impulse Response (IIR) filters can be designed for a desired frequency response with much less delay”
But, with nonlinear phase. Generally speaking, the delay of an IIR filter within the passband will be comparable on average to the delay of a similar bandwidth FIR filter.
The great advantage of linear phase symmetric FIR filters is that all signals experience the same delay, and thus we get the marvelous clarity of sound reproduction of modern digital systems without phase distortion.
Mark T says:
March 30, 2013 at 7:24 pm
“Indeed, the only thing an FFT will “detect” (it doesn’t really “detect” anything) is a sinusoid, and if there are none there, it is a relatively useless tool.”
Have to disagree there. The FFT is a fast method of computing the Discrete Fourier Transform (DFT), which is a sampled frequency version of the Discrete Time Fourier Transform (DTFT), which is a continuous function of frequency. Every L2 bounded signal has a unique DTFT which, as you say, is a measure of the correlation of the signal with a sinusoidal functional basis. The DFT can be made to approach the DTFT, i.e., the grid of sampled frequencies can be made more dense, by zero-padding.
“But again, even an optimal model (w.r.t. any given criteria) will fail as soon as the statistics of the data change, which is what happens with non-stationary data.”
The signals we are looking at give every indication of having increments which are effectively wide sense stationary. The global average temperature anomaly is composed, in and beyond the past century, mostly of an at-most lightly damped sinusoidal system, with energy concentrated near the 60 year cycle, plus a trend. The CO2 data is dominated by the integration of a function of temperature, which can be approximated to high fidelity as a constant coefficient affine function over the past 55 years.

george e smith
March 31, 2013 4:54 pm

“””””…..
david moon says:
March 30, 2013 at 5:28 pm
Re: various comments about Fourier/frequency domain analysis above:
Fourier analysis does not “assume” sinusoidal components. It will detect them if they are there. White noise will be a “flat” spectrum with no prominent components. As an EE I do this all the time when looking at noisy signals: set my oscilloscope to FFT and see what’s happening in the frequency domain…….”””””
Not sure whose assertion that was, but I am racking my brains to try and think of any other well known continuous mathematical function for which the word “frequency” has any meaning whatsoever. I’m not saying that none exists; just that I can’t think of any.
So far as I know, any continuous function for which f(t – p) = f(t) for any (t) and some fixed parameter (p), and which is not a sinusoid (or cosinusoid if you like), can itself be replaced by a set of sinusoids (cosinusoids) that are harmonically related in frequency.
Fourier analysis is just one of a vast number of representations whereby a continuous function is synthesized as a sum of other functions, so long as those functions form an orthonormal set.
Bessel functions, Legendre polynomials and Tchebychev polynomials are just a few examples of orthogonal functions that can be used to synthesize any continuous function. The word “frequency” has no meaning for any of those functions.
Fourier synthesis IS limited to sinusoidal representations. I believe the continuous function must be strictly periodic (and therefore of infinite duration) in order to get a harmonic series expansion, but finite (in time) or aperiodic functions require the integral form (Fourier transform).
And as DirkH and others point out, the Fourier transform itself has an end point truncation problem too.
Engineers tend to be blasé about the robustness of a solution. We do tend to think that if we can get a solution or answer, it must be the correct answer.
But the pure mathematicians spend a lot of time on existence theorems, and wonder whether a solution exists, instead of simply finding that solution.
Who the hell else but a pure mathematician would bother to prove rigorously that an “absolutely convergent” series converges? I mean, it has to, doesn’t it?
Unfortunately, I have to live in both worlds.

markx
March 31, 2013 6:23 pm

Hoser says: March 31, 2013 at 5:19 am
…… truncation is still BS. …….Let’s say you truncate data from the right side where those points have a fit slope of 1. Now you test various smoothing methods A, B, and C, and then restore the missing data. Then you check how well you did using methods A, B, and C. You find method A worked the best. Now replace the right side data with another set of points with a fit slope of -1 (or anything else you can imagine that makes sense). Does method A still do the best job of smoothing/fitting the new data set? Not necessarily. How does truncation test the intrinsic superiority of various smoothing methods? Seems too much like climate science to me.
Hoser, that may be the weirdest bit of logic I have ever read.
Basically this ‘best fit’ is trying to predict the future (in that we would like the end point of the best fit curve to be as accurate as possible), and as the baseball-playing philosopher Yogi Berra said: “It’s tough to make predictions, especially about the future”.
You can only work with the data you do have, and Willis has demonstrated probably the most logical way to come up with a best fit smoothing curve using the data available at the time.
And sure, if future data points then depart the trend, that best fit curve will be proven wrong (and will be moved accordingly as data points accumulate) …. but it was the best available at the time.
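The truncation test that Hoser and markx are arguing about can be sketched in a few lines. This is a hypothetical illustration, not Willis’s or Mann’s code: it uses a plain centered moving average rather than Mann’s Butterworth filter, and three padding rules (pad with the series mean, mirror about the end time, reflect about the end point) that are only loosely patterned on Mann’s minimum norm, minimum slope and minimum roughness constraints. For each truncation point it compares the end value computed from the truncated data with the centered average the full series eventually gives at that same point. The data are synthetic.

```python
import numpy as np

def smooth(x, half, rule):
    """Centered moving average of width 2*half+1. `rule` picks a hypothetical
    end-padding scheme, loosely after Mann's three constraints."""
    pad_args = {
        "mean": dict(mode="mean"),                          # pad with the series mean
        "mirror": dict(mode="reflect"),                     # reflect about the end time
        "point": dict(mode="reflect", reflect_type="odd"),  # reflect about the end point
    }[rule]
    padded = np.pad(x, half, **pad_args)
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    return np.convolve(padded, kernel, mode="valid")

def end_point_errors(x, half, rule):
    """Truncate the series at successive points and compare the padded end value
    with the centered mean that the full data give at that same index."""
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    truth = np.convolve(x, kernel, mode="valid")   # truth[i - half] is the centered mean at i
    errors = []
    for cut in range(2 * half + 1, len(x) - half + 1):
        estimate = smooth(x[:cut], half, rule)[-1]  # uses data up to index cut-1 only
        errors.append(estimate - truth[cut - 1 - half])
    return np.array(errors)

rng = np.random.default_rng(0)
years = np.arange(150)
series = 0.01 * years + 0.2 * rng.standard_normal(years.size)  # synthetic, not the NH record

for rule in ("mean", "mirror", "point"):
    e = end_point_errors(series, half=10, rule=rule)
    print(rule, "RMS end-point error:", np.sqrt(np.mean(e ** 2)))
```

Because the errors are measured on the series itself, the ranking can differ from one dataset to another, which is the substance of Hoser’s objection. The test only says which rule has worked best on the data in hand.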

david moon
March 31, 2013 7:00 pm

I went back to the original Mann paper in question. I had thought the issue was using an FIR filter, and how to extrapolate when the “data runs out”. From the paper:
“We first make use of a routine that we have written in the ‘Matlab’ programming language which implements constraints (1)–(3), as described above, making use of a 10 point ‘Butterworth’ low-pass filter for smoothing”
They further show graphs of “40 year smooth” and “20 year smooth”
The Matlab reference for the Butterworth function shows parameters “n” (order) and Wn (normalized cutoff frequency). So what is “10 point”: is that n? And was Wn changed to give the 20 or 40 year smooth?
The function is also IIR, using only past samples (z^-1, z^-2, etc.). So why the need to extend the series past the end?
And then in the conclusions, to paraphrase: if we extend the data with the same slope as the last few years, our smoothed data continues to rise, which might be non-stationary, i.e. a new trend or change in the statistics. Duh: if we assume our result, then we see it.
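David’s question turns on how the filter is applied, and the quoted passage does not say. The sketch below uses SciPy and synthetic data; the 4th-order Butterworth, the 40-year cutoff and forward-backward filtering are all assumptions, not details from Mann’s paper. A causal IIR filter’s value at a given year never changes when later data arrive, although it lags. A zero-phase (forward-backward) application removes the lag by also running the filter backward in time, so its values near the end depend on how the series is extended past the last point.

```python
import numpy as np
from scipy.signal import butter, sosfilt, sosfiltfilt

# Synthetic annual series (an assumption, not the NH record): trend plus noise.
rng = np.random.default_rng(42)
years = np.arange(150)
x = 0.005 * years + 0.1 * rng.standard_normal(years.size)

# Hypothetical settings: 4th-order Butterworth low-pass with a 40-year cutoff.
# Wn is normalized to Nyquist (0.5 cycles/yr), so 1/40 cycles/yr becomes 0.05.
sos = butter(4, (1 / 40) / 0.5, output="sos")

# Causal filtering: the output at year t uses only samples up to t.
causal_full = sosfilt(sos, x)
causal_trunc = sosfilt(sos, x[:100])

# Zero-phase forward-backward filtering: the output at t also uses later samples,
# so near the end it depends on how sosfiltfilt pads the series (odd extension by default).
zp_full = sosfiltfilt(sos, x)
zp_trunc = sosfiltfilt(sos, x[:100])

print("causal, year 99:", causal_trunc[-1], "vs", causal_full[99])
print("zero-phase, year 99:", zp_trunc[-1], "vs", zp_full[99])
```

The causal pair agrees exactly; the zero-phase pair does not, which is where an end constraint becomes necessary.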

Bart
March 31, 2013 7:07 pm

george e smith says:
March 31, 2013 at 4:54 pm
“Fourier synthesis IS limited to sinusoidal representations.”
A Fourier Series is. A Fourier Transform is much more general, and can represent any L2 bounded function.

Joe
March 31, 2013 7:32 pm

Maybe this will help you guys understand the process:
The second thought is that a consequence of Baconian, goal-oriented science is that the goal can eat up the science. When science was reoriented to the service of engineering and industry, it guaranteed that eventually there would be steady pressure on research in the direction of pre-determined goals. Call it The Revenge of the Final Causes. The [extra-scientific] goal becomes so important that the research must be cut and fit to support it, and any science (or scientist) that does not fit gets screened out by “peer review.”
Interestingly, peer review was a medieval method developed in theology to ensure orthodoxy in the writings of theologians. (It was because he did not accept the alterations suggested by the peer reviewers that William of Ockham never received his doctorate.) It is now considered “scientific” but its methods and purposes remain the same.
http://tofspot.blogspot.com/2013/03/science-in-drag.html#more

David
March 31, 2013 9:38 pm

How the heck is GRL not guilty of knowingly giving credit to Mann for work from Willis which they previously rejected, but were fully aware of? Are there official fines, etc., for such acts? Mann got paid to produce work which both he and GRL knew Willis had done. Willis may not be able to prove Mann knew, but GRL appears to be caught dead to rights.
From the post…
….”Back in 2004, Michael Mann wrote a mathematically naive piece about how to smooth the ends of time series. It was called “On smoothing potentially non-stationary climate time series“, and it was published in Geophysical Research Letters in April of 2004. When I read it, I couldn’t believe how bad it was. Here is his figure illustrating the problem…
….Now, here comes the story.
I wrote this, and I submitted it to Geophysical Research Letters at the end of 2005. After the usual long delays, they said I was being too hard on poor Michael Mann, so they wouldn’t even consider it … and perhaps they were right, although it seemed pretty vanilla to me. In any case, I could see which way the wind was blowing. I was pointing out the feet of clay, not allowed….
….So, I pulled out everything but the direct citations to Mann’s paper and resubmitted it basically in the form appended below…..
…..In 2008, after I’d foolishly sent my manuscript entitled “A closer look at smoothing potentially non-stationary time series” to people who turned out to be friends of Michael Mann, Dr. Mann published a brand new paper in GRL. And here’s the title of his study …
“Smoothing of climate time series revisited”
….And what was Michael Mann’s main insight in his new 2008 paper? What method did he propose?….
….In other words, his insight is that if you truncate the data, you can calculate the error for each method experimentally … curious how that happens to be exactly the insight I wasted my time trying to publish.
Again I ask, how the hell is GRL not guilty of knowingly giving credit to Mann for work from Willis which they rejected, but were fully aware of?

TimC
March 31, 2013 10:42 pm

Willis: you of course picked up my loose terminology (“….truly predictive of the future”) in my last post – I was, perhaps unsuccessfully, seeking to differentiate between actual (true, experimental) data collection on the one hand and formulation of theory on the other.
My working definition of statistics is “mathematics of the collection, organization, and interpretation of numerical data”. This of course refers to actual (raw, observed) data. This might customarily be averaged, as to climate for example. However, in that case the actual (true, averaged) data stops at the latest average point, at least half the averaging length behind present day. Anything past that point is (progressively degrading) guesswork not data – perhaps it is “informed” guesswork based on some theory or other, or on better or worse “statistics” (which can truly only be the assumption that everything will muddle on much in the same ways as experienced in the past). IMHO it should always be expressly caveated, and any theory based on it has to be regarded as suspect until the actual (averaged) data is available later.

george e. smith
April 1, 2013 1:35 am

“””””…..Bart says:
March 31, 2013 at 7:07 pm
george e smith says:
March 31, 2013 at 4:54 pm
“Fourier synthesis IS limited to sinusoidal representations.”
A Fourier Series is. A Fourier Transform is much more general, and can represent any L2 bounded function…….””””””
I said nothing at all about what functions can (not) be represented by the Fourier transform.
I did say that periodic functions (unbounded in the time domain) can be synthesized as a series sum of harmonically related sinusoids.
But a time bounded function, (starts and stops) or a non-periodic function, cannot be represented as a harmonically related sum of sinusoids.
So the Fourier transform represents “any L2 bounded function” as a spectrum of what, if not sinusoids? What non-sinusoidal function that is not itself a sum of sinusoids is unbounded in time, and is periodic, with a defined frequency?

Paul Vaughan
April 1, 2013 4:20 am

Minimizing SD isn’t the only important consideration. Note that the 2 methods preferred based on this narrow criterion are systematically biased (e.g. always too low on the long rise). What of minimizing bias? A consideration of both accuracy & precision is due. Precisely inaccurate estimates have been prioritized. Why? The analogy: Under systematically predictable conditions (the long rise), all the bullets are hitting very close to the exact same spot (low SD), but the spot is not the bull’s eye. It’s not only the size of the errors that matters; the distribution of the errors should show random scatter — i.e. no systematic patterns. Why not trade a bit of that precision for some more accuracy?
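Paul’s accuracy-versus-precision point can be put in numbers: the mean squared error of a set of end-point errors splits exactly into bias squared plus variance (using the population SD). The two error sets below are invented for illustration. Method A is tight but consistently low; method B is centred but scattered.

```python
import numpy as np

def error_summary(errors):
    """Return (bias, SD, RMS error) for a set of prediction errors."""
    e = np.asarray(errors, dtype=float)
    bias = e.mean()                  # accuracy: systematic offset
    sd = e.std()                     # precision: population SD (ddof=0)
    rms = np.sqrt(np.mean(e ** 2))   # overall size of the errors
    return bias, sd, rms

# Invented end-point errors for two hypothetical smoothing methods.
method_a = [-0.10, -0.12, -0.09, -0.11, -0.10]   # tight, but always low
method_b = [0.05, -0.08, 0.10, -0.04, -0.02]     # scattered, but centred

for name, errs in (("A", method_a), ("B", method_b)):
    bias, sd, rms = error_summary(errs)
    print(f"{name}: bias={bias:+.3f} SD={sd:.3f} RMS={rms:.3f}")
```

Here A has the smaller SD but B has the smaller RMS error, so ranking by SD alone rewards the biased method. In both cases RMS^2 = bias^2 + SD^2.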
An interesting, worthwhile topic.

Bart
April 1, 2013 10:21 am

george e. smith says:
April 1, 2013 at 1:35 am
“But a time bounded function, (starts and stops) or a non-periodic function, cannot be represented as a harmonically related sum of sinusoids.”
The Fourier Transform is not a sum of harmonically related sinusoids. It is an integral (in essence, an infinite sum) of sinusoids over a densely packed continuum of frequencies. The Fourier Transform represents a square integrable function relative to an infinite dimensional functional basis which spans L2. When you take the limit to infinity, the representation is no longer limited to periodic functions.
“What non-sinusoidal function that is not itself a sum of sinusoids, is unbounded in time, and is periodic, with a defined frequency ?”
For example, the Fourier Transform (under the usual EE convention) of the very non-periodic exp(-t) for t >= 0 is 1/(j*omega + 1), where omega is angular frequency and j is the square root of -1. This function has a magnitude 1/sqrt(omega^2 + 1) which represents how the components of the infinite sum are scaled, and a phase atan(omega) which represents how they are shifted in time relative to one another. This is an exceedingly elementary example.
Any L2 function, whether periodic or not, can be represented by its Fourier Transform, from which the original signal can be fully recovered, so both the time series and the frequency domain representation hold the same information, and are thereby considered equivalent.
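Bart’s elementary example can be checked numerically. This NumPy sketch truncates the integral at t = 50, where exp(-t) is negligible, and applies a trapezoid rule on a fine grid to approximate the transform of exp(-t) for t >= 0, then compares it with 1/(j*omega + 1). The magnitude agrees with 1/sqrt(omega^2 + 1), and the phase comes out as -atan(omega), consistent with the sign correction Bart posts further down the thread.

```python
import numpy as np

def fourier_transform_numeric(omega, t_max=50.0, n=200_001):
    """Trapezoid-rule approximation of the integral of exp(-t) exp(-j omega t) over [0, t_max]."""
    t = np.linspace(0.0, t_max, n)
    f = np.exp(-t) * np.exp(-1j * omega * t)
    dt = t[1] - t[0]
    return dt * (f.sum() - 0.5 * (f[0] + f[-1]))

omegas = np.array([0.0, 0.5, 1.0, 2.0, 5.0])
numeric = np.array([fourier_transform_numeric(w) for w in omegas])
analytic = 1.0 / (1j * omegas + 1.0)

print(np.abs(numeric))    # compare with 1/sqrt(omega^2 + 1)
print(np.angle(numeric))  # compare with -atan(omega)
```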
Now, we are limited in evaluating Fourier Transforms for functions which are necessarily limited in time. Over a finite time interval, any function can be represented as the sum of periodic signals. The function simply repeats itself beyond the time interval, so it has no innate predictive value.
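The point that a finite-interval representation simply repeats can be seen with a short NumPy sketch; the 16-sample ramp is a made-up example. Evaluating the inverse-DFT sum at times past the end of the record does not continue the trend; it wraps back to the start of the data.

```python
import numpy as np

n = 16
x = np.arange(n, dtype=float)   # a pure ramp: trend only, nothing periodic
X = np.fft.fft(x)

def dft_synthesis(X, t):
    """Evaluate the inverse-DFT sum at an arbitrary (possibly out-of-range) time t."""
    k = np.arange(len(X))
    return np.real(np.sum(X * np.exp(2j * np.pi * k * t / len(X))) / len(X))

inside = [dft_synthesis(X, t) for t in range(n)]          # reproduces the ramp
beyond = [dft_synthesis(X, t) for t in range(n, n + 4)]   # approximately 0, 1, 2, 3 again
```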
However, because we have lots of experience with Fourier Transforms and exceedingly common functional forms which occur in nature, we can generally extend the result in continuous fashion beyond the final time with high confidence in the extrapolation. All the more so if we have a theoretical basis for the true functional form the series should take, and can parameterize the theoretical model based on spectral analysis. But, the ubiquity of complex exponential and low order polynomial functions in nature generally allows us to do this even if we do not yet have a firm theoretical basis.
Noise is a hindrance to this endeavor, but we have found ways to get around it. The FFT itself is lousy at dealing with stochastic signals. That is why we estimate power spectral densities instead, and there are many methods for producing a PSD from noisy data for the purpose of identifying the underlying system model. The easiest and least constrained generally rely on the FFT, but it requires special processing by a qualified analyst.
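As an illustration of why a PSD estimate is preferred over a single raw FFT for noisy data, here is a sketch using Welch’s method from SciPy. Welch’s method is one common choice and not necessarily what Bart has in mind. The signal (a sinusoid at 0.05 cycles/sample buried in unit-variance white noise) and the 256-sample segment length are assumptions.

```python
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(1)
n = 4096
t = np.arange(n)
f0 = 0.05                                   # assumed signal frequency, cycles/sample
x = 0.5 * np.sin(2 * np.pi * f0 * t) + rng.standard_normal(n)

# Raw periodogram: one FFT of the whole record.
f_raw = np.fft.rfftfreq(n)
p_raw = np.abs(np.fft.rfft(x)) ** 2 / n

# Welch estimate: average periodograms of 50%-overlapping Hann-windowed segments.
f_w, p_w = welch(x, fs=1.0, nperseg=256)

def scatter(p):
    """Coefficient of variation: spread of the estimate relative to its mean."""
    return np.std(p) / np.mean(p)

cv_raw = scatter(p_raw[f_raw > 0.2])        # noise-only band, away from the peak
cv_w = scatter(p_w[f_w > 0.2])
peak_freq = f_w[np.argmax(p_w)]

print("peak at", peak_freq)
print("scatter raw:", cv_raw, "welch:", cv_w)
```

Away from the peak the raw periodogram scatters by roughly 100% of its mean, while segment averaging shrinks that scatter considerably, at the price of coarser frequency resolution; the sinusoid still stands out at the right frequency.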
In the field of identifying underlying system models from noisy data, other fields of specialty are well advanced beyond the apparently meager skills of the climate science establishment. The relationships in the climate data can, in fact, be discerned by the naked eye and are quite elementary. It is very apparent that their theoretical constructs are entirely wrong as regards the dynamics of this system. To me, they look like witch doctors or voodoo practitioners, vainly trying to force their hypotheses onto the data, and it is very clear that they will ultimately fail in that endeavor, the only question being how much damage they will do to science and the public weal before they realize it.

Bart
April 1, 2013 10:45 am

Incidentally, to any who are interested, I use “L2” to describe both square integrable functions in continuous time and square summable sequences in discrete time. Conventionally, the upper case is used to designate square integrable continuous time functions, and lower case for square summable discrete time series. However, “l2” looks like “eye-two” in the standard fonts, so I am using upper case for both.
As we are dealing with a continuous system for which the data are sampled, and the tools we have at our disposal are applied in the digital domain, we have to flip back and forth between the two paradigms, but I don’t want to muck up the conversation too much explaining every detail. Anyone who follows the discussion should be able to discern which normed space I am talking about relative to the context. Anyone who doesn’t, please disregard this message and carry on.

April 1, 2013 11:46 am

Joe:
In your post at March 31, 2013 at 7:32 pm you say

Interestingly, peer review was a medieval method developed in theology to ensure orthodoxy in the writings of theologians.

Well, sort of.
But you do remind us of issues which remain important and relevant to the present day.
The practices and principles of the modern scientific method were all adopted from the methods of classical Christian theology.
This is not surprising because theology was the main subject for study in every university course at the time of the Reformation. Modern science came about when it was decided that the unassailable authority is empirical evidence and not any other authority (e.g. the Church and/or any scripture). This decision was applied, and it was a revolution in thought which was exemplified in the motto adopted by the Royal Society: nullius in verba (on the authority of nobody).
From that sprang all the benefits of science and technological advance which today we take for granted.
Two issues derive from this.
Firstly, and relatively trivially, some people attempt to pretend there is a dichotomy between science and religion. This is not and never has been true: one of the oldest astronomical observatories is in the Vatican and is operated by the Roman Catholic Church, most great scientists have been religious practitioners, etc.
The methods of theological and scientific thought are the same but acknowledge different “unassailable evidence” because they have different purposes. Hence, arguments about science OR religion are pointless and disrupt serious discussion (including often on WUWT).
Secondly, and much more importantly, any claim to any authority other than empirical evidence is a denial of the most fundamental scientific principle. Hence, appeals to “consensus” or any other authority are a denial of the scientific method which attempt to return us to pre-Reformation thought.
Peer review can be – and often has been – abused, but it is a method to determine if a scientific paper opposes the only unassailable authority of science, viz. empirical evidence. If the paper is in disagreement with empirical evidence, then the only scientific decision is to reject it; otherwise its publication should be allowed.
But, of course, not every paper which should be allowed publication deserves to be published. The contents of a paper determine if a paper deserves publication, and this is not affected by who presents the paper (nullius in verba).
The experience of peer review reported by Willis Eschenbach in his article not only injures him: it also injures the most fundamental of all scientific principles. It is a disgrace.
Richard

george e. smith
April 1, 2013 4:14 pm

“””””….. Bart says:
April 1, 2013 at 10:21 am
george e. smith says:
April 1, 2013 at 1:35 am
“But a time bounded function, (starts and stops) or a non-periodic function, cannot be represented as a harmonically related sum of sinusoids.”
The Fourier Transform is not a sum of harmonically related sinusoids. It is an integral (in essence, an infinite sum) of sinusoids over a densely packed continuum of frequencies…….”””””
Bart,
Absolutely NOWHERE have I ever stated, or suggested, that the Fourier Transform IS a sum of harmonically related sinusoids. I quote: “But a time bounded function, (starts and stops) or a non-periodic function, cannot be represented as a harmonically related sum of sinusoids.”
There, in fact I have specifically said exactly the opposite.
Only a periodic continual (never starts, never stops) function CAN be represented as a Fourier Series of harmonically related sinusoids.
The original question being discussed was what exactly is being used by the Fourier transform to represent a finite time (starts and stops) NON-periodic time function. I and others said the frequency spectrum consists only of sinusoidal functions; and I added that they are NOT harmonically related, but form some sort of continuum spectrum of frequencies; and they ARE sinusoids, at any non-zero frequency in the transformed spectrum.
You and others have asserted they aren’t sinusoids, so I have simply asked; then WHAT are they; those time functions that are used to represent that arbitrary time function ??

Bart
April 2, 2013 12:28 am

george e. smith says:
April 1, 2013 at 4:14 pm
“You and others have asserted they aren’t sinusoids, so I have simply asked; then WHAT are they; those time functions that are used to represent that arbitrary time function ??”
It isn’t so straightforward because you are dealing with not mere superposition, but integration over an infinite and dense expanse of “frequency” for the representation. At any specific frequency, the Fourier spectrum of an L2 function has measure zero, so you cannot say that it is composed of any specific sinusoid at all. Strictly speaking, a persistent sinusoid isn’t even an L2 function, though a truncated sinusoid confined to a finite interval is.
So, it is difficult to provide a concrete answer to your question – it’s a little like trying to describe how a four dimensional object looks – our minds just aren’t built for it. But, the FT is still an (extremely) useful abstraction, and you can almost always gain a lot of insight into a given system by Fourier-based analysis, especially if you have a lot of experience with such analysis, and recognize general features which commonly manifest themselves in it when dealing with natural systems.

Bart
April 2, 2013 12:33 am

Bart says:
April 1, 2013 at 10:21 am
“…the Fourier Transform … of … exp(-t) for t >= 0 is 1/(j*omega + 1), … and a phase atan(omega) “
Missed the negative sign – the phase is atan(-omega).

george e. smith
April 2, 2013 12:17 pm

Well, isn’t the whole point that either a Fourier series or a Fourier transform is only a mathematical fiction, and if you try to pick out a specific frequency from the spectrum, the narrower you confine the frequency, the longer must be its duration, so that for a single frequency (sinusoid) the signal would have to exist for all time? It simply reinforces my original assertion that the most accurate representation of a set of experimental data values is that set of data values itself. Truncating a continuous signal must broaden its spectral width in the frequency domain.
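George’s closing point, that truncation broadens the spectrum, is the time-bandwidth trade-off, and it is easy to measure. This NumPy sketch (test frequency and record lengths are arbitrary choices) observes a complex sinusoid for T samples, zero-pads heavily to sample the spectrum finely, and measures the half-power width of the peak.

```python
import numpy as np

def half_power_width(duration, f0=0.1, nfft=1 << 16):
    """Half-power (-3 dB) width, in cycles/sample, of the spectral peak of a complex
    sinusoid observed for `duration` samples (rectangular truncation)."""
    t = np.arange(duration)
    x = np.exp(2j * np.pi * f0 * t)
    power = np.abs(np.fft.fft(x, nfft)) ** 2   # heavy zero-padding for a fine grid
    freqs = np.fft.fftfreq(nfft)
    k = int(np.argmax(power))
    half = power[k] / 2.0
    lo, hi = k, k
    while power[lo] > half:                   # walk down the left side of the peak
        lo -= 1
    while power[hi] > half:                   # walk down the right side of the peak
        hi += 1
    return freqs[hi] - freqs[lo]

for T in (32, 64, 128, 256):
    w = half_power_width(T)
    print(T, w, w * T)
```

The width falls as 1/T, with width times T staying near 0.886, the familiar half-power figure for a rectangular window.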

Bart
April 2, 2013 12:51 pm

george e. smith says:
April 2, 2013 at 12:17 pm
But, the data values themselves do not give any insight into the underlying process. We know the form of common processes, and the signatures they create in the PSD. That allows us to infer how the process will evolve in the future based on voluminous experience with other natural systems.
We may be arguing (are we arguing?) at cross purposes. I agree that arbitrary filtering of data is … arbitrary, and is likely to lead to erroneous results. However, there are powerful tools available for identifying systems and propagating their observed characteristics into the future, as I advocated here. Nobody that I know of is doing or has done this so, if you are disparaging methods which have been used, it is likely I agree with you.