Guest Post by Willis Eschenbach
People sometimes ask why I don’t publish in the so-called scientific journals. Here’s a little story about that. Back in 2004, Michael Mann wrote a mathematically naive piece about how to smooth the ends of time series. It was called “On smoothing potentially non-stationary climate time series”, and it was published in Geophysical Research Letters in April of 2004. When I read it, I couldn’t believe how bad it was. Here is his figure illustrating the problem:
Figure 1a. [ORIGINAL CAPTION] Figure 1. Annual mean NH series. (blue) shown along with (a) 40 year smooths of series based on alternative boundary constraints (1) – (3). Associated MSE scores favor use of the ‘minimum roughness’ constraint.
Note the different colored lines showing different estimates of what the final averaged value will be, based on different methods of calculating the ends of the averages. The problem is how to pick the best method.
I was pretty naive back then. I was living in Fiji for one thing, and hadn’t had much contact with scientific journals and their curious ways. So I innocently thought I should write a piece pointing out Mann’s errors, and suggesting a better method. I append the piece I wrote back nearly a decade ago. It was called “A closer look at smoothing potentially non-stationary time series.”
My main insight in my paper was that I could actually test the different averaging methods against the dataset by truncating the data at various points. By doing that you can calculate what you would have predicted using a certain method, and compare it to what the true average actually turned out to be.
And that means that you can calculate the error for any given method experimentally. You don’t have to guess at which one is best. You can measure which one is best. That was the insight that I thought made my work worth publishing.
Now, here comes the story.
I wrote this, and I submitted it to Geophysical Research Letters at the end of 2005. After the usual long delays, they said I was being too hard on poor Michael Mann, so they wouldn’t even consider it … and perhaps they were right, although it seemed pretty vanilla to me. In any case, I could see which way the wind was blowing. I was pointing out the feet of clay, not allowed.
I commented about my lack of success on the web. I described my findings over at Climate Audit, saying:
Posted Oct 24, 2006 at 2:09 PM
[Mann] recommends using the “minimum roughness” constraint … apparently without noticing that it pins the endpoints.
I wrote a reply to GRL pointing this out, and advocating another method than one of those three, but they declined to publish it. I’m resubmitting it.
So, I pulled out everything but the direct citations to Mann’s paper and resubmitted it basically in the form appended below. But in the event, I got no joy on my second pass at publishing it either. They said no thanks, not interested, so I gave up. I posted it on my server at the time (long dead), put a link up on Climate Audit, and let it go. I was just a guy living in Fiji and working a day job, what did I know?
Then a year later, in 2007, Steve McIntyre posted a piece called “Mannomatic Smoothing and Pinned End-points”. In that post, he also discussed the end point problem.
And now, with all of that as prologue, here’s the best part.
In 2008, after I’d foolishly sent my manuscript entitled “A closer look at smoothing potentially non-stationary time series” to people who turned out to be friends of Michael Mann, Dr. Mann published a brand new paper in GRL. And here’s the title of his study …
“Smoothing of climate time series revisited”
I cracked up when I saw the title. Yeah, he better revisit it, I thought at the time, because the result of the first visit was swiss cheese.
And what was Michael Mann’s main insight in his new 2008 paper? What method did he propose?
“In such cases, the true smoothed behavior of the time series at the termination date is known, because that date is far enough into the interior of the full series that its smooth at that point is largely insensitive to the constraint on the upper boundary. The relative skill of the different methods can then be measured by the misfit between the estimated and true smooths of the truncated series.”
In other words, his insight is that if you truncate the data, you can calculate the error for each method experimentally … curious how that happens to be exactly the insight I wasted my time trying to publish.
Ooooh, dear friends, I’d laughed at his title, but when I first read that analysis of “his” back in 2008, I must admit that I waxed nuclear and unleashed the awesome power that comes from splitting the infinitive. The house smelled for days from the sulfur fumes emitted by my unabashed expletives … not a pretty picture at all, I’m ashamed to say.
But before long, sanity prevailed, and I came to realize that I’d have been a fool to expect anything else. I had revealed a huge, gaping hole in Mann’s math to people who were obviously his friends … and while for me it was an interesting scientific exercise, for him it represented much, much more. He could not afford to leave the hole unplugged or have me plug it.
And since I had kindly told him how to plug the hole, he’d have been crazy to try something else. Why? Because my method worked … hard to argue with success.
The outcome also proved to me once again that I could accomplish most anything if I didn’t care who got the credit.
Because in this case, the sting in the tale is that at the end of the day, my insights on how to deal with the problem did get published in GRL. Not only that, they got published by the guy who would have most opposed their publication under my name. I gotta say, whoever is directing this crazy goat-roping contest we call life has the most outré, wildest sense of humor imaginable …
Anyhow, that’s why I’ve never pushed too hard to try to publish my work in what used to be scientific journals, but now are perhaps better described as the popular science magazines. Last time I tried, I got bit … so now, I mostly just skip getting gnawed on by the middleman and put my ideas up on the web directly.
And if someone wants to borrow or steal or plagiarise my scientific ideas and words and images, I say more power to them, take all you want. I cast my scientific ideas on the electronic winds in the hope that they will take root, and I can only wish that, just like Michael Mann did, people will adopt my ideas as their own. There’s much more chance they’ll survive that way.
Sure, I’d prefer to get credit—I’m as human as anyone, or at least I keep telling myself that. So an acknowledgement is always appreciated.
But if you just want to take some idea of mine and run, sell it under another brand name, I say go for it, take all you want, because I’ve learned my lesson. The very best way to keep people from stealing my ideas is to give them away … and that’s the end of my story.
As always, my best wishes for each of you … and at this moment my best wish is that you follow your dream, you know the one I mean, the dream you keep putting off again and again. I wish you follow that dream because the night is coming and no one knows what time it really is …
[UPDATE] In my above-mentioned comment on Steve McIntyre’s blog, I mentioned the analysis of Mannian smoothing by Willie Soon, David Legates, and Sallie Baliunas, entitled Estimation and representation of long-term (>40 year) trends of Northern-Hemisphere-gridded surface temperature: A note of caution.
Dr. Soon has been kind enough to send me a copy of that study, which I have posted up here. My thanks to him, it’s an interesting paper.
APPENDIX: Paper submitted to GRL, slightly formatted for the web.
A closer look at smoothing potentially non-stationary time series
Willis W. Eschenbach
 An experimental method is presented to determine the optimal choice among several alternative smoothing methods and boundary constraints based on their behavior at the end of the data series. This method is applied to the smoothing of the instrumental Northern Hemisphere (NH) annual mean, yielding the best choice of these methods and constraints.
 Michael Mann has given us an analysis of various ways of smoothing the data at the beginning and the end of a time series of data (Mann 2004, Geophysical Research Letters, hereinafter M2004).
These involve minimizing different boundary conditions at those boundaries, and are called the “minimum norm”, “minimum slope”, and “minimum roughness” methods. These methods minimize, in order, the zeroth, first, and second derivatives of the smoothed average. M2004 describes the methods as follows:
“To approximate the ‘minimum norm’ constraint, one pads the series with the long-term mean beyond the boundaries (up to at least one filter width) prior to smoothing.
To approximate the ‘minimum slope’ constraint, one pads the series with the values within one filter width of the boundary reflected about the time boundary. This leads the smooth towards zero slope as it approaches the boundary.
Finally, to approximate the ‘minimum roughness’ constraint, one pads the series with the values within one filter width of the boundary reflected about the time boundary, and reflected vertically (i.e., about the ‘y’ axis) relative to the final value. This tends to impose a point of inflection at the boundary, and leads the smooth towards the boundary with constant slope.” (M2004)
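For readers who want to see the three padding rules concretely, here is a minimal sketch of how they might be coded. The function names are mine, not M2004's, only the end of the series is padded, and I am assuming that "reflected about the time boundary" means mirroring around the final point without repeating it.

```python
from statistics import mean

def pad_min_norm(series, width):
    """Pad the end with the long-term mean of the series."""
    return list(series) + [mean(series)] * width

def pad_min_slope(series, width):
    """Pad the end with the last `width` values mirrored in time."""
    # series[-2], series[-3], ... is the mirror image around the final point
    # (assumption: the final point itself is not repeated).
    mirrored = [series[-1 - k] for k in range(1, width + 1)]
    return list(series) + mirrored

def pad_min_roughness(series, width):
    """Mirror in time, then flip vertically about the final value."""
    last = series[-1]
    flipped = [2 * last - series[-1 - k] for k in range(1, width + 1)]
    return list(series) + flipped

data = [1.0, 2.0, 4.0, 3.0, 5.0]
print(pad_min_norm(data, 2))       # [1.0, 2.0, 4.0, 3.0, 5.0, 3.0, 3.0]
print(pad_min_slope(data, 2))      # [1.0, 2.0, 4.0, 3.0, 5.0, 3.0, 4.0]
print(pad_min_roughness(data, 2))  # [1.0, 2.0, 4.0, 3.0, 5.0, 7.0, 6.0]
```

Each function needs the series to be longer than the padding width. The padded series is then smoothed in the usual way.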
 He then goes on to say that the best choice among these methods is the one that minimizes the mean square error (MSE) between the smoothed data and the data itself:
“That constraint providing the minimum MSE is arguably the optimal constraint among the three tested.” (M2004)
 However, there is a better and more reliable way to choose among these three constraints. This is to minimize the error of the final smoothed data point in relation, not to the data itself, but to the actual final smoothed average (which will only be obtainable in the future). The minimum MSE used in M2004 minimizes the squared error between the estimate and the data points. But this is not what we want. We are interested in the minimum mean squared error between the estimate and the final smoothed curve obtained from the chosen smoothing method. In other words, we want the minimum error between the smoothed average at the end of the data and the smoothed average that will actually be obtained in the future, when we have enough additional data to determine the smoothed average exactly.
 This choice can be determined experimentally, by realizing that the potential error increases as we approach the final data point. This is because as we approach the final data point, we have less and less data to work with, and so the potential for error grows. Accordingly, we can look to see what the error is with each method in the final piece of data. This will be the maximum expected error for each method. While we cannot determine this for any data nearer to the boundary than half the width of the smoothing filter, we can do so for all of the rest of the data. It is done by truncating the data at each data point along the way, calculating the estimated value of the final point in this truncated dataset using the minimum norm, slope, and roughness methods, and seeing how far they are from the actual value obtained from the full data set.
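The truncation test can be sketched in a few lines. This is an illustration under stated assumptions, not the code used for the paper: the helper names are hypothetical, the series is synthetic, the Gaussian width (sigma) is an arbitrary choice, and only the minimum slope estimate is shown.

```python
import math

def gaussian_weights(half, sigma):
    """Symmetric weights for offsets -half..half, normalised to sum to 1."""
    raw = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-half, half + 1)]
    total = sum(raw)
    return [r / total for r in raw]

def smooth_at(values, i, weights):
    """Weighted average centred on index i (needs a full window around i)."""
    half = len(weights) // 2
    return sum(w * values[i + k] for k, w in zip(range(-half, half + 1), weights))

def end_estimate_min_slope(truncated, weights):
    """Estimate the smooth at the last point by mirroring the series in time."""
    half = len(weights) // 2
    padded = list(truncated) + [truncated[-1 - k] for k in range(1, half + 1)]
    return smooth_at(padded, len(truncated) - 1, weights)

def truncation_errors(data, weights, end_estimate):
    """Error of the end estimate at every point where the true smooth is known."""
    half = len(weights) // 2
    errors = []
    for t in range(half, len(data) - half):
        truth = smooth_at(data, t, weights)           # uses data on both sides of t
        guess = end_estimate(data[: t + 1], weights)  # pretends t is the last point
        errors.append(guess - truth)
    return errors

# Synthetic series for illustration only (not the CRU data).
data = [0.01 * i + 0.2 * math.sin(i / 3.0) for i in range(60)]
weights = gaussian_weights(half=5, sigma=2.0)
errors = truncation_errors(data, weights, end_estimate_min_slope)
print(len(errors), max(abs(e) for e in errors))
```

Any other end treatment can be tested by passing a different `end_estimate` function with the same two arguments.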
 In doing this, a curious fact emerges — if we calculate the average using the “minimum roughness” method outlined above, the “minimum roughness” average at the final data point is just the final data point itself. This is true regardless of the averaging method used. If we reflect data around both the time axis and the y-axis at the final value, the data will be symmetrical around the final value in both the “x” and “y” directions. Thus the average will be just the final data point, no matter what smoothing method is used. This can be seen in Fig. 1a of M2004:
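The pinning is easy to check numerically. The sketch below (hypothetical names, made-up numbers) applies three different filters to a series padded by the minimum roughness rule. It assumes the filter weights are symmetric about the centre and sum to 1, which is the case the argument above relies on: each pair of points either side of the final value averages to that value.

```python
import math

def smooth_last_point_min_roughness(series, weights):
    """Smoothed value at the final point after minimum roughness padding."""
    half = len(weights) // 2
    last = series[-1]
    # Mirror in time and flip vertically about the final value.
    padded = list(series) + [2 * last - series[-1 - k] for k in range(1, half + 1)]
    centre = len(series) - 1
    return sum(w * padded[centre + k] for k, w in zip(range(-half, half + 1), weights))

series = [0.3, -0.1, 0.4, 0.2, 0.9, 0.5, 1.3]  # made-up values

raw = [math.exp(-0.5 * (k / 1.5) ** 2) for k in range(-3, 4)]
gaussian = [r / sum(raw) for r in raw]
boxcar = [1 / 7] * 7
triangle = [1 / 16, 2 / 16, 3 / 16, 4 / 16, 3 / 16, 2 / 16, 1 / 16]

for name, weights in [("gaussian", gaussian), ("boxcar", boxcar), ("triangle", triangle)]:
    result = smooth_last_point_min_roughness(series, weights)
    print(name, math.isclose(result, series[-1], abs_tol=1e-12))  # True each time
```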
ORIGINAL CAPTION: Figure 1. Annual mean NH series. (blue) shown along with (a) 40 year smooths of series based on alternative boundary constraints (1)–(3). Associated MSE scores favor use of the ‘minimum roughness’ constraint. (Mann 2004)
 Note that the minimum roughness method (red line) goes through the final data point. But this is clearly not what we want to do. Looking at Fig. 1, imagine a “smoothed average” which, for a data set truncated at any given year, must end up at the final data point. In many cases, this will yield wildly inaccurate results. If this method were applied to the data truncated at the high temperature peak just before 1880, for example, or the low temperature point just before that, the “average” would be heading out of the page. This is not at all what we are looking for, so the choice that minimizes the MSE between the data and the average (the “minimum roughness” choice) should not be used.
 Since the minimum roughness method leads to obvious errors, this leaves us a choice between the minimum norm and minimum slope methods. Fig. 2 shows the same data set with the point-by-point errors from the three methods (minimum norm, minimum slope, and minimum roughness) calculated for all possible points. (The error for the minimum roughness method, as mentioned, is identical to the data set itself.)
 To determine these errors, I truncated the data set at each year, starting with the year that is half the filter width after the start of the dataset. Then I calculated the value for the final year of the truncated data set using each of the different methods, and compared it to the actual average for that year obtained from the full data set. I am using a 41-year Gaussian average as my averaging method, but the underlying procedure and its results are applicable to any other smoothing method. I have used the same dataset as Mann, the Northern Hemisphere mean annual surface temperature time series of the Climatic Research Unit (CRU) of the University of East Anglia [Jones et al., 1999], available at http://www.cru.uea.ac.uk/ftpdata/tavenh2v.dat.
Figure 2. Errors in the final data point resulting from different methods of treating the end conditions. The “minimum roughness” method error for the dataset truncated at any given year is the same as the data point for that year.
 The size of the errors of the three methods relative to the smoothed line can be seen in the graph, and the minimum slope method is clearly superior for this data set. This is verified by taking the standard deviation of each method’s point-by-point distance from the actual average. Minimum roughness has the greatest deviation from the average, a standard deviation of 0.110 degrees. The minimum norm method has a standard deviation of 0.065 degrees from the actual average, while the minimum slope’s standard deviation is the smallest at 0.048.
 Knowing how far the last point in the average of the truncated data wanders from the actual average allows us to put an error bar on the final point of our average. Here are the three methods, each with their associated error bar (all error bars in this paper show 3 standard deviations, and are slightly offset horizontally from the final data point for clarity).
Figure 3. Potential errors at the end of the dataset resulting from different methods of treating the end conditions. Error bars represent 3 standard deviations. The minimum slope constraint yields the smallest error for this dataset.
 Note that these error bars are not centered vertically on the final data point of each of the series. This is because, in addition to knowing the standard deviation of the error of each end condition, we also know the average of each error. Looking at Fig. 2, for example, we can see that the minimum norm end condition on average runs lower than the true Gaussian average. Knowing this, we can improve our estimate of the error of the final point. In this dataset, the centre of the confidence limits for the minimum norm will be higher than the final point by the amount of the average error.
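A small sketch of that correction follows. The error values are invented for illustration, the function name is hypothetical, and I am assuming the sample standard deviation; errors are taken as estimate minus true smooth, so a method that runs low has a negative average error and its error bar is centred above the final point.

```python
from statistics import mean, stdev

def end_point_interval(end_value, errors, n_sigma=3):
    """Confidence limits for the final smoothed value, shifted by the mean error.

    `errors` are (end estimate - true smooth) values from the truncation test.
    """
    bias = mean(errors)
    spread = stdev(errors)
    centre = end_value - bias  # remove the known average error
    return centre - n_sigma * spread, centre, centre + n_sigma * spread

# Made-up errors for a method that runs low on average.
errors = [-0.05, -0.02, -0.04, -0.01, -0.03]
low, centre, high = end_point_interval(0.60, errors)
print(round(low, 3), round(centre, 3), round(high, 3))  # roughly 0.583 0.63 0.677
```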
3.1 Loess and Lowess Smoothing
 This dataset is regular, with a data point for each year in the series. When data is not regular but has gaps, loess or lowess smoothing is often used. These are similar to Gaussian smoothing, but use a window that encompasses a certain number of data points, rather than a certain number of years.
 When the data is evenly spaced, both lowess and loess smoothing yield very similar results to Gaussian smoothing. However, the treatment of the final data points is different from the method used in Gaussian smoothing. With loess and lowess smoothing, rather than using less and less data as in Gaussian smoothing, the filter window stays the same width (in this case 41 years). However, the shape of the curve of the weights changes as the data nears the end.
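To make the fixed-window behaviour concrete, here is a rough sketch of a lowess-style estimate at a single point. It is a simplification under my own assumptions: a tricube-weighted straight-line fit to the nearest `n_points` observations, with no robustness iterations and no local quadratic option, so it is not a substitute for a full loess or lowess routine. The function name and the test series are hypothetical.

```python
def local_linear_estimate(xs, ys, x0, n_points):
    """Tricube-weighted straight-line fit to the nearest points, evaluated at x0."""
    # The window always holds the n_points nearest observations, so near the
    # end of the series it reaches further back instead of shrinking, and the
    # weights become one-sided.
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:n_points]
    span = max(abs(xs[i] - x0) for i in nearest)
    wts = [(1 - (abs(xs[i] - x0) / span) ** 3) ** 3 for i in nearest]
    px = [xs[i] for i in nearest]
    py = [ys[i] for i in nearest]
    total = sum(wts)
    mx = sum(w * x for w, x in zip(wts, px)) / total
    my = sum(w * y for w, y in zip(wts, py)) / total
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(wts, px))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(wts, px, py))
    return my + (sxy / sxx) * (x0 - mx)

# A straight line is recovered both in the interior and at the final point.
years = list(range(1900, 1950))
temps = [0.02 * (y - 1900) - 0.3 for y in years]
print(local_linear_estimate(years, temps, 1925, 11))
print(local_linear_estimate(years, temps, 1949, 11))
```

The sketch needs at least three points in the window with distinct x values.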
 The errors of the loess and lowess averaging can be calculated in the same way as before, by truncating the dataset at each year of the data and plotting the value of the final data point. Fig. 4 shows the errors of the two methods.
 The end condition errors for lowess and loess are quite different, but the average size of the errors is quite similar. Lowess has a standard deviation of 0.062 from the lowess smoothed data, and loess has a standard deviation of 0.061 from the loess smoothed data. Fig. 5 shows the Gaussian minimum slope (the least error of the three M2004 end conditions), and the lowess and loess smoothings, with their associated error bars.
 Of the methods tested so far, the error results are as follows:
METHOD                        Standard Deviation of Error
Gaussian Minimum Roughness    0.111
Gaussian Minimum Norm         0.065
Lowess                        0.062
Loess                         0.061
Gaussian Minimum Slope        0.048
 Experimentally, therefore, we have determined that of these methods, for this data set, the Gaussian minimum slope method gives us the best estimate of the smoothed curve which we will find once we have enough additional years of data to determine the actual shape of the curve for the final years of data.
3.2 Improved and Alternate Methods
 At least one better method of dealing with the end conditions exists. I call it the “minimum assumptions” method, as it makes no assumptions about the future state of the data. It simply increases the result of the Gaussian smoothing by an amount equal to the weight of the missing data. Gaussian smoothing works by multiplying each data point within the filter width by a Gaussian weight. This weight is greatest for the central point of the filter, and decreases in a Gaussian “bell-shaped” curve for points further and further away from the central point. The weights are chosen so that the total of the weights summed across the width of the filter adds up to 1.
 Let us suppose that as the center of the filter approaches the end of the dataset, the final two weights do not have data associated with them because they are beyond the end of the dataset. The Gaussian average is calculated in the usual manner, by multiplying each data point with its associated weight and summing the weighted data. The final two points, of course, do not contribute to the total, as they have no data associated with them.
 However, we know the total of the weights for the other data points. Normally, all of the weights would add up to 1, but as we approach the end of the data there are missing data points within the filter width. The total of the weights of the existing data points might be only, say, 0.95 instead of 1. Knowing that we have only 95% of the correct weight, we can approximate the correct total by dividing the sum of the existing weighted data points by 0.95. The net effect of this is a shifted weighting which, as the final data point is approached, shifts the center of the weighting function further and further forwards toward the final data point.
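The renormalisation just described can be sketched as follows. The function name is hypothetical and a three-point filter stands in for the 41-year Gaussian so the arithmetic is easy to follow by hand.

```python
def smooth_min_assumptions(data, i, weights):
    """Weighted average at index i using only the points that exist.

    Weights that fall beyond either end of the data are dropped, and the
    result is divided by the total weight that remains.
    """
    half = len(weights) // 2
    weighted_sum = 0.0
    weight_present = 0.0
    for k, w in zip(range(-half, half + 1), weights):
        j = i + k
        if 0 <= j < len(data):
            weighted_sum += w * data[j]
            weight_present += w
    return weighted_sum / weight_present

data = [1.0, 2.0, 3.0, 4.0, 5.0]
weights = [0.25, 0.5, 0.25]  # toy filter; weights sum to 1

print(smooth_min_assumptions(data, 2, weights))  # 3.0 (full window, nothing missing)
print(smooth_min_assumptions(data, 4, weights))  # about 4.667: (0.25*4 + 0.5*5) / 0.75
```

In the interior the divisor is 1 and the result is the ordinary weighted average; only near the ends does the division change anything.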
 The standard deviation of the error of the minimum slope method, calculated earlier, was 0.048. The standard deviation of the error of the minimum assumptions method is 0.046. This makes it, for this data set, the most accurate of the methods tested. Fig. 6 shows the difference between these two methods at the end of the data set.
 We can also improve upon an existing method. The obvious candidate for improvement is the minimum norm method. It has been calculated by padding the data with the average of the full dataset, from the start to the end of the data. However, we can choose an alternate interval on which to take our average. We can calculate (over most of the dataset) the error resulting from any given choice of interval. This allows us to choose the particular interval that will minimize the error. For the dataset in question, this turns out to be padding the end of the dataset with the average of the previous 5 years of data. Fig. 7 shows the individual errors from this method, compared with the minimum assumptions method. Since the results from the two very different methods are quite similar, this increases confidence in the conclusion that these are the best of the alternatives.
 The standard deviation of the error from the minimum norm with a 5-year average is slightly smaller than from the minimum assumptions method, 0.045 versus 0.046.
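One way the padding interval might be tuned is sketched below. The names, the synthetic series, the Gaussian width and the candidate intervals are all assumptions for illustration; the 5-year result reported above comes from the CRU data, not from this code.

```python
import math
from statistics import mean, stdev

def gaussian_weights(half, sigma):
    """Symmetric weights for offsets -half..half, normalised to sum to 1."""
    raw = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-half, half + 1)]
    total = sum(raw)
    return [r / total for r in raw]

def smooth_at(values, i, weights):
    """Weighted average centred on index i (needs a full window around i)."""
    half = len(weights) // 2
    return sum(w * values[i + k] for k, w in zip(range(-half, half + 1), weights))

def end_estimate_recent_mean(truncated, weights, n_recent):
    """Pad the end with the mean of the last n_recent values, then smooth."""
    half = len(weights) // 2
    pad_value = mean(truncated[-n_recent:])
    padded = list(truncated) + [pad_value] * half
    return smooth_at(padded, len(truncated) - 1, weights)

def error_spread(data, weights, n_recent):
    """Standard deviation of the end-point error over all testable points."""
    half = len(weights) // 2
    errors = [
        end_estimate_recent_mean(data[: t + 1], weights, n_recent)
        - smooth_at(data, t, weights)
        for t in range(half, len(data) - half)
    ]
    return stdev(errors)

def best_padding_interval(data, weights, candidates):
    """Candidate interval whose end-point errors have the smallest spread."""
    return min(candidates, key=lambda n: error_spread(data, weights, n))

# Synthetic series for illustration only (not the CRU data).
data = [0.01 * i + 0.2 * math.sin(i / 3.0) for i in range(80)]
weights = gaussian_weights(half=5, sigma=2.0)
candidates = [1, 3, 5, 10, 20]
print(best_padding_interval(data, weights, candidates))
```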
 I have presented a method for experimentally determining which of a number of methods yields the closest approximation to a given smoothing of a dataset at the ends of the dataset. The method can be used with most smoothing filters (Gaussian, loess, low-pass, Butterworth, or other filter). The method also experimentally determines the average error and the standard deviation of the error of the last point of the dataset. Although the Tuned Minimum Norm method (the minimum norm method padded with the average of the previous 5 years rather than the average of the full dataset) yields the best results for this dataset, this does not mean that it will give the best results for other datasets. It also does not mean that the Tuned Minimum Norm method is the best smoothing method possible; there may be other smoothing methods out there, known or unknown, which will give a better result on a given dataset.
 The method for experimentally determining the smoothing method with the smallest end point error is as follows:
1) For each data point for which all of the data is available to determine the exact smoothed average, determine the smoothed result which would be obtained by each candidate method if that data point were the final point of the data. (While this can be done by truncating the data at each point, padding the data if required, and calculating the result, it is much quicker to use a modified smoothing function which simply treats each data point as if it were the last point of the dataset and applies the required padding.)
2) For each of these data points, subtract the actual smoothed result of the given filter at that point from the smoothed result of treating that point as if it were the final point. This gives the error of the smoothing method for the series if it were truncated at that data point.
3) Take the average and the standard deviation of all of the errors obtained by this analysis.
4) Use the standard deviation of these errors to determine the best smoothing method.
5) Use the average and the standard deviation of these errors to establish confidence limits at the final point of the smoothed data.
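The five steps above can be sketched end to end as follows. This is an illustration under stated assumptions rather than the code behind the figures: the names are hypothetical, the series is synthetic, the filter is a short Gaussian with an arbitrary width, the minimum norm padding uses the mean of the data available at each truncation point, and the plain truncation approach is used instead of the quicker modified smoothing function mentioned in step 1.

```python
import math
from statistics import mean, stdev

def gaussian_weights(half, sigma):
    """Symmetric weights for offsets -half..half, normalised to sum to 1."""
    raw = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-half, half + 1)]
    total = sum(raw)
    return [r / total for r in raw]

def smooth_at(values, i, weights):
    """Weighted average centred on index i (needs a full window around i)."""
    half = len(weights) // 2
    return sum(w * values[i + k] for k, w in zip(range(-half, half + 1), weights))

# Each padding rule returns the `half` values appended after the series end.
PADDINGS = {
    "minimum norm": lambda s, half: [mean(s)] * half,
    "minimum slope": lambda s, half: [s[-1 - k] for k in range(1, half + 1)],
    "minimum roughness": lambda s, half: [2 * s[-1] - s[-1 - k] for k in range(1, half + 1)],
}

def end_estimate(series, weights, pad):
    """Smoothed value at the last point of `series` under a padding rule."""
    half = len(weights) // 2
    return smooth_at(list(series) + pad(series, half), len(series) - 1, weights)

def evaluate_methods(data, weights, paddings=PADDINGS, n_sigma=3):
    half = len(weights) // 2
    interior = range(half, len(data) - half)  # step 1: points with a known true smooth
    report = {}
    for name, pad in paddings.items():
        errors = [
            end_estimate(data[: t + 1], weights, pad) - smooth_at(data, t, weights)
            for t in interior
        ]  # step 2: end estimate minus true smooth
        report[name] = (mean(errors), stdev(errors))  # step 3
    best = min(report, key=lambda name: report[name][1])  # step 4
    bias, spread = report[best]
    final = end_estimate(data, weights, paddings[best])
    limits = (final - bias - n_sigma * spread, final - bias + n_sigma * spread)  # step 5
    return report, best, limits

# Synthetic series for illustration only (not the CRU data).
data = [0.01 * i + 0.2 * math.sin(i / 3.0) for i in range(80)]
weights = gaussian_weights(half=5, sigma=2.0)
report, best, limits = evaluate_methods(data, weights)
for name, (avg, sd) in report.items():
    print(f"{name}: mean error {avg:+.3f}, standard deviation {sd:.3f}")
print("smallest spread:", best, "limits at final point:", limits)
```

Other end treatments can be compared by adding entries to the `paddings` dictionary.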
Conclusions
1) The Minimum Roughness method will always yield the largest standard deviation of the end point error in relation to the smoothed data, and is thus the worst method to choose.
2) For any given data set, the best method can be chosen by selecting the method with the smallest standard deviation of error as measured on the dataset itself.
3) The use of an error bar at the end of the smoothed average allows us to gauge the reliability of the smoothed average as it reaches the end of the data set.