Guest Post by Willis Eschenbach
Anthony Watts, Lucia Liljegren, and Michael Tobis have all done a good job blogging about Jeff Masters’ egregious math error. His error was to claim that a run of high US temperatures had only a 1 in 1.6 million chance of being a natural occurrence. Here’s his claim:
U.S. heat over the past 13 months: a one in 1.6 million event
Each of the 13 months from June 2011 through June 2012 ranked among the warmest third of their historical distribution for the first time in the 1895 – present record. According to NCDC, the odds of this occurring randomly during any particular month are 1 in 1,594,323. Thus, we should only see one more 13-month period so warm between now and 124,652 AD–assuming the climate is staying the same as it did during the past 118 years. These are ridiculously long odds, and it is highly unlikely that the extremity of the heat during the past 13 months could have occurred without a warming climate.
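For reference, the NCDC figure is exactly 3^13. It is the probability that 13 months in a row each land in the top third if every month independently has a 1/3 chance of doing so. Here is a minimal Python check under that independence assumption, which is the very assumption disputed below:

```python
from fractions import Fraction

# Assumption (Masters/NCDC premise): each month independently has a 1/3
# chance of landing in the top third of its historical distribution.
p_month = Fraction(1, 3)
p_all_13 = p_month ** 13  # all 13 months in a row

odds = 1 / p_all_13  # the reciprocal gives "1 in N" odds
print(f"1 in {int(odds):,}")  # 1 in 1,594,323
```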
All of the other commenters pointed out reasons why he was wrong … but they didn’t get to what is right.
Let me propose a different way of analyzing the situation … the old-fashioned way, by actually looking at the observations themselves. There are a couple of oddities to be found there. To analyze this, I calculated, for each year of the record, how many of the months from June to June inclusive were in the top third of the historical record. Figure 1 shows the histogram of that data, that is to say, it shows how many June-to-June periods had one month in the top third, two months in the top third, and so on.
Figure 1. Histogram of the number of June-to-June months with temperatures in the top third (tercile) of the historical record, for each of the past 116 years. Red line shows the expected number if they have a Poisson distribution with lambda = 5.206, and N (number of 13-month intervals) = 116. The value of lambda has been fit to give the best results.
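The June-to-June tally described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the spreadsheet behind the figures. It assumes the monthly series starts in January. It also assumes each month is ranked only against earlier values of the same calendar month, and that a month counts as "top third" if it beats at least two thirds of them. The function names are hypothetical.

```python
from typing import List, Optional, Sequence


def top_third_flags(temps: Sequence[float], min_history: int = 3) -> List[bool]:
    """Flag months that beat at least 2/3 of earlier values of the same calendar month.

    Assumes temps[0] is a January. Months with fewer than `min_history`
    earlier same-month values are left unflagged.
    """
    flags = []
    for i, value in enumerate(temps):
        earlier = temps[i % 12:i:12]  # same calendar month, earlier years only
        if len(earlier) < min_history:
            flags.append(False)
            continue
        below = sum(v < value for v in earlier)
        flags.append(below >= 2 * len(earlier) / 3)
    return flags


def window_counts(flags: Sequence[bool], width: int = 13,
                  start_month: Optional[int] = None) -> List[int]:
    """Count flagged months in each contiguous `width`-month window.

    start_month=None keeps every window; start_month=5 keeps only windows
    that start in June (0 = January), i.e. the June-to-June periods.
    """
    counts = []
    for start in range(len(flags) - width + 1):
        if start_month is not None and start % 12 != start_month:
            continue
        counts.append(sum(flags[start:start + width]))
    return counts
```

Feeding the result to collections.Counter, e.g. Counter(window_counts(top_third_flags(temps), start_month=5)) for a hypothetical monthly series temps, gives a Figure 1 style histogram. Dropping start_month gives the all-windows version.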
The first thing I noticed when I plotted the histogram is that it looked like a Poisson distribution. This is a very common distribution for data that represent discrete occurrences, as in this case. Poisson distributions describe things like how many people you’ll find in line at a bank at any given instant, for example. So I overlaid the data with a Poisson distribution, and I got a good match.
Now, looking at that histogram, the finding of one period in which all thirteen were in the warmest third doesn’t seem so unusual. In fact, with the number of years that we are investigating, the Poisson distribution gives an expected value of 0.2 occurrences. In this case, we find one occurrence where all thirteen were in the warmest third, so that’s not unusual at all.
Once I did that analysis, though, I thought “Wait a minute. Why June to June? Why not August to August, or April to April?” I realized I wasn’t looking at the full universe from which we were selecting the 13-month periods. I needed to look at all of the 13-month periods, from January-to-January to December-to-December.
So I took a second look, and this time I looked at all of the possible contiguous 13-month periods in the historical data. Figure 2 shows a histogram of all of the results, along with the corresponding Poisson distribution.
Figure 2. Histogram of the number of months with temperatures in the top third (tercile) of the historical record for all possible contiguous 13-month periods. Red line shows the expected number if they have a Poisson distribution with lambda = 5.213, and N (number of 13-month intervals) = 1374. Once again, the value of lambda has been fit to give the best results.
Note that the total number of periods is much larger (1374 instead of 116) because we are looking, not just at June-to-June, but at all possible 13-month periods. Note also that the fit to the theoretical Poisson distribution is better, with Figure 2 showing only about 2/3 of the RMS error of the first dataset.
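One way to compute such an RMS error is sketched below. It assumes the comparison is made on proportions rather than raw counts, since otherwise histograms with 116 and 1374 periods would not be comparable. The helper names are illustrative.

```python
import math
from typing import Dict


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam ** k / math.factorial(k)


def rms_error(hist: Dict[int, int], lam: float, max_k: int = 13) -> float:
    """RMS gap between observed proportions and Poisson probabilities, bins 0..max_k."""
    n = sum(hist.values())
    gaps = [hist.get(k, 0) / n - poisson_pmf(k, lam) for k in range(max_k + 1)]
    return math.sqrt(sum(g * g for g in gaps) / len(gaps))
```

Computing rms_error for each histogram with its own fitted lambda gives two numbers whose ratio can be compared directly.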
The most interesting thing to me is that in both cases, I used an iterative fit (Excel solver) to calculate the value for lambda. And despite there being 12 times as much data in the second analysis, the values of the two lambdas agreed to two decimal places. I see this as strong confirmation that indeed we are looking at a Poisson distribution.
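Excel Solver is not required for this step. The sketch below swaps in a different technique from the one used here. It runs a coarse grid search followed by a golden-section refinement, minimizing the squared difference between observed proportions and Poisson probabilities. It also returns the plain sample mean, because for a true Poisson distribution the maximum-likelihood estimate of lambda is the sample mean.

```python
import math
from typing import Dict, Optional, Tuple


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam ** k / math.factorial(k)


def sse(hist: Dict[int, int], lam: float, max_k: int) -> float:
    """Sum of squared gaps between observed proportions and Poisson probabilities."""
    n = sum(hist.values())
    return sum((hist.get(k, 0) / n - poisson_pmf(k, lam)) ** 2 for k in range(max_k + 1))


def fit_lambda(hist: Dict[int, int], max_k: Optional[int] = None,
               lo: float = 0.05, hi: float = 15.0, step: float = 0.05,
               tol: float = 1e-6) -> Tuple[float, float]:
    """Return (least-squares lambda, sample mean) for a histogram {k: count}."""
    if max_k is None:
        max_k = max(hist)
    # Coarse grid first, so the refinement starts near the global minimum.
    grid = [lo + i * step for i in range(int((hi - lo) / step) + 1)]
    best = min(grid, key=lambda lam: sse(hist, lam, max_k))
    # Golden-section search on the bracket around the best grid point.
    a, b = max(lo, best - step), best + step
    inv_phi = (math.sqrt(5) - 1) / 2
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sse(hist, c, max_k) < sse(hist, d, max_k):
            b = d
        else:
            a = c
    n = sum(hist.values())
    mean = sum(k * count for k, count in hist.items()) / n
    return (a + b) / 2, mean
```

Passing a window-count histogram as a dict {k: count} returns both estimates. If the two estimates agree closely, that is the kind of agreement described above.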
Finally, the sting in the end of the tale. With 1374 contiguous 13-month periods and a Poisson distribution, the number of periods with 13 winners that we would expect to find is 2.6 … so, far from Jeff Masters’ claim that finding 13 in the top third is a one in 1.6 million chance, my results show that finding only one case with all thirteen in the top third is actually below the number we would expect given the size and the nature of the dataset …
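The expected counts can be recomputed from the Poisson probability mass function as expected count = N × P(X = 13), using the lambda and N values from the two figure captions. This sketch uses P(X = 13) exactly, not the tail P(X ≥ 13), which would be somewhat larger.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam ** k / math.factorial(k)


# (fitted lambda, number of 13-month windows) from the figure captions
cases = {"June-to-June": (5.206, 116), "All 13-month windows": (5.213, 1374)}

expected_counts = {}
for name, (lam, n_windows) in cases.items():
    # Expected number of windows with all 13 months in the top third
    expected_counts[name] = n_windows * poisson_pmf(13, lam)
    print(f"{name}: {expected_counts[name]:.2f}")
```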
w.
Data source: NOAA US temperatures; thanks to Lucia for the link.
Willis Eschenbach says:
July 11, 2012 at 3:54 pm
Dr Burns says:
July 11, 2012 at 2:32 pm
Willis,
You claim “it is indeed a Poisson process”. Don Wheeler, one of the world’s leading statisticians points out “The numbers you obtain from a probability model are not really as precise as they look.” A Burr distribution can be made to look almost identical to a Poisson but give very different results in the tails.
Yes, I know, and there are other distributions that are similar as well … but in this particular case, the Poisson distribution gives very good results in the tails.
But it gives a mean of 5.2 instead of 4.33, the known mean for that process if it were Poisson, an error of 20%.
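The arithmetic behind these figures is shown below, assuming p = 1/3 per month as in Masters’ calculation:

```python
from fractions import Fraction

months = 13
p_top_third = Fraction(1, 3)            # Masters' per-month assumption
expected_mean = months * p_top_third    # 13/3, about 4.33

observed_mean = 5.2
relative_gap = (observed_mean - float(expected_mean)) / float(expected_mean)
print(f"{float(expected_mean):.2f}, gap {relative_gap:.0%}")  # 4.33, gap 20%
```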
1 in 1.6 million is extremely common. Every time they draw the UK lottery (6 numbers from 49), the winning combination has about a 1 in 14 million chance, yet it happens every week ;>)
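The lottery figure is the number of ways to choose 6 numbers from 49:

```python
import math

# Number of distinct 6-number tickets in a 6-from-49 draw
combinations = math.comb(49, 6)
print(f"1 in {combinations:,}")  # 1 in 13,983,816
```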
Nice one Willis.
KR says:
July 11, 2012 at 2:12 pm
“…Lucia’s (re-)estimate is less than a 1:100,000 chance … Masters made the mistake of not accounting for autocorrelation, and appears to be off by at least an order of magnitude as a result. Eschenbach is using the wrong model, and appears to be off by perhaps five orders of magnitude as a result.”
Eschenbach is 2.6 in 1374. Compared to 1 in 100,000, that is off by two orders of magnitude. However, Lucia said “less than”, so assuming she is right, you still can’t say for sure. I wouldn’t have bothered commenting because, as I have stated previously, this is a tempest in a teapot. But the snark was kind of annoying.
Willis Eschenbach says:
July 11, 2012 at 2:47 pm
“I checked on that by seeing if my results were autocorrelated. They were not, in fact they were slightly negatively correlated (-.15).”
That said, this was painful to read. An autocorrelation is generally a multi-valued function made up of expected values of lagged products.
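For readers who want the whole function rather than just lag 1, here is a sketch of the usual sample autocorrelation. It uses the biased estimator, normalized so that lag 0 equals 1. Applying it to the per-window counts is an assumption about what one would test.

```python
from typing import List, Sequence


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelation r_0..r_max_lag (biased estimator, r_0 == 1)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


# Usage: a strictly alternating series is strongly negatively correlated at lag 1
alternating = [1.0, -1.0] * 50
print(autocorrelation(alternating, 3))
```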
Phil said: “But gives a mean of 5.2 instead of the known mean for that process, if it were Poisson, of 4.33, an error of 20%.”
This depends. Was each month ranked into terciles based only on the months preceding it, or based on all of the data?
Willis
please don’t bite my head off; I’m only trying to help.
On the method of autocorrelation, can I suggest:
1) Create a data series in which the number of months that meet your criterion (top third) is recorded for each year.
2) Run an autocorrelation function (e.g. http://en.wikipedia.org/wiki/Correlogram).
3) Alternatively, if you don’t want to go through step 2 (it’s long-winded, or you have to write your own code) and you’re familiar with Excel, you could do an FFT of the data series outlined in step 1. This will give you a series of complex numbers. Then use the IMABS() function to get the magnitude of each output, and square it to get the power. If you then compute another FFT of these cells, you will get the correlogram, and again you’ll be looking for the characteristic autocorrelated signature. See the add-ins in Excel for the FFT.
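The recipe in step 3 follows the Wiener-Khinchin relation: the autocorrelation is the inverse transform of the power spectrum. Below is a stdlib-only Python sketch of it. It is a swapped-in implementation, not Excel’s FFT tool. The series is zero-padded to a power of two at least twice its length so that the circular correlation produced by the FFT does not wrap around.

```python
import cmath
from typing import List, Sequence


def fft(a: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even = fft(a[0::2])
    odd = fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def next_power_of_two(n: int) -> int:
    size = 1
    while size < n:
        size *= 2
    return size


def autocorrelation_fft(x: Sequence[float], max_lag: int) -> List[float]:
    """Correlogram via FFT -> squared modulus -> FFT (Wiener-Khinchin)."""
    n = len(x)
    mean = sum(x) / n
    # Zero-pad to a power of two >= 2n so the circular correlation cannot wrap.
    size = next_power_of_two(2 * n)
    padded = [v - mean for v in x] + [0.0] * (size - n)
    power = [abs(c) ** 2 for c in fft(padded)]  # squared modulus, i.e. IMABS squared
    # power is real and symmetric, so a forward FFT equals size * inverse FFT.
    autocov = [c.real / size for c in fft(power)]
    return [autocov[k] / autocov[0] for k in range(max_lag + 1)]


print(autocorrelation_fft([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0], 3))
```

Up to floating-point rounding, this returns the same values as summing lagged products directly. The FFT route only pays off for long series.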
You put so much work into your posts, and people seem to spend most of their time nit-picking, so I’m just trying to help. I do try to give support, but something always seems to get lost in translation.
Anyway, off to bed. Good night. I look forward to hearing from you if this helps.
Willis Eschenbach – Again, the wrong model. A Poisson distribution is for the number of events occurring in independent sampling intervals, and you haven’t defined a sampling interval – in fact, you have different sampling intervals for each bin. A binomial distribution for successive runs would be closer, but would still not account for autocorrelation.
Masters computed straight probabilities, without autocorrelation, for 13 successive months to be in the top 1/3 of temperatures – 1:1.6×10^6 is definitely too high. Tamino calculated for normalized distributions and got 1:5×10^5, which he notes is certainly a bit high due to inter-month correlations, but is probably at least close. Lucia ran a Monte Carlo simulation, and got values around 1:1×10^5, although if you give a generous helping of uncertainty to early temperature records she feels it might fall as low as 1:2×10^3 – and that’s almost certainly too low an estimate.
What you have done, essentially, is to state that the observations fall very close to a curve that is fit … to those very same observations, with a ratio near 1:1. That’s not a probability analysis, Willis, it’s a tautology. And it says exactly nothing.
Forgot to mention that you’ll obviously need a data series whose length is a power of two to run the FFT tool. If yours isn’t, you can simply pad it out with zeros. Or use a DFT instead. You’ll need to find one, or I can write something for you if you supply the data (not on work time, I hasten to add).
Bart – That 2.6 in 1374 means only slightly over a 1:1 probability during the period of observation.
Of course, since that’s a prediction of observations made from a curve fit to those observations, the fact that the observations fall close to that curve is totally unsurprising. What it is not, however, is an estimate of the probability of 13 months of successive top 1/3 range months in a row in a stationary process with stochastic variation.
I’m going to go with Lucia’s Monte Carlo estimates on this one – a 1:166,667 chance for this occurrence for evenly supported data, with a fairly hard lower bound of 1:2000 if you assume that all of the early data is rather horribly uncertain.
Willis says:
To make sure I understand correctly:
The second plot above is for the number of months within a 13-month period that fall in the warmest 1/3, correct? Not the number of *contiguous* months which each fall into the top 1/3rd?
If so, then no, it doesn’t show the occurrences of ten and eleven and twelve months in a row being in the warmest third, because those 10 warm months could be 3 warm months, 3 cool ones, then 7 more warm ones (and so on). Obviously, periods like that will be more common than periods of 10 strictly consecutive months.
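The distinction is easy to see in code. This sketch takes the 3 warm, 3 cool, 7 warm example and reports both the number of warm months in the window and the longest unbroken warm run:

```python
from typing import Sequence, Tuple


def count_and_longest_run(flags: Sequence[bool]) -> Tuple[int, int]:
    """Return (number of warm months, longest unbroken warm run) in a window."""
    longest = current = 0
    for warm in flags:
        current = current + 1 if warm else 0
        longest = max(longest, current)
    return sum(flags), longest


window = [True] * 3 + [False] * 3 + [True] * 7  # 3 warm, 3 cool, 7 warm
print(count_and_longest_run(window))  # (10, 7)
```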
Hmm. I think that if you’re measuring the number of months within the top 1/3rd of months so far, instead of within all months, that you’re going to get skewed numbers. Or, numbers with a different purpose than these, at least.
Here’s a simplified example. Let’s say we have a linear, positive trend with a small amount of noise. Then every few years, we’ll hit new records. Likewise, we’ll have a disproportionately high number of months within the top third. If the noise is small enough, then nearly *every* month will be in the top third, as the temperature trends higher and higher. And because of the positive trend, you’ll have an insanely high number of 5- or 10- or however-long periods of consecutive months within the top 1/3rd.
Obviously, you couldn’t look at the high frequency of, say, 9-month hot streaks in this scenario and say “this means that 10-month hot streaks would be uncommon if the temperature was flat”. Because those 9-month hot streaks came from an ever-rising trend (which distorts their probability), they tell you nothing about the probability of 10-month hot streaks within a flat trend.
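Here is a quick simulation of this scenario, as a sketch. It uses a synthetic series with a linear trend plus Gaussian noise, with arbitrary parameter choices. Each value is compared only with the values before it, and calendar months are not separated. With no trend, the share of top-third months stays near 1/3. With a strong trend relative to the noise, it climbs toward 1.

```python
import bisect
import random


def share_in_top_third(trend_per_month: float, noise_sd: float,
                       n_months: int = 1392, min_history: int = 36,
                       seed: int = 0) -> float:
    """Share of months that beat at least 2/3 of all earlier months.

    Synthetic series: trend_per_month * t + Gaussian noise (arbitrary choices).
    """
    rng = random.Random(seed)
    history = []  # earlier values, kept sorted
    hits = total = 0
    for t in range(n_months):
        value = trend_per_month * t + rng.gauss(0.0, noise_sd)
        if len(history) >= min_history:
            below = bisect.bisect_left(history, value)  # earlier values < value
            hits += below >= 2 * len(history) / 3
            total += 1
        bisect.insort(history, value)
    return hits / total


print(share_in_top_third(0.0, 1.0))   # flat: close to 1/3
print(share_in_top_third(0.01, 0.1))  # trending: close to 1
```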
Phil. says:
July 11, 2012 at 3:41 pm
What we know is that the mean of the data is 5.2. But all that means is that p is not 1/3 as you claim, it’s some larger number, to wit, 5.2/13. It doesn’t mean that we are not looking at a Poisson distribution. It just means that your estimate of p is incorrect.
Why? Because the earth is warming, obviously, so the chances of being in the warmest third are greater than if it were stationary. But again, that doesn’t mean the distribution is not Poisson. It just means that your estimate of “p” is wrong.
Finally, my finding that an iterative fit gives a value of lambda almost identical to the mean of the dataset itself is strong evidence that the dataset does in fact have a Poisson distribution.
w.
It’s rather common for people to underestimate the true probability of streaks. While it is true that for p = 1/3, the probability of any particular streak is (1/3)^13, that is not the cumulative probability for all possible streaks over the long run (refer to http://en.wikipedia.org/wiki/Gambler%27s_fallacy#Monte_Carlo_Casino). As Willis correctly points out, we cannot choose our start and end points arbitrarily.
The correct calculation algorithm for independent events is quite a bit more complicated and is described here: http://marknelson.us/2011/01/17/20-heads-in-a-row-what-are-the-odds/. We can use an on-line calculator here: http://www.pulcinientertainment.com/info/Streak-Calculator-enter.html.
If we define a “win” as an event in the top third historically (i.e. p = 1/3), then over 1392 consecutive trials (months), the probability of 13 consecutive wins would be 0.06%, or about 1 in 1730. Clearly, Jeff Masters vastly underestimates the streak probability. Remember, this approach assumes perfectly independent trials (akin to the probability of a winning streak when making only column bets on a roulette wheel). Considering that weather patterns are not independent events, we should be able to safely conclude that this calculation provides the lower bound for the true probability.
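The streak probability can also be computed directly with a small dynamic program over the current run length, assuming independent trials with a fixed p. This is a sketch of a standard approach, not the code behind the linked calculator:

```python
def prob_run_at_least(n_trials: int, run_length: int, p: float) -> float:
    """P(at least one run of `run_length` consecutive successes in n_trials),
    for independent trials with success probability p."""
    # state[j]: probability that no full run has occurred yet and the
    # current run of successes has length j (0 <= j < run_length)
    state = [0.0] * run_length
    state[0] = 1.0
    hit = 0.0
    for _ in range(n_trials):
        new_state = [0.0] * run_length
        for j, prob in enumerate(state):
            new_state[0] += prob * (1 - p)    # a failure resets the run
            if j + 1 == run_length:
                hit += prob * p               # run completed; absorbed
            else:
                new_state[j + 1] += prob * p  # run grows by one
        state = new_state
    return hit


p_streak = prob_run_at_least(1392, 13, 1 / 3)
print(f"P = {p_streak:.4%}, about 1 in {1 / p_streak:,.0f}")
```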
Using the Poisson distribution on these data tells us that there is a probability of about 0.07% of a random 13-month period having 14 months that are in the top 1/3. This is clearly impossible. Also, the calculated value of lambda (5.213) is wildly different from the mean (4.333). Either of these, taken alone, should have been sufficient to convince you that the Poisson distribution is not an appropriate model.
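The 14-out-of-13 figure follows from the Poisson probability mass function with the fitted lambda:

```python
import math

lam_fit = 5.213  # fitted lambda for all 13-month windows


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam ** k / math.factorial(k)


p_14 = poisson_pmf(14, lam_fit)  # 14 top-third months in a 13-month window
p_above_13 = 1 - sum(poisson_pmf(k, lam_fit) for k in range(14))  # any impossible count
print(f"P(X = 14) = {p_14:.2%}, P(X > 13) = {p_above_13:.2%}")
```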
There is an endless stream of alarmist climate stories in the MSM. What are the odds of that if the journalism is fair? You can fool all of the MSM journalists, all of the time.
Then I’m unsure that you’re calculating the same thing as Lucia and Masters. They were considering what the probability of this streak would be in a non-warming world. In such a world, p=1/3.
Again, you can’t fit data to a warming-world scenario, then use that for the probability in an untrended world.
Bart says:
July 11, 2012 at 4:34 pm
Yes, I know that. However, the lag(1) autocorrelation is what is generally quoted to indicate the overall degree of autocorrelation, so I was just following what is common practice in the field. For example, see Lucia’s comment, where she simply quotes the lag(1) autocorrelation rather than specifying the full autocorrelation vector.
w.
Steve R says:
July 11, 2012 at 4:38 pm
Based only on the historical record of the months preceding it, and not based on the future temperatures yet to come …
w.
KR says:
July 11, 2012 at 4:56 pm
Thanks, KR. I thought that the sampling interval is the month in question plus the 12 preceding months … how is that not a sampling interval? And how is an interval that is always 13 months long a “different sampling interval for each bin”?
No. What I have done is to state that the observations fall very, very close to the numbers we would expect if the data were Poisson distributed.
I then used that distribution to try to understand the odds of finding 13 months that fall into the warmest third.
Consider the numbers of occurrences of 10 and 11 and 12 months in the warmest third. Respectively, there are 31, 14, and 6 of these in 1,374 different 13-month intervals in the record. Not only that, but they (and all of the results) are very close to the numbers we would expect if the distribution is Poisson.
Now, the 10, 11, and 12 cases occurred 2.9%, 0.9%, and 0.1% of the time. The Poisson distribution says that they would be expected to occur 2.1%, 1.0%, and 0.4% of the time.
I don’t know what claims you want to make for your method, because you haven’t yet said what your method is. But whatever method it is, it needs to predict the 10, 11, and 12 cases better than my method. My method is bozo simple, I admit that. But it also is very good at predicting how many of a certain result you will find.
For example, knowing just the June results, I can accurately predict the prevalence in the entire dataset of occurrences of 12 in the warmest … despite the fact that there are no occurrences in the June dataset of 12 in the warmest. Can your method do that?
And no, that prediction for 12 in the warmest is not one in a million or one in a hundred thousand or anything like that. It’s about four in a thousand. So why on earth would you expect the estimate for 13 in the warmest to be on the order of 1:10^4 or 1:10^5?
It seems to me that you are trying to calculate the odds of something other than the actual dataset that we are examining. As a result, you are making assumptions which are not true for this dataset.
I, on the other hand, am saying “given what we know about this dataset, what are the odds this dataset would contain 13 months in the warmest third?” It turns out that, for this dataset, the odds are not that bad, and they certainly are not one in 1.6 million.
In fact, for this dataset, there is better than a 50/50 chance that by now we would have found one or more groups of 13 months in the top third.
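One way to turn the expected count into a chance of at least one such window is sketched below, under a simplifying assumption. It treats the number of 13-of-13 windows as Poisson with mean N × P(X = 13), ignoring the fact that overlapping windows are not independent. Then P(at least one) = 1 - exp(-mean).

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam ** k / math.factorial(k)


lam_fit, n_windows = 5.213, 1374
mean_hits = n_windows * poisson_pmf(13, lam_fit)  # expected 13-of-13 windows
# Simplification: the number of such windows is treated as Poisson(mean_hits),
# ignoring that overlapping windows are not independent.
p_at_least_one = 1 - math.exp(-mean_hits)
print(f"P(at least one 13-of-13 window) = {p_at_least_one:.2f}")
```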
w.
Willis Eschenbach – If you have used 13 month intervals for each bin, then I would have to say I misinterpreted your post in that respect. But that’s really pretty irrelevant to the core problem.
You have fit a Poisson distribution (which is prima facie invalid, as what you are looking at is the expectation of a normal distribution of temperatures and the co-occurrence of 13 autocorrelated months in a row in a particular range, rather than collections of independent Poisson events in evenly sampled bins) to observations, and then used that fit to describe the observations.
Amazingly, the observations fit the curve that is matched to the observations – a tautology, since any set of observations will closely match a curve fit directly to them. It doesn’t matter if you have fit a Poisson distribution, a binomial distribution, the shape of your favorite baseball cap, or, for that matter, a 1967 VW Beetle. This says exactly nothing about a stationary process with stochastic noise, which is what Masters compared the last 13 months to. You cannot fit observations to a descriptive curve and then make judgements about the observations without looking at the expectations of the process and how the observations behave with respect to those expectations. Which is something you have not done.
Have you analyzed expectations of a stochastic process? No. You have only compared the observations to the observations, and come up with a nearly 1:1 relationship. Not surprising.
I hate to say it, but your analysis has absolutely nothing to do with a process with stochastic, normally distributed variations, such as the temperature record.
Word of the day: Tautology
I am inordinately fond of this thread. It is a microcosm of the model versus reality discussion. I think the streak post by ZP at 5:43 is very on-point. It has been 32 years since my last statistics class, and I do not use statistics in my work. I finally settled on this analysis to form my own opinion on the “truth”.
The N in M problem for a binomial distribution was pretty standard. So if you take a probability for an event (1 in 1.5 millionish) and run x number of trials (1374), then the probability of the event occurring increases with each trial. Cranking the numbers through the binomial formula gives 1 in 1161 (and I probably have an error somewhere) of the event occurring once and only once. Note that this answer describes a different problem from all of the other answers discussed, but it illustrates Dr. Masters’ and NCDC’s original error: they only considered one trial. The error is of course compounded by projecting out a gazillion years.
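Here is the N-in-M calculation as a sketch. It treats the 1374 overlapping windows as independent trials, each with p = 1/1,594,323, which is the commenter’s simplification:

```python
p_event = 1 / 1_594_323  # Masters' per-window probability
n_trials = 1374          # number of 13-month windows, treated as independent

# Binomial: exactly one success in n_trials
p_exactly_one = n_trials * p_event * (1 - p_event) ** (n_trials - 1)
# At least one success, for comparison
p_at_least_one = 1 - (1 - p_event) ** n_trials

print(f"exactly one: 1 in {1 / p_exactly_one:,.0f}")
print(f"at least one: 1 in {1 / p_at_least_one:,.0f}")
```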
The streak calculator ZP points to is also a binomial view of the world and attacks the problem in a more sophisticated way than my sanity check, and it yields a 1 in 1964 chance. I have no idea why the streak result is different from my sanity check.
Lucia addresses different questions based on a couple of different models. The key point is that the calculations are models. In addition, they were based on a real-world temperature trend of 0 and a modeled autocorrelation factor.
Willis addresses a somewhat different question – given the real world properties of this data, what is the probability of the event. He does not try to take out any real world temperature trend nor define an auto-correlation factor. He looks at the curve.
I would be interested in the curves for 12 in 12, 11 in 11, 10 in 10 and 9 in 9 to see if the 13 in 13 curve properties hold for them. Not interested enough to do the work myself, of course…
p.s. I think Willis’s analysis compared to Lucia’s reasonable model confirms a real world temperature trend.
KR says:
July 11, 2012 at 7:38 pm
All are 13 months.
Perhaps you are under the mistaken impression that we are looking at a “normal distribution of temperatures”. Me, I have found that climate datasets are rarely normally distributed. In this case, the Jarque-Bera test resoundingly rejects your idea that the dataset is normal.
So does the Shapiro-Wilk test.
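For reference, here is a stdlib-only sketch of the Jarque-Bera statistic (scipy.stats.jarque_bera provides the same test). Under normality the statistic is asymptotically chi-square with 2 degrees of freedom, whose survival function is exp(-x/2). The p-value below is therefore an asymptotic approximation.

```python
import math
from typing import Sequence, Tuple


def jarque_bera(x: Sequence[float]) -> Tuple[float, float]:
    """Return (JB statistic, asymptotic p-value) for the null of normality."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # central moments (biased)
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    p_value = math.exp(-jb / 2)  # survival function of chi-square with 2 dof
    return jb, p_value
```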
I have examined the results to see if they have a Poisson distribution, with lambda equal to the mean of the data. The mean of the data is 5.17. That fits the data quite well. An iterative fit of the Poisson distribution to the data gives a lambda of 5.2.
In addition, the Kolmogorov-Smirnov test strongly rejects the results (not the data but the results) having a normal distribution.
It also rejects it being a binomial distribution.
But it fails to reject it being a Poisson distribution.
So I’m using a bog-standard Poisson distribution, with lambda equal to the mean of the results … and as you can see from the graph, a bog-standard Poisson distribution fits the data exactly.
Nonsense. Try an experiment. Take a normal Gaussian dataset, and use the mean of that dataset as “lambda” to define a Poisson distribution. Or since you’ll find that the mean won’t work, try to use an iterative fit to shoehorn a Gaussian distribution into a Poisson curve … come back and tell us how absurdly bad the fit is. So it is not the case that “any set of observations will closely match a curve fit directly to them”. You can’t fit a Poisson distribution to a normal dataset and get a “close match”, no matter how directly you fit it.
Perhaps you and Jeff are foolish enough to think that we are looking at a “stationary process with stochastic noise”. I’m not.
I’m also not foolish enough to think that it doesn’t matter what kind of distribution you are using.
I have not “fit observations to a descriptive curve”. I have gone through the normal process of trying to determine what kind of a distribution we’re looking at, something that you have given far too little thought to. I have determined that the distribution is best described as a Poisson distribution, although I’m happy to be shown wrong.
So … how about you quit claiming I’m wrong when I say the data has a Poisson distribution, and instead show that I’m wrong. What distribution do you think we’re looking at? Not the distribution of the data, of course, but the distribution of the results. Because that’s all I’m doing, answering that question. I’m not “fitting” anything. I’m trying to understand the distribution of the answers, so I can see how likely certain answers might be.
You are 100% right that my analysis has nothing to do with “stochastic, normally distributed variations” … but you are way wrong if you think that describes the temperature record. It is not normally distributed, it is not stochastic, and most important, it is not stationary.
Words of the day: Unpleasantly Patronizing.
You haven’t thought this all the way through, and yet you want to lecture me as though I were an idiot. We could be having a discussion about it, but instead, you babble about “tautologies” without realizing that determining the distribution of the answers is a hugely important step. You make inane claims about “stochastic, normally distributed variations” without making the most rudimentary checks to see if we actually are dealing with stochastic normally distributed variations (protip: we’re not) … and yet you want to lecture me? Medice, cura te ipsum!
w.
Something just seems wrong. Tens of thousands left their farms and lives behind to escape the Dust Bowl conditions of the 1930s. We’ve all seen the pictures of the total devastation. I find it extremely difficult to believe that the past 13 months are anywhere close to the conditions in those days. I mean, yes, it’s been hot, but even in my 50 years of experience, I would hesitate to say this is the worst I’ve seen. Have we actually blown away all the record highs set back in those days? Somehow I doubt it.
Willis,
Your choice of a Poisson distribution has been criticised, not least because it gives a finite probability for getting 14 months out of 13. And if it gets that tail value wrong, 13/13 is a worry too.
In fact, the Poisson is just the limiting form of the binomial for events of low probability. So the binomial for 13 would look quite like a Poisson anyway, and doesn’t have this issue. So you might as well use it.
In fact, that’s just what Masters did, with p=1/3. In effect, you’re regarding this p as a fittable parameter, rather than understood from first principles. And when fitted, it comes out to something different.
That discrepancy is an issue, but I think in any case if you do want to fit a distribution, the binomial is better.
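To make the comparison concrete, here is a sketch comparing the binomial with n = 13 against the fitted Poisson. Setting p = lambda/13 gives both the same mean; that choice of p is an illustrative assumption. The binomial puts zero probability above 13, and with p = 1/3 it reproduces Masters’ 1 in 1,594,323 for 13 of 13.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam ** k / math.factorial(k)


n_months, lam_fit = 13, 5.213
p_fit = lam_fit / n_months  # same mean as the fitted Poisson (illustrative choice)

print("Binomial P(13 of 13), p = 1/3:", binomial_pmf(13, n_months, 1 / 3))
print("Binomial P(13 of 13), p = lambda/13:", binomial_pmf(13, n_months, p_fit))
print("Poisson  P(X = 13):", poisson_pmf(13, lam_fit))
print("Poisson  P(X > 13):", 1 - sum(poisson_pmf(k, lam_fit) for k in range(14)))
```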
Then I’m unsure that you’re calculating the same thing as Lucia and Masters. They were considering what the probability of this streak would be in a non-warming world. In such a world, p=1/3.
Again, you can’t fit data to a warming-world scenario, then use that for the probability in an untrended world.
This was discussed some on today’s thread on John N-G’s blog. I objected, fairly strenuously, to the claim, as being a lousy use of statistics. John explained to me that the reason it was published was that 10-35% of all (Americans? Humans?) still don’t believe that there has been a warming trend over the last 150 years at all. Lucia and Masters, as you say, assumed no warming trend — more or less straight up independent trials and no autocorrelation, which will then damn skippy make the result very unlikely — results that hold for an imaginary planet with temperatures per month that are pulled out of a hat around some mean from a distribution with some width, which is even more unlikely.
So the “point” is to convince those holdouts that the Earth is in a warming trend at all.
To me this is bizarre in so very many ways. I pointed out that Willis was if anything too kind. To even begin to estimate the correct probability of the outcome, one has to do many things — account for a monotonic or near monotonic warming or cooling trend, both of which would make runs in the top 1/3 more likely depending on the noise (at one or the other end of the trended data). At the moment, following 150 years of global warming post the Dalton minimum, of course it isn’t even close to as unlikely as a flat temperature plus noise estimate will produce. Then, just as Willis averaged over all possible starting months, one similarly has to average over all possible US sized patches of the Earth’s surface (and all possible starting points). The US is roughly 1/50 of the Earth, so even if you do a mutually exclusive partitioning, you get fifty chances in a year right there, and if you use sliding windows looking for any patch where it is true you get far more.
Then, there are places on the Earth’s surface that beat the flat odds all the time. The patch of ocean where El Nino occurs, for example, is roughly the area of the US. Very roughly once a decade it warms up by 0.5-0.9C (compared to the usual monthly temperature the rest of the time) on the surface, and typically stays that way for 1-2 years. It therefore produces this “unusual” event approximately once a decade, very probably almost independent of any superimposed warming or cooling trends.
Curiously, John agreed with me on basically everything, including the fact that the observation is basically meaningless except as proof that we are in a warming trend, which anybody that can actually read a graph can see anyway (and the ones that are going to “deny” that graph aren’t going to be convinced by a little thing like bad, almost deliberately misleading statistics).
Have we really reached the point in climate science where the ends justify the means? Should we be trying to convince young earth creationists that evolution is true and the Universe is old by making egregious and irrelevant claims now, or should we rely on things like radiometric dating and measuring distances to distant stars and galaxies?
This is really a lot more like their arguments with evolutionary biologists. If we shake a box full of “stuff”, it is absurdly improbable that a fully formed organism will fall out, therefore God is necessary. The former is true, and yet horribly misleading and certainly neither proves the consequent nor disproves the mechanism of evolution in any way, but it certainly does emphasize the surprising difference between randomness and structure.
Is this not the exact same argument? In conditions that everybody knows do not hold or pertain to the issue of climate we make an egregious but true statement that is phrased in such a way as to make one think that something important has been proven, that the event in question was really unlikely at the level indicated given the actual data of a near monotonic increase in temperature across the entire thermal record! As if it mattered.
So in retrospect, I will withdraw my earlier conclusion that the result was erroneous. It is perfectly correct.
Which is worse. Being mistaken is forgivable. Deliberately using statistics to mislead people in a political discussion is, well, less forgivable.
So Willis, a question. Suppose you take the data and (say) chop it off at the end of the flattened stretch from the 40’s through the 70s. That’s thirty years or so when the temperature was fairly uniformly in the top third of the shorter data set. You might get lucky and find a stretch of 13 consecutive months in there that are all in the top third too — just not in the current top third. I wonder what one could say to that — the same miracle occurring twice in one single dataset (and quite possibly in a stretch where the temperature was steady or decreasing from the 40’s peak).
rgb
“So Willis, a question. Suppose you take the data and (say) chop it off at the end of the flattened stretch from the 40′s through the 70s. That’s thirty years or so when the temperature was fairly uniformly in the top third of the shorter data set. ”
Well, the limiting case is to truncate the data to 13 months for a probability of 1. Also 1 for all 13 years in the middle and lower thirds. So the nature of the distribution changes over time even if the trend is flat. My head started to hurt so I dropped back to simple N of M analysis.