By William M. Briggs, professional statistician

“J’accuse! A statistician may prove anything with his nefarious methods. He may even say a negative number is positive! You cannot trust anything he says.”
Sigh. Unfortunately, this oft-hurled charge is all too true. I and my fellow statisticians must bear its sad burden, knowing it is caused by our more zealous brethren (and sisthren). But, you know, it really isn’t their fault, for they are victims of loving not wisely but too well their own creations.
First, a fact. It is true that, based on the observed satellite data, average global temperatures since about 1998 have not continued the rough year-by-year increase that had been noticed in the decade or so before that date. The temperatures since about 1998 have increased in some years, but more often they have decreased. For example, last year was cooler than the year before last. These statements, barring unknown errors in the measurement of that data, are taken as true by everybody, even statisticians.
The AP gave this data—concealing its source—to “several independent statisticians” who said they “found no true temperature declines over time” (link).
How can this be? Why would a statistician say that the observed cooling is not “scientifically legitimate”; and why would another state that noticing the cooling “is a case of ‘people coming at the data with preconceived notions’”?
Are these statisticians, since they are concluding the opposite of what has been observed, insane? This is impossible: statisticians are highly lucid individuals, the profession’s male members exceedingly handsome and charming. Perhaps they are rabid environmentalists who care nothing for truth? No, because none of them knew the source of the data they were analyzing. What can account for this preposterous situation?
Love. The keen pleasures of their own handiwork. That is, the adoration of lovingly crafted models.
Let me teach you to be a classical statistician. Go to your favorite climate site and download a time series picture of the satellite-derived temperature (so that we have no complications from mixing of different data sources); any will do. Here’s one from our pal Anthony Watts.
Now fetch a ruler—a straight edge—preferably one with which you have an emotional attachment. Perhaps the one your daughter used in kindergarten. The only proviso is that you must love the ruler.
Place the ruler on the temperature plot and orient it along the data so that it most pleases your eye. Grab a pencil and draw a line along its edge. Then, if you can, erase all the original temperature points so that all you are left with is the line you drew.
If a reporter calls and asks if the temperature was warmer or colder last year, do not use the original data, which of course you cannot since you erased it, but use instead your line. According to that very objective line the temperature has obviously increased. Insist on the scientificity of that line—say that according to its sophisticated inner-methodology, the pronouncement must be that the temperature has gone up! Even though, in fact, it has gone down.
Don’t laugh yet, dear ones. That analogy is too close to the truth. The only twist is that statisticians don’t use a ruler to draw their lines—some use a hockey stick. Just kidding! (Now you can laugh.) Instead, they use the mathematical equivalent of rulers and other flexible lines.
Your ruler is a model. Statisticians are taught—their entire training stresses—that data isn’t data until it is modeled. Those temperatures don’t attain significance until a model can be laid over the top of them. Further, it is our credo to, in the end, ignore the data and talk solely of the model and its properties. We love models!
All this would be OK, except for one fact that is always forgotten. For any set of data, there are always an infinite number of possible models. Which is the correct one? Which indeed!
Many of these models will say the temperature has gone down, just as others will say that it has gone up. The AP statisticians used the models most familiar to them: “moving averages of about 10 years” (the moving average is the most-used method of replacing actual data with a model in time series), or “trend” models, which are direct cousins to rulers.
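A minimal sketch of the ruler trick in Python (the numbers are invented for illustration, not actual satellite data): fit an ordinary least-squares “ruler” to a series that rose for two decades and then drifted down for one, and the ruler still announces warming.

```python
# Invented annual anomalies: 20 years of steady rise, then 10 years of
# mild decline -- not real data, just the shape under discussion.
anoms = [0.03 * t for t in range(20)] + [0.57 - 0.01 * t for t in range(1, 11)]

def ols_slope(ys):
    """Slope of the least-squares line through (0, y0), (1, y1), ..."""
    n = len(ys)
    mx = (n - 1) / 2
    my = sum(ys) / n
    return sum((i - mx) * (y - my) for i, y in enumerate(ys)) / sum(
        (i - mx) ** 2 for i in range(n))

print(f"last year vs the year before: {anoms[-1]:.2f} vs {anoms[-2]:.2f}")
print(f"the ruler's verdict, slope over all 30 years: {ols_slope(anoms):+.4f}")
```

Every one of the last ten values is lower than the one before it, yet the fitted slope over the whole record is positive. Erase the data, keep the line, and you must report warming.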
Since we are free to choose from an infinite bag, all of our models are suspect and should not be trusted until they have proven their worth by skillfully predicting data that has not yet been seen. None of the models in the AP study have done so. Even stronger, since they said temperatures were higher when they were in fact lower, they must predict higher temperatures in the coming years, a forecast which few are making.
We are too comfortable with this old way of doing things. We really can prove anything we want with careful choice of models.
As an engineer (originally) by trade it was always quite simple:
1) 2 data points = straight line
2) 3 data points = a curve
Then I went and did a bunch of statistics (and econometrics) in a Masters and discovered my creative talents. Now I can prove anything you want on demand.
I miss the good ole days 😉
It’s refreshing to see a statistician being honest about his field of expertise.
Now if only the statisticians who work for the TV ratings outfit would be so honest. There’s no way their relatively tiny sample size can be scaled up to accurately calculate how many viewers each TV show had out of the 300+ million people in the most diverse country on Earth.
Well said William.
Hope you are well.
Best regards, Allan
You’ve convinced me to dig out my old Master’s coursework on long-term forecasting. I think I mentioned before in another blog here that I used about 50 or so years of US aluminium (two i’s darn it!) consumption data and applied all kinds of lovely forecast modeling techniques to it.
Things to note:
1) The data was known with a high degree of accuracy (far better than thermometer readings I bet);
2) All models fit past data extremely well (R^2 in the region of 95-98%);
3) Some of the models were purely time trend in nature, others were econometric models (intensity of use models and the like).
The forecast range was phenomenal. Everything from a doubling in US aluminium consumption over the forecast period to dropping down to almost zero.
Modelling is fun!
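The commenter’s experience is easy to reproduce in miniature (the numbers below are invented, not the actual aluminium series): fit a straight time trend and an exponential trend to the same ten observations. Both fit beautifully; their long-range forecasts have almost nothing to do with each other.

```python
import math

# Hypothetical "consumption" series, made up for illustration:
# ten observations growing about 5% a year.
xs = list(range(10))
ys = [10 * 1.05 ** x for x in xs]

def ols(xv, yv):
    """Least-squares fit y = a + b*x; returns (a, b, R^2)."""
    n = len(xv)
    mx, my = sum(xv) / n, sum(yv) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xv, yv)) / sum(
        (x - mx) ** 2 for x in xv)
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xv, yv))
    ss_tot = sum((y - my) ** 2 for y in yv)
    return a, b, 1 - ss_res / ss_tot

# Model 1: straight time trend.  Model 2: exponential (linear in log y).
a1, b1, r2_lin = ols(xs, ys)
a2, b2, r2_exp = ols(xs, [math.log(y) for y in ys])

lin_forecast = a1 + b1 * 50
exp_forecast = math.exp(a2 + b2 * 50)
print(f"R^2: linear {r2_lin:.3f}, exponential {r2_exp:.3f}")
print(f"forecast at year 50: linear {lin_forecast:.1f}, exponential {exp_forecast:.1f}")
```

Both R² values are in the high nineties, so on goodness-of-fit alone there is nothing to choose between them; at year 50 the exponential model forecasts roughly triple what the linear one does.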
From the instant post:
“Love. The keen pleasures of their own handiwork. That is, the adoration of lovingly crafted models.”
So true.
And which one of these model lovers is going to come out and say their model is wrong?
What is also important to remember is that mathematical equations are in effect “models”.
String theory is a mathematical model…of what?
The answer is nothing that has been observed & measured.
So, what happens when mathematical models are applied to properties and relationships of physical elements and energy in our environment?
Unless the mathematical models are rigorously defined by actual observation & measurement and the observation & measurement is rigorously quantified and the mathematical models (also called equations) are rigorously and consistently related to the quantified observations & measurements, it is likely, no, it is more than likely, it is assured the mathematical models will be wrong.
Meaningful mathematical models can only be made after observation & measurement have been accomplished.
The mathematical model is analogous to the forward pass in football: three results can happen — an incomplete pass, an interception, or a completed pass.
So, three things can happen and two of them are bad, but when the third thing happens it is very advantageous:
When mathematical models are correct representations (based on repeated testing, i.e., further observation & measurement) of properties and relationships of physical elements and energy, they help us predict future behavior and allow engineering to manipulate (or harness) these properties and relationships of physical elements and energy to create technology for the betterment and enlightenment of Man.
But remember, just like the forward pass in football, two out of three results are bad.
And what’s worse, sometimes an interception (the worst result) can erroneously be called a completion.
Mathematics is a very powerful tool, but like all powerful tools, if misused either intentionally or by mistake, it can lead to very misleading or dangerous outcomes.
When statisticians apply mathematical equations to non-linear, unstable, and complex physical relationships and processes, beware of the pick-six going the other way for a touchdown.
Since we are free to choose from an infinite bag, all of our models are suspect and should not be trusted until they have proven their worth by skillfully predicting data that has not yet been seen.
That will be difficult for a model of a random or chaotic system.
James F. Evans (22:57:31) “Beware of the pick-six going the other way for a touchdown.”
Indeed, it is the linear-correlation-destroying phase-reversals that are most interesting, reminding us that “completions” do not have the same definition in all games.
Useful analogy – thank you.
This article basically supposes data collection is a waste of time. No conclusions can be pulled from them anyway.
RR Kampen (04:18:18) :
“This article basically supposes data collection is a waste of time. No conclusions can be pulled from them anyway.”
No – I think it is saying that any desired conclusion can be pulled from data.
Re : Jimmy Haigh (05:24:40) :
No – I think it is saying that any desired conclusion can be pulled from data.
That is why data collection is supposed to be a waste of time, of course.
Your remark is fortunately not true as it stands. It is only true that any desired conclusion can be pulled from data given abuse of statistics and statistical methods, and crooked or forged interpretation of these.
In fact this type of remark (or the infamous ‘there are lies, damned lies, statistics’) is characteristically made by people who simply do not understand statistics.
Speaking of models vs. data – has anyone noticed that the October update (of September hemispheric and global temperatures) over at CRU is, as has often been the case of late, very late? The general impression I’ve been getting is that the later the update is reported the more “exotic” we can expect the number to be (by which I mean, the earth will be running a particularly high fever). Also, has anyone else noticed that no matter how late the update is posted (possibly into November this month) that the “Update” is reported as sometime around mid-month?
Just curious…
George E. Smith – the only difference between an integral and an average is a multiplier – but what do I know, I’m only a chemist and a physicist.
It’s a dumb question, obviously, but can somebody tell me what ‘the AP’ is?
Funny how a group of uninvolved statisticians all arrived at the same conclusions, yet are vilified here. I can’t for the life of me understand how people can’t look at the data objectively, and continue to entrench themselves further with confirmation bias (in this case, ignoring substantive findings that conflict with their belief system).
Amazing what the human psyche will go through to maintain its version of reality…
Author: Jimmy Haigh
Comment:
RR Kampen (04:18:18) :
“This article basically supposes data collection is a waste of time. No conclusions can be pulled from them anyway.”
No – I think it is saying that any desired conclusion can be pulled from data.
That’s not true; but conclusions can often be stated floating free of the assumptions. The idea that the increase in temperature has ceased since 2000 cannot be falsified by performing a linear regression on all the data since 1987. Basically, you would be assuming that there has been no change in the slope; i.e., you are assuming that the Null Hypothesis is true and are simply trying to estimate what that slope is.
Hidden behind this assumption is the assumption that the slope is necessarily linear and constant.
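A toy example (invented numbers) of that hidden constant-slope assumption: take a series that rises for thirteen periods and then goes flat. One regression over everything reports a healthy positive slope; a regression over just the flat stretch reports none at all. The all-data fit cannot falsify the claim that the rise has ceased, because it never asks the question.

```python
# Hypothetical series, for illustration only: temperatures rise steadily
# for 13 "years", then stop rising for the rest of the record.
series = [0.05 * t for t in range(13)] + [0.05 * 12] * 7  # 20 points total

def trend(ys):
    """Least-squares slope of ys against its own index 0, 1, 2, ..."""
    n = len(ys)
    mx = (n - 1) / 2
    my = sum(ys) / n
    return sum((i - mx) * (y - my) for i, y in enumerate(ys)) / sum(
        (i - mx) ** 2 for i in range(n))

print(f"slope over all 20 years: {trend(series):+.4f}")
print(f"slope over last 8 years: {trend(series[-8:]):+.4f}")
```

Which answer you get depends entirely on which window you regress over, i.e., on which model you assumed before touching the data.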
This is not a property of statistics, but of the carelessness or tendentiousness of those using it.
Keep in mind, too, that statistics cannot prove anything; it can only disprove. You can show that the facts are incompatible with the hypothesis, and thus the hypothesis is not true (“true to the facts”); but you can never prove that the hypothesis is true.
We have here a failure to distinguish between interpolation and extrapolation. See Mark Twain, Life on the Mississippi, for details. Twain proves conclusively, by using the measured shortening of his river over the last 200 years, that at one time the Mississippi stretched out into the Gulf of Mexico like a fishing pole, while in a few thousand years New Orleans will be a suburb of Chicago. Just a simple linear mapping!
Only when the underlying mechanism is fully understood (which means full knowledge of the limitations of the model) can one meaningfully extrapolate.
Alas for statisticians. Statistics does not deal in certainties. Statistics only deals with the assignment of likelihood (for example, a 100 year flood plain). This does not mean that if one has gone 99 years without a flood, there will be a flood in year 100.
And significance tests only allow us to accept or reject a null hypothesis. We can assign a likelihood of being wrong on our decision. And that is all.
*****
We could accurately forecast the weather. The expense would be monumental. Simply consider the surface area of Earth and the number of data collection points needed for one per square kilometer. And one would need data collection for at least 10 altitude points above each point on the surface. Talk about petabytes! Talk about gigadollars! Never happen.
After all, we are talking about a thin film here.
Here’s what I said about this AP (Associated Press) story in the prior thread on the topic:
*******
Let’s parse that AP article:
“The statisticians, reviewing two sets of temperature data, found no trend of falling temperatures over time.”
Strawman. 2009 is warmer than 1979 and 1880. But the period between those two start points is not what skeptics have in mind by “over time.” They are referring to the most recent trend.
“And U.S. government figures show that the decade that ends in December will be the warmest in 130 years of record-keeping.”
Another technically correct pseudo-refutation. Since the first half of that period preceded heavy man-made CO2, and therefore warmed from another cause, it indicates there’s a non-anthropogenic component to the long-term warming trend—a component that could still be active. (I.e., the rebound from the LIA.)
“Global warming skeptics are basing their claims on an unusually hot year in 1998.”
Another strawman. Most skeptics (here on WUWT, anyway) don’t choose 1998 as their starting point. Instead, they claim it’s been cooling during the present century, or since 2002, or 2004.
“They say that since then, temperatures have fallen — thus, a cooling trend. But it’s not that simple.”
A red herring (diversion). It IS that simple, because a short-term flattening and cooling trend falsifies the IPCC’s prediction for this decade, casting doubt on its models’ reliability; because it casts doubt on the implacability (and the urgency of the threat) of CO2’s alleged “forcing”; and because the PDO has flattened and turned negative at about the same time, which suggests that the PDO is the climate “forcer,” not CO2.
**********
“when I asked him [Borenstein] why he felt it necessary to *make* news, and then report it, he answered that he was simply fact-checking against recent “internet memes””
The primary “meme” here on WUWT and CA has been that the globe has been cooling slightly for the past five years or so. Borenstein merely knocked down a strawman (a caricatured version of an opponent’s argument) by pointing out that the globe has not been cooling since 1880 or 1979. The fact that this obvious dissembling hasn’t been caught demonstrates the CAGWers’ lack of critical thought.
PS: Here’s the link to the Amazon thread on How to Lie with Statistics.
http://www.amazon.com/How-Lie-Statistics-Darrell-Huff/dp/0393310728/ref=sr_1_1?ie=UTF8&s=books&qid=1256832335&sr=1-1
“”” Merrick (06:23:29) :
George E. Smith – the only difference between and integral and an average is a multiplier – but what do I know, I’m only a chemist and a physicist. “””
Well Merrick, let’s test your thesis with the often-cited absurdity. Shall we place your legs in dry ice at -80 deg C, and simultaneously put your head in superheated steam at 120 deg C? On average you are at a very comfortable +20 deg C; but if we integrate (add up) the effects of the “weather events”, the results are quite uncomfortable; fatally so. So what multiplier would you suggest to equate the two situations?
Climate is the sum total of ALL of the weather events that have ever occurred.
Under average conditions, there would be no weather at all.
“”” Mike the QE (16:37:03) :
@george Smith
The extremal average (M+m)/2 may not be an unbiased estimator if the distribution of values within the day is not symmetric. Even if it is, and even if it is approximately Gaussian*, it is still an inefficient estimator. If there are outliers in the daily data, due to measurement system error or actual +/- spikes, those outliers will be one or the other or both of the extreme values, and so will affect the extremal average, making the estimator subject to considerable variation.
If it is impractical to “integrate” over the entire series of daily temperatures — perhaps temperatures are not taken continuously — it would make more sense to take samples at random times of the day and compute a sample average and sample standard deviation. (Of course, there is the problem of serial correlation…) “””
Well Mike, the simplest situation would be if the daily temperature cycle at a single location were a pure sine wave. (M+m)/2, to use your terminology, would in fact be the correct average, and moreover would meet the Nyquist criterion of two samples per cycle of the highest signal frequency (in this case only one frequency). But in general exactly two samples per cycle (not M and m) is not sufficient to reconstruct the signal; it’s a degenerate case. For a non-sinusoidal but periodic signal, there must be at least a second harmonic component or higher frequency present, so at least four samples per day would be required just to get the average correct, although once again not to reconstruct the signal. Bear in mind that at exactly two samples per cycle, the samples would be phase-locked to the waveform, and would never reveal the waveform; but a slightly higher sample frequency would slew the sampling, and eventually capture the complete waveform.
But the real daily cycle is not sinusoidal, so two samples per day, even M and m, cannot give a correct average. More importantly, the radiated thermal emission that is a consequence of the temperature varies as the 4th power of the temperature, so the daily integrated radiation is always greater than what is calculated from the average temperature, even if you do have the correct average temperature. It’s a small but non-zero effect for the daily cycle, but it is quite significant for the annual cycle.
No, you don’t have to take the temperature every second or minute; but sampling no more often than twice daily (M and m) is worse than crude; and as I have said in the past, that completely ignores the effect of random cloud changes. Gaia does NOT ignore the effect of cloud changes on even the average temperature, let alone on the weather and climate.
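Both of George’s points can be checked numerically with an invented daily temperature curve, chosen only to be periodic but not sinusoidal (a diurnal sine plus a phase-shifted second harmonic): (M+m)/2 drifts away from the true integrated mean, and the mean of T to the 4th power (in kelvin) always exceeds the 4th power of the mean temperature whenever the temperature varies (Jensen’s inequality).

```python
import math

# Made-up daily temperature curve in degrees C: a diurnal sine plus a
# second harmonic, so the cycle is periodic but not sinusoidal.
N = 1440  # one sample per minute over one day
temps = [15 + 10 * math.sin(2 * math.pi * i / N)
            + 3 * math.sin(4 * math.pi * i / N + 1) for i in range(N)]

true_mean = sum(temps) / N                 # integrated average, exactly 15
extremal = (max(temps) + min(temps)) / 2   # the (M+m)/2 estimate

# Radiated power goes as T^4 in kelvin; the daily mean of T^4 exceeds
# the 4th power of the mean T whenever the temperature varies at all.
kelvin = [t + 273.15 for t in temps]
mean_t4 = sum(k ** 4 for k in kelvin) / N
t4_of_mean = (sum(kelvin) / N) ** 4

print(f"integrated mean {true_mean:.2f} C, (M+m)/2 gives {extremal:.2f} C")
print(f"mean(T^4) / (mean T)^4 = {mean_t4 / t4_of_mean:.6f}")
```

With this particular curve, (M+m)/2 lands roughly two degrees below the true average, and the radiation ratio comes out strictly above one; both gaps vanish only when the second harmonic (and all variation) is removed.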
sustainableloudoun (08:15:42) :
“Funny how a group of uninvolved statisticians all arrived at the same conclusions, yet are vilified here. I can’t for the life of me understand how people can’t look at the data objectively, and continue to entrench themselves further with confirmation bias (in this case, ignoring substantive findings that conflict with their belief system).
Amazing what the human psyche will go through to maintain its version of reality…”
We have seen the actual data; do we need unnamed “uninvolved statisticians” to tell us what we see? We see temperatures not following the predictions of the vaunted climate models. Skeptics favour the null hypothesis in the face of what could charitably be called insufficient evidence for AGW. One only has to look at the statistical atrocities committed by “confirmation bias”-prone scientists in their pathetic attempts to get rid of the Medieval Warm Period. Appeals to authority get no traction here; skeptics have seen too many authorities disgrace themselves over the so-called “settled science”.
@george
That’s pretty much what I said: the extremal average is unbiased under strict conditions not likely met by the distribution of temperatures during the day, and even if it were, it is extremely sensitive to outliers in the data. A proper daily average needs to take samples at multiple times during the day.
From the lack of response to my question, I judge that no-one here is much interested that the original post begins with “facts” that are trivially shown to be false. Do you not wonder what would motivate someone to mislead you so?
Briggs is spot on about statisticians’ love affair with their models. However I wish that he had gone into more detail. A few commentators on this thread have already mentioned the null hypothesis. In my opinion, that’s the elephant in the room. How so?
When we translate from Statisticalese into plain English, there’s not always a one-to-one correspondence. Take the word “no”. When a statistician says “no recent cooling trend”, what does he mean? Here’s my educated guess about the meaning of the word “no” in this context:
Null hypothesis: Global temperatures in recent years have not changed appreciably. Given the paucity of global temperature data over the last 11 years, and given the high noise level within that data set, we cannot reject the null hypothesis at the 95% confidence level.
If the statisticians in the AP story had been able to reject the null hypothesis with even 90% confidence, that would still not pass muster; 95% is the magic number. In other words, “no” does not always mean no.
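Here is roughly what that “no” amounts to, in a sketch with invented numbers: an 11-year series with a tiny +0.01/year drift buried under ±0.2 swings. The fitted slope is real and positive, but its t-statistic is nowhere near the 5% critical value, so the statistician reports “no trend” — about the direction of the slope itself, the test says nothing.

```python
import math

# Hypothetical 11-year series (an illustration, not real data): a small
# +0.01/year drift buried under deterministic +/-0.2 swings.
years = list(range(11))
temps = [0.01 * t + 0.2 * (-1) ** t for t in years]

n = len(years)
mx, my = sum(years) / n, sum(temps) / n
sxx = sum((x - mx) ** 2 for x in years)
slope = sum((x - mx) * (y - my) for x, y in zip(years, temps)) / sxx
resid = [y - (my + slope * (x - mx)) for x, y in zip(years, temps)]
s2 = sum(r * r for r in resid) / (n - 2)           # residual variance
t_stat = slope / math.sqrt(s2 / sxx)               # t = slope / se(slope)
print(f"slope = {slope:+.4f} per year, t = {t_stat:.2f}")
# |t| is far below ~2.26, the 5% two-sided critical value for 9 degrees
# of freedom, so the null of "no change" cannot be rejected.
```

The slope the fit returns is positive; the verdict “no trend” is a statement about confidence levels, not about which way the data went.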
Surely no capable statistician would agree to this analysis before knowing if the figures were truly independent of each other, or if a dependency existed between them. Different statistical tests are appropriate in each case.