
Guest essay by Kip Hansen
INTRO: Statistical trends never determine future values in a data set. Trends do not and cannot predict future values. If these two statements make you yawn and say, “Why would anyone even have to say that? It is self-evident,” then this essay is not for you; go do something useful for the next few minutes while others read it. If you had any other reaction, read on. For background, you might want to read this at Andrew Revkin’s NY Times Dot Earth blog.
I have an acquaintance who is a fanatical button collector. He collects buttons at every chance, stores them away, thinks about them every day, reads about buttons and button collecting, spends hours every day sorting his buttons into different little boxes and bins, and worries about safeguarding his buttons. Let’s call him simply The Button Collector, or BC for short.
Of course, he doesn’t really collect buttons; he collects dollars, yen, lira, British pounds sterling, escudos, pesos… you get the idea. But he never puts them to any useful purpose, neither really helping himself nor helping others, so they might as well be buttons – which is why I call him The Button Collector. BC has millions and millions of buttons – plus 102. For our ease today, we’ll consistently leave off the millions and millions and say he has just the 102.
On Monday night, at 6 PM, BC counts his buttons and finds he has 102 whole buttons (we will have no half buttons here, please); Tuesday night, he counts again: 104 buttons; on Wednesday night, 106. With this information, we can do wonderful statistical-ish things. We can find the average number of buttons over the three days (both mean and median): precisely 104.
We can determine the statistical trend represented by this three-day data set. It is precisely +2 buttons/day. We have no doubts, no error bars, no probabilities (we have 100% certainty for each answer).
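For the record, those statistics are trivially checkable. A minimal sketch in plain Python, with the three counts as the only input:

```python
from statistics import mean, median

counts = [102, 104, 106]   # Monday, Tuesday, Wednesday evening counts

print(mean(counts))        # 104
print(median(counts))      # 104

# Least-squares slope over days 0, 1, 2. For three evenly spaced
# points this is exactly the average daily change: +2 buttons/day.
days = [0, 1, 2]
xbar, ybar = mean(days), mean(counts)
slope = sum((x - xbar) * (y - ybar) for x, y in zip(days, counts)) \
        / sum((x - xbar) ** 2 for x in days)
print(slope)               # 2.0
```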
How many buttons will there be Friday night, two days later?
If you have answered with any number or a range of numbers, or even let a number pass through your mind, you are absolutely wrong.
The only correct answer is: We have no idea how many buttons he will have Friday night because we cannot see into the future.
But, you might argue, the trend is precisely, perfectly, scientifically, statistically +2 buttons/day, and two days pass; therefore there will be 110 buttons. All but the final phrase is correct; the last — “therefore there will be 110 buttons” — is wrong.
We know only the numbers of buttons counted on each of the three days – the actual measurements of the number of buttons. Our little three-point trend is just a graphic report about some measurements. We also know, importantly, the model for taking the measurements – exactly how we measured – a simple count of whole buttons, as in 1, 2, 3, etc.
We know how the data was arrived at (counted), but we don’t know the process by which buttons appear in or disappear from BC’s collection.
If we want to have any reliable idea about future button counts, we must have a correct and complete model of this particular process of button collecting. A generalized model of button-collecting processes is of little use to us, because we want a specific prediction about this particular process.
Investigating, by our own observation and close interrogation of BC, we find that my eccentric acquaintance has the following apparent button collecting rules:
- He collects only whole buttons – no fractional buttons.
- Odd numbers seem to give him the heebie-jeebies; he adds or subtracts only even numbers of buttons, so that he always has an even number in the collection.
- He never changes the total by more than 10 buttons per day.
These are all fictional rules for our example; of course, the actual details could have been anything. We then work these into a tentative model representing the details of this process.
So now that we have a model of the process, how many buttons will there be when counted on Friday, two days from now?
Based on the trend, our new model still suggests 110; the actual number on Friday was 118.
The truth being: we still didn’t know and couldn’t have known.
What we could know on Wednesday about the value on Friday:
- We could know the maximum number of buttons – 106 plus ten twice = 126
- We could know the minimum – 106 minus ten twice = 86
- We could know all the other possible numbers (all even, all between 86 and 126). I won’t enumerate them here, but you can see the pattern: 106+0+0, 106+0+2, 106+0+4, etc.
- We could know the probability of each answer, some answers being reachable by more than one pair of daily changes (such as 106+0+2 and 106+2+0); a short enumeration sketch follows this list.
- We could then go on to figure five-day trends, means, and medians for each of the possible answers, to a high degree of precision. (We would be hampered by the non-existence of fractional buttons and by the actual set allowing only even numbers, but the trends, means, and medians would be statistically precise.)
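Here is the short enumeration sketch promised above: a brute-force pass over the possibility space, assuming (as the stated rules allow, if each permitted change is taken as equally likely) that each day’s change is one of the eleven even values from −10 to +10:

```python
from collections import Counter
from itertools import product

start = 106                    # Wednesday's count
deltas = range(-10, 12, 2)     # even changes only, at most 10 per day

# Every (Thursday change, Friday change) pair the rules allow.
outcomes = Counter(start + d1 + d2 for d1, d2 in product(deltas, repeat=2))

print(min(outcomes), max(outcomes))   # 86 126
print(outcomes[110])   # 9 pairs of changes land on the trend value
print(outcomes[118])   # 5 pairs land on what turns out to be the actual value
```

The enumeration tells us what is possible and how many ways each outcome can arise; it still tells us nothing about which outcome will happen.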
What we couldn’t know:
- How many buttons there would actually be on Friday.
Why couldn’t we know this? Because our model – our button-collecting model – contains no information whatever about causes. We have modeled the changes, the effects, and some of the rules we could discover. We don’t know why, and under what circumstances and motivations, the Button Collector adds or subtracts buttons. We don’t really understand the process – BC’s button collecting – because we have no data about the causes of the effects we can observe or the rules we can deduce.
And, because we know nothing about causes in our process, our model of the process, being magnificently incomplete, can make no useful predictions whatever from existing measurements.
If we were able to discover the causes effective in the process, and their relative strengths, relationships and conditions, we could improve our model of the process.
Back we go to The Button Collector, and under a little stronger persuasion he reveals that he has a secret formula – precise and immutable – for deciding whether to add or subtract buttons, and how many. Armed with this secret formula, we can now adjust our model of this button-collecting process.
Testing our new, improved, and finally adjusted model, we run it again, pretending it is Wednesday, to see if it predicts Friday’s value. BINGO! ONLY NOW does it give us an accurate prediction of 118 (the already-known actual value) – a perfect prediction of a simple, basic, wholly deterministic (if tricky and secret) process by which my eccentric acquaintance adds and subtracts buttons from his collection.
What can and must we learn from this exercise?
1. No statistical trend, no matter how precisely calculated, and regardless of its apparent precision or length, has any effect whatever on future values of a data set – never, never, and never. Statistical trends, like the data from which they are created, are effects. They are not causes.
2. Models, not trends, can predict, project, or inform about possible futures, to some sort of accuracy. Models must include all of the causative agents involved which must be modeled correctly for relative effects. It takes a complete, correct and accurate model of a process to reliably predict real world outcomes of that process. Models can and should be tested by their abilities to correctly predict already known values within a data set of the process and then tested again against a real world future. Models also are not themselves causes.
3. Future values of the thing a metric represents, in data output from a model, are caused only by the underlying process being modeled – only the actual process itself is a causative agent, and only the actual process determines future real-world results.
PS: If you think that this was a silly exercise that didn’t need to be done, you haven’t read the comments section at my essay at Dot Earth. It never hurts to take a quick pass over the basics once in a while.
# # # # #
David L. says:
October 18, 2013 at 6:58 am
But that’s where the learned academics fool themselves: they don’t really ever know the complete first principles model.
========
The classic example is tidal prediction. Prediction from first principles is a hopeless way to try to predict the ocean tides. Instead we predict the tides with great accuracy using a method that is, for all intents and purposes, astrology.
We observe the tides and the position of the sun, moon and planets in the heavens, and predict that the same alignment in the future will result in the same tides. We don’t need to know what causes the tides, there is no need for a mechanism, only the observation that nature moves in repetitive cycles.
Early humans learned to predict the seasons the same way, long before they could predict them from first principles. We could predict summer and winter long before we understood the tilt of the earth’s axis relative to its orbit around the sun.
fhhaynie says:
October 18, 2013 at 6:59 am
Weather forecasters generally hedge their bets and limit their “predictions” to a few days.
=====
Climate forecasters make their “predictions” so far in the future that no one can check their accuracy.
@ Doug Huffman
“Connecting the dots on an epistemological map ignores the complexity between the dots, however closely they are placed. Reality is fractally complex.”
Though I certainly agree with you on this, a great deal depends on your interpretive framework. I think our problem is often that we look at the map through the wrong end of the telescope (quantitatively) and everything seems so distant and out of reach, not to mention rather one-dimensional. A bit like a soporific general surveying a battlefield with his field glasses back-to-front and remarking at how far away the enemy seems and how flat the terrain. This framework just doesn’t reflect reality satisfactorily. We intuitively sense that ‘there is much more to this than meets the eye (than we can currently measure)’.
Yet if we can retrain ourselves to view the map more qualitatively (fractally, quantum-ly), perhaps –just perhaps – we might find ourselves looking through the other end of the telescope and seeing the lay of the land with greater clarity. We would then see a more intricate and layered topography of interconnecting factors (with both quantitative and qualitative properties) that better reflects reality. And better predicts probable outcomes while accommodating those “most tenebrous of cygneous waterfowl”.
So while I agree with you, I think we need to take a more positive approach, embedding the best of what we currently have in a wider and more qualitative framework, rather than just pointing out the obvious failure of the current paradigm.
This entire article is categorically untrue. Correlation is not causality, agreed, but it is all we ever have to make sense of the Universe. Science is nothing but systematized observations of correlation that are indeed interpreted as probably being causality. The author should really a) learn some statistics; and b) read Jaynes’ Probability Theory: The Logic of Science.
The point isn’t that one cannot be mistaken in e.g. the extrapolation of a linear trend. The point is that one can often make some fairly powerful statements about how probable it is that one will be mistaken. If the author’s primary point above were true, then just because we’ve observed that objects released from rest close to the Earth’s surface consistently have fallen down according to what appear to be simple, predictable rules that fit consistently into a framework of similar rules for our entire life, we would still have no good reason for believing that the next time we drop a penny or throw a baseball, it will follow a trajectory consistent with those past observations. I will cheerfully bet the author $1 a trial for as many trials as he likes that if either of us drop a penny, it will fall down. I’ll even give him odds. Hell, I’ll just plain give him a dollar the first time it falls up.
There is so much more that I can say that is mistaken about this analysis. It ignores Bayes’ theorem entirely, and the value of priors and their effect on estimates of future marginal or conditional probabilities. It ignores all of the mathematics associated with functional analysis (e.g. the Taylor series), which basically asserts that if a set of statistical samples is drawn from an underlying non-stationary distribution meeting some very, very broad requirements — continuity, differentiability — then one can almost invariably extrapolate a linear trend for at least some time, simply because the linear term often dominates a Taylor series expansion of the underlying distribution. That doesn’t mean that it always will, or that there don’t exist distributions and processes where it never will, but there is a very, very broad class of processes for which it will, in fact, work. Broad enough that I dare say it will probably work nearly all of the time.
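To spell out the expansion being invoked here: for a sufficiently smooth underlying function $f(t)$,

$$f(t_0 + \Delta t) = f(t_0) + f'(t_0)\,\Delta t + \tfrac{1}{2} f''(t_0)\,\Delta t^2 + O(\Delta t^3),$$

so for small enough $\Delta t$ the first-order (linear) term dominates the change whenever $f'(t_0) \neq 0$, which is exactly the regime in which short-range linear extrapolation works.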
Even in climate science, this trick works remarkably well. What’s one of the best predictors of tomorrow’s weather, one that doesn’t rely on satellites or all of the tricks given in Anthony’s lovely book on predicting the weather (tricks which, by the way, also directly contradict the top article, since the top article is basically saying that Anthony’s book is nonsense: just because certain cloud patterns were observed to have a linear correlation with future weather in the past doesn’t mean that those correlations will persist into the future)? “The weather tomorrow will be pretty much like the weather today.” Why? Because if one computes the autocorrelation of a variety of aspects of the weather, the autocorrelation time is longer than a day for many of them. Large temperature shifts don’t occur daily; they occur every few days (and even then, usually within fairly narrow ranges). Sunny, fine days are often clustered (because high-pressure systems are usually large enough, and move slowly enough, that they take days to pass over any given point). Ditto rainy/cloudy weather. Note that we cannot be certain of any of this, and that at some times of the year and in some parts of the globe the autocorrelation time differs (in the middle of the Sahara or Antarctica I imagine it is positively huge, but in temperate-zone springtime it is comparatively short), but even this is known, approximately, on the basis of observed, extrapolated linear trends, buffed up only in the modern era by additional Bayesian prior knowledge such as an understanding of the physics underlying moisture, the movement of air masses, cloud formation, and so on.
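A toy illustration of that persistence argument, with made-up numbers (the synthetic `temps` series below is hypothetical, standing in for any daily record):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical daily temperatures: a slow seasonal swing plus noise,
# so adjacent days are strongly correlated.
days = np.arange(365)
temps = 15 + 10 * np.sin(2 * np.pi * days / 365) + rng.normal(0, 2, 365)

# Lag-1 autocorrelation: how alike are adjacent days?
r1 = np.corrcoef(temps[:-1], temps[1:])[0, 1]
print(f"lag-1 autocorrelation: {r1:.2f}")      # roughly 0.9

# "Tomorrow like today" (persistence) vs. guessing the overall mean,
# scored by mean squared error.
mse_persist = np.mean((temps[1:] - temps[:-1]) ** 2)
mse_mean = np.mean((temps[1:] - temps.mean()) ** 2)
print(f"MSE, persistence: {mse_persist:.1f}")  # small
print(f"MSE, mean guess:  {mse_mean:.1f}")     # several times larger
```

The persistence forecast wins for exactly the reason rgb gives: the autocorrelation time of the series is longer than the forecast horizon.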
“Red sky in the morning, sailor take warning” dates back 2000-plus years and is nothing but a linear extrapolation of observational data in weather systems. Now we actually understand things like Rayleigh scattering and why the rule — more often than not, but not at all certainly — works.
So sorry, if you want to actually learn about the statistical science of the improbable, you are far better advised to rely on Taleb’s book The Black Swan; Taleb makes the same point the author is attempting to make, only far, far better, and in a context that fully appreciates that even black swan events don’t invalidate the assertion that autocorrelation in systems with an internal (hidden) dynamics is usually long enough that linear extrapolation of their behavior is valid for some time (times less than the empirically observed autocorrelation time, in fact); they simply mean that one should hedge one’s bets, taking into account the possibility of comparatively rare but highly costly exceptions.
rgb
But that’s where the learned academics fool themselves: they don’t really ever know the complete first-principles model. Friction, air resistance, etc. combine to make the full calculation impossible for all but the most idealized cases. Even artillery calculations utilize “fudge factors” to dial in the targeting calculations for the given situation. Accounting for powder charge, mass, velocity, wind speed and direction, temperature, etc. can get you close, but not a guaranteed “bullseye”.
Excuse me, but this too is bullshit. Learned academics (such as myself) who teach this stuff every day do not, in fact, fool themselves. Often we teach our students, “This is idealized bullshit, but it is still the first step towards understanding what goes on, and is at least approximately correct.” And if one works hard enough, and wisely enough, within the known limits of our knowledge and descriptions, one can do things like build the laptop you are typing this on, or naval fire-control computers that work well enough to sink ships remarkably well (compared to the old days of pirates firing cannons at point-blank range).
Don’t generalize. I know you’re trying to say “Climate models suck” but why not just say it instead of accusing “academics” of not knowing the limitations of their own knowledge in general?
rgb
Replies to All:
To those who have been supportive, Thank You.
To those who have been generally supportive but have some concern or question: Thank You, I’ll try to cover your concerns as I answer comments in buckets by issue raised.
To Statisticians Everywhere: Fred, Ted, and a couple of others. Luckily, I am not a statistician. I write as a professional “practitioner of practicality” — as a “practician”. Statisticians have their own definitions of words, different from the rest of the world’s, and the things discussed in my essay mean different things to them. They are sure their definitions are the correct ones, even though the rest of us don’t use the words that way. All fair enough. If some statistician would like to translate my essay into Statistician-ese, I would be glad to read it. The practical principles presented in my essay — however simplistic and lacking in nuance — are nonetheless correct in what even lawyers are now required to call “plain English”.
and
An Observation: I am pleased to observe that here at WUWT, as opposed to the Dot Earth blog, in the 104 comments so far, there has not been a single instance of rank name calling or bullyism — not one. Marvelous.
rgbatduke says: October 18, 2013 at 7:46 am: “The author should really a) learn some statistics; and b) read Jaynes’ Probability Theory: The Logic of Science.” Yes, cited above! Particularly Section 5.3, ‘Converging and diverging views’ (pp. 126–132), excerpted here:
http://www.variousconsequences.com/2009/11/converging-and-diverging-views.html
2. Models, not trends, can predict, project, or inform about possible futures, to some sort of accuracy. Models must include all of the causative agents involved which must be modeled correctly for relative effects. It takes a complete, correct and accurate model of a process to reliably predict real world outcomes of that process.
The rigor specified here is overstated and in the end entirely false.
Ancient astronomers were able to predict lunar and solar eclipses without knowing all the causative agents. They might even have been wrong in important aspects of the causative model. The Farmer’s Almanac makes useful predictions about where and when to plant, not from accurate causative non-linear models, but from historical records and trends.
An accurate prediction, based upon causative-agent models, of the Breckenridge snow base on Dec. 26, 2013 would be most difficult. An accurate prediction for Dec. 26, 2053 would be even harder. Nevertheless, using only trends based on historical data, you can have extremely high confidence in a prediction that the snow base at Breckenridge on July 4, 2054 will be much less than it will be on Dec. 26, 2053.
[PREDICTIVE] Models can and should be tested by their abilities to correctly predict already known values within a data set of the process and then tested again against a real world future.
The value of any model must be measured by the difference it can make to you with the prediction in hand[1] compared to without the prediction. No matter how mysterious, if a model had a 60% chance of correctly predicting BLACK on a roulette wheel the model would have great value indeed.
Note 1: Sometimes your value in a model lies solely in the fact that you found some sucker to pay you for a prediction.
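To put a number on the roulette example: with an even-money payout, a predictor that calls BLACK correctly 60% of the time returns, per unit staked,

0.6 × (+1) + 0.4 × (−1) = +0.20,

a 20% edge, whereas blind betting on an American wheel returns 18/38 − 20/38 ≈ −0.053. Great value indeed, however mysterious the model’s innards.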
Statistics and probability are tools, like hammers and saws. Experience, understanding, and intelligence have to be applied to use the appropriate tool for a specific job. Hammers and saws are fairly easy to understand but still need skill to apply well, whereas statistics and probability require a deep understanding of the tools themselves, together with a deep understanding of the process/system under consideration, before the appropriate tool can be chosen and skillfully applied. Just applying formulae to collections of numbers is not the intelligent use of statistics, in the same way that it is unreasonable to pick one specific process (button collecting) and suggest it invalidates the intelligent use of statistics.
Reply to those insisting that trends can or do predict the future, or can be used to do so:
It is important not to confuse apparent trend with model output or results of an underlying process. In the BC example, the first ‘trend’ presented is three button counts on three successive days. These are simple measurements (counts), and it is really an error to call them a trend, as a trend implies or assumes that they are the result of a [modeled] process. Drawing a line through the data points does not change anything. Think about this difference. Don’t assume evidence not presented. A simple count or measurement is just that and nothing else. One mustn’t assume an underlying process (which would or could be modeled to produce data).

Try another example — the value of the coins found in your pocket at the end of each day, which you dump into a jar on the dresser. Count this every day for a week. Graph the counts and see some trend. Can you scientifically use this trend to predict the next day’s value? Next Wednesday’s value? No, of course not. It is not, as some commenters have suggested, because the process is random — it is because you have no reliable information about the process that produces “coins in my pocket daily”. You cannot even formulate a mental model that allows estimation of the future. If you make any prediction, whether your prediction is right or wrong, or right 75% of the time, you are fooling yourself with numbers.
Trends formed by model output – when the model has been formulated with knowledge about the process it models – do have predictive ability. It is the model that has the ability to predict, not the visualization of it on a piece of paper or a computer screen. To make a prediction using a model, we look for things like trends. Many models are far more complicated, and their predictive power is NOT made evident by straight-line linear trends but by other indicators. The predictive ability of models always depends on the correctness of the model and the accuracy (and appropriateness) of the input.
“Climate models suck”
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Indeed – loaded dice.
Now that there has been an opportunity to digest Jaynes’ ‘Converging and diverging views’: this is why narrators of controversial stories must use considered and measured tones, for the least hyperbole wedges the audience at a greater rate. These are lessons the AGW-ists and the media must take to heart to regain any measure of credibility.
If RGB at Duke has an academician’s grasp of Jaynes’ maths then this retiree would appreciate an on-line course so narrowly focussed. I will read ALL of Probability Theory, however slowly, however interrupted by Popper, Taleb, Kahneman, Tversky …
Yes, I think we can agree on the usefulness of simple observations like pencils dropping and the sun rising, but when things get a lot more complex, perhaps the Climatology Club might need to think about parallels in science.
From The Australian (18th Oct) comes a report of this piece of settled science, from an impeccable source like the IPCC presumably:
THE World Health Organization yesterday classified outdoor air pollution as a leading cause of cancer in humans.
“The air we breathe has become polluted with a mixture of cancer-causing substances,” said Kurt Straif of the WHO’s International Agency for Research on Cancer.
“We now know that outdoor air pollution is not only a major risk to health in general, but also a leading environmental cause of cancer deaths.”
The IARC said a panel of top experts had found “sufficient evidence” that exposure to outdoor air pollution caused lung cancer and raised the risk of bladder cancer.
Although the composition of air pollution and levels of exposure can vary dramatically between locations, the agency said its conclusions applied to all regions of the globe.
Air pollution was already known to increase the risk of respiratory and heart diseases.
The IARC said pollution exposure levels increased significantly in some parts of the world in recent years, notably in rapidly industrialising nations with large populations.
The most recent data, from 2010, showed that 223,000 lung cancer deaths worldwide were the result of air pollution, the agency said.
The data did not enable experts to establish whether particular groups of people were more or less vulnerable to cancer from pollution, but Dr Straif said it was clear that risk rose in line with exposure.
In the past, the IARC had measured the presence of individual chemicals and mixtures of chemicals in the air — including diesel engine exhaust, solvents, metals, and dust.
Diesel exhaust and what is known as “particulate matter” — which includes soot — have been classified as carcinogenic by the IARC.
The latest findings were based on overall air quality and on an in-depth study of thousands of medical research projects conducted around the world over decades.
“Our task was to evaluate the air everyone breathes rather than focus on specific air pollutants,” said the IARC’s Dana Loomis.
“The results from the reviewed studies point in the same direction: the risk of developing lung cancer is significantly increased in people exposed to air pollution,” he added.
The predominant sources of outdoor air pollution were transport, power generation, emissions from factories and farms, and residential heating and cooking, the agency said.
“Classifying outdoor air pollution as carcinogenic to humans is an important step,” said the IARC’s director Christopher Wild.
“There are effective ways to reduce air pollution and, given the scale of the exposure affecting people worldwide, this report should send a strong signal to the international community to take action without further delay.”
The IARC said that it was set to publish its in-depth conclusions on October 24 in the specialised journal The Lancet Oncology.
Now, from those lung cancer statistics, it may be that Big Tobacco rightly want an international apology and their money back. What say you, Big Climate? Is the science settled here, or are Big Tobacco being a little too simplistic, perhaps?
Kip Hansen said:
October 18, 2013 at 8:09 am
An Observation: I am pleased to observe that here at WUWT, as opposed to the Dot Earth blog, in the 104 comments so far, there has not been a single instance of rank name calling or bullyism — not one. Marvelous.
———————————————–
Idiot.
Now gimme your lunch money.
😉
Reply to those who feel that the conclusions of my essay invalidate or threaten science or statistics or use of their favorite cookie recipe:
Many commenters object that “we can and do predict things all the time” and give examples such as flying bullets, cannon balls in flight, rolling cars, the sun rising and setting, tides, historical snow falls and a long list of other things. And that we use observed “trends” to do so. Of course we do.
But why can we do so, since trends cannot and do not predict the future? It is because of what we are substituting for “trends” in these examples – in most cases we are substituting mental models based on physical processes. We see a car rolling straight across the parking lot, sans driver, towards a playing child. We automatically formulate the model for free-rolling cars, use the model to make a prediction, rush over, and snatch the threatened child out of danger. There is no trend involved in this example. There is the [mentally] modeled output of the physical process of the rolling car (based on well-understood and time-tested Newtonian laws of motion).
Another factor in this general concern bucket involves the subject of forecasting. [ http://www.forecastingprinciples.com/ ] Forecasting is a vitally important subject. See the link provided for scientifically formulated forecasting principles. Many mention, directly or intuitively, that a first principle of forecasting is that your best bet, when faced with a complex system about which little is understood, is to forecast “more of the same”, and then take this to mean that one’s best bet is always to predict that a trend will continue. I am not a forecasting expert any more than I am a statistician. But before one can apply this true principle of forecasting, one has to apply a higher and overriding principle, which is to first determine whether any meaningful forecast can be made at all, given the problem, the data available, one’s understanding of the processes involved, and the purpose of the forecast. I consider it most likely that simple linear projection of a trend is not a valid forecasting method – unless one steps way, way back and looks at the long-term behavior, as in “Investing in blue-chip stocks is a good long-term investment approach”.
rgbatduke says:
October 18, 2013 at 7:46 am
Agreed, more or less. The author has shown that a particular type of behavior is not predictable using trends. However, a linear regression is actually the optimal mean-square estimator for a particular type of random sequence: a deterministic affine variable with random slope and intercept, with measurements polluted by uniform independent noise. That is, in fact, the model for which a standard linear regression is derived.
He would be on much more solid ground if he said that climate variables, particularly mean surface temperature anomaly, do not behave like such a sequence and, as a result, are not predictable to the desired level of accuracy using such a model. To the degree that your model fails to capture the dynamics of the actual process, statistics derived based upon that model are dubious, to say the least.
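A quick sketch of the textbook setup described above, with hypothetical numbers (Gaussian noise used for illustration; numpy’s `polyfit` is an ordinary least-squares fit):

```python
import numpy as np

rng = np.random.default_rng(1)

# The sequence a linear regression actually assumes: a fixed (but
# unknown) slope and intercept, observed through independent noise.
true_slope, true_intercept = 2.0, 102.0
t = np.arange(30)
y = true_intercept + true_slope * t + rng.normal(0, 3, t.size)

slope, intercept = np.polyfit(t, y, 1)
print(f"estimated slope {slope:.2f}, intercept {intercept:.2f}")

# For data generated this way, extrapolating the fitted line is a
# sound predictor of the next value. The failure mode described
# above is applying the same machinery to data that were NOT
# generated by such a process.
print("prediction for t = 30:", intercept + slope * 30)
```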
A better model for mean surface temperature anomaly could be constructed. It would include a process model for the 65 year dynamics observable here, and possibly the 21 year process as well. The 65 year process is key – it was the upswing of that dynamic which was mistaken for an anthropogenic effect in the latter decades of the 20th century, and it is that process which is currently driving temperatures back down again. All independently of egocentric humans who, like a flea on the elephant’s back, think they control the beast because he happened to turn when the flea thought “turn!”.
mhx says:
October 18, 2013 at 2:06 am
Suppose that a car at a distance of 500 meters starts driving in your direction and for 490 meters it drives in a straight line at constant velocity. The car has no driver. It has been programmed. You don’t know the program. You know nothing about the causes that make the car drive in a straight line for about 490 meters. What would you do?
———————————————————–
Stay on the sidewalk. It is the driver using a cellphone that worries me.
Kip, as you and others acquainted with this subject no doubt know: Statistics is the language of science, but Probability is the logic of science. Most people have never come into contact with either the language or the logic, and therefore fill in the gaps in their education with arm waving and emotion.
How many people do you know who are familiar with Hume, Jaynes, or others of that caliber?
People make critical decisions based upon statistical analysis and trend presentation all the time, including educated and well-reasoned people who make it their living: actuaries, financial analysts, marketing analysts, etc. Complex sets of data won’t always yield rational explanation, and the fast-paced world of real business can’t afford the luxury of the time it takes to completely isolate all the variables and come to root-cause understanding. There is a cost to not making a decision, and opportunities often have a shelf life; therefore regression trend analysis may be the only predictive tool with both the timeliness and the utility to give answers.
While it is easy (and perhaps a bit enjoyable) to poke fun at “scientists” who hang their hats on statistical trending and modeling, it would be disingenuous to say there is zero value in using modeling as a predictive tool. Indeed business and significant portions of science would be severely handicapped without it. While I agree with the premise of the article that people who use statistical analysis and models need to understand the limits and risks of the forecasting it can produce, I would avoid making such a sweeping indictment.
I’m not sure what the frequent reference to being able to predict a rising sun has to do with a description of the nature of what a trend is. Instead of using a phenomenon that is modelled VERY well, and then claiming the hypothesis that a trend has predictive value is validated (c’mon, man!), why not pick an actual test of your hypothesis?
Take the last 100 months of temperature data from any station you want, fit a trend to that data, and bet me 5 quatloos that you know what the temperature will be tomorrow. Afterwards, I’ll give you the money back if you promise to read this essay again.
See this article by statistician William Briggs for more on what a trend means and what it doesn’t mean:
http://wmbriggs.com/blog/?p=6854
Just an engineer;
Stay on the sidewalk. It is the driver using a cellphone that worries me.
>>>>>>>>>>>>>>>>>
No no no! You use YOUR cell phone to hack into the car’s computer and slam on the brakes. They do it on TV all the time, everyone knows that. 😉
You can hardly talk about the predictability of a trend until you have verified its accuracy and reliability. There are parallels between investors and those with a vested interest in climate, and it’s worth noting that both groups rely heavily on the conventional databases that archive “past performances”. Of those two data sets, which is more reliable? Personally, I trust Wall Street market data more than I trust the archives about the climate at, for example, UEA HadCRU. Perhaps too many eyes are on the DOW minute by minute for investors to suffer the kind of blatant “adjustments” of their database that has affected the government-corrupted climate industry.
In his graph here a few days ago, Roy Spencer documented fully 90 climate model projections from the 1980s which incorrectly mimicked each other (see http://wattsupwiththat.com/2013/10/14/90-climate-model-projectons-versus-reality/ ). Only a few of these came close to the observed temperature records (HadCRUT4 surface and UAH troposphere), strongly suggesting that the rest were biased by the same incentives to achieve the same alternate reality. Programmed to go upwards, they went about their predetermined task with alacrity. Do such diversions from “reality” happen with stocks? Every day, it appears, at the whims of lying CEOs and intrusive government agents attempting to pull the strings of the economy. Sooner or later, however, sometimes with brutal determinism, the markets correct themselves. By instantly clarifying its own record, such “efficiency” earns investment indexes a grudging respect not earned by climate records, and arguably makes predictions of the movements of the DOW more reliable.
Kip Hansen:
Respectfully, I write to disagree with your assertion that “trends cannot and do not predict the future”.
In your explanatory post at October 18, 2013 at 9:48 am you say
Sorry, but the human brain is very good at assessing trajectories (i.e. trends in the spatial changes of objects). Ball games would not be possible if this were not so.
I assure you that a person dodges a falling object on the basis of a prediction of where the trend of the object’s movement will lead it to hit the ground. The only model is the trend.
Similarly, an athlete does not have a “[mentally] modeled output of the physical process” of a ball’s flight “(based on well-understood and time-tested Newtonian laws of motion)”. She assesses the ball’s trajectory and predicts future position on the basis of the non-linear trend of its movement. And as e.g. wind alters the trajectory she constantly adjusts her prediction as she runs to where she hopes to catch it. Indeed, a bowler in cricket uses several methods (e.g. swing, spin, bounce, etc.) to disguise the eventual trajectory of the ball from the batsman.
People use trends as predictors every day. Evolution has honed our ability to do that because it works more often than not. Indeed, this evolutionary result is why people are good at observing patterns even where no patterns exist.
A trend can and does predict the future so long as the trend continues into the future. But trends change with time and, therefore, trends are imperfect predictors of the future.
Richard
Nothing predicts the future exactly. Statistical trends, with or without adequate knowledge of the underlying process, predict the near future better than anything else, where “better” is measured by “mean squared prediction error”. In central Missouri, next December will be cooler than next July, even though the process of weather is incompletely known; and in central Missouri the weather tomorrow will be more like the weather yesterday than like the weather 3 months ago, almost all the time. An investment company that has above average returns 10 years in a row will have above average returns next year, but will display regression to the mean; that is true for almost all investment companies that have above average returns 10 years in a row.
Even if you know the mechanism, you need measurements of its outcomes, estimates of its parameters, and a statistical analysis of how well the computed model has fit the data in the past. A prediction of the future based on knowing the mechanism will be a calculation based on the computed estimates, along with a probability distribution over the range of possible outcomes.
To repeat: nothing predicts exactly; trends do better than anything else.
I think people forget to make comparisons among alternatives, and neglect to specify what they mean. When you specify a measure of successful prediction (mean square error, mean absolute error, etc.) and then look at all available alternatives, trends are the best of a pretty dismal lot.