The Button Collector or When does trend predict future values?

How many buttons will he have on Friday? (Photo credit: Wikipedia)

Guest essay By Kip Hansen

INTRO: Statistical trends never determine future values in a data set. Trends do not and cannot predict future values. If these two statements make you yawn and say “Why would anyone even have to say that? It is self-evident.” then this essay is not for you; you may go do something useful for the next few minutes while others read this. If you had any other reaction, read on. For background, you might want to read this at Andrew Revkin’s NY Times Dot Earth blog.

I have an acquaintance who is a fanatical button collector. He collects buttons at every chance, stores them away, thinks about them every day, reads about buttons and button collecting, spends hours every day sorting his buttons into different little boxes and bins, and worries about safeguarding his buttons. Let’s call him simply The Button Collector or BC, for short.

Of course, he doesn’t really collect buttons, he collects dollars, yen, lira, British pounds sterling, escudos, pesos…you get the idea. But he never puts them to any useful purpose, neither really helping himself nor helping others, so they might as well just be buttons, so I call him: The Button Collector. BC has millions and millions of buttons – plus 102. For our ease today, we’ll consistently leave off the millions and millions and we’ll say he has just the 102.

On Monday night, at 6 PM, BC counts his buttons and finds he has 102 whole buttons (we will have no half buttons here, please); Tuesday night, he counts again: 104 buttons; on Wednesday night, 106. With this information, we can do wonderful statistical-ish things. We can find the average number of buttons over three days (both mean and median): precisely 104.

We can determine the statistical trend represented by this three-day data set. It is precisely +2 buttons/day. We have no doubts, no error bars, no probabilities (we have 100% certainty for each answer).
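
The arithmetic above can be checked in a few lines. What follows is only a minimal sketch, assuming Python and its standard library; the variable and function names are illustrative, and the days are simply numbered 0, 1, 2 for Monday through Wednesday.

```python
from statistics import mean, median

# Button counts from the essay: Monday, Tuesday and Wednesday nights.
counts = [102, 104, 106]
days = list(range(len(counts)))  # 0, 1, 2 (an arbitrary numbering of the days)


def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


average = mean(counts)
middle = median(counts)
trend = least_squares_slope(days, counts)  # buttons per day

print(average, middle, trend)
```

Every number computed here describes the three counts already taken; nothing in the calculation refers to Thursday or Friday.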

How many buttons will there be Friday night, two days later? 

If you have answered with any number or a range of numbers, or even let a number pass through your mind, you are absolutely wrong.

The only correct answer is: We have no idea how many buttons he will have Friday night because we cannot see into the future.

But, you might argue, the trend is precisely, perfectly, scientifically statistically +2 buttons/day and two days pass, therefore there will be 110 buttons. All but the final phrase is correct, the last — “therefore there will be 110 buttons” — is wrong.

We know only the numbers of buttons counted each of the three days – the actual measurements of number of buttons. Our little three point trend is just a graphic report about some measurements. We know also, importantly, the model for taking the measurements – exactly how we measured – a simple count of whole buttons, as in 1, 2, 3, etc.

We know how the data was arrived at (counted), but we don’t know the process by which buttons appear in or disappear from BC’s collection.

If we want to be able to have any reliable idea about future button counts, we must have a correct and complete model of this particular process of button collecting. It is really little use to us to have a generalized model of button collecting processes because we want a specific prediction about this particular process.

Investigating, by our own observation and close interrogation of BC, we find that my eccentric acquaintance has the following apparent button collecting rules:

  • He collects only whole buttons – no fractional buttons.
  • Odd numbers seem to give him the heebie-jeebies; he adds or subtracts only even numbers of buttons, so that he always has an even number in the collection.
  • He never changes the total by more than 10 buttons per day.

These are all fictional rules for our example; of course, the actual details could have been anything. We then work these into a tentative model representing the details of this process.

So now that we have a model of the process, how many buttons will there be when counted on Friday, two days from now?

Our new model still predicts 110, based on trend, but the actual number on Friday was 118.

The truth being: we still didn’t know and couldn’t have known.

What we could know on Wednesday about the value on Friday:

  • We could know the maximum number of buttons – 106 plus ten twice = 126
  • We could know the minimum – 106 minus ten twice = 86
  • We could know all the other possible numbers (all even, all between 86 and 126 somewhere). I won’t bother here, but you can see it is 106+0+0, 106+0+2, 106+0+4, etc.
  • We could know the probability of the answers, some answers being the result of more than one set of choices. (such as 106+0+2 and 106+2+0)
  • We could then go on to figure five day trends, means and medians for each of the possible answers, to a high degree of precision. (We would be hampered by the non-existence of fractional-buttons and the actual set only allowing even numbers, but the trends, means and medians would be statistically precisely correct.)
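
The list above can be worked out mechanically. This is a rough sketch in Python, not a definitive treatment: it applies the three observed rules to Thursday and Friday and counts how many sets of choices lead to each Friday total. The probabilities rest on an assumption that is NOT in the rules above, namely that BC picks each allowed daily change with equal likelihood; all names are illustrative.

```python
from collections import Counter
from itertools import product

WEDNESDAY_COUNT = 106

# Allowed daily changes under the observed rules:
# whole, even, and never more than 10 buttons in either direction.
DAILY_CHANGES = range(-10, 11, 2)

# Tally every (Thursday change, Friday change) pair by the Friday total it gives.
ways = Counter(
    WEDNESDAY_COUNT + thursday + friday
    for thursday, friday in product(DAILY_CHANGES, repeat=2)
)

total_paths = sum(ways.values())

# ASSUMPTION (not from the essay): every allowed change is equally likely.
probability = {count: n / total_paths for count, n in ways.items()}

print("minimum:", min(ways), "maximum:", max(ways), "distinct totals:", len(ways))
print("ways to reach 110:", ways[110], "ways to reach 118:", ways[118])
```

Under that equal-likelihood assumption, 110 is reachable by 9 of the 121 possible sets of choices and 118 by 5 of them. The table of possibilities is exact, yet it says nothing about which path BC will actually take.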

What we couldn’t know:

  • How many buttons there would actually be on Friday.

Why couldn’t we know this? We couldn’t know because our model – our button collecting model – contains no information whatever about causes. We have modeled the changes, the effects, and some of the rules we could discover. We don’t know why and under what circumstances and motivations the Button Collector adds or subtracts buttons – we don’t really understand the process, BC’s button collecting, because we have no data about the causes of the effects we can observe or the rules we can deduce.

And, because we know nothing about causes in our process, our model of the process, being magnificently incomplete, can make no useful predictions whatever from existing measurements.

If we were able to discover the causes effective in the process, and their relative strengths, relationships and conditions, we could improve our model of the process.

Back we go to The Button Collector, and under a little stronger persuasion he reveals that he has a secret formula for determining whether to add or subtract buttons, and how many, which accounts for the numbers of buttons previously observed. Armed with this secret formula, which is precise and immutable, we can now adjust our model of this button collecting process.

Testing our new, improved, and finally adjusted model, we run it again, pretending it is Wednesday, and see if it predicts Friday’s value. BINGO! ONLY NOW does it give us an accurate prediction of 118 (the already known actual value) – a perfect prediction of a simple, basic, wholly deterministic (if tricky and secret) process by which my eccentric acquaintance adds and subtracts buttons from his collection.
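
The test described above, running the model from Wednesday and comparing its output with the already known Friday value, can be sketched in code. The secret formula itself is not disclosed in this essay, so the rule below is purely HYPOTHETICAL: it was invented here only because it happens to reproduce the counts in the example, and it should not be read as BC's actual formula. The function names are likewise illustrative.

```python
# Counts known on Wednesday, and the Friday count observed afterwards.
observed = [102, 104, 106]  # Monday, Tuesday, Wednesday
friday_actual = 118

MAX_DAILY_CHANGE = 10  # from the observed rules


def hypothetical_rule(history):
    """HYPOTHETICAL stand-in for the secret formula (not given in the essay).

    Add as many buttons as have been gained since the first count,
    starting with 2 and never exceeding the 10-per-day limit.
    """
    gained = history[-1] - history[0]
    change = gained if gained > 0 else 2
    return history[-1] + min(change, MAX_DAILY_CHANGE)


def hindcast(history, days_ahead, rule):
    """Step a process rule forward from the known history."""
    series = list(history)
    for _ in range(days_ahead):
        series.append(rule(series))
    return series[-1]


def trend_extrapolation(history, days_ahead):
    """Straight-line extension of the average daily change."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + slope * days_ahead


model_friday = hindcast(observed, 2, hypothetical_rule)
trend_friday = trend_extrapolation(observed, 2)

print("process model:", model_friday)
print("trend line:   ", trend_friday)
print("actual:       ", friday_actual)
```

With this invented rule the process model lands on 118 while the trend line lands on 110. The only point of the sketch is the shape of the test: a candidate model of the process is judged by whether it reproduces values that are already known.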

What can and must we learn from this exercise?

1. No statistical trend, no matter how precisely calculated, regardless of its apparent precision or length, has any effect whatever on future values of a data set – never, never and never. Statistical trends, like the data from which they are created, are effects. They are not causes.

2. Models, not trends, can predict, project, or inform about possible futures, to some sort of accuracy. Models must include all of the causative agents involved which must be modeled correctly for relative effects. It takes a complete, correct and accurate model of a process to reliably predict real world outcomes of that process. Models can and should be tested by their abilities to correctly predict already known values within a data set of the process and then tested again against a real world future. Models also are not themselves causes.

3. Future values of a thing represented by a metric in a data set output from a model are caused only by the underlying process being modeled – only the actual process itself is a causative agent, and only the actual process determines future real world results.

PS: If you think that this was a silly exercise that didn’t need to be done, you haven’t read the comments section at my essay at Dot Earth. It never hurts to take a quick pass over the basics once in a while.

# # # # #

222 Comments
DougS
October 17, 2013 9:02 pm

Excellent presentation of the underlying problem Kip, right on target.

graphicconception
October 17, 2013 9:22 pm

Statistics are strange things. Some thoughts:
Lord Rutherford is supposed to have said: “If your result needs a statistician then you should design a better experiment.”
90% of lung cancer victims have smoked. 100% of them have drunk water.
Odds of a million to one are considered remote. However, I can do something that has odds of, approximately 80,658,175,170,943,878,571,660,636,856,403,766,975,289,505,440,883,277,824,000,000,000,000 to 1 against any time you like. I just need to put a deck of cards in a line.

October 17, 2013 9:28 pm

I liken it to a person walking up a hill and down the other side, and measuring their altitude at every step. Even if we know exactly what the length of their stride is, we cannot predict from the past data what their altitude will be on the next stride unless we know exactly what the shape of the hill is. In fact, if all we have is the data from the first 25% of the journey, the data would provide a trend suggesting the person is headed for outer space.
Interestingly, if we stopped our data collection when the person was 5 steps past the crest of the hill and on their way back down, we could say, and correctly so, that the person’s last 10 steps were the highest in the entire record of the journey. It would not change the fact that the person is going down hill.

Karl
October 17, 2013 9:36 pm

It is amazing how many commenters still, even after reading the article, cling to the idea of a predictive trend.
A trend never predicts future values. By definition, as the author pointed out, a trend[line] is fit to the data (not the other way around).
Some commenters seem to be unable to uncouple the trend from the process beneath.
Measurements Monday, Tuesday, Wednesday, and Thursday, of empty milk bottles on the front porch being, 1,2,3, and 4 respectively create a trend that predicts 5 empty milk bottles on Friday.
Unfortunately for the trend based prediction user, there are only 4 quart bottles of milk, and the milkman brings new milk on Fridays, so the correct answer is zero.
The trend is utterly useless as a predictor — always.

HankHenry
October 17, 2013 10:00 pm

Trends are an illusion

Gogs
October 17, 2013 10:22 pm

This is what worries me about share traders who use charts and “technical analysis” as predictions. Expressing the price as trying to break resistance lines and all that.
Prediction, it ain’t!

jimmi_the_dalek
October 17, 2013 10:25 pm

But half the articles on this site back predictions based on trends, usually of the form “it is going to get colder”. Are you saying all these articles are worthless?

observa
October 17, 2013 10:38 pm

Young Antounie Caen’s schoolteacher-
Swans are white, always have been white and always will be young Antounie, so get that through your thick head and pay attention in class meboy.
http://www.svswans.com/black.html
gnomish-
“if the underlying process is regular and invariant, then trend is, indeed, of predictive value with great accuracy.
for example, i need not specify planetary orbital paths nor gravitational constants to tell you that between now and a year from now there will be 365 cycles of day/night.”
Black swan event and poof! Just that we are lulled into thinking that it will be so from trend but should we ever have reason to believe otherwise then a helluva lot of other theories and science will evaporate instantly.

Fred the Statistician
October 17, 2013 10:38 pm

“The trend is utterly useless as a predictor — always.”
This is absolutely incorrect. In the absence of any other data, past trends are the BEST predictor of future values.
This entire article shows a complete failure to understand statistics. Statistics can be used to make predictions, based on certain assumptions. If those assumptions are correct, the prediction is MORE LIKELY to be correct. If the assumptions are wrong, then the prediction is LESS LIKELY to be correct. Nowhere are guarantees made.
The correct prediction in the article would have been “Given the assumptions we are making (iid, normal, etc), we predict that it is most likely that there will be 110 buttons, give or take (a very large number).” No statistician would EVER say “There WILL BE 110 buttons.” Because statisticians actually understand statistics.
“We can’t know the future” is an utterly silly argument. Will the sun rise in the East tomorrow? I hope you didn’t say “yes”, because you can’t know the future! If I let go of this pencil I’m holding in the air, will it fall to the ground? I hope you didn’t say “yes”, because you can’t know the future!
Bah. The only thing this article shows is how many supposedly educated people can utterly fail to understand simple concepts. Statistical predictions are not Truth, they are well formed Guesses. And guesses can be wrong. Anyone, like the author, that fails to understand this should stay well away from statistics.

October 17, 2013 10:43 pm

This analysis fails to mention response times when you are dealing with a time series. If you consider response times, you are left with a strange result, that although a trend cannot be used to predict a future term, it can estimate unpredictable future terms with various degrees of lack of confidence.
An earlier blogger has used the case of planetary orbits. These have long response times in regard to position relative to a datum. If I am given a graph of daily positions of a planet, I can estimate a position in the next day to a high degree of accuracy.
If, however, I am counting buttons, which my wife can throw away in a second, then I would not try to estimate ahead. The response time can be very short, like her temper can.

observa
October 17, 2013 11:07 pm

the Statistician-
“This entire article shows a complete failure to understand statistics. Statistics can be used to make predictions, based on certain assumptions. If those assumptions are correct, the prediction is MORE LIKELY to be correct. If the assumptions are wrong, then the prediction is LESS LIKELY to be correct. No where are guarantees made.”
That’s what the article points out- ie the bleeding obvious and it’s just reminding us all about the implicit assumptions underlying the sun rising tomorrow or the pencil dropping. We answer ‘yes’ and tend to ignore the qualifying assumptions underlying it.

Mike McMillan
October 17, 2013 11:14 pm

Here’s a Revkin interview that’s almost reasonable.

Other_Andy
October 17, 2013 11:17 pm

Mark XR says:
October 17, 2013 at 8:23 pm
So, just because there has been a pause in rise of global mean surface temperature does not mean we can reject the physics of global warming? Rats.
If you mean, post modern IPCC physics, that CO2 is the key driver of modern climate change, yes we can. Historical evidence has all but debunked the correlation between global temperature and CO2.

Leonard Lane
October 17, 2013 11:28 pm

Steve M. from TN says:
October 17, 2013 at 7:43 pm
jim2 says:
October 17, 2013 at 7:26 pm
He will have between 0 and a googolplex of buttons. I’m 95% certain of it.
I’m 100% sure he’ll have between 0 and infinity (inclusive).
Not exactly. He will have between 0 and a countable infinite number of buttons, all terms of the infinite series being positive even numbers (unless he goes “broke” and winds up with zero). He will not have any odd number of buttons.

albertalad
October 17, 2013 11:53 pm

You are correct – stats are only as good as the information entered. And as always there are as many economists predicting the future as there are economists. One may as well watch Star Trek. No matter what anyone says here – whatever their forward programming, it is a guess based on a past that may or may not be anywhere near accurate. Regarding inventory – shelves being without a product is common. If climate is an indication then it is no wonder why models are next to useless. What we do know is there are ice ages and interglacial periods. For the rest, our present understanding is next to nil – like the weather.

Scarface
October 18, 2013 12:02 am

What ever happened to ‘The trend is your friend’?
Certain kinds of trading rely heavily on this principle.

October 18, 2013 12:03 am

How do you know that BC is reporting the number of buttons accurately? What if his buttonometer hasn’t been calibrated? What if he is in an Urban Button Island? Maybe he adjusted the count to conform to the Summary for Button Makers.
Heh – there haven’t been any buttons for 17 years.

thingadonta
October 18, 2013 12:12 am

Gary Larson’s The Far Side’s 4 personality types.
1. The glass is half empty.
2. The glass is half full.
3. Half empty!..no wait.. Half full.!….no wait
4. Hey I ordered a cheeseburger.!
1. The buttons are decreasing.
2. The buttons are increasing
3. We don’t know whether they are increasing or decreasing, or staying the same.
4. Unless we reduce our button consumption, things will get much worse than we thought, and the world will end as we know it.

DrJohnGalan
October 18, 2013 12:28 am

In the UK, I think the few minutes spent reading this article would be of great benefit both to Ed Davey (if he is capable of understanding it) and Sir Mark Walport, as well as several members of Parliament’s Energy and Climate Change Committee.
“Back to basics” on the use of statistics, with a reminder of the number of potential interconnected variables that might contribute to a model of global climate, should get them to pause for thought. Clearly this will not happen, because we are dealing with a religious type of belief (which history shows us is all about the political power to soak the poor), not statistics or science.

A Crooks
October 18, 2013 12:30 am

If I could be bothered reading all this, I think I would disagree.
Sure, the trend is a product of the data, but if the data is measuring something which has inertia we can expect data to continue on trend. Given that global weather has inertia one can expect trend to continue. If there is a sixty year cycle that is obvious over one hundred and fifty years of data one would expect a certain amount of time to pass before that cycle decays and a new cycle / trend is established. There is a long trend out of the Little Ice Age. That will not just reverse. The value of the trend is that it enables you to see when the data is starting to stray off the trend. This is something the IPCC never worked out. They were silly enough to draw a straight line through a cherry-picked bit of the data and call it significant when the bigger picture screamed something else. Their model is broken and they just don’t like the reality imposed by the real data (trend).
If you think this is wrong – consider investors in stock markets whose whole living is dependent on picking trends in data and then working out when the data has gone off trend in a significant way. Like climate researchers and climate, each investor does not need to know the detailed internal workings of a company to be able to see when the data is straying off trend and a buy/sell decision be made. I don’t need to know anything about the science of climate change to see it’s drifted off the IPCC’s trend and the whole AGW/CO2 business is now a SELL option. There was an excellent post just the other day on this very topic.
To say that you can never predict the future based on a trend leaves us saying we can never be sure the sun will come up in the morning. This may be true but it is an excellent working hypothesis.

Gareth Phillips
October 18, 2013 12:37 am

“Past performance etc. does not guarantee future returns” is a caution given to any small investor in the UK. However, past behaviours do indicate the likelihood of future behaviours. When working in forensic mental health we used various tools to assess the risks that patients with mental health problems who were also offenders posed to society. These assessments were in the main pretty useful. These types of assessments can also extend to how a person will react to a given piece of research. If they have always rejected a certain stance in the past, they are likely to do so again whatever the quality of the study. This underpins my belief that there is one heck of a lot more subjectivity in climate science than we recognise, a stance that makes me unpopular with both the consensus and skeptic sides, many of whom like the reassurance of definite indisputable facts. From a personal perspective I see two general trends in opinions, but millions of different specific views on the issue, a number usually correlated to the number of people who are asked.

LevelGaze
October 18, 2013 12:41 am

About trends –
Yes, the trend is indeed your friend.
But wise stockbrokers know that a trend continues until it doesn’t.
And, Observa
Here in Australia all our swans are black.

Kelvin Vaughan
October 18, 2013 1:32 am

Definition of Trend:
1. The general direction in which something tends to move.
2. A general tendency or inclination.
3. Current style.

Kelvin Vaughan
October 18, 2013 1:37 am

observa says:
October 17, 2013 at 10:38 pm
Young Antounie Caen’s schoolteacher-
Swans are white, always have been white and always will be young Antounie, so get that through your thick head and pay attention in class meboy.
He forgot to add “The science is settled”!

October 18, 2013 1:43 am

If you think this is wrong – consider investors in stock markets whose whole living is dependent on picking trends in data and then working out when the data has gone off trend in a significant way. Like climate researchers and climate, each investor does not need to know the detailed internal workings of a company to be able to see when the data is straying off trend and a buy/sell decision be made. I don’t need to know anything about the science of climate change to see it’s drifted off the IPCC’s trend and the whole AGW/CO2 business is now a SELL option. There was an excellent post just the other day on this very topic.