The Button Collector or When does trend predict future values?

How many buttons will he have on Friday? (Photo credit: Wikipedia)

Guest essay By Kip Hansen

INTRO: Statistical trends never determine future values in a data set. Trends do not and cannot predict future values. If these two statements make you yawn and say “Why would anyone even have to say that? It is self-evident,” then this essay is not for you; you may go do something useful for the next few minutes while others read this. If you had any other reaction, read on. For background, you might want to read this at Andrew Revkin’s NY Times Dot Earth blog.

I have an acquaintance who is a fanatical button collector. He collects buttons at every chance, stores them away, thinks about them every day, reads about buttons and button collecting, spends hours every day sorting his buttons into different little boxes and bins, and worries about safeguarding his buttons. Let’s call him simply The Button Collector, or BC for short.

Of course, he doesn’t really collect buttons, he collects dollars, yen, lira, British pounds sterling, escudos, pesos…you get the idea. But he never puts them to any useful purpose, neither really helping himself nor helping others, so they might as well just be buttons, and so I call him The Button Collector. BC has millions and millions of buttons – plus 102. For our ease today, we’ll consistently leave off the millions and millions and we’ll say he has just the 102.

On Monday night, at 6 PM, BC counts his buttons and finds he has 102 whole buttons (we will have no half buttons here please); Tuesday night, he counts again: 104 buttons; on Wednesday night, 106. With this information, we can do wonderful statistical-ish things. We can find the average number of buttons over three days (both mean and median). Precisely 104.

We can determine the statistical trend represented by this three-day data set. It is precisely +2 buttons/day. We have no doubts, no error bars, no probabilities (we have 100% certainty for each answer).
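For readers who want to check the arithmetic, here is a minimal sketch in Python. The day numbering (0, 1, 2 for Monday through Wednesday) and the helper name `least_squares_slope` are illustrative assumptions, not part of the essay; the three counts are the ones given above.

```python
from statistics import mean, median

# Day index is an assumption for illustration: 0 = Monday, 1 = Tuesday, 2 = Wednesday.
days = [0, 1, 2]
counts = [102, 104, 106]  # button counts from the essay


def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs (hypothetical helper)."""
    x_bar = mean(xs)
    y_bar = mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


average = mean(counts)                       # arithmetic mean of the three counts
middle = median(counts)                      # median of the three counts
trend = least_squares_slope(days, counts)    # buttons per day

print(average, middle, trend)
```

The numbers describe the three measurements already taken. Nothing in the calculation refers to any day after Wednesday.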

How many buttons will there be Friday night, two days later? 

If you have answered with any number or a range of numbers, or even let a number pass through your mind, you are absolutely wrong.

The only correct answer is: We have no idea how many buttons he will have Friday night because we cannot see into the future.

But, you might argue, the trend is precisely, perfectly, scientifically statistically +2 buttons/day and two days pass, therefore there will be 110 buttons. All but the final phrase is correct; the last part, “therefore there will be 110 buttons,” is wrong.

We know only the numbers of buttons counted on each of the three days – the actual measurements of the number of buttons. Our little three-point trend is just a graphic report about some measurements. We also know, importantly, the model for taking the measurements – exactly how we measured – a simple count of whole buttons, as in 1, 2, 3, etc.

We know how the data was arrived at (counted), but we don’t know the process by which buttons appear in or disappear from BC’s collection.

If we want to be able to have any reliable idea about future button counts, we must have a correct and complete model of this particular process of button collecting. It is really little use to us to have a generalized model of button collecting processes because we want a specific prediction about this particular process.

Investigating, by our own observation and close interrogation of BC, we find that my eccentric acquaintance has the following apparent button collecting rules:

  • He collects only whole buttons – no fractional buttons.
  • Odd numbers seem to give him the heebie-jeebies; he only adds or subtracts even numbers of buttons, so that he always has an even number in the collection.
  • He never changes the total by more than 10 buttons per day.

These are all fictional rules for our example; of course, the actual details could have been anything. We then work these into a tentative model representing the details of this process.

So now that we have a model of the process, how many buttons will there be when counted on Friday, two days from now?

Our new model still predicts 110, based on trend, but the actual number on Friday was 118.

The truth being: we still didn’t know and couldn’t have known.

What we could know on Wednesday about the value on Friday:

  • We could know the maximum number of buttons – 106 plus ten twice = 126
  • We could know the minimum – 106 minus ten twice = 86
  • We could know all the other possible numbers (all even, all between 86 and 126). I won’t bother here, but you can see it is 106+0+0, 106+0+2, 106+0+4, etc.
  • We could know the probability of the answers, some answers being the result of more than one set of choices (such as 106+0+2 and 106+2+0).
  • We could then go on to figure five day trends, means and medians for each of the possible answers, to a high degree of precision. (We would be hampered by the non-existence of fractional-buttons and the actual set only allowing even numbers, but the trends, means and medians would be statistically precisely correct.)
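The bullets above can be sketched by brute-force enumeration. This is only an illustration under one loud assumption that the essay does not state: every permitted daily change (an even number from -10 to +10) is treated as equally likely. BC's real choices need not be uniform, which is exactly why the resulting table cannot tell us Friday's count. The function name `friday_distribution` is hypothetical.

```python
from collections import Counter
from itertools import product

START = 106                  # Wednesday's count, from the essay
STEPS = range(-10, 11, 2)    # allowed daily changes: even, at most 10 either way
DAYS = 2                     # Thursday and Friday


def friday_distribution(start=START, steps=STEPS, days=DAYS):
    """Map each reachable Friday count to its share of all step sequences.

    ASSUMPTION (not in the essay): each allowed daily change is equally likely.
    """
    paths = Counter()
    for changes in product(steps, repeat=days):
        paths[start + sum(changes)] += 1
    total = len(steps) ** days
    return {value: count / total for value, count in sorted(paths.items())}


dist = friday_distribution()
print(min(dist), max(dist), len(dist))   # lowest, highest, number of distinct counts
print(dist[110], dist[118])              # trend-based guess vs. the count actually observed
```

Under that assumption, both 110 and 118 are simply two of the reachable values. The enumeration bounds the possibilities but does not single out the one that happened.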

What we couldn’t know:

  • How many buttons there would actually be on Friday.

Why couldn’t we know this? We couldn’t know because our model – our button collecting model – contains no information whatever about causes. We have modeled the changes, the effects, and some of the rules we could discover. We don’t know why and under what circumstances and motivations the Button Collector adds or subtracts buttons – we don’t really understand the process, BC’s button collecting, because we have no data about the causes of the effects we can observe or the rules we can deduce.

And, because we know nothing about causes in our process, our model of the process, being magnificently incomplete, can make no useful predictions whatever from existing measurements.

If we were able to discover the causes effective in the process, and their relative strengths, relationships and conditions, we could improve our model of the process.

Back we go to The Button Collector, and under a little stronger persuasion he reveals that he has a secret formula for determining whether to add to or subtract from the number of buttons previously observed, and by how much. Armed with this secret formula, which is precise and immutable, we can now adjust our model of this button collecting process.

Testing our new, improved, and finally adjusted model, we run it again, pretending it is Wednesday, and see if it predicts Friday’s value. BINGO! ONLY NOW does it give us an accurate prediction of 118 (the already known actual value) – a perfect prediction of a simple, basic, wholly deterministic (if tricky and secret) process by which my eccentric acquaintance adds and subtracts buttons from his collection.

What can and must we learn from this exercise?

1. No statistical trend, no matter how precisely calculated, regardless of its apparent precision or length, has any effect whatever on future values of a data set – never, never and never. Statistical trends, like the data from which they are created, are effects. They are not causes.

2. Models, not trends, can predict, project, or inform about possible futures, to some sort of accuracy. Models must include all of the causative agents involved which must be modeled correctly for relative effects. It takes a complete, correct and accurate model of a process to reliably predict real world outcomes of that process. Models can and should be tested by their abilities to correctly predict already known values within a data set of the process and then tested again against a real world future. Models also are not themselves causes.

3. Future values of a thing represented by a metric in a data set output from a model are caused only by the underlying process being modeled – only the actual process itself is a causative agent and only the actual process determines future real world results.

PS: If you think that this was a silly exercise that didn’t need to be done, you haven’t read the comments section at my essay at Dot Earth. It never hurts to take a quick pass over the basics once in a while.

# # # # #


222 Comments
rogerknights
October 18, 2013 4:39 am

LevelGaze says:
October 18, 2013 at 12:41 am
About trends –
Yes, the trend is indeed your friend.
But wise stockbrokers know that a trend continues until it doesn’t.

“The trend is your friend until the bend in the end.”

Gareth Phillips
October 18, 2013 5:01 am

I think people are confusing likelihoods with certainties. Trends give a reasonable indication of a likelihood when used in conjunction with other considerations; 100% certainties are more related to prophecy. My fellow believers in the luke-warm camp have made this error on many occasions by looking at trends in isolation and making predictions which have failed to materialise.

William Abbott
October 18, 2013 5:02 am

Jan Smit and acementhead
Jan the link says – “And semi-scientific it must be, because there are, of course, absolutely no demographic data available for 99 percent of the span of the human stay on Earth. Still, with some speculation concerning prehistoric populations, we can at least approach a guesstimate of this elusive number.”
Ace asserts you are wrong, in which he is almost certainly right, and then he provides another “correct” estimate, in which he is almost certainly wrong. A range of somewhere between your two percentages is a better guess. But it is oh so wide and broad. The “some speculation” is like “total speculation” and so the answer is totally speculative.
And now the question: What do past population trends tell us about future population trends? Nobody sixty years ago would have been able to predict current population trends from past population trends.
Kip’s point is very well taken and very humbling: “We know nothing about the future.” “We know damned little about probabilities.” And yet... what else do we have?
As the pilot observed: “The runway is always long enough, until it isn’t.”
And as Oliver Cromwell said, “Who can love to walk in the dark? Yet Providence doth so often dispose”

Ted Carmichael
October 18, 2013 5:05 am

– thanks for the reply. You said, “We do actually have no idea what will happen next based on the trend.” It is the “no idea” part that I object to. That’s what I meant about it not being black or white: ‘know’ vs. ‘don’t know’ is the wrong way to look at it. The trend is useful information even though it is not perfect information. Saying “no idea” is akin to saying the trend gives no information at all, and I would say that is wrong. Cheers.

Jan Smit
October 18, 2013 5:17 am

William Abbott
Yes, William, of course you are right. But I was trying to tie off a potential off-topic discussion started inadvertently by myself due to my own laxity. Accepting I had been sloppy and acknowledging that an acceptable guesstimate might be closer to acementhead’s 7% was my way of willingly giving ground on matters of little consequence so we could all focus on more important aspects of this discussion.
Though I’m sure we could all have a jolly interesting and constructive discussion on estimates of earth’s past population, this is neither the time nor the place…

Grump Old Man
October 18, 2013 5:21 am

This is all too much for us simple folk. I’ll stick with the Amish.

October 18, 2013 5:24 am

” Fred the Statistician says:
October 17, 2013 at 10:38 pm
“The trend is utterly useless as a predictor — always.”
This is absolutely incorrect. In the absence of any other data, past trends are the BEST predictor of future values.
This entire article shows a complete failure to understand statistics. Statistics can be used to make predictions, based on certain assumptions. If those assumptions are correct, the prediction is MORE LIKELY to be correct. If the assumptions are wrong, then the prediction is LESS LIKELY to be correct. Nowhere are guarantees made.
The correct prediction in the article would have been “Given the assumptions we are making (iid, normal, etc), we predict that it is most likely that there will be 110 buttons, give or take (a very large number).” No statistician would EVER say “There WILL BE 110 buttons.” Because statisticians actually understand statistics.”
__________________________________________________________________________
And don’t Climate Alarmist Scientists say there WILL be catastrophe? And do statisticians actually understand statistics? I doubt it. Some statisticians understand some statistics and that would be as far as I would go.
“A Crooks says:
October 18, 2013 at 12:30 am
If you think this is wrong – consider investors in stock markets whose whole living is dependent on picking trends in data and then working out when the data has gone off trend in a significant way. Like climate researchers and climate, each investor does not need to know the detailed internal workings of a company to be able to see when the data is straying off trend and a buy/sell decision should be made. I don’t need to know anything about the science of climate change to see it’s drifted off the IPCC’s trend and the whole AGW/CO2 business is now a SELL option. There was an excellent post just the other day on this very topic.”
___________________________________________________________________________
But are you a stock investor? Do you have any idea how many 100’s of different types of charts there are and you are trying to line up a trend? I personally used about 5 different stats charts and still would get it wrong about 20% of the time. These days the really big boys use their computers and very fancy algorithms to fire out buy and sell orders but the basics are still the different statistical methods.

Ian W
October 18, 2013 5:26 am

It was an interesting read. However, if the intent was to demonstrate the foolishness of using ‘statistical’ trends in climate, then the allegory of button collecting (counts from Monday to Wednesday, what will Friday be?) was about using trends in weather. The allegory should have been: “I know the button numbers back from June to October; what will the button count be in October in 10 years’ time?”
This would have stopped all the comments from people saying they use short term trends all the time – as weather forecasters can – although they can still get it wrong. Once outside a temporal comfort zone though things are different, unless you are a climate ‘scientist’.

Paul Coppin
October 18, 2013 5:27 am

Having read both the DotEarth page (and its resident commenters are something else…) and the comments here, I’m somewhat bemused by the number of people who do not understand Kip’s simple message. I’m particularly intrigued by the apparent correlation between those with advanced knowledge and skills, and their lack of understanding of what Kip has written. What I do read in many posts, are rationalizations for beliefs that are based on the statistical analysis of past events.
Probabilities are not certainties. Statistical analysis allows us to take past facts and develop beliefs about future facts, but it never allows us to actually know the facts – that only happens when the future is past, and we add the future facts to the dataset.
Knowledge is always based on history. Knowledge of the future doesn’t exist. A belief about some aspect of the future is all there is. Statistical analysis and modelling provide us with degrees of comfort that our beliefs are, or are not, likely to happen, but the uncertainty is never completely quantifiable. It’s improved by the causalities we know and understand, and therefore our beliefs have a higher degree of comfort, and mathematically a higher probability of certainty, but they are never certain. Even the probability of tomorrow is not a certainty, however likely the probability is.
I note that in many comments on both sites, individuals with high skills appear to have a stronger belief that their future knowledge is more certain. That’s a rationalization for their beliefs about the future.
In the climate science realm, as I suspect in many sciences today, the faith in these models and tools has allowed beliefs to take precedence over facts, the implied point of Kip’s argument. As a consequence, we have predictions (beliefs) presented as certainties, and some of the most convinced that these beliefs are certainties are scientists, the group most expected to understand they are not.

Steve Koch
October 18, 2013 5:29 am

“Models, not trends, can predict, project, or inform about possible futures, to some sort of accuracy. Models must include all of the causative agents involved which must be modeled correctly for relative effects. It takes a complete, correct and accurate model of a process to reliably predict real world outcomes of that process. Models can and should be tested by their abilities to correctly predict already known values within a data set of the process and then tested again against a real world future. Models also are not themselves causes.”
Let’s say that a statistician is given a dataset containing the length of the day on earth for the last 1000 years. He is not told what the numbers mean (i.e. they are just numbers) or how they relate to real world processes. A competent statistician will easily be able to reliably predict the next number in the sequence.

JohnWho
October 18, 2013 5:30 am

Key, to me, is this:
“Models must include all of the causative agents involved which must be modeled correctly for relative effects. It takes a complete, correct and accurate model of a process to reliably predict real world outcomes of that process.”
Regarding climate models, they must not be including all of the causative agents or their relative effects.
So, whether they are giving “predictions” or “projections”, they will be wrong.

Rich
October 18, 2013 5:33 am

Interesting to read this in conjunction with this:
http://wattsupwiththat.com/2013/10/01/if-climate-data-were-a-stock-now-would-be-the-time-to-sell/
Sometimes you have to choose and to choose you have to guess. At least statistics can confine your guess to a probable region.

Bruce Cobb
October 18, 2013 5:34 am

If, two days later, the number of buttons is the same, or has even dropped by 1, one can always claim that the buttons which “should” be there are instead somewhere in the basement, under the floor. Problem solved.

William Abbott
October 18, 2013 5:38 am

It’s hardly off topic. It’s a perfect example of Kip Hansen’s excellent analogy and what I want to call “Cromwell’s Dilemma.” Cromwell wanted the future revealed to him through the agencies of divine providence, but concluded the information’s not available to the mortal man. We want the models, the trends, the statistics, to reveal the future to us – but they don’t. They can’t. They speak of probabilities and probabilities tell us nothing of the future. We are left making decisions about the future, in the dark.
I know the population statistic was introduced as a humorous lead, but the percentage you used is taken as approximately knowable by all and it isn’t – and population trends happen to be a great example of how unpredictive trends are. Trends are predictive until they aren’t. So they tell us nothing and we have to “walk in the dark.”

DirkH
October 18, 2013 5:47 am

Paul Coppin says:
October 18, 2013 at 5:27 am
“Knowledge is always based on history. Knowledge of the future doesn’t exist.”
Ya well, the number of buttons of the collector, just like global average temperature, is brown noise, because there is an integrating element (the existing collection / thermal inertia resp.).
That gives us a probability distribution for the result on day today+n.
Mark XR says:
October 17, 2013 at 8:23 pm
“So, just because there has been a pause in rise of global mean surface temperature does not mean we can reject the physics of global warming? Rats.”
The opposite is true. Because the Null hypothesis (that warming out of the LIA proceeds as it always has) still suffices to explain everything, we can apply Occam’s razor and discard the CO2AGW theory as unnecessary; it does not add explanatory power and, more likely, it WORSENS our ability to forecast.
So you warmists, predict something that we would not have been able to predict with the Null hypothesis and then you MIGHT have something. Refute the Null hypothesis. For now you have NOTHING.

Jan Smit
October 18, 2013 5:58 am

William Abbott
Please, William, I appreciate what you are saying and find myself completely in agreement with it. I regret using the 50% statistic as it’s based on little more than hearsay and has now exercised our minds too much already. Given its total lack of weight for either historical or predictive purposes, I put no store whatsoever in such a figure – it was just an off-the-cuff remark to illustrate a different point. I am perfectly aware that there is no way on God’s earth we can be anything but very uncertain how many people have ever lived. Even today’s population estimates are said by some to be widely off the mark. I thought the phrase “… it is estimated …” would make it nuanced enough to prevent such feedback, but I misjudged it. My bad.
Can we leave it at that?

AlexS
October 18, 2013 6:08 am

The author plays with words. A trend can be a model.
Trends might indeed show the future and might indeed predict future values. The most common situation is when the trend itself affects people’s behavior.
We can also discuss mob or pack behavior, and “If you tell a lie big enough and keep repeating it, people will eventually come to believe it”.
After a string of defeats I can predict that a certain trainer will not be in front of that team for much longer. Will this be true in all cases? No.
Can I predict the increasing odds of a divorce based on increasing discussions between a couple? Yes, I can.
Or if a new technology appears, even if I don’t know anything about it, but many adopt it, then it might show a trend, which means there are odds that the current increasing trend is repeated in the future.
Of course, using just this kind of prediction is typically of inferior quality in most cases, but it has its value, and sometimes it is the only or the least bad solution.
Quantity has a value of its own.

Janice
October 18, 2013 6:22 am

Something else that can affect statistical predictions is the boundary conditions. Does the Button Collector have unlimited storage for his button collection? Does the Button Collector have an unlimited selection of buttons available to be added to the collection? Or, does storage and/or selection have the possibility of changing over time?
Worse yet, is there anyone who double-checks the actual numbers of buttons (quality assurance) that are in the collection?

October 18, 2013 6:28 am

The thesis of this post is correct. It is also irrelevant – very few people would confuse an extension of a past trend into the future for an infallible prediction.
Consider for a moment how the intelligence of animals and, finally, humans evolved. Did it start with sophisticated statistical methods and philosophical rigour? Of course not. We have two cats at home – one smart, the other not so much. The smart one is extremely good at picking up cues. For example, at the breakfast table, I first have some toast and jam, which she doesn’t seem to pay any attention to. However, afterwards, I make some salami or chicken sandwiches for my brown bag – and she is immediately on it and reminds me of her presence. Her cue? A little break between toast and sandwich. If I happen to pause a bit longer between two slices of toast, she will show up prematurely. (This is a simplified version – she follows a whole set of cues, she really is a smart beast, and as a consequence, too fat.)
What happens if she is wrong? She walks off to return again later, and the next day she will still try the same – because this strategy soundly beats a random guess. That’s what our brains give us – educated guesses. Of course, it’s nice when we can go beyond that, but much of the time, this is what we do, and as the course of evolution shows, there is nothing wrong with it.

October 18, 2013 6:33 am

“…fanatical button collector. He collects buttons at every chance, stores them away, thinks about them every day, reads about buttons and button collecting, spends hours every day sorting his buttons into different little boxes and bins and worries about safeguarding his buttons…”
On reading this, did anyone else think of Stanley Howler, Moist von Lipvig, Terry Pratchett and pins? 🙂 Now I have to go read “Going Postal” again.
As to the article, trends in the absence of any other data simply provide us with the initial assumption that what it’s doing now is what it will keep on doing, for now. As the trend is studied and its actual (and possibly multiple) causes are sorted out, the importance of the trend as a predictor diminishes, and the factors that caused the trend become more important.
w/regard to global warming, it appears that its proponents have spent 20 years denying any and all factors responsible for the trend except for their initial assumption that it was all CO2.
They will be the laughingstock of future history books, taking their rightful place alongside the Millerite movement’s “Great Disappointment” and of course Piltdown Man.

Doug Huffman
October 18, 2013 6:35 am

Nassim Nicholas Taleb, The Black Swan: The Impact of the Highly Improbable
E. T. Jaynes, Probability Theory: The Logic of Science.
Connecting the dots on an epistemological map ignores the complexity between the dots, however closely they are placed. Reality is fractally complex.

Doug Huffman
October 18, 2013 6:37 am

Michael Palmer says: October 18, 2013 at 6:28 am “The thesis of this post is correct. It is also irrelevant – very few people would confuse an extension of a past trend into the future for an infallible prediction. ” Still we indulge in progressive historicism. Santayana’s dictum was a caution and not a prescription for action. Karl Popper, The Poverty of Historicism, and The Open Society and Its Enemies.

David L.
October 18, 2013 6:58 am

The academics will tell you that you are describing a purely empirical model and therefore extrapolation is not accurate. Now the learned academics build “first principles” models and therefore extrapolation is possible. For example, one can theoretically calculate the trajectory of a projectile based only on velocity and angle of departure from the horizon (assuming no air resistance, etc.)
But that’s where the learned academics fool themselves: they don’t really ever know the complete first principles model. Friction, air resistance, etc. combine to make the full calculation impossible for all but the most idealized cases. Even artillery calculations utilize “fudge factors” that are required to dial in the targeting calculations for the given situation. Accounting for powder charge, mass, velocity, wind speed and direction, temperature, etc. can get you close but not guaranteed “bullseye”.
The climate is no less complex and no one has a crystal ball.

October 18, 2013 6:59 am

The use of statistical tools by “scientists” who do not understand the mathematics is a big problem. I agree that linear trends on small samples from a large complex non-linear population have little predictive quality other than finding out that the system is non-linear. Weather forecasters generally hedge their bets and limit their “predictions” to a few days.

ferd berple
October 18, 2013 7:14 am

Fred the Statistician says:
October 17, 2013 at 10:38 pm
Will the sun rise in the East tomorrow?
========
The sun will rise in a direction that is mostly eastward, but this will change day to day, and if you based a prediction of where it would rise in future months on where it was last month, you would be right or wrong largely as a matter of chance.
Cyclical data does provide predictability, based on the observation that nature tends to be cyclical; otherwise extinction events at the extremes would long ago have made us, the observers, extinct. Trying to fit linear approximations to cyclical data is statistical nonsense.