The Button Collector, or: When does a trend predict future values?

How many buttons will he have on Friday? (Photo credit: Wikipedia)

Guest essay by Kip Hansen

INTRO: Statistical trends never determine future values in a data set. Trends do not and cannot predict future values. If these two statements make you yawn and say “Why would anyone even have to say that? It is self-evident,” then this essay is not for you; you may go do something useful for the next few minutes while others read it. If you had any other reaction, read on. For background, you might want to read this at Andrew Revkin’s NY Times Dot Earth blog.

I have an acquaintance who is a fanatical button collector. He collects buttons at every chance, stores them away, thinks about them every day, reads about buttons and button collecting, spends hours every day sorting his buttons into different little boxes and bins, and worries about safeguarding his buttons. Let’s call him simply The Button Collector or BC, for short.

Of course, he doesn’t really collect buttons; he collects dollars, yen, lira, British pounds sterling, escudos, pesos…you get the idea. But he never puts them to any useful purpose, neither really helping himself nor others, so they might as well just be buttons, and so I call him The Button Collector. BC has millions and millions of buttons – plus 102. For our ease today, we’ll consistently leave off the millions and millions and we’ll say he has just the 102.

On Monday night, at 6 PM, BC counts his buttons and finds he has 102 whole buttons (we will have no half buttons here please); Tuesday night, he counts again: 104 buttons; on Wednesday night, 106. With this information, we can do wonderful statistical-ish things. We can find the average number of buttons over three days (both mean and median). Precisely 104.

We can determine the statistical trend represented by this three-day data set. It is precisely +2 buttons/day. We have no doubts, no error bars, no probabilities (we have 100% certainty for each answer).
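
For readers who want the arithmetic spelled out, here is a minimal sketch in Python (my own illustration, not part of the original essay) that computes the three-day mean, median and trend from the counts above:

    from statistics import mean, median

    # Observed button counts (millions and millions omitted, as agreed)
    days = [0, 1, 2]             # Monday, Tuesday, Wednesday
    counts = [102, 104, 106]

    print(mean(counts), median(counts))    # 104 104

    # Trend as the least-squares slope through the three points;
    # with perfectly linear data this comes out to exactly +2 buttons/day.
    n = len(days)
    slope = (n * sum(d * c for d, c in zip(days, counts)) - sum(days) * sum(counts)) \
            / (n * sum(d * d for d in days) - sum(days) ** 2)
    print(slope)                           # 2.0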

How many buttons will there be Friday night, two days later? 

If you have answered with any number or a range of numbers, or even let a number pass through your mind, you are absolutely wrong.

The only correct answer is: We have no idea how many buttons he will have Friday night because we cannot see into the future.

But, you might argue, the trend is precisely, perfectly, scientifically statistically +2 buttons/day and two days pass, therefore there will be 110 buttons. All but the final phrase is correct, the last — “therefore there will be 110 buttons” — is wrong.

We know only the numbers of buttons counted on each of the three days – the actual measurements of the number of buttons. Our little three-point trend is just a graphic report about some measurements. We know also, importantly, the model for taking the measurements – exactly how we measured – a simple count of whole buttons, as in 1, 2, 3, etc.

We know how the data was arrived at (counted), but we don’t know the process by which buttons appear in or disappear from BC’s collection.

If we want to have any reliable idea about future button counts, we must have a correct and complete model of this particular process of button collecting. A generalized model of button-collecting processes is of little use to us, because we want a specific prediction about this particular process.

Investigating, by our own observation and close interrogation of BC, we find that my eccentric acquaintance has the following apparent button collecting rules:

  • He collects only whole buttons – no fractional buttons.
  • Odd numbers seem to give him the heebie-jeebies; he adds or subtracts only even numbers of buttons, so the collection always contains an even number.
  • He never changes the total by more than 10 buttons per day.

These are all fictional rules for our example; of course, the actual details could have been anything. We then work these into a tentative model representing the details of this process.
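
As a rough sketch only (the function name and representation below are mine, not the essay's), those three rules can be written down concretely as the set of daily changes our tentative model will allow:

    # Tentative model of the observed rules: whole buttons only,
    # even changes only, never more than 10 buttons added or removed per day.
    def allowed_daily_changes(max_step=10):
        return [d for d in range(-max_step, max_step + 1) if d % 2 == 0]

    print(allowed_daily_changes())
    # [-10, -8, -6, -4, -2, 0, 2, 4, 6, 8, 10]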

So now that we have a model of the process: how many buttons will there be when counted on Friday, two days from now?

Our new model, still leaning on the trend, predicts 110; the actual number counted on Friday was 118.

The truth being: we still didn’t know and couldn’t have known.

What we could know on Wednesday about the value on Friday:

  • We could know the maximum number of buttons – 106 plus ten twice = 126
  • We could know the minimum – 106 minus ten twice = 86
  • We could know all the other possible numbers (all even, all somewhere between 86 and 126). I won’t bother listing them here, but you can see they run 106+0+0, 106+0+2, 106+0+4, etc. (a small sketch of this enumeration follows this list).
  • We could know the probability of each answer, some answers being the result of more than one set of choices (such as 106+0+2 and 106+2+0).
  • We could then go on to figure five-day trends, means and medians for each of the possible answers, to a high degree of precision. (We would be hampered by the non-existence of fractional buttons and by the actual set allowing only even numbers, but the trends, means and medians would be statistically precisely correct.)
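
A small enumeration along the lines of the list above (again only a sketch, and it assumes, as the essay never does, that every allowed daily change is equally likely) confirms the bounds and shows why some counts can be reached by more than one sequence of choices:

    from collections import Counter
    from itertools import product

    changes = [d for d in range(-10, 11) if d % 2 == 0]   # the allowed daily changes
    wednesday = 106

    # Every possible (Thursday change, Friday change) pair and the resulting count
    outcomes = Counter(wednesday + thu + fri
                       for thu, fri in product(changes, repeat=2))

    print(min(outcomes), max(outcomes))   # 86 126
    print(outcomes[126])                  # 1  -> only +10 then +10 reaches 126
    print(outcomes[108])                  # 10 -> e.g. 106+0+2 and 106+2+0, among others

    # Under the equal-likelihood assumption, each count's probability is
    # its path count divided by the 121 possible two-day sequences.
    total = sum(outcomes.values())        # 121
    print(outcomes[106] / total)          # 11/121: the paths that net out to zero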

What we couldn’t know:

  • How many buttons there would actually be on Friday.

Why couldn’t we know this? We couldn’t know because our model – our button collecting model – contains no information whatever about causes. We have modeled the changes, the effects, and some of the rules we could discover. We don’t know why, or under what circumstances and motivations, the Button Collector adds or subtracts buttons. We don’t really understand the process of BC’s button collecting, because we have no data about the causes of the effects we can observe or of the rules we can deduce.

And, because we know nothing about causes in our process, our model of the process, being magnificently incomplete, can make no useful predictions whatever from existing measurements.

If we were able to discover the causes effective in the process, and their relative strengths, relationships and conditions, we could improve our model of the process.

Back we go to The Button Collector, and under a little stronger persuasion he reveals that he has a secret formula that determines, each day, whether to add or subtract buttons and how many. Armed with this secret formula, which is precise and immutable, we can now adjust our model of this button collecting process.

Testing our new, improved, and finally adjusted model, we run it again, pretending it is Wednesday, and see if it predicts Friday’s value. BINGO! ONLY NOW does it give us an accurate prediction of 118 (the already known actual value) – a perfect prediction of a simple, basic, wholly deterministic (if tricky and secret) process by which my eccentric acquaintance adds and subtracts buttons from his collection.
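
To make the idea of testing against an already known value concrete, here is a purely illustrative stand-in rule (the essay never reveals BC's real formula, so this one is invented for the sketch) that happens to reproduce the known Friday count:

    # Hypothetical stand-in for the secret formula - NOT the collector's real rule.
    # Each day's change is double the previous day's change, capped at 10,
    # starting from Wednesday's count of 106 and the observed +2/day change.
    def hindcast(wednesday=106, last_change=2, days=2, cap=10):
        count, change = wednesday, last_change
        for _ in range(days):
            change = min(2 * change, cap)   # stays even, stays within 10/day
            count += change
        return count

    print(hindcast())   # 118 -> matches the already known Friday value

Passing a hindcast like this is necessary but not sufficient; as the conclusions below note, a model must then be tested again against a genuine real-world future.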

What can and must we learn from this exercise?

1. No statistical trend, no matter how precisely calculated and regardless of its length, has any effect whatever on future values of a data set – never, never and never. Statistical trends, like the data from which they are created, are effects. They are not causes.

2. Models, not trends, can predict, project, or inform about possible futures, to some sort of accuracy. Models must include all of the causative agents involved which must be modeled correctly for relative effects. It takes a complete, correct and accurate model of a process to reliably predict real world outcomes of that process. Models can and should be tested by their abilities to correctly predict already known values within a data set of the process and then tested again against a real world future. Models also are not themselves causes.

3. Future values of the thing represented by a metric in a model’s output data set are caused only by the underlying process being modeled. Only the actual process itself is a causative agent, and only the actual process determines future real-world results.

PS: If you think that this was a silly exercise that didn’t need to be done, you haven’t read the comments section at my essay at Dot Earth. It never hurts to take a quick pass over the basics once in a while.

# # # # #

222 Comments

RoHa
October 18, 2013 1:49 am

It’s awfully difficult even for me to predict that we are doomed because of buttons.

steveta_uk
October 18, 2013 1:50 am

In my humble opinion, whenever anyone compares any scientific statistical argument to the stock market, they’ve lost the plot.
When I take a hot casserole pot from the oven, I use oven gloves so that I don’t burn myself. An hour later, when I take the same casserole pot from the table to the sink, I don’t use oven gloves. I have successfully predicted the future. Sure, I can’t know the exact temperature, but I can be pretty sure it will be close to room temperature. I can even take a few temperature readings soon after removing it from the oven, and make a pretty good guess at when it will be comfortable to hold the pot.
Note that unlike the stop market, this is unlikely to be affected by news of some CEO having an affair, or by whether the Republicans and the Democrats agree on some particular financial arrangement. Unlike the stock market, the pot is unlikely to be impacted by whether some trader had a headache this morning. Comparing the two is stupid.
In fact, the author of this posting completely agrees. He says:

Armed with this secret formula, which is precise and immutable, we can now adjust our model of this button collecting process.

And here we get to the climatology comparisons. Some climate scientists seem to believe that they know the secret formula – in fact, they go further and claim it isn’t secret, but common knowledge based on physical principles – and using this formula they can predict the future using computer-based models. Other scientists, and much of the WUWT readership, believe that the formula is too simplistic, that it doesn’t allow for a variety of internal and external influences, or that it is tuned incorrectly so that some real influences (such as CO2 forcing) are assigned too much of an influence.
But it isn’t the stock market. And it isn’t some random collection of buttons. So don’t make a silly comparison to ‘prove’ a point.

steveta_uk
October 18, 2013 1:52 am

“stop market” -> “stock market”

Roy Grainger
October 18, 2013 1:54 am

In the example given, by analysing all past data, you would be able to make a prediction to some level of confidence (say 90%) what the minimum & maximum number of buttons the next day was likely to be. This is done all the time in many fields – for example we don’t know how many particles a radioactive source will emit in the next minute, and we don’t know the underlying mechanism which causes a particular atom to split, but we can estimate a range of values and we can use that knowledge for useful purposes (in a nuclear power station for example). To characterise this nuanced outcome as “We have no idea how many buttons he will have Friday night because we cannot see into the future” is absurd and just plain wrong – we have an “idea” but we don’t know for certain.

mhx
October 18, 2013 2:06 am

Suppose that a car at a distance of 500 meters starts driving in your direction and for 490 meters it drives in a straight line at constant velocity. The car has no driver. It has been programmed. You don’t know the program. You know nothing about the causes that make the car drive in a straight line for about 490 meters. What would you do?

Jan Smit
October 18, 2013 2:09 am

On statistics, it is estimated that 50% of all people who have ever lived on earth are alive today. Does this mean I only have a 50% chance of dying? 😉
Of course this post is overly simplified, it’s loosely applying the ad absurdum rhetorical tool. I’m sure most folk frequenting this particular virtual space are perfectly aware that in practice we can attach PROBABILITIES to certain potential outcomes based on HISTORICAL trends derived from hard data.
Nothing wrong with that, and many people base important decisions (investment, policy, research, travel, etc.) on that very concept. In fact we all do this constantly. It’s an integral part of human existence. From the moment we wake up of a morning, we expend a great deal of mental energy throughout the day subjectively assessing risk on the basis of the cognitive models we have constructed over time in accordance with the lessons life has taught us (just take the drive to work, for instance). It’s part of the subjective cost/benefit analyses that life demands of us all continually, and we are all far more adept at it than many people realize. (Indeed, the better we are at this, the more Antifragile we become.)
The point is, there is a huge difference between modeling probable outcomes of physically quantifiable processes, and modeling trends arising from human action. The former entails a higher objectivity quotient, the latter a great deal more subjectivity, thus making it less predictable based on existing trends. And I think that that is an important distinction to make in this thread. (I call this distinction the Light/Life Ratio.)
However, both types of forecasting are at all times subject to Taleb’s now proverbial Black Swan Event (meteor impact, unforeseen geopolitical event, etc.). Obviously, therefore, even the highest probabilities properly assigned to a given trendline can miss the mark completely. And it’s this fact that I think Kip Hansen is getting at here when he asserts we cannot predict the future using past events.
I imagine that the above is as plain as a pikestaff to most folk here, but I suspect that there are a great many others for whom this is not so obvious. And it’s their hearts and minds we seek to win from the prophets of the Great and Terrible Day of Global Thermageddon!
Just remember, “He is no friend that guides not, but ridicules, when one speak from ignorance”.

October 18, 2013 2:22 am

Very stimulating, as are the responses from the commenters.

Donald A. Neill
October 18, 2013 2:34 am

I would add a caution to the author’s conclusions. No matter how perfectly one understands the underlying causes of a phenomenon, models will still utterly fail if the phenomenon being modelled is inherently nonlinear. As a simple example, consider orbital mechanics. We understand Newtonian mechanics (as modified by relativity) with sufficient precision to be able to predict the ground paths of solar eclipses, the trajectory of Voyager or Cassini, or the return of Halley’s Comet. These are linear systems and thus can be modelled with some predictive accuracy. However, a nonlinear system (like weather) cannot be modelled over any significant duration because it is sensitive to infinitesimal changes in starting conditions. Nonlinearity prevents prediction over anything beyond the shortest timelines. Lorenz recognized this in the 60s, and the IPCC acknowledged it decades ago (and then promptly swept scientific reality under the carpet and spent the next several years desperately clinging to a politically approved but empirically falsified hypothetical linear relationship between anthropogenic CO2 emissions and delta T).

gnomish
October 18, 2013 2:48 am

yes, observa- a single contradiction falsifies a logical proposition.
on the other hand, it is the nature of an identity that it is never, in the specified context, falsified – ever.
those are the basics of epistemology and the nature of knowledge.
but the law of gravity is not a black swan.
tomorrow will come whether anybody believes it or not.
and that will not be falsified.

Mindert Eiting
October 18, 2013 2:56 am

This issue was treated once very well by Karl Popper, e.g. in The Poverty of Historicism. It has nothing to do with statistics, as it can be summarized by his statement that regularities are not laws of nature. Even the sunrise tomorrow is not a law of nature and cannot be predicted. We do not even know the probability of this event.

Elizabeth
October 18, 2013 3:01 am

This is the reason there are fires EVERY year in Australia
http://news.ninemsn.com.au/national/2013/10/14/08/19/nsw-mayor-slams-rfs-on-backburning
Blame squarely on AGW fanatics like T Flannery etc….

October 18, 2013 3:11 am

gnomish says:
October 18, 2013 at 2:48 am
tomorrow will come whether anybody believes it or not.
and that will not be falsified.
———————————–
There is only the eternal now, the same as yesterday, today, and tomorrow, which ‘illusionary’ concepts are only relative perspectives of the mortal mind which observes the apparent kaleidoscopic environment presented to its inherent senses.

Alan the Brit
October 18, 2013 3:22 am

Wonderfully put, a clear & simple explanation of statistical modelling!
My Higher National Certificate maths teacher, one Ms Mallik, who trained as an engineer, said interpolate with some confidence, extrapolate at our peril!!!! 😉

observa
October 18, 2013 3:37 am


He forgot to add “The science is settled”!
and the toilet salesman-
Statistics show brighter kids come from homes with 2 toilets. What about a third toilet there mum and dad…?
Meanwhile in Oz we’re obviously headed for the next Ice Age according to recent trends in high places-
http://blogs.news.com.au/heraldsun/andrewbolt/index.php/heraldsun/comments/what_does_adam_bandt_say_about_this_cold/

ancientmariner
October 18, 2013 3:42 am

MHX says:
Suppose that a car at a distance of 500 meters starts driving in your direction and for 490 meters it drives in a straight line at constant velocity. The car has no driver. It has been programmed. You don’t know the program. You know nothing about the causes that make the car drive in a straight line for about 490 meters. What would you do?
well, if the car had been going left and right and then went off in another direction for 17 meters, AND if I moved I would have to drop and break pretty much everything I had earned in the last 10 years, then maybe I would wait and see what happens…

Ted Carmichael
October 18, 2013 3:51 am

Hi, Kip. I’m afraid I agree with Fred (the Statistician) above, and others who have made similar comments. I think you have gone quite overboard by saying we have “no idea” what will happen next. This is the sort of black and white thinking that gets a lot of people into rhetorical trouble. Statistics is not about black and white – it is about quantifying the grey.
On a related point, you said “Models, not trends, can predict, project, or inform about possible futures, to some sort of accuracy.” This is true, about models, but it is (of course!) also true about trends. You then add, “Models must include all of the causative agents involved which must be modeled correctly for relative effects.” This is absolutely incorrect.
Models *never* include “all of the causative agents involved.” Never. A model is a simplification of the real world. By definition, some things are left out. We hope, when building our model, that only trivial causes (or trends) are left out, and what we have is a pretty good approximation. But models are always a simplification of the real world.
Further, it is entirely reasonable to sometimes leave out ALL causative agents. This would be an empirical model – i.e., a model based only on data, and one that tries to capture or describe the trends found in our data set. Absent any other knowledge, an empirical model will be the best predictor of future values – i.e., better than random guessing (which – absent any other knowledge – would be our only other option.)
Finally I would submit that empirical modeling is always the first step of any attempt to understand a phenomenon. That is, it is the first step in the scientific process. You first notice a pattern in nature or society. You try to describe that pattern, and – if it persists to some degree – you then investigate the pattern to look for causal agents – the underlying mechanisms that produce the pattern. But you have to notice the pattern first, and “noticing a pattern” is another way of saying “building an empirical model.”
The Theory of Gravity is the classic example of this. The empirical results are so thoroughly robust and understood that we even call it a Law. But we don’t know what “causes” gravity … we can model it very, very well, but we don’t know the cause. (Yes, there are a few hypotheses of late; but these are as yet uncertain.)
Cheers.

acementhead
October 18, 2013 3:54 am

Jan Smit says:
October 18, 2013 at 2:09 am

“On statistics, it is estimated that 50% of all people who have ever lived on earth are alive today.”

Please don’t make up such rubbish, as you will give WUWT a name for inaccuracy. The estimate is wildly wrong. Try circa 7%.
Why do I always have to be “the bad guy” and correct rubbish on this site? Note that I correct probably less than 1% of it. Probably much less than .1%.

observa
October 18, 2013 3:58 am

And in case you northern hemispheric folk were wondering about that record minimum for Canberra (we’ve only had a reasonably comprehensive Stevenson Screen system since 1910, recall), the bushfires around Sydney you’re witnessing on the news are occurring only 240 km (149 miles) from Canberra.
Yes folks it’s strong hot northerly winds in Sydney, coupled with high fuel loads after a couple of wet years and all those people who like living among the gum trees rather than tar and cement.

A Crooks
October 18, 2013 4:03 am

The stock market numbers are just numbers that depend on an infinite number of complexly related variables – but in the end they are just data.
Temperature data is just numbers that depend on an infinite number of complexly related variables – but in the end it is just data.
They are both just data sets that can be looked at like any other data sets. There is nothing sacrosanct about temperature data that means it’s off limits to speculation.
Climate though has more inertia – that’s why it took 15 years for anyone to notice that the data was definitively heading off the IPCC’s trend. And why it will take another 15 years before anyone can agree what happens next. I’m sticking to Akasofu’s trend – that’s good enough for me for the short term, until I can see a deviation away from it. Then we re-think.
The point here is that if we all sit here and just say there is no such thing as a trend, we are effectively saying that the future is unknowable, which doesn’t help anyone. It effectively leaves the IPCC, with its model lines out to 2100, as the only people with skin in the game.
What’s science all about if it isn’t putting up a speculative hypothesis and seeing how it works out? There is no value in waiting until the end of the universe and saying, Oh, so that’s what happened!

mitigatedsceptic
October 18, 2013 4:15 am

If every event is causally related to every other event, past or present, forming a complete model is a forbiddingly large enterprise. Worse if the causal relations are non-linear. In brief – trends and models are just masks to conceal our ignorance. However, all is not lost – we can and do form very accurate predictions – so accurate that they are ALWAYS correct.
Suppose I am x% certain (it does not matter what value ‘x’ has – this does not affect the accuracy of our prediction) that the sun will or will not rise tomorrow. So long as I do not claim perfect knowledge or absolute certainty, I shall always produce correct predictions; even better, I can never produce wrong ones!
This is why IPCC and all the other quacks playing with models/ and trends to produce forecasts/predictions can continue to syphon money from tax payers – they are never wrong! Even if their predictions never correspond to the events, they can (and do) claim increased confidence in the skill of their models and they can justify whatever causal relations they may fancy – yes – even that Great Greenhouse in the Sky!

Karl
October 18, 2013 4:16 am

Ted
We do actually have no idea what will happen next based on the trend. The trend is simply a statistical analysis of the data set. It includes no context, simply an equation that is defined by the data. —- It contains nothing useful with respect to predicting the next measurement – zip, zero, nada.
Models, OTOH, are sometimes useful predictors of future behavior, measurements, values, or occurrences, depending upon the fidelity with which they capture/represent the phenomena they are modeling.
All too often — even after a correct explanation, people CONFLATE the predictive ability of models and trends.

Jan Smit
October 18, 2013 4:21 am


Point taken, sorry for being so sloppy! However, it was only meant as joke about how statistics can be used to assert any nonsense (hence the winking smiley). But I consider myself suitably chastised…
Here’s a good link to support acementhead’s correction of my spurious nonsense:
http://www.prb.org/Publications/Articles/2002/HowManyPeopleHaveEverLivedonEarth.aspx
Do you agree though, acementhead, that statistics can be and are used to support all manner of claptrap?

tom0mason
October 18, 2013 4:23 am

Predicting future trends ain’t what it used to be!

Samuel C Cogar
October 18, 2013 4:24 am

mhx says:
October 18, 2013 at 2:06 am
“Suppose that a car at a distance of 500 meters starts driving in your direction and for 490 meters it drives in a straight line at constant velocity. What would you do? “
—————
I would predict that the right-front tire would “blow-out” at the 492 meter mark and the car would veer off the road and crash. That is unless my earlier prediction proved to be correct that it would run out of fuel at the 491 meter mark. 🙂
Predicting earth’s climate for the next 100 years is akin to predicting Super Bowl or NBA winners for the next 10 years. Iffen you think you can, …. go for it, ……. place your “bets” today.
If one could magically remove all of the CO2 from the atmosphere there would be no measurable effect/change in the near-surface air temperatures and that is because 398 molecules of CO2 are irrelevant when intermixed with 20,000 to 40,000 molecules of H2O vapor.
Thermal energy in the near-surface atmosphere is NOT cumulative from one (1) week to the next and surely not one (1) year to the next.

Paul Mackey
October 18, 2013 4:39 am

Long Term Capital Management……enough said.