
Guest essay By Kip Hansen
INTRO: Statistical trends never determine future values in a data set. Trends do not and cannot predict future values. If these two statements make you yawn and say “Why would anyone even have to say that? It is self-evident.” then this essay is not for you; you may go do something useful for the next few minutes while others read this. If you had any other reaction, read on. For background, you might want to read this at Andrew Revkin’s NY Times Dot Earth blog.
I have an acquaintance who is a fanatical button collector. He collects buttons at every chance, stores them away, thinks about them every day, reads about buttons and button collecting, spends hours every day sorting his buttons into different little boxes and bins and worries about safeguarding his buttons. Let’s call him simply The Button Collector or BC, for short.
Of course, he doesn’t really collect buttons, he collects dollars, yen, lira, British pounds sterling, escudos, pesos…you get the idea. But he never puts them to any useful purpose, neither really helping himself nor helping others, so they might as well just be buttons, so I call him: The Button Collector. BC has millions and millions of buttons – plus 102. For our ease today, we’ll consistently leave off the millions and millions and we’ll say he has just the 102.
On Monday night, at 6 PM, BC counts his buttons and finds he has 102 whole buttons (we will have no half buttons here please); Tuesday night, he counts again: 104 buttons; on Wednesday night, 106. With this information, we can do wonderful statistical-ish things. We can find the average number of buttons over three days (both mean and median). Precisely 104.
We can determine the statistical trend represented by this three-day data set. It is precisely +2 buttons/day. We have no doubts, no error bars, no probabilities (we have 100% certainty for each answer).
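For readers who like to check the arithmetic, here is a minimal sketch of those three-day statistics in Python. The day indices 0, 1, 2 and the helper name `least_squares_slope` are assumptions made for illustration; the essay itself only gives the three counts.

```python
from statistics import mean, median

# Nightly counts from the essay: Monday, Tuesday, Wednesday.
# The day indices 0, 1, 2 are an illustrative choice, not from the essay.
days = [0, 1, 2]
counts = [102, 104, 106]


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


average = mean(counts)                     # arithmetic mean of the three counts
middle = median(counts)                    # median of the three counts
trend = least_squares_slope(days, counts)  # buttons per day

print(average, middle, trend)
```

Note that everything computed here describes the three measurements already taken and nothing else.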
How many buttons will there be Friday night, two days later?
If you have answered with any number or a range of numbers, or even let a number pass through your mind, you are absolutely wrong.
The only correct answer is: We have no idea how many buttons he will have Friday night because we cannot see into the future.
But, you might argue, the trend is precisely, perfectly, scientifically statistically +2 buttons/day and two days pass, therefore there will be 110 buttons. All but the final phrase is correct, the last — “therefore there will be 110 buttons” — is wrong.
We know only the numbers of buttons counted each of the three days – the actual measurements of number of buttons. Our little three point trend is just a graphic report about some measurements. We know also, importantly, the model for taking the measurements – exactly how we measured – a simple count of whole buttons, as in 1, 2, 3, etc.
We know how the data was arrived at (counted), but we don’t know the process by which buttons appear in or disappear from BC’s collection.
If we want to be able to have any reliable idea about future button counts, we must have a correct and complete model of this particular process of button collecting. It is really little use to us to have a generalized model of button collecting processes because we want a specific prediction about this particular process.
Investigating, by our own observation and close interrogation of BC, we find that my eccentric acquaintance has the following apparent button collecting rules:
- He collects only whole buttons – no fractional buttons.
- Odd numbers seem to give him the heebie-jeebies, he only adds or subtracts even numbers of buttons so that he always has an even number in the collection.
- He never changes the total by more than 10 buttons per day.
These are all fictional rules for our example; of course, the actual details could have been anything. We then work these into a tentative model representing the details of this process.
So now that we have a model of the process, how many buttons will there be when counted on Friday, two days from now?
Our new model still predicts 110, based on trend, but the actual number on Friday was 118.
The truth being: we still didn’t know and couldn’t have known.
What we could know on Wednesday about the value on Friday:
- We could know the maximum number of buttons – 106 plus ten twice = 126
- We could know the minimum – 106 minus ten twice = 86
- We could know all the other possible numbers (all even, all between 86 and 126 somewhere). I won’t bother here, but you can see it is 106+0+0, 106+0+2, 106+0+4, etc..
- We could know the probability of the answers, some answers being the result of more than one set of choices. (such as 106+0+2 and 106+2+0)
- We could then go on to figure five day trends, means and medians for each of the possible answers, to a high degree of precision. (We would be hampered by the non-existence of fractional-buttons and the actual set only allowing even numbers, but the trends, means and medians would be statistically precisely correct.)
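The list above can be worked through mechanically. The following is only a sketch: it takes the three apparent rules at face value, treats each pair of daily changes (Thursday, then Friday) as one "set of choices", and counts how many such pairs lead to each possible Friday total. The names `WEDNESDAY_COUNT`, `DAILY_CHANGES` and `paths` are illustrative. Turning these counts into probabilities would require the further assumption that every allowed daily change is equally likely, which the rules do not tell us.

```python
from collections import Counter
from itertools import product

WEDNESDAY_COUNT = 106

# Allowed daily changes under the three apparent rules:
# whole buttons, even numbers only, never more than 10 in either direction.
DAILY_CHANGES = range(-10, 11, 2)

# Each (Thursday change, Friday change) pair is one "set of choices".
# The Counter maps every reachable Friday total to the number of pairs
# that produce it.
paths = Counter(
    WEDNESDAY_COUNT + thursday + friday
    for thursday, friday in product(DAILY_CHANGES, repeat=2)
)

print(min(paths), max(paths))   # smallest and largest possible Friday counts
print(sorted(paths))            # every possible Friday count
print(paths[110], paths[118])   # ways to reach the trend value and the actual value
```

The trend value of 110 and the eventual 118 are both in the list, along with every other even number in the range; the enumeration says what is possible, not what will happen.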
What we couldn’t know:
- How many buttons there would actually be on Friday.
Why couldn’t we know this? We couldn’t know because our model – our button collecting model – contains no information whatever about causes. We have modeled the changes, the effects, and some of the rules we could discover. We don’t know why and under what circumstances and motivations the Button Collector adds or subtracts buttons – we don’t really understand the process – BC’s button collecting — because we have no data about the causes of the effects we can observe or the rules we can deduce.
And, because we know nothing about causes in our process, our model of the process, being magnificently incomplete, can make no useful predictions whatever from existing measurements.
If we were able to discover the causes effective in the process, and their relative strengths, relationships and conditions, we could improve our model of the process.
Back we go to The Button Collector and under a little stronger persuasion he reveals that he has a secret formula for determining whether to add to or subtract from the number of buttons previously observed, and by how many. Armed with this secret formula, which is precise and immutable, we can now adjust our model of this button collecting process.
Testing our new, improved, and finally adjusted model, we run it again, pretending it is Wednesday, and see if it predicts Friday’s value. BINGO! ONLY NOW does it give us an accurate prediction of 118 (the already known actual value) – a perfect prediction of a simple, basic, wholly deterministic (if tricky and secret) process by which my eccentric acquaintance adds and subtracts buttons from his collection.
What can and must we learn from this exercise?
1. No statistical trend, no matter how precisely calculated, regardless of its apparent precision or length, has any effect whatever on future values of a data set – never, never and never. Statistical trends, like the data of which they are created, are effects. They are not causes.
2. Models, not trends, can predict, project, or inform about possible futures, to some sort of accuracy. Models must include all of the causative agents involved which must be modeled correctly for relative effects. It takes a complete, correct and accurate model of a process to reliably predict real world outcomes of that process. Models can and should be tested by their abilities to correctly predict already known values within a data set of the process and then tested again against a real world future. Models also are not themselves causes.
3. Future values of a thing represented by a metric in data set output from a model are caused only by the underlying process being modeled–only the actual process itself is a causative agent and only the actual process determines future real world results.
PS: If you think that this was a silly exercise that didn’t need to be done, you haven’t read the comments section at my essay at Dot Earth. It never hurts to take a quick pass over the basics once in a while.
# # # # #
davidmhoffer says:
October 17, 2013 at 9:28 pm
I liken it to a person walking up a hill and down the other side, and measuring their altitude at every step. Even if we know exactly what the length of their stride is, we cannot predict from the past data what their altitude will be on the next stride unless we know exactly what the shape of the hill is. In fact, if all we have is the data from the first 25% of the journey, the data would provide a trend suggesting the person is headed for outer space.
Interestingly, if we stopped our data collection when the person was 5 steps past the crest of the hill and on their way back down, we could say, and correctly so, that the person’s last 10 steps were the highest in the entire record of the journey. It would not change the fact that the person is going down hill.
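The hill analogy can be sketched numerically. Everything about the hill here is an assumption made for illustration: a parabolic profile over 100 steps that peaks at an altitude of 100 at step 50, with the function names `altitude` and `fit_line` invented for the example.

```python
from statistics import mean


def altitude(step):
    """Hypothetical hill: rises to 100 at step 50 and returns to 0 at step 100."""
    return step * (100 - step) / 25


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for the points (xs, ys)."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


steps = list(range(101))
altitudes = [altitude(s) for s in steps]

# Trend fitted to the first quarter of the walk, extrapolated to the last step.
slope, intercept = fit_line(steps[:26], altitudes[:26])
predicted_end = slope * 100 + intercept
actual_end = altitudes[100]

# Record cut off 5 steps past the crest: the last 10 steps are at least as
# high as anything before them, yet the walker is already descending.
record = altitudes[:56]
last_ten_are_highest = min(record[-10:]) >= max(record[:-10])
descending = record[-1] < record[-2]

print(predicted_end, actual_end, last_ten_are_highest, descending)
```

On this made-up hill the early trend points well above the summit while the walker actually ends at the bottom, and the "highest in the record" statement is true at the same moment the walker is heading downhill.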
+++++++++++++++
I was thinking of this very concept as an explanation for this fine post! The alarmists are focused on the past 10 years being some of the “warmest in recorded history” because we are near the top of the recent short term relatively natural warm period.
Janice: You have a clear way of looking at information!
Chapter 5. in the 1995 second edition. I am not a statistician. My expertise was in atmospheric degradation of materials (especially the corrosion of metals).
Oh, Mario, you don’t know how much I needed to hear that. Thank you. And, thank you for taking the time to tell me. I am so grateful that you are a regular on WUWT. And, you do, too — with the added benefit of your scientific education and expertise. I hope that, like A-th-y, you are enjoying a happy weekend with your family. Take care. J.
GO, NUKES! #(:))
(uh oh……. I think I hear him (won’t even mention his name — whince)…… stomping over this way right now…… time to go, uh, to go…… get my driver’s license renewed … or something… ANYTHING. Heh.)
**************
Thank you, too, Richard, for your 3:04am today affirmation of the spirit of (though, not the content of) my post. How kind of you to lift me out of the Slough of Despond, which is the only downside to the GREAT FUN of posting on WUWT: being ignored (or the illusion of it which, like optical illusions, tricks the mind for a bit into thinking things are what they are not).
**************
@F. H. Haynie (you will never “retire,” you know – and so glad you are here, still teaching) – thanks for the helpful cite. So…. your “… expertise is in atmospheric degradation of materials (especially the corrosion of metals),” hm? They could use you over at Save the Planet Jewelery, Inc.. Can you turn lead into gold? That’s what they are working on right now! Oh, maaaan, those Fantasy Club guys are sooooo funny (and easy to mock).
Kip Hansen says:
October 18, 2013 at 11:55 am
“Reply to Richard:
We’ll just have to see what others think about your view. “
——————————
Kip H, this poster pretty much agrees with all of your commentary. It was/is refreshing to read said commentary that was/is not “laced” with the author’s personal biases of the subject matter being discussed, ….. and/or commentary of a PJE nature. Statisticians will claim “importance” of their statistics because it would be silly of them not to.
Statistics based on past observations of cultural activities are useful “tools” for projecting the potential continuation of and/or a future trend associated with cultural activities ….. simply because humans are “creatures of habit” whose habits are highly influenced by the actions and deeds of other humans and which are subject to change depending on the likes and dislikes of said humans.
Statistics based on past observations of the earth’s physical activities are also useful “tools” but only for the research and discovery of the causes and/or effects of said physical activities ….. simply because Mother Nature is not a “creature of habit” and therefore has no habits nor any likes or dislikes of any physical event. And without any habits, likes or dislikes there can be no trend or prediction of any trends. And the aforesaid is confirmed by the stated definition of the word “trend”, to wit:
“trend 1. The general direction in which something tends to move.”
Everything in the universe “tends to move” ….. depending on one’s “point-of-reference”. Even the general direction of the observed retrograde motion that a few of the planets tend to move, ….. but in actuality they don’t.
Ref: http://www.lasalle.edu/~smithsc/Astronomy/retrograd.html
One can “squeeze” a general direction “trend” out of most any data set if the data set is large enough. Trends are like beauty, ….. they are in the eyes of the beholder.
Trends can not predict the future. Only an observer of a trend is capable of making a prediction.
And trends do not change, they just cease to be applicable in some environments and/or are replaced by a new(er) trend.
Hi Janice:
You wrote “And, you do, too — with the added benefit of your scientific education and expertise. ”
++++++++
That’s kind of you Janice. Compared to some people, I am pretty up on the climate sciences. But compared to many here, I am a neophyte. I enjoy being allowed to play with people who are above my level of knowledge! I want to make sure that the folks who comment a lot at WUWT don’t think I am full of myself. My quest is to find what is true and try to wrap my head around it.
I’ve found that most people have their minds made up about certain subjects because they affiliate with a group and consider themselves like-minded. Most of these people do not understand what research is. They watch a documentary or read something in a newspaper and consider that doing research. They are now tainted like a red blood cell with a carbon monoxide molecule bonded to it!
I remember when I first started seeking the truth about climate (after seeing Gore’s movie) I was researching NOAA’s website to find out how they adjusted the data to filter out the urban heat island effect. I can no longer find what I read there. But to paraphrase from memory, they used 5 algorithms (I think) based on demographic information to account for growing population etc – and then apply factors to remove the additional warming of urban temperature readings. The person who wrote that section of the site wrote something like “…and we know that these algorithms work because the adjusted temperatures increased in line with our expectations.”
I was shocked and could not understand how people could not be skeptical of NOAA.
Thanks to you all for reading and commenting. I will not be answering individual comments from this time forward — I have another couple of projects that are reaching the time-intensive stages.
If any of you have questions or comments you feel I need to answer, please feel free to contact me directly at kip at my domain i4.net.
Blue Skies and Following seas,
Kip Hansen
“Mother Nature is not a “creature of habit” and therefore has no habits”
So does the Earth invert its rotation, or even rotate along longitude?
I think the problem of this text is the over desire to explain things with binary concepts and simple words, the seeking of elegance of simplicity even if it is not possible in several cases. This reminds me of Climate Science today and CO2 which is worse.
“… algorithms work because the adjusted temperatures increased in line with our expectations… .” (NOAA website quoted by Mario Lento)
Well, you gotta admire their candor!
LOL, CO2 in the blood and ON THE BRAIN. Good one.
***************
Hi, Alex S (at 7:11pm) — The intrinsic merit of Kip Hansen’s post aside, your equating his approach to the Fantasy Science Club’s CO2 conjecture is inaccurate.
The climate models of the IPCC suffer far more from OVER-elaboration and needless complexity which, as some famous scientist I can’t remember (he’s quoted in Bob Tisdale’s book Climate Models Fail) said is: the sign of mediocrity.
Oh, bother — …. just went to look it up. Here it is from page 36 of Climate Models Fail — Box! That’s the guy’s name…:
See this thread for some EXCELLENT comments about the GCMs’ code: http://wattsupwiththat.com/2013/07/27/another-uncertainty-for-climate-models-different-results-on-different-computers-using-the-same-code/
Persevere through all of the over 200 comments; you’ll find gems like this one:
“… the entire range of past variation is equally likely under their assumptions and procedures… .” [Brian H 12:22 PM 7/29/13 – edited emphasis]
In a nutshell: they are junk.
Note: Ed Ohugima did a back-of-the-napkin calculation last summer and was able to duplicate one projected temperature rise for the 21st century (it was 6 or 8 degrees F, I think) with a simple interpolation. This was further evidence that the IPCC’s models are super-expensive, gee-whiz, gadgets that do nothing more than a basic math calculator can do.
I have NO IDEA how all those line ends/returns ended up in there! Hm, maybe “WordPress” is actually a climate model… .
Paul Coppin said @ October 18, 2013 at 5:27 am
Some cogent points on rationalisation of beliefs. Rather well put 🙂
Well worth reading Briggs’ little essay: The Data is the Data if you have not already done so:
http://wmbriggs.com/blog/?p=6854
Pompous Git! I’ve missed you. How’s life down there on the island, now that spring has begun? Have those awful rains stopped yet? Hope all is well. J.
Thanks for that witty and insightful essay, Pompous — too tired to read it with the care it deserved, but I got the gist and that was great. Like this esp.:
“… we can always find a better fitting model. And then we’d still have to wait for new observations to check it.”
LOL.
The Pompous Git:
Your post at October 19, 2013 at 11:42 pm says
Yes, that is worth reading, and in the context of this thread its most important statements are these
The data is the data. But there is NO DATA for the future. The model is the prediction which we – n.b. we and not the data – make for the future. And we construct that model as a trend which can be extrapolated from the future.
The model in the article is a linear trend. That is clearly not the correct trend model for global temperature time series: the most recent 17 years demonstrates that.
A better trend model is provided by Akasofu. It is better because that trend predicted the very recent global temperature stasis. This link goes to discussion relating to it and shows it in graphical form.
http://wattsupwiththat.com/2013/09/09/syun-akasofus-work-provokes-journal-resignation/
Richard
extrapolated for the future
not
from the future
sorry
Janice Moore said @ October 19, 2013 at 11:53 pm
This is the worst spring for gardening since the one in the mid-80s when I started market gardening! We have had rain for nearly every day for the last month. Yesterday was fine and warm but the wind blew ~55km/hr gales. We do live in the Roaring Forties. Today was fine and warm, but the rain started again about an hour ago. So it goes…
@ Richard Courtney
I think Paul Coppin nailed the head on the hit. There’s an epistemological issue here. Belief versus knowledge. Most commenters in this thread have missed this. Briggs is certainly aware of it. I have yet to take Kip’s recommendation of visiting dot earth. I suspect it’s ugly over there.
Yes, Akasofu’s model is very interesting. It will be even more interesting in 2020 when we will be able to view the temperatures of Earth over the next seven years with… 20 20 hindsight.
While I agree Jaynes is an excellent read for statistics, I suggest Richard Taylor’s Metaphysics for those making “an unusually obstinate effort to think clearly”.
The Pompous Git:
Thank you for your reply to me at October 20, 2013 at 1:01 am in which you say
I am assuming you are referring to the post of Paul Coppin in this thread at October 18, 2013 at 5:27 am. In that post he writes
There is an epistemological issue here but it is NOT the stark black/white issue of “Belief versus knowledge” which you and he assert. Few things divide that clearly (as I am well aware if only because of the activity I am to conduct in the next few minutes).
Knowledge is what is known and can be justified by evidence.
Belief is what the believer treats as being known because it is accepted on faith although it cannot be justified by evidence.
Ideas and inferences exist between knowledge and belief. They include large amounts of uncertainty and doubt, but they enable us to operate in the real world.
For example, I ‘know’ the light will come on in my hall when I turn on the hall light switch. No knowledge is completely certain and there were times when my knowledge was wrong because my hall light bulb had failed. But I ‘know’ the light will come on when I turn on the switch.
Others do not share my knowledge of my hall light because they have never visited my home. Some of them may believe my hall light is operated by a switch because they trust my word on the matter. Others may believe I do not have a hall light because they so distrust my word on anything that they accept my claim of a hall light ‘proves’ I don’t have one.
Between these are people who accept that most houses in the UK have hall lights which are operated by a switch. So, these people accept as a useful and probable working hypothesis that my house has a hall light which is operated by a switch.
Determining a trend in a time series data set is similar to accepting that my house has a hall light because most houses in the UK have hall lights which are operated by a switch. The trend assesses what is known to determine what is probable.
Determining a trend and extrapolating that trend to obtain a prediction does not generate knowledge and it does not create belief: the determined trend provides a useful and probable working hypothesis about what will happen on the basis of what is known to have happened.
The hypothesis may be wrong (e.g. because the trend is modelled as being linear when it is not) or may turn out to be wrong (e.g. because the behaviour changes from its past trend). But the future cannot be known. Determining a trend indicates what is likely to happen for at least the near term future.
And now I must rush away to deal with issues of knowledge and beliefs but I will check back later today.
Richard
Even when we understand the processes involved which result in certain trends, another danger exists when we do not apply proper constraints on those processes when modeled.
I’m not going to try to outdo Mark Twain’s way of explaining the fallacy of extrapolating the future or the past without understanding the correct boundaries of the underlying processes of a trend. It’s a very short read, and can very aptly be applied to the subject at hand:
http://www.lhup.edu/~dsimanek/twain.htm
“Hi, Alex S (at 7:11pm) — The intrinsic merit of Kip Hansen’s post aside, your equating his approach to the Fantasy Science Club’s CO2 conjecture is inaccurate.
The climate models of the IPCC suffer far more from OVER-elaboration and needless complexity which, as some famous scientist I can’t remember (he’s quoted in Bob Tisdale’s book Climate Models Fail) said is: the sign of mediocrity”
No, the so called complexity of IPCC is a claim you make due to the $$$ and hardware they spend and the fact they have to produce stuff around their dogma. They have no complexity in their dogma; they, like the author, put away many situations that make it difficult to state their case.
IPCC doesn’t show any complexity in the list of inputs that might affect climate; more or less it is due to CO2 and “science” is settled.
Imagine if IPCC would have included proper research about clouds, sun etc…
Same here. The list of cases is enormous, and several have shown here examples of trends helping predict the future.
The temptation to simplify with a binary thinking is the undoing of this text when there are several shades of grey.
I wouldn’t have disagreed if it was simply stated that trend based prediction is usually an inferior technique to predict the future in most cases.
But everyone seems to be after their own e=mc2
Everyone is an aesthete.
AlexS:
In your post at October 20, 2013 at 10:53 am you say
I agree. Please see my post at October 20, 2013 at 2:04 am. This link jumps to it
http://wattsupwiththat.com/2013/10/17/the-button-collector-or-when-does-trend-predict-future-values/#comment-1453686
Richard
richardscourtney said @ October 20, 2013 at 2:04 am
Richard, I must on this occasion disagree. Investment advisers, actuaries, and ever so many others rely upon their generating belief in the validity of their extrapolations. And we know this is so because those persuaded to entertain the validity of those beliefs hand over their hard-earned money to invest.
This is however beside Kip and Briggs’ point: beware reification.
The Pompous Git:
Thank you for your post at October 20, 2013 at 11:35 am in reply to my post at October 20, 2013 at 2:04 am
OK. You say we disagree. Perhaps I am misunderstanding you because I do not see the disagreement.
Several times in this thread – including in the post with which you say you disagree – I have said,
“But the future cannot be known”.
Any prediction is an assertion of what is likely to happen. Yes, some people believe predictions which may be the predictions of extrapolated trends, or scientific model outputs, or horoscopes, etc. But the believers’ beliefs do NOT mean the predictions are reality: the predictions are estimates of what is likely to happen.
As I see it, and as I tried to explain, the predictions obtained from extrapolated trends are NOT what is known to happen: they are not knowledge because the future cannot be known. Some people choose to believe the predictions obtained from extrapolated trends – as some people choose to believe horoscopes – but that is the failing of the believers: it is not the fault of the extrapolated trends. The believers’ error is reification, and it is an error I am warning against making in my statement you have quoted in your post I am answering.
The predictions obtained from extrapolated trends are an indication of the future based on what has happened in the past. As I explained, those predictions may be wrong but they are useful because trends continue until they don’t. Hence, the predictions give indications of the future which are better than chance, and doing better than chance is an advantage (as any casino manager will tell you).
Richard
Richard, you stated “Determining a trend and extrapolating that trend to obtain a prediction… does not create belief” and it is that portion of your statement with which I disagreed. I am perfectly happy to accept the Aristotelian account of knowledge as justified true belief. We agree that extrapolation into the future does not generate knowledge. But if that extrapolation does not generate belief, then what does it generate? You seem to be saying it generates “X” and people subsequently choose to believe, or not based on “X”. I am curious to understand what this “X” is.