The Button Collector Revisited: Graphs, Trends and Hypotheses

Guest Essay by Kip Hansen

 

Prologue:    This essay is a follow-up to two previous essays on the topic of the usefulness of trend lines [trends] in prediction.  Readers may not be familiar with these two essays, as they were written years ago; if you wish, read them through first:

  1. Your Dot: On Walking Dogs and Warming Trends posted in Oct 2013 at Andy Revkin’s NY Times Opinion Section blog, Dot Earth. Make sure to watch the original Doggie Walkin’ Man animation, it is only 1 minute long.
  2. The Button Collector or When does trend predict future values? posted a few days later here at WUWT (but 4 years ago!)

Trigger Warning:  This post contains the message “Trends do not and cannot predict future values.”  If this idea is threatening or potentially distressing, please stop reading now.

If the trigger warning confuses you, please first read the two items above and all my comments and answers [to the same questions you will have] in those two essays; it will save us both a lot of time.

I’ll begin this post by commenting on an ancient comment on a follow-up Dot Earth column by Andrew C. Revkin, “Warming Trend and Variations on a Greenhouse-Heated Planet” (Dec. 8, 2014). [Alas, while the link is still good, Dot Earth is no longer; it has gone the way of my old blog, The Bad Science Times, which faded away in the early 1990s.]  Revkin’s piece repeated the Doggie Walking Animation and contained a link to my response.  This comment, from Dr. Eric Steig, Professor of Earth & Space Sciences at the University of Washington, where he is Director of the IsoLab and is listed on his faculty page as a founding member of and contributor to the influential [their word] climate science web site “RealClimate.org”, says:

“Kip Hansen’s “critique” of the dog-walking cartoon, is clever, and completely missing the point. Yes, the commentators of the original cartoon should not have said “the trend determines the future”; that was poorly worded. But climate forcing (CO2, mostly) does determine the trend, and the trend (where the man is walking) does determine where the dog will go, on average.”  (my emphasis)

Dr. Steig, I believe, has simply “poorly worded” this response.  He surely means that the climate forcings, which themselves are trending upwards, do/will determine (cause) future temperatures to be higher (“where the dog will go, on average”).  He is entitled to that opinion, but he errs when he insists “the trend (where the man is walking) does determine where the dog will go”.  It is this repeated, almost universally used, imprecise choice of language that causes a great deal of misunderstanding and trouble for the rest of the English-speaking world (and, I suppose, for others after literal translations) when dealing with numbers, statistics, graphs and trend lines.  People, students, journalists, readers, audiences…begin to actually believe that it is the trend itself that is causing (determining) future values.

Many of you will say to yourselves, “Stuff and nonsense!  Nobody believes such a thing.”  I didn’t think so either…but read the comments to either of my two essays… you will be astonished.

 

Data points, lines and graphs:

[Warning:  These are all very simple points. If you are in a hurry, just scroll down and look at the images.]

Let’s look at the definition of a trend line:  “A line on a graph showing the general direction that a group of points seem to be heading.”  Or another version “A trend line (also called the line of best fit) is a line we add to a graph to show the general direction in which points seem to be going.”

Here’s an example (mostly in pictures):

[Image: whats_wrong_with_this]

Trend lines are added to graphs of existing data to show “the general direction in which [the data] points seem to be going.”  Now, let’s clarify that a little bit — more precisely, the trend line only actually shows “the general direction in which [the data] points have gone” — and one could add — “so far”.

[Image: trends_are_only_valid]

That seems awfully picky, doesn’t it?  But it is very important to our correct understanding of what a data graph is — it is a visualization of existing data — the data that we actually have — what has actually been measured.   We would all agree that adding data points to either end of the graph — data that we just made up that had not actually been measured or found experimentally — would be fraudulent.  Yet we hardly ever see anyone object to “trend lines” that extend far beyond the actual data shown on a graph — usually in both directions.  Sometimes this is just lazy graphics work.   Sometimes it is intentional to imply [unjustifiably] that past data and future data would be in line with the trend line.  However, just to be clear, if there is no data for “before” and “after” then that assumption cannot and should not be made.
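To make that concrete, here is a minimal sketch in Python (with made-up numbers, not any real dataset) of what a least-squares trend line is and is not: it summarizes the points we actually have, and anything it “says” beyond them is an assumption about the process, not data.

```python
# A minimal sketch (made-up numbers) of fitting a trend line to existing data
# and then extrapolating it beyond the data -- the extrapolated value is an
# assumption about the process, not a measurement.
import numpy as np

x = np.arange(1, 11)                      # ten time steps we actually observed
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1,    # made-up measurements
              7.2, 7.9, 9.1, 9.8, 11.2])

slope, intercept = np.polyfit(x, y, 1)    # ordinary least-squares straight line

# The trend line is only a summary of where these ten points have gone "so far".
fitted = slope * x + intercept

# Extending it to x = 15 produces a number, but no data supports it:
x_future = 15
extrapolated = slope * x_future + intercept
print(f"trend over the data: y = {slope:.2f}*x + {intercept:.2f}")
print(f"extrapolated 'value' at x={x_future}: {extrapolated:.1f}  (pure assumption)")
```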

Now, one (or two) more little points:

[Image: whats_wrong_eggnog]

I’ll answer in a graphic:

[Image: whats_wrong_eggnog_ans]

But (isn’t there always a ‘but’?):

[Image: eggnog_without_traces]

Traces added to join data points on a graph can sometimes be misunderstood to represent the data that might exist between the data points shown.  More properly, the graph would ONLY show the data points if that is all the data we have — but, as illustrated above, we are not really used to seeing time series graphs that way – we like to see the little lines march across time connecting the values.  That’s fine as long as we don’t let ourselves be fooled into thinking that the lines represent any data.  They do not, and one should not let the little lines create the impression that the intervening data lies along them.  It might…it might not…but there is no data, at least on the graph, to support that idea.

For eggnog sales, I have modified part of the graph to match reality:

[Image: eggnof_monthly]

This is one of the reasons that graphing something like “annual average data” can present wildly misleading information — the trace lines between the annual average points are easily mistaken for how the data behaved during the intermediate time — between year-end totals or yearly averages.  Graphing just annual averages or global averages easily obscures important information about the dynamics of the system that generates the data.  In some cases, like eggnog, looking only at individual monthly sales, like July sales figures (which are traditionally near zero),  would be very discouraging and could cause an eggnog producer to vastly underestimate yearly sales potential.
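Here is a small, purely hypothetical illustration of the point: monthly eggnog-style sales figures, invented for this sketch, and the single annual average that hides the seasonality.

```python
# Illustrative only: hypothetical monthly eggnog sales (units), almost all of it
# in November and December, with July essentially zero.
monthly_sales = {
    "Jan": 120, "Feb": 40, "Mar": 10, "Apr": 5,  "May": 2,   "Jun": 1,
    "Jul": 1,   "Aug": 2,  "Sep": 10, "Oct": 60, "Nov": 900, "Dec": 1500,
}

annual_average = sum(monthly_sales.values()) / 12
print(f"Annual average: {annual_average:.0f} units/month")   # roughly 221 units/month

# The single averaged number (and a trace drawn through yearly averages) hides
# the fact that roughly 90% of sales happen in two months:
nov_dec_share = (monthly_sales["Nov"] + monthly_sales["Dec"]) / sum(monthly_sales.values())
print(f"Share of sales in Nov+Dec: {nov_dec_share:.0%}")
```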

There are several good information sources on the proper use of graphs — and the common ways in which graphs are misused and malformed – either out of ignorance or to intentionally spin the message for propaganda purposes.  We see them almost everywhere, not just in CliSci.

Here are two classic examples:

[Image: temperatures_rescaled]

[Image: rescaled_GAST]

On both of the above graphs, there is another invisible feature — error bars (or even confidence intervals) — invisible because they are entirely missing.  In reality, values before 1900 are “vague wild guesses”; confidence increases from 1900–1950 to “guesses based on some very imprecise, spatially thin data”; confidence increases again from 1950 to the 1990s to “educated guesses”; and finally, in the satellite era, we get “educated guesses based on computational hubris.”
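As a quick illustration of how much presentation matters, here is a sketch (using a synthetic series, not any real temperature record) that plots the same data twice: once on a tightly zoomed axis with no uncertainty shown, and once on a wider axis with a made-up uncertainty band.

```python
# A sketch of how presentation choices change the impression a graph gives.
# The series below is synthetic, not any real temperature record.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
years = np.arange(1900, 2021)
anomaly = 0.008 * (years - 1900) + rng.normal(0, 0.15, years.size)  # made-up anomalies
uncertainty = np.linspace(0.5, 0.1, years.size)   # pretend error bars: large early, small late

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Left: tightly zoomed y-axis, no uncertainty shown -- looks dramatic.
ax1.plot(years, anomaly)
ax1.set_ylim(-0.5, 1.5)
ax1.set_title("Zoomed axis, no uncertainty shown")

# Right: wider y-axis with the (made-up) uncertainty band -- same data, different impression.
ax2.plot(years, anomaly)
ax2.fill_between(years, anomaly - uncertainty, anomaly + uncertainty, alpha=0.3)
ax2.set_ylim(-5, 5)
ax2.set_title("Wider axis, uncertainty band shown")

plt.tight_layout()
plt.show()
```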

That’s the intro — a few “we all already knew all that!” [“Wha’da’ya think we are?  Stupid?”] points — of which we all need to remind ourselves every once in a while.

 

The Button Collector:  Revisited

My two previous essays on Trends focused on “The Button Collector”  — let me re-introduce him:

I have an acquaintance [actually, I have to admit, he is a relative] that is a fanatical button collector. He collects buttons at every chance, stores them away, thinks about them every day, reads about buttons and button collecting, spends hours every day sorting his buttons into different little boxes and bins and worries about safeguarding his buttons. Let’s call him simply The Button Collector.  Of course, he doesn’t really collect buttons, he collects dollars, yen, lira, British pounds sterling, escudos, pesos…you get the idea. But he never puts them to any useful purpose, neither really helping himself nor helping others, so they might as well just be buttons.

He has, at latest count, millions and millions of buttons, exactly, on Sunday night.  So, we can ignore the “millions and millions” part and just say he has zero buttons on Monday morning to start his week, to make things easy.  (See, there is some advantage to the idea of “anomalies”.)  Monday, Tuesday and Wednesday pass, and on Wednesday evening, his accountant shows him this graph:

[Image: the_button_collection]

As in my previous essay, I ask, “How many buttons will BC have at the end of day on Friday, Day 5?”

Before we answer, let’s discuss what has to be done even to attempt an answer.   We have to formulate an idea of what the process is that is being modeled by this little dataset.  [By “modeled” we simply mean that the daily results of some system are being visually represented.]

“No, we don’t!”, some will say.  We just grab our little rulers and draw a little line like this (or use our complicated maths program on our laptops to do it for us) and Voila!  The answer is revealed:

[Image: Untitled-14]

And our answer is “10” … (and it will be wrong, of course).

There is no mathematical or statistical or physical reason or justification to believe we have suggested the correct answer.  We skipped a very important step.  Well, actually, we rushed right over it.  We have to first try to guess what the process is (mathematically, what function is being graphed) that produces the numbers we see.  This guess is more scientifically called a “hypothesis” but is no different, at this point, from any other guess.  We can safely guess that the process (the function) is “Tomorrow’s Total will be Today’s Total plus 2”.  This is, in fact, the only reasonable guess given the first three days’ data – and it even complies with formal forecasting principles (when you know next to nothing, predict more of the same).
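For the curious, here is that “grab a ruler” step written out as a sketch. The Day 1–3 counts of 2, 4 and 6 are assumed here from the description above (the actual numbers are in the figure, not the text).

```python
# A sketch of the "grab a ruler" step: fit a straight line to the first three
# days and read off Day 5. Daily counts of 2, 4, 6 are assumed here from the
# description of the graph (the original data is in the figure, not the text).
import numpy as np

days = np.array([1, 2, 3])
buttons = np.array([2, 4, 6])

slope, intercept = np.polyfit(days, buttons, 1)
day5_prediction = slope * 5 + intercept
print(f"Hypothesis: tomorrow = today + {slope:.0f}")
print(f"Predicted buttons at end of Day 5: {day5_prediction:.0f}")   # 10
```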

Let’s check Thursday’s graph:

[Image: button_collection_DAY4]

We’re rocking! – right on target – now Friday:

[Image: button_collection(1)]

Shucks!  What happened?   Certainly our hypothesis is correct.   Maybe a glitch….?  Try Saturday (we’re working the weekend to make up for lost time):

[Image: button_collection(3)]

Well, that looks better.  Let’s move our little trend line over to reassure ourselves….:

[Image: bc_dasy6_originsal_hypo]

Well, we say, still pretty close…darn those glitches!

But wait a minute…what was our original hypothesis, our guess about the system, the process, the function that produced the first three days of results?  It was:  “Tomorrow’s Total will be Today’s Total plus 2”.  Do our results (these results are a simple matter of counting the buttons – that’s our data gathering method – counting) support our original hypothesis, our first guess, as of Day 6?  No, they do not.  No amount of dissembling – saying “Up is Up” or “The Trend is still going up” – makes the current results support the original hypothesis.

What’s a self-respecting scientist to do at this point?  There are a lot of things not to do:  1) Fudge the results to make them agree with the hypothesis,  2) Pretend that “close” is the same as supporting the hypothesis – “see how closely the trends correlate?”,  3) Adopt the “Wait until tomorrow, we’re sure this glitch will clear up” approach,  4) Order a button recount, making sure the button counters understand the numbers that they are supposed to find,  5) Try re-analysis, incremental hourly in-filling, kriging, de-trending and anything else until the results come into line like “they should”.

While our colleagues try these ploys, let’s see what happens on Day 7:

[Image: button_collection(5)]

Oh, my…amidst the “Still going up” mantra, we see that the data can really no longer be used to support our original hypothesis – something else, other than what we guessed, must be going on here.

What a real scientist does at this point is:

Makes a new hypothesis that more correctly explains the actual results, usually by modifying the original hypothesis.

This is hard – it requires admitting that one’s first pass was incorrect.  It may mean giving up a really neat idea, one that has professional or political or social value apart from solving the question at hand.  But – it MUST be done at this point.

Day 8, despite being “in the right direction”,  does not help our original guess either:

[Image: button_collection(8)]

 

The whole week’s trend is still “going up” – but that is not what the trend line is for.

What is that trend line for?

  • To help us visualize and understand the system or process that is creating (causing) the numbers (daily button counts) that we see – particularly useful with data much messier than this.
  • To help us judge whether or not our hypothesis is correct

Until we understand what is going on, what the process is, we will not be able to make meaningful predictions about what the daily button counts will be in the future.  At this point, we have to admit, we do not know because we do not understand clearly the process(es) involved.

Trend lines are useful in hypothesis testing – they can show researchers – visually or numerically —  when they have correctly “guessed” the system or process underlying their results or, on the other hand, expose where they have missed the mark and give them opportunities to re-formulate hypotheses or even to “go back to the drawing board” altogether if necessary.
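A numerical version of that check might look like the sketch below. The “observed” counts are illustrative placeholders, since the real counts live in the figures; the point is the growing residuals, not the particular numbers.

```python
# A numerical version of the check described above: compare what the hypothesis
# ("tomorrow = today + 2") predicts against what was actually counted.
# The 'observed' numbers below are illustrative placeholders -- the real counts
# are in the figures, not in the text.
observed = [2, 4, 6, 8, 7, 9, 6, 8]           # Days 1-8, hypothetical values
predicted = [2 * day for day in range(1, 9)]  # the original hypothesis: 2 more per day

residuals = [obs - pred for obs, pred in zip(observed, predicted)]
print("Day  observed  predicted  residual")
for day, (obs, pred, res) in enumerate(zip(observed, predicted, residuals), start=1):
    print(f"{day:>3}  {obs:>8}  {pred:>9}  {res:>8}")

# Growing residuals (observed falling further and further below predicted)
# are the numerical signal that the hypothesis no longer fits and must be revised.
```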

Discussion:

My example above is unfair to you, the reader, because by Day 10, there is no apparent answer to the question we need to answer:  What is the process or function that is producing these results?

That is the whole point of this essay.

Let me make a confession:    This week’s results were picked at random – there is no underlying system to discover in them.
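Anyone can reproduce the effect with a sketch like this: draw daily “button counts” at random, fit a trend line anyway, and note that whatever slope comes out is an artifact of the draw, not evidence of an underlying process.

```python
# A sketch of the confession above: generate daily "button counts" completely at
# random, fit a trend line anyway, and note that the slope is an artifact of the
# draw, not evidence of an underlying process.
import numpy as np

rng = np.random.default_rng()                  # no seed: a different "week" every run
days = np.arange(1, 11)
counts = rng.integers(0, 12, size=days.size)   # ten days of purely random counts

slope, intercept = np.polyfit(days, counts, 1)
print(f"Random counts: {counts}")
print(f"Fitted 'trend': {slope:+.2f} buttons/day -- but there is nothing to discover here")
```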

This is much more common in research  results than is generally admitted – one sees seemingly random results caused by poor study design, too small a sample, improper metric selection and “hypothesis way off base”.  This has resulted in untold suffering of innocent data being unrelentingly tortured to reveal secrets it does not contain.

We often think we see quite plainly and obviously what various visualizations of numerical results have to tell us.  We combine these with our understanding of things and we make bold statements, often overly certain.  Once made, we are tempted to stick with our first guesses out of misplaced pride.  If our time periods in the example above had been years instead of days, this temptation would have become even stronger, maybe irresistible – irresistible if we had spent ten years trying to show how correct our hypothesis was, only to have the data betray us.

When our hypotheses fail to predict or explain the data coming out of our experiments or observations of real world systems, we need new hypotheses — new guesses — modified guesses.  We have to admit that we don’t have it quite right — or maybe worse, not right at all.

Linus Pauling, the brilliant Nobel Prize-winning chemist, is commonly believed, late in life, to have chased the unicorn of a Vitamin C Cancer Cure for far too many years, refusing to re-evaluate his hypothesis when the data failed to support it and other groups failed to replicate his findings.  Dick Feynman blamed this sort of thing on what he called, in his homey way, “fooling one’s self”.  On the other hand, Pauling may have been right about Vitamin C’s ability to ward off the common cold or, at least, to shorten its duration — the question still has not been subjected to enough good experimentation to be conclusive.

When, as in our little Button Collector example above, our hypotheses don’t match the data and there doesn’t seem to be any reasonable, workable answer, then we have to go back to basics in testing our hypotheses:

1) Is our experimental design valid?

2) Are our measurement techniques adequate?

3)  Have we picked the right metrics to measure? Do our chosen metrics actually (physically) represent/reflect the thing we think they do?

4)  Have we taken into account all the possible confounders?  Are the confounders orders-of-magnitude larger than the thing we are trying to measure?  (see here for an example.)

5)  Do we understand the larger picture well enough to properly design an experiment of this type?

That’s our real topic today — the list of questions that a researcher must ask when his/her/their results just won’t come in line with their hypotheses regardless of repeated attempts and modifications of the original hypothesis.

I have started the list off above and I’d like you, the readers, to suggest additional items and supply your personal professional (or student era) experiences and stories in line with the topic.

# # # # #

“Wait,” you may say, “what about trends and predictions?”

  • Trends are simply visualizations — graphical or mental — of the change of past, existing results.  Let me repeat that – they are results of results, effects of effects; they are not and cannot be causes.
  • As we see above, even obvious trends cannot be used to predict (no less cause or determine) future values in the absence of a true [or at least, “fairly true”] and clear understanding of the processes, systems and functions (causes) that are producing the results, data points, which form the basis of your trend.
  • If one does have a clear and full-enough understanding of the underlying systems and processes, then, if the trend of results fully supports your understanding (your hypothesis) and if you are using a metric that mirrors the processes closely enough, you could possibly use it to suggest possible future values, within bounds – almost certainly if probabilities alone are acceptable as predictions.  But it is your understanding of the process, the function, that allows you to produce the prediction, not the trend – and the actual causative agent is always the underlying process itself.
  • If one is forced by circumstance, public pressure, political pressure or just plain hubris to make a prediction (a forecast) in the absence of understanding — under deep uncertainty — the safest bet is to predict “More of the same” and allow plenty of latitude even in that forecast (a minimal sketch of such a naive forecast follows this list).
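Here is the promised sketch of a “more of the same” (naive, or persistence) forecast with plenty of latitude. The numbers are made up, and the rule used for widening the uncertainty band (growing with the square root of the horizon) is an assumption for illustration, not a law.

```python
# A minimal sketch of the "more of the same" (persistence / naive) forecast with
# plenty of latitude: the forecast is just the last observed value, and the
# uncertainty band is allowed to widen the further out you go. The widening rule
# used here (scaling with the square root of the horizon) is an assumption for
# illustration only.
import numpy as np

history = np.array([11.2, 11.5, 10.9, 11.8, 12.1, 11.7])   # made-up past observations
last = history[-1]
spread = history.std(ddof=1)          # how much the past has wobbled

for horizon in range(1, 6):
    half_width = 2 * spread * np.sqrt(horizon)   # latitude grows with horizon (assumed rule)
    print(f"step {horizon}: forecast {last:.1f}, within roughly ±{half_width:.1f}")
```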

 

Notes:

To those of you who feel you have wasted your time reading these admittedly simplistic examples:  You are right: if you already have a firm grasp of these points and never ever let yourself be fooled by them, you may have wasted your time.

Recent studies on trends in non-linear systems [NB: “Amongst the dynamical systems of nature, nonlinearity is the general rule, and linearity is the rare exception.”  — James Gleick, Chaos: Making a New Science] don’t offer much hope for using derived trends in a predictive manner — no more than “maybe things will go on as they have in the past — and maybe there will be a change.”  Climate processes are almost certainly non-linear – thus, for metrics of physical outputs of climate processes [temperatures, precipitation, atmospheric circulations, ENSO/AMO/PDO metrics], drawing straight lines (or curves) across graphs of numerical results of these nonlinear systems in order to make projections is apt to lead to non-physical conclusions and is illogical.
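To see why, here is a sketch using the logistic map, a standard textbook nonlinear system chosen only because it is simple, not because it models climate: a straight line fitted over one stretch of its output says very little about the next stretch.

```python
# A sketch using the logistic map -- a standard textbook nonlinear system, not a
# climate model -- to show that a trend fitted over one stretch of a nonlinear
# system's output says little about what comes next.
import numpy as np

def logistic_map(x0: float, r: float, n: int) -> np.ndarray:
    """Iterate x -> r * x * (1 - x) for n steps."""
    xs = np.empty(n)
    x = x0
    for i in range(n):
        x = r * x * (1 - x)
        xs[i] = x
    return xs

series = logistic_map(x0=0.2, r=3.9, n=60)     # chaotic regime

# Fit a straight line to the first 30 steps and "project" it over the next 30.
t_fit = np.arange(30)
slope, intercept = np.polyfit(t_fit, series[:30], 1)
t_future = np.arange(30, 60)
projection = slope * t_future + intercept

error = np.abs(projection - series[30:]).mean()
print(f"fitted slope over first 30 steps: {slope:+.4f}")
print(f"mean error of the straight-line projection over the next 30 steps: {error:.3f}")
```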

There is a growing body of evidence-based principles for the subject of Forecasting. [Hint: drawing straight lines on graphs is not one of them.]  Scott Armstrong has been heading an effort for many years to build a set of Forecasting Principles “intended to make scientific forecasting accessible to all researchers, practitioners, clients, and other stakeholders who care about forecast accuracy.”  His work is found at ForecastingPrinciples.com.  His site has many articles on the troubles of forecasting climate and global warming (PgDn at the link).

# # # # #

Author’s Comment Policy:

I enjoy reading your input to the discussion — positive or negative.

The subject in this essay is really “What questions must  a researcher ask when his/her/their results just won’t come in line with their hypotheses regardless of repeated experimental attempts and modifications of the original hypothesis?”

Most readers here are skeptical of mainstream, IPCC-consensus Climate Science, which, in my opinion, has fallen prey to desperate attempts to shore up a failed set of hypotheses collectively called “CO2-induced catastrophic global warming” — GHGs will generally induce some warming, but how much, how fast, how long, beneficial or harmful are all questions very much unanswered.  Still up in the air is whether or not the Earth’s climate is self-regulating despite changing atmospheric concentrations of GHGs and solar fluctuations.

I’d like to read your suggestions on what questions CliSci needs to ask itself to get out of this “failed hypotheses” mode and back on track.

[Re: Trends —  I know it seems impossible that some people actually believe that trends cause future results, I have been through two very rough post-and-comment battles on the subject — and the number of believers (all very vocal) is quite large.  Unfortunately, this concept runs up against a lot of the training of academic statisticians — who, in their own way,  are among the most vocal believers.  Let’s try not to fight that battle here again –  you can read all the comments and my replies at the two posts linked at the very beginning of this essay.]

[NB: 5 Jan 2018 — several minor typos that have been helpfully pointed out by readers have been corrected — since publication.  Details are in the comments section where pointed out. –kh]

# # # # #

 

203 Comments
Don K
January 5, 2018 6:49 am

“As we see above, even obvious trends cannot be used to predict (no less cause or determine) future values in the absence of a true [or at least, “fairly true”] and clear understanding of the processes, systems and functions (causes) that are producing the results, data points, which form the basis of your trend.”

Counterexample: The Ptolemaic (geocentric) model of the universe with its cycles and epicycles was dead wrong. And by the 15th Century AD there were lots of folks who were pretty sure it was wrong. But it did make correct predictions. You could navigate using the star and planet position predictions it made.

Dan Evens
January 5, 2018 6:49 am

Let me restate this entire post: If you have no clue what the process is that produces the data, then any trend lines you may draw on a graph are not justified. They could very easily be completely different from reality.

You could have said this and saved so much time.

January 5, 2018 6:53 am

…graphing something like “annual average data” can present wildly misleading information…

Yes, all the “hottest year ever” claims are based on averages. NOAA’s Climate at a Glance allows you to track the Minimum and Maximum temperatures. Those of course are averages too, but the picture that emerges is a little different:

http://oi66.tinypic.com/bbjue.jpg

From my funny quotes and tag line file there’s this one on averages:

“Be careful of averages, the average person has one breast and one testicle” – Dixy Lee Ray

Don K
Reply to  Steve Case
January 5, 2018 7:33 am

The problem with “hottest year ever” is that the metric being used is basically the temperature of the tropical Pacific Ocean. That’ll work I suppose … If you have a century or three to collect data. Shorter term, you’re looking at the balance between ENSO and the Humboldt Current — which may not be all that informative about temperatures elsewhere on the planet.

paqyfelyc
Reply to  Steve Case
January 5, 2018 7:36 am

I (like lots of people) live in a place where the weather alternates between humid+mild temperatures for a few days, and dry+extreme (hot in summer / cold in winter) for the next few days. So, every day, the weatherman states that the temperatures are either “higher than average [for the season]” or “colder than average”, except a few days in the whole year when he says that “temperatures are average”. Most people do not get it, but these are the ABNORMAL days, the days with one breast and one testicle.

Reply to  Steve Case
January 5, 2018 1:11 pm

Time and temperature are independent variables. As such a trend of U.S. temperature to time has no meaning. The purple trend predicts that in order to lower U.S. temperature, we must go back in time.

Don K
January 5, 2018 7:12 am

A thought: Perhaps the best-known example of deriving an important theory took place about 1600. And the order was NOT (predict-observe-adjust) until the answers converged. It was observe (Tycho Brahe) – analyze and predict (Kepler) – explain (Newton).

Maybe there’s a message there. First get good data. Second produce (a) model(s) that matches the data. Third verify that the model makes correct predictions. And only then explain why the model works.

BTW, it seems to me that quantum physics is following the observe-analyze-explain path. They’ve collected lots of data. They have a bunch of modeling that make good predictions. And they don’t really have the slightest idea why/how it all works.

Climate “science” OTOH seems to be following an observe-predict-change_the_observations_to better_fit_the_predictions-fire_off_insults_at_anyone_who_questions_the_process approach.

paqyfelyc
Reply to  Don K
January 5, 2018 7:43 am

+1
One thing most people forget is that the geocentric model DID work very well, giving close to perfect predictions. Astronomers were able to predict where and when eclipses would occur years ahead.
Climate “science” cannot even do that…

Thomas Homer
Reply to  paqyfelyc
January 5, 2018 12:45 pm

paqyfelyc – “the geocentric model DID work very well”

Indeed. However, the geocentric model had to introduce “celestial spheres” to complete the model for planetary orbits within our solar system. Does the correctness of the geocentric model then prove the existence of “celestial spheres”?

CAGW models have introduced a non-measurable property of Carbon Dioxide, and then claim that their model proves the existence of it. Even when their models don’t work very well.

Reply to  Don K
January 5, 2018 4:44 pm

And the order was NOT (predict-observe-adjust) until the answers converged. It was observe (Tycho Brahe)-analyze and predict(Kepler)-explain(Newton).

But would Kepler have gotten anywhere if he hadn’t started with Copernicus’s model?

Svend Ferdinandsen
January 5, 2018 7:47 am

Even with no real correlation you can get a very good correlation between two data sets by averaging, smoothing, and trendlines. The trendlines are bound to have 100% correlation, unless one of them is level.
Anomalies are another trick to make results look different than they are. Sometimes it improves the understanding; at other times it hides the reality. Think of temperature anomalies and ice cover and snow.
When you really want to make things up, you can use matrix operations.

JimG1
January 5, 2018 8:09 am

Kip,
Excellent post. I always consider temperature information to be based upon very nebulous, if not nefarious, data due to a variety of factors, including but not limited to instrumental precision, sampling techniques, proxies, changes over time in proxies and equipment and algorithms and sample sizes. The relatively small variations which are usually cited are ridiculous, and error bars do not include all of the potential error in the foregoing.
JimG1

January 5, 2018 9:37 am

These two guys were jogging along at a sedate pace until the mid 1970’s when they discovered steroids.
http://www.vukcevic.talktalk.net/CT4tl.gif

Don K
January 5, 2018 10:44 am

Kip

As usual, your essay is well written, but I’m a little confused. Is your point that the map (a linear data fit) is not the territory? Of course it isn’t.

Or are you saying that one can’t ever make and act on projections of anything that isn’t fully understood? Seems to me that the latter is a non-starter. Taken literally, humanity would never have moved beyond Eurasia and Africa. After all, there’s no way to be sure there’s any land beyond the horizon.

Bill Powers
Reply to  Don K
January 5, 2018 10:55 am

Don, I didn’t interpret what he was saying to mean that one can’t act on projections. I understood him to mean that one must understand his data and act accordingly.

Reply to  Don K
January 5, 2018 11:13 am

Don’t think that was what Kip was implying. “Fully understood” may not be necessary, but you do need to have some understanding of what is underlying your trend before using it to predict. Just using a trend with no knowledge of what is going on means you’re a mathematician (lol) with confidence in your skill with numbers. If it turns out badly, told you so!

Don K
Reply to  Kip Hansen
January 5, 2018 1:28 pm

Columbus-types were a bit more adventurous…

Completely tangential, but I’ve long suspected that the reason Columbus had trouble getting his project funded was that most Court Wise Men in Southern Europe knew that Eratosthenes had measured the size of the Earth back in 200BC, and they told their monarchs that there was no way Columbus could get to the East Indies in a reasonable sized ship. Columbus was just lucky that the Americas got in his way or he’d have been in serious trouble. Or maybe he wasn’t so lucky, since he ended up suffering from a number of maladies probably related to his voyages and also did some jail time.

Bill Powers
January 5, 2018 10:52 am

Kip, Highly informative! My father, who was in the insurance business, used to remind me growing up that “figures lie and liars figure.” That would make an apt subtitle to your article.

January 5, 2018 10:54 am

Kip -> Very good article. I just wish the essays you and Dave have written over the last few months would become required reading for many scientists (not just CliSci) publishing papers. I wish every editor required that accuracy and precision be addressed in every paper. And, I do know the saying about wishes in one hand, … .

Clyde Spencer
January 5, 2018 11:50 am

Kip,

It is well known that humans have the uncanny ability to see patterns where there are none, such as a ‘face’ in a cloud formation. I suspect it had survival value for our ancestors trying to find a predator hiding in the tall grass. We also would like to know what the future has to hold for us. It is a very unpleasant experience to be walking in the dark and discover that you have just walked off the edge of a high cliff. Therefore, we try to grasp at anything that might help us discern future events. Some even resort to auto-correlation to hazard a guess as to what the temperature might be tomorrow. I agree that a model based on physical principles is the best approach. However, in the absence of any reliable models, and a strong desire not to have the future always be a complete surprise, relying on auto-correlation to extrapolate a trend for a short distance might be rationalized.

I think that a good analogy might be a fighter pilot attempting to shoot down an enemy aircraft. The pilot leads the enemy aircraft, attempting to guess where the plane will be a couple seconds later, in the hope that the bullets and plane will intersect. Of course, the enemy pilot is trying hard to prevent that from happening, so it is continually changing its ‘trend.’ While it may not be the optimal strategy for destroying an enemy aircraft, it was the best we could do until the invention of heat-seeking or radar-guided missiles. That is to say, projecting a trend may be useful some of the time for short extrapolations, even if it is often wrong. After all, planes did get shot down with simple machine guns. However, projecting out 100 years is a whole different ball game!

Clyde Spencer
Reply to  Kip Hansen
January 5, 2018 2:18 pm

Kip,
Your Little Leaguers are unknowingly projecting the path of a parabola, which is quite predictable, even with numerical trends — unless there are strong gusty winds at the time. However, that reinforces the point about auto-correlation. The ball may come down in a place different from that of an ideal parabola, with winds, but it won’t be out of the ballpark.

Don K
January 5, 2018 12:53 pm

To make scientifically valid predictions or projections from a data set, one has to understand what the system is that is producing the data

Sorry. I can’t quite buy into that. One can understand fairly well without having all the details pinned down. In fact, I think that’s pretty usual. When that’s the case, you make estimates based on your best guess. Given the tendency for polynomials to zoom off to infinity or crash to zero when projected outside the data range, and the fact that exponential growth doesn’t usually last that long, your best guess on non-cyclic data will likely be a linear projection. Of course it needs realistic uncertainty estimates, and those typically get large very quickly. That seems to be really hard for a lot of people to deal with … including many who really should know better.

Nick Stokes
Reply to  Kip Hansen
January 5, 2018 3:32 pm

“Anyone can guess, even five-year-olds.”
They don’t usually do linear regression. That is an evidence-based forecast. It is of course advisable to use more evidence if you can. You never get certainty. You are drawing a non-existent distinction.

Nick Stokes
Reply to  Don K
January 5, 2018 3:29 pm

“Given the tendency for polynomials to zoom off to infinity or crash to zero when projected outside the data range”
That is really the key here (though polynomials don’t go to infinity in finite time). I remember in the 60s when people were trying to fit polynomials to stock prices for short term prediction. They weren’t necessarily bad fits. But higher order meant more variance, and that carries a cost. Most people have a nonlinear response to variance (eg going broke). So the frequent conclusion that tomorrow’s prices would be the same as today’s was as good as any in practice.

And that is the quantitative issue re use of linear projection. The trend may well be optimal as a probability peak of outcomes, but it is uncertain, and the uncertainty is magnified by the leverage effect of projecting over long times. So it isn’t that trend is useless in prediction. It’s important. But at sometime the variance will be too much. That depends on how far you project relative to the extent of data you have and the goodness of fit, and the cost to you of variance.

That is all familiar. I’ve mentioned budgeting. People normally budget for a year. That is a period for which experience is a useful guide, and trend is part of that. But three year budgets are also quite common. It’s a tradeoff between uncertainty of forecasts vs the planning benefits of having a budget to work to.

January 5, 2018 1:09 pm

Time does not make button sales. Since the two are independent, the trend is constantly changing with each data point. In commodity trading, traders only use trends for breakouts from the trends to trigger buy/sell decisions, they do not use the trend itself to predict anything, as it predicts nothing. Least squares is based on functionally related x and y values in a graph, that is, they are dependent.

January 5, 2018 3:57 pm

Past data must be stored (archived).
There are programs for that which allow upper and lower limits to be set on just which values will be stored.
Some values are not stored because they don’t “break” the upper or lower limit. The idea is to reduce the disk space required and/or allow for more rapid retrieval.
Old data stored by an older program can be run through a new program. Old values may be dropped (lost).
The upper and lower limits might be set to, say, +/- 0.5 MGD (Million Gallons per Day), or they could be set to +0.6 MGD and -0.4 MGD.
No valid reason I can think of for doing that, but it could be done.
Past temperature?
PS The time frame for just which +/- limits are applied can also be set.
The past values can be changed en masse.
Present values going into the archive can also be … handled … in such a manner as they are being stored.
Most would want an accurate (enough) record of the past on which to base future plans.
Or “adjust” the past to trend the future.

January 6, 2018 8:27 am

I’ve been downing vitamin C powder for colds for years; it does work.

Reply to  Mark - Helsinki
January 6, 2018 11:14 am

Vitamin C became a ‘miracle tablet’ after Linus Pauling wrote a book about Vitamin C and the common cold. Pauling got the Nobel Prize for chemistry and, some years later, the Nobel Peace Prize; a great scientist, academic, educator and peace campaigner, but according to a member of his wider family whom I met in the UK some three decades ago, he was a very difficult person to live with.

robinedwards36
January 8, 2018 4:31 am

I read stuff on linear fitting and “valid” predictions made from such fits. There’s a lot of this about! Linear fits (in time series calculations) are reasonable enough provided that the underlying model is indeed linear in nature. In the esoteric world of climate data fitting (and therefore, by implication, predictions – what is a model for?) climatologists and others routinely compute linear fits to data that are grossly non-linear in nature, something simple scatter plots are adequate to reveal.
Now, climate data are affected by countless forms of influence, and indeed errors of various sorts, which are typically ignored by the technique of allocating them to “noise”. Nevertheless these influences can have the effect of swamping any attempt to make a useful or valid “prediction” or “projection”.
How is one to appreciate this underlying problem? Well, a simple approach is to plot all the data “as is” together with the linear fit, including its equation and inferential statistics based on presumed underlying normality of residuals, PLUS confidence intervals for the regression line AND for an individual future observation from the same population. Choose your probability level at a sensible level.
This plot would illustrate to all the amateur statistical detectives who try to infer conclusions from the simple, single straight line that is all that is usually published, that their confident assertions are, to say the very least, subject to some uncertainty.
My own analyses /always/ include these essential details.

Reply to  Kip Hansen
January 9, 2018 12:21 pm

So all that blathering and commenting
and re-commenting
and you never got around to telling us
when all life on earth
is going to end
from runaway global warming ?

Or is that in your next article?

http://www.elOnionBloggle.Blogspot.com
