Guest Essay by Kip Hansen
Prologue: This essay is a follow-up to two previous essays on the topic of the usefulness of trend lines [trends] in prediction. Readers may not be familiar with these two essays as they were written years ago, and if you wish, you should read them through first:
- Your Dot: On Walking Dogs and Warming Trends posted in Oct 2013 at Andy Revkin’s NY Times Opinion Section blog, Dot Earth. Make sure to watch the original Doggie Walkin’ Man animation, it is only 1 minute long.
- The Button Collector or When does trend predict future values? posted a few days later here at WUWT (but 4 years ago!)
Trigger Warning: This post contains the message “Trends do not and cannot predict future values”. If this idea is threatening or potentially distressing, please stop reading now.
If the trigger warning confuses you, please first read the two items above and all my comments and answers [to the same questions you will have] in the above two essays, it will save us both a lot of time.
I’ll begin this post by commenting on an ancient comment to a follow-up Dot Earth column by Andrew C. Revkin, “Warming Trend and Variations on a Greenhouse-Heated Planet” (Dec. 8, 2014). [Alas, while the link is still good, Dot Earth is no longer; it has gone the way of my old blog, The Bad Science Times, which faded away in the early 1990s.] Revkin’s piece repeated the Doggie Walking Animation, and contained a link to my response. This comment, from Dr. Eric Steig, Professor, Earth & Space Sci. at the University of Washington, where he is Director of the IsoLab and is listed on his faculty page as a founding member and contributor to the influential [their word] climate science web site, “RealClimate.org”, says:
“Kip Hansen’s “critique” of the dog-walking cartoon, is clever, and completely missing the point. Yes, the commentators of the original cartoon should not have said “the trend determines the future”; that was poorly worded. But climate forcing (CO2, mostly) does determine the trend, and the trend (where the man is walking) does determine where the dog will go, on average.” (my emphasis)
Dr. Steig, I believe, has simply “poorly worded” this response. He surely means that the climate forcings, which themselves are trending upwards, do/will determine (cause) future temperatures to be higher (“where the dog will go, on average”). He is entitled to that opinion but he errs when he insists “the trend (where the man is walking) does determine where the dog will go”. It is this repeated, almost universally used, imprecise choice of language that causes a great deal of misunderstanding and trouble for the rest of the English speaking world (and I suppose for others after literal translations) when dealing with numbers, statistics, graphs and trend lines. People, students, journalists, readers, audiences…begin to actually believe that it is the trend itself that is causing (determining) future values.
Many of you will say to yourselves, “Stuff and nonsense! Nobody believes such a thing.” I didn’t think so either…but read the comments to either of my two essays… you will be astonished.
Data points, lines and graphs:
[Warning: These are all very simple points. If you are in a hurry, just scroll down and look at the images.]
Let’s look at the definition of a trend line: “A line on a graph showing the general direction that a group of points seem to be heading.” Or another version “A trend line (also called the line of best fit) is a line we add to a graph to show the general direction in which points seem to be going.”
Here’s an example (mostly in pictures):
Trend lines are added to graphs of existing data to show “the general direction in which [the data] points seem to be going.” Now, let’s clarify that a little bit. More precisely, the trend line only actually shows “the general direction in which [the data] points have gone” and, one could add, “so far”.
That seems awfully picky, doesn’t it? But it is very important to our correct understanding of what a data graph is — it is a visualization of existing data — the data that we actually have — what has actually been measured. We would all agree that adding data points to either end of the graph — data that we just made up that had not actually been measured or found experimentally — would be fraudulent. Yet we hardly ever see anyone object to “trend lines” that extend far beyond the actual data shown on a graph — usually in both directions. Sometimes this is just lazy graphics work. Sometimes it is intentional to imply [unjustifiably] that past data and future data would be in line with the trend line. However, just to be clear, if there is no data for “before” and “after” then that assumption cannot and should not be made.
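To make the “so far” point concrete, here is a minimal Python sketch of an ordinary least-squares trend line that is evaluated only across the span of the measured data and no further. The sample values and the function names (fit_trend, trend_within_data) are invented for illustration; they are not taken from any graph in this essay.

```python
def fit_trend(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def trend_within_data(xs, ys):
    """Endpoints of the trend line, clipped to the range of measured x values."""
    slope, intercept = fit_trend(xs, ys)
    x_lo, x_hi = min(xs), max(xs)
    return (x_lo, slope * x_lo + intercept), (x_hi, slope * x_hi + intercept)


# Hypothetical measurements, made up for this sketch.
years = [2001, 2002, 2003, 2004, 2005]
values = [3.0, 3.4, 3.1, 3.9, 4.1]

start, end = trend_within_data(years, values)
print("trend line drawn from", start, "to", end)
```

The line starts at the first measured year and stops at the last one. Drawing it any further in either direction would be a claim about data we do not have.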
Now, one (or two) more little points:
I’ll answer in a graphic:
But (isn’t there always a ‘but’?):
Traces added to join data points on a graph can sometimes be misunderstood to represent the data that might exist between the data points shown. More properly, the graph would ONLY show the data points if that is all the data we have; but, as illustrated above, we are not really used to seeing time series graphs that way – we like to see the little lines march across time connecting the values. That’s fine as long as we don’t let ourselves be fooled into thinking that the lines represent any data. They do not, and one should not let the little lines fool you into thinking that the intervening data lies along those little lines. It might…it might not…but there is no data, at least on the graph, to support that idea.
For eggnog sales, I have modified part of the graph to match reality:
This is one of the reasons that graphing something like “annual average data” can present wildly misleading information — the trace lines between the annual average points are easily mistaken for how the data behaved during the intermediate time — between year-end totals or yearly averages. Graphing just annual averages or global averages easily obscures important information about the dynamics of the system that generates the data. In some cases, like eggnog, looking only at individual monthly sales, like July sales figures (which are traditionally near zero), would be very discouraging and could cause an eggnog producer to vastly underestimate yearly sales potential.
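A few lines of Python show how much an annual figure can hide. The monthly eggnog numbers below are entirely made up for this sketch (I have no real sales data); only the seasonal shape matters.

```python
# Hypothetical monthly eggnog sales (units), January through December.
# These numbers are invented for illustration only.
monthly_sales = [40, 5, 2, 1, 0, 0, 0, 0, 3, 60, 900, 2500]

annual_total = sum(monthly_sales)
monthly_average = annual_total / len(monthly_sales)
july_sales = monthly_sales[6]

# How many months actually sit within 50% of the "average month"?
months_near_average = sum(
    1 for sales in monthly_sales
    if 0.5 * monthly_average <= sales <= 1.5 * monthly_average
)

print("annual total:", annual_total)
print("average month:", round(monthly_average, 1))
print("July:", july_sales)
print("months near the average:", months_near_average)
```

With these invented figures, not a single month looks anything like the “average month”, and July alone would suggest there is no eggnog business at all. A trace drawn between two annual points says nothing about either fact.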
There are several good information sources on the proper use of graphs — and the common ways in which graphs are misused and malformed – either out of ignorance or to intentionally spin the message for propaganda purposes. We see them almost everywhere, not just in CliSci.
Here are two classic examples:
On both of the above graphs, there is another invisible feature — error bars (or confidence intervals even) — invisible because they are entirely missing. In reality, values before 1900 are “vague wild guesses”, confidence increases from 1900 – 1950 to “guesses based on some very imprecise, spatially thin data”, confidence increases again 1950-1990s to “educated guesses”, and finally, in the satellite era, “educated guesses based on computational hubris.”
That’s the intro — a few “we all already knew all that!” [“Wha’da’ya think we are? Stupid?”] points — of which we all need to remind ourselves every once in a while.
The Button Collector: Revisited
My two previous essays on Trends focused on “The Button Collector” — let me re-introduce him:
I have an acquaintance [actually, I have to admit, he is a relative] that is a fanatical button collector. He collects buttons at every chance, stores them away, thinks about them every day, reads about buttons and button collecting, spends hours every day sorting his buttons into different little boxes and bins and worries about safeguarding his buttons. Let’s call him simply The Button Collector. Of course, he doesn’t really collect buttons, he collects dollars, yen, lira, British pounds sterling, escudos, pesos…you get the idea. But he never puts them to any useful purpose, neither really helping himself nor helping others, so they might as well just be buttons.
He has, latest count, millions and millions of buttons, exactly, on Sunday night. So, we can ignore the “millions and millions” part and just say he has zero buttons on Monday morning to start his week, to make things easy. (see, there is some advantage to the idea of “anomalies”.) Monday, Tuesday and Wednesday pass, and on Wednesday evening, his accountant shows him this graph:
As in my previous essay, I ask, “How many buttons will BC have at the end of day on Friday, Day 5?”
Before we answer, let’s discuss what has to be done even to attempt an answer. We have to formulate an idea of what the process is that is being modeled by this little dataset. [By “modeled” we simply mean that the daily results of some system are being visually represented.]
“No, we don’t!”, some will say. We just grab our little rulers and draw a little line like this (or use our complicated maths program on our laptops to do it for us) and Voilà! The answer is revealed:
And our answer is “10”……(and will be wrong, of course).
There is no mathematical or statistical or physical reason or justification to believe we have suggested the correct answer. We skipped a very important step. Well, actually, we rushed right over it. We have to first try to guess what the process is (mathematically, what function is being graphed) that produces the numbers we see. This guess is more scientifically called a “hypothesis” but is no different, at this point, than any other guess. We can safely guess that the process (the function) is “Tomorrow’s Total will be Today’s Total plus 2”. This is, in fact, the only reasonable guess given the first three days’ data – and it even complies with formal forecasting principles (when you know next to nothing, predict more of the same).
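The two routes to “10” can be put side by side in a short Python sketch. The first three daily totals (2, 4 and 6) are my assumption, inferred from the “plus 2” rule and a start of zero on Monday morning; the helper name fit_line is likewise just for this illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


days = [1, 2, 3]
buttons = [2, 4, 6]  # assumed totals for Monday..Wednesday

# Route 1: the ruler. Extend the fitted line out to Day 5.
slope, intercept = fit_line(days, buttons)
ruler_answer = slope * 5 + intercept

# Route 2: the hypothesis. Apply "today's total plus 2" for two more days.
total = buttons[-1]
for _ in range(2):
    total += 2
rule_answer = total

print(ruler_answer, rule_answer)  # 10.0 10
```

Both routes give the same number, which is exactly why the ruler is so seductive. Only the second one states a claim about the process, and only a claim about the process can be checked against Thursday’s and Friday’s counts.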
Let’s check Thursday’s graph:
We’re rocking! – right on target – now Friday:
Shucks! What happened? Certainly our hypothesis is correct. Maybe a glitch….? Try Saturday (we’re working the weekend to make up for lost time):
Well, that looks better. Let’s move our little trend line over to reassure ourselves….:
Well, we say, still pretty close…darn those glitches!
But wait a minute…what was our original hypothesis, our guess about the system, the process, the function that produced the first three days of results? It was: “Tomorrow’s Total will be Today’s Total plus 2”. Do our results (these results are a simple matter of counting the buttons – that’s our data gathering method – counting) support our original hypothesis, our first guess, as of Day 6? No, they do not. No amount of dissembling – saying “Up is Up”, or “The Trend is still going up” makes the current results support the original hypothesis.
What’s a self-respecting scientist to do at this point? There are a lot of things not to do: 1) Fudge the results to make them agree with the hypothesis, 2) Pretend that “close” is the same as supporting the hypothesis – “see how closely the trends correlate?”, 3) Adopt the “Wait until tomorrow, we’re sure this glitch will clear up” approach, 4) Order a button recount, making sure the button counters understand the numbers that they are supposed to find, 5) Try re-analysis, incremental hourly in-filling, kriging, de-trending and re-analysis and anything else until the results come into line like “they should”.
While our colleagues try these ploys, let’s see what happens on Day 7:
Oh, my…amidst the “Still going up” mantra, we see that the data can really no longer be used to support our original hypothesis – something else, other than what we guessed, must be going on here.
What a real scientist does at this point is:
Makes a new hypothesis which explains more correctly the actual results, usually by modifying the original hypothesis.
This is hard – it requires admitting that one’s first pass was incorrect. It may mean giving up a really neat idea, one that has professional or political or social value apart from solving the question at hand. But – it MUST be done at this point.
Day 8, despite being “in the right direction”, does not help our original guess either:
The whole week trend is still “going up” – but that is not what the trend line is for.
What is that trend line for?
- To help us visualize and understand the system or process that is creating (causing) the numbers (daily button counts) that we see – particularly useful with data much messier than this.
- To help us judge whether or not our hypothesis is correct
Until we understand what is going on, what the process is, we will not be able to make meaningful predictions about what the daily button counts will be in the future. At this point, we have to admit, we do not know because we do not understand clearly the process(es) involved.
Trend lines are useful in hypothesis testing – they can show researchers – visually or numerically — when they have correctly “guessed” the system or process underlying their results or, on the other hand, expose where they have missed the mark and give them opportunities to re-formulate hypotheses or even to “go back to the drawing board” altogether if necessary.
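Checking a hypothesis against the counts, rather than admiring the direction of the line, might look like the Python sketch below. The eight daily totals are hypothetical stand-ins that I made up to follow the story told above (on target through Day 4, then “glitches”); they are not the values from the original graphs.

```python
def days_contradicting(counts, step=2):
    """Days (1-based) whose total is not the previous day's total plus `step`."""
    return [day for day in range(2, len(counts) + 1)
            if counts[day - 1] - counts[day - 2] != step]


def trend_slope(ys):
    """Least-squares slope of ys against day numbers 1..n."""
    n = len(ys)
    xs = list(range(1, n + 1))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical daily totals for Days 1..8, invented for this sketch.
counts = [2, 4, 6, 8, 7, 10, 8, 9]

failed_days = days_contradicting(counts)
still_going_up = trend_slope(counts) > 0

print("days contradicting 'plus 2':", failed_days)
print("trend still going up:", still_going_up)
```

With these stand-in numbers the trend is still “going up” while every day from Day 5 onward contradicts the hypothesis. The two statements answer different questions, and only the first is about our guess.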
My example above is unfair to you, the reader, because by Day 10, there is no apparent answer to the question we need to answer: What is the process or function that is producing these results?
That is the whole point of this essay.
Let me make a confession: This week’s results were picked at random – there is no underlying system to discover in them.
This is much more common in research results than is generally admitted – one sees seemingly random results caused by poor study design, too small a sample, improper metric selection and “hypothesis way off base”. This has resulted in untold suffering of innocent data being unrelentingly tortured to reveal secrets it does not contain.
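It is easy to try this at home. The Python sketch below draws many eight-day series of purely random integers and counts how often a least-squares line through them has a slope of at least half a unit per day in either direction. The range 0 to 10, the threshold of 0.5 and the seed are arbitrary choices of mine for illustration.

```python
import random


def trend_slope(ys):
    """Least-squares slope of ys against day numbers 1..n."""
    n = len(ys)
    xs = list(range(1, n + 1))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


rng = random.Random(2018)  # fixed seed so the run is repeatable
trials = 1000
steep = 0
for _ in range(trials):
    # Eight "daily results" with no underlying process at all.
    series = [rng.randint(0, 10) for _ in range(8)]
    if abs(trend_slope(series)) >= 0.5:
        steep += 1

print(steep, "of", trials, "random series show a 'trend' of 0.5 per day or more")
```

I would expect a noticeable share of the random series to pass that threshold, though the exact count depends on the seed and the arbitrary settings. None of those “trends” has anything behind it.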
We often think we see quite plainly and obviously what various visualizations of numerical results have to tell us. We combine these with our understanding of things and we make bold statements, often overly certain. Once made, we are tempted to stick with our first guesses out of misplaced pride. If our time periods in the example above had been years instead of days, this temptation would have become even stronger, maybe irresistible – irresistible if we had spent ten years trying to show how correct our hypothesis was, only to have the data betray us.
When our hypotheses fail to predict or explain the data coming out of our experiments or observations of real world systems, we need new hypotheses — new guesses — modified guesses. We have to admit that we don’t have it quite right — or maybe worse, not right at all.
Linus Pauling, brilliant Nobel Prize winning chemist, is commonly believed, late in life, to have chased the unicorn of a Vitamin C Cancer Cure for way too many years, refusing to re-evaluate his hypothesis when the data failed to support it and other groups failed to replicate his findings. Dick Feynman blamed this sort of thing on, what he called in his homey way, “fooling one’s self”. On the other hand, Pauling may have been right about Vitamin C’s ability to ward off the common cold or, at least, to shorten its duration — the question still has not been subject to enough good experimentation to be conclusive.
When, as in our little Button Collector example above, our hypotheses don’t match the data and there doesn’t seem to be any reasonable, workable answer then we have to go back to basics in testing our hypotheses:
1) Is our experimental design valid?
2) Are our measurement techniques adequate?
3) Have we picked the right metrics to measure? Do our chosen metrics actually (physically) represent/reflect the thing we think they do?
4) Have we taken into account all the possible confounders? Are the confounders orders-of-magnitude larger than the thing we are trying to measure? (see here for an example.)
5) Do we understand the larger picture well enough to properly design an experiment of this type?
That’s our real topic today — the list of questions that a researcher must ask when his/her/their results just won’t come in line with their hypotheses regardless of repeated attempts and modifications of the original hypothesis.
I have started the list off above and I’d like you, the readers, to suggest additional items and supply your personal professional (or student era) experiences and stories in line with the topic.
# # # # #
“Wait”, you may say. What about trends and predictions?
- Trends are simply visualizations — graphical or mental — of the change of past, existing results. Let me repeat that – they are results of results – effects of effects — they are not and cannot be causes.
- As we see above, even obvious trends cannot be used to predict (much less cause or determine) future values in the absence of a true [or at least, “fairly true”] and clear understanding of the processes, systems and functions (causes) that are producing the results, data points, which form the basis of your trend.
- If one does have a clear and full-enough understanding of the underlying systems and processes, then if the trend of results fully supports your understanding (your hypothesis) and if you are using a metric that mirrors the processes closely enough, then you could possibly use it to suggest possible future values, within bounds – almost certainly if probabilities alone are acceptable as predictions – but it is your understanding of the process, the function, that allows you to produce the prediction, not the trend – and the actual causative agent is always the underlying process itself.
- If one is forced by circumstance, public pressure, political pressure or just plain hubris to make a prediction (a forecast) in an absence of understanding — under deep uncertainty — the safest bet is to predict “More of the same” and allow plenty of latitude even in that forecast.
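For the last point, a “more of the same” forecast with plenty of latitude could be sketched in Python as below. Two assumptions are mine and should be read as such: “more of the same” is taken to mean “no change from the last observed value” (reading it as “the same daily change” would be just as defensible), and the latitude grows by the largest day-to-day change seen so far for each day ahead. The history values are again hypothetical.

```python
def more_of_the_same(history, horizon):
    """Naive forecast: repeat the last value, widening the band each day ahead.

    Returns a list of (low, central, high) tuples, one per future day.
    The band width is an arbitrary illustrative rule, not a statistical interval.
    """
    last = history[-1]
    steps = [abs(b - a) for a, b in zip(history, history[1:])]
    largest_step = max(steps)
    forecasts = []
    for days_ahead in range(1, horizon + 1):
        margin = largest_step * days_ahead
        forecasts.append((last - margin, last, last + margin))
    return forecasts


# Hypothetical daily totals, invented for this sketch.
history = [2, 4, 6, 8, 7, 10, 8, 9]
forecast = more_of_the_same(history, horizon=2)
print(forecast)  # [(6, 9, 12), (3, 9, 15)]
```

The central value is deliberately unexciting and the band widens quickly. That is the honest shape of a forecast made without understanding the process.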
To those of you who feel you have wasted your time reading these admittedly simplistic examples: You are right, if you already have a firm grasp of these points and never ever let yourself be fooled by them, you may have wasted your time.
Recent studies on trends in non-linear systems [NB: “Amongst the dynamical systems of nature, nonlinearity is the general rule, and linearity is the rare exception.” — James Gleick CHAOS, Making of a New Science] don’t offer much hope in using derived trends in a predictive manner — no more than “maybe things will go on as they have in the past — and maybe there will be a change.” Climate processes are almost certainly non-linear – thus for metrics of physical outputs of climate processes [temperatures, precipitation, atmospheric circulations, ENSO/AMO/PDO metrics], drawing straight lines (or curves) across graphs of numerical results of these nonlinear systems in order to make projections is apt to lead to non-physical conclusions and is illogical.
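For readers who have not played with a nonlinear system, the logistic map is the standard textbook toy. It is emphatically not a climate model, and the parameter r = 3.9, the starting values and the run length below are simply my choices for a Python sketch. Two runs start one part in a billion apart.

```python
def logistic_series(x0, r=3.9, steps=100):
    """Iterate the logistic map x -> r * x * (1 - x), starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


run_a = logistic_series(0.2)
run_b = logistic_series(0.2 + 1e-9)  # almost the same starting point

# Largest difference between the two runs at any step.
largest_gap = max(abs(a - b) for a, b in zip(run_a, run_b))
print("initial gap: 1e-09, largest later gap:", largest_gap)
```

In this regime the tiny initial difference typically grows by many orders of magnitude within a hundred steps, so the past values of one run say very little about where the other will be. A straight line laid across either run fares no better.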
There is a growing body of evidence on the subject of Forecasting. [Hint: drawing straight lines on graphs is not one of the recommended methods.] Scott Armstrong has been heading an effort for many years to build a set of Forecasting Principles “intended to make scientific forecasting accessible to all researchers, practitioners, clients, and other stakeholders who care about forecast accuracy.” His work is found at ForecastingPrinciples.com. His site has many articles on the troubles of forecasting climate and global warming (PgDn at the link).
# # # # #
Author’s Comment Policy:
I enjoy reading your input to the discussion — positive or negative.
The subject in this essay is really “What questions must a researcher ask when his/her/their results just won’t come in line with their hypotheses regardless of repeated experimental attempts and modifications of the original hypothesis?”
Most readers here are skeptical of mainstream, IPCC-consensus Climate Science, which, in my opinion, has fallen prey to desperate attempts to shore up a set of failed hypotheses collectively called “CO2 induced catastrophic global warming”. GHGs will generally induce some warming, but how much, how fast, how long, and whether beneficial or harmful are all questions very much unanswered. Still up in the air is whether or not the Earth’s climate is self-regulating despite changing atmospheric concentrations of GHGs and solar fluctuations.
I’d like to read your suggestions on what questions CliSci needs to ask itself to get out of this “failed hypotheses” mode and back on track.
[Re: Trends — I know it seems impossible that some people actually believe that trends cause future results, I have been through two very rough post-and-comment battles on the subject — and the number of believers (all very vocal) is quite large. Unfortunately, this concept runs up against a lot of the training of academic statisticians — who, in their own way, are among the most vocal believers. Let’s try not to fight that battle here again – you can read all the comments and my replies at the two posts linked at the very beginning of this essay.]
[NB: 5 Jan 2018 — several minor typos that have been helpfully pointed out by readers have been corrected — since publication. Details are in the comments section where pointed out. –kh]
# # # # #