Why You Shouldn’t Draw Trend Lines on Graphs

Guest essay by Kip Hansen

What we call a graph is more properly referred to as “a graphical representation of data.”  One very common form of graphical representation is “a diagram showing the relation between variable quantities, typically of two variables, each measured along one of a pair of axes at right angles.”

Here at WUWT we see a lot of graphs —  all sorts of graphs of a lot of different data sets.  Here is a commonly shown graph offered by NOAA taken from a piece at Climate.gov called “Did global warming stop in 1998?” by Rebecca Lindsey published on September 4, 2018.

[Image: NOAA global surface temperature graph from Climate.gov, with trend lines drawn on the data]

I am not interested in the details of this graphical representation — the whole thing qualifies as “silliness”.  The vertical scale is in degrees Fahrenheit, and the entire change over the 140 years shown is about 2.5 °F, or roughly a degree and a half C.   The interesting thing about the graph is the drawing of “trend lines” on top of the data to convey to the reader something about the data that the author of the graphic wants to communicate.  This “something” is an opinion — it is always an opinion — it is not part of the data.

The data is the data.  Turning the data into a graphical representation (all right, I’ll just use “graph” from here on….) already injects opinion and personal judgement into the data through the choice of start and end dates, vertical and horizontal scales and, in this case, the shading of a 15-year period at one end.  Sometimes the decisions about vertical and horizontal scale are made by software — not rational humans — causing even further confusion and sometimes gross misrepresentation.
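As a minimal sketch of that scale problem (my own illustration, with made-up numbers, not the NOAA data), the same small warming can look dramatic or trivial depending purely on the y-axis the software picks:

```python
# Minimal sketch (made-up numbers): the same series plotted on an auto-scaled
# y-axis and on a fixed 0-30 deg C axis. Only the presentation changes.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
years = np.arange(1880, 2021)
# hypothetical series: ~1.4 deg C of slow rise plus noise (illustrative only)
temps = 14.0 + 0.01 * (years - 1880) + rng.normal(0.0, 0.1, years.size)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(years, temps)
ax1.set_title("Auto-scaled y-axis (software's choice)")
ax2.plot(years, temps)
ax2.set_ylim(0, 30)                     # a human-chosen, wider frame of reference
ax2.set_title("Fixed 0-30 deg C y-axis")
for ax in (ax1, ax2):
    ax.set_xlabel("Year")
    ax.set_ylabel("Temperature (deg C)")
plt.tight_layout()
plt.show()
```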

Anyone who cannot see the data clearly in the top graph without the aid of the red trend line should find another field of study (or see their optometrist).  The bottom graph has been turned into a propaganda statement by the addition of five opinions in the form of mini-trend lines.

Trend lines do not change the data — they can only change the perception of the data.  Trends can be useful at times [ add a big maybe here, please ] but they do nothing for the graphs above from NOAA other than attempt to denigrate the IPCC-sanctioned idea of “The Pause”, reinforcing the desired opinion of the author and her editors at Climate.gov (who, you will notice from the date of publication, are still hard at it hammer-and-tongs, promoting climate alarm). To give Rebecca Lindsey the tiniest bit of credit, she does write “How much slower [ the rise was ] depends on the fine print: which global temperature dataset you look at”….   She certainly has that right.  Here is Spencer’s UAH global average lower tropospheric temperature:

[Image: UAH global average lower tropospheric temperature anomaly (Spencer)]

One doesn’t need any trend lines to see The Pause that runs from the aftermath of the 1998 Super El Niño to the advent of the 2015-2016 El Niño.  This illustrates two issues:  drawing trend lines on graphs adds information that is not part of the data set, and, for any scientific concept, there is more than one set of data — more than one measurement — so it is critically important to know “What Are They Really Counting?”, the central point of which is:

So, for all measurements offered to us as information, especially if accompanied by a claimed significance – when we are told that this measurement/number means this-or-that — we have the same essential question: What exactly are they really counting?

Naturally, there is a corollary question: Is the thing they counted really a measure of the thing being reported?

I recently came across an example in another field of just how intellectually dangerous the cognitive dependence (almost an addiction) on trend lines can be for scientific research.  Remember, trend lines on modern graphs are often calculated and drawn by statistical software packages, and the output of those packages is far too often taken to be some sort of revealed truth.
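As a minimal sketch of how little effort this takes (my own example, assuming Python with scipy installed; nothing here is from the paper discussed below), a statistics package will happily fit and report a “trend” even for pure noise:

```python
# Minimal sketch: scipy will report a slope and p-value for anything,
# including pure noise. The decision to call that a "trend" is the analyst's.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = np.arange(30)
y = rng.normal(0.0, 1.0, size=30)      # no underlying trend at all

result = stats.linregress(x, y)
print(f"slope = {result.slope:.3f}, p-value = {result.pvalue:.3f}")
```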

I have no desire to get into any controversy about the actual subject matter of the paper that produced the following graphs.  I have abbreviated the diagnosed condition on the graphs to gently disguise it.  Try to stay with me and focus not on the medical issue but on the way in which trend lines have affected the conclusions of the researchers.

Here’s the big data graph set from the supplemental information for the paper:

Note that these are graphs of Incidence Rates, which can be read as “how many cases of this disease are reported per 100,000 population?”, here grouped into 10-year age groups.  The authors have added colored trend lines where they think (opinion) significant changes in incidence rates have occurred.

[Image: age-specific incidence rates, men, by 10-year age group]

[ Some important details, discussed further on, can be seen on the  FULL-SIZED image, which opens in a new tab or window. ]

IMPORTANT NOTE:  The condition being studied in this paper is not something seasonal or annual, like flu epidemics.  It is a condition that develops, in most cases, for years before being discovered and reported, sometimes only being discovered when it becomes debilitating.  It can also be discovered and reported through regular medical screening, which normally is done only in older people.  So “annual incidence” may not be a proper description of what has been measured — it is actually a measure of “annual cases discovered and reported” — not actually incidence, which is quite a different thing.

The published paper uses a condensed version of the graphs:

[Image: condensed incidence-trend panels from the published paper]

The older men and women are shown in the top panels, thankfully with incidence rates declining from the 1980s to the present.  However, as considerately reinforced by the addition of colored trend lines, the incidence rates in men and women younger than 50 years are rising rather steeply.  Based on this (and a lot of other considerations), the researchers draw this conclusion:

[Image: the paper’s Conclusions]

Again, I have no particular opinion on the medical issues involved…they may be right for reasons not apparent.  But here’s the point I hope to communicate:

[Image: annotated incidence-rate panels for Men > 50 and Men < 50, “Confused by Trendlines”]

I have annotated the two panels concerning incidence rates in Men older than 50 and Men younger than 50.   Over the 45 years of data, the rate in men older than 50 runs in a range of 170 to 220 cases reported per year, varying over a 50-case band.   For Men < 50, incidence rates were very steady at 8.5 to 11 cases per year per 100,000 population for 40 years, and only recently, in the last four data points, have they risen to 12 and 13 cases per 100,000 per year — an increase of one or two cases per 100,000 population per year. It may be the trend line alone that creates a sense of significance. For Men > 50, between 1970 and the early 1980s, there was an increase of 60 cases per 100,000 population.  Yet, for Men < 50, the increased discovery and reporting of an additional one or two cases per 100,000 is concluded to be a matter of “highest priority” —  however, in reality, it may or may not actually be significant in a public health sense —  and it may well be within the normal variance in discovery and reporting of this type of disease.

The range of incidence among Men < 50 remained the same from the late 1970s to the early 2010s —  that’s pretty stable.  Then there are four slightly higher outliers in a row — with increases of 1 or 2 cases per 100,000.   That’s the data.
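As a minimal sketch of that point (invented numbers standing in for the paper’s figures, not the actual incidence data), a least-squares fit over a long flat stretch plus a few higher end points still reports a rising line:

```python
# Minimal sketch with invented numbers: four decades of roughly flat rates
# followed by four slightly higher points is enough for least squares to
# report a rising line over the whole period.
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1975, 2019)
rates = rng.uniform(8.5, 11.0, size=years.size)   # hypothetical flat period
rates[-4:] = [12.0, 12.5, 13.0, 12.8]             # hypothetical recent uptick

slope, intercept = np.polyfit(years, rates, 1)
print(f"fitted slope: {slope:.3f} cases per 100,000 per year")
# The fitted line rises across all 44 years even though 40 of them were flat;
# the visual sense of significance comes from the line, not the data.
```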

If it were my data — and my topic — say, the number of Monarch butterflies visiting my garden, counted monthly, or something, I would notice from the panel of seven graphs further above that the trend lines confuse the issues.   Here it is again:

[Image: age-specific incidence rates, men — full-sized image opens in a new tab/window]

If we try to ignore the trend lines, we can see in the first panel (20-29y) that incidence rates are the same in the current decade as they were in the 1970s — there is no change. The range represented in this panel, from lowest to highest data point, is less than 1.5 cases per 100,000 per year.

Skipping one panel and looking at 40-49y, we see the range has perhaps dropped a bit, and the entire range is less than 5 cases/100,000/year.  In this age group a trend line is drawn showing an increase over the last 12-13 years, but the range is currently lower than in the 1970s.

In the remaining four panels, we see “hump shaped” data, which over the 50 years, remains in the same range within each age-group.

It is important to remember that this is not an illness or disease for which a cause is known or for which there is a method of prevention, although there is a treatment if the condition is discovered early enough.   It is a class of cancers, and incidence is not controlled by public health actions to prevent the disease; public health actions are not causing the change in incidence.  It is known to be age-related and occurs increasingly often in men and women as they age.

It is the one panel, 30-39y, showing an increase in incidence of just over 2 cases/100,000/year, that is the controlling factor pushing the Men < 50 graph to show this increase.  (The 40-49y panel may be having the same effect.)  (Again, repeating the image to save readers scrolling up the page):

[Image: age-specific incidence rates, men (repeated)]

Recall that the Conclusion and Relevance section of the paper stated: “This increase in incidence among a low-risk population calls for additional research on possible risk factors that may be affecting these younger cohorts. It appears that primary prevention should be the highest priority to reduce the number of younger adults developing CRC in the future.”

This essay is not about the incidence of this class of cancer among various age groups — it is about how having statistical software packages draw trend lines on top of your data can lead to confusion and possibly misunderstandings of the data itself.   I will admit that it is also possible to draw trend lines on top of one’s data for rhetorical reasons [ “expressed in terms intended to persuade or impress” ], as in our Climate.gov example (and millions of other examples in all fields of science).

In this medical case, there are additional findings and reasoning behind the researchers’ conclusions — none of which change the basic point of this essay about statistical packages discovering and drawing trend lines over the top of data on graphs.

Bottom Lines:

  1. Trend lines are NOT part of the data. The data is the data.
  2. Trend lines are always opinions and interpretations added to the data and depend on the definition (model, statistical formula, software package, whatever) one is using for “trend”. These opinions and interpretations can be valid, invalid, or nonsensical (and everything in between). (See the sketch below this list.)
  3. Trend lines are NOT evidence — the data can be evidence, but not necessarily evidence of what it is claimed to be evidence for.
  4. Trends are not causes, they are effects. Past trends did not cause the present data.  Present data trends will not cause future data.
  5. If your data needs to be run through a statistical software package to determine a “trend” — then I would suggest that you need to do more or different research on your topic, or that your data is so noisy or random that a trend may be irrelevant.
  6. Assigning “significance” to calculated trends based on P-value is statistically invalid.
  7. Don’t draw trend lines on graphs of your data. If your data is valid, to the best of your knowledge, it does not need trend lines to “explain” it to others.
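A minimal sketch of Bottom Line 2 (my own illustration, with arbitrary numbers): the “trend” you get depends entirely on which model you ask the software to fit.

```python
# Minimal sketch: the same arbitrary data, two different "trends", depending
# only on the model handed to the fitting routine.
import numpy as np

x = np.arange(20, dtype=float)
y = np.sin(x / 3.0) + 0.05 * x          # illustrative data, not from the essay

linear = np.polyfit(x, y, 1)            # straight-line "trend"
cubic = np.polyfit(x, y, 3)             # cubic "trend"
print("linear fit coefficients:", np.round(linear, 3))
print("cubic fit coefficients: ", np.round(cubic, 3))
# Both are legitimate least-squares fits to the same numbers, and they tell
# different stories about where the series is "going".
```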

# # # # #

 

Author’s Comment Policy:

Always enjoy your comments and am happy to reply,  answer questions or offer further explanations.  Begin your comment with “Kip…” so I know you are speaking to me.

As a usage note, it is always better to indicate who you are speaking to, as comment threads can get complicated and comments do not always appear in the order one thinks they will.  So, if you are replying to Joe, start your comment with “Joe”.  Some online periodicals and blogs (such as the NY Times) now automatically pre-add “@Joe” to the comment field if you hit the reply button below a comment from Joe.

Apologies in advance to the [unfortunately] statistically over-educated who may have entirely different definitions of common English words used in this essay and thus arrive at contrary conclusions.

Trends and trend lines are a topic not always agreed upon — some people think trends have special meaning, are significant, or can even be causes.   Let’s hear from you.

# # # # #

252 Comments
August 6, 2019 10:23 am

A trend is only useful when measuring a physical cause, e.g. I did a physics experiment at university to calculate the heat conductivity of sand.

Reply to  Hans Erren
August 6, 2019 10:28 am

Also… Velocity gradients, production decline curves, any variables that exhibit mathematical functions.

Reply to  David Middleton
August 6, 2019 6:15 pm

Placing/analyzing a particular sample within a well defined relationship, based on understood, consistent, physical properties is somewhat different than making a claim, or projection, about something dependent upon many poorly understood variables, such as wide-scale surface temperatures, sea level changes, and other sacred tenets of climate science, no?

Reply to  AndyHce
August 6, 2019 6:56 pm

Of course it is… Which is why you should rarely plot a graph without trend line analyses… 😉

August 6, 2019 10:30 am

Kip

Makes you wonder why Excel has a trend function (-:

Here’s two graphs one with and one without a trend line:

NASA 2019 minus 2009 Data
[image]

NASA Trend Comparison 1997-2018
[image]

One graph requires a trend line to see the difference, and it looks like your point is that an increase of 0.25 deg C in 100 years is just that: 0.25 deg, and so what.

The other graph shows a pattern that looks suspicious.

Editor
Reply to  steve case
August 6, 2019 11:35 am

Steve ==> Gotta say that the graph with trend lines is a good illustration of the problem.

Reply to  Kip Hansen
August 6, 2019 3:03 pm

Kip

One problem seems to be that one shouldn’t make a big deal about 0.25 deg/century. OK

The issue not addressed is the pattern of changes. That graph don’t need no stinking trend line.

Both graphs address the same issue: every month NASA changes over 40% of the monthly entries in its LOTI (1st 6 months of 2019), and over time that has produced the pattern shown. So, is that a problem?

Komrade Kuma
August 6, 2019 10:56 am

You could readily create an ‘uptrend’ by taking data calculated from a sinusoidal expression, i.e. y = a sin x, start it at a trough and finish it near a crest and hey presto, the end of the world is upon us!!
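A minimal sketch of the point above (my own numbers): fit a straight line to a pure sine wave sampled from a trough up to a crest and you get a steep “uptrend”, even though the underlying process has no trend at all.

```python
# Minimal sketch: the least-squares slope of a pure sine wave depends entirely
# on the window you choose.
import numpy as np

x_window = np.linspace(-np.pi / 2, np.pi / 2, 50)   # trough to crest of sin(x)
slope_window, _ = np.polyfit(x_window, np.sin(x_window), 1)

x_full = np.linspace(0.0, 4.0 * np.pi, 400)         # two complete cycles
slope_full, _ = np.polyfit(x_full, np.sin(x_full), 1)

print(f"slope from trough to crest: {slope_window:.2f}")   # steep "uptrend"
print(f"slope over two full cycles: {slope_full:.2f}")     # near zero
```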

Matthew R Marler
August 6, 2019 10:57 am

Don’t draw trend lines on graphs of your data. If your data is valid, to the best of your knowledge, it does not need trend lines to “explain” it to others.

Are you opposed to publishing calibration curves? Using the functions fit to instrument calibration data? How about age-related reference regions for diagnostic interpretation of lab results?

To me, this is roughly equivalent to staying home when the advice is to “drive carefully!”

What do you think about nonlinear transforms of the data, for example to make log-log plots?

Editor
Reply to  Matthew R Marler
August 6, 2019 11:39 am

Matthew R Marler ==> “Calibration curves” are not “trend lines” — they just look like a trend line.

See some of my replies above, and the links to the Button Collector series and my replies below those essays.

Matthew R Marler
Reply to  Kip Hansen
August 7, 2019 12:58 am

Kip Hansen: “Calibration curves” are not “trend lines”

I disagree. They are generally estimated via least squares, same as most other trend lines. There may or may not be tests of linearity and homogeneity of variance, but they are trend lines.

Reply to  Matthew R Marler
August 7, 2019 5:23 am

None of my electronic equipment is evaluated using least square curve fitting. The calibration curve is determined by comparison of my instruments against a known calibration instrument or element. The curve thus generated is the calibration curve. If at some point my instrument deviates a large amount from the calibration standard due to non-linear factors just happening to coincide then the calibration curve goes through that point, it is not ignored.

Matthew R Marler
Reply to  Tim Gorman
August 7, 2019 9:56 am

Tim Gorman: None of my electronic equipment is evaluated using least square curve fitting. The calibration curve is determined by comparison of my instruments against a known calibration instrument or element.

Measuring instruments are always calibrated against something. It is the calibration curve that allows the value of the measured quantity (pH, volts, temperature) to be calculated from the indicator (usually current flow, but could be mercury extent in a common thermometer) — from the inverse of the calibration curve. Your phrase “comparison of my instruments” hides all the details of what “comparison” is actually carried out — most likely a least squares fit to the (possibly log or log-log transformed) calibration data. Your electronic measuring instruments have a calibration curve (or curves, along with expected accuracy) that may be supplied in the paperwork that is in the boxes the devices came in, or may be obtainable from the manufacturer.

Reply to  Matthew R Marler
August 7, 2019 7:36 pm

“It is the calibration curve that allows the value of the measured quantity (pH, volts, temperature) to be calculated from the indicator (usually current flow, but could be mercury extent in a common thermometer) — from the inverse of the calibration curve.”

I’m not even sure what you are saying here. The value of the measured quantity is not calculated from the inverse of the calibration curve. The calibration curve gives an adjustment factor to be applied to the instrument reading. If my power meter reads 1db low at 144Mhz according to the calibration curve then I must add 1db to my reading in order to have an accurate measurement.

“Your phrase “comparison of my instruments” hides all the details of what “comparison” is actually carried out — most likely a least squares fit to the (possibly log or log-log transformed) calibration data.”

I’m not even sure we are talking about the same thing. Calibration is done by applying a standard input to my equipment and to the calibration standard equipment. The difference then becomes the next point on the calibration curve. There is no “least squares fit”. The calibration curve goes through all the points that are measured for calibration. If you don’t do that then there isn’t much reason to calibrate your instrument against a standard.

Matthew R Marler
Reply to  Tim Gorman
August 8, 2019 11:40 am

Tim Gorman: I’m not even sure we are talking about the same thing. Calibration is done by applying a standard input to my equipment and to the calibration standard equipment.

I am sure that we are not talking about the same thing. My Micronta 43-Range Multitester displays the result of the measurement via a needle that swings in an arc from left to right. What is literally driving the needle is the magnetic field generated by the flow from a coil. To get the numbers on the dial, the developers put known standards to the test, and for the known standards marked the appropriate values on the dial (actually, the process is more complicated, but that’s the gist of it). They used 10 – 20 standards in the range. Points on the dial between the marked standards were interpolated. There is then a function relating the standard value to the magnetic field strength. Developing that function is what I mean by “calibration” of the measurement instrument. Inferring the value of the measured attribute from the position of the needle is effectively reading the inverse of the value of the magnetic field strength.

For an HPLC to measure a blood constituent, the standards are vials filled with aliquots that have known quantities of the analyte. The result of the HPLC run is the function relating the known concentrations to the area under the curve of the absorption function (or perhaps the maximum of the absorption function — n.b. it is the absorption function, smoothed data, not the raw absorption data). Then the area is computed for a sample of unknown concentration, and the functional inverse of that is used as the estimate of the measured quantity.

What you have described to me reads like “adjustments to the calibration curve”, which is probably perfectly well shortened, in this context, to “calibration”. Or you have some other well-defined usage for your setting.
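For readers following this exchange, here is a minimal sketch (hypothetical standards and readings, not any commenter’s instrument or data) of the fit-then-invert calibration idea being described:

```python
# Minimal sketch (hypothetical values): fit a calibration line relating known
# standards to the instrument response, then invert it for an unknown sample.
import numpy as np

known_conc = np.array([0.0, 1.0, 2.0, 5.0, 10.0])     # standards, arbitrary units
response = np.array([0.02, 0.98, 2.05, 4.90, 10.10])  # instrument readings

slope, intercept = np.polyfit(known_conc, response, 1)  # calibration curve

def estimate_concentration(measured_response):
    """Invert the fitted calibration line to estimate an unknown concentration."""
    return (measured_response - intercept) / slope

print(f"estimated concentration for a response of 3.4: {estimate_concentration(3.4):.2f}")
```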

Reply to  Matthew R Marler
August 8, 2019 2:56 pm

“There is then a function relating the standard value to the magnetic field strength. Developing that function is what I mean by “calibration” of the measurement instrument.”

The magnetic field strength will vary from instrument to instrument for various physical reasons. You *must* develop a calibration curve for each instrument individually by determining the difference between the instrument readings and the readings of a standard.

This has *nothing* to do with doing a linear regression of data points as was initially proposed. Not even *your* description has anything to do with developing a linear regression of the data points! Interpolation is not regression.

If your absorption function does not match the data exactly then integrating the function to find the area under the curve winds up with an error bar all of its own related to the error bars associated with the measurement of the data.

“estimate of the measured quantity.” The operative word here is “estimate”. Calibration of measurement devices is meant to provide for eliminating “estimating”.

Matthew R Marler
Reply to  Tim Gorman
August 8, 2019 5:25 pm

Tim Gorman: If your absorption function does not match the data exactly then integrating the function to find the area under the curve winds up with an error bar all of its own related to the error bars associated with the measurement of the data.

That is true. You have error either way, and the maximum or area computed from the fitted curve will have smaller mean square error than the area computed from only the data — if the model fits well enough, which can generally be checked.

But if you only use data in calibration, not the model, you restrict your calibration to the subset of values actually used in the calibration, not the full range of expected values.

Interpolation is not regression.

Do you do “linear” interpolation, based on a straight line fit to two data points?

You *must* develop a calibration curve for each instrument individually by determining the difference between the instrument readings and the readings of a standard.

If you do this with a subset of values, and then apply it to the full range of values expected to be encountered, then you are developing a calibration curve relating the new instrument to the “gold” standard. What you use in practice is the full calibration curve (which may be a higher order polynomial or sum of exponentials or b-spline approximation, or something from other families of functions), not just the relatively few values tested.

Matthew R Marler
Reply to  Kip Hansen
August 7, 2019 9:45 am

Kip Hansen: “A trend line (also called the line of best fit) is a line we add to a graph to show the general direction in which points seem to be going.”

“The line of best fit” gives away the story. How is “best” determined, least squares? “Best” among what — best low order polynomial, best among 2-compartment diffeqn models, best exponential decay model, best subject to continuity or parsimony constraints, etc?

Trend lines can not be depended on to predict the future accurately, but if you are trying to predict the future at least approximately, then you depend on the trend line, not the data.

Your cautionary remarks are the usual cautionary remarks about “believing” the trend line of a single data set without testing. There is no good case that trend lines should always be avoided.

Matthew R Marler
Reply to  Kip Hansen
August 7, 2019 3:18 pm

Kip Hansen: you see many examples in which trend lines obscure data

Sure.

Everything that can be done well can be done badly, including graphing trend lines. The solution is not to abjure trend lines altogether, but to use them thoughtfully.

What you are doing is “throwing the baby out with the bathwater.”

Editor
Reply to  Kip Hansen
August 7, 2019 3:45 pm

Matthew ==> I have specifically denied “throwing the baby out with the bath water” in the Epilogue above.

Matthew R Marler
Reply to  Kip Hansen
August 7, 2019 6:42 pm

Kip Hansen: I have specifically denied “throwing the baby out with the bathwater”

Then you need to rewrite this: Don’t draw trend lines on graphs of your data. If your data is valid, to the best of your knowledge, it does not need trend lines to “explain” it to others.

Otherwise the denial is hollow.

a_scientist
Reply to  Matthew R Marler
August 6, 2019 1:22 pm

My advisor told me, that absent very good data, in special cases, anyone making log-log plots should be shot.

It is too easy to hide the noise or scatter.

Matthew R Marler
Reply to  a_scientist
August 7, 2019 1:03 am

a_scientist: absent very good data, in special cases, anyone making log-log plots should be shot.

That’s absurd. They are just one of many transforms of the data that may be informative beyond merely eyeballing the raw data.

TonyL
August 6, 2019 10:57 am

Kip Hansen said:

Drawing trend lines on graphs is adding information that is not part of the data set

Exactly 100% Absolutely Wrong!
The trend line is derived 100% exactly from the data. The information represented by the line comes from nowhere else.
Allow me to explain.

The Situation In One Dimension:
We have a set of numbers, all measurements of the same thing. We have simply conducted our measurement multiple times so as to generate an average of some sort. There is much information in the whole data set. Maybe we care, maybe not.
A) We report the mean. It is the figure of merit and all we care about. What is lost is any information about the dispersion of the individual points about the mean. We do not care.
B) We report the mean and standard deviation. Some information about the dispersion is retained, some is lost. For instance, there may be a tendency for the measurements to systematically increase as the experiment proceeds. This will be lost.
C) We report the mean, standard deviation, and sample size, which is the best condensation we can typically do. If this is still not good enough, there is no sense trying to condense the data; back to the raw data you go.
In all cases, no information is magically added to the system. Just the opposite, information is discarded for the sake of brevity.

The Situation In Two Dimensions:
The case, and issues are the same. The mean becomes the slope and intercept. Two values instead of one.
A) Just the same as above, slope and intercept reported, dispersion lost.
B) Just the same as above, slope, intercept, and confidence intervals reported.
C) A Little Different! Show all the Raw Data AND the summary Trend Line. Keep everybody happy.
100% of the data and the information contained within is displayed. No information is discarded, and none is magically conjured up and added.
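A minimal sketch of TonyL’s two condensations (my own made-up measurements): a mean and standard deviation summarize repeated 1-D measurements, and a least-squares slope and intercept summarize paired 2-D data; either way the summary numbers are derived from the data but discard the individual points.

```python
# Minimal sketch with made-up measurements: condensing 1-D data to a mean and
# standard deviation, and 2-D data to a least-squares slope and intercept.
import numpy as np

rng = np.random.default_rng(7)

# 1-D: repeated measurements of one quantity
measurements = rng.normal(100.0, 2.0, size=50)
print(f"mean = {measurements.mean():.2f}, std dev = {measurements.std(ddof=1):.2f}")

# 2-D: paired data condensed to slope and intercept
x = np.arange(50, dtype=float)
y = 3.0 + 0.5 * x + rng.normal(0.0, 5.0, size=50)
slope, intercept = np.polyfit(x, y, 1)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```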

Editor
Reply to  TonyL
August 6, 2019 11:55 am

TonyL ==> If only it was that simple. Of course, the trend is derived from the data.

The practical problem is we never have all the data, there are start points and end points — there is the matter of scale — which definition and formula for trend we might use — there are a lot of variables and thus your decision to use and show one of them is an OPINION about the data, even if it is derived from the data.

The statistical mean is not a trend line — it is the statistical mean. It may or may not be useful. In multiple measurements of one unchanging thing — it is how we reduce measurement error.

Your 2-D example only works for trivial situations — like mechanical processes with short simple data sets.

For physics problems, sometimes a “slope” can be useful information — a “slope” is not exactly the same as a statistical trend line laid over a set of data — but looks like one.

Reply to  TonyL
August 6, 2019 11:59 am

The trend line is derived 100% exactly from the data. The information represented by the line comes from nowhere else.

So which bit of the data describes the beginning and end of the trend lines?

TonyL
Reply to  M Courtney
August 6, 2019 12:35 pm

I honestly do not even know what you are attempting to say, you seem to be so far off the point.
So how do we determine the end points of a trend line?
The mathematician tells us that a point is infinitely small, and also that a line is an infinite collection of points. We also find out that a line extends infinitely in both directions.
We know that a line is fully and uniquely described by the slope and the intercept. Nothing else is needed. A line does not have endpoints, nor does it need them.

If your trend line is truly a line, it has the above described properties of a line. If your trend line has endpoints, it may be because you added them.

Reply to  Kip Hansen
August 6, 2019 2:06 pm

Past performance is no guarantee of future results. Corollary: short term trends can become long term trends; how does linear regression capture this?

Reply to  TonyL
August 6, 2019 1:46 pm

I’ll make it simpler.
There are numerous trend lines on the temperature graph. It is an editorial decision – a political decision – to choose that many trend lines.

You could just draw a straight line over the whole data set. But that is making a choice that the data is homogenous and no new factor has come into play over that period. That choice is not in the data.

You could draw inflexions when something changes, such as the moon entering the house of Aquarius. But choosing to prioritise astrology is again, not in the data set.

Now, you are clearly fluent in the language of statistics; you know your stuff. But that means you can understand the ‘words’ even when they are untrue.
And you seem to not realise that you are being misled.

TonyL
Reply to  M Courtney
August 6, 2019 4:56 pm

Ah, I see. I honestly did not see what you were getting at.

You could just draw a straight line over the whole data set. But that is making a choice that the data is homogenous

If you want to be this strict about it then still no, I am merely accepting the null hypothesis of no change. This is not making a decision and adding information to the data set.
If I were to break the data set into two or more portions, and plot two or more trend lines, then yes, I have added to the system. I am well aware of that. Whenever anybody does that, the howls and cries of “Cherry Picking” bombard your ears. You have to be ready for that. So you are correct for a split graph. But I would never claim that all the information is contained within the data for a split graph.

Reply to  TonyL
August 6, 2019 1:24 pm

Nick: “A) We report the mean. It is the figure of merit and all we care about. What is lost is any information about the dispersion of the individual points about the mean. We do not care.”

Of course we care. If what you have determined is the mean length of 1000 steel girders then the dispersion of the individual points about the mean is *very* important and we *very* much care about it!

Matthew R Marler
Reply to  TonyL
August 8, 2019 2:25 pm

TonyL The trend line is derived 100% exactly from the data. The information represented by the line comes from nowhere else.

I would not say it that way. Information is provided from other research showing that some member of a parameterized family will fit the data well, and that the specific estimated parameter is physically meaningful. With radioactivity counts, for example, the linear trend estimated from the log-transformed data can be transformed to the half-life of the sample. Likewise for the terminal portion of a concentration curve from a pharmacokinetic experiment; parameters from the fit of a compartment model can be used to devise a dosing schedule for experiments on the efficacy of the drug.

Kip Hansen’s essay does not distinguish between fields where such evidence of applicable models has been verified, fields where verification has barely begun, and fields where no such verification has been done but the experiment under consideration may be a pathfinder in the field.

DocSiders
August 6, 2019 11:07 am

Before even arriving at meaningful trends, you need accurate enough and consistent enough data over a “long enough” interval.

Prior to the 1940’s we don’t know what Global Average Temperatures were with a high enough accuracy to calculate trends as accurately as they are often stated.

Prior to the 1940’s there wasn’t enough data to know the 1° grid temperatures of over 75% of the world to within better than +/- 2 C accuracy…if that. For 3/4 of the world, it was all extrapolations and guesswork. Prior to 1920 it was considerably worse. With that level of uncertainty there is no way to append the old data onto the better new data to generate meaningful trends for the last century.

The minimum temporal interval for discussing climate is 30-50 years FOR INDIVIDUAL DATA POINTS. Noise from decades long ocean and air circulation cycles and decades long energy transfer lags define about a half century interval FOR Detecting ANY CLIMATIC CHANGES…let alone trends…else we are really only talking about weather trends.

UAH GATs have risen 0.28 C in the last 30 years, or 0.09 C per decade. In the previous 30 years (kludged together from far less certain data sources) the GATs rose 0.19 +/- 0.09 C. For the 30 years prior to that (1928-1958), the data is too sparse and too uncertain to produce any meaningful trend values. “Meaningful” here being trends accurate to +/- 0.25 C per decade for INCLUSION IN THE DISCUSSION about trends in the GAT.

So, we really only have 2 useful data points, and one of those is “shaky”, for CLIMATIC trends (60 to 100 years for 2 or 3 points) in GATs. The average of this small, insufficient data set indicates a century trend in GAT of less than + 1 C.

Editor
Reply to  DocSiders
August 6, 2019 11:58 am

DocSiders ==> Shaky is a good word — and I agree. Wait till you read the new SLR Acceleration paper!

Joel Brown
August 6, 2019 11:54 am

Temperature can be considered an output variable of the climate system and is dependent on many input variables. Any scientific analysis of the trend in an output variable must include analysis of the input (KIVs) variable(s) having the most impact on the output variable to provide a complete picture of observed trend.

While determining statistical trends in data is sometimes necessary, it is important to keep in mind the confidence in the estimated trend. Usually when working with continuous data (e.g. temperature), 25 to 30 data points are necessary to establish a confidence estimate of 95% that you are seeing a true behavior in the output data. In the temperature data example, the use of 15/point trends significantly reduces the confidence that you’re seeing a true behavior in the data (as this particular data set clearly shows).

Editor
Reply to  Joel Brown
August 6, 2019 12:29 pm

Joel Brown ==> When one does not know the cause of the measured data — then the trend of the existing data does not inform us of anything not already apparent in the data — the set starts at 1 then 2 then 3 then 10 then 5 then 8 then 9 then 2……one could draw a trend line — which will tell you nothing other than what the data did in the past seven measurement periods — but you already knew that. You only need look at the graph….

The trend, calculated from sufficient data points, only tells you what the system being measured did over the period covered by those data points — and I repeat — that is something you already know because you have the data points.

August 6, 2019 11:56 am

Trend lines are useful for addressing clearly defined questions.
The questions need to be clearly defined so that the viewer can then consider whether the trend line is:

A) Important.
B) Pertinent.
C) Significant.

If the question isn’t important the trend won’t be.
If the trend line doesn’t relate to the question then the trend line won’t be pertinent.
If the data doesn’t support the trend line sufficiently for the question being addressed then it’s meaningless.

The question being asked in the climate data is about the Pause and can be written as:
“Has there been any Global Warming in Greta Thunberg’s lifetime?”

When you know what the question is you can judge whether the trend lines are fit for purpose.

Editor
Reply to  M Courtney
August 6, 2019 12:36 pm

M Courtney ==> Your Greta question is answered simply by listing or graphing the data from her birth date to the present — then look at it. It is either warmer now or not. But your real question is not so simple….you want some kind of statistically determined “trend” line to tell you “has it been getting warmer”… even though you have access to the original data.

If the data itself can’t tell you by simple looking — then you are asking a question that maybe the data can’t really honestly tell you.

By most measures, the “global average surface temperature” (as measured and manipulated) is higher now than 16 years ago…marginally.

Reply to  Kip Hansen
August 6, 2019 1:51 pm

What you have noticed is the irony of the word “Significant”.
Statistically the graph is asking a question that the data can’t really honestly tell you.

But that assumes the graph was created with “really honestly” being a consideration. Not necessarily so.

Reply to  Kip Hansen
August 6, 2019 2:51 pm

Point taken. I wasn’t directly on the point of the essay. Nor was I first though.

Yet, as the topic of this essay was why trend lines are didactic and illustrative I thought that summarising when being didactic is acceptable would be helpful.

Not in the case you used for example.

August 6, 2019 12:09 pm

Flow charts can also be misleading and/or useless.

[image]

Paramenter
August 6, 2019 12:10 pm

Hey Kip,

Part of the ‘replication crisis’ in some branches of science is misuse of statistical techniques. For instance, small change in a treatment of outliers may lead to different conclusions.

This ‘pause’ in global warming is interesting. As such it contradicts the severity of the greenhouse effect – despite a steady increase of CO2, which supposedly should lead to proportional retention of thermal energy in the atmosphere, we did not observe an increase in averaged ‘global’ temperature. I can see at least two possible explanations:

1. Another large-scale physical effect temporarily counteracted the greenhouse effect. The question is then why not attribute most of the observed warming to such naturally occurring effects, other than the greenhouse effect?
2. Local events skewed averaged ‘global’ temperatures, dragging it down. In such case we should ask for the detailed ‘heat map’. Again, the question would be why chain of such local events is not responsible for most of the observed warming, pushing ‘global’ averages up?

Editor
Reply to  Paramenter
August 6, 2019 12:38 pm

Paramenter ==> You’ve got good questions — always the most important part of thinking about something.

The majority of CliSci researchers are trying to figure out the answers — some minority are just pushing alarmism.

Clyde Spencer
August 6, 2019 12:36 pm

Kip,

I will agree that trend lines are often misused or abused. However, they do have utility in data analysis. Perhaps the greatest value is to emphasize the “opinion” of someone making conjectures about the meaning of the data.

Succinctly, if one has a data set with a lot of scatter, and there is a need for a best guess for the Y-value at a certain X-value for which there is no measurement, the equation for the regression line gives a quantitative interpolation that is better than a visual guess. Extrapolations are more risky, but it may be the only way of predicting. Linear extrapolations are better behaved than polynomial extrapolations and may be acceptable if one has other reasons to believe that the relationship is actually linear.

Calibration curves ARE a form of regression that minimizes the effect of scatter in a set of measurements.

Editor
Reply to  Clyde Spencer
August 6, 2019 12:44 pm

Clyde Spencer ==> Guessing may be important in some fields — fooling oneself that the guesses are “scientifically provided answers” is rife and unfortunate.

That said, guessing for hypothesis formulation is an important part of science — but we mustn’t confuse our guesses, no matter how much math is involved, with actual measurements and reality in the physical world.

Guessing that the Global Average Surface Temperature anomaly is XX.xx degrees C is pretentious and dangerously foolhardy. (not that you did it, of course, just an example.)

August 6, 2019 12:47 pm

@Kip/ Michael/A C Osborn

I show another example:
[image]

Rainfall versus time. Measured at a particular place on earth, it is of course looking highly erratic measured from year to year. Not a high correlation…

But the point in showing the straight trend line was to prove a relationship over time i.e. the 87 year Gleissberg cycle.

Jim
August 6, 2019 12:48 pm

This topic of trendlines needs a qualifier. Sometimes the data is following known physical laws and its continuation by extrapolation with trend lines is predictable. For example, oil well production follows a predictable exponential decline, as does groundwater contaminant extraction, natural attenuation and many other things. Drawing a trendline and extrapolating helps to obtain the “K” factor or half-life of the decline curve and forecast the economic limit of extraction.
S curves are common in technology waves, for example a new smart phone or digital TV resolution. At first sales are slow, then accelerate, then taper off. The trouble comes when a trend line extrapolation is used to extrapolate a phenomenon that has no tangible reason to continue to follow the trendline. To put it simply, “statistics are descriptive, not explanatory”.
Even the Hubbert curve or “peak oil” is useful as long as we keep in mind that it applies only to current technology. The old Hubbert Curve was for oil production from sandstone and limestone. Hydraulic fracturing in oil shales in horizontal wells was a major technology change, new source of oil reserves and curve disrupter. Nevertheless a new Hubbert curve should form for this relatively new technology and the old curve is still good if limited to sandstone and limestone reserves.
On the other hand, extrapolation of high dose toxicology studies to low dose remains controversial. The most difficult of all is known as the “single fiber theory” with asbestos fibers. According to the theory, if a million people were each exposed to a single asbestos fiber, there is someone out there so sensitive that they will develop mesothelioma. Or so the theory goes. This enables extrapolation down to zero exposure level. The reality is the body has defenses to keep particulates out of the lungs (sinuses, mucous, nasal hair, etc), which must be overwhelmed to induce disease, so most likely there is a minimum threshold exposure value and extrapolation to zero is invalid. The fact we can’t identify a threshold value does not mean it does not exist. Extrapolation of high dose exposures to low dose and trans-species studies (mice as surrogates for people) are done all the time. This enables us to make at least conservative estimates of toxicity. But it should be remembered that these are nothing but worst-case estimates. Occupational exposure studies are the most trustworthy toxicological studies and even then should be limited at low dose to interpolation, not extrapolation. The most ridiculous thing we do is apply a risk value of 1:100,000 adverse health consequence to a rural area where no one is exposed. Or a maximum contaminant limit to an aquifer that no one drinks.
Extrapolating a temperature trend that is known to be cyclical is risky business. Complex data usually has three components, all scale dependent, a trend (not necessarily linear), a cyclical component, and a random variation. Unfortunately, random variation (variance) is extremely high with weather compared to the others. Put simply, if the average high is 80 and the record is 110, and the average low is 50 and the record is 20, what difference will it make if the average low changes to 51? When making trend lines, different trends should be tried (linear, exponential, quadratic, step change, etc) and the highest correlation coefficient should be found. Correlation coefficients should always be stated with trend analysis.

August 6, 2019 1:09 pm

Jim said: Correlation coefficients should always be stated with trend analysis.

True. But even if R=0 the trend line might still say something…
https://wattsupwiththat.com/2019/08/06/why-you-shouldnt-draw-trend-lines-on-graphs/#comment-2763217

1sky1
August 6, 2019 1:16 pm

This “something” is an opinion — it is always an opinion — it is not part of the data.

While fitting a trend, linear or otherwise, to data often involves some subjective choices, it definitely is NOT just an opinion. It is entirely an objective mathematical exercise performed upon the data. The onerous part is that few have the mathematical capability to recognize the appropriateness of their choices. Bad ones can indeed produce grossly misleading results. Contrary to William Briggs’ misguided admonition, however, good ones can reveal quantitative features that are not apparent simply by looking at the data. In either event, trend-fitting is not a proper time-series analysis tool. Almost invariably, there’s no recognition that linear (regressional) trends are very crude band-pass filters, whose desired function can be achieved far more objectively and powerfully through other analytic methods.

1sky1
Reply to  Kip Hansen
August 6, 2019 2:49 pm

What we have been referring to as “trend lines” is pretty specific …

While your link shows various examples of curve-fitting, you deal only with linear ones. Nothing presented here truly specifies the exact nature or source of the pitfalls.

1sky1
Reply to  Kip Hansen
August 6, 2019 4:39 pm

Your links are for total laymen. They fail to get to the analytic core of the issue.

1sky1
Reply to  Kip Hansen
August 7, 2019 2:09 pm

Kip:

You continue to miss the point. Of course WUWT is for the general public. But it doesn’t require going into abstruse technical weeds to identify what the heart of the problem is with “trend lines drawn over times series graphs.” See:

https://wattsupwiththat.com/2019/08/06/why-you-shouldnt-draw-trend-lines-on-graphs/#comment-2763555

1sky1
Reply to  1sky1
August 7, 2019 4:02 pm

Say what you will, but Peter specifies the nature of the pitfalls in a rigorous way that you never did.

Izaak Walton
August 6, 2019 1:40 pm

Kip,
The claim that “The data is the data” is nonsense. Take for example the temperature graph from Dr. Spencer that you like and ask what was actually measured. The answer is the number of electrons flowing through a wire over a period of time. If anything that is the data. Then to get a temperature there are a large number of processing steps. Firstly the number of electrons is converted to a voltage using a trend line (i.e. Ohm’s law) then that is converted to a photon flux. That photon flux is then converted to a ratio of intensities at different wavelengths. Then using another calibration curve you get a “temperature”. You then need to correct for the speed and height of the satellite, the angle of inclination of the sensor (so you know how much of the atmosphere you are looking at), the time of day etc. Then the UAH group averages that over a month and releases the final result.

Compared to all of that processing, complaining about someone else drawing a straight line on top of part of the graph seems a little silly. So unless you want Dr. Spencer to release graphs showing the number of electrons versus time (and even time is measured in electrons if you get right down to it) you have to accept that scientists process data all the time for valid reasons. And saying “the data is the data” is nonsense.

Reply to  Kip Hansen
August 6, 2019 2:00 pm

Trend lines are just a form of SMOOTHING data sets.

Then explain how these trend lines “smooth” the data sets.

The development of AVO crossplot analysis has been the subject of much discussion over the past decade and has provided interpreters with new tools for meeting exploration objectives. Papers by Ross (2000) and Simm et al. (2000) provide blueprints for performing AVO crossplot interpretation. These articles refer to the Castagna and Swan (1997) paper which laid the foundation for AVO crossplotting. The AVO classification scheme presented by Castagna and Swan, which was expanded from the work of Rutherford and Williams (1989), has become the industry standard. Castagna and Swan also investigated the behavior of constant Vp/Vs trends, concluding that for significant variations in Vp/Vs many different trends may be superimposed within AVO crossplot space, making it difficult to differentiate a single background trend. A misapplication of this concept has been to infer a direct correlation between these changing background Vp/Vs trends with rotating intercept/gradient crossplot slopes (what Gidlow and Smith (2003) call the fluid factor angle, and Foster et al. (1997) the fluid line) observed in seismic data. The central question of this paper is: when is this background trend (or fluid line) rotation a representation of real geology and when is it a processing-related phenomenon? To answer this question I review the theoretical expectations regarding rotating AVO crossplot trends, the role of seismic gather calibration (or lack thereof), and the value of various compensating methods. Furthermore, I investigate the appropriateness of using constant Vp/Vs lines in a crossplot template by examining both the mathematical and modeled AVO crossplot responses which incorporate an established compaction trend. I conclude that when exploring in a reasonably compacted environment (Vp/Vs ratios of 1.6-2.4) a relatively small range of fluid angles (background trend rotations) can be expected. Large variations of fluid angle observed in seismic data can be attributed to the difficulty in preconditioning gathers for AVO analysis.

Reply to  Kip Hansen
August 6, 2019 4:14 pm

There’s nothing “sophisticated” about AVO trend lines. They are all linear regressions and can be used to differentiate oil from gas from brine on seismic data.

Reply to  Kip Hansen
August 6, 2019 4:02 pm

All of the examples I cited were trend lines, Kip. Trend lines, without which the graphs would be useless.

The purpose of cross-plotting two or more variables on a graph is to determine if a mathematical relationship exists. The trend line is the mathematical relationship.

While trend lines are often misused and/or intended to deceive, this is simply ridiculous generalization: “Don’t draw trend lines on graphs of your data. If your data is valid, to the best of your knowledge, it does not need trend lines to “explain” it to others.”

Izaak Walton
Reply to  Kip Hansen
August 6, 2019 4:42 pm

Kip,
Suppose I get an undergraduate student to plot voltage against current for a simple resistor. They will get a good approximation to a straight line. Surely in that case plotting a trend line will tell them what the value of the resistor is. And using that information the student will be able to predict what the value of the voltage will be for new values of the current. Do you really want to claim that in that case a trend line is useless or that it doesn’t add anything?

Now of course not everything is as simple as a resistor. But trend lines can still be used to extract useful information from a graph. They can also be used to mislead or can be complete nonsense (see http://www.tylervigen.com/spurious-correlations for excellent examples). But if Dr. Spencer chooses to average the UAH data over time periods of one month why shouldn’t I choose a different averaging period? What makes monthly smoothing correct while decadal smoothing wrong?
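A minimal sketch of the resistor example above (my own numbers, assuming an ordinary least-squares fit via numpy): the slope of a fitted V-I line gives the resistance, which can be compared with a single-point R = V/I estimate.

```python
# Minimal sketch (invented numbers): resistance from the slope of a fitted
# V-I line, compared with a single-point R = V/I estimate.
import numpy as np

rng = np.random.default_rng(3)
true_r = 220.0                                   # hypothetical 220-ohm resistor
current = np.linspace(0.001, 0.010, 10)          # amps
voltage = true_r * current + rng.normal(0.0, 0.02, current.size)  # noisy readings

r_from_fit = np.polyfit(current, voltage, 1)[0]  # slope of the fitted V-I line
r_from_one_point = voltage[0] / current[0]       # single noisy point
print(f"R from least-squares slope: {r_from_fit:.1f} ohms")
print(f"R from one point:           {r_from_one_point:.1f} ohms")
```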

Reply to  Izaak Walton
August 6, 2019 5:47 pm

Izaak: “Suppose I get an undergraduate student to plot voltage against current for a simple resistor. They will get a good approximation to a straight line. Surely in that case plotting a trend line will tell them what the value of the resistor is.”

Why do you have to plot a line? R = V/I. All you need is one measurement, i.e. a POINT, to determine the value of the resistor.

The trend line *is* useless and doesn’t add anything.

“Now of course not everything is as simple as a resistor. But trend lines can still be used to extract useful information from a graph.”

Really? What useful information can be extracted from a trend line of temperature, for instance? Can it tell you what is happening in the atmosphere? What if the data used to generate the trend line is an average? Does the trend line tell you what is happening in reality? Can the trend line be extended past the last data point to give a reliable prediction?

Izak Walton
Reply to  Tim Gorman
August 6, 2019 7:03 pm

Tim,
do you really believe that one point is sufficient for measuring the value of a resistor? All real measurements have errors and those need to be accounted for. Which can be done by taking different measurements and fitting a curve to them.

Again with temperatures trend lines are useful if applied correctly but can be misleading if extended too far. Knowing the temperature in May and June might allow me to predict the temperature in July but I would get it wrong if I tried to extrapolate through to December.

Reply to  Izak Walton
August 7, 2019 4:52 am

Izaak,

Yes, I believe one point is sufficient to determine the value of a resistor. If your measuring devices have a systematic error built in then no quantity of measurements at different points will result in a more accurate measurement of the value of the resistor. If the error is in reading the device’s output (i.e. an analog scale) then you just read it multiple times using the same values for voltage and current.

“Knowing the temperature in May and June *might* allow me to predict the temperature in July” (asterisks mine, Tim)

The word *might* is quite telling in your statement! If extending the trend line only *might* allow future predictions then of what use is extending the trend line? Extending the trend line and expecting it to come true just becomes an article of faith, not a scientific article of fact.

yarpos
August 6, 2019 2:28 pm

Can I draw a hockey stick instead of them? That’s not really a trend line

Sheri
Reply to  yarpos
August 6, 2019 4:14 pm

Michael Mann clearly says it is a trend line and it’s his creation. 🙂

Tom Johnson
August 6, 2019 3:34 pm

Dear Readers:
Depending on a trend line cost Dr Richard Feynman a Nobel Prize.
You can read about it in his biography.
He was attempting to participate in the discussion about what happens in “weak” decay of nuclear particles. He extrapolated from published data and ignored the last two known points.
Had he not extrapolated, he would have been first off the mark and gotten a second Nobel.
DON’T DO IT!
Graphed data is a valid approximation when used between known data points.
Almost no physical phenomena are linear. Nature isn’t like that.
The error bars outside of known measured data are all over the place.
Best story of extrapolation: Mark Twain, “Life on the Mississippi”, describing the length of the Mississippi River. Bleed-through of oxbows had resulted in a shortening of 200 miles in 100 years. So Twain projected forward and back. He ended up with New Orleans a suburb of Chicago!

tty
Reply to  Tom Johnson
August 7, 2019 7:18 am

New Orleans and Cairo actually, Chicago not being on the Mississippi even in Twain’s day. His observation on science in this connection is well worth remembering:

“There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.”

curly
August 6, 2019 4:01 pm

Two books:

How to Lie with Maps
How to Lie with Charts

Both have multiple editions

And Tufte’s books, on selling your position with attractive charts and graphs.

Editor
August 6, 2019 4:28 pm

EPILOGUE:

Well, that was fun. I even correctly predicted that statistic-ish folks would not like it and I was right. Stats-heads had all sorts of irrelevant examples of how some lines drawn by some advanced statistical shenanigans might produce some sort of useful something. I’m pretty sure most of them knew I wasn’t talking about their little preciousnesses.

Most readers correctly read the essay as dealing with simplistic trend lines as used in the examples and thus hopefully gained something, even if just reinforcement of their level of trust in their own understanding.

Never fear — there is no need to throw the baby out with the bathwater — there are times and uses for even simplistic linear trend lines — most of them rhetorical, used in persuasion — and all of them based on the opinion of the “drawer of the lines”.

In truth, I have often imported graphs from journal papers and then used Photoshop to remove the trend lines so as not to influence my perception of the data — it is easy if they are colored lines.

Thanks for Reading.

# # # # #

Reply to  Kip Hansen
August 6, 2019 4:47 pm

To paraphrase Benjamin Disraeli…

There are lies.
There are damn lies
And there are people who don’t understand statistics.

😉

n.n
August 6, 2019 4:32 pm

Lost or created information. Inference, certainly, but inference from a limited, circumstantial set of data, is especially prone to misinterpretation and interpretation.

August 6, 2019 4:46 pm

Kip ==> Good job. You pulled up one of the three types of alarmist “science” (as I classify them).

This is a Type One: Take your data set concerning a bad thing and plot a trend line (it doesn’t matter whether it is linear or some other kind). Report your mathematically valid, but in reality meaningless trend as something to be alarmed about (exaggeration is sometimes necessary in your graphical presentation, as here, but not always). Summarized as “This bad thing is increasing, be afraid. Very, very afraid. (Send us money.)”

Type Two is: Take your Type One trend line and correlate it with another thing that you think is bad. Whether it is the burning of fossil fuels, or consumption of sugary soft drinks, or eating asparagus – you can probably find a correlation between the acknowledged bad thing and what you want other people to agree with you is bad.

Type Three, of course, is: “Alter the contents of one (or both) data sets as necessary to create the required correlation between the acknowledged bad thing and what I think is a bad thing.”

The first two are mathematically defensible, and even when deceptively presented (as in your source study) can be detected by a thinking and reasonably well educated person. The third, entirely fraudulent, type of alarmism is where we have real trouble.

Editor
Reply to  Writing Observer
August 6, 2019 5:42 pm

Writing Observer ==> Very nice — we see a lot of that!

Reply to  Kip Hansen
August 6, 2019 9:31 pm

Meant to add to that comment, but it was rather long already…

The disease in question (um, your hiding didn’t work all that well) – all of those incidence changes are completely due to medical advances.

The younger cohorts show an increasing “trend” thanks to a lowering of the cost of a biochemical test for the disease markers. Insurers (whether private or the government run one in Canada) have hit the “magic point” where paying for wider deployment of the test is equal to or less than treatment for the small number of cases that were previously missed. (The test is only indicative, not conclusive – but it allows the insurer to target a much smaller group for more expensive testing).

In the older cohorts, the drop is thanks to an outpatient surgical procedure that detects and removes the disease precursors and is also becoming more routine. (Again, having the precursors doesn’t absolutely mean that you will develop the disease – but removing them vastly increases the odds that you will not. If you are in one of those older age cohorts, talk to your doctor!)

Prjindigo
August 6, 2019 5:07 pm

Additionally I’d point out that a straight line isn’t a “trend” at all, it’s a slope that’s been forced by the compilation of anecdotes because the margin of error for each datum is not only different but differs depending on sensitivity and response of a device.

In atmospherics that means that a spurious electrical discharge suddenly becomes a 135°F “high” for the day.

There is no on-going margin of error study for ANY of the sensing devices used in meteorological data collection. That means their readings are no more accurate than the old 2°F marked thermometers in boxes covered with snow. Their margin of error was +- 2.5F+(.1F per year of use) and many were used until they broke… making history colder due to diminishing heat response.

Editor
Reply to  Prjindigo
August 6, 2019 5:44 pm

Prjindigo ==> Do you have a handy reference for that last bit “Their margin of error was +- 2.5F+(.1F per year of use) and many were used until they broke…”? If so, can you post a link or a cite? Thanks….

Geoff Sherrington
August 6, 2019 5:41 pm

Kip,
Thank you for this much needed essay.
Relatedly, another failure that needs to be stressed is the improper use, or lack of use, of statistics that demonstrate error bounds.
For over a year now, I have been trying to extract from our BOM an estimate of the uncertainty of measurement for routine daily Tmax temperatures. I have even simplified my ask to frame it as “How far apart should temperatures from 2 stations be so that their difference is not because of statistical noise, but can be stated confidently as a real difference?” or words to that effect.
The BOM gives me semantic lectures with few useful figures. I think that some of their authors, like many others in the climate domain, simply do not know enough of the basics of statistics, errors and uncertainty to give a useful response.
The subjects of your essay are intertwined with error measurement, so there are two problems needing repair.
Geoff S

Editor
Reply to  Geoff Sherrington
August 6, 2019 5:49 pm

Geoff Sherrington ==> Thanks, Geoff. Yes, the failure to report, and often to even consider, real measurement error is huge for human reported temperatures — and needs to be considered scientifically for Automated Weather stations.

I have done a couple of pieces on measurement error — which the stats-men complain about (they seem to believe that stats will make measurement error disappear.).

The latest trick is anomalization of data sets, pretending that by reducing the data set to anomalies of means that they eliminate (by an order of magnitude) the original measurement error.

Steven Mosher
August 6, 2019 6:37 pm

Nice Kip

I will expect less grief here when I remind people that the data is the data and the trend is in the model selected, not the data

Reply to  Steven Mosher
August 6, 2019 10:24 pm

Steven Mosher … at 6:37 pm
…the data is the data…

Except when 40% of the data gets re-written every month.
https://postimg.cc/MnHTSH7x