A condensed version of a paper entitled: “Violating Nyquist: Another Source of Significant Error in the Instrumental Temperature Record”.

By William Ward, 1/01/2019

The 4,900-word paper can be downloaded here: https://wattsupwiththat.com/wp-content/uploads/2019/01/Violating-Nyquist-Instrumental-Record-20190112-1Full.pdf

The 169-year instrumental temperature record is built upon two measurements taken daily at each monitoring station: the maximum temperature (Tmax) and the minimum temperature (Tmin). These daily readings are averaged to calculate the daily mean temperature as Tmean = (Tmax+Tmin)/2. Tmax and Tmin measurements are also used to calculate monthly and yearly mean temperatures, and these mean temperatures are then used to determine warming or cooling trends. This “historical method” of using daily measured Tmax and Tmin values for mean and trend calculations is still used today. However, air temperature is a signal, and the measurement of signals must comply with the mathematical laws of signal processing. The Nyquist-Shannon Sampling Theorem tells us that we must sample a signal at a rate greater than twice the highest frequency component of the signal. This is called the Nyquist Rate. Sampling at a lower rate introduces aliasing error into our measurement, and the slower our sample rate is compared to Nyquist, the greater the error will be in our mean temperature and trend calculations. The Nyquist-Shannon Sampling Theorem is essential science to every field of technology in use today. Digital audio, digital video, industrial process control, medical instrumentation, flight control systems, digital communications, etc., all rely on the essential math and physics of Nyquist.
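The aliasing mechanism described above can be illustrated numerically. The following minimal Python sketch uses a synthetic signal (not USCRN data): a daily temperature curve built from a 24-hour and a 12-hour harmonic. At 288-samples/day the computed mean is exact, while at 2-samples/day the 12-hour component aliases down to a constant offset and biases the mean:

```python
import math

def temp(t_days):
    # Hypothetical daily temperature signal (degC): a 10 degree mean
    # plus a 24-hour harmonic and a 12-hour harmonic.
    return (10
            + 5 * math.cos(2 * math.pi * t_days)
            + 2 * math.cos(4 * math.pi * t_days + 1))

def sampled_mean(rate_per_day):
    # Mean of one day of uniformly spaced samples at the given rate.
    return sum(temp(n / rate_per_day) for n in range(rate_per_day)) / rate_per_day
```

At 288-samples/day both harmonics average to zero and the computed mean is exactly 10 °C; at 2-samples/day the 2-cycle/day component survives sampling and shifts the mean by about 1.08 °C.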

NOAA, in their USCRN (US Climate Reference Network), has determined that it is necessary to sample at 4,320-samples/day to practically implement Nyquist. 4,320-samples/day equates to 1-sample every 20 seconds. This is the practical Nyquist sample rate. NOAA averages these 20-second samples down to 1-sample every 5 minutes, or 288-samples/day. NOAA only publishes the 288-samples/day data (not the 4,320-samples/day data), so to align with NOAA the rate will be referred to as “288-samples/day” (or “5-minute samples”). (Unfortunately, NOAA creates naming confusion with this process of averaging down to a slower rate. It should be understood that the actual sampling rate is 4,320-samples/day.) Such a rate can only be achieved by automated sampling with electronic instruments. Most of the instrumental record is comprised of readings of mercury max/min thermometers, taken long before automation was an option. Today, despite the availability of automation, the instrumental record still uses Tmax and Tmin (effectively 2-samples/day) instead of Nyquist-compliant sampling. The reason is to maintain compatibility with the older historical record. However, with only 2-samples/day, the instrumental record is highly aliased. It will be shown in this paper that the historical method introduces significant error into mean temperatures and long-term temperature trends.
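NOAA's reduction of 4,320 20-second samples to 288 5-minute values is a simple block average: 15 raw samples per 5-minute window. A minimal sketch of that reduction (the sample lists here are hypothetical):

```python
def block_average(samples, block):
    # Average each run of `block` consecutive samples into one value,
    # e.g. 4,320 twenty-second readings -> 288 five-minute values with block=15.
    assert len(samples) % block == 0
    return [sum(samples[i:i + block]) / block
            for i in range(0, len(samples), block)]
```

Note that averaging before decimating acts as a crude low-pass filter, so the published 288-samples/day series still reflects information gathered at the full 4,320-samples/day rate.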

NOAA’s USCRN is a small network that was completed in 2008, and it contributes very little to the overall instrumental record. However, the USCRN data provides us with a special opportunity to compare a high-quality version of the historical method to a Nyquist-compliant method. In what follows, the Tmax and Tmin values are obtained by finding the highest and lowest values among the 288 samples for the 24-hour period of interest.

 

NOAA USCRN Examples to Illustrate the Effect of Violating Nyquist on Mean Temperature

The following example will be used to illustrate how the amount of error in the mean temperature increases as the sample rate decreases. Figure 1 shows the temperature as measured at Cordova AK on Nov 11, 2017, using the NOAA USCRN 5-minute samples.


Figure 1: NOAA USCRN Data for Cordova, AK Nov 11, 2017

The blue line shows the 288 samples of temperature taken that day. It shows 24 hours of temperature data. The green line shows the correct and accurate daily mean temperature, calculated by summing the value of each sample and then dividing the sum by the total number of samples. Temperature is not heat energy, but it is used as an approximation of heat energy. To that extent, the mean (green line) and the daily signal (blue line) deliver the exact same amount of heat energy over the 24-hour period of the day. The correct mean is -3.3 °C. Tmax is represented by the orange line and Tmin by the grey line. These are obtained by finding the highest and lowest values among the 288 samples for the 24-hour period. The mean calculated from (Tmax+Tmin)/2 is shown by the red line. (Tmax+Tmin)/2 yields a mean of -4.7 °C, which is a 1.4 °C error compared to the correct mean.
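The two competing daily statistics described above reduce to a few lines of code. This sketch assumes `samples` is a list of 288 five-minute readings for one day (synthetic values here, not the Cordova data):

```python
def daily_means(samples):
    # True (integrated) daily mean versus the historical midrange.
    true_mean = sum(samples) / len(samples)
    midrange = (max(samples) + min(samples)) / 2   # (Tmax + Tmin) / 2
    return true_mean, midrange

# Toy asymmetric day: cold for most of the day, warm for about 7 hours.
samples = [0.0] * 200 + [10.0] * 88
true_mean, midrange = daily_means(samples)
```

For any day whose warm and cold excursions are not symmetric in duration, the two numbers disagree, exactly as in Figure 1.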

Using the same signal and data from Figure 1, Figure 2 shows the calculated temperature means obtained from progressively decreased sample rates. These decreased rates are obtained by dividing down the 288-samples/day rate by factors of 4, 8, 12, 24, 48, 72 and 144, giving sample rates of 72, 36, 24, 12, 6, 4 and 2-samples/day respectively. By properly discarding samples in this way, the net effect is the same as having sampled at the reduced rate originally. The aliasing that results from the lower sample rates reveals itself as shown in the table in Figure 2.
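The dividing-down procedure is plain decimation: keeping every k-th of the 288 samples is equivalent to having sampled at 288/k per day. A minimal sketch:

```python
def decimate(samples, factor):
    # Keep every `factor`-th sample; the result is what a station
    # sampling at the slower rate would have recorded originally.
    return samples[::factor]

def mean(xs):
    return sum(xs) / len(xs)
```

Applying `decimate` with factors 4, 8, 12, 24, 48, 72 and 144 to a 288-sample day yields the 72-samples/day down to 2-samples/day series whose means are tabulated in Figure 2.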


Figure 2: Table Showing Increasing Mean Error with Decreasing Sample Rate

It is clear from the data in Figure 2 that as the sample rate decreases below Nyquist, the corresponding error introduced by aliasing increases. It is also clear that 2, 4, 6 or 12-samples/day produces a very inaccurate result. 24-samples/day (1-sample/hr) up to 72-samples/day (3-samples/hr) may or may not yield accurate results; it depends upon the spectral content of the signal being sampled. NOAA has decided upon 288-samples/day (4,320-samples/day before averaging), so that will be considered the current benchmark standard. Sampling below a rate of 288-samples/day will be (and should be) considered a violation of Nyquist.

It is interesting to point out that what is listed in the table as 2-samples/day yields a 0.7 °C error, yet (Tmax+Tmin)/2 is also technically 2-samples/day and, as shown in the table, yields a 1.4 °C error. How can this be? (Tmax+Tmin)/2 is a special case of 2-samples/day because these samples are not spaced evenly in time: the maximum and minimum temperatures happen whenever they happen. When we sample properly, we sample according to a “clock” – the samples happen regularly, at exactly the same times each day. The fact that Tmax and Tmin occur at irregular times during the day causes its own kind of sampling error. It is beyond the scope of this paper to fully explain, but this error is related to what is called “clock jitter”, a known problem in the field of signal analysis and data acquisition. 2-samples/day, regularly timed, would likely produce better results than finding the maximum and minimum temperatures of any given day. The instrumental temperature record uses the absolute worst method of sampling possible, resulting in maximum error.
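The difference between clocked 2-samples/day and the irregular max/min pair can be seen with a toy day containing one brief warm spike (synthetic values, chosen purely for illustration):

```python
# A hypothetical day of 288 five-minute readings: 0 degC baseline with a
# single 30-minute spike to 10 degC in mid-afternoon.
day = [0.0] * 288
for i in range(150, 156):
    day[i] = 10.0

true_mean = sum(day) / len(day)       # integrated mean, about 0.21 degC
clocked = (day[0] + day[144]) / 2     # two regularly timed samples
midrange = (max(day) + min(day)) / 2  # the historical (Tmax+Tmin)/2
```

The clocked pair misses the brief spike and lands near the true mean, while the midrange is dragged halfway to the spike's peak, an error of almost 5 °C.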

Figure 3 shows the same daily temperature signal as in Figure 1, represented by 288-samples/day (blue line). Also shown is the same signal sampled at 12-samples/day (red line) and 4-samples/day (yellow line). From this figure it is visually obvious that much of the information in the original signal is lost by using only 12-samples/day, and even more is lost at 4-samples/day. This lost information is what causes the resulting mean to be incorrect. The figure graphically illustrates what we see in the corresponding table of Figure 2, and it explains the sampling error in the time domain.


Figure 3: NOAA USCRN Data for Cordova, AK Nov 11, 2017: Decreased Detail from 12 and 4-Samples/Day Sample Rate – Time-Domain

Figure 4 shows the daily mean error between the USCRN 288-samples/day method and the historical method, as measured over 365 days at the Boulder CO station in 2017. Each data point is the error for that particular day in the record. We can see from Figure 4 that (Tmax+Tmin)/2 yields daily errors of up to ± 4 °C. Calculating mean temperature with 2-samples/day rarely yields the correct mean.


Figure 4: NOAA USCRN Data for Boulder CO – Daily Mean Error Over 365 Days (2017)

Let’s look at another example, similar to the one presented in Figure 1, but over a longer period of time. Figure 5 shows (in blue) the 288-samples/day signal from Spokane WA, from Jan 13 – Jan 22, 2008. Tmax (avg) and Tmin (avg) are shown in orange and grey respectively. The (Tmax+Tmin)/2 mean is shown in red (-6.9 °C) and the correct mean calculated from the 5-minute sampled data is shown in green (-6.2 °C). The (Tmax+Tmin)/2 mean has an error of 0.7 °C over the 10-day period.


Figure 5: NOAA USCRN Data for Spokane, WA – Jan13-22, 2008

 

The Effect of Violating Nyquist on Temperature Trends

Finally, we need to look at the impact of violating Nyquist on temperature trends. In Figure 6, a comparison is made between the linear temperature trends obtained from the historical and Nyquist-compliant methods, using NOAA USCRN data for Blackville SC from Jan 2006 – Dec 2017. We see that the trend derived from the historical method (orange line) starts approximately 0.2 °C warmer and has a 0.24 °C/decade warming bias compared to the Nyquist-compliant method (blue line). Figure 7 shows the trend bias or error (°C/decade) for 26 stations in the USCRN over a 7-12 year period. The 5-minute-samples data gives us our reference trend; the trend bias is calculated by subtracting the reference from the (Tmaxavg+Tminavg)/2-derived trend. Almost every station exhibits a warming bias, with a few exhibiting a cooling bias. The largest warming bias is 0.24 °C/decade and the largest cooling bias is -0.17 °C/decade, with an average warming bias across all 26 stations of 0.06 °C/decade. According to Wikipedia, the calculated global average warming trend for the period 1880-2012 is 0.064 ± 0.015 °C per decade. If we look at the more recent period that contains the controversial “Global Warming Pause”, then using data from Wikipedia, we get the following warming trends depending upon which year is selected for the starting point of the “pause”:

1996: 0.14°C/decade

1997: 0.07°C/decade

1998: 0.05°C/decade
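The trend comparison above can be sketched with an ordinary least-squares fit. This is a minimal pure-Python illustration; the monthly series passed in are hypothetical, not the Blackville data:

```python
def trend_per_decade(monthly_means):
    # Ordinary least-squares slope of a monthly series,
    # converted from degC/month to degC/decade.
    n = len(monthly_means)
    x_bar = (n - 1) / 2
    y_bar = sum(monthly_means) / n
    cov = sum((x - x_bar) * (y - y_bar)
              for x, y in enumerate(monthly_means))
    var = sum((x - x_bar) ** 2 for x in range(n))
    return (cov / var) * 120  # 120 months per decade

def trend_bias(historical_monthly, reference_monthly):
    # Figure 7's metric: (Tmax+Tmin)/2 trend minus the 5-minute-sample trend.
    return trend_per_decade(historical_monthly) - trend_per_decade(reference_monthly)
```

Feeding both the historical and the 5-minute monthly means through `trend_per_decade` and differencing them gives the per-station bias plotted in Figure 7.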

While no firm conclusions can be drawn by comparing the trends over 7-12 years from 26 stations in the USCRN to the currently accepted long-term or short-term global average trends, the comparison is instructive. It is clear that using the historical method to calculate trends yields a trend error, and this error can be of a similar magnitude to the claimed trends. Therefore, it is reasonable to call into question the validity of the trends. There is no way to know for certain, as the bulk of the instrumental record does not have a properly sampled alternate record to compare against. But it is a mathematical certainty that every mean temperature and derived trend in the record contains significant error if it was calculated with 2-samples/day.


Figure 6: NOAA USCRN Data for Blackville, SC – Jan 2006-Dec 2017 – Monthly Mean Trendlines


Figure 7: Trend Bias (°C/Decade) for 26 Stations in USCRN

Conclusions

1. Air temperature is a signal and therefore it must be measured by sampling according to the mathematical laws governing signal processing. Sampling must be performed according to the Nyquist-Shannon Sampling Theorem.

2. The Nyquist-Shannon Sampling Theorem has been known for over 80 years and is essential science to every field of technology that involves signal processing. Violating Nyquist guarantees samples will be corrupted with aliasing error and the samples will not represent the signal being sampled. Aliasing cannot be corrected post-sampling.

3. The Nyquist-Shannon Sampling Theorem requires the sample rate to be greater than 2x the highest frequency component of the signal. Using automated electronic equipment and computers, NOAA USCRN samples at a rate of 4,320-samples/day (averaged to 288-samples/day) to practically apply Nyquist and avoid aliasing error.

4. The instrumental temperature record relies on the historical method of obtaining daily Tmax and Tmin values, essentially 2-samples/day. Therefore, the instrumental record violates the Nyquist-Shannon Sampling Theorem.

5. NOAA’s USCRN is a high-quality data acquisition network, capable of properly sampling a temperature signal. The USCRN is a small network that was completed in 2008, and it contributes very little to the overall instrumental record. However, the USCRN data provides us a special opportunity to compare analysis methods: a comparison can be made between temperature means and trends generated with Tmax and Tmin versus a properly sampled, Nyquist-compliant signal.

6. Using a limited number of examples from the USCRN, it has been shown that using Tmax and Tmin as the source of data can yield the following error compared to a signal sampled according to Nyquist:

a. Mean error that varies station-to-station and day-to-day within a station.

b. Mean error that varies over time with a mathematical sign that may change (positive/negative).

c. Daily mean error that varies up to +/-4°C.

d. Long term trend error with a warming bias up to 0.24°C/decade and a cooling bias of up to 0.17°C/decade.

7. The full instrumental record does not have a properly sampled alternate record to use for comparison. More work is needed to determine if a theoretical upper limit can be calculated for mean and trend error resulting from use of the historical method.

8. The extent of the error observed, with its uncertain magnitude and sign, calls into question the scientific value of the instrumental record and the practice of using Tmax and Tmin to calculate mean values and long-term trends.

Reference section:

This USCRN data can be found at the following site: https://www.ncdc.noaa.gov/crn/qcdatasets.html

NOAA USCRN data for Figure 1 is obtained here:

https://www1.ncdc.noaa.gov/pub/data/uscrn/products/subhourly01/2017/CRNS0101-05-2017-AK_Cordova_14_ESE.txt

NOAA USCRN data for Figure 4 is obtained here:

https://www1.ncdc.noaa.gov/pub/data/uscrn/products/daily01/2017/CRND0103-2017-AK_Cordova_14_ESE.txt

NOAA USCRN data for Figure 5 is obtained here:

https://www1.ncdc.noaa.gov/pub/data/uscrn/products/subhourly01/2008/CRNS0101-05-2008-WA_Spokane_17_SSW.txt

NOAA USCRN data for Figure 6 is obtained here:

https://www1.ncdc.noaa.gov/pub/data/uscrn/products/monthly01/CRNM0102-SC_Blackville_3_W.txt

Geoff Sherrington
January 15, 2019 12:07 am

Clyde Spencer January 14, 2019 at 5:25 pm noted “The utility of the Fourier Transform is that ANY varying signal can be decomposed … ”

There are various ways to decompose signals. Just for fun, what do you make of these semivariograms from a geostatistical approach, using time as a proxy for distance (since weather systems move with time) with temperatures taken half-hourly, daily, monthly and yearly; and aggregated in the standard BOM way for recent years, with 1 second glimpses etc.
http://www.geoffstuff.com/semiv_time_bases.xlsx
Fourier gets boring after a while. Not every wave is a sine wave, though it can be decomposed.

Geoff.

Editor
January 15, 2019 2:19 am

Great discussion. The entire subject of signal theory and its application to climate data is often overlooked, particularly in the integration of signals with wide frequency differences.

All time-variant sequences of numbers are signals and subject to the principles of signal theory and processing.

William Ward
Reply to  David Middleton
January 16, 2019 3:13 pm

Thanks David for the validation. Some readers seem to think that the ideas around sampling theory are novel. They are not. They are standard fare everywhere except climate science.

January 15, 2019 2:35 am

It seems to me that the only result here that bears at all on what is actually done with the data – calculating monthly averages – is the comparison of trends. But all it says is that if you calculate the trends two different ways, you get two different results. Of course. But are they significantly different, so that you could deduce something from that? No information is supplied.

So I calculated some. My dataset finishes in Oct 2017 – NOAA is off the air at the moment. I first restricted to USCRN stations in CONUS with at least ten years of data. There were 109. As with Blackville above, the trends were high. The mean for min/max was 13.69°C/Cen; the mean for integrated was 14.19°C/Cen. That is a difference of -0.5°C/Cen, opposite in sign to that of the 26 stations in this article.

But the sd of the trend differences was 8.62, so the sd of the mean would be expected to be about 0.8 °C/cen. If I restrict to 11 years of data, 83 stations qualify, and the difference of means was -0.86°C/cen.

This seems to be just normal random variation. The mean difference is less than one standard error from zero. There is nothing significant there.

Scott W Bennett
Reply to  Nick Stokes
January 15, 2019 6:50 am

It is discussed in the literature that the bias between true monthly mean temperature (Td0) – defined as the integral of the continuous temperature measurements in a month – and the monthly average of Tmean (Td1) is very large in some places and cannot be ignored. (Brooks, 1921; Conner and Foster, 2008; Jones et al., 1999*)

Wang (2014) compared the multiyear averages of bias between Td1 and Td0 during cold seasons and warm seasons and found that the multi‐year mean bias during cold seasons in arid or semi‐arid regions could be as large as 1 °C.

See my comment above for more detail.

* Jones et al. recognised that there is a difference between the two.

Richard Linsley Hood
January 15, 2019 3:49 am

Strictly, Nyquist applies in both time and space. Thus there are not only the inaccuracies that come from (min + max) / 2 but also the spatial sampling errors from the non-linear placement of the measuring stations.

As any engineer knows, a suitable anti-alias filter ahead of the sampling engine will reduce the aliasing errors and the simplest way, as has been suggested above, is to add a suitable mass around the thermometer to integrate the signal before sampling.

In fact this anti-alias signal IS available in lots of stations, just use the below ground thermometer values that some stations provide. Depending on how far below the surface the signal is taken a larger and larger anti-alias filter can then be provided. A daily sampling of that signal will then provide a much more accurate value for tAvg at that station.

Of course this will not help compensate for the well-known spatial errors that come from station placement horizontally and vertically.

William Ward
Reply to  Richard Linsley Hood
January 16, 2019 3:18 pm

Richard, thanks for the validation and additional comments. Spatial aliasing is extremely important as well. Aliasing of video is a good analogy for that two-dimensional effect. Antarctica is as large as the US plus two Mexicos. There is 1,000 times more thermal energy locked up in the ice below 0 °C than the atmosphere holds above 0 °C, yet we have only 26 stations there. How many from the US are in the datasets? Percentage-wise, the weighting of the US is massive and Antarctica's is microscopic.

Paramenter
January 15, 2019 3:53 am

Hey George,

William, it is Tmin/Tmax data. It is not “bad data”.

Of course it is not. Simply, the resolution of those records is not sufficient to draw firm conclusions about, for example, minor variations in the trends. Also, error due to aliasing adds to other errors due to imperfect measurements, rounding, uncertainty and so on. It is not the sole error associated with historical records, but it must be taken into account as well.

January 15, 2019 3:57 am

Well, someone will have to tell me why the temperature at the start of the next day, Nov. 12, is -1°, while the start of Nov. 11 was -9°. That is a temperature difference of 8° from one day to the next. This is really a nice condition to show … em, whatever.

Seldom saw such a nonsense here at WUWT.

Tom Johnson
January 15, 2019 4:31 am

The authors seem to be missing several important nuances of digital sampling theory. For starters, the Nyquist sample rate criterion of “more than two times the highest frequency” involves assumptions about the statistical nature of the data, the frequency content of the data, and the mathematical techniques used for processing the data. It precludes determining a precise value for the high and low “peaks”. If you wish to determine a precise value for a peak, a sample rate of 20 times the highest frequency involved will only give you a value within about 1% of the actual peak. Conversely, if the data are random, stationary, and ergodic, using the 2x criterion will indeed give you an accurate value for the mean, PROVIDING SUFFICIENT DATA ARE ANALYZED TO REMOVE RANDOM ERRORS. It will never give you a valid answer for the mean within a single day. In addition, aliasing errors are not lost, they are just shifted in frequency, which can obviously create other errors.

Unfortunately, daily temperature data are neither random, nor stationary, nor ergodic. And, even more important, averaging the data from any single day amongst dozens of others defeats the whole concept of highs and lows within a single day. Thus the whole idea of using Nyquist sampling for statistics on data within a single day is mostly irrelevant. The daily high often occurs mid afternoon, and the daily low early morning. However, that is not always true. Sometimes the daily high occurs at midnight, and it may also be the daily low for the next day. Or vice versa. In addition, the thermal mass of a thermometer bulb is much higher than the thermal mass of the influencing air surrounding it, making the thermometer a somewhat effective first order anti-aliasing filter. This may be either good, or bad, as the thermal mass values of the temperature transducer can vary widely and are mostly unknown in the historical record.

The historical record is what it is. It is clearly important to make the current data more accurate. Including digital sampling theory in the data recording will certainly improve that. However, the proper method of appending new data to the historical record using different sampling techniques is just another part of climate science that is not yet “settled”.
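Tom's point about thermometer thermal mass acting as a first-order anti-aliasing filter can be sketched as a discrete exponential smoother (a minimal illustration; the coefficient `alpha` here is hypothetical, since, as he notes, a real bulb's time constant is mostly unknown):

```python
def first_order_filter(samples, alpha):
    # Discrete first-order low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1]).
    # Small alpha = large thermal mass = heavier smoothing.
    out, y = [], samples[0]
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out
```

A short spike that would alias badly at a slow sample rate is strongly attenuated before it ever reaches the sampler, whereas with `alpha = 1` (no thermal mass) the spike passes through untouched.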

Paramenter
Reply to  Tom Johnson
January 15, 2019 5:58 am

Hey Tom,

Thus the whole idea of using Nyquist sampling for statistics on data within a single day is mostly irrelevant.

My understanding is that problems with Nyquist start even before you can do any statistics – at the data acquisition step. The daily midrange does not allow you to replicate the daily signal, nowhere near. The usual response is that we don't really need that: crude daily averaging is all we need for monthly and yearly averages, so we can happily accept errors due to undersampling of the daily signal. Fine; that may be perfectly OK for all practical applications. But, by the way, using those crude averages we also want to do fine analysis with precision down to thousandths of a degree C. In that case you need a high-accuracy signal. You cannot have it both ways. For me, that's the crux of the problem.

Problems due to spatial and to lesser extent temporal aliasing of the temperature records are discussed in the mainstream literature so looks like aliasing of such records is relevant.

Tom Johnson
Reply to  Paramenter
January 15, 2019 6:55 am

I agree with all you said. The point I was trying to make about the Nyquist criterion being irrelevant is that if you need a sample rate 20 times the highest frequency of interest to accurately capture a peak, Nyquist at twice the highest frequency is clearly not adequate.

It’s always best to apply the most up-to-date techniques when acquiring data, including sampling and anti-aliasing. It’s not stated if USCRN does or does not do that with their sampling and down sampling techniques. For example, it’s best to use constant delay anti-aliasing filters (such as Bessel) if you wish to capture peaks (like daily highs and lows), but higher roll-off filters (Such as Butterworth) for statistics such as means. It’s not stated what, if any, USCRN recommends. It’s never good to assume that your transducer is a good antialiasing filter, since other noises might be present in the data beyond the actual signal being measured. I always go by the old adage: “It’s easy to read a thermometer, it’s quite difficult to measure temperature.”

Bright Red
Reply to  Tom Johnson
January 15, 2019 1:59 pm

“The point I was trying to make about the Nyquist criterion being irrelevant is that if you need a sample rate 20 times the highest frequency of interest to accurately capture a peak, Nyquist at twice the highest frequency is clearly not adequate.”

Sorry Tom but that is not correct. Meeting Nyquist allows the waveform to be accurately reconstructed from which the min and max values can be determined. You do not need to physically sample them.
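Bright Red's point (that Nyquist-compliant samples allow the waveform, and hence its min and max, to be reconstructed without physically sampling them) can be sketched with Whittaker-Shannon interpolation. This is a synthetic illustration: a 1 Hz cosine sampled at 2.5 Hz, with the frequencies, phase and window all chosen arbitrarily:

```python
import math

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

fs = 2.5  # sample rate: greater than 2x the 1 Hz signal, so Nyquist is met

def signal(t):
    return math.cos(2 * math.pi * t + 0.7)

samples = [signal(n / fs) for n in range(200)]

def reconstruct(t):
    # Whittaker-Shannon interpolation from the finite sample record.
    return sum(s * sinc(fs * t - n) for n, s in enumerate(samples))

# No raw sample ever lands on a peak, but the reconstruction recovers it.
sample_max = max(samples)
rec_max = max(reconstruct(30 + 0.005 * k) for k in range(4000))
```

Here the largest raw sample is only about 0.86, yet the reconstructed interior maximum comes back to the true peak of 1.0, up to the truncation error of the finite record.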

William Ward
Reply to  Bright Red
January 15, 2019 10:25 pm

That’s right Bright Red – thanks for confirming it.

Reply to  Paramenter
January 15, 2019 7:57 pm

You’re banging close to the problem.

Measurement error of historical readings is also of importance in determining whether you can really do arithmetic on data and come up with accuracies out to thousandths of a degree.

Here is a simple thought experiment.

1. I measure temperature each day for 100 days and each reading is 50 degrees.
2. Each reading is accurate to +- 0.5 degrees.
3. Each reading is rounded to the nearest degree.

A. What is the mean?
B. What is the uncertainty of the mean?
C. Do you know the probability of each 0.1 degree between 49.5 and 50.5 for each day?
D. If you don’t know the probability distribution, then each tenth or even one hundredth of a degree is just as likely as another.

What does this do to the accuracy of the mean? The mean is 50 +- 0.5 degrees. You can do all the averaging and statistics you want and try using the uncertainty of the mean to convince folks that your arithmetic lets you get better and better accuracy, but you’re only kidding yourself. Draw it out on a graph. Each and every day, you are going to have three points: the recorded temp, the recorded temp plus 0.5, and the recorded temp minus 0.5. Since you have no way to know what the real temp was on each day, you are going to end up with three lines, one at 50.5, one at 50.0 (the mean), and one at 49.5.

The only thing you’ll know for sure is that the average temp is somewhere between those lines.
You can average for months, years, decades, or centuries and you just won’t be able to tell anyone that you know the temperature any more accurately than +- 0.5 degrees.

You can carry out arithmetic to the ten thousandths place but all you’ll be doing is calculating noise. Know why? Because you are measuring different things each and every time you make a measurement. You are not measuring the same thing multiple times.

Frank
Reply to  Tom Johnson
January 16, 2019 10:01 pm

Tom: Given that daily temperature is driven by the sun, the only significant frequency components constantly present in continuous temperature data are going to have periods of 24 hours and longer (out to 365 days).

The diurnal cycle in SSTs is so low, we don’t even pay attention to the minimum or maximum temperature.

Reply to  Frank
January 16, 2019 11:02 pm

Frank, that’s simply not true. I posted a periodogram twice showing that there is a significant frequency at both 12 hours and 8 hours. Here it is again.

w.

Steve O
January 15, 2019 4:45 am

The average of two readings gives a metric that is different from the average of 288 readings. There is a pattern to how a day warms up and cools down and two readings (rounded to the nearest whole number) means you don’t know much about the actual temperature on any particular day. However, nobody really cares about the temperature of any particular day. What if we are measuring the average temperature in a year? If 288 readings in a day can tell you something useful about the average temperature in a day, why wouldn’t 365 readings tell you something about the average temperature in a year?

Also, I can accept that the “actual mean temperature” as measured with 288 measurements is different from the employed metric of taking two readings, but as long as that difference does not change over time, I don’t see why you can’t use the employed metric to measure changes in the average temperature from one decade to the next.
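Steve O's question (does the daily error wash out in longer averages?) can be probed with a quick sketch: if the daily shape is asymmetric, every day's midrange is biased with the same sign, and averaging 365 of them preserves the bias rather than cancelling it. The daily shapes and seasonal amplitude below are synthetic, chosen purely for illustration:

```python
import math

year_true, year_mid = [], []
for d in range(365):
    # Seasonally varying amplitude; each day is warm for a quarter of the day.
    amp = 8 + 4 * math.cos(2 * math.pi * d / 365)
    day = [0.0] * 216 + [amp] * 72
    year_true.append(sum(day) / len(day))       # true daily mean = amp / 4
    year_mid.append((max(day) + min(day)) / 2)  # daily midrange = amp / 2

annual_true = sum(year_true) / 365
annual_mid = sum(year_mid) / 365
```

In this toy year the annual average of midranges runs a steady 2 °C above the annual average of true means: a large sample size does not remove a bias whose sign is consistent, which is why the bias only cancels for trends if it really is constant over time.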

Steve O
Reply to  Steve O
January 15, 2019 8:14 am

Put another way:

If instead of taking a temperature measurement 288 times a day, I divided the day into 365 intervals. At each interval I took the high and the low for the interval and divided by two. From this data, could I not know the average temperature for the day?

Clyde Spencer
Reply to  Steve O
January 15, 2019 12:49 pm

Steve,
You asked, “.. but as long as that difference does not change over time, I don’t see why you can’t use the employed metric to measure changes in the average temperature from one decade to the next.”

I think that there are two answers to your question. First, if you treat the averages of the mid-range values as an index, then you can say something about trends. But, the unanswered question is, “Just what is it that you can say?”

Second, as Ward demonstrated, the mid-range values can be and often are very different (+ & -) from the robust metric, the mean. That means that the error bars for the monthly and annual means are much larger than if a true daily mean were used. That larger error should be acknowledged, and that implies reducing the claimed precision of the values making up long-term trends.

We only have the historical data that we have, but we need to be honest about the accuracy and precision of the analysis of that data. A clue to that is given by a plot of the annual frequency distribution of global temperatures: what is shown is a skewed distribution where the range implies a standard deviation of tens of degrees. That does not support claims of an annual mean temperature known to hundredths (NOAA) or thousandths (NASA) of a degree Celsius.

Steve O
Reply to  Clyde Spencer
January 15, 2019 2:49 pm

Clyde, thank you. It’s an important point that taking the min plus the max and dividing by two can only provide an index. Back when they started recording the data, I imagine they never realized its future use. I’m still trying to get a mental grasp on how important the differences are, since large sample sizes have a way of cutting sampling errors down to size.

But you’re right that we should not get too excited about hundredths of a degree. At that level, there are other errors to worry about.

Clyde Spencer
Reply to  Steve O
January 16, 2019 1:42 pm

Steve O
You said, “At that level, there are other errors to worry about.”

Indeed! When NOAA reports an annual global anomaly to be so many hundredths above the preceding year, or NASA reports it to thousandths, the reader assumes that it is highly accurate and known precisely. Whereas, neither may be the case! But, when the changes are at that order of magnitude, the only way it can be made to appear to be a crisis is to report the very small numbers, without error bars, and hope that the readers believe that the numbers are reliable.

William Ward
Reply to  Clyde Spencer
January 16, 2019 3:28 pm

Clyde, my ears thank you for the music!

Reply to  Clyde Spencer
January 17, 2019 8:37 am

Without error bars is the key!

Clyde Spencer
Reply to  Clyde Spencer
January 18, 2019 9:32 am

William
Happy to be your muse! 🙂 Would it were that alarmists had the same taste in music.

Solomon Green
January 15, 2019 6:12 am

When I suggested on WUWT, some years ago, that for a continuous function such as temperature, (Tmax + Tmin)/2 does not equal Tmean, and that with modern instrumentation we could get a more accurate figure for Tmean, Steven Mosher put me in my place by writing that everyone knew that, but since these estimates had always been used in the past, climate scientists must continue to use them for comparison purposes.
I am looking forward to his contribution on this thread.

Steve O
Reply to  Solomon Green
January 15, 2019 8:20 am

I suspect his point would be the same. The two methods will result in different values, but as long as the difference is constant we can use either method to compare temperature trends over time.

Yes, it’s an assumption that the difference has remained the same over time, and there is a risk to it. The best we can do now is to compare the difference as it is today with the difference when continuous measurements first started.

Editor
January 15, 2019 6:59 am

William ==> Well done!

Scott W Bennett
January 15, 2019 7:02 am

I guess I may have missed something as Tmean seems to me to be an absurd way to measure daily average temperature!

Someone help me here, are we really saying that adding the max and min and dividing by two is meaningful in any way other than to provide a daily range? I do see that it might also serve the “comparative” purpose of setting normals though.

Surely this is a sampling problem alone and it is unnecessary to invoke Nyquist.

Tmean = (Tmax+Tmin)/2 is problematic to say the least.

It is not simply a matter of not knowing the time each happened (the “clock”); the duration is of vast importance!

It’s not hard to imagine a simple list of samples that make Tmean meaningless!

We’ve had days here where the min was low, say -8 C, but it stayed at 13 C all day before briefly reaching 14 C, then within an hour dropping back to 8 C. The “true” mean for 12 observations is 10.9. I’m too nervous to calculate Tmean because it seems so foolish, and I would have sworn the actual mean was 13, because the low and high each occurred over only an hour (a total of 2 samples out of 12) while it was 13 C for 9 hours!
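
A minimal sketch of the arithmetic, assuming an hour-by-hour sample list reconstructed from the description above (the exact values are an assumption for illustration):

```python
# Hypothetical 12 hourly readings for the day described: one hour at
# the -8 C low, nine hours at 13 C, a brief 14 C high, one hour at 8 C.
samples = [-8] + [13] * 9 + [14] + [8]

true_mean = sum(samples) / len(samples)      # time-weighted mean
tmean = (max(samples) + min(samples)) / 2    # historical (Tmax+Tmin)/2

print(round(true_mean, 1))  # 10.9
print(tmean)                # 3.0
```

The midrange lands at 3 C while the time-weighted mean is 10.9 C, which is the point of the example.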

These are my raw thoughts, I’ve given a more considered comment from the literature above! 😉

Hugs
Reply to  Scott W Bennett
January 15, 2019 11:54 am

Tminmax/2 is so banal it comes as a big surprise to people who didn’t know about it.

It is not meaningless though, when you look at a large set of unbiased data to find trends. But how do we know the set is unbiased?

We don’t. New biases are found; there is an endless number of biases in the data, all very difficult to remove.

unka
January 15, 2019 7:57 am

“The Nyquist-Shannon Sampling Theorem tells us that we must sample a signal at a rate that is at least 2x the highest frequency component of the signal.” – This is true, but the author forgot to mention that it applies to signal reconstruction. To estimate the average of the signal, one can undersample the signal and still get good estimates of the average. This obviously can be quantified. The author could have done a little bit of simulation that would show him that he is raising a red herring here.
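
The simulation unka asks for can be sketched with a toy signal of my own invention: randomly timed undersampling of a band-limited signal still estimates its mean well, even though the signal could not be reconstructed from those samples.

```python
import math
import random

random.seed(42)

# Toy "temperature": a daily cycle plus a slower monthly drift.
def temp(t_hours):
    return (10 + 8 * math.sin(2 * math.pi * t_hours / 24)
               + 2 * math.sin(2 * math.pi * t_hours / (24 * 30)))

# Dense reference mean: 288 samples/day (every 5 minutes) for a year.
dense = [temp(i / 12) for i in range(365 * 24 * 12)]
true_mean = sum(dense) / len(dense)

# Sparse estimate: only 2 randomly timed samples per day.
sparse = [temp(24 * d + random.uniform(0, 24))
          for d in range(365) for _ in range(2)]
est_mean = sum(sparse) / len(sparse)

print(abs(est_mean - true_mean) < 1.0)  # the mean survives undersampling
```

Note this does not rescue (Tmax+Tmin)/2, which samples at the extremes rather than at random times.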

January 15, 2019 8:11 am

With a degree in Math and Engineering, I think a little differently. I was always taught “Work is the integral of the area under the curve.” Heat added or removed is WORK when you consider it is actually joules. The average of Tmax and Tmin is NOT the “area under the curve.” It is not even a close approximation of the “area under the curve.” Thus, IMHO, if you want a true representation of the heat added (work – joules), the Nyquist Theorem applies. Otherwise aliasing of the data could produce trends that are not there. I also believe that putting this into a digital computer [think computer model] amplifies the problem. I have seen this happen [aliasing] with computer models I developed for accident analysis on nuclear power plants.
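
The area-under-the-curve point can be sketched with an invented asymmetric daily curve (a short warm spike on a long cool baseline; the curve is an assumption for illustration only):

```python
import math

# Hypothetical daily curve: 5 C baseline with a brief Gaussian spike
# peaking at 15 C around 3 pm.
def temp(h):
    return 5 + 10 * math.exp(-((h - 15) ** 2) / 2)

n = 1440  # one sample per minute over 24 hours
vals = [temp(24 * i / n) for i in range(n)]

integral_mean = sum(vals) / n              # proportional to area under curve
midrange = (max(vals) + min(vals)) / 2     # the historical estimator

print(round(integral_mean, 2))  # ~6.04
print(round(midrange, 2))       # 10.0
```

The brief spike barely moves the integral (the heat content) but drags the midrange far away from it.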

Reply to  Usurbrain
January 15, 2019 9:04 am

PS, How much of the lost heat is lost in the use of (Tmax + Tmin)/2 rather than “hiding in the ocean?”

Gums
Reply to  Usurbrain
January 15, 2019 12:58 pm

Thank you Brain!
My limited math skills did not allow me to express the underlying aspect of “average temperature” and Gorebull warming numbers that scare many each month they are published. And there are other things to consider, huh?
– hysteresis for the hot adobe to cool at night and stay cool for early the next day
– radiative heat loss as I see up at 8,000 feet on a cool night, and I guess the desert folks see it more often

Gums sends…

Joe Campbell
January 15, 2019 11:40 am

To William Ward – Great article. Thanks…Joe

William Ward
Reply to  Joe Campbell
January 15, 2019 10:19 pm

Thank you very much Joe.

January 15, 2019 11:52 am

Ward
Sampling at a rate less than [the Nyquist limit] introduces aliasing error into our measurement.

I believe this is incorrect. My understanding of Fourier analysis is that performing a DFT on any set of measurements (i.e. thermometer readings) has no effect on the sample points (“measurements”) themselves, nor do I see any way it could have any effect on any subsequent measurements or standard statistical operations performed on those sets of measurements, up to _spectral analysis_.

Yes, “aliasing” can occur if you perform a DFT on samples of a time series of a “signal” containing frequencies above the Nyquist limit. These higher frequencies will be folded back, overlapping the band-limited domain, starting at the lowest frequency, distorting the resulting spectrum (in the sense that the higher, out-of-limit frequencies will be displayed at much lower frequencies). And if you perform Shannon interpolation (aka sinc interpolation) on an undersampled series, the resulting interpolation will display the overlapped spectra.
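
The fold-back can be shown numerically with a toy DFT (my own example, not from the paper): a 9-cycle tone in a 16-sample frame, above the 8-cycle Nyquist limit, shows up in bin 16 - 9 = 7.

```python
import cmath
import math

N = 16   # samples per frame; Nyquist bin is N/2 = 8
f = 9    # cycles per frame, deliberately above Nyquist

x = [math.cos(2 * math.pi * f * n / N) for n in range(N)]

# Naive DFT magnitudes for the non-negative-frequency bins 0..N/2
mags = [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
               for n in range(N)))
        for k in range(N // 2 + 1)]

peak = max(range(N // 2 + 1), key=lambda k: mags[k])
print(peak)  # 7: the 9-cycle tone masquerades as a 7-cycle one
```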

However, Parseval’s theorem will still hold, the total ‘energy’ (sum of squared absolute values) in the frequency domain will still be the same as the time domain.

This is because Parseval’s theorem is a _mathematical identity_ on trig functions, requiring no physical interpretation of what the signal represents. Also, the DFT is invertible, so IDFT(DFT(x)) = x. That means you get the same points back that you started with, even if undersampled or completely random.

If you don’t believe that, try this little experiment, which reconstructs the original points and squared energy of a random time series, without any regard for Nyquist limits or the physical meaning of the values.

N=4096
x = rand(N,1) + 1j * rand(N,1);   % random complex signal
sum(abs(x).^2)                    % energy in the time domain
sum(abs(fft(x)).^2)/N             % energy in the frequency domain (Parseval)

You should see output similar to this:
N = 4096
ans = 2783.0
ans = 2783.0

I say “similar”, because the last two numbers will not be exactly the same as above. This is because it is a randomly generated complex signal. But both values will be the same, showing that Parseval’s theorem holds even for “undersampled” data.
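
The same Parseval check can be mirrored in pure Python with a naive DFT (a smaller N than the Octave snippet so the O(N^2) transform stays fast; a sketch, not production code):

```python
import cmath
import random

random.seed(0)
N = 256
x = [complex(random.random(), random.random()) for _ in range(N)]

# Naive O(N^2) DFT
X = [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
     for k in range(N)]

time_energy = sum(abs(v) ** 2 for v in x)
freq_energy = sum(abs(v) ** 2 for v in X) / N

print(abs(time_energy - freq_energy) < 1e-6)  # True: Parseval holds
```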

Furthermore, since the measurements themselves are not distorted by undersampling, any standard statistical operations performed on the measured values (mean, variance etc) will not change.

The only “trouble” you could get into is “believing” the frequencies of the aliased signals. These signals are “real” in the sense of energy content, but their frequencies are just wrapped around the band. In fact, if there are no other frequencies in the overlapped region, the aliasing can be fixed merely by unwrapping the aliased frequencies.

So, what am I missing here? Show me an example of actual measured temperatures (or computed means of same) which are somehow undersampled.

Reply to  Johanus
January 15, 2019 12:00 pm

… oops, I meant to say “Show me an example of actual measured temperatures (or computed means of same) which are erroneous because they are undersampled.”

Paramenter
January 15, 2019 11:58 am

Hey Jim:

I’m working on an essay about uncertainty of the averages and something I notice here is that this paper doesn’t address the errors in measurements, nor should it necessarily do so.

I reckon this paper concentrates on the errors in the temperature records due to aliasing. Those errors obviously add to the other known errors you mention, such as measurement and rounding errors. Looking forward to your results!

HaM
January 15, 2019 2:12 pm

Lots of straw today. Nyquist is best understood as a statement about perfect reconstruction of the original signal. Sampling below the Nyquist frequency guarantees that you cannot. Careful sampling at or above the Nyquist frequency in theory allows perfect reconstruction, but in practice cannot – consider, for instance, the very high bandwidth noise in your electronic measurement device; you must exceed the Nyquist rate for that noise if you hope to measure and suppress the noise effects.
Nyquist is nearly silent on the impact on your results – consider sampling the signal that actually is the mean daily temperature once per day. It’s easy to argue that the value varies over the day (global warming/cooling both require this to be true). Despite this, if you measure it once per day, you will have a perfect representation of the mean daily temperature. Of course, to do this you need to first agree that there is such a thing as the mean daily temperature and that it takes a single, discrete value each day. Suddenly, Nyquist doesn’t even apply (it covers only continuous signals, not discrete ones – non-continuous signals have infinite bandwidth).
More, consider a large number of samples of a sum of zero-mean stationary random processes, with the sampling well below the Nyquist cutoff for some of the processes. Average the results. I think the result will be a stationary, zero-mean random process. In other words, you will get the correct result even though you ignored Nyquist. That doesn’t mean Nyquist was wrong, it just means you can’t predict the values you didn’t sample – in other words, no perfect reconstruction. Ignoring Nyquist doesn’t mean the average will be wrong, just that it could be and that time-dependent (as opposed to measured) estimates likely will be.
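
HaM’s thought experiment can be sketched as follows (invented frequencies and jittered once-a-day sampling; illustrative only):

```python
import math
import random

random.seed(1)

# Sum of zero-mean sinusoids with random phases, standing in for a sum
# of zero-mean stationary processes. The fast ones are far above the
# Nyquist limit for once-a-day sampling.
freqs = [0.3, 1.7, 4.1, 9.3, 26.0]   # cycles per day
phases = [random.uniform(0, 2 * math.pi) for _ in freqs]

def signal(t_days):
    return sum(math.sin(2 * math.pi * f * t_days + p)
               for f, p in zip(freqs, phases))

# One jittered sample per day for 2000 days
samples = [signal(d + random.random()) for d in range(2000)]
mean = sum(samples) / len(samples)

print(abs(mean) < 0.2)  # the average is right despite ignoring Nyquist
```

As HaM says, this does not contradict Nyquist; it only means the unsampled values cannot be reconstructed.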

As several people point out, mean != (max + min)/2. A fundamental problem is that the mean daily temperature is not captured by the historical data. Worse, it is not a well-defined quantity, and the vagueness of the definition has always seemed to be exploited to advance the climate change cause.

Dr. S. Jeevananda Reddy
January 15, 2019 4:15 pm

The differences also can be seen in rainfall. Take the manual raingauge data [the total for a 24-hour period] and the hydrograph cumulative sum [second by second, minute by minute, hour by hour, two-hourly, etc.]. They present differences. Why? It depends upon several factors; wind is the main cause of the differences.

Areal averaging is more complicated. In the late 70s, as a scientist at ICRISAT, I installed 54 raingauges around a 3000 ha farm. On one day, I noticed the rainfall varied between 5 and 60 mm. The average is different from the met station rainfall.

Dr. S. Jeevananda Reddy

Dr. S. Jeevananda Reddy
January 15, 2019 5:46 pm

Heatwaves and coldwaves are expressed by maximum and minimum temperatures, not by averages.

Human comfort is expressed by averages of temperature along with the relative humidity and wind speed at a place; hourly temperatures, hourly wind speeds and hourly relative humidities are used.

0830 and 1730 IST observations [standard met observations at 3 & 12 GMT]: dry- and wet-bulb temperatures are used to estimate relative humidity.

00 and 12 GMT: upper-air balloon observations are made for computation of water vapour in the atmosphere.

All these were put forth by international meteorologists after discussions based on their local experiences, but not by people — you scratch my back and I scratch your back.

Lowest to highest: a probability curve defines the values at different probability levels. When the mean of a data set coincides with the median [the 50% probability value], the data are said to follow a normal distribution (the bell-shaped pattern). If not, the distribution is skewed, and the probability estimates are biased; to correct this, the incomplete gamma distribution is used to get unbiased estimates. For the mean of a second-by-second (or finer) data set: check whether it follows a normal or a skewed distribution.

Dr. S. Jeevananda Reddy

Tom Abbott
Reply to  Dr. S. Jeevananda Reddy
January 16, 2019 11:05 am

“Heatwaves and coldwaves are expressed by maximum and minimum temperature and not by average”

Yes, and isn’t that what the CAGW claims are all about? Alarmists say the 21st century’s temperatures are warmer than any time in the past, but that’s not what you see if you look at the record of maximum temperatures only (Tmax), where the 1930s were as warm as or warmer than current temperatures.

Here’s a Tmax chart of the U.S.:

[Tmax chart image]

Let’s compare actual high temperatures to actual high temperatures if we are talking about “hotter and hotter”.

Editor
January 15, 2019 9:40 pm

In the research for this discussion I discovered an interesting fact. This is that the average hourly temperature changes for a wide variety of temperature stations have a very similar shape. I looked at five years of hourly records for the following temperature stations:

Vancouver
Portland
San.Francisco
Seattle
Los.Angeles
San.Diego
Las.Vegas
Phoenix
Albuquerque
Denver
San.Antonio
Dallas
Houston
Kansas.City
Minneapolis
Saint.Louis
Chicago
Nashville
Indianapolis
Atlanta
Detroit
Jacksonville
Charlotte
Miami
Pittsburgh
Toronto
Philadelphia
New.York
Montreal
Boston

Here is the average day at each of the locations:

As you can see … not much difference. It does make it obvious why (max+min)/2 is NOT an unbiased estimator of the true mean …

w.

Reply to  Willis Eschenbach
January 16, 2019 5:46 am

Since the error is consistent at around +0.1 sigma, you can just adjust the MIN-MAX mean to get the true mean. It does not seem to be a random error.

Reply to  Willis Eschenbach
January 16, 2019 6:29 am

Willis,
Just a WAG, but it looks to me like the small difference between (max+min)/2 and the true mean could be explained by the wave shape. There appears to be a longer slope at the bottom (colder), and a smoother curve at the top (warmer).

Clyde Spencer
Reply to  Usurbrain
January 18, 2019 9:52 am

Usurbrain and Strangelove
I think that what we are looking at is an artifact of skewness in the data. All Willis’ examples are for mid-latitude cities in the Northern Hemisphere. I’m speculating that the error will be different for different parts of the world and different seasons. While the error may dance around the mean, I’m not sure that can be shown rigorously to be the case. At the very least, we know that the mid-range value is a biased estimator of the mean and should be regarded with suspicion.

Frank
Reply to  Willis Eschenbach
January 16, 2019 10:06 pm

Willis: Any sign of the hypothetical higher frequency signals present in this data?

Clyde Spencer
Reply to  Willis Eschenbach
January 18, 2019 9:44 am

Willis
Normally, I would say that the difference is negligible and can be ignored, at least for first-order estimates. However, the practice of alarmists quoting average annual global anomaly differences to hundredths or thousandths of a degree, in a world where the temperature range may be as great as 300 deg F, demonstrates a need to be precise and to examine the unstated assumptions and the names assigned to the measurements.

Bright Red
January 15, 2019 11:26 pm

As an electronic engineer (Mostly lurking on the sidelines at WUWT) I have to say I am disappointed at the general level of understanding of Nyquist and signal sampling and processing in general and I thank William for his efforts to bring this important topic to the attention of WUWT readers.

Slightly off topic, but I would like to add that another source of error is Electro-Magnetic Interference (EMI). In today’s modern world there are many sources of RF interference, man-made and natural. I would question the immunity of the measuring equipment and its suitability for the task, as in my experience laboratory instruments in particular rate very poorly in this area. Also, does anybody know if site surveys are done to determine the background RF levels at the measurement sites?

William Ward
Reply to  Bright Red
January 16, 2019 3:46 pm

Hello Bright Red – It is good to have a fellow electronics engineer here! Thanks for the concept validation. I was hoping to bring something new to this field that has benefitted the rest of the technology world for decades. Some seem to appreciate it, but I’m disappointed at those who are so quick to reaffirm what they already know rather than learn something new. There are many, many things that could be studied that currently are not. You mention RFI/EMI – a good one. I think also about the entire data-acquisition circuit design: power supply accuracy, common-mode rejection, ripple, DC offset, drift, linearity, thermal linearity, etc., etc. When we are talking about a few hundredths of a degree C/decade, there are many tens of things that could be the cause. How much of the trends and records come from the instruments themselves? But climate science has found the miracle cure: if you get enough of the wrong stuff, it magically becomes right.

Editor
January 16, 2019 12:06 am

William Ward January 15, 2019 at 10:09 pm

Willis,

At some point posts no longer offer the option to reply directly below, so I’m not sure where this is going to land in-line. This is in reply to your post where you say

“William, you did “write a book” … but nowhere in it did you answer my questions. So I’ll ask them again:”

I sent you a few more posts that do address your questions. I’ll give you time to read those and respond. Did you read my “book”? Sorry, it seems like we need to find out where we disconnect on the fundamentals. Some of your questions are answered in those posts. For your new questions…

William, thanks for all of your answers. I’m starting a new thread with this down at the bottom of the page. I believe I’ve read all your posts, and yet questions remain unanswered. For example, you seem to be saying that we need to sample every 5 minutes to get a reasonably accurate annual average … which I think makes no sense.

Willis said:

“Next, you seem to be overlooking the fact that Nyquist gives the limit for COMPLETELY RECONSTRUCTING a signal … but that’s not what we’re doing. We are just trying to determine a reasonably precise average. It doesn’t even have to be all that accurate, because we’re interested in trends more than the exact value of the average.”

Reply: Define “reasonably precise”. And do you mean accurate???

No, I mean precise. As to “reasonably precise”, we want to determine the decadal trend in temperature. I took a look at the Boston hourly temperature for five years. A short period, so a wide range in trends from errors. Running a Monte Carlo analysis by randomly adding the monthly RMS error for Boston (0.53°C for hourly vs max/min), I get 1000 trends whose average is the same as the true trend … with a standard deviation of the trend of 0.3°/year. Since this is a 5-year sample, a sample of e.g. 50 years would have much smaller errors.

For example, the trend on 75 years of Anchorage temperatures is 0.218°C/decade. If we add three times the Boston error (a random error with an SD of 1.5°C, larger than for any of the 30 US cities I’ve studied) to every month of that 75 years of Anchorage data, a thousand random instances give an average trend of 0.218°C ± 0.01°C/decade … a meaninglessly small trend error, in other words.
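
That Monte Carlo can be sketched like this (a synthetic 75-year monthly series with the quoted 0.218 °C/decade trend and 1.5 °C noise; my own reconstruction, not Willis’s actual code or data):

```python
import random

random.seed(7)

def ols_slope(y):
    """Least-squares slope of y against the index 0..len(y)-1."""
    n = len(y)
    xm, ym = (n - 1) / 2, sum(y) / n
    sxx = sum((i - xm) ** 2 for i in range(n))
    return sum((i - xm) * (y[i] - ym) for i in range(n)) / sxx

months = 75 * 12
true_decadal = 0.218                      # C/decade
base = [true_decadal / 120 * m for m in range(months)]

# 1000 noisy instances: N(0, 1.5 C) error added to every month
trends = []
for _ in range(1000):
    noisy = [b + random.gauss(0, 1.5) for b in base]
    trends.append(ols_slope(noisy) * 120)  # back to C/decade

mean_trend = sum(trends) / len(trends)
print(abs(mean_trend - true_decadal) < 0.01)  # trend survives the noise
```

Random (unbiased) monthly errors wash out of a long trend fit; the open question in this thread is whether aliasing errors are random in that sense.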

Now, I can’t see the USCRN sites because of the shutdown, but I don’t find that the errors from the two-sample method are a deal-breaker by any means.

I say trend errors of 0.24C/decade are not accurate. Daily means with +4C of error are not accurate. Monthly mean errors of 1.5C are not accurate. You can’t determine the actual average with 2-samples/day. You are welcome to add the max and min and divide by 2, but this won’t give you an accurate result in all or most cases. Did you look at my paper?? What about the table in Fig 7?? It’s not about COMPLETELY RECONSTRUCTING the signal. Knowing you CAN reconstruct it tells you your samples actually mean something relative to your original signal.

We agree that (min+max)/2 gives poor answers … but it has nothing to do with Nyquist.

“Chaotic” issue: addressed separately. Why the demand to know the exact number needed before you let in the concept?

I fear I don’t understand this. YOU said we had to sample the temperature at a frequency 2X the “highest frequency”. I merely asked what the “highest frequency” is for chaotic temperature data. And if, as it appears, you don’t know the answer … then how can you claim that Nyquist is “violated”?

As to “letting in the concept”, I’ve known about the Nyquist limit for decades. What am I supposed to be “letting in”?

Research can be done to determine where the means converge to a specified limit. For example, once a higher sample rate doesn’t give you more than 0.1C or 0.05C or 0.01C difference you can stop.

Huh? Did you just say that? We can throw out Nyquist and just look at shorter periods until our results are good enough for the purpose? That’s what I’ve been saying. Did you look at my graph? The difference in daily mean between hourly samples and every five minutes is 0.04°C … so I’d say hourly data is totally adequate, and our work here is done …

The concept is what is important here. 2-samples/day are not enough to support the results that generate the alarm we see. If the magnitude of error I show from comparing 288-samples to 2-samples is not important, then fine. Just stop bothering me with daily headlines that the sky is falling. (Not you, Willis; those who do this.)

I fear you haven’t shown that the error makes a significant difference. Quoting extreme errors doesn’t help. On any typical dataset like the Anchorage dataset, random errors don’t significantly alter the trend. Yes, the errors are there … but they appear to be random, and they are assuredly Gaussian normal. I took a look at 1882 days’ worth of daily errors for each of 30 cities. In all thirty cases, the Shapiro-Wilk normality test says the 1882 errors are Gaussian normal. Here are the daily errors for 30 US cities …

As you can see, the errors in general are not large, and the standard deviation of the errors is not large. So we’ve determined that the two-sample method sucks … but we don’t know that it is a difference that actually makes a difference.

You said:

“Finally, you still haven’t grasped the nettle—the problem with (max+min)/2 has NOTHING to do with Nyquist. Pick any chaotic signal. If you want to, filter out the high frequencies as you’d do for a true analysis. Sample it once every millisecond. Then pick any interval, take the highest and lowest values of the interval, and average them.

Will you get the mean of the interval? NO, and I don’t care if you sample it every nanosecond. The problem is NOT with the sampling rate. It’s with the procedure—(max+min)/2 is a lousy and biased estimator of the true mean, REGARDLESS of the sampling rate, above Nyquist or not.”

My reply: I’m confused by what you write… Are we even arguing here or agreeing… I can’t tell… I think we agree that the historical method is not good for accurate true mean calculation. But how do you prove that without sampling theory? I’m not suggesting we continue with (Tmax+Tmin)/2 so I’m not sure what you are saying. I’m saying do it like USCRN does and use all of the samples.

We agree that the historical method is a poor estimator of the true mean. However, we do NOT need either Nyquist or sampling theory to prove that. We just need to look at the difference between 5-minute, hourly, and two-sample results to know that two-sample results are the weak one.

And Nyquist doesn’t help, which is where I started this discussion. I read your paper and said you were right for the wrong reasons—right that (max+min)/2 is a poor estimator, wrong that it has anything at all to do with Nyquist. You can’t even tell us what the Nyquist limit is for temperature data … so what use is Nyquist?

Finally, my thanks to you for putting up a most fascinating post … always more to learn.

w.

Bright Red
Reply to  Willis Eschenbach
January 16, 2019 2:06 am

Hi Willis, I will have a go at the upper-frequency-limit-for-Nyquist question, when measuring temperature, from a technical point of view. In the real world the temperature transducer itself acts as a low-pass filter, although not a particularly good one. So if you want all the information that the transducer can provide, then its specification will determine the Nyquist sampling rate, which will be a bit higher than 2x the transducer bandwidth unless additional analog filtering is provided before sampling. From this position of having met Nyquist, you can of course choose to throw away some of the information that the transducer is capable of providing by further digital processing; how much will be determined by what you want from the data, which of course could be different from someone else’s requirements. Or you can implement additional low-pass analog filtering of the transducer signal, at a frequency that again you have determined will give you what you want from the data, prior to sampling it, and in doing so throw away some information. Note that none of the above violates Nyquist; to do so would introduce the issues noted by William.
It seems that many years ago it was decided that two samples a day was adequate. While that was a reasonable and practical decision at the time, given the technology available, it was a decision that deliberately discarded information that could be useful now or in the future. So unless you or someone else can come up with the frequency/sample rate that will cover all future needs, I suggest we do the best we can now, which means getting everything out of the transducers we have available while properly acknowledging the limitations of the min/max recordings; one thing is for sure: if you have aliasing, you have non-recoverable issues. In the end we may not need the higher-sample-rate data now, but better to have it just in case future generations do find it useful, given the cost to do so is minimal. /sarc Now where is that piece of wet string I had lying around.

William Ward
Reply to  Willis Eschenbach
January 16, 2019 4:24 pm

Hi Willis,

Thanks for your words of appreciation and for your many contributions here on this. I enjoyed discussing this with you. We can probably leave the remaining open items in the “agree-to-disagree” category. If there is something that you really want to pursue, I’ll join you, but I think we are good for now. Agree? I think we’ll have more opportunity to interact on this and other good things in the near future. Thanks again!

WW

Geoff Sherrington
January 16, 2019 1:06 am

Two years ago I wrote –
“People who use the historic land records of temperature, with a century or more based almost entirely on Tmax and Tmin measured by LIG thermometers in shelters, seem not to appreciate that they are not presented with a temperature that reflects the thermodynamic state of a weather site, but with a special temperature – like the daily maximum – that is set by a combination of competing factors.
Not all of these factors are climate related. Few of them can ever be reconstructed.
So it has to be said that the historic Tmax and Tmin, the backbones of land reconstructions, suffer from large and unrecoverable errors that will often make them unfit for purpose when purpose means reconstructing past temperatures for inputs into models of climate.
Tmax, for example, arises when the temperature adjacent to the thermometer switches from increasing to decreasing. The increasing component involves at least some of these:- incoming insolation as modified by the screen around the thermometer; convection of air outside and inside the screen allowing exposure to hot parcels; such convection as modified from time to time by acts like asphalt paving and grass cutting, changing the effective thermometer height above ground; radiation from the surroundings that penetrates necessary slots in the screen housing; radiation from new buildings if they are built.
On the other side of the ledger, the Tmin is set when the above factors and probably more are overcome by:- reduced insolation as the sun angle lowers; reduced radiation from clouds; reduction of radiation by shade from vegetation, if present; reduction of convective load by rainfall, if it happens; evaporative cooling of shelter and surroundings, if it is rained on at critical times.
It does not seem possible to model the direction and magnitude of this variety of effects, some of which need metadata that were never captured and cannot now be replicated. Some of these effects are one-side biased, others have some possibility of cancelling of positives against negatives, but not greatly. The factors quoted here are in general not amenable to treatment by homogenization methods currently popular. Homogenization applies more to other problems, such as rounding errors from F to C, thermometer calibration and reading errors, site shifts with measured overlap effects, changes to shelter paintwork, etc.
The central point is that Tmax is not representative of the site temperature as would be more the case if a synthetic black body radiator was custom designed to record temperatures at very fast intervals, to integrate heat flow over a day for a daily record with a maximum. Tmax is a special reading with its own information content; and that content can be affected by factors like the hot exhaust gas of a passing car. The Tmax that we have might not even reflect some or all of the UHI effect because UHI will generally happen at times of day that are not at Tmax time. And, given that the timing of Tmax can be set more by incidental than fundamental mechanisms, like time of cloud cover, corrections like TOBs for Time of Observation have no great meaning.
It seems that it is now traditional science, perceived wisdom, to ignore effects like these and to press on with the excuse that it is imperfect but it is all that we have.
The more serious point is that Tmax and Tmin are unfit for purpose and should not be used.”
Geoff.

Steve O
Reply to  Geoff Sherrington
January 16, 2019 10:26 am

The instruments and the methodology provide a metric that serves as a proxy. And it is important to understand that the measurements are just that — metrics serving as a proxy. The points you raise are valid, but I don’t believe they justify a conclusion that the metric is unfit.

Yes there are conditions that affect the readings artificially. If those conditions are relatively constant, then how do they affect the utility of the metric? If a parking lot gets built next to the instruments, and conditions change, then that’s a different story.

Reply to  Steve O
January 16, 2019 11:19 am

You may make the decision to use this data, but then you must also acknowledge the large error range that comes with doing so. You can’t dismiss the errors by saying that the law of large numbers or the calculation of the standard error of the mean makes them go away!

The error range that you have will undoubtedly be larger than any trend you find, meaning that you have no conclusion and that the data was really unfit for purpose.

Geoff Sherrington
January 16, 2019 1:49 am

Turning to the topic of the uses to which T data are put, one that has concerned me for years is this correlogram and similar ones from other studies.
http://www.geoffstuff.com/BEST_correlation.jpg
It is trite to remind readers that two straight lines inclined at 45 degrees to an axis will give a correlation coefficient of unity – but that result has little practical purpose. I am trying to strip away the low frequency parts of the separation response, to conduct correlations on the medium to high frequencies that have the bulk of the information content of interest.
From preliminary work using the geostatistical semivariogram approach, I am finding it hard to see pairs of stations more than (say) 300 km apart being useful T predictors for each other. This has consequences for those who construct elaborate adjustment procedures for homogenization, procedures that are wide open to subjective inputs and which have cited potential to affect estimates of global and regional warming/cooling rates over multi-decade time spans. Which is pretty much the name of the game for some like GISS.
Have other readers here worked with geostatistics in these ways?
The question is relevant to this post because all methods to clarify predictability require a proper understanding of accuracy and precision, including the consequences of Nyquist/Shannon sampling theory. Also, there are interesting conceptual overlaps between classical stats and geostats. Geoff.
http://www.geoffstuff.com/semiv_time_bases.xlsx

January 16, 2019 6:40 am

William,
You said:
“it is a mathematical certainty that every mean temperature and derived trend in the record contains significant error if it was calculated with 2-samples/day.”

If the error is random, the probabilities of positive and negative errors are equal. In large samples (n >> 30) the errors may cancel out. For example, you can get a smaller error in the annual mean even when the individual daily values carry larger errors. The probability of this happening is given by the binomial distribution:
P (k, n, p) = n!/(k! (n – k)!) p^k (1 – p)^(n – k)

where: n = 365 daily data, k = 182 positive or negative errors, p = 0.5 probability of +/- error
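This calculation is easy to check numerically. The following is a minimal sketch of the binomial formula above using only the Python standard library; the cancellation band (175 to 190 positive-error days) is an illustrative choice, not from the comment.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(k; n, p) = n!/(k!(n-k)!) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Commenter's example: n = 365 daily values, p = 0.5 chance of a +/- error.
n, p = 365, 0.5

# Probability that positive and negative errors nearly balance out,
# e.g. between 175 and 190 positive-error days (a hypothetical band).
p_balanced = sum(binom_pmf(k, n, p) for k in range(175, 191))
print(f"P(175 <= k <= 190) = {p_balanced:.3f}")
```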

tty
Reply to  Dr. Strangelove
January 16, 2019 7:11 am

“If the error is random”

That is a VERY big “if”

Reply to  tty
January 16, 2019 7:24 pm

tty January 16, 2019 at 7:11 am

“If the error is random”

That is a VERY big “if”

Actually, no, it is NOT an “if” at all. I looked at the daily errors from using hourly data for the calculation of the daily means. In EVERY city, the errors were Gaussian normal. Or as I said above but you didn’t read:

They are assuredly Gaussian normal. I took a look at 1882 days worth of daily errors for each of 30 cities. In all thirty cases, the Shapiro-Wilk normality test says the 1882 errors are Gaussian normal.

Regards,

w.

Paramenter
Reply to  tty
January 22, 2019 6:51 am

“If the error is random”
That is a VERY big “if”

I've run, for a couple of sites, a comparison between the monthly temperature mean calculated from 5-min sampled data and the monthly mean of the daily midrange values (Tmin+Tmax)/2. The resulting errors were tested for normality using the Shapiro-Wilk and D'Agostino tests. For some sites the monthly error is normal, for some it is not, with pretty bad examples here and here.
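For readers who want to reproduce this kind of check, here is a rough, self-contained sketch on synthetic data (the commenter used real 5-minute station records and scipy-style Shapiro-Wilk / D'Agostino tests; the diurnal signal, noise levels, and 90-day window below are all assumptions for illustration). It computes the daily midrange error and the sample skewness, the moment that D'Agostino-type tests build on.

```python
import math
import random

random.seed(42)

def daily_error(samples):
    """(Tmin+Tmax)/2 midrange minus the true mean of the day's samples."""
    midrange = (min(samples) + max(samples)) / 2
    return midrange - sum(samples) / len(samples)

# Synthesize 90 days of 288 five-minute samples: a diurnal sine plus noise.
errors = []
for day in range(90):
    amp = random.uniform(3.0, 8.0)  # day-to-day amplitude variation
    samples = [amp * math.sin(2 * math.pi * i / 288) + random.gauss(0, 0.5)
               for i in range(288)]
    errors.append(daily_error(samples))

# Crude normality indicator: sample skewness (zero for a symmetric
# distribution; D'Agostino's test formalizes this idea).
n = len(errors)
mean = sum(errors) / n
var = sum((e - mean) ** 2 for e in errors) / n
skew = sum((e - mean) ** 3 for e in errors) / (n * var ** 1.5)
print(f"mean error {mean:.3f}, skewness {skew:.3f}")
```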

William Ward
Reply to  Dr. Strangelove
January 16, 2019 7:48 am

Dr Strangelove,

This reply is from a “smart” phone. My apology in advance if there are formatting/typo problems.

The error appears to track the shape of the signal: specifically, whether the sine-ish wave spends more time near the max or the min. Of course, the shape is a function of frequency content and vice versa. And of course that is a function of what the Earth, Sun, and clouds serve up that day. Is this random? I think the question is what the distribution looks like. I saw someone analyzed it (Willis?) and said it was Gaussian, but the distribution didn't look strictly Gaussian to me. It was close.

Refer to my Figure 4, where I show a year of error for Boulder. I have similar graphs for other stations in USCRN; they all seem to exhibit similar behavior. Figure 4 seems to favor error towards warming (positive error). Others seem to favor a cooling bias. When I then analyze the trends given by the 2 methods, the bias seems to match the tendency in the daily error plot. Perhaps someone can run an analysis to see how Gaussian the error is over time.

I propose that the rather small trend error, compared to the large daily mean error, is a function of this averaging effect. It might behave like a dither signal for an ADC. The trend biases are "small" (but similar in magnitude to the trends that cause panic), yet there is absolute error: endpoint error that can be several degrees.
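The averaging effect described here can be sketched numerically. The toy model below is entirely synthetic (the seasonal bias amplitude and noise level are assumptions, not USCRN values); it just shows that individual daily errors can be degrees in size while their year-long average, the quantity that feeds a trend, is much smaller when errors of both signs partially cancel, much as dither averages out in an ADC.

```python
import math
import random

random.seed(0)

daily_errors = []
for day in range(365):
    # Hypothetical daily (Tmax+Tmin)/2 error: a small seasonal bias
    # plus larger random day-to-day weather noise.
    seasonal_bias = 0.3 * math.sin(2 * math.pi * day / 365)
    daily_errors.append(seasonal_bias + random.gauss(0, 1.5))

worst_daily = max(abs(e) for e in daily_errors)
annual_mean_error = sum(daily_errors) / len(daily_errors)
print(f"worst daily error {worst_daily:.2f} C, "
      f"annual mean error {annual_mean_error:.2f} C")
```

Note that the cancellation only shrinks the random part; the seasonal bias term survives averaging, which is consistent with the small-but-persistent trend biases described above.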

What do you think?

Reply to  William Ward
January 16, 2019 9:21 am

W Ward,
When I look at the curve provided by Willis at 9:40 pm, I get the impression that the upper half has an almost sinusoidal shape, which I believe is a result of the infinite heat sink responding to solar radiation. However, the lower curve appears to flatten until heat is added. Temperature drops to its low point at the coldest part of the day. The drop slows because, as the temperature of the air decreases, it approaches the temperature of the ground (the source of the heat?) and less energy is transferred, just like the graph of a cooling cup of coffee. It looks very similar to the discharge of a capacitor. The difference in the time between each Tmax and the time between each Tmin could be caused by the tilt of the earth and the latitude of the measuring point.
Would like to see graphs for southern points taken during the same time period.

Reply to  William Ward
January 16, 2019 12:21 pm

“Refer to my Figure 4, where I show a year of error for Boulder. I have similar graphs for other stations in USCRN. They all seem to exhibit similar behavior. I think Fig 4 seems to favor error towards warming ( positive error).”
Well, I'll refer to my figure here, also for Boulder, with 3 years of data. It shows annual smoothing, which takes out both daily and seasonal fluctuation. But more to the point, it shows the offsets you get when you variously choose to read the min/max at 8 am, 11 am, 2 pm, 5 pm etc., and they are different; some above, some below. All the talk here about min/max is meaningless unless you first specify that timing (not done) and then look at other possible choices.

Reply to  Dr. Strangelove
January 16, 2019 11:05 am

Your assumption is fine for measurements of the same thing. If you take 365 measurements of the length of the same block of wood, your error of the mean can be reduced by this method. Not only does this require random errors but also a normal distribution of the errors.

You can't use the same error rationale for measurements of different items, like temperature measurements taken hours, days, or months apart.
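The block-of-wood distinction can be made concrete with a short sketch (all numbers below are made up for illustration): repeated measurements of the same length do pull the mean toward the true value, at roughly sigma divided by the square root of n, which is exactly the reduction that does not apply to single readings of different quantities.

```python
import random

random.seed(1)

true_length = 100.0  # the same block of wood, in mm
sigma = 0.5          # assumed random error of a single measurement

# 365 measurements of the SAME quantity: errors of both signs cancel.
readings = [true_length + random.gauss(0, sigma) for _ in range(365)]
mean_reading = sum(readings) / len(readings)

# Error of the mean is far smaller than a single reading's sigma.
print(f"error of mean: {abs(mean_reading - true_length):.3f} mm")
```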

Answer these:

How can the temperature you read today affect the accuracy of the temperature you read yesterday?

If you compute an average, do you use the recorded temps + the recording errors, the recorded temps – the recording errors, or the recorded temps without any error adjustments? Why?

Why is one more accurate than the other two or do they all have equal probabilities?

Can the average of the temperatures be anywhere between the average of the highest figures and the average of the lowest?

Does a reading from a third day affect the accuracy of either of the prior two days?

Geoff Sherrington
Reply to  Jim Gorman
January 17, 2019 2:08 am

JG, “How can the temperature you read today affect the accuracy of the temperature you read yesterday?”
We cannot understand how/if it will affect it physically, because its value can be fixed by multiple inputs that are usually not measured; but it can afterwards be examined through statistics.
That is why I am delving again into geostatistics, which was originally developed to help estimate how much one assay value down a mining drill hole can predict for another assay, by looking at the changes in differences between pairs of assays separated by various distances.
With weather station sites, the first finding might be that one should not expect T at a site to have predictive value for another site if they are separated by more than a certain distance. What is that critical separation distance? Informally, my work so far is suggesting 300 km, which is a lot less than is used for conventional pair matching during homogenization, but I have not finished the work. Geoff.
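For readers unfamiliar with the technique, the empirical semivariogram Geoff describes can be sketched in a few lines. The 1-D synthetic series below stands in for assays down a drill hole or temperatures along a line of stations (the data and lags are illustrative assumptions, not his results): gamma(h) is half the mean squared difference between pairs separated by lag h, and it grows with separation as predictive power falls off.

```python
import math

# Smooth, spatially correlated signal: nearby points predict each other well.
z = [math.sin(0.1 * x) for x in range(200)]

def semivariance(z, lag):
    """gamma(h) = 0.5 * mean of (z(x) - z(x+h))^2 over all pairs at lag h."""
    pairs = [(z[i] - z[i + lag]) ** 2 for i in range(len(z) - lag)]
    return 0.5 * sum(pairs) / len(pairs)

for lag in (1, 5, 10, 20):
    print(f"gamma({lag}) = {semivariance(z, lag):.4f}")
```

In practice the lag at which gamma flattens out (the "range") is what suggests a critical separation distance like the ~300 km figure mentioned above.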

Reply to  Geoff Sherrington
January 17, 2019 9:39 am

You are missing my point. You may judge the accuracy of a given thermometer by comparing it to others but you can’t use other thermometers to tell where a given recorded measurement falls within the range of recording error.

For example, you may determine that a given thermometer is reading two degrees low and adjust the reading by that factor. That is, move a reading of 48 +- 0.5 degrees to 50 degrees but the error range remains, i.e., 50 +- 0.5 degrees.

The error range of one individual measurement taken at one point in time is fixed. Earlier or later temperature readings will not affect the error range of that one measurement. This is an important distinction that applies when computing averages. You cannot reduce the error range by averaging recordings from multiple days. In fact, if you average a recorded temp with one that has a smaller recording error, you must use the larger error range as the limiting factor. Otherwise you will be crediting the far less accurate recording with a false error range.

That is why trends with different error ranges shouldn’t be spliced together without their error ranges being included on a graph. Temp readings with higher error ranges shouldn’t be averaged together with more accurate data in order to claim more accuracy than they are entitled to.

I'm not receiving many responses to my criticisms, so I don't know if people agree or not. Or maybe I'm just too far off topic. I do know that when trends of one tenth or one hundredth of a degree are quoted from temperatures that were recorded only to the nearest degree, something is not kosher with the treatment of errors. These are not trends at all; they are within the range of error and should be considered noise.