A guest blogger recently¹ analyzed the twice-per-day sampling of maximum and minimum temperature and its relationship to the Nyquist rate, in an attempt to refute some common thinking. This blogger concluded the following:
(1) Fussing about regular samples of a few per day is theoretical only. Max/Min temperature recording is not sampling of the sort envisaged by Nyquist because it is not periodic, and has a different sort of validity because we do not know at what time the samples were taken.
(2) Errors in under-sampling a temperature signal are an interaction of sub-daily periods with the diurnal temperature cycle.
(3) Max/Min sampling is something else.
The purpose of the present contribution is to show that the first two conclusions are misleading without further qualification, and that the third needs fleshing out to explain what makes Max/Min values “something else”.
1. Admonitions about sampling abound
In the world of analog-to-digital conversion, admonitions to bandlimit signals before conversion are easy to find. For example, consider this verbatim quotation from the manual for a common microprocessor regarding use of its analog-to-digital (A/D or ADC) peripheral. The italics are mine.
“…Signal components higher than the Nyquist frequency (fADC/2) should not be present to avoid distortion from unpredictable signal convolution. The user is advised to remove high frequency components with a low-pass filter before applying the signals as inputs to the ADC.”
Date: February 14, 2019.
2. Distortion from signal convolution
What does distortion from unpredictable signal convolution mean? Signal convolution is a mathematical operation. It describes how a linear system, like the sample and hold (S/H) capacitor of an A/D, attains a value from its input signal. For a specific instance, consider how a digital value would be obtained from an analog temperature sensor. The S/H circuit of an A/D accumulates charge from the temperature sensor input over a measurement interval, 0 → t, between successive A/D conversions.
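Equation 1 itself is not reproduced in this copy of the text. A convolution integral consistent with the symbols used in this section (s(t) for the signal, h(t) for the response function, and the measurement interval 0 → t) would take the standard form sketched below. The output symbol v(t) is an assumption introduced here for illustration and may differ from the original notation.

```latex
\begin{equation}
v(t) = \int_{0}^{t} s(\tau)\, h(t - \tau)\, d\tau
\tag{1}
\end{equation}
```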
Equation 1 is a convolution integral. Distortion occurs when the signal, s(t), contains rapid, short-lived changes in value that are incompatible with the rate at which the S/H circuit is sampled. This sampling rate is part of the response function, h(t). For example, the S/H circuit of a typical A/D has small capacitance and small input impedance, and thus responds very rapidly to signals; it has wide bandwidth, if you prefer. Its response looks like an impulse function. The sampling rate, on the other hand, is typically far slower, perhaps one sample every few seconds or minutes, depending on the ultimate use of the data. In this case h(t) is a series of impulse functions separated by the sampling interval. If s(t) is a slowly varying signal, the convolution produces a nearly periodic output. In the frequency domain the Fourier transform of h(t), the transfer function H(ω), is also periodic, with period equal to the sampling frequency. A slow sample rate therefore places these periods close together, and if the rate is below the Nyquist rate, copies of the signal spectrum, S(ω), overlap and add to one another. This is aliasing, which the guest blogger covered in detail.
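The overlap of spectra described above can be illustrated with a minimal sketch. It assumes a pure sinusoid and an hourly sample rate; the function names and the choice of a 23 cycles-per-day tone are hypothetical and used only for illustration. The point is that, once sampled, the fast tone yields exactly the same numbers (up to sign) as a diurnal tone.

```python
import math

def sample_sine(freq, sample_rate, n_samples):
    """Samples of sin(2*pi*freq*t) taken at t = k / sample_rate."""
    return [math.sin(2 * math.pi * freq * k / sample_rate)
            for k in range(n_samples)]

def alias_frequency(freq, sample_rate):
    """Frequency in [0, sample_rate/2] at which a tone appears after sampling."""
    folded = freq % sample_rate
    return min(folded, sample_rate - folded)

# Assumed units: cycles per day. Hourly sampling gives 24 samples per day,
# so the Nyquist frequency is 12 cycles per day.
fs = 24.0
f_fast = 23.0                       # hypothetical tone above the Nyquist frequency
f_seen = alias_frequency(f_fast, fs)
print(f_seen)                       # 1.0, i.e. it masquerades as a diurnal cycle

fast = sample_sine(f_fast, fs, 48)
slow = sample_sine(f_seen, fs, 48)
# sin(2*pi*23*k/24) equals -sin(2*pi*k/24) at every sample instant k.
print(max(abs(a + b) for a, b in zip(fast, slow)))
```

Nothing in the sampled values distinguishes the two tones, which is why the overlap cannot be separated afterwards.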
From what I have just described, several things should be apparent. First, the problem of aliasing cannot be undone after the fact. It is not possible to recover the numbers making up a sum from the sum alone. Second, aliasing potentially applies to signals other than the daily temperature cycle. The problem is one of interaction between the bandwidth of the A/D process and the rate of sampling. It occurs even if the A/D process consists of a person reading analog records and recording the values by pencil. Brief transient signals, even if not cyclic, will enter the digital record so long as they are within the passband of the measurement apparatus. This is why good engineering seeks to match the bandwidth of a measuring system to the bandwidth of the signal. A sufficiently narrow bandwidth improves the signal-to-noise ratio (S/N) and prevents spurious, unpredictable distortion.
One other thing not made obvious in either my discussion or that of the guest blogger concerns the diurnal signal. While a diurnal signal is nominally slow enough to be captured without aliasing by a twice-per-day measurement cycle, it would never be adequately defined by such a sample. With twice-per-day sampling one remains relatively ignorant of the phase and true amplitude of the diurnal cycle. For this reason most people sample at least two and one-half times as fast as the Nyquist rate to obtain usefully accurate phase and amplitude measurements of signals near the Nyquist rate.
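The ambiguity in amplitude and phase can be made concrete with a short sketch. It assumes the diurnal cycle is a pure 24-hour cosine sampled at two fixed clock times 12 hours apart; the function name and the numbers are hypothetical.

```python
import math

def apparent_half_range(amplitude, phase_rad, first_sample_hour=0.0):
    """Half the difference of two samples, 12 h apart, of a pure 24 h cosine."""
    def temp(hour):
        return amplitude * math.cos(2 * math.pi * hour / 24.0 + phase_rad)
    a = temp(first_sample_hour)
    b = temp(first_sample_hour + 12.0)
    return abs(a - b) / 2.0

# True amplitude is 10 degrees in every case; only the phase changes.
for phase in (0.0, math.pi / 3, math.pi / 2):
    print(round(phase, 3), round(apparent_half_range(10.0, phase), 3))
```

Under these assumptions the two samples return an apparent amplitude of A·|cos φ|, anywhere from the full amplitude down to zero, and a single pair of samples cannot separate amplitude from phase.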
3. An example drawn from real data
Figure 1. A portion of AWOS record.
As an example of distortion from unpredictable signal convolution refer to Figure 1. This figure shows a portion of temperature history drawn from an AWOS station. Note that the hourly temperature records from 23:53 to 4:53 show temperatures sampled on schedule which vary from −29°F to −36°F, but the 6-hour records show a minimum temperature of −40°F.
Obviously the A/D system responded to and recorded a brief duration of very cold air which has been missed in the periodic record completely, but which will enter the Max/Min records as the Min of the day. One might well wonder what other noisy events have distorted the temperature record. The Max/Min temperature records here are distorted in a manner just like aliasing: a brief, high-frequency event has made its way into the slow, twice-per-day Max/Min record. The distortion is a difference of about 2°F between the Max/Min midpoint and the mean of 24 hourly temperatures, a difference completely unanticipated by the relatively high sampling rate of once per hour, if one accepts the blogger’s analysis uncritically. Just as obviously, if such an event had occurred coincident with one of the scheduled hourly measurements, it would have become a part of the 24-samples-per-day spectrum, but at a frequency not reflective of its true duration. So, there are two issues here. The first is the distortion from under-sampling, and the second is that transient signals may not be represented at all in some samples but are quite prevalent in others.
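A small synthetic example shows the mechanism. Every number below is invented for illustration and none of it is taken from the AWOS record: a smooth diurnal cosine around −30°F, a hypothetical 10-minute cold dip of 4°F placed between scheduled readings, hourly samples at 53 minutes past the hour, and a Max/Min instrument assumed to see every minute.

```python
import math

def synthetic_day(dip_start_min=200, dip_len_min=10, dip_depth=4.0):
    """One day of 1-minute temperatures (deg F): diurnal cosine plus a brief dip.

    All values are invented for illustration only.
    """
    temps = []
    for m in range(24 * 60):
        hour = m / 60.0
        t = -30.0 + 5.0 * math.cos(2 * math.pi * (hour - 15.0) / 24.0)
        if dip_start_min <= m < dip_start_min + dip_len_min:
            t -= dip_depth          # short-lived cold excursion
        temps.append(t)
    return temps

def hourly_mean(temps, offset_min=53):
    """Mean of the 24 scheduled readings taken at offset_min past each hour."""
    samples = temps[offset_min::60]
    return sum(samples) / len(samples)

def midrange(temps):
    """Midpoint of the daily range as a Max/Min instrument would report it."""
    return (max(temps) + min(temps)) / 2.0

day = synthetic_day()
print(round(hourly_mean(day), 2), round(midrange(day), 2))
```

In this sketch the dip falls between the scheduled readings at minutes 173 and 233, so the hourly mean does not see it at all, while the midrange is pulled down by half the depth of the dip.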
In summary, while the Max/Min records are not the sort of uniform-rate sampling that the Nyquist theorem envisions, they aren’t far from being such. They are like periodic measurements with bad clock jitter. It is difficult to argue that a distortion from unpredictable convolution does not have an impact on the spectrum resembling aliasing. Certainly sampling at a rate commensurate with the brevity of events like that in Figure 1 would produce a more accurate daily “mean” than does the midpoint of the daily range; alternatively, one could use a filter to condition the signal ahead of the A/D circuit, just as the manual for the microprocessor suggests, and just as anti-aliasing via the Nyquist criterion, or improvement of S/N, would demand. Completely removing the impact of aliasing from digital records after the fact is impossible. The impact is not necessarily negligible, nor is it mainly an interaction with the diurnal cycle. This is not just a theoretical problem. Because Max/Min temperatures are expected to detect even brief temperature excursions, there isn’t any way to mitigate the problem in the Max/Min records themselves. This provides a segue into a discussion about the “something otherness” of Max/Min records.
4. Nature of the Midrange
The midpoint of the daily range of temperature is a statistic. It is among a group known as order statistics, as it comes from data ordered from low to high value. It serves as a measure of central tendency of temperature measurements, a sort of average, but it is different from the more common mean, median, and mode statistics. To speak of the midpoint of range as a daily mean temperature is simply wrong.
If we think of air temperature as a random variable following some sort of probability distribution, possessing a mean along with a variance, then the midpoint of range may serve as an estimator of the mean so long as the distribution is symmetric (skewness and the higher odd central moments are zero). It might also be an efficient or robust estimator if the distribution is confined between two hard limits, a form known as platykurtic for having little probability in the distribution tails. In such a case we could also estimate a monthly mean temperature using a midrange value from the minimum and maximum temperatures of the month, or even an annual mean using the highest and lowest temperatures for a year.
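These properties of the midrange can be checked with a small Monte Carlo sketch. The three distributions (uniform, normal, exponential), the sample size of 50, and the function names are assumptions chosen for illustration. They are stand-ins for bounded, unbounded symmetric, and skewed cases, not models of any real temperature record.

```python
import random
import statistics

def midrange(xs):
    """Midpoint between the smallest and largest value."""
    return (min(xs) + max(xs)) / 2.0

def compare_estimators(draw, n=50, trials=2000, seed=0):
    """Simulate the midrange and the sample mean over many samples of size n."""
    rng = random.Random(seed)
    mids, means = [], []
    for _ in range(trials):
        xs = [draw(rng) for _ in range(n)]
        mids.append(midrange(xs))
        means.append(statistics.fmean(xs))
    return {
        "midrange_mean": statistics.fmean(mids),
        "midrange_var": statistics.pvariance(mids),
        "mean_mean": statistics.fmean(means),
        "mean_var": statistics.pvariance(means),
    }

cases = {
    "bounded (uniform)": lambda r: r.uniform(0.0, 1.0),
    "symmetric, long tails (normal)": lambda r: r.gauss(0.0, 1.0),
    "skewed (exponential)": lambda r: r.expovariate(1.0),
}
for label, draw in cases.items():
    print(label, compare_estimators(draw))
```

Under these assumptions the midrange has smaller variance than the sample mean for the bounded distribution, larger variance for the normal one, and a clear bias for the skewed one, since it then tracks the extreme value instead of the center.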
In the case of the AWOS of Figure 1 the annual midpoint is some 20°F below the mean of daily midpoints, and even a monthly midpoint is typically 5°F below the mean of daily values. The midpoint is obviously not an efficient estimator at this station, although it could work well perhaps at tropical stations where the distribution of temperature is more nearly platykurtic.
The site from which the AWOS data in Figure 1 was taken is continental; and while this particular January had a minimum temperature of −40◦F, it is not unusual to observe days where the maximum January temperature rises into the mid 60s. The weather in January often consists of a sequence of warm days in advance of a front, with a sequence of cold days following. Thus the temperature distribution at this site is possibly multimodal with very broad tails and without symmetry. In this situation the midrange is not an efficient estimator. It is not robust either, because it depends greatly on extreme events. It is also not an unbiased estimator as the temperature probability distribution is probably not symmetric. It is, however, what we are stuck with when seeking long-term surface temperature records.
One final point seems worth making. Averaging many midpoint values together will probably produce a mean midpoint that behaves like a normally distributed quantity, since all the elements needed to satisfy the central limit theorem seem present. However, people too often assume that averaging fixes all sorts of ills, and that it will automatically reduce the standard error of a statistic by the factor 1/√n (the variance by 1/n). This is strictly so only when samples are unbiased, independent and identically distributed. The subject of data independence is beyond the scope of this paper, but here I have made a case that the probability distributions of the maximum and minimum values are not necessarily the same as one another, and may vary from place to place and time to time. I think precision estimates for “mean surface temperature” derived from the midpoint of range (Max/Min) are too optimistic.
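The dependence of the 1/√n rule on independence can be illustrated with a short simulation. The first-order autoregressive model, AR(1), used here is an assumption chosen only because it is the simplest way to generate correlated samples; it is not proposed as a model of station temperatures, and the function name and parameters are hypothetical.

```python
import math
import random
import statistics

def std_error_of_mean(rho, n=100, trials=2000, seed=1):
    """Empirical standard deviation of the mean of n AR(1) samples.

    Each sample has unit variance; rho is the lag-1 correlation
    (rho = 0 gives independent samples).
    """
    rng = random.Random(seed)
    innovation_sd = math.sqrt(1.0 - rho * rho)
    means = []
    for _trial in range(trials):
        x = rng.gauss(0.0, 1.0)
        total = x
        for _step in range(n - 1):
            x = rho * x + innovation_sd * rng.gauss(0.0, 1.0)
            total += x
        means.append(total / n)
    return statistics.pstdev(means)

independent = std_error_of_mean(rho=0.0)
correlated = std_error_of_mean(rho=0.8)
print(independent, correlated, 1.0 / math.sqrt(100))
```

With independent samples the result sits close to 1/√n = 0.1. With a lag-1 correlation of 0.8 the standard error of the mean is several times larger for the same n, so a precision estimate that assumes independence would be too optimistic.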