A condensed version of a paper entitled: “Violating Nyquist: Another Source of Significant Error in the Instrumental Temperature Record”.

By William Ward, 1/01/2019

The 4,900-word paper can be downloaded here: https://wattsupwiththat.com/wp-content/uploads/2019/01/Violating-Nyquist-Instrumental-Record-20190112-1Full.pdf

The 169-year-long instrumental temperature record is built upon two measurements taken daily at each monitoring station, specifically the maximum temperature (Tmax) and the minimum temperature (Tmin). These daily readings are then averaged to calculate the daily mean temperature as Tmean = (Tmax+Tmin)/2. Tmax and Tmin measurements are also used to calculate monthly and yearly mean temperatures. These mean temperatures are then used to determine warming or cooling trends. This “historical method” of using daily measured Tmax and Tmin values for mean and trend calculations is still used today. However, air temperature is a signal, and measurement of signals must comply with the mathematical laws of signal processing. The Nyquist-Shannon Sampling Theorem tells us that we must sample a signal at a rate greater than 2x the highest frequency component of the signal. That threshold of 2x the highest frequency is called the Nyquist Rate. Sampling at a rate less than this introduces aliasing error into our measurement. The slower our sample rate is compared to Nyquist, the greater the error will be in our mean temperature and trend calculations. The Nyquist-Shannon Sampling Theorem is essential science to every field of technology in use today. Digital audio, digital video, industrial process control, medical instrumentation, flight control systems, digital communications, etc., all rely on the essential math and physics of Nyquist.
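
To make the idea of aliasing concrete, here is a minimal sketch (not taken from the paper). It assumes an illustrative sample rate of 4-samples/day and two pure cosines, one at 3 cycles/day and one at 1 cycle/day. The function name and the chosen frequencies are hypothetical. Once sampled, the two signals produce the same numbers, so the samples alone cannot tell them apart.

```python
import math


def sample_cosine(cycles_per_day, samples_per_day, n_samples):
    """Sample cos(2*pi*f*t) at evenly spaced times t = n / samples_per_day (days)."""
    return [
        math.cos(2 * math.pi * cycles_per_day * n / samples_per_day)
        for n in range(n_samples)
    ]


SAMPLES_PER_DAY = 4  # illustrative rate; content must stay below 2 cycles/day
N_SAMPLES = 8        # two days of samples

fast = sample_cosine(3, SAMPLES_PER_DAY, N_SAMPLES)  # 3 cycles/day: too fast for this rate
slow = sample_cosine(1, SAMPLES_PER_DAY, N_SAMPLES)  # 1 cycle/day: slow enough

# The two sample sets agree to within floating-point rounding.
max_difference = max(abs(a - b) for a, b in zip(fast, slow))
print(f"largest difference between the two sample sets: {max_difference:.2e}")
```

The 3 cycles/day content shows up in the samples as if it were 1 cycle/day content, which is what "aliasing" means in this article.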

NOAA, in its USCRN (US Climate Reference Network), has determined that it is necessary to sample at 4,320-samples/day to practically implement Nyquist. 4,320-samples/day equates to 1-sample every 20 seconds. This is the practical Nyquist sample rate. NOAA averages these 20-second samples to 1-sample every 5 minutes or 288-samples/day. NOAA only publishes the 288-sample/day data (not the 4,320-samples/day data), so to align with NOAA the rate will be referred to as “288-samples/day” (or “5-minute samples”). (Unfortunately, NOAA creates naming confusion with their process of averaging down to a slower rate. It should be understood that the actual rate is 4,320-samples/day.) This rate can only be achieved by automated sampling with electronic instruments. Most of the instrumental record consists of readings of mercury max/min thermometers, taken long before automation was an option. Today, despite the availability of automation, the instrumental record still uses Tmax and Tmin (effectively 2-samples/day) instead of Nyquist-compliant sampling. The reason for this is to maintain compatibility with the older historical record. However, with only 2-samples/day the instrumental record is highly aliased. It will be shown in this paper that the historical method introduces significant error to mean temperatures and long-term temperature trends.
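
The arithmetic behind the two rates can be checked with a short sketch. It assumes that the 5-minute values are plain arithmetic averages of consecutive 20-second readings. That is an assumption made for illustration, not a description of NOAA's actual processing. The function names and the placeholder readings are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(interval_seconds):
    """Number of evenly spaced samples that fit in one day."""
    return SECONDS_PER_DAY // interval_seconds


def block_average(samples, block_size):
    """Average consecutive, non-overlapping blocks of block_size samples."""
    if len(samples) % block_size != 0:
        raise ValueError("sample count must be a multiple of block_size")
    return [
        sum(samples[i:i + block_size]) / block_size
        for i in range(0, len(samples), block_size)
    ]


raw_rate = samples_per_day(20)          # one sample every 20 seconds
published_rate = samples_per_day(300)   # one value every 5 minutes
block_size = raw_rate // published_rate # 20-second samples per 5-minute value

# Placeholder readings only; real data would come from the instrument.
raw = [10.0 + 0.1 * (i % 15) for i in range(raw_rate)]
five_minute = block_average(raw, block_size)

print(raw_rate, published_rate, block_size, len(five_minute))
```

Because every block has the same size, the mean of the 288 averaged values equals the mean of the 4,320 raw values, so averaging down in this way does not change the daily mean.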

NOAA’s USCRN is a small network that was completed in 2008 and it contributes very little to the overall instrumental record. However, the USCRN data provides us a special opportunity to compare a high-quality version of the historical method to a Nyquist compliant method. The Tmax and Tmin values are obtained by finding the highest and lowest values among the 288 samples for the 24-hour period of interest.

 

NOAA USCRN Examples to Illustrate the Effect of Violating Nyquist on Mean Temperature

The following example will be used to illustrate how the amount of error in the mean temperature increases as the sample rate decreases. Figure 1 shows the temperature as measured at Cordova AK on Nov 11, 2017, using the NOAA USCRN 5-minute samples.

Figure 1: NOAA USCRN Data for Cordova, AK Nov 11, 2017

The blue line shows the 288 samples of temperature taken that day. It shows 24-hours of temperature data. The green line shows the correct and accurate daily mean temperature that is calculated by summing the value of each sample and then dividing the sum by the total number of samples. Temperature is not heat energy, but it is used as an approximation of heat energy. To that extent, the mean (green line) and the daily-signal (blue line) deliver the exact same amount of heat energy over the 24-hour period of the day. The correct mean is -3.3 °C. Tmax is represented by the orange line and Tmin by the grey line. These are obtained by finding the highest and lowest values among the 288 samples for the 24-hour period. The mean calculated from (Tmax+Tmin)/2 is shown by the red line. (Tmax+Tmin)/2 yields a mean of -4.7 °C, which is a 1.4 °C error compared to the correct mean.
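
The two calculations described above can be sketched on a synthetic day. The signal below is hypothetical (a diurnal cycle plus a second harmonic around -3.0 °C, peaking at 15:00) and is not the Cordova data, so the numbers differ from Figure 1. It only illustrates why the two means disagree when the daily curve is not symmetrical.

```python
import math

SAMPLES_PER_DAY = 288


def synthetic_day():
    """Hypothetical asymmetric day: fundamental plus second harmonic around -3.0 C."""
    temps = []
    for n in range(SAMPLES_PER_DAY):
        theta = 2 * math.pi * (n / SAMPLES_PER_DAY - 15 / 24)  # zero at 15:00
        temps.append(-3.0 + 2.0 * math.cos(theta) + 1.0 * math.cos(2 * theta))
    return temps


temps = synthetic_day()

# Mean from all samples: sum of the samples divided by their count.
sampled_mean = sum(temps) / len(temps)

# Historical method: highest and lowest of the day's samples, averaged.
t_max = max(temps)
t_min = min(temps)
mid_range = (t_max + t_min) / 2

error = mid_range - sampled_mean
print(f"sampled mean {sampled_mean:.2f} C, (Tmax+Tmin)/2 {mid_range:.2f} C, error {error:.2f} C")
```

In this synthetic case the peak is sharp and the trough is broad, so (Tmax+Tmin)/2 lands above the sampled mean. A day with the opposite shape would give an error of the opposite sign.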

Using the same signal and data from Figure 1, Figure 2 shows the calculated temperature means obtained from progressively decreased sample rates. These decreased sample rates can be obtained by dividing down the 288-sample/day sample rate by a factor of 4, 8, 12, 24, 48, 72 and 144. Therefore, the sample rates will correspond to: 72, 36, 24, 12, 6, 4 and 2-samples/day respectively. By properly discarding the samples using this method of dividing down, the net effect is the same as having sampled at the reduced rate originally. The aliasing that results from the lower sample rates reveals itself as shown in the table in Figure 2.
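
The dividing-down procedure can be sketched as follows. The signal is synthetic (the harmonic amplitudes and phases are hypothetical and were not fitted to the Cordova data), so the errors will not match Figure 2. The sketch keeps every Nth sample with no filtering, as described above, and compares each reduced-rate mean to the 288-sample mean.

```python
import math

SAMPLES_PER_DAY = 288
TRUE_MEAN = -3.0
# (cycles/day, amplitude in C, phase in radians): all values hypothetical
HARMONICS = [(1, 2.0, -2.2), (2, 1.0, 0.6), (3, 0.5, 1.0), (4, 0.3, 0.3), (6, 0.2, 0.0)]


def synthetic_day():
    """One day of 5-minute samples built from the harmonics above."""
    return [
        TRUE_MEAN + sum(
            amp * math.cos(2 * math.pi * k * n / SAMPLES_PER_DAY + phase)
            for k, amp, phase in HARMONICS
        )
        for n in range(SAMPLES_PER_DAY)
    ]


def decimate(samples, factor):
    """Keep every factor-th sample, starting with the first (no filtering)."""
    return samples[::factor]


day = synthetic_day()
reference_mean = sum(day) / len(day)

mean_error = {}  # samples/day -> error of the reduced-rate mean
for factor in (1, 4, 8, 12, 24, 48, 72, 144):
    kept = decimate(day, factor)
    mean_error[len(kept)] = sum(kept) / len(kept) - reference_mean

for rate, err in mean_error.items():
    print(f"{rate:3d} samples/day: mean error {err:+.3f} C")
```

In this synthetic signal the error is essentially zero at 12-samples/day and above only because no content above 6 cycles/day was included. Real temperature data are not limited in that way, so the rate at which the error becomes negligible has to be found from the data.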

Figure 2: Table Showing Increasing Mean Error with Decreasing Sample Rate

It is clear from the data in Figure 2 that as the sample rate decreases below Nyquist, the corresponding error introduced from aliasing increases. It is also clear that 2, 4, 6 or 12-samples/day produce a very inaccurate result. 24-samples/day (1-sample/hr) up to 72-samples/day (3-samples/hr) may or may not yield accurate results. It depends upon the spectral content of the signal being sampled. NOAA has decided upon 288-samples/day (4,320-samples/day before averaging) so that will be considered the current benchmark standard. Sampling below a rate of 288-samples/day will be (and should be) considered a violation of Nyquist.

It is interesting to point out that what is listed in the table as 2-samples/day yields 0.7 °C error. But (Tmax+Tmin)/2 is also technically 2-samples/day with an error of 1.4 °C as shown in the table. How can this be possible? It is possible because (Tmax+Tmin)/2 is a special case of 2-samples/day: its samples are not spaced evenly in time. The maximum and minimum temperatures happen whenever they happen. When we sample properly, we sample according to a “clock” – where the samples happen regularly at exactly the same time of day. The fact that Tmax and Tmin happen at irregular times during the day causes its own kind of sampling error. It is beyond the scope of this paper to fully explain, but this error is related to what is called “clock jitter”. It is a known problem in the field of signal analysis and data acquisition. 2-samples/day, regularly timed, would likely produce better results than finding the maximum and minimum temperatures from any given day. The instrumental temperature record uses the absolute worst method of sampling possible – resulting in maximum error.

Figure 3 shows the same daily temperature signal as in Figure 1, represented by 288-samples/day (blue line). Also shown is the same daily temperature signal sampled with 12-samples/day (red line) and 4-samples/day (yellow line). From this figure, it is visually obvious that a lot of information from the original signal is lost by using only 12-samples/day, and even more is lost by going to 4-samples/day. This lost information is what causes the resulting mean to be incorrect. This figure graphically illustrates what we see in the corresponding table of Figure 2. Figure 3 explains the sampling error in the time-domain.

Figure 3: NOAA USCRN Data for Cordova, AK Nov 11, 2017: Decreased Detail from 12 and 4-Samples/Day Sample Rate – Time-Domain

Figure 4 shows the daily mean error between the USCRN 288-samples/day method and the historical method, as measured over 365 days at the Boulder CO station in 2017. Each data point is the error for that particular day in the record. We can see from Figure 4 that (Tmax+Tmin)/2 yields daily errors of up to ± 4 °C. Calculating mean temperature with 2-samples/day rarely yields the correct mean.

Figure 4: NOAA USCRN Data for Boulder CO – Daily Mean Error Over 365 Days (2017)

Let’s look at another example, similar to the one presented in Figure 1, but over a longer period of time. Figure 5 shows (in blue) the 288-samples/day signal from Spokane WA, from Jan 13 – Jan 22, 2008. Tmax (avg) and Tmin (avg) are shown in orange and grey respectively. The (Tmax+Tmin)/2 mean is shown in red (-6.9 °C) and the correct mean calculated from the 5-minute sampled data is shown in green (-6.2 °C). The (Tmax+Tmin)/2 mean has an error of 0.7 °C over the 10-day period.

Figure 5: NOAA USCRN Data for Spokane, WA – Jan 13-22, 2008

 

The Effect of Violating Nyquist on Temperature Trends

Finally, we need to look at the impact of violating Nyquist on temperature trends. In Figure 6, a comparison is made between the linear temperature trends obtained from the historical and Nyquist-compliant methods using NOAA USCRN data for Blackville SC, from Jan 2006 – Dec 2017. We see the trend derived from the historical method (orange line) starts approximately 0.2 °C warmer and has a 0.24 °C/decade warming bias compared to the Nyquist-compliant method (blue line). Figure 7 shows the trend bias or error (°C/decade) for 26 stations in the USCRN over a 7-12 year period. The 5-minute sample data gives us our reference trend. The trend bias is calculated by subtracting the reference trend from the (Tmaxavg+Tminavg)/2 derived trend. Almost every station exhibits a warming bias, with a few exhibiting a cooling bias. The largest warming bias is 0.24 °C/decade and the largest cooling bias is -0.17 °C/decade, with an average warming bias across all 26 stations of 0.06 °C/decade. According to Wikipedia, the calculated global average warming trend for the period 1880-2012 is 0.064 ± 0.015 °C per decade. If we look at the more recent period that contains the controversial “Global Warming Pause”, then using data from Wikipedia, we get the following warming trends depending upon which year is selected for the starting point of the “pause”:

1996: 0.14°C/decade

1997: 0.07°C/decade

1998: 0.05°C/decade

While no conclusions can be made by comparing the trends over 7-12 years from 26 stations in the USCRN to the currently accepted long-term or short-term global average trends, the comparison can be instructive. It is clear that using the historical method to calculate trends yields a trend error and this error can be of a similar magnitude to the claimed trends. Therefore, it is reasonable to call into question the validity of the trends. There is no way to know for certain, as the bulk of the instrumental record does not have a properly sampled alternate record to compare it to. But it is a mathematical certainty that every mean temperature and derived trend in the record contains significant error if it was calculated with 2-samples/day.
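
The trend-bias calculation described in this section (historical trend minus reference trend, in °C/decade) can be sketched as below. The two monthly series are synthetic, and the trends built into them (0.20 and 0.25 °C/decade) are hypothetical values chosen for illustration, not USCRN results. An ordinary least-squares line is assumed for the trend fit.

```python
import math


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


MONTHS = 144  # 12 years of monthly means
years = [m / 12 for m in range(MONTHS)]
seasonal = [8.0 * math.sin(2 * math.pi * m / 12) for m in range(MONTHS)]

# Hypothetical series: the reference warms at 0.020 C/yr, while the historical
# series starts 0.2 C warmer and warms at 0.025 C/yr.
reference = [15.0 + 0.020 * y + s for y, s in zip(years, seasonal)]
historical = [15.2 + 0.025 * y + s for y, s in zip(years, seasonal)]

reference_trend = 10 * ols_slope(years, reference)    # C/decade
historical_trend = 10 * ols_slope(years, historical)  # C/decade
trend_bias = historical_trend - reference_trend

print(f"trend bias: {trend_bias:+.3f} C/decade")
```

A constant offset between the two series does not affect the bias. Only a difference that changes over time does.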

Figure 6: NOAA USCRN Data for Blackville, SC – Jan 2006-Dec 2017 – Monthly Mean Trendlines

Figure 7: Trend Bias (°C/Decade) for 26 Stations in USCRN

Conclusions

1. Air temperature is a signal and therefore, it must be measured by sampling according to the mathematical laws governing signal processing. Sampling must be performed according to the Nyquist-Shannon Sampling Theorem.

2. The Nyquist-Shannon Sampling Theorem has been known for over 80 years and is essential science to every field of technology that involves signal processing. Violating Nyquist guarantees samples will be corrupted with aliasing error and the samples will not represent the signal being sampled. Aliasing cannot be corrected post-sampling.

3. The Nyquist-Shannon Sampling Theorem requires the sample rate to be greater than 2x the highest frequency component of the signal. Using automated electronic equipment and computers, NOAA USCRN samples at a rate of 4,320-samples/day (averaged to 288-samples/day) to practically apply Nyquist and avoid aliasing error.

4. The instrumental temperature record relies on the historical method of obtaining daily Tmax and Tmin values, essentially 2-samples/day. Therefore, the instrumental record violates the Nyquist-Shannon Sampling Theorem.

5. NOAA’s USCRN is a high-quality data acquisition network, capable of properly sampling a temperature signal. The USCRN is a small network that was completed in 2008 and it contributes very little to the overall instrumental record; however, the USCRN data provides us a special opportunity to compare analysis methods. A comparison can be made between temperature means and trends generated with Tmax and Tmin versus a properly sampled signal compliant with Nyquist.

6. Using a limited number of examples from the USCRN, it has been shown that using Tmax and Tmin as the source of data can yield the following error compared to a signal sampled according to Nyquist:

a. Mean error that varies station-to-station and day-to-day within a station.

b. Mean error that varies over time with a mathematical sign that may change (positive/negative).

c. Daily mean error that varies up to +/-4°C.

d. Long term trend error with a warming bias up to 0.24°C/decade and a cooling bias of up to 0.17°C/decade.

7. The full instrumental record does not have a properly sampled alternate record to use for comparison. More work is needed to determine if a theoretical upper limit can be calculated for mean and trend error resulting from use of the historical method.

8. The extent of the error observed, with its associated uncertain magnitude and sign, calls into question the scientific value of the instrumental record and the practice of using Tmax and Tmin to calculate mean values and long-term trends.

Reference section:

This USCRN data can be found at the following site: https://www.ncdc.noaa.gov/crn/qcdatasets.html

NOAA USCRN data for Figure 1 is obtained here:

https://www1.ncdc.noaa.gov/pub/data/uscrn/products/subhourly01/2017/CRNS0101-05-2017-AK_Cordova_14_ESE.txt

NOAA USCRN data for Figure 4 is obtained here:

https://www1.ncdc.noaa.gov/pub/data/uscrn/products/daily01/2017/CRND0103-2017-AK_Cordova_14_ESE.txt

NOAA USCRN data for Figure 5 is obtained here:

https://www1.ncdc.noaa.gov/pub/data/uscrn/products/subhourly01/2008/CRNS0101-05-2008-WA_Spokane_17_SSW.txt

NOAA USCRN data for Figure 6 is obtained here:

https://www1.ncdc.noaa.gov/pub/data/uscrn/products/monthly01/CRNM0102-SC_Blackville_3_W.txt

575 Comments
1sky1
January 22, 2019 5:48 pm

We are now more than 500 comments deep and nothing resembling a compellingly clear view of adequate sampling has emerged. Two quite distinct problems are still being unduly conflated or confused:

1. The relatively simple task of closely estimating the daily or monthly mean.

2. The much more demanding objective of preserving the spectral structure of the continuous signal in the discrete, periodic samples, so that the bandlimited interpolation of Shannon’s Theorem can closely reconstruct the original signal.

While potential aliasing of high frequency components into lower (baseband) frequencies is a major concern in the latter case, it’s quite irrelevant (barring any aliasing into zero-frequency) in the former. After all, in climatic-scale analyses, we’re not trying to reconstruct the ever-varying diurnal cycle, whose highly atypical example William produces to over-dramatize the difference between the daily mid-range value and the mean. His misleading “thought exercise” about “clock-jitter” ignores the fact that the daily Min and Max tend to occur near dawn and in mid-afternoon, with a highly irregular separation of considerably less than half a day. That’s simply not explainable by any periodic, semi-diurnal “sampling” with random jitter.

What is overlooked almost entirely throughout the discussion is the intimate dependence of the great discrepancy between the mid-range value and the true mean upon the sharp daytime rise to the Max followed by the gradual, night-long decline to the following day’s Min near dawn. This asymmetry turns out to be quite stable at each station in the long run, resulting in good low-frequency coherence between the two distinctly different metrics. In fact, other analytically-derived estimates of the true monthly mean based solely upon the recorded extrema can reduce that discrepancy to nearly negligible levels. The value of historical Min/Max observations should not be dismissed out of hand.

William Ward
Reply to  1sky1
January 22, 2019 9:55 pm

Hello 1sky1,

1sky1 said: “We are now more than 500 comments deep and nothing resembling a compellingly clear view of adequate sampling has emerged.”

My reply: I don’t agree with that statement. I think Willis has shown that hourly sampling (24-samples/day) seems to be the rate that beyond which error is measured in hundredths or thousandths of a degree C. From a system engineering perspective I would still go with USCRN rate of 288 (averaged from 4,320). If a more detailed study showed the system requirements could be lower than 288, then I have no objection to that.

1sky1 said: “Two quite distinct problems are still being unduly conflated or confused: 1. The relatively simple task of closely estimating the daily or monthly mean. 2. The much more demanding objective of preserving the spectral structure of the continuous signal in the discrete, periodic samples, so that the bandlimited interpolation of Shannon’s Theorem can closely reconstruct the original signal.”

My reply: I think #1 has been a point you have been emphasizing for a while and maybe one we should take up in more detail. Am I correct that you don’t think the daily and/or monthly means are affected by the aliasing from working with the historical method (max/min)? What do you think of the recent post by Paramenter where he showed the distribution of monthly mean error? I showed a year’s worth of daily mean error for at least 3 locations. Regarding your #1, can you put a figure on “closely estimate” and explain what you mean by simple task? 1sky1 I’m trying to have a more open conversation with you so I want to understand your point more thoroughly. Maybe we can get your analysis for the data Paramenter or I provided or you can provide your counter analysis.

1sky1 said: “While potential aliasing of high frequency components into lower (baseband) frequencies is a major concern in the latter case, it’s quite irrelevant (barring any aliasing into zero-frequency) in the former.”

My reply: Do you agree that at 2-samples/day, the content at or near 2-cycles/day will alias the zero-frequency?
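
The question can be checked numerically with a short sketch. It assumes a pure 2-cycles/day cosine with a hypothetical amplitude and phase, sampled every 12 hours. The true mean of that component is zero, but every sample lands on the same point of the cycle, so the sample average is a constant offset that depends on the time of day at which sampling starts.

```python
import math

AMPLITUDE = 1.0  # C, hypothetical size of the 2-cycles/day component
PHASE = 0.4      # radians, hypothetical


def semidiurnal(t_days):
    """Pure 2-cycles/day component; its true long-run mean is zero."""
    return AMPLITUDE * math.cos(2 * math.pi * 2 * t_days + PHASE)


def mean_of_two_per_day(first_sample_hour, n_days=30):
    """Average of samples taken every 12 hours, starting at first_sample_hour."""
    times = [(first_sample_hour + 12 * i) / 24 for i in range(2 * n_days)]
    return sum(semidiurnal(t) for t in times) / len(times)


offsets = {hour: mean_of_two_per_day(hour) for hour in (0, 3, 6, 9)}
for hour, offset in offsets.items():
    print(f"sampling at {hour:02d}:00 and {hour + 12:02d}:00 -> offset {offset:+.3f} C")
```

The offset swings between positive and negative as the sampling times move through the day, and it does not shrink when more days are averaged.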

1sky1 said: “His misleading “thought exercise” about “clock-jitter” ignores the fact that the daily Min and Max tend to occur near dawn and in mid-afternoon, with a highly irregular separation of considerably less than half a day. That’s simply not explainable by any periodic, semi-diurnal “sampling” with random jitter.”

My reply: What if you designed a system to sample electronically at dawn and mid-afternoon? Day after day you would get samples of the analog signal. Can these samples be used to accurately reconstruct the original signal? Not likely. Can they be used to accurately calculate the mean (daily or monthly)? It doesn’t appear to be so based upon all of the analysis done so far. Do you have information that shows something different? Why can’t they be used to determine the mean? Unless the signal is symmetrical about an axis, we need to integrate the entire signal. Why can’t we integrate the entire signal? Because we don’t have samples that comply with Nyquist. Therefore, I conclude that the historical method is a sampling/Nyquist problem. The point about jitter was just to show that violating timing doesn’t invalidate Nyquist; it violates it. I guess I could be convinced to abandon the use of the word jitter in this application, but I’m still not convinced by any arguments that the historical method is not a violation of Nyquist.

1sky1 said: “What is overlooked almost entirely throughout the discussion is the intimate dependence of the great discrepancy between the mid-range value and the true mean upon the sharp daytime rise to the Max followed by the gradual, night-long decline to the following day’s Min near dawn. This asymmetry turns out to be quite stable at each station in the long run, resulting in good low-frequency coherence between the two distinctly different metrics. In fact, other analytically-derived estimates of the true monthly mean based solely upon the recorded extrema can reduce that discrepancy to nearly negligible levels. The value of historical Min/Max observations should not be dismissed out of hand.”

My reply: It sounds like you have done some work around this – studying the stability of the asymmetry and the good low-frequency coherence. Can you tell us more? What are you comparing to what? What is your reference? It seems to me that without a properly sampled signal (24 to 288-samples/day) to use as a reference, it would be difficult to really know anything. I noticed the signal shape you mention. But how can max and min values get you to where you want to go? Is there a formula that, based upon this typical shape, can give us the correct mean? It is similar to a capacitor charge/discharge. Maybe with a time constant and the max and min, then means could be more accurately calculated? That would be interesting. If so, then we could reevaluate the record with more accuracy. Overall, I’m struggling to see the value of max/min when properly sampled signals seem to show much different means.

Also, I don’t think anyone has gotten around to critically analyzing the 26 trends I show. In the discussions I think I also provided 3-5 charts showing the yearly mean differences between 288 and historical and the associated linear trend over 10-12 years. 1sky1, what do you think about the trends and the charts showing the yearly mean errors? I can grab the links and provide them again if they have been lost in the shuffle.

Clyde Spencer
Reply to  William Ward
January 23, 2019 9:43 am

William
You said, “I think Willis has shown that hourly sampling (24-samples/day) seems to be the rate that beyond which error is measured in hundredths or thousandths of a degree C.” True enough, but, I think that there is another concern with 24 samples per day. As a rule of thumb, something like 20 to 30 samples are recommended as a minimum number of samples to be able to demonstrate statistical significance. So, 24 samples are right on the edge of the minimum for statistically analyzing daily data. Now, it is assumed that what happens at the daily interval is mostly meteorological noise. However, what if it isn’t? What if there is a signal or trend that could be teased out at the daily level that tells us something about climatological changes? It would be better to have a more statistically robust data set to work with to explore that. We would never find it with only mid-range samples, and 24 samples would not allow the rigor that 100+ samples would. So, standardizing on what NOAA has selected (288/day) gives us something to work with should someone want to pursue a path down Heresy Lane. If we settle on hourly data, then future researchers won’t have historical data to work with that would allow them to go beyond what we know today.

Incidentally, Willis’ data suggest that the error in the mean asymptotically approaches zero around an order of magnitude more frequent sampling than hourly. Why wouldn’t we want to eliminate a potential source of error or uncertainty for a trivial cost increase? Lastly, even though hourly data allows a good estimate of the mean, which is the primary use to which it is being put today, Nyquist-compliant sampling assures the ability to reconstruct the time series faithfully, which future researchers may thank us for if they want to go beyond where we are, such as looking for trends in the standard deviation or daily energy exchanges.

1sky1
Reply to  William Ward
January 23, 2019 4:02 pm

I have neither the time nor the interest to keep dispelling the same misreadings and basic analytic misconceptions over and over in stubbornly fixated minds.

In a nutshell, the periodic sampling rate required for close signal RECONSTRUCTION is very much greater than that for accurate determination of the signal mean. 288 samples/day from fast-response thermistors is not enough to avoid aliasing when there are significant temperature variations produced by 3-sec gusts in strong winds associated with frontal passages. Conversely, hourly readings of LIG thermometers are more than adequate to establish the monthly means for CLIMATIC investigations. The details of diurnal wave-forms are not only irrelevant in the latter case, they constitute an obstacle to perceiving the low-frequency climate signal. Since the spectral content of those forms is almost always negligible beyond the fourth harmonic, even periodic sampling at 3-hour intervals is sufficient to prevent significant aliasing into the monthly mean–provided the data are properly decimated.

A singular property of the much-maligned diurnal mid-range metric is the total absence of ANY aliasing, because that metric is determined not from discrete samples, but from the CONTINUOUS signal. While the (usually positive) offset from the true mean is a significant discrepancy, it can be greatly reduced by empirically determining for each station (and for each of 12 months) the coefficient 0 < eta <0.5 in a much more effective estimate, (1 – eta)Tmin + eta Tmax, of the true signal mean.
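
One way the coefficient eta might be determined empirically is an ordinary least-squares fit against days for which a reference mean is available. The comment above does not say how the fit is done, so the least-squares choice, the function names and the five synthetic days below are all assumptions made for illustration.

```python
def fit_eta(t_min, t_max, t_mean):
    """Least-squares eta for the estimate (1 - eta) * Tmin + eta * Tmax.

    Minimizes the sum over days of (Tmin + eta * (Tmax - Tmin) - Tmean)^2.
    """
    num = sum((hi - lo) * (m - lo) for lo, hi, m in zip(t_min, t_max, t_mean))
    den = sum((hi - lo) ** 2 for lo, hi in zip(t_min, t_max))
    return num / den


def weighted_estimate(lo, hi, eta):
    """Estimate of the daily mean from the two extrema and a fitted eta."""
    return (1 - eta) * lo + eta * hi


# Hypothetical days, constructed so that the reference mean sits exactly
# 40% of the way from Tmin to Tmax (so the fit should return eta = 0.4).
t_min = [-8.0, -5.5, -6.0, -9.0, -4.0]
t_max = [-1.0, 1.5, -2.0, 0.0, 2.0]
t_mean = [lo + 0.4 * (hi - lo) for lo, hi in zip(t_min, t_max)]

eta = fit_eta(t_min, t_max, t_mean)
mid_range_errors = [(lo + hi) / 2 - m for lo, hi, m in zip(t_min, t_max, t_mean)]
weighted_errors = [
    weighted_estimate(lo, hi, eta) - m for lo, hi, m in zip(t_min, t_max, t_mean)
]

print(f"fitted eta = {eta:.3f}")
print("mid-range errors:", [round(e, 2) for e in mid_range_errors])
print("weighted errors: ", [round(e, 2) for e in weighted_errors])
```

With real data the fit would not be exact. The residual scatter around the fitted eta is what would show how far the discrepancy can actually be reduced at a given station and month.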

That pretty much sums everything up. Farewell!

Reply to  1sky1
January 23, 2019 1:24 am

“The value of historical Min/Max observations should not be dismissed out of hand.”
Indeed. One thing I’ve been emphasising is that the Min/Max measure tends to be offset from the integrated by a fairly constant amount. That constant depends on the time at which the min/max is read (this is the basis of the TOBS adjustment). So at one level, there is an apparent discrepancy of up to a degree or so, and min/max indices don’t even agree with each other, let alone the integrated.

But this doesn’t take into account that what is sought is the monthly mean of the anomaly. That is, the mean is subtracted, and so these offsets will disappear.

The same is actually true of periodically sampled averages. It is true that if you sample twice a day, say, that will alias with the second harmonic of the diurnal to give an offset, which could be up to a degree, as Willis has shown. But for a given month, the diurnal cycle doesn’t change much from year to year, so it is a fairly constant offset, and again disappears when you calculate temperature anomalies, since it is also present in the reference value.
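
The cancellation argument can be sketched with synthetic monthly means. The sketch assumes the idealized case in which the offset for each calendar month is exactly constant from year to year. The comment above only says it is "fairly constant", so real data would leave some residual. All numbers are hypothetical.

```python
import math

YEARS = 5

# Hypothetical integrated monthly means: seasonal cycle plus small variations.
true_means = [
    [
        10.0
        + 8.0 * math.sin(2 * math.pi * month / 12)
        + 0.1 * year
        + 0.05 * ((year * 7 + month * 3) % 5)
        for month in range(12)
    ]
    for year in range(YEARS)
]

# Assumed offset of the min/max mean from the integrated mean: it differs by
# calendar month but is identical every year (the idealized case).
OFFSET_BY_MONTH = [0.3 + 0.05 * month for month in range(12)]
minmax_means = [
    [true_means[year][month] + OFFSET_BY_MONTH[month] for month in range(12)]
    for year in range(YEARS)
]


def anomalies(series):
    """Subtract each calendar month's own multi-year average."""
    n_years = len(series)
    climatology = [
        sum(series[year][month] for year in range(n_years)) / n_years
        for month in range(12)
    ]
    return [
        [series[year][month] - climatology[month] for month in range(12)]
        for year in range(n_years)
    ]


true_anom = anomalies(true_means)
minmax_anom = anomalies(minmax_means)

largest_raw_gap = max(OFFSET_BY_MONTH)
largest_anomaly_gap = max(
    abs(true_anom[y][m] - minmax_anom[y][m]) for y in range(YEARS) for m in range(12)
)
print(f"largest gap in monthly means: {largest_raw_gap:.2f} C")
print(f"largest gap in anomalies:     {largest_anomaly_gap:.2e} C")
```

Any part of the offset that drifts over the years would survive the anomaly step and show up as a trend difference.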

Editor
January 22, 2019 10:33 pm

William Ward January 22, 2019 at 8:43 pm

Hi Willis,

You ended your post with “Your Friend”. Well alright! Thanks Willis.

I was perfectly serious in that, and you are welcome.

Willis said:

“All the data that I’ve looked at give a mean daily error on the order of five-thousandths of a degree; an RMS daily error on the order of five-hundredths of a degree; and a maximum daily error on the order of ± a quarter of a degree. Together these add up to a trend error on the order of a few thousandths of a degree per decade. None of these are significant in the field of climate science.”

My reply: Can you clarify what you are comparing here? Is it 24-samples/day vs. 288-samples/day? Or is it one of those vs. max/min? I’m assuming the former, but please clarify so I can respond to the correct concept. I’m not hung up on the difference between 288 and 24. We have shown error between 288 and max/min. Paramenter has provided some good information in addition to mine. If 24-samples/day produces the same error as 288, this doesn’t really change the core message. I think we are in agreement.

Yes, it was comparing hourly versus 288 samples per day.

Willis said:

“I don’t see the “oscillation” that you mention so perhaps I don’t understand what you are referring to.”

My reply: Look at my Fig 2 and read the chart from the bottom up (increasing sample rate). If you were to plot the error vs. sample rate, it would decrease from 0.7 or 0.8 to 0.1, cross over zero to -0.1 and then go back up to 0. I didn’t plot other rates, so we don’t know what it does other than the ones I show for that example. Not exactly an oscillation, but a convergence with ripple. The error changes signs. Your analysis is RMS. Can you explain how you account for sign of error in your analysis?

I believe from looking at Figure 2 that you are looking at one day’s data. I’m looking at years of daily data.

Next, the RMS error as the name implies is the square root of the mean of the squares of the errors, so it always has a positive value. Since I was looking at more than one day, I had lots of errors, and I wanted to know how well they were doing on average.
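
The difference between a signed mean error and an RMS error can be shown with a few made-up daily errors (the values are hypothetical). The signed errors cancel in the mean, while the RMS reports their typical size regardless of sign.

```python
import math


def mean_error(errors):
    """Signed average: positive and negative errors can cancel."""
    return sum(errors) / len(errors)


def rms_error(errors):
    """Square root of the mean of the squared errors: never negative."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


daily_errors = [0.3, -0.3, 0.1, -0.1]  # hypothetical daily errors in C

print(f"mean error: {mean_error(daily_errors):+.3f} C")
print(f"RMS error:  {rms_error(daily_errors):.3f} C")
```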

Willis said:

“A much more important question is, what we can do with the errors that using min-max has created in the past?” And: “So let me invite you to consider that question, of how we might minimize the errors of the traditional method ex post, as a much more important puzzle than the exact reason that we get errors from the traditional method. I’d be very happy to hear your thoughts, particularly on removing the aliasing …”

My reply: An admirable goal! I wish I had a more optimistic reply to match the good intention of your goal. There are plenty of texts you can refer to. I found this brief paper to be convenient:

http://www.dataphysics.com/downloads/technical/Effects-of-Sampling-and-Aliasing-on-the-Conversion-by-R.Welaratna.pdf

Quoting the paper: “Aliasing is irreversible. There is no way to examine the samples and determine which content to ignore because it came from aliased high frequencies. Aliasing can only be prevented by attenuating high frequency content before the sampling process…”

Yeah, I was afraid of that … however, I’m not sure it is completely true in the larger sense. It seems to me that it might only be true if we sample the signal at a single sampling rate.

But suppose we sample it at 288, 287, 286, 285, etc. samples per day. In your opinion, could there be information in that set of samples which would allow us to distinguish between aliased and non aliased signals?

For example, my periodograms show very little aliasing in the hourly samples, but strong aliasing in the 4-hour band of the two-hour samples … shouldn’t that tell us something about the signal?

Maybe if you study the individual station signals you can come up with some innovative way to reduce the daily mean error generated by max/min for days in those stations. If you can do this successfully for day after day in a station then maybe you are on to something. I’ll think about this some more…

Curiously, what I realized today is that we don’t really need to reduce the absolute error. What we need to do is to reduce the trend error. I haven’t worked out yet what that might mean … I know that a proper combination of max and mean data can likely do that, and I’ve done it for the Redding data. But how stable that might be over time is a question …

I suspect that the eventual trend of the traditional method is related to the trends of the max and the min. By that I mean that say if the max has no trend and the min is warming, it pushes both the true trend and the traditional trend in the same direction, but by different amounts. So we may be able to use that to reduce the trend error.
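
One part of that relationship can be pinned down: for an ordinary least-squares line, the trend of (Tmax+Tmin)/2 is exactly the average of the Tmax trend and the Tmin trend, because the fit is linear in the data. The sketch below checks this on hypothetical yearly values (a Tmax series with no built-in trend and a warming Tmin series). It says nothing about the true trend, which needs the fully sampled data.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


years = list(range(2006, 2018))

# Hypothetical yearly averages: Tmax wiggles with no built-in trend,
# Tmin warms at 0.05 C/yr plus its own wiggle.
t_max = [20.0 + 0.3 * ((i * 5) % 3 - 1) for i in range(len(years))]
t_min = [8.0 + 0.05 * i + 0.2 * ((i * 7) % 4 - 1.5) for i in range(len(years))]
mid_range = [(hi + lo) / 2 for hi, lo in zip(t_max, t_min)]

max_trend = ols_slope(years, t_max)
min_trend = ols_slope(years, t_min)
mid_trend = ols_slope(years, mid_range)

print(f"Tmax trend {max_trend:+.4f}, Tmin trend {min_trend:+.4f} C/yr")
print(f"mid-range trend {mid_trend:+.4f} vs average of the two {(max_trend + min_trend) / 2:+.4f} C/yr")
```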

Anyhow, those are my thoughts. I’m working as usual on about three projects right now (Argo data, buoy data, and this one), so as time and the tides permit I’ll post up what I’m finding.

Finally, you said to 1sky1:

I think Willis has shown that hourly sampling (24-samples/day) seems to be the rate beyond which error is measured in hundredths or thousandths of a degree C. From a system engineering perspective I would still go with the USCRN rate of 288 (averaged from 4,320). If a more detailed study showed the system requirements could be lower than 288, then I have no objection to that.

I have no problem with the system requirements being 288. My question is more practical—is the error in hourly sampling small enough to get solid results? I am looking at that because we have lots of hourly data and little 288 sample data.

My best to you, and my personal thanks for fighting through the headwinds in order to get back to the science … much appreciated.

w.

Clyde Spencer
Reply to  Willis Eschenbach
January 23, 2019 10:00 am

Willis,
In response to William, you said, “Curiously, what I realized today is that we don’t really need to reduce the absolute error. What we need to do is to reduce the trend error.” That is true with respect to answering the prevailing question of the times. But, what if future researchers want to go beyond what we are worried about today? They would be quite thankful if their inheritance from us would be data that allowed faithful reconstruction of daily temperature data. As an example, there are claims that warming has been resulting in more extreme weather. Weather is what happens on a daily basis. What if researchers had a historical data set that allowed rigorous analysis of meteorological parameters on a 5-minute basis to see if there really is a change in extremes?

Reply to  Clyde Spencer
January 23, 2019 12:52 pm

Clyde, as I said, “I have no problem with the system requirements being 288.” I’m a data guy, and the more facts we have the better off we are.

w.

Bright Red
Reply to  Clyde Spencer
January 23, 2019 6:29 pm

Clyde said “That is true with respect to answering the prevailing question of the times. But, what if future researchers want to go beyond what we are worried about today? They would be quite thankful if their inheritance from us would be data that allowed faithful reconstruction of daily temperature data. As an example, there are claims that warming has been resulting in more extreme weather. Weather is what happens on a daily basis. What if researchers had a historical data set that allowed rigorous analysis of meteorological parameters on a 5-minute basis to see if there really is a change in extremes?”
and Clyde said
“Why wouldn’t we want to eliminate a potential source of error or uncertainty for a trivial cost increase? Lastly, even though hourly data allows a good estimate of the mean, which is the primary use to which it is being put today, Nyquist-compliant sampling assures the ability to reconstruct the time series faithfully, which future researchers may thank us for if they want to go beyond where we are, such as looking for trends in the standard deviation or daily energy exchanges.”

I agree with what you have said above as we do not know what the data collected now will be used for in the future. Further we only get one chance to take the measurements unlike in a laboratory.
The amount of data and transmission time are excuses, not reasons. We should be doing the best job possible and in my view 288 samples/day is at the very bottom end.
These readings SHOULD be on the record UNALTERED for as long as there is someone or something to look at them and that hopefully will be a very very long time.

To coin a phrase
Anyone who claims to know all that will be required from the data in the year 2100 is blowing smoke up your fundamental orifice …

William Ward
Reply to  Clyde Spencer
January 23, 2019 8:52 pm

Clyde,

As to your “inheritance” thoughts… right on! I would like to see what we could learn about climate with better use of data and available technology. I have an interesting example from the process control industry. As you know, some factories can generate revenue from operations that can be measured in the millions of dollars/hr. If a line goes down because a machine goes down, then revenue and profit can suffer. If a company misses earnings estimates because the line is down for a prolonged period of time then stock prices can take a big hit, causing misery for the executives and even the employees if their compensation is tied to company stock. If the company runs their factory 24/7 then there is no way to make up lost time from a shut down line. So systems have been developed to study frequency content from sensors on machines with motors. Through gathering the right data and analyzing it, we now have frequency profiles of what a bearing starts to “sound” like when it starts to go bad. Using this information, a bearing that is starting to go bad can be identified and replaced prior to failure with only a brief service interruption that doesn’t shut down the line.

What could we learn about weather (and therefore climate) with better tools and data? The mindset of climate scientists has to change first.

Ps – for the record – Willis and I are much better in sync now. My comments to you are generic and not related to anything Willis said. Just clarifying.

Editor
January 23, 2019 12:00 am

Well, after my thought that the difference between the traditional and the true trends might be related to the trends of the max and min values, I thought of a quick test of that. I took the trends of the true, traditional, max, and min monthly values for each of the 12 years of the Redding USCRN record. I then used linear regression with the difference between the true and the traditional trends as the predictand and the max and min trends as the predictors. In other words, I was seeing if I could predict the trend error from the min and max trends. Here is that result.

Interesting result, huh? The encouraging part is that the one set of parameters from the linear regression applies well to all of the individual years.

Hmmm … I’ll have to take a look at some other datasets. I’ve seen far too many one-off good fits to get too excited. Lots of flashes in the pan, not a lot of actual successes …

Best to all,

w.

Reply to  Willis Eschenbach
January 23, 2019 1:13 pm

I neglected to mention an interesting part of the equation. When the max trend goes up, the trend error increases … but when the min trend goes up the trend error decreases by about the same amount. Here’s the regression:

Call:
lm(formula = (trad - true) ~ themax + themin)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.12079 -0.09099 -0.04888  0.08142  0.25382 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  0.07488    0.12109   0.618  0.55167   
themax       0.07933    0.01715   4.626  0.00124 **
themin      -0.08135    0.03666  -2.219  0.05367 . 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1347 on 9 degrees of freedom
Multiple R-squared:  0.715,	Adjusted R-squared:  0.6517 
F-statistic: 11.29 on 2 and 9 DF,  p-value: 0.003522

Go figure …

w.

Clyde Spencer
Reply to  Willis Eschenbach
January 23, 2019 8:23 pm

Willis
You said, “When the max trend goes up, the trend error increases … but when the min trend goes up the trend error decreases by about the same amount.” I think that I have an explanation for you. As the “max trend goes up” the difference between min and max increases, meaning an interpolation (mid-range) over a larger difference. However, when the min trend goes up, the difference is decreased, meaning that the interpolation is over a smaller difference with less chance for error. Not unlike calculating the slope of a line by letting the limit approach zero.

Paramenter
January 23, 2019 2:01 pm

Hey Bernie,

We are talking here of knowing (or NOT knowing) the sample time RELATIVE to a regular spacing. That is the “time” we don’t know.

Quite correct but that only makes things worse from the signal recovery point of view. Or starting from another end: what if you actually have timestamp per each daily min and max? You have array of not equally spaced values and you can somehow interpolate. Not splendid, but smaller error. Now, remove the timestamp? More uncertainty and bigger error then.

If you’ve got 2 measurements per day without knowing their exact times, what options have you got? One is to assume equal spacing and interpolate. Another option is to assign both values to the same point (a day) and then calculate the average. Then you’ve got regular spacing between samples. So, for me, not knowing the exact sample timing causes further degradation of the signal recovery procedure and a bigger error. All aligned with Nyquist.

For example, in the case where you HAVE actually sampled to 288 samples/day, is a particular value of Tmax reported at n=0 or n=287 or in-between? Potentially HUGE obvious errors, and unrelated to Nyquist or to “jitter”.

Where does the error between the true mean and the daily midrange value come from? It comes from the fact that the midrange value is often a poor estimator. Why is the midrange value often a poor estimator? Because for certain variables, such as temperature, the shape of the changes (the signal form) drags the midrange value away from the true mean value. Because you cannot reliably recover the signal, you consequently introduce an error. Precisely because of Nyquist. For me it is simply like that.

Well no – Actually I am talking about recovering the full signal EXACTLY from bunched samples.

Yes, under certain circumstances such interlaced or ‘bunched’ samples would work. Unfortunately, not for all. For example, if you’ve got bursts of dense sampling followed by gaps – all at the irregular intervals signal recovery becomes highly problematic.

Reply to  Paramenter
January 23, 2019 6:10 pm

Paramenter at January 23, 2019 at 2:01 pm; excerpt here:

“ . . . . .
[Bernie] Well no – Actually I am talking about recovering the full signal EXACTLY from bunched samples.
[Paramenter] Yes, under certain circumstances such interlaced or ‘bunched’ samples would work. Unfortunately, not for all. For example, if you’ve got bursts of dense sampling followed by gaps – all at the irregular intervals signal recovery becomes highly problematic. . . . . . ”

Reply

WRONG – it works for all. Do you suppose it works for [ . . . . 1 1 0 0 1 1 0 0 . . . . .] (burst followed by gaps to use your words), but not for [. . . . . 1 1 1 0 0 0 1 1 1 0 0 0 . . . . .] or for [ . . . . . 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 . . . . .] or even for [ . . . . . 1 1 0 1 0 0 1 0 1 1 0 1 0 0 1 0. . . . .]. It in fact always works as long as the bandwidth is reduced by the fraction: (samples kept)/(total samples), which is ½ in these examples. At some point you need to BELIEVE THE MATH! What have I not made clear? Sampling theory can be subtle, but is not subject to what anyone “thinks should be true”. Enjoy it.

-Bernie
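The claim above can be checked on a toy case. The sketch below is an illustration only, not Bernie's own code or the method in his notes. It assumes a periodic discrete signal with 16 slots per period whose content is limited to harmonics 0 to 3 (7 unknown coefficients), keeps 8 of the 16 samples using two of the patterns quoted above, and fits the band-limited model by least squares. All names (`recover`, `basis_row`, `solve_linear`) are hypothetical helpers.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def basis_row(n, n_total, max_harmonic):
    """Real Fourier basis [1, cos, sin, ...] up to max_harmonic, evaluated at slot n."""
    row = [1.0]
    for k in range(1, max_harmonic + 1):
        w = 2 * math.pi * k * n / n_total
        row += [math.cos(w), math.sin(w)]
    return row

def recover(kept, n_total, max_harmonic):
    """Least-squares fit of the band-limited model to the kept samples {slot: value}."""
    rows = [basis_row(n, n_total, max_harmonic) for n in kept]
    vals = list(kept.values())
    p = len(rows[0])
    # Normal equations (A^T A) c = A^T y for the Fourier coefficients c.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * v for r, v in zip(rows, vals)) for i in range(p)]
    coeffs = solve_linear(ata, aty)
    return [sum(c * b for c, b in zip(coeffs, basis_row(n, n_total, max_harmonic)))
            for n in range(n_total)]

N, K = 16, 3  # 16 slots per period; content limited to harmonics 0..3 (assumption)
signal = [10 + 5 * math.cos(2 * math.pi * n / N)
          + 2 * math.sin(2 * math.pi * 3 * n / N + 0.4) for n in range(N)]
masks = {
    "1 1 0 0": [1, 1, 0, 0] * 4,
    "irregular": [1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0],
}
max_errors = {}
for name, mask in masks.items():
    kept = {n: signal[n] for n in range(N) if mask[n]}
    rebuilt = recover(kept, N, K)
    max_errors[name] = max(abs(a - b) for a, b in zip(signal, rebuilt))
    print(name, max_errors[name])
```

In this toy setting the discarded samples are rebuilt to within rounding error for both patterns. The recovery depends entirely on the band-limit assumption: if the gaps contain content above harmonic 3, the fit has no way to see it.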

Bright Red
Reply to  Bernie Hutchins
January 23, 2019 6:34 pm

Bernie said: “It in fact always works as long as the bandwidth is reduced by the fraction: (samples kept)/(total samples),”

Something we can agree on

Clyde Spencer
Reply to  Bernie Hutchins
January 23, 2019 8:15 pm

Bernie
You said, “At some point you need to BELIEVE THE MATH! What have I not made clear?” My first naive reaction is that you are peddling a perpetual motion machine. You say if you are missing data, the solution is to throw away some data. That sounds to me like sampling a ‘bunch’ at the beginning of a transmission, sampling a ‘bunch’ at the end, and then by throwing away some more data you can fill in the missing middle part of the transmission. OK, I’m being a little facetious, but I’ve never been comfortable with the “Less is more” mantra.

I do need to have it explained in more detail.

Paramenter
Reply to  Clyde Spencer
January 24, 2019 7:01 am

Hey Clyde,

I reckon Bernie refers here to interlaced sampling where, as far as I understand that, we can choose any number distinct points within Nyquist intervals. If samples are then taken from this pool, throwing away some of them still allows you to restore signal reliably. But that only works under particular circumstances and still has to follow strict rules. I don’t know why Bernie brought this concept – except for proving that non uniform sampling is still sampling I cannot see any relevance to daily max/min. Those bad boys have nothing to do with interlaced sampling.

Clyde Spencer
Reply to  Paramenter
January 24, 2019 7:29 am

Paramenter

It is obvious to me that if a signal has been over-sampled, then one has the luxury of compressing it by sub-sampling. Or, alternatively, if one is willing to live with aliasing or other loss of fidelity, then I can understand throwing away some data. But, in general, Bernie did not make a convincing case (at least to me) that you can get something for nothing.

Most of this discussion has centered around capturing an accurate representation of a temperature time-series, using audio recording as examples of how it should be done to prevent distortion. My experience with FFTs comes from the image processing field. I’m not sure whether the ear or eye is better at detecting corruption in the signal, but I do know from personal experience that aliasing or ringing is evident in images when signal processing rules are violated.

1sky1
Reply to  Bernie Hutchins
January 24, 2019 5:41 pm

At some point you need to BELIEVE THE MATH!

Amen! Those who patently fail to grasp the math invent the most tediously tortured arguments for disbelieving it.

Reply to  Paramenter
January 23, 2019 6:44 pm

Paramenter said January 23, 2019 at 2:01 pm: “Hey Bernie,
[Bernie] We are talking here of knowing (or NOT knowing) the sample time RELATIVE to a regular spacing. That is the “time” we don’t know. [Paramenter] Quite correct but that only makes things worse from the signal recovery point of view. Or starting from another end: what if you actually have timestamp per each daily min and max? You have array of not equally spaced values and you can somehow interpolate. Not splendid, but smaller error. Now, remove the timestamp? More uncertainty and bigger error then.”

(1) You say “what if you actually have timestamp”. If you don’t – you don’t – no “what if”.

(2) What is “somehow interpolate” (bandlimited, polynomial, min norm)?

(3) You say “Now, remove the timestamp? More uncertainty and bigger error then.” True. But in what sense are you “removing” that which you never had? So why not just compute the true mean. Where is the “glory” in doing things wrong?

-Bernie

William Ward
Reply to  Bernie Hutchins
January 23, 2019 8:29 pm

Hello Bernie,

The following statement is not meant to be a criticism. I’m stating it as a possible explanation of our disagreements. I have found on almost every post of yours I have read that I come away confused by what you are intending to say. If there is a problem here it may be mine as it relates to my ability to track what you are intending. (So please, no offense meant here). Maybe if we were talking in the same room this would resolve itself because of aspects of communication that are hard to fit into writing. I just mention it to offer a possible explanation as to why we seem to be at odds over some of this. I have the repeated experience of feeling like you are disagreeing and then you say something that sounds like you are agreeing or vice versa. Here is an example: Above in (3) you said: “(3) You say “Now, remove the timestamp? More uncertainty and bigger error then.” True. But in what sense are you “removing” that which you never had? So why not just compute the true mean. Where is the “glory” in doing things wrong?”

My thrust on this subject – and at the risk of speaking for Paramenter, his too – is to criticize the way max and min data are used. We are debating with you (I think) whether or not the methods used with max and min are violations of signal analysis or something else. You seem to agree max and min are not good for determining mean, but you think it is not because of signal analysis violations. When you say “where is the glory in doing things wrong?” I’m confused by why you say this. We are not recommending that anything be done wrong. We are recommending that things be done right. We are trying to illustrate that max and min are samples, and that using them this way violates sampling requirements. So let me try another approach.

Here is a hypothetical to illustrate: #1: We have a data acquisition system that delivers 288-samples/day of a real world analog signal. This analog signal can be recorded on magnetic tape. The analog signal can be simultaneously recorded on a chart recorder. So we have magnetic tape, chart recorder and digital samples. The chart recorder is not of much use, but it captures the signal with an analog representation. The magnetic tape could be played back through an amplifier to recreate the electrical signal that represents the original. The digital samples can be played back through a DAC to recreate the electrical signal that matches the original. You could do an experiment whereby (if levels are set correctly) you could subtract one of these from the other to null them out, proving that they are equivalent. You could take the recording from the magnetic tape and run it through an amplifier with a gain of -6dB, cutting the amplitude in half. In the digital domain, you can digitally divide the samples by 2 then feed this through the DAC. Aside from quantization error from the math operation, this signal matches the one from the tape. If the operations we do on the digital signal comply with allowed signal analysis operations then the digital version will always equal a similarly processed analog signal. Now what if we start to do DSP operations that are not supported by signal analysis theorems? What if we digitally search through the samples and identify the maximum and minimum values? (Of course, the max and min values may occur more than once each day, but for this exercise assume they only appear once each day.) Your algorithm can then discard all samples except these 2 (max and min). You have just done something that is not supported by signal processing. Your operation results in a digital signal that no longer represents the original. I call this a signal analysis violation. 
I think it is appropriate to call it a Nyquist violation because we don’t have periodic samples or enough samples that are required to reconstruct the signal. Now, continuing. The timing of these 2 samples is known, since they came from our 288-samples/day. But what if in parallel we also had a max/min thermometer there capturing the same event and suppose the max/min thermometer and the ADC system are matched such that they yield the same values. Someone could have been paid to stand there and watch the thermometer and write down the times that the max and min occurred. So we have 2 different methods of obtaining the same sample values and timing. Are these 2 scenarios signal analysis violations? Can either of them be used to get to the original signal? Will mathematical operations on these 2 samples with their corresponding timings ever relate to the original signal? Now, what if no one was there to capture the time max and min were reached? How does this now differ from the scenario of starting with 288-samples and discarding all but max and min? Furthermore, what if we discard or don’t use the timing from the 288-samples? Where in this process do you say we are not dealing with a signal analysis/Nyquist problem? And what is the mathematical or technical explanation that justifies it?
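The "search the 288 samples, keep only max and min" step in the hypothetical can be sketched as follows. The diurnal curve here is synthetic and purely illustrative: a fundamental plus a second harmonic with arbitrary amplitudes and phases, not station data. The point is only to show the operation and to compare the midrange with the mean of all 288 samples.

```python
import math

SAMPLES_PER_DAY = 288  # 5-minute samples, as in the USCRN sub-hourly data

def synthetic_day(n=SAMPLES_PER_DAY):
    """Hypothetical asymmetric diurnal curve (deg C): fundamental plus second harmonic."""
    return [15.0
            + 6.0 * math.cos(2 * math.pi * (i / n) - 3.9)
            + 2.0 * math.cos(4 * math.pi * (i / n) - 1.0)
            for i in range(n)]

temps = synthetic_day()
true_mean = sum(temps) / len(temps)  # mean of all 288 samples

# Discard everything except the two extreme samples (and note where they fell).
t_max, t_min = max(temps), min(temps)
i_max, i_min = temps.index(t_max), temps.index(t_min)
midrange = (t_max + t_min) / 2

print(f"true mean  = {true_mean:.3f}")
print(f"midrange   = {midrange:.3f}  (Tmax at sample {i_max}, Tmin at sample {i_min})")
print(f"difference = {midrange - true_mean:+.3f}")
```

For this particular made-up shape the midrange and the 288-sample mean differ noticeably. The size and sign of the gap depend on the shape of the curve, which is the information thrown away with the other 286 samples.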

Additionally, sample clock jitter does not invalidate Nyquist. Do you agree? It adds error to the sampling process. Modern ADC jitter is very small: picoseconds. If we extract max and min from 288-samples we have sampled values and sample times. The max and min values occur 2x/day, so there is a periodic rate, but there is variability in that rate. While the magnitude of the variability between max and min and the magnitude of variability of a jittered clock are many, many orders of magnitude different, conceptually they are the same. At what time limit does jitter invalidate Nyquist? From a signal analysis perspective, what is the math that differentiates the scenarios? Could you differentiate a set of samples with max and min vs. a set with 2-samples/day and an irregular clock?

Unless these can be addressed then I conclude that the practice of using max and min to accurately calculate anything about the original signal is a violation of signal analysis/Nyquist requirements.

I think it is rational to say that if you are working with discrete values that come from an analog signal then you have samples. The following things will mean your samples will not represent or will not well represent your original analog signal: 1) Not enough samples/unit time, 2) deviation from regular sampling interval or 3) loss of timing information (or failure to record it).

Please only bring in bunched samples if you think bunched sampling can validate the use of max and min.

Ps – I will try to look through your Electronotes. Bernie, this is an impressive body of work!!! Nearly 50 years of publishing I see! (It has been a while since I have been reminded of the typewriter… seeing some of your older notes, I see you probably wore out many ribbons.)

Paramenter
Reply to  Bernie Hutchins
January 24, 2019 3:05 am

Hey Bernie,

(3) You say “Now, remove the timestamp? More uncertainty and bigger error then.” True. But in what sense are you “removing” that which you never had?

To illustrate the difference between a better and a worse situation. If you have a timestamp for each daily min/max, your approximation of the recovered signal will be better. If you don’t (and we don’t), your approximation will be worse. Not knowing the timestamps for the daily min/max makes the situation worse from the signal recovery standpoint.

So why not just compute the true mean.

Because for most of the instrumental temperature record we have only daily min and max thus we cannot compute the true mean.

Where is the “glory” in doing things wrong?

Indeed.

(2) What is “somehow interpolate” (bandlimited, polynomial, min norm)?

Whatever you fancy and is suitable for this purpose. When I worked with this stuff I liked cubic spline interpolation (smooth first and second derivatives) and Akima interpolation.

WRONG – it works for all.

Probably I was unclear. And because I want to enjoy it, let’s consider this use case: I synthesize a signal and then sample it in a non-uniform fashion where each section of dense sampling can have variable length. Distances between sections of dense samples also vary. Between the high-density sampling sections there is sparse sampling or none at all. Now, because I’m a bad boy, in the segments of the signal where sampling is sparse or absent I generate a high-amplitude, high-frequency signal. Where sampling is dense I introduce a quiet signal with low amplitude and frequency. Now I send you the signal sampled this way and ask you to recover the original one. And, I’m afraid, there is not the slightest chance of doing so.

If you’re really desperate I can make a graph illustrating this situation but I believe you’ve got the picture.

Paramenter
January 24, 2019 8:33 am

Hey Clyde,

For me too, the errors associated with the traditional method of recording temperatures by daily min/max come from a poorly restored original signal. The bigger the distortion, the bigger the error we get. Furthermore, relatively poor underlying data forces us to talk mainly about trends. But that’s only one aspect. Climate models are very complex, taking into account energy transfers, dynamic feedback mechanisms and so on. Here, a reliable signal is vital to get a clear picture. Marrying those two worlds and assuming that they are fully compatible may be wishful thinking.

Clyde Spencer
Reply to  Paramenter
January 24, 2019 10:46 am

Paramenter
I hope it isn’t just wishful thinking! To get a better estimate of the daily net energy transfer we need more than just two temperatures. If all stations were to ultimately transition to high-temporal resolution we could treat the historical data as proxy data for the true mean. The transition period could be used to confirm what, if any, error exists in historical trends and to then correct them in the long-term analysis.

William Ward
Reply to  Clyde Spencer
January 24, 2019 8:59 pm

Hi Clyde, Paramenter,

Until climate science graduates to feeding full signals into transfer functions and then getting back and analyzing resulting full signals I think our understanding is stagnant.

I also think this needs to build up from smaller regional transfer functions. But the goal of modeling climate may never be realized. Some who have studied climate modeling their entire lives have come away with the belief that it is not possible to model climate as climate is a non-linear and chaotic system of coupled feedbacks. (I think one of the IPCC reports admits as much). Today there are dozens of climate models – perhaps over 100. The exact number of them that backtest is zero.

January 24, 2019 6:09 pm

1sky1 said in part January 23, 2019 at 4:02 pm: “ A singular property of the much-maligned diurnal mid-range metric is the total absence of ANY aliasing, because that metric is determined not from discrete samples, but from the CONTINUOUS signal. While the (usually positive) offset from the true mean is a significant discrepancy, it can be greatly reduced by empirically determining for each station (and for each of 12 months) the coefficient 0 < eta <0.5 in a much more effective estimate, (1 – eta)Tmin + eta Tmax, of the true signal mean. ”

Bernie replies: Very possibly, you have the best answer.
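The weighted estimate quoted above, (1 – eta)Tmin + eta Tmax, can be rewritten as Tmin + eta (Tmax – Tmin), so eta can be fitted by ordinary least squares wherever a true mean is available for calibration. The sketch below is a minimal illustration of that fit. The four calibration days are made-up numbers, not station data, and the function names are hypothetical.

```python
def fit_eta(t_min, t_max, true_mean):
    """Least-squares eta for: true_mean ~ Tmin + eta * (Tmax - Tmin)."""
    num = sum((m - lo) * (hi - lo) for lo, hi, m in zip(t_min, t_max, true_mean))
    den = sum((hi - lo) ** 2 for lo, hi in zip(t_min, t_max))
    return num / den

def weighted_estimate(t_min, t_max, eta):
    """Apply (1 - eta) * Tmin + eta * Tmax day by day."""
    return [(1 - eta) * lo + eta * hi for lo, hi in zip(t_min, t_max)]

# Made-up calibration days (deg C), for illustration only.
t_min = [5.0, 7.5, 3.0, 10.0]
t_max = [18.0, 21.0, 15.5, 24.0]
true_mean = [10.7, 13.4, 8.5, 16.1]

eta = fit_eta(t_min, t_max, true_mean)
estimates = weighted_estimate(t_min, t_max, eta)
midranges = [(lo + hi) / 2 for lo, hi in zip(t_min, t_max)]

err_weighted = sum(abs(e - m) for e, m in zip(estimates, true_mean))
err_midrange = sum(abs(e - m) for e, m in zip(midranges, true_mean))
print(f"eta = {eta:.3f}, weighted error = {err_weighted:.3f}, midrange error = {err_midrange:.3f}")
```

In practice, as the quoted comment suggests, eta would be determined separately for each station and each calendar month. How stable it is over time is an open question that this sketch does not address.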

William Ward said in part January 23, 2019 at 8:29 pm: “ Please only bring in bunched samples if you think bunched sampling can validate the use of max and min. “

Bernie replies: That is the reason for, and the implication of the suggestion.

Paramenter said in part at January 24, 2019 at 7:01 am: “ I reckon Bernie refers here to interlaced sampling where, as far as I understand that, we can choose any number distinct points within Nyquist intervals. If samples are then taken from this pool, throwing away some of them still allows you to restore signal reliably. But that only works under particular circumstances and still has to follow strict rules. I don’t know why Bernie brought this concept – except for proving that non uniform sampling is still sampling I cannot see any relevance to daily max/min. Those bad boys have nothing to do with interlaced sampling. “

Bernie replies: The “strict rules” are just proper basic sampling. If you HAD the times of Tmax and Tmin, you can (I think) recover the mean exactly (not the entire record!).

Clyde Spencer said in part at January 24, 2019 at 7:29 am : “ It is obvious to me that if a signal has been over-sampled, then one has the luxury of compressing it by sub-sampling. Or, alternatively, if one is willing to live with aliasing or other loss of fidelity, then I can understand throwing away some data. But, in general, Bernie did not make a convincing case (at least to me) that you can get something for nothing. “

Bernie replies: It’s not something for nothing (last sentence). You paid up-front (your first sentence).

* * * * * * * * * *
GENERAL IDEA I HAVE IN MIND
You want the true mean. All you have is Tmax and Tmin. Above, 1sky1 pointed out that you can make a useful correction to the trial value: (Tmax+Tmin)/2, based on empirical observations. Can we do better? Would it make a difference if we actually knew the times for these measurements? That is, we have nmax and nmin in addition to Tmax(n=nmax) and Tmin(n=nmin) for some supposed T(n). Let’s suppose we have a dense set of potential sample positions for any one day – perhaps n=0,1,2,3,. . . 47, which we could start with. We install Tmax at nmax and Tmin at nmin (0 for all other n). From these two non-uniform samples we “reconstruct” the full 48 samples, and filter back to DC.

Perhaps –

– Bernie

William Ward
Reply to  Bernie Hutchins
January 24, 2019 8:50 pm

Bernie,

Bernie said: “GENERAL IDEA I HAVE IN MIND…”

Would you like some USCRN data files to run your proposed algorithm on? The website that stores the data is not available due to the government shutdown, but one of us can get files to you. I recommend the “Sub-Hourly” data. You can extract Tmax and Tmin for each day and also get the timestamp of those samples. It would be interesting to see if you can do any DSP on those max & min samples and get a mean value that matches the 288-samples/day and furthermore see if you can reconstruct the original signal that the 288-samples/day matches. Just let us know how to reach you by email or if you prefer I can load some files on DropBox and give you a link.

Reply to  William Ward
January 26, 2019 7:53 pm

William –

Thanks for the offer – but in truth I am very far from trying the method on actual temperature data. I would have to work out the computational details first, and test them on some “toy” sequences.

Since this thread has quieted down, this is probably a convenient place to park the idea. This is based on a previous reference: http://electronotes.netfirms.com/EN200.pdf and a one-page “tape-up” summary is here:

http://electronotes.netfirms.com/BunchedSamples.jpg

The basic goal is to obtain a theoretical basis for any useful correction factor from (Tmax+Tmin)/2 to the true mean. 1sky1 suggested above such a correction (he/she called it “eta”) based on empirical data. I was hoping for a value for this factor based on the “offset” of Tmax and Tmin relative to a nominal ½ day separation.

The top portion of the single page is ordinary (uniform) sampling and the corresponding use of sinc functions as time-domain interpolators (shown for three samples: 1, 0.6, and -0.4). Nothing unusual here. Suppose the sampling rate is 1/T = 288 (samples per day, say). Such sampling could support a bandwidth approaching 144 cycles/day. The temperature curve might be much more bandlimited – perhaps to 3 or 4 cycles/day. We don’t need 288 samples/day – perhaps 24 samples/day would do.

But we are kind of working right here toward a rate as low as 2 samples/day.

If we wanted to use two equally spaced samples, we would have to bandlimit to perhaps 0.5 cycles/day. This would surely cut into the fundamental (1 cycle/day) and we would have no hope of recovering the original temperature curve (as we could have with 288 or even 24 samples). However the mean (DC) would be recoverable.

Note that if we had the full temperature curve, and we just threw out all except two samples, we would have severe aliasing (including overlap of images*). But we do not need to resample with an analog antialiasing filter at 0.5. We just reduce the bandwidth with a basic low-pass digital filter (usually called, in this use, a “pre-decimation” filter).
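The difference between throwing samples away and filtering before decimating can be seen on a toy day. The sketch below is an illustration under stated assumptions: the day is synthetic (mean 15, a 1 cycle/day fundamental and a 2 cycle/day harmonic with arbitrary phases), and a half-day boxcar average stands in for the pre-decimation filter. A boxcar is a crude low-pass filter and not necessarily the one intended above, but it does preserve DC.

```python
import math

N = 288  # samples per day

# Hypothetical day: mean 15, a 1 cycle/day fundamental and a 2 cycle/day harmonic.
day = [15.0
       + 6.0 * math.cos(2 * math.pi * i / N - 3.9)
       + 2.0 * math.cos(4 * math.pi * i / N - 1.0)
       for i in range(N)]
true_mean = sum(day) / N

# (a) Keep two equally spaced samples with no filtering. The fundamental cancels
#     at 12-hour spacing, but the 2 cycle/day harmonic sits exactly at the
#     2 samples/day rate and aliases to DC.
raw_pair = [day[0], day[N // 2]]
raw_mean = sum(raw_pair) / 2

# (b) Low-pass first, then decimate to 2 values per day (half-day boxcar averages).
half = N // 2
filtered_pair = [sum(day[:half]) / half, sum(day[half:]) / half]
filtered_mean = sum(filtered_pair) / 2

print(f"true mean      = {true_mean:.3f}")
print(f"raw pair mean  = {raw_mean:.3f}")
print(f"filtered mean  = {filtered_mean:.3f}")
```

In this toy case the mean of the two filtered values matches the 288-sample mean to rounding error, while the mean of the two raw samples is biased by the aliased harmonic. Either way the two values say nothing reliable about the shape of the daily curve.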

Onward to unequal spacing (non-uniform or “bunched” sampling). What we have said is that if the bandwidth is sufficiently low, a much lower sampling rate suffices, and it is the average rate that matters. The two samples per cycle (per day here), perhaps Tmax and Tmin, ALONG WITH THEIR TIME INDICES, would be enough, not to recover the full temperature curve of course, but likely the DC (mean) as with the equally spaced samples. That is, we recover the bandlimited (to 0.5) curve with the expectation that except close to DC, it is aliased beyond use.

How hard is it to handle the non-uniform case? Well there is a fair amount of discussion in my app note:
http://electronotes.netfirms.com/BunchedSamples.jpg
although actually coding it in general, or up to the size of just two arbitrary samples in 288, seems tedious. No theoretical issues however. The discussion in the note is entirely frequency (spectrally) based. I’m not sure that is a problem if we just want the value of the spectrum at zero. More illuminating at the moment is perhaps the bottom portion of the jpg here – the bunched sampling in continuous time. The graph there shows the interpolation functions for the bunched case. Compare to the sinc of the uniform case – there are two such functions, a(t) and b(t) as shown. One for the even samples, and the other for the odd samples as displaced. Note that these two interleaved sub-sequences are generally called “polyphases”.

Note three things: (1) The interpolation functions are not sinc functions but are built from sinc(2t) and t sinc^2(t); (2) These are for continuous time t, not for discrete time n, but it might basically involve the corresponding “periodic sincs”; (3) The top shows the weighted sum of three sinc functions while the bottom shows just the interpolation functions themselves – since there are two of them.

PROSPECTS: I might hope to find that for some typical range of the time indices of Tmax and Tmin that the true mean can be estimated from (Tmax+Tmin)/2 by some theoretical factor. Myself, I am retired (with some emphasis on “tired”). If I were still working I would look at offering this as a proposed “senior project”, likely for 3 or 4 credits.
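As a starting point for that kind of study, here is a small Python (NumPy) sketch that compares (Tmax+Tmin)/2 with the full 288-sample mean and reports the time indices of Tmax and Tmin. The curve is synthetic and the amplitudes are assumptions for illustration only; it says nothing about what a real station would show.

```python
import numpy as np

N = 288                          # 5-minute samples in one day
t = np.arange(N) / N             # time in days
assumed_mean = 10.0              # illustrative DC value, not measured data

# Assumed daily curve: DC + fundamental + 2nd harmonic (makes the curve asymmetric)
temp = (assumed_mean
        + 5.0 * np.cos(2 * np.pi * t)
        + 2.0 * np.cos(4 * np.pi * t))

i_max = int(np.argmax(temp))     # time index of Tmax
i_min = int(np.argmin(temp))     # time index of Tmin (first one if tied)
tmax, tmin = temp[i_max], temp[i_min]

midrange = (tmax + tmin) / 2.0   # the historical (Tmax+Tmin)/2 estimate
full_mean = temp.mean()          # mean of all 288 samples

print("Tmax =", tmax, "at index", i_max)
print("Tmin =", tmin, "at index", i_min)
print("(Tmax+Tmin)/2 =", midrange)
print("288-sample mean =", full_mean)
print("difference =", midrange - full_mean)
```

For this particular assumed curve the midrange comes out well above the 288-sample mean. Sweeping the harmonic amplitude and phase, and recording the resulting time indices of Tmax and Tmin, would be one way to look for the kind of correction factor suggested above.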

-Bernie hutchins@ece.cornell.edu

*the term “aliasing” sometimes refers to the case where we are really talking about normal “spectral images” about multiples of the sampling rate which may not actually overlap populated regions of the original spectrum and/or other images, and hence still offer a recovery opportunity (e.g. bandpass sampling).

Reply to  Bernie Hutchins
January 26, 2019 8:20 pm

Link to App Note should have been – sorry

http://electronotes.netfirms.com/AN356.pdf

William Ward
Reply to  Bernie Hutchins
January 27, 2019 9:41 pm

Bernie,

Thanks for your detailed reply. I agree that maybe we can park this discussion for now. It is not always easy to completely understand technical detail in these kinds of exchanges, and that misunderstanding can sidetrack what would otherwise be a stimulating discussion if we were in the same room interacting. Although we have locked horns a few times in the discussions, I have enjoyed talking with you. I’m fascinated by your massive library of Electronotes – I can see you have had a rich career! I hope you are enjoying retirement. I too retired – at least from my primary industry – but “retire” is not really in my vocabulary – I call it “Phase 2”. Now I focus on the projects and work of my choosing – if and when I want. I appreciate you sending your contact information. I’ll email you to reciprocate. Maybe one day I’ll look you up to pick your brain about a subject aligned with your experiences. P.S. – one of my businesses is an audio engineering company. This grew into a record company with in-house engineering. I was a full member of AES for years but let my membership lapse. I used to enjoy the AES shows, especially in NYC.
