By William Ward, 1/01/2019
The 4,900-word paper can be downloaded here: https://wattsupwiththat.com/wp-content/uploads/2019/01/Violating-Nyquist-Instrumental-Record-20190112-1Full.pdf
The 169-year-long instrumental temperature record is built upon 2 measurements taken daily at each monitoring station, specifically the maximum temperature (Tmax) and the minimum temperature (Tmin). These daily readings are then averaged to calculate the daily mean temperature as Tmean = (Tmax+Tmin)/2. Tmax and Tmin measurements are also used to calculate monthly and yearly mean temperatures. These mean temperatures are then used to determine warming or cooling trends. This “historical method” of using daily measured Tmax and Tmin values for mean and trend calculations is still used today. However, air temperature is a signal, and measurement of signals must comply with the mathematical laws of signal processing. The Nyquist-Shannon Sampling Theorem tells us that we must sample a signal at a rate greater than 2x the highest frequency component of the signal. That threshold of 2x the highest frequency is called the Nyquist Rate. Sampling at a rate less than this introduces aliasing error into our measurement. The slower our sample rate is compared to Nyquist, the greater the error will be in our mean temperature and trend calculations. The Nyquist-Shannon Sampling Theorem is essential science to every field of technology in use today. Digital audio, digital video, industrial process control, medical instrumentation, flight control systems, digital communications, etc., all rely on the essential math and physics of Nyquist.
NOAA, in their USCRN (US Climate Reference Network) has determined that it is necessary to sample at 4,320-samples/day to practically implement Nyquist. 4,320-samples/day equates to 1-sample every 20 seconds. This is the practical Nyquist sample rate. NOAA averages these 20-second samples to 1-sample every 5 minutes or 288-samples/day. NOAA only publishes the 288-sample/day data (not the 4,320-samples/day data), so to align with NOAA the rate will be referred to as “288-samples/day” (or “5-minute samples”). (Unfortunately, NOAA creates naming confusion with their process of averaging down to a slower rate. It should be understood that the actual rate is 4,320-samples/day.) This rate can only be achieved by automated sampling with electronic instruments. Most of the instrumental record is comprised of readings of mercury max/min thermometers, taken long before automation was an option. Today, despite the availability of automation, the instrumental record still uses Tmax and Tmin (effectively 2-samples/day) instead of a Nyquist compliant sampling. The reason for this is to maintain compatibility with the older historical record. However, with only 2-samples/day the instrumental record is highly aliased. It will be shown in this paper that the historical method introduces significant error to mean temperatures and long-term temperature trends.
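The averaging step described above can be sketched in a few lines. This is only an illustration, assuming plain block averaging of 15 consecutive 20-second readings into each 5-minute value; NOAA's actual processing may differ. The function name `average_to_5min` is hypothetical and the input is a synthetic ramp, not USCRN data.

```python
import numpy as np

SAMPLES_PER_DAY_20S = 4320   # one reading every 20 seconds
SAMPLES_PER_DAY_5MIN = 288   # one value every 5 minutes


def average_to_5min(readings_20s):
    """Block-average one day of 20-second readings into 5-minute values."""
    readings = np.asarray(readings_20s, dtype=float)
    if readings.size != SAMPLES_PER_DAY_20S:
        raise ValueError("expected exactly 4,320 readings for one day")
    block = SAMPLES_PER_DAY_20S // SAMPLES_PER_DAY_5MIN  # 15 readings per block
    return readings.reshape(SAMPLES_PER_DAY_5MIN, block).mean(axis=1)


# Synthetic stand-in for a day of readings (an assumption, not station data).
day_20s = np.linspace(-5.0, 5.0, SAMPLES_PER_DAY_20S)
day_5min = average_to_5min(day_20s)
print(day_5min.shape, day_20s.mean(), day_5min.mean())
```

Because every block has the same length, the daily mean of the 288 averaged values equals the daily mean of the 4,320 raw readings.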
NOAA’s USCRN is a small network that was completed in 2008 and it contributes very little to the overall instrumental record. However, the USCRN data provides us a special opportunity to compare a high-quality version of the historical method to a Nyquist compliant method. The Tmax and Tmin values are obtained by finding the highest and lowest values among the 288 samples for the 24-hour period of interest.
NOAA USCRN Examples to Illustrate the Effect of Violating Nyquist on Mean Temperature
The following example will be used to illustrate how the amount of error in the mean temperature increases as the sample rate decreases. Figure 1 shows the temperature as measured at Cordova AK on Nov 11, 2017, using the NOAA USCRN 5-minute samples.

Figure 1: NOAA USCRN Data for Cordova, AK Nov 11, 2017
The blue line shows the 288 samples of temperature taken that day. It shows 24-hours of temperature data. The green line shows the correct and accurate daily mean temperature that is calculated by summing the value of each sample and then dividing the sum by the total number of samples. Temperature is not heat energy, but it is used as an approximation of heat energy. To that extent, the mean (green line) and the daily-signal (blue line) deliver the exact same amount of heat energy over the 24-hour period of the day. The correct mean is -3.3 °C. Tmax is represented by the orange line and Tmin by the grey line. These are obtained by finding the highest and lowest values among the 288 samples for the 24-hour period. The mean calculated from (Tmax+Tmin)/2 is shown by the red line. (Tmax+Tmin)/2 yields a mean of -4.7 °C, which is a 1.4 °C error compared to the correct mean.
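The two calculations compared in Figure 1 can be sketched as follows. The temperature curve here is a synthetic, deliberately asymmetric day (an assumption made for illustration, not the Cordova data), and `daily_means` is a hypothetical helper name.

```python
import numpy as np


def daily_means(samples):
    """Return (true mean, (Tmax+Tmin)/2, difference) for one day of samples."""
    samples = np.asarray(samples, dtype=float)
    true_mean = samples.mean()                          # sum of samples / count
    midrange = (samples.max() + samples.min()) / 2.0    # historical method
    return true_mean, midrange, midrange - true_mean


# Synthetic asymmetric day sampled 288 times (not station data).
hours = np.arange(288) * 24.0 / 288
angle = 2 * np.pi * hours / 24
temps = -3.0 + 2.0 * np.sin(angle) + 1.0 * np.cos(2 * angle)

true_mean, midrange, error = daily_means(temps)
print(f"true mean {true_mean:.2f}, (Tmax+Tmin)/2 {midrange:.2f}, error {error:.2f}")
```

On this synthetic curve the true mean is -3.0 °C while (Tmax+Tmin)/2 is -3.75 °C, because the warm and cold excursions are not mirror images of each other.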
Using the same signal and data from Figure 1, Figure 2 shows the calculated temperature means obtained from progressively decreased sample rates. These decreased sample rates can be obtained by dividing down the 288-sample/day sample rate by a factor of 4, 8, 12, 24, 48, 72 and 144. Therefore, the sample rates will correspond to: 72, 36, 24, 12, 6, 4 and 2-samples/day respectively. By properly discarding the samples using this method of dividing down, the net effect is the same as having sampled at the reduced rate originally. The corresponding aliasing that results from the lower sample rates, reveals itself as shown in the table in Figure 2.
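The dividing-down procedure can be sketched like this. The day of data is a synthetic curve with a narrow afternoon peak (an assumption, not the Figure 1 data), the helper name is hypothetical, and the kept samples start at the first sample of the day; a different starting offset would give different errors.

```python
import numpy as np


def means_at_reduced_rates(samples, factors):
    """Keep every f-th sample and return {samples_per_day: mean}."""
    samples = np.asarray(samples, dtype=float)
    return {samples.size // f: samples[::f].mean() for f in factors}


# Synthetic day: flat -3 C with a narrow warm peak near 14:00 (not station data).
hours = np.arange(288) * 24.0 / 288
temps = -3.0 + 4.0 * np.exp(-((hours - 14.0) / 1.5) ** 2)

reference = temps.mean()  # mean from all 288 samples
means = means_at_reduced_rates(temps, [4, 8, 12, 24, 48, 72, 144])
errors = {rate: m - reference for rate, m in means.items()}
for rate in sorted(errors, reverse=True):
    print(f"{rate:3d} samples/day: mean {means[rate]:.3f}, error {errors[rate]:+.3f}")
```

For this particular curve the error at 72-samples/day is negligible while the error at 2-samples/day is not; the size of the error at each intermediate rate depends on the shape of the signal and on where the retained samples fall.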

Figure 2: Table Showing Increasing Mean Error with Decreasing Sample Rate
It is clear from the data in Figure 2, that as the sample rate decreases below Nyquist, the corresponding error introduced from aliasing increases. It is also clear that 2, 4, 6 or 12-samples/day produces a very inaccurate result. 24-samples/day (1-sample/hr) up to 72-samples/day (3-samples/hr) may or may not yield accurate results. It depends upon the spectral content of the signal being sampled. NOAA has decided upon 288-samples/day (4,320-samples/day before averaging) so that will be considered the current benchmark standard. Sampling below a rate of 288-samples/day will be (and should be) considered a violation of Nyquist.
It is interesting to point out that what is listed in the table as 2-samples/day yields 0.7 °C error. But (Tmax+Tmin)/2 is also technically 2-samples/day with an error of 1.4°C as shown in the table. How can this be possible? It is possible because (Tmax+Tmin)/2 is a special case of 2-samples per day because these samples are not spaced evenly in time. The maximum and minimum temperatures happen whenever they happen. When we sample properly, we sample according to a “clock” – where the samples happen regularly at exactly the same time of day. The fact that Tmax and Tmin happen at irregular times during the day causes its own kind of sampling error. It is beyond the scope of this paper to fully explain, but this error is related to what is called “clock jitter”. It is a known problem in the field of signal analysis and data acquisition. 2-samples/day, regularly timed, would likely produce better results than finding the maximum and minimum temperatures from any given day. The instrumental temperature record uses the absolute worst method of sampling possible – resulting in maximum error.
Figure 3 shows the same daily temperature signal as in Figure 1, represented by 288-samples/day (blue line). Also shown is the same daily temperature signal sampled with 12-samples/day (red line) and 4-samples/day (yellow line). From this figure, it is visually obvious that a lot of information from the original signal is lost by using only 12-samples/day, and even more is lost by going to 4-samples/day. This lost information is what causes the resulting mean to be incorrect. This figure graphically illustrates what we see in the corresponding table of Figure 2. Figure 3 explains the sampling error in the time-domain.

Figure 3: NOAA USCRN Data for Cordova, AK Nov 11, 2017: Decreased Detail from 12 and 4-Samples/Day Sample Rate – Time-Domain
Figure 4 shows the daily mean error between the USCRN 288-samples/day method and the historical method, as measured over 365 days at the Boulder CO station in 2017. Each data point is the error for that particular day in the record. We can see from Figure 4 that (Tmax+Tmin)/2 yields daily errors of up to ± 4 °C. Calculating mean temperature with 2-samples/day rarely yields the correct mean.

Figure 4: NOAA USCRN Data for Boulder CO – Daily Mean Error Over 365 Days (2017)
Let’s look at another example, similar to the one presented in Figure 1, but over a longer period of time. Figure 5 shows (in blue) the 288-samples/day signal from Spokane WA, from Jan 13 – Jan 22, 2008. Tmax (avg) and Tmin (avg) are shown in orange and grey respectively. The (Tmax+Tmin)/2 mean is shown in red (-6.9 °C) and the correct mean calculated from the 5-minute sampled data is shown in green (-6.2 °C). The (Tmax+Tmin)/2 mean has an error of 0.7 °C over the 10-day period.

Figure 5: NOAA USCRN Data for Spokane, WA – Jan 13-22, 2008
The Effect of Violating Nyquist on Temperature Trends
Finally, we need to look at the impact of violating Nyquist on temperature trends. In Figure 6, a comparison is made between the linear temperature trends obtained from the historical and Nyquist compliant methods using NOAA USCRN data for Blackville SC, from Jan 2006 – Dec 2017. We see the trend derived from the historical method (orange line) starts approximately 0.2 °C warmer and has a 0.24 °C/decade warming bias compared to the Nyquist compliant method (blue line). Figure 7 shows the trend bias or error (°C/Decade) for 26 stations in the USCRN over a 7-12 year period. The 5-minute sample data gives us our reference trend. The trend bias is calculated by subtracting the reference trend from the trend derived from (Tmax(avg)+Tmin(avg))/2. Almost every station exhibits a warming bias, with a few exhibiting a cooling bias. The largest warming bias is 0.24 °C/decade and the largest cooling bias is -0.17 °C/decade, with an average warming bias across all 26 stations of 0.06 °C/decade. According to Wikipedia, the calculated global average warming trend for the period 1880-2012 is 0.064 ± 0.015 °C per decade. If we look at the more recent period that contains the controversial “Global Warming Pause”, then using data from Wikipedia, we get the following warming trends depending upon which year is selected for the starting point of the “pause”:
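The trend-bias calculation described above can be sketched as a pair of least-squares fits. This is a minimal illustration under stated assumptions: an ordinary linear fit to monthly means, two synthetic series with made-up slopes (not USCRN values), and a hypothetical helper name `trend_per_decade`.

```python
import numpy as np


def trend_per_decade(monthly_means):
    """Least-squares linear trend of a monthly series, in degrees C per decade."""
    monthly_means = np.asarray(monthly_means, dtype=float)
    months = np.arange(monthly_means.size)
    slope_per_month = np.polyfit(months, monthly_means, 1)[0]
    return slope_per_month * 120.0  # 120 months per decade


# Synthetic 12-year monthly series (made-up numbers, not station data).
months = np.arange(144)
reference = 15.0 + (0.05 / 120.0) * months                 # from 5-minute data
historical = reference + 0.2 + (0.10 / 120.0) * months     # from Tmax/Tmin

# Trend bias = historical-method trend minus reference trend.
bias = trend_per_decade(historical) - trend_per_decade(reference)
print(f"trend bias: {bias:+.2f} C/decade")
```

With real monthly data the seasonal cycle would also be present in both series; since the bias is a difference of two fits over the same months, the sketch leaves that detail out.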
1996: 0.14°C/decade
1997: 0.07°C/decade
1998: 0.05°C/decade
While no conclusions can be made by comparing the trends over 7-12 years from 26 stations in the USCRN to the currently accepted long-term or short term global average trends, it can be instructive. It is clear that using the historical method to calculate trends yields a trend error and this error can be of a similar magnitude to the claimed trends. Therefore, it is reasonable to call into question the validity of the trends. There is no way to know for certain, as the bulk of the instrumental record does not have a properly sampled alternate record to compare it to. But it is a mathematical certainty that every mean temperature and derived trend in the record contains significant error if it was calculated with 2-samples/day.

Figure 6: NOAA USCRN Data for Blackville, SC – Jan 2006-Dec 2017 – Monthly Mean Trendlines

Figure 7: Trend Bias (°C/Decade) for 26 Stations in USCRN
Conclusions
1. Air temperature is a signal and therefore, it must be measured by sampling according to the mathematical laws governing signal processing. Sampling must be performed according to the Nyquist-Shannon Sampling Theorem.
2. The Nyquist-Shannon Sampling Theorem has been known for over 80 years and is essential science to every field of technology that involves signal processing. Violating Nyquist guarantees samples will be corrupted with aliasing error and the samples will not represent the signal being sampled. Aliasing cannot be corrected post-sampling.
3. The Nyquist-Shannon Sampling Theorem requires the sample rate to be greater than 2x the highest frequency component of the signal. Using automated electronic equipment and computers, NOAA USCRN samples at a rate of 4,320-samples/day (averaged to 288-samples/day) to practically apply Nyquist and avoid aliasing error.
4. The instrumental temperature record relies on the historical method of obtaining daily Tmax and Tmin values, essentially 2-samples/day. Therefore, the instrumental record violates the Nyquist-Shannon Sampling Theorem.
5. NOAA’s USCRN is a high-quality data acquisition network, capable of properly sampling a temperature signal. The USCRN is a small network that was completed in 2008 and it contributes very little to the overall instrumental record; however, the USCRN data provides us a special opportunity to compare analysis methods. A comparison can be made between temperature means and trends generated with Tmax and Tmin versus a properly sampled signal compliant with Nyquist.
6. Using a limited number of examples from the USCRN, it has been shown that using Tmax and Tmin as the source of data can yield the following error compared to a signal sampled according to Nyquist:
a. Mean error that varies station-to-station and day-to-day within a station.
b. Mean error that varies over time with a mathematical sign that may change (positive/negative).
c. Daily mean error that varies up to +/-4°C.
d. Long term trend error with a warming bias up to 0.24°C/decade and a cooling bias of up to 0.17°C/decade.
7. The full instrumental record does not have a properly sampled alternate record to use for comparison. More work is needed to determine if a theoretical upper limit can be calculated for mean and trend error resulting from use of the historical method.
8. The extent of the error observed, with its associated uncertain magnitude and sign, calls into question the scientific value of the instrumental record and the practice of using Tmax and Tmin to calculate mean values and long-term trends.
Reference section:
This USCRN data can be found at the following site: https://www.ncdc.noaa.gov/crn/qcdatasets.html
NOAA USCRN data for Figure 1 is obtained here:
NOAA USCRN data for Figure 4 is obtained here:
https://www1.ncdc.noaa.gov/pub/data/uscrn/products/daily01/2017/CRND0103-2017-AK_Cordova_14_ESE.txt
NOAA USCRN data for Figure 5 is obtained here:
NOAA USCRN data for Figure 6 is obtained here:
https://www1.ncdc.noaa.gov/pub/data/uscrn/products/monthly01/CRNM0102-SC_Blackville_3_W.txt
Quite a few comments, mostly from the Stats people, suggest that we are looking at the temperature record from the viewpoint of “But the only result that matters is the monthly average.” If one only cares about a monthly average of Min/Max temperatures — just wants to have a number to play with and is willing to be honest about the uncertainty ranges, then all this is moot.
But for Climate Science, we are interested really in the energy in the atmospheric system for which temperature (sensible heat) is used as a proxy.
“Just a monthly average” does not inform us accurately or precisely about the energy — not even the sensible heat — when it is calculated from Min/Max. For rough back of the envelope figuring, it is probably accurate enough but comes with large uncertainty bars — uncertainty bars greater than the posited change in global average surface temperatures of the 20th century.
Claims that this year’s GAST is x.x degrees C over the 1890 values are not scientifically sound. We only can guess at the GAST of 1890, with an uncertainty range as large as the “calculated” change since then. Only in the post-WWII era, were there finally enough weather stations operating in enough diverse places to get some kind of scientific idea of a global average — but even then the uncertainty is wide.
And certainly, going forward, automated weather station data, with its 5-min averages, are obviously superior to the old (but necessary) Min/Max method. Min/Max should be discontinued altogether (regardless of one’s opinion about Nyquist).
+1000
“If one only cares about a monthly average of Min/Max temperatures — just wants to have a number to play with and is willing to be honest about the uncertainty ranges, then all this is moot.”
It doesn’t matter what wishes you may have. What matters is what calculations are actually done with the numbers. And what happens is that these min/max numbers are aggregated into at least monthly averages. No-one is trying to use them to reconstruct a high frequency signal, which is what the Nyquist talk is about.
“Min/Max should be discontinued altogether (regardless of one’s opinion about Nyquist).”
It’s done where needed for consistency with the older record. With modern data, you can do whatever you feel is best.
Kip,
The Stats People are so quick to dive into stats and averages that they miss the opportunity to study what is going on at one station. Analysis of individual stations may provide some interesting information. Averaging everything means you lose anything unique.
Speaking of averages, do you know where I can find the scientific (thermodynamic) justification for averaging temperatures from multiple locations? Also, what is the equation this average temperature is fed into? I have been looking and can’t find it.
KH,
So should diurnal temperature range. Too much apples subtracted from oranges.
Sucking on an orange, summarising years of looking at Australian numbers, I would put a 2 sigma error envelope for DTR and Taverage(Tmedian?) at something more than +/- 2 deg C. when all imaginable sources of error are included for the 1910 onwards historic raw temperature data.
There are many exercises that you cannot do when errors are as large as that.
The term “unfit for purpose” is used by some. Geoff.
@WW
” I think Fig 4 seems to favor error towards warming ( positive error).”
An error estimation is considered unbiased if its expected value (mean) is zero. In theory. In practice, it will never be exactly zero. My eyeball says it’s unbiased overall, especially since you also said the errors from different cities vary from slightly positive to slightly negative.
Do you still believe that undersampling causes errors in the temperature measurements or their statistics? I don’t see how that can change any measurement values, in the sense that the inverse Fourier transform always returns the same data which was input to the Fourier transform, even if the sample were “undersampled” or otherwise random values. And the _signal energy_ (sum of squared amplitudes) is not affected by aliasing. (See my post above for proof of that).
Hey Johanus,
“Do you still believe that undersampling causes errors in the temperature measurements or their statistics?”
You should then refer to Figure 2 of the article and comment on it appropriately. According to it, decreasing sampling rate increases error magnitude and vice versa.
“And the _signal energy_ (sum of squared amplitudes) is not affected by aliasing.”
How can that be? If you undersample you’re losing high frequency components and energy associated with that, in our case fast changes in the daily temperature signal.
“decreasing sampling rate increases error magnitude and vice versa”
Yes, those are digitization errors, i.e. the residual errors between the actual temperature curve and the sampling intervals, which act as a series of linear approximations to the actual curve. These errors can be made arbitrarily small (up to quantization errors) by increasing the sampling rate.
This has nothing to do with undersampling, which has no effect on individual sample values, right?
“If you undersample you’re losing high frequency components and energy associated with that, in our case fast changes in the daily temperature signal.”
The high frequencies are not lost, merely shifted in frequency. No change in amplitudes, so energy is conserved. (See my post above for proof of this).
So I still don’t see how undersampling causes erroneous temperature measurements, as WW claims:
“Sampling at a rate less than [the Nyquist limit] introduces aliasing error into our measurement.”
Hey Johanus,
“The high frequencies are not lost, merely shifted in frequency. No change in amplitudes, so energy is conserved. (See my post above for proof of this).”
I’ve got a strange feeling that is not quite correct. Could you actually run a power spectral density against a few examples of real temperature data and share that? I bet that an aliased (heavily undersampled, say 2 per day) signal folds higher-frequency energy into lower frequencies, so the total energy of the original and aliased signals will differ.
Deliberate undersampling, for the purpose of shifting signals from a passband to baseband, is the basic idea behind “passband sampling”:
https://en.wikipedia.org/wiki/Undersampling
Note that the undersampling is done on signals that are band-limited in both max and min frequencies, such that the aliased signal envelopes are not distorted, merely shifted down in frequency.
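Both sides of this exchange can be checked on a toy case. The sketch below assumes pure cosines and evenly spaced samples (nothing here is real temperature data, and `sampled_cosine` is a hypothetical helper). It shows a 3-cycle/day cosine sampled 4 times per day producing exactly the samples of a 1-cycle/day cosine, and a 2-cycle/day cosine sampled twice per day landing entirely on zero frequency, where it changes the computed mean.

```python
import numpy as np


def sampled_cosine(cycles_per_day, samples_per_day):
    """Sample cos(2*pi*f*t) over one day at evenly spaced times."""
    t = np.arange(samples_per_day) / samples_per_day  # time in days
    return np.cos(2 * np.pi * cycles_per_day * t)


# A 3 cycle/day component sampled 4 times/day looks like 1 cycle/day.
fast = sampled_cosine(3, 4)
slow = sampled_cosine(1, 4)
same_samples = np.allclose(fast, slow)
energy_fast = np.mean(fast ** 2)
energy_slow = np.mean(slow ** 2)

# A 2 cycle/day component sampled 2 times/day aliases onto the mean itself.
folded_to_dc = sampled_cosine(2, 2)
print(same_samples, energy_fast, energy_slow, folded_to_dc.mean())
```

In the first case the amplitude and mean-square value of the samples are unchanged, only the apparent frequency differs. In the second case the continuous cosine averages to zero over the day, but its two samples average to 1, so a component at the sample rate shows up as an offset in the mean.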
William Ward January 16, 2019 at 2:49 pm
Actually, you are both wrong. 1sky1 is wrong that Shannon only applies to strictly periodic samples.
And William is wrong about max and min being just jittered signals.
Let me demonstrate. I have five years of hourly data for 30 US cities. Taking one at random, San Francisco, I calculate the monthly average of maxes and mins, as well as the true daily means. The error in the result is that the average monthly (min+max)/2 is 0.52°C warmer than the true monthly means.
Next, I do the same, but instead of taking the two daily temperatures as the maxes and mins, I take two hourly measurements at random from each day. When I repeatedly take the average of those, instead of 0.52°C, the average monthly error is 0.007°C.
This good result from the random picks is in agreement with Nyquist, as we are interested in monthly data and we have about sixty samples per month. Plenty of oversampling.
But that does NOT work with the (min+max)/2.
The problem that William appears not to see is that the min and max values are NOT just jittered samples. They are specially chosen samples, and as a result of the nature of the choosing and of the signal, additional error is introduced.
And this exemplifies what I have been saying. (Max + Min)/2 is a poor estimator of the true mean of a signal, and that is NOT a result of Nyquist. In my example above, in both cases we are sampling at sixty times the frequency of interest (sixty samples per month), but the (Max + Min)/2 still has huge errors and a true jittered sampling does not.
As I said at the start … William, you have the right problem (inaccuracy of the (Max + Min)/2) but the wrong reason (Nyquist).
w.
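The comparison described in the comment above can be sketched on synthetic data. Everything here is an assumption made for illustration: a 30-day month of hourly values built from an asymmetric daily shape plus a slow day-to-day drift, two random hourly picks per day drawn with replacement, and a fixed random seed. It is not the 30-city dataset and will not reproduce the 0.52 °C or 0.007 °C figures.

```python
import numpy as np

rng = np.random.default_rng(0)
days, trials = 30, 2000

# Synthetic month of hourly data: asymmetric daily shape plus slow drift.
hours = np.arange(24)
angle = 2 * np.pi * hours / 24
day_shape = 2.0 * np.sin(angle) + np.cos(2 * angle)
drift = 5.0 * np.sin(2 * np.pi * np.arange(days) / days)
temps = drift[:, None] + day_shape[None, :]          # shape (days, 24)

true_monthly = temps.mean()

# Historical method: monthly average of daily (max + min) / 2.
midrange_monthly = ((temps.max(axis=1) + temps.min(axis=1)) / 2.0).mean()
midrange_error = midrange_monthly - true_monthly

# Alternative: two hourly values picked at random from each day, repeated.
picks = rng.integers(0, 24, size=(trials, days, 2))
day_index = np.arange(days)[None, :, None]
random_monthly = temps[day_index, picks].mean(axis=(1, 2))  # one value per trial
random_error = random_monthly.mean() - true_monthly

print(f"(min+max)/2 error: {midrange_error:+.3f}")
print(f"random-pick error (average over trials): {random_error:+.3f}")
```

On this synthetic month the (min+max)/2 error is a fixed offset set by the asymmetry of the daily shape, while the random-pick error averages out toward zero over repeated trials. An individual trial of random picks still scatters around the true mean.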
“the min and max values are NOT just jittered samples. They are specially chosen samples, – Willis Eschenbach”
I totally agree with Willis and I’m also starting to realise what very special samples* they are! I think I may have been too hasty in bagging* Tmean now!
Min/Max is a very special kind of sample/selection because they are self “clocked” as it were – I think Nick Stokes said as much but with more precise language – because you are not taking two random samples; you are picking the peak and trough of a “signal” that just happens to have, on average, a 12 hour half cycle (in the real world) and you are doing this twice a day – at its frequency! So you are actually deliberately and perfectly accurately measuring wave height. It doesn’t matter if they came from hourly samples or min/max thermometers; this “selection” of just two discrete values is the same process, of course.
By this simple triangle a “diurnal wave” is very well defined or should I say confined and it’s not quite as hard to imagine why the errors in Tmean are not as large as I first thought they should be. I’m not saying Tmean is right! I’m just wriggling a little bit! 😉
Anyway, it is more to think about or have explained to me by the better qualified, on this long and intriguing post!
*Aussie slang for denigrate it, i.e. I might have been rrrrrrr, rrrr…wrong!
**Tmean(min+max)/2
I said nothing of the kind! What I said is that “Shannon’s sampling theorem applies to strictly periodic (fixed delta t) discrete sampling of a continuous signal.” This applies to all continuous signals, not just strictly periodic ones. Without strictly periodic sampling there can be no defined Nyquist frequency, 1/(2*delta t) and no bandlimited signal reconstruction, which is what Shannon’s Theorem is all about.
Of course random sampling of discrete ordinates will produce a closer estimate of the true mean of the signal than the mid-range value (Tmax + Tmin)/2, simply because the latter is a demonstrably BIASED estimator of the mean, due to the typically asymmetric wave-form of the diurnal cycle. The claim that “[t]his good result from the random picks is in agreement with Nyquist” is analytically nonsensical. Even a severely aliased data series will produce close, randomly sampled estimates of the signal mean, as long as that aliasing doesn’t extend into zero-frequency. Nyquist has little to do with the quality of UNBIASED estimates.
BTW, what no one noted throughout this entire discussion is that the true daily extrema also define the daily range Tmax – Tmin. This physically significant, practically useful metric is not readily available from any but the most highly oversampled data series.
Oops! The mid-range value is really (Tmax + Tmin)/2.
1sky1 January 17, 2019 at 1:06 pm
Fixed.
w.
1sky1 January 17, 2019 at 12:48 pm
My bad, I wrote “signals” when I meant “samples”. I meant to say:
“1sky1 is wrong that Shannon only applies to strictly periodic samples.”
Regards,
w.
What you wrongly meant to say I already covered, to wit:
Willis you said: “As I said at the start … William, you have the right problem (inaccuracy of the (Max + Min)/2) but the wrong reason (Nyquist)”
Quite so my friend.
I have pointed out to him that we all agree that (Tmax+Tmin)/2 as a substitute for mean is quite silly (your “right problem”), and that this Fundamental Flaw occurs in cases where we have not even sampled yet and may not ever, so aliasing as a cause would be – I guess, non-causal (your “wrong reason”).
Now he seems to say he has given up on both of us! (Agree to disagree?) Ha – that never got me out of a jam! He doesn’t seem to appreciate a friendly lifeline being tossed his way.
Stay well.
-Bernie
Sorry about the wrong place again.
Bernie said “To be even more clear, you would take two analog “peak detectors” (a diode, a capacitor, and an op-amp or two for convenience), one for (+) polarity relative to start and the other for (-) polarity. Reset both at midnight, and come back at 11:59:59 PM and read the outputs. There is no sampling.”
Have you written down a value or caused a value to be stored in a computer memory if so then you have sampled the signal. It really is that simple. Using an analog hold circuit to delay the point the sample is taken changes nothing other than makes it convenient for a human to read while still doing other tasks. I think the world has moved on with the invention of the microprocessor.
Bright Red,
You said: “Have you written down a value or caused a value to be stored in a computer memory if so then you have sampled the signal. It really is that simple.”
My reply: I wrote something similar to Bernie earlier but then deleted it. I already signed off with him and thought maybe it was bad form for me to come back with that. I was hoping you would! Thanks. It is so obvious. A peak detector is not much different than a sample and hold circuit but with the S&H it needs something to trigger it. An S&H is the first stage of many older ADC architectures. It was almost poetry that Bernie suggested his circuit, because he spelled out what we have been saying but he doesn’t seem to know it. Just like with the sample reduction point.
This word “sample” has people all spun up, but if you measure a signal you sample it.
Bright Red said just above: “Have you written down a value or caused a value to be stored in a computer memory if so then you have sampled the signal.”
Well – No. That would be the output of a peak detector, an extracted PARAMETER of the analog signal, quite a different thing than the samples of a proper time-series.
* * * * * * * * *
Bright Red also said on January 16, 2019 at 6:23 pm “What you have described is a classic case of aliasing that could be used as an example in 101.”
Can we possibly stop this insulting pseudo-condescending “signals 101” (I know it was William who started it). Having taught signal processing in a top-10 engineering school for over 40 years, and living in a college town, I have found it risky to assume that you are the smartest one in the room! Likewise for all posting on WUWT. [I myself am happy to try to explain to others material which I know VERY well.]
* * * * * * * * * *
On January 16, 2019 at 6:45 pm, I asked you to explain your position/thinking, and you didn’t even try. In fact, why don’t YOU just move on and answer my comment to William at Bernie Hutchins January 16, 2019 at 8:57 pm ?
Bernie – it’s nearly 3AM where I am, can I get a little time to sleep, work and come back to you tomorrow night? I have not given up on you. Or Willis – I’m going to try to come back to both of you tomorrow night. I’m at a critical point in a project and I have to make time to do this and do it with quality thinking. There is also the question of how prudent it is to continue when there is such a large gap in understanding. I have been thinking all night about a creative way to try to connect with you both on this. I’m not sure I’ll be successful and I think we should all reserve the right to pull away if we think further communication is not constructive but destructive. Stand by for tomorrow evening guys. Thanks and good night.
William, please reply at your convenience. We all have other things to do.
Best regards, get some sleep, we’ll pick this up again.
w.
OK, new day. Lying sleepless last night, I realized that I could demonstrate that the problem with the traditional way of determining the mean is NOT the Nyquist limit.
Here is a plot of RMS errors of monthly temperature averages for 30 US cities with respect to the accurate values we get from sampling every 5 minutes (288 samples per day). As you may recall from above, averages of hourly data are only very slightly less accurate than averages of 5-minute data. Here’s that comparison:
So instead of comparing to 5-minute samples, which I only have for one location, I compared to hourly samples, which I have for 30 US cities. As shown above, this will lead to only negligible error.
First, here are the results for evenly spaced samples:
A few comments. First, if the sampling interval divides evenly into 24 hours, the error is larger because the samples fall at the same times every day. These cases are indicated by the dashed line. So if we take samples every twelve hours, the error is larger than if we sample at either 11-hour or 13-hour intervals.
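The effect of the sampling interval locking onto the same clock times can be sketched on a synthetic month. The assumptions: an idealized daily shape repeated identically for 30 days, hourly resolution, and sampling that starts at the first hour. This is not the 30-city data.

```python
import numpy as np

# Synthetic month: the same asymmetric daily shape repeated for 30 days.
hours = np.arange(24)
angle = 2 * np.pi * hours / 24
day_shape = 2.0 * np.sin(angle) + np.cos(2 * angle)
month = np.tile(day_shape, 30)          # 720 hourly values
true_mean = month.mean()


def error_for_interval(series, interval_hours):
    """Error of the mean when keeping one sample every interval_hours."""
    return series[::interval_hours].mean() - true_mean


errors = {n: error_for_interval(month, n) for n in (11, 12, 13)}
for n, err in errors.items():
    print(f"every {n} hours: error {err:+.3f}")
```

A 12-hour interval always lands on the same two clock times, so whatever the curve happens to be at those times is all the average ever sees. Intervals of 11 or 13 hours walk through every hour of the day over the month, so the error is much smaller on this synthetic curve.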
Next, note the size of the traditional calculation of the mean, which is (minimum temperature + maximum temperature) divided by two. It’s the dot way up at the top of the graph … as I’ve said all along, the problem is NOT the Nyquist limit. It is that (min+max)/2 is a poor estimator of the true mean.
Now, could the problem with (min+max)/2 merely be because the samples are taken at different times during each day? Well, we can examine that too, by taking samples at random times. The figure below shows that result.
Once again, you can see that the problem is NOT the Nyquist limit. The (min+max)/2 error is still way larger than just taking two random samples every day. In fact, if we just took two random samples per day, that is about the Nyquist limit for monthly temperature averages … go figure. But the (min+max)/2 error is much worse than that.
My best to all, my thanks to William for both posting and defending his ideas.
w.
Hey Willis,
It’s 7:00 PM EST, I’m back at my computer. I have been thinking for the past 18 hours about another way to try to bridge the gap. I’m going to go off now and try to write it up. I don’t know how long it will take. Are you willing to continue? If not, send up a flag.
Here is one problem you are introducing with your analysis. You are looking at multiple stations and doing more averaging and statistics. Maybe not a perfect analogy, but you are studying the tree by looking at the forest. I want to focus on the tree first. Then move on to the forest. If you leave now, you will sadly not learn some important things about signal analysis – that might come in handy later. If the error from sampling problems diminishes or disappears when looking at longer periods of time or when averaging a large sample of stations this is different from no error being there in the first place. I’m not saying that is where we are headed, just trying to persuade you to consider that angle.
Note: I’m not a statistics guy. Maybe you can try to return the favor in the future with statistical analysis.
The first thing I thought about when I saw William’s first WUWT graph (his Fig. 1) was whether this was in some sense a typical warm-up-cool-down cycle. The second thing was that it was OBVIOUS even from his one example that (Tmax+Tmin)/2 could not possibly be a reasonable estimate of the mean. This was a Fundamental Flaw, and since they had some 288 actual samples, why would they NOT calculate the mean as the (sum-of-samples)/288 – the right way. OH – bureaucratic inertia – I forgot that.
Thirdly, I wondered what a typical daily temperature curve might look like. Willis obligingly (as is usual for him) crunched tons of data with averages provided for this thread here at January 15, 2019 at 9:40 pm. It clearly shows that the typical curve is more regular than we might have suspected, and that the (Tmax+Tmin)/2 mean is significantly above the true mean, and why (values on the positive side approach 1.5 while those on the negative side approach only -1.3).
What about the shape of the curve (Willis shows us two full cycles for clarity)? It looks a lot like a sine wave; but noticeably different. Here we could use a DFT (FFT) but since we already have a well-defined natural period (24 hours) a Fourier Series discussion is somewhat more familiar and easier to understand. http://electronotes.netfirms.com/AN364.pdf
Since it is clearly not a pure sine wave, but is periodic with a period of 24 hours, it has harmonics. Willis has also shown us here (at January 14, 2019 at 3:49 pm), using his preferred periodogram, that this is mostly fundamental, with a bit of 2nd and 3rd harmonics. That’s about it. So as he says, you would need to sample this at minimum sampling rate greater than 6 times/day (perhaps 24 times/day would satisfy most everyone).
So Willis has shown us the animals in the zoo. It could scarcely be more clear that any sampling issues can be controlled. It is further clear that the major error in using (Tmax+Tmin)/2 to represent the mean is a Fundamental Flaw (FF) that is the same, with or without sampling, so the FF is not (CAN NOT be) CAUSED by aliasing.
So, to be completely clear, the error in William’s aliasing column (Tmean C) of his Fig. 2 ARE due to aliasing, since he downsamples without first doing the required pre-decimation filtering. If done properly, the numbers would be -3.3, the mean being preserved exactly, for the 8 rows above the -4.7.
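The point about pre-decimation filtering can be sketched as follows. This is a minimal illustration, not a reconstruction of William's Fig. 2: the one-day record is synthetic, and a plain block average (boxcar) stands in for the pre-decimation filter. A boxcar is a crude anti-alias filter, but it does keep the mean exactly.

```python
import numpy as np

n_per_day = 288
t = np.arange(n_per_day) / n_per_day  # time in days

# Hypothetical one-day record (deg C) with mean -3.3: fundamental plus 2nd harmonic.
temps = (-3.3
         + 4.0 * np.cos(2 * np.pi * (t - 0.6))
         + 1.0 * np.cos(4 * np.pi * (t - 0.2)))

def decimate_naive(x, factor):
    """Keep every factor-th sample with no filtering (can alias)."""
    return x[::factor]

def decimate_boxcar(x, factor):
    """Average each block of `factor` samples, then keep one value per block."""
    return x.reshape(-1, factor).mean(axis=1)

for samples_per_day in (72, 24, 12, 4, 2):
    factor = n_per_day // samples_per_day
    naive_mean = decimate_naive(temps, factor).mean()
    boxcar_mean = decimate_boxcar(temps, factor).mean()
    print(f"{samples_per_day:3d}/day  naive mean {naive_mean:7.3f}  "
          f"filtered mean {boxcar_mean:7.3f}")
```

In this synthetic case the naive mean only goes wrong at 2 samples/day, where the 2nd harmonic lands on zero frequency. The block-averaged mean stays at -3.3 at every rate.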
I understand perfectly what William was trying to show (thinks is true!) but he was up against an apples/orange (FF vs. aliasing) problem at the very start.
Bernie,
You said: “Since it is clearly not a pure sine wave, but is periodic with a period of 24 hours, it has harmonics. Willis has also shown us here (at January 14, 2019 at 3:49 pm), using his preferred periodogram, that this is mostly fundamental, with a bit of 2nd and 3rd harmonics. That’s about it. So as he says, you would need to sample this at minimum sampling rate greater than 6 times/day (perhaps 24 times/day would satisfy most everyone).”
My reply: When sampling occurs at 2-cycles/day, where does the spectral image shift? Answer up and down by 2 cycles. What spectral image components land on (alias to) the trend components (near 0Hz or 0-cycles/day)? Answer: the near 2-cycle/day components.
Image from my Full paper (Fig8): https://imgur.com/DmXCBOt
What spectral image components alias to the daily trend (1-cycle/day)? Answer: the 1- and 3-cycle/day components. What did you say above about content at 2nd and 3rd harmonics?? How is it that you keep agreeing with me and then disagree with me? Perplexing.
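The folding described above can be checked numerically. This is a small sketch with assumed values: pure cosine tones, an arbitrary phase of 0.4 rad, and exactly uniform sampling at 2 samples/day.

```python
import numpy as np

fs = 2.0                              # samples per day
days = 10
t = np.arange(int(days * fs)) / fs    # sample times in days: 0, 0.5, 1.0, ...

def tone(cycles_per_day, phase=0.4):
    """Cosine of the given frequency evaluated at the 2-per-day sample times."""
    return np.cos(2 * np.pi * cycles_per_day * t + phase)

one = tone(1.0)
two = tone(2.0)
three = tone(3.0)

# A 2 cycle/day tone looks constant at these sample times: it lands on 0 cycles/day.
print("2 cycles/day samples:", np.round(two[:4], 4))

# A 3 cycle/day tone gives the same samples as a 1 cycle/day tone.
print("max |3 cpd - 1 cpd| :", np.max(np.abs(three - one)))
```

This only shows where the content lands at 2 samples/day. How much energy a real temperature record has at those frequencies is a separate question that the sketch does not address.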
Did you look at any FFTs of any signals? How far out in frequency do these signals go? Answer: infinity. But how much of that seems to affect things like mean calculation below 0.1C variability? Answer: 288-samples/day seems to be the rate at which we reach this limit so that means 144-cycles/day. If you just can’t handle that then let’s divide that by 4 and we have 36-cycles/day. That would mean we have to sample above 72-cycles/day. Why do you think you can ignore the energy above the 3rd harmonic?
This image shows the overlap at a sample rate of 12-cycles/day. Can you visualize how much more overlap there is at 2-cycles/day?
https://imgur.com/xaqieor
It’s not about “satisfying most everyone”, it is about capturing the frequency content that any day at any station can produce. Looking at what the average is doesn’t tell you what the maximum requirement is. Through experimentation that value can be decided upon and anti-aliasing filters built into the sampling system to match.
Bernie said: “So, to be completely clear, the error in William’s aliasing column (Tmean C) of his Fig. 2 ARE due to aliasing, since he downsamples without first doing the required pre-decimation filtering. If done properly, the numbers would be -3.3, the mean being preserved exactly, for the 8 rows above the -4.7.”
Bernie, you have this magical way of disagreeing with me while simultaneously proving my point. So you agree that my table in Fig 2 shows the aliasing error increasing as sample rate decreases to 2-samples/day. What I show is what would happen if you sampled all of the spectrum without using an anti-aliasing filter and a low sample rate. What if the max and min temps just line up perfectly with those 2 samples spaced at 12 hours apart? Is it aliasing?
If an SRC was used to downsample to 2-samples/day, you would be filtering out content. There would be no aliasing but you would not have the same signal. You would have a sine wave. Are you sure you would preserve the -3.3C mean through all steps? Would it even be close?
Finally, what is the “OBVIOUS” Theorem? How does one implement it? And what is the Fundamental Flaw (FF) identity? How does one go about learning it and using it?
Continuing questions for William
William Ward said January 17, 2019 at 12:02 am ” It is so obvious. A peak detector is not much different than a sample and hold circuit but with the S&H it needs something to trigger it. An S&H is the first stage of many older ADC architectures. It was almost poetry that Bernie suggested his circuit, because he spelled out what we have been saying but he doesn’t seem to know it. “
[5] Of course I don’t “know it” – only you know things! Oh wait – I do know one thing: a PD and a S&H are quite DIFFERENT animals. I suggested two parallel-polarity PDs as being the same idea as a classic min/max thermometer. Did you miss that point, or is the hole in your knowledge a bit larger than you suppose?
William Ward said January 17, 2019 at 10:23 pm
“Finally, what is the “OBVIOUS” Theorem? How does one implement it?”
[6] You do know the meaning of “obvious” I assume – you used it yourself (in error) as quoted above. I use it as in “obviously estimating a mean as (Tmax + Tmin)/2 is asking to be wrong whether the signal is continuous or discrete.”
“And what is the Fundamental Flaw (FF) identity? How does one go about learning it and using it?”
[7] It is recognizing that (Tmax+Tmin)/2 is already bogus for a continuous-time signal and remains essentially flawed for the same reason if sampled.
-Bernie
Willis
You said, “Lying sleepless last night, I realized …” I’m glad to see that I’m not the only one afflicted with the inability to shut my mind down. 🙂
Willis,
You said, “Next, note the size of the traditional calculation of the mean, which is (minimum temperature + maximum temperature) divided by two.” PLEASE don’t continue to call it a mean. Call it an average if you must, but “mid-range value” is more accurate.
https://sciencing.com/calculate-midrange-7151029.html
The truly effective reason is because such sampling rates alias some harmonic components of the asymmetric diurnal wave-form into zero-frequency, i.e. into the mean value.
William: Interesting work, but far from demonstrating that a real problem exists.
You have demonstrated a significant difference between the mean temperature recorded every five minutes and the conventional average of Tmax and Tmin. However that difference will vanish when you take temperature anomalies. What we want to know is the temperature trend, not absolute temperature.
In Figure 7, you showed the bias in the trend removed by using the true average temperature rather than the convention average of Tmin and Tmax. That bias averaged 0.066 K/decade over 26 stations. Is that a significant difference? You avoided showing us any useful information about trends at all.
If I go to Nick Stokes’ trend viewer and look at NOAA’s GLOBAL land temperature trends from 1/2007 to 12/2017, I find a trend of 0.43 K/decade with a 95% ci of 0.14 to 0.71 K/decade. Now the trends for the US could be very different, but local fluctuations are likely to be bigger than global fluctuations and the confidence interval would likely be wider. So you appear to have found a bias of 0.066 K/decade in trends with a confidence interval that is at least 0.5 K/decade wide. From this perspective, you haven’t come close to identifying a significant problem.
In Figure 6, if you had plotted individual annual temperatures, your readers would have seen year-to-year temperature changes of perhaps 1 K – in addition to the trend lines you presented. Readers would have immediately seen that the difference in trend was trivial compared to the noise in the data and uncertainty in the trend. Hopefully, the absence of such relevant information was an oversight. Note that the trend was greater than +1 K/decade at the location where you found the greatest bias – 0.24 K/decade; a bias that was a small fraction of the trend. And we probably would have seen some years where the average of Tmax and Tmin was lower rather than higher than the continuous average. This would be even more likely with monthly data, but using monthly data might require dealing with temperature anomalies.
If we had 2-4 decades of USCRN data, a bias of 0.066 K/decade in the trend could be comparable to the uncertainty in the trend and therefore significant. Over a century, the bias could amount to 0.7 K more warming when measured by the average of Tmax and Tmin. That would certainly be non-trivial, but extrapolation is somewhat absurd given the uncertainty. Which leads to the interesting question: How do AOGCMs calculate warming? They have 96 readings per day (every 15 min). Do they average all 96 readings per day (my guess) or average the highest and lowest?
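The arithmetic behind those comparisons is short enough to write out. The sketch below uses only the figures quoted in this comment; the variable names are illustrative.

```python
# Figures quoted in the comment above.
bias_per_decade = 0.066        # K/decade, mean trend bias over 26 stations (Fig. 7)
ci_low, ci_high = 0.14, 0.71   # K/decade, 95% ci of the 2007-2017 global land trend

ci_width = ci_high - ci_low                # width of the confidence interval
bias_per_century = bias_per_decade * 10    # naive extrapolation to 100 years
bias_to_ci_ratio = bias_per_decade / ci_width

print(f"ci width           : {ci_width:.2f} K/decade")
print(f"bias over a century: {bias_per_century:.2f} K")
print(f"bias / ci width    : {bias_to_ci_ratio:.2f}")
```

The interval is about 0.57 K/decade wide, so the quoted bias is roughly a ninth of it, while the century extrapolation gives 0.66 K, which rounds to the 0.7 K mentioned above.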
Frank,
” Now the trends for the US could be very different”
I did some analysis of OLS trends for USCRN here. The sd of the trends is about 8 °C/Cen, giving an uncertainty of mean trend of about 0.8°C/Cen, on an OLS basis. They aren’t independent, so the true uncertainty would be higher. The uncertainty σ is higher than the trend difference, so the sign of that difference (bias) is not significant.
Frank,
You said, “What we want to know is the temperature trend, not absolute temperature. ” Well, actually, with the recent concern shifting to energy accumulation (and where it might be “hiding”), we do need to know about ‘absolute’ temperatures.
William Ward January 17, 2019 at 4:13 pm
I’m willing to continue … kinda …
Let me review the bidding here.
You came in and your first claim was that to sample temperature, Nyquist said that we have to sample it at 2X the highest frequency, viz:
I pointed out that the climate is a chaotic signal with frequencies that have periods all the way down to seconds … so how are we supposed to sample that? And Nick Stokes objected as well, saying:
Despite that, you have not yet admitted that your initial claim was wrong.
You next claimed that we need to sample at 288 cycles per day to be above the Nyquist limit. Then you said well, no, we only need to sample at a frequency where further increases don’t provide a significant improvement.
I showed that 5-minute sampling is only trivially better than hourly sampling.
You have not yet admitted that your claim about needing to sample at 288 cycles per day was wrong.
You then claimed that Nyquist applied to the (max+min)/2 because it was just a jittered sample.
This morning I posted up a graph showing that no, jittered samples do much, much better than (max+min)/2. Not only that, but I also showed that two regularly spaced samples per day also do much, much better than (max+min)/2.
Which totally supports my claim, which was that the problem is real ((min+max)/2 is a poor estimator of the mean) but it is NOT because of your imaginary violation of Nyquist.
No reply.
And now you claim you have something to teach me?
Perhaps you do, I can learn something from most people …but so far all I’ve learned from you is that you are an arrogant man who thinks he knows more than everyone else, and who is unwilling to admit it when he makes an error.
And so far, I haven’t learned one damned thing from you about signal analysis. EVERYTHING that you’ve said so far I knew already. No surprises, nothing new.
You continue …
I know that, William … but again, we’re looking at CLIMATE here, which is generally taken to be the average of weather over 30 years or more. So yes, in general, we are indeed averaging a large number of stations over a long period. Which means that once again, your objection is a difference that makes no difference.
Both Nick Stokes and I tried unsuccessfully above to point out that we’re not trying to reconstruct a temperature signal. Reconstructing a signal is a very common purpose of A/D analysis-we want to sample a song so that we can reproduce it in a digital format.
But that’s not what we’re doing in climate. In climate, we are simply trying to get averages and trends out of a mass of error-ridden data with heaps of gaps and problems.
And as such, we don’t care much, and we don’t need to care much, about a lot of things that are critically important when you’re trying to reconstruct a signal.
I see this problem with signal guys all the time. You think that climate signals are like a simple superposition of sine waves. Nothing could be further from the truth. Climate signals have pseudo-cycles that appear and disappear at random, only to be replaced by some other pseudo-cycles. In addition, they are hugely damped, so the usual kinds of things like aliasing and resonance and cross-talk are either greatly reduced or absent altogether.
For example, in the lab you have signal amplifiers, and notch filters, and frequency doublers, and multiplexers, and regenerative circuits, and bandpass amplifiers, and beat-frequency oscillators, and heterodyne receivers.
But in nature, those are very, very rare.
And as a result, you waltz in here and start babbling about sampling at 2X the highest frequency in a temperature signal, and I just roll my eyes and think “Here we go again, another signals guy who thinks he’s God …”.
So yes, William, I’m willing to proceed … but only if you start by admitting that your claims to date have often been wrong. No, for our purposes we don’t have to sample at 2X the highest frequency in a temperature signal. No, the min plus max over two is not just another jittered signal. No, we don’t need to sample at 288 cycles per day, once per hour is quite adequate.
And finally, (min+max)/2 is a poor estimator EVEN IF YOU ARE SAMPLING WELL ABOVE THE NYQUIST RATE! Your fundamental claim is wrong—the problem with (min+max)/2 has nothing to do with Nyquist. It is inherent in the sampling method.
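That last claim can be illustrated with a signal that is band-limited by construction. The sketch below is an assumption-laden toy, not station data: the day consists only of a fundamental and a 2nd harmonic, so 288 samples/day is far above the Nyquist rate for it (anything over 4 samples/day would do).

```python
import numpy as np

n = 288
t = np.arange(n) / n                  # time in days
w = 2 * np.pi * (t - 15.0 / 24.0)     # phase chosen so the peak falls at 15:00

# Hypothetical band-limited day (deg C): sharp afternoon peak, broad night-time trough.
temps = 10.0 + 5.0 * np.cos(w) + 1.0 * np.cos(2 * w)

true_mean = temps.mean()                        # mean of all 288 samples
midrange = (temps.max() + temps.min()) / 2.0    # the (Tmax+Tmin)/2 estimate

print(f"mean of 288 samples: {true_mean:.3f} C")
print(f"(Tmax+Tmin)/2      : {midrange:.3f} C")
print(f"difference         : {midrange - true_mean:.3f} C")
```

Here the mid-range sits a full degree above the mean even though the record is heavily oversampled, because the 2nd harmonic adds to both the maximum and the minimum. Whether one chooses to call that offset aliasing is exactly what is being argued in this thread.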
So … are YOU willing to continue? If not, send up a flag …
And as always, my thanks for your willingness to put forward your ideas and defend them,
w.
1200 words below – the shortest I could make this explanation.
Hey Willis, Bernie,
My simple request is to open your mind to what I present here and let’s focus on the following before taking these concepts back to the bigger picture.
By definition: if you are working with an analog signal and you take a measurement of it that is a sample. It doesn’t matter how you get that discrete number. Analog signals are continuous and digital signals are comprised of samples. I see a lot of struggle with the definitions.
There has been a lot of talk about strict periodicity. Let’s discuss that further. Nyquist is about the equivalency of 2 domains: analog and digital. Specifically, it is about the equivalency of signals in the analog and digital domain. Nyquist is the bridge. What I think people are losing sight of – or better said: have never gained sight of, is that the digital samples are representative of a signal. The folly is thinking that a “frame of mind” (“…I’m just looking at extrema…”) dissolves that bond the digital samples have with a signal. Those samples will forever be associated with a signal. Math on signals MUST comply with laws governing signals – if you want that math to apply to the signal. If the samples are obtained through complying with Nyquist, then the samples will represent that signal in the digital domain equivalently to the analog signal in the analog domain. HOWEVER, if those samples were obtained in violation of Nyquist, then the samples DO NOT represent the original signal from the analog domain. They CANNOT. Let me develop this further.
When you have samples and you do any mathematical operation on them (adding, subtracting, dividing, multiplying, integrating) you are doing DSP on them. (I know, fancy word for just adding…) Your DSP in the digital domain must ALSO adhere to Nyquist! Nyquist isn’t just about getting from analog to digital. If you start with a signal, properly sample it, then in the digital domain you MUST use all of those samples in the same timing relationship or you introduce Nyquist related error. If you want to reduce your samples, then you are reducing your sample-rate. You must do so properly according to Nyquist, using a sample-rate converter. This involves filtering out frequency content that cannot be supported by your new sample-rate via digital filtering and then you can reduce the samples properly. Example: You have a signal composed of a 1Hz and a 2Hz sine wave mixed. You sample it at 20Hz (20sps). Then you have 20 samples in 1 sec. The bandwidth is set by the 2Hz signal with a period of 0.5s. You have 10-samples for each half-second period. You are complying with Nyquist. Your samples represent your analog waveform. With high quality converters you can convert back and forth between the domains multiple times before converter related performance starts to degrade the signal quality. For each half-second period, if you keep the 1st and 6th samples and simply discard the others and then try to do mathematics on those samples then you have just violated Nyquist. Those 2 samples per half-second period CANNOT and WILL NOT represent your original signal. If you digitally filter out the 2Hz tone, then you can reduce samples according to a process called decimation. You will need to stay at or above 3sps to not violate Nyquist, if your bandwidth is 1Hz.
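The 1Hz + 2Hz example can be tried directly. This is a sketch under assumptions the text does not state: both tones are unit-amplitude sines starting at zero phase, and two seconds of data are used.

```python
import numpy as np

fs = 20                              # samples per second
t = np.arange(0, 2 * fs) / fs        # two seconds of sample times
x = np.sin(2 * np.pi * 1 * t) + np.sin(2 * np.pi * 2 * t)

# Keeping the 1st and 6th sample of every 10-sample (half-second) block
# is the same as keeping every 5th sample, i.e. an effective rate of 4 sps.
kept_t = t[::5]
kept = x[::5]

# What the kept samples would be if the 2 Hz tone had never been there.
only_1hz = np.sin(2 * np.pi * 1 * kept_t)

print("kept samples :", np.round(kept, 4))
print("1 Hz alone   :", np.round(only_1hz, 4))
```

With these phases the 2Hz tone is sampled only at its zero crossings, so the kept samples are indistinguishable from a 1Hz sine alone. With a different phase the same tone would appear as an alternating +/- term instead, which is why content sitting at half the sample rate cannot be pinned down from the samples.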
Using the same example (1Hz + 2Hz sampled at 20sps) you discard all samples per half-second period except samples 1, 4, 5, 6 and 7. If you take this new set of samples and run them through a DAC, then you would get the equivalent signal that this aliased signal represents. Any math you do on the modified digital signal will give you the results for the analog signal you created with the DAC. The resulting new analog waveform could be fed into a spectrum analyzer and you could see what new frequencies and amplitudes you created. If you do math on this asymmetrical sample set, you will get mathematical results, but they will not have an accurate relationship to the original signal.
If we take 288-samples/day from USCRN and discard all but the max and min samples for each day, that step alone is a violation of Nyquist! Whether people reading this like the language, are comfortable with the language or not, your 288-samples/day are a signal. The 288-samples/day represents the actual event that took place on Earth in the analog domain. The moment you select the max and min samples you have just changed the signal you are working with! If you took those 2-samples/day and fed them through a DAC, then you would reveal to yourself the new signal you are working with!
Now, you might say, but I didn’t start with the USCRN data. I just read a max/min thermometer. Ok, let’s explore this. The thermometer did the work of “effing up” your signal so you didn’t have to! You might say, well how could I pass it through a DAC to find out what it looks like? What frequency would I use?!? Good question! You can’t do this experiment because the information required is forever lost. But that fact doesn’t erase the problem. The experiment I did with USCRN allows you to compare the results using examples from that database. You can take the 288-samples/day data and feed it into a DAC and see what the temperature signal looks like. You can analyze this with a spectrum analyzer. You can also analyze spectrum in the digital domain. In this case you know what the ADC is set to. You can take the 2 max min samples and run them back through a corresponding DAC and see the difference. The analog signals you get with 288-samples and the 2 max/min samples are very different. Their spectrums are different. DSP done on the samples give you different results. My study gives you mathematical proof of this. Figures 1, 4 and 5 speak to this. Integrate the error over any timeframe you wish, 1 day, 1 week, 1 month or 1 year. You will see the accumulated error. This error varies over time in the integration. The error on a daily basis can swing between +/-4C, at least for what I saw at a few stations. [Start with individual stations and don’t go for averaging everything from the start.]
Some have said that it isn’t Nyquist! It’s just a bad method to use (Tmax+Tmin)/2! Well then why is it bad? I have not seen a competing answer. It is bad because the act of discarding the samples except the max and min are violations of Nyquist. Calculating the mean value of the properly sampled signal and comparing it to (Tmax+Tmin)/2 shows this clearly. The Nyquist compliant method is unassailably correct. It represents the original signal. The historical method is not correct because it represents some other signal.
When we plot long term trends using both methods and USCRN data we see absolute value differences and trend differences. Some stations have larger error than others. There is a correlation between the shape of the time domain signal and the size and sign of the error. Signal shape means spectral content. Now, it might be the case that over time this error averages out or averages down in absolute value. I propose that it might also be that the error acts as a dithering signal (adding broadband Gaussian noise in return for reduced quantization error). And/or the spectral content of some/many signals doesn’t have enough content where it can cause aliasing damage to the long term signal and/or the phase relationship of the aliasing results in minimal impact.
Summary: By definition, a digital value extracted from an analog signal is a sample. Nyquist is about signal equivalency across digital and analog domains. Digital signals must comply with Nyquist just as analog signals do. Discarding samples is a violation of Nyquist. All math on samples is DSP. (Tmax+Tmin)/2 is a violation of Nyquist. There is ample support from the 26 stations that aliasing error is present. Impact and significance of this error across the 26 stations and more stations is up for further study.
Comment added after reading Willis’ long complaint about my essay: Let’s see if this information helps. I didn’t write everything possible about signal analysis in the 1900 word essay. All that I said is correct and can be clarified if we get a basis of understanding and slow it down enough for me to address with quality responses. If the above doesn’t help we can conclude. Thank you.
“When you have samples and you do any mathematical operation on them (adding, subtracting, dividing, multiplying, integrating) you are doing DSP on them. (I know, fancy word for just adding…) Your DSP in the digital domain must ALSO adhere to Nyquist! Nyquist isn’t just about getting from analog to digital. “
Just not true. Especially the last sentence. Nyquist is about getting from digital to analogue, only. Not analog to digital. You have a bunch of numbers derived from periodic readings of…something. You can add, multiply or whatever. Nothing yet about Nyquist. That comes when you try to derive some property that depends on relating those discrete values to a continuous function.
Now this is where your EE tunnel vision comes in. You want to demand that the conversion to continuous (analogue) happen via fitting periodic (trig) functions. Well, you can do that, but it’s far from the only way. If you do do it, you’ll probably do a DFT, and take the numbers to be coefficients of trig functions, inverting the DFT into the continuous trig function domain. Then you get the complication that the DFT maps into the sub-Nyquist domain, and so might alias.
But what is actually done? People calculate a month average of the discrete data. There is no direct assumption of conversion to analogue there. There is an indirect assumption that it isn’t just a result for those samples, but is a measure of temperature that would persist if you sampled some other way.
I disagree with Willis and others saying that (Tmax+Tmin)/2 is wrong because it is not a good estimate of mean T. No-one claimed it was. They are different indices of temperature. The point of my continuous linking to my Boulder analysis is that it is well understood that min/max isn’t even one index. There is a spread of answers depending on when the thermometer is read. So of course they can’t all agree with the integrated average. But the point is that they are all offset by fairly constant amounts. So they are pretty much equivalent when you take anomalies. And those offsets do not bias trends, or other things you might calculate from anomaly.
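The statement about constant offsets can be checked in a few lines. The sketch below assumes exactly what the paragraph assumes, namely an offset that does not change over time; the annual series, the 0.6 C offset and the random seed are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(11)  # 11 hypothetical annual values (year index 0..10)

# Hypothetical "integrated mean" series: small trend plus year-to-year noise.
true_mean_series = 12.0 + 0.03 * years + 0.2 * rng.standard_normal(years.size)

offset = 0.6  # assumed constant difference between min/max average and true mean
minmax_series = true_mean_series + offset

def anomaly(x):
    """Departure from the series' own long-term average."""
    return x - x.mean()

def ols_trend(x):
    """Ordinary least squares slope per year."""
    return np.polyfit(years, x, 1)[0]

print("max anomaly difference:",
      np.max(np.abs(anomaly(minmax_series) - anomaly(true_mean_series))))
print("trend (true mean)     :", ols_trend(true_mean_series))
print("trend (min/max)       :", ols_trend(minmax_series))
```

A strictly constant offset drops out of both the anomalies and the trend. The sketch says nothing about whether the real offset at a given station is constant over time, which is the part under dispute.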
In fact, the task of getting that consistent monthly average is just numerical integration. And your Fourier method can do it, but is not the only way, nor even a commonly used one.
As 1sky1 keeps pointing out, there are two other big issues that you neglect:
1. The sampling isn’t periodic. In fact, we don’t even know the sample times, except within 24-hr bounds
2. The samples aren’t chosen by time, but by value. This completely messes up your “signal analysis” anyway.
I might add that strict periodicity is important. It creates the ambiguity by always sampling at the same point in the cycle, which is a special case. You actually cumulatively can get a lot more information from the same number of points if you jitter the sample time, providing there is periodicity in the signal.
“is not a good estimate of mean T. No-one claimed it was.”
OK, I’d better clarify that. mean T means mean derived from integration – the limit of frequent sampling. The latter is an index, min/max is a different index. You need to see what their properties are, especially after taking anomalies.
Nick said: “Just not true. Especially the last sentence. Nyquist is about getting from digital to analogue, only. Not analog to digital.”
My reply: Nick, did you actually say that Nyquist isn’t about getting from analog to digital? Did you have a typo? Did you mean to say that? Do you mean that when we sample analog signals we don’t have to comply with Nyquist?
Nick said: “You have a bunch of numbers derived from periodic readings of…something. You can add, multiply or whatever. Nothing yet about Nyquist. That comes when you try to derive some property that depends on relating those discrete values to a continuous function.”
Sure you can pull samples from a signal in a way that violates Nyquist and do math on them – just don’t expect your results to apply accurately to the original signal. If the aliasing that results is “small” then this is because the content+phase that aliases is “small”. It isn’t because the process is a correct one. If you tried the same process with another type of signal where the content at the potential alias frequencies was “large” then you would run into reality of sampling.
“Nick, did you actually say that Nyquist isn’t about getting from analog to digital? Did you have a typo?”
I was going to ask that of you. Nyquist applies to the mapping from a set of sampled numbers to a set of continuous functions, and not before. If you choose trig functions as your basis functions, you may run into a situation that what is really (based on information external to what you observe) a sinusoid beyond the Nyquist frequency, is indistinguishable from an alias – a sine of lower frequency. That is a problem with that particular mapping.
But there are many other ways you could do the mapping (discrete to continuous). You could use finite element basis functions. You could use LOESS. You could use wavelets. It’s true that all will have limitations in representing possible rapid changes between samples. The question then is, what harm does that limitation do to whatever it is you are trying to estimate.
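One of those alternative mappings can be sketched briefly. Piecewise-linear interpolation is the simplest finite element basis (hat functions), and integrating it is the trapezoidal rule. The hourly readings below are synthetic, and `trapezoid_mean` is a helper written for this example.

```python
import numpy as np

hours = np.arange(0.0, 25.0, 1.0)   # hourly sample times over one day, 0..24
temps = 10.0 + 5.0 * np.cos(2 * np.pi * (hours - 15.0) / 24.0)  # hypothetical readings

# Discrete-to-continuous mapping with piecewise-linear (hat function) basis.
fine = np.linspace(0.0, 24.0, 2401)
recon = np.interp(fine, hours, temps)

def trapezoid_mean(t, y):
    """Time-average of y over [t[0], t[-1]] using the trapezoidal rule."""
    area = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t))
    return area / (t[-1] - t[0])

mean_from_samples = trapezoid_mean(hours, temps)
mean_from_recon = trapezoid_mean(fine, recon)

print(f"daily mean from hourly samples       : {mean_from_samples:.4f} C")
print(f"daily mean from linear reconstruction: {mean_from_recon:.4f} C")
```

The two numbers agree because the average of the piecewise-linear reconstruction is, by construction, the trapezoidal average of the samples. How well either one tracks the underlying temperature depends on what happens between the samples.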
“just don’t expect your results to apply accurately to the original signal”
There is no original signal. One indicator of the fantasy of the EE approach here is the notion that you should pre-filter the analogue signal. How on Earth do you do that? You don’t have an analogue signal running down wires. The sampled values are your starting point.
Hi William
I have found your arguments sound and words well said but it seems to no avail.
I had to have a chuckle about how special some seem to think the air temperature is with the word chaotic being used. So I offer this example of a vertical accelerometer in an off road racing buggy. Some of the inputs to the sensor are engine vibration (rpm and throttle based with lots of harmonics), terrain undulations at all 4 wheels applying what I would call chaotic input (0-10g, 0-50Hz) based on the terrain profile, lateral and longitudinal change in motion plus many more. Now let’s go for a drive around a track. The waveform will look like it repeats but there are lots of variations as the driver takes a different line hitting different bumps and being off road the surface itself changes every lap.
/sarc on
Now let’s use this vertical G data to work out if there is a change in gravity and rather than use the full Nyquist compliant data we will do it with just the min and max per lap without knowing where they occurred and expect the same result as using the full properly sampled data.
/sarc off and any replies to its content will be ignored.
Hey Bright Red,
Thank you so much for your comments. If you were nearby I’d buy you a beer! (or whatever you prefer to drink.) Yes, an air temperature signal would be one of the least challenging signals from the natural world that I can think of regarding sampling or processing. I love your example. Especially the sarc modulated comment! Is racing a hobby of yours or do you work on electronics for that application? Either way it sounds like fun. Here is another real world application: Hybrid-Fiber Coax Cable (Cable TV). Take a coax cable with a 1GHz signal bandwidth, comprised of 6MHz channels, variable modulation profiles of 256-QAM to 8192-QAM. Add multi-carrier transmission support (OFDM) and then with 1 input and 1 integrated circuit, sample the entire spectrum well enough such that you can simultaneously tune, demux, demodulate and transcode 128 channels. What could be more chaotic than a scene change from day to night or a bunch of football players in motion on the screen?
Hi William
Appreciate the thought but I doubt I am near you.
Unfortunately I am unable to use my real name or identify my industry as I have sold my interest in an international company I co-founded and have agreements in place as the new owner was paranoid about any comments I make online. Fortunately the restrictions expire later this year. I designed almost all the hardware and programmed a lot of the firmware of the products the company manufactured.
Another good example of real world signals. Yep temperature is at the yawn end of data acquisition.
Hi William
It should be me that buys you a quantity of your favourite beverage for the considerable effort you have put in to this important topic.
Hi Bright Red,
Congratulations on that! I hope it was a good deal for you and either set you up for your next company – or lots of time doing your favorite hobbies.
I hope to learn more about your work after you can “decloak” when the restrictions expire – should you choose to reveal more information.
Either way, I hope to see more of you here, and get the chance to interact with a fellow engineer.
I’m working on another paper (or 2) using basic thermodynamics to dispel Alarmists’ concerns about catastrophic ice caps melt.
Now I know I am mad for posting this but as I was beaten by William in replying to the post that had the quote and have no skin in the game I thought why not as it is this statement and the response that seems to be getting in the way of a robust debate about the actual topic.
Willis said: “Actually, the Nyquist theorem states that we must sample a signal at a rate that is at least 2x the highest frequency component OF INTEREST in the signal.”
Which as it stands and without any further clarification is not correct and if data was collected in this way would result in aliasing if the signal has frequency components greater than the highest frequency of interest. Adding a proviso would fix it and this is my addition to what Willis said.
“Actually, the Nyquist theorem states that we must sample a signal at a rate that is at least 2x the highest frequency component OF INTEREST in the signal” provided we use a suitable ANTI-ALIAS filter to remove all frequencies above the highest frequency component OF INTEREST in the signal.
Now in reality there are always higher frequencies present in any signal so a good design would start with EMI (Electro Magnetic Interference) filtering to get the frequencies down to where normal filters will actually work. From there set the specification for your maximum frequency of interest and acceptable anti-aliasing level.
Flame me if you like but it would be good to see this topic back on track.
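The proviso about the anti-alias filter can be illustrated with a small sketch. Everything in it is an assumption made for illustration: a synthetic one-day signal at 288 samples/day with a true mean of 10, a 1 cycle/day component of interest, and a 24 cycles/day component above the band of interest. The filter is a crude 12-sample moving average standing in for a properly designed anti-alias filter, and the day is treated as periodic.

```python
import math

N = 288  # samples per day at the original rate
M = 12   # decimation factor, giving 24 samples/day

def signal(n):
    theta = 2 * math.pi * n / N
    # 10.0 is the true mean; 1 cycle/day is the component of interest;
    # 24 cycles/day sits above the band of interest.
    return 10.0 + 5.0 * math.cos(theta) + 2.0 * math.cos(24 * theta)

def mean(values):
    return sum(values) / len(values)

x = [signal(n) for n in range(N)]

# Decimate with no filter: the 24 cycles/day component lands on DC.
naive = x[::M]

# Moving average over M samples (circular), then decimate.
filtered = [mean([x[(n + k) % N] for k in range(M)]) for n in range(N)]
decimated = filtered[::M]

print(f"mean at 288 samples/day:        {mean(x):.3f}")
print(f"mean at 24/day, no filter:      {mean(naive):.3f}")
print(f"mean at 24/day, filtered first: {mean(decimated):.3f}")
```

The moving average works here only because the unwanted component has a period of exactly 12 samples; a real design would set the filter from the maximum frequency of interest and the acceptable aliasing level, as described above.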
Hi Bright Red,
I’m replying to your post where you start: “Now I know I am mad for posting this…”
Thank you for the very clear summary. I agree with all that you said.
Hey Bright Red,
I have found your arguments sound and words well said but it seems to no avail.
I reckon part of the issue comes from semantic exercises. A couple of people argue that the error has nothing to do with Nyquist but only with the particular way of averaging, namely calculating daily midrange values (Tmin+Tmax)/2. Well, you can say that this way of choosing daily samples and averaging them creates distortion to the original temperature signal. And this distortion is the source of error whereas using daily midrange values is the method by which the error is induced. So obviously it has something to do with Nyquist. Below is an example of the temperature readings every 5-min (red curve) superimposed on the daily midrange values (Tmin+Tmax)/2 – blue curve. The midrange curve somehow retains some attributes of the reference signal but is also clearly distorted compared with the original signal:
Midranges vs subhourly
So for me, the assertion that ‘this error has nothing to do with Nyquist’ is not very convincing.
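The gap between the daily midrange and the mean of the 5-minute readings can be reproduced with a minimal sketch. The daily curve below is synthetic and hypothetical (a fundamental plus a second harmonic, chosen only to make the day asymmetric); it is not station data, and the size of the difference it prints says nothing about any real site.

```python
import math

SAMPLES_PER_DAY = 288  # 5-minute sampling

def temperature(n):
    """Hypothetical asymmetric daily cycle in degC (not station data)."""
    theta = 2 * math.pi * n / SAMPLES_PER_DAY
    return 10.0 + 5.0 * math.cos(theta) + 2.0 * math.cos(2 * theta)

day = [temperature(n) for n in range(SAMPLES_PER_DAY)]

mean_5min = sum(day) / len(day)       # average of all 288 samples
midrange = (max(day) + min(day)) / 2  # (Tmax + Tmin) / 2

print(f"mean of 5-minute samples: {mean_5min:.2f}")
print(f"midrange (Tmax+Tmin)/2:   {midrange:.2f}")
print(f"difference:               {midrange - mean_5min:.2f}")
```

For a perfectly symmetric day (a single sine wave) the two values coincide; the difference appears as soon as the waveform is skewed.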
I would not say that the error has “nothing to do with Nyquist”. However, it is also not, as William claimed, just “jittered” samples.
The error comes from a couple sources, both Nyquist and the curious nature of the taking of the signals.
w.
Hi Paramenter,
I have to say I’ve enjoyed reading your post on this topic.
There is no doubt in my mind that Nyquist comes into play when sampling a signal. The only question is by how much and is that amount at any time or location going to be a problem.
As an Electronic Design Engineer I find it unacceptable that there be any issue at all around meeting Nyquist when collecting or processing the data as the additional cost of doing so is for all practical purposes zero. It also seems to me that if you can, at no additional cost, reduce one of the many potential errors involved in recording temperature data to zero, why wouldn’t you just do it? It seems that with 288 samples/day USCRN have the same thoughts.
William – you misunderstand so many things I will just try to handle them a few at a time. Four here.
[1] William Ward January 17, 2019 at 10:23 pm said in part: “Did you look at any FFTs of any signals? How far out in frequency do these signals go. Answer: infinity”
You said “any signals”. No: the FFT (DFT), X(k), length N, is bandlimited to half the sampling frequency, and the time-series x(n) from which it is calculated is exactly periodic with period N. This is all it CAN do. For all its limitations, the FFT, when understood, is at least a pretty good estimation of spectrum. I suspect this is what you used. For a very short comparison of FFT to five other transform pairs, please see:
http://electronotes.netfirms.com/AN410.pdf
[2] William also said in the same comment: “If a SRC was used to downsample to 2-samples/day, you would be filtering out content. There would be no aliasing but you would not have the same signal. You would have a sine wave. Are you sure you would preserve the -3.3V mean through all steps? Would it even be close?”
Of course it would, as I already told you at Bernie Hutchins Jan 16, 2019 at 10:32 am, it is the DC value and is EXACTLY preserved; even to just one sample!
Here are some notes on rate changing:
http://electronotes.netfirms.com/AN317.PDF
http://electronotes.netfirms.com/AN358.pdf
[3] William Ward said at January 17, 2019 at 6:53 pm: “Using the same example (1Hz + 2Hz sampled at 20sps) you discard all samples per half-second period except samples 1, 4, 5, 6 and 7. If you take this new set of samples and . . . . . not have an accurate relationship to the original signal.”
William, are you NOT aware of “non-uniform” or “bunched” samples?
http://electronotes.netfirms.com/AN356.pdf
http://electronotes.netfirms.com/EN205.pdf
[4] Most importantly, I asked you (Bernie Hutchins January 16, 2019 at 8:57 pm):
(1) We both seem to agree that the historical use of (Tmax+Tmin)/2 as a measure of mean is nonsense that results in significant error (what I call the Fundamental Flaw – FF – in either the continuous or sampled cases).
(2) Can we also agree that if there is NO sampling done on a signal it is meaningless to suggest aliasing as a cause of any errors that are present already?
(3) Does it not follow that since the error is already in the unsampled case (FF), it is not caused by aliasing due to sampling that has not yet, and may never occur?
IF YOU CAN – please explain your position.
You never responded.
– Bernie
William – you misunderstand so many things I will just try to handle them a few at a time. Four here.
[1] William Ward January 17, 2019 at 10:23 pm said in part: “Did you look at any FFTs of any signals? How far out in frequency do these signals go. Answer: infinity”
You said “any signals”. No: the FFT (DFT), X(k), length N, is bandlimited to half the sampling frequency, and the time-series x(n) from which it is calculated is exactly periodic with period N. This is all it CAN do. For all its limitations, the FFT, when understood, is at least a pretty good estimation of spectrum. I suspect this is what you used. For a very short comparison of FFT to five other transform pairs, please see:
http://electronotes.netfirms.com/AN410.pdf
[2] William also said in the same comment: “If a SRC was used to downsample to 2-samples/day, you would be filtering out content. There would be no aliasing but you would not have the same signal. You would have a sine wave. Are you sure you would preserve the -3.3V mean through all steps? Would it even be close?”
Of course, as I already told you at Bernie Hutchins Jan 16, 2019 at 10:32 am, it is the DC value and is EXACTLY preserved even to just one sample.
Here are some notes on rate changing:
http://electronotes.netfirms.com/AN317.PDF
http://electronotes.netfirms.com/AN358.pdf
[3] William Ward said at January 17, 2019 at 6:53 pm: “Using the same example (1Hz + 2Hz sampled at 20sps) you discard all samples per half-second period except samples 1, 4, 5, 6 and 7. If you take this new set of samples and . . . . . not have an accurate relationship to the original signal.”
William, are you not aware of “non-uniform” or “bunched” samples?
http://electronotes.netfirms.com/AN356.pdf
http://electronotes.netfirms.com/EN205.pdf
[4] Most importantly, I asked you (Bernie Hutchins January 16, 2019 at 8:57 pm):
(1) We both seem to agree that the historical use of (Tmax+Tmin)/2 as a measure of mean is nonsense that results in significant error (what I call the Fundamental Flaw – FF – in either the continuous or sampled cases).
(2) Can we also agree that if there is NO sampling done on a signal it is meaningless to suggest aliasing as a cause of any errors that are present already?
(3) Does it not follow that since the error is already in the unsampled case (FF), it is not caused by aliasing due to sampling that has not yet, and may never occur?
IF YOU CAN – please explain your position.
You never responded.
– Bernie
Hi Bernie,
Sorry I didn’t respond directly, I responded to both you and Willis on another reply. One can easily get diluted with too many exchanges and that can affect the quality of response. I want to make sure I can give quality responses. You sent a reply to me today twice about an hour apart, were you aware of that? 2:08 and 2:56 PM. They seem like duplicates but there are some differences too. I’ll reply to the 2:56 message. For some questions I may point you to other responses I have provided to others.
For [2] on SRC, Bernie said: “Of course, as I already told you at Bernie Hutchins Jan 16, 2019 at 10:32 am, it is the DC value and is EXACTLY preserved even to just one sample.”
We agree that the DC value will be the same. But that is not resolving our difference we have. The DC value will not be affected during SRC because the content reduction is always on the high frequency side, right? We are reducing sample rate so we have to remove content at high frequencies that will not be compatible with the new slower rate. If we are converting up in frequency then the situation is different and not a subject that applies to this discussion. If you take a signal sampled at 288 and SRC down then your mean will absolutely change because your content has changed. So we don’t agree.
Regarding [3] “bunched samples” and “non-uniform samples”. Thanks for the links Bernie. Cool stuff. But I hope you will agree that the articles show how to *recover* the full signal if you have “damaged” signals or there is some other reason you are not getting all samples the rate would provide. I don’t see anyone trying to use DSP to recover this information in climate science. I see climate science using 2 samples/day and those samples are “non-uniform”.
Regarding (1) on (Tmax+Tmin)/2: Yes we do agree here that it doesn’t provide results as good as a higher rate like 288. I was being sarcastic on a previous post – but now I ask seriously. Can you explain the FF you mention please? I understand the generic use of the term “fundamentally flawed” but “FF” as you have presented it sounds like something with more discipline to it.
Regarding (2) on NO sampling done: No I don’t agree. I wrote a detailed response to you and Willis yesterday explaining this. Willis found reason to be offended. I didn’t hear from you on that explanation. If you have a discrete value related to an analog signal then that is a sample. If you want to do operations on samples that apply to the original signal then you need to do so according to Nyquist. My post yesterday gives more detail. I thought it was a good explanation, and it included the restatement of many obvious things to be thorough (not to insult anyone’s intelligence). I would like a mathematical or scientific explanation from you about why max/min method doesn’t work. What makes it a FF specifically? I say the FF is that 2 “non-uniform” samples (borrowing from the terms you introduced) are not enough to do math that gives good results related to the original signal.
Regarding (3) error caused by aliasing or not: Well this is tied to the comments above. I don’t want to assume you had time to read the post I sent yesterday (Jan 17 6:53 PM). But that is the best I can do to explain this. It took 1200 words to say. I’d like to hear your thoughts if you read it.
Replying to William Ward at January 18, 2019 at 7:28 pm
[8] You said: “ We agree that the DC value will be the same. But that is not resolving our difference we have. The DC value will not be affected during SRC because the content reduction is always on the high frequency side, right? We are reducing sample rate so we have to remove content at high frequencies that will not be compatible with the new slower rate. If we are converting up in frequency then the situation is different and not a subject that applies to this discussion. If you take a signal sampled at 288 and SRC down then your mean will absolutely change because your content has changed. So we don’t agree. ”
Your first sentence conflicts with the last two!!! Please clarify. What is wrong with what I said Jan 17 at 10:32 am?
If you are talking about up-conversion (we’re not) that is an interpolation problem which is completely different, and depends on a signal model (such as bandlimited, polynomial, etc.).
A good point to make here is that Nyquist/Shannon does not (NOT!) say that you have to sample at greater than twice the highest frequency. It says rather that you have to sample at greater than twice the one-sided bandwidth. For example, if a signal’s spectrum is zero except between 12 and 13, you do not need to sample at 26+ but rather at 2+, that is 2x(13-12)+. The band limiting is “bandpass” in this case. If the spectrum goes to DC, the one-sided bandwidth and the highest frequency are the same – hence the common misstatement of Nyquist! But pity the poor engineer who samples an AM radio signal broadcasting at 1 MHz at a 2.5 MHz rate when about 20 kHz will do. Of course the reconstruction is not low-pass, but bandpass in this example. It’s called “bandpass sampling”.
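The arithmetic of the 12-to-13 example can be checked with a short sketch. It uses the standard textbook result for uniform bandpass sampling, that the lowest usable rate is 2·f_high / floor(f_high / bandwidth); the function name and its argument names are made up for this illustration.

```python
import math

def min_bandpass_rate(f_low, f_high):
    """Limiting (lowest) uniform sampling rate for a signal confined to
    the band [f_low, f_high]; both arguments share the same units."""
    bandwidth = f_high - f_low
    if f_low < 0 or bandwidth <= 0:
        raise ValueError("need 0 <= f_low < f_high")
    # Largest number of bandwidth-wide zones that fit at or below f_high.
    n = math.floor(f_high / bandwidth)
    return 2.0 * f_high / n

print(min_bandpass_rate(12.0, 13.0))  # band from 12 to 13
print(min_bandpass_rate(0.0, 13.0))   # same top frequency, spectrum down to DC
```

The value returned is the limit, so in practice the rate has to sit slightly above it (the “2+” in the comment), and not every rate above the minimum is alias-free for a bandpass signal; only certain ranges are.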
[9] You also said: “ Regarding [3] “bunched samples” and “non-uniform samples”. Thanks for the links Bernie. Cool stuff. But I hope you will agree that the articles show how to *recover* the full signal if you have “damaged” signals or there is some other reason you are not getting all samples the rate would provide. . . . . . “
That’s not what the notes say. If you originally sample just fast enough, and samples are lost (perhaps every 5th sample) you are just plain out of luck. If however your original bandwidth was only 4/5 of what would have been allowed, you can still reconstruct exactly, although not with the usual low-pass (sinc interpolation). I found your example of 20 samples of which you kept only 1,4,5,6,7 reminiscent of bunched sampling. You had a bandwidth of 2 and 5 samples/cycle. Another lesser-known thing about Nyquist rate is that it is the average that matters! But you do need to know what you are doing. Oh – and you do absolutely need to know the times.
[10] Continuing your thought: “ I don’t see anyone trying to use DSP to recover this information in climate science. I see climate science using 2 samples/day and those samples are “non-uniform” “
Oh, but climate science is all about attempts to recover information – many quite problematic.
[11] “ Regarding (2) on NO sampling done: No I don’t agree. I wrote a detailed. . . . . . If you have a discrete value related to an analog signal then that is a sample. ”
NOPE. Only if you give the time the sample was taken. Tmax and Tmin (as analog) are given without time, as with a min/max thermometer. If you HAD a daily file of dense samples, then any computational software will likely have functions for min and max that obligingly also return the time index. But if you had that file, you would compute the mean directly from it.
[12] “Regarding (3) error caused by aliasing or not: Well this is tied to the comments above. I don’t want to assume you had time to read the post I sent yesterday (Jan 17 6:53 PM). But that is the best I can do to explain this. It took 1200 words to say. I’d like to hear your thoughts if you read it ”
I did read it – twice. If (Tmax+Tmin)/2 is wrong in continuous time, it is still wrong, for the same fundamental reason, with any sampling, and sampling which may never happen can’t be a cause of the analog errors. The problem with what you wrote is that the reader comes to a halt very often with a question of logic or what is just a demonstrably dubious claim. Sorry to have to say this.
– Bernie
Bernie,
You said: “Your first sentence conflicts with the last two!!! Please clarify.” We were talking about DC after SRC.
My reply: I’m confused as to what could be our misunderstanding. I’m not trying to insult you by stating some basics – just trying to connect and see what the misunderstanding is: As I see it, the DC is the content that establishes the offset for the scale of measurement. If we are using degrees C then the yearly signal rides on this and the daily signal rides on the yearly signal, visually speaking. While doing SRC I would not expect this “DC” value to change. But the higher frequencies will be filtered. We will see this in the things that make the daily sinusoid distort. The more we filter down the less the distortion. Agreed so far? As that content changes, depending upon how far down you sample, how much content there is to remove and how much of it is removed, the mean will change. I’m not sure how my first sentence contradicts my last 2 as you said. Can you elaborate please?
Bernie said: “A good point to make here is that Nyquist/Shannon does not (NOT!) say that you have to sample at greater than twice the highest frequency. It says rather that you have to sample at greater than twice the one-sided bandwidth. For example, if a signal’s spectrum is zero except between 12 and 13, you do not need to sample at 26+ but rather at 2+, that is 2x(13-12)+. The band limiting is “bandpass” in this case. If the spectrum goes to DC, the one-sided bandwidth and the highest frequency are the same – hence the common misstatement of Nyquist! But pity the poor engineer who samples an AM radio signal broadcasting at 1 MHz at a 2.5 MHz rate when about 20 kHz will do. Of course the reconstruction is not low-pass, but bandpass in this example. It’s called “bandpass sampling””
My reply: I wrote a 1900 word essay to introduce a concept to a broad audience. I didn’t write all that could be said on the subject. What you write about is real cool and I’d love to pick your brain about your experiences if we could meet someday. I understand what you are saying about bandpass sampling. It is also called “undersampling” by some. Over 20 years ago I worked on the design to bring the first integrated digital front end to the newly developing cable tv set top box world. At the time, a bag of converters that would do the job in 5M unit/yr quantities was over $20 for those who could even pull off the performance in an IC. This bag of components had to be integrated and sold for under $4. The requirement was for 10 ENOB and crazy phase error figures and a 6MHz 256 QAM signal had to be recovered from a low-IF of 45MHz. The technology to do this was not there at that time for an integrated solution at $4. But the trick was to undersample and essentially use aliasing to downconvert the 6MHz channel down to baseband where it could be recovered. Yes, cool stuff, but… it is difficult to get across the simple messages in the paper – that kind of detail would have made this DOA for most. The basic statement of Nyquist is correct but there is more to it. I’m not sure that addition enhances my paper however.
Bernie, I’m sorry to have offended you with my strong pushbacks on some comments. I’m wondering how much of the ongoing friction is just a positive feedback loop of us both looking for respect from the other. I can see from what you have said that you have a deep DSP knowledge. There were some fundamental issues on which we have been in contention and our disbelief of the other’s positions perhaps makes us discount the other at times. I think there is usually an underlying misunderstanding and I should probably be more patient to tease that out rather than jump into combat.
William –
You said: “ We agree that the DC value will be the same. . . . . . If you take a signal sampled at 288 and SRC down then your mean will absolutely change because your content has changed. So we don’t agree. ”
I was supposing that you merely misspoke! The DC value is the same, but the mean changes! It is the SAME (sum-of-samples)/(number-of-samples). I explained this on Jan 16 at 10:32 am.
Are you omitting the pre-decimation filter (allowing aliasing)? If so, it is not unlikely that some upper component(s) will alias to DC and corrupt the correct value, but BOTH the DC component AND the mean will change together.
Possible confusion: what do you mean by SRC (I assumed Sample Rate Converter). What is it? Is it not the “downsampler” used in the classic Multi-Rate texts (Vaidyanathan, Fliege)? It’s not just the square with the box with the down-arrow and number inside – is it? If it were, it does NOT even apply to the case where you keep only Tmax and Tmin, as these are not equally spaced (in actual time) at the input. Aliasing is not the fundamental reason that (Tmax+Tmin)/2 is a poor estimator.
-Bernie
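The relationship between the DC value, the mean, and aliasing in this exchange can be written out as a short derivation. It is a sketch under stated assumptions: a length-N record treated as periodic (as the DFT does), a decimation factor M with N = ML, and a pre-decimation filter with frequency response H(k). The symbols y, z, H, M and L are introduced here for illustration and do not come from the comments.

```latex
\begin{align}
X(k) &= \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N},
  & \bar{x} &= \frac{1}{N}\sum_{n=0}^{N-1} x(n) = \frac{X(0)}{N} \\
Y(k) &= H(k)\, X(k),
  & H(0) &= 1 \;\Rightarrow\; \bar{y} = \bar{x} \\
z(n) &= y(nM),
  & Z(k) &= \frac{1}{M}\sum_{m=0}^{M-1} Y(k + mL) \\
\bar{z} &= \frac{Z(0)}{L}
  = \frac{1}{N}\Big[\, Y(0) + \sum_{m=1}^{M-1} Y(mL) \Big]
\end{align}
```

The mean is the DC bin divided by the record length, so the two cannot move separately. After decimation the mean equals the original mean when the terms Y(mL), the components at multiples of the new sample rate, are zero or cancel. If the filter is omitted and those components are present, they fold onto DC and shift the DC value and the mean together.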
Willis,
There is nothing special about an air temperature signal. You are mystifying it. There is nothing in an air temperature signal that you can’t find in a music recording. You think I’m being arrogant. Well, I can understand why you say that. I don’t think I’m better than anyone. So, I would not call that arrogant. I do think I know a lot more about signal analysis than anyone who has spoken out against what I have presented. I have spent 35 years doing it. Thousands of hours looking at signals go back and forth between the domains. Designing board level systems and integrated circuits. Then working with the best converter designers in the world and the best system designers in the world – with me being at the system/applications level of that. I’m listening to the responses and I see people who are very smart, very knowledgeable – in many areas I can’t even begin to follow what you are doing. But when I see people making the most fundamental mistakes and standing up so boldly proclaiming it, it kinda looks to me like the other person is the arrogant one. I don’t think that that is what is going on but when people get stubborn that is the look. We all have to decide, is it better to look polite and not break through or push and try to break through. When pushing happens then egos can bristle. The other night we discussed personal attacks. Jeez, did you go back and read what you just said to me? Ok, I’ll be fine, but I think we have reversed roles tonight.
Back to the temp signal. If you mystify it you’re lost. I don’t know how I can convince you. Air temp signals are really boring – real slow. Yawn. Try to sample millimeter wave communications! Audio, video, they both have every element you mystify. And yes, air temp signals are just combinations of sine waves at the most fundamental level. The intermittent nature of some components does not change anything. Any of the pseudo-cycles you mention are just intermittent signals and their frequency will be in a contained range. Sample to cover that range and it will be in the samples. It is really basic.
I have explained and I’m not sure why you keep stubbornly claiming that I have not clarified how a real world implementation of Nyquist is done. Did you miss that maybe?? When you start you explain theory and then you introduce application of theory. Hey, maybe my writing sucks and I didn’t communicate it well in the essay. The challenge I had was trying to keep to a word limit – so I condensed the paper but offered the full version. The full version is where more of the Nyquist theory is developed. But still only a few bits of what is possible to say. I said I was using 288-samples/day as the effective Nyquist frequency because NOAA did so. It made sense because the error below that was small and only started to increase as I got down to 72-samples/day. I didn’t try enough example stations to make a strong statement that it could come down to 72. Maybe there are other stations out there that present profiles that need all 288 that NOAA gives us. Is this really such a big point against my case?
Willis said: “I showed that 5-minute sampling is only trivially better than hourly sampling.”
Reply: Jeez Willis. I showed that it wasn’t trivial. I wrote the essay. I presented data. I would think you might try to analyze the data I presented before crafting your alternate data. I explained my methods. I’m not sure I understood your methods. Probably my limitation. But we have what I presented and what you presented. They disagree unless I’m just not understanding. I don’t really remember you citing my data and showing the error. I only remember you doing your alternate analysis and proclaiming mine incorrect.
Regarding “reconstructing the signal”: You have still not locked in on the importance of the ability to reconstruct. Maybe my other post tonight will help. That explains the things that you seem to be missing.
You quoted Nick: “The Theorem tells you that you can’t resolve frequencies beyond that limit. But we aren’t trying to resolve high frequencies. We are trying to get a monthly average. It isn’t a communications channel.”
You said: “Despite that, you have not yet admitted that your initial claim was wrong.”
My reply: It is not wrong. It is correct. I have said that you can’t alias and expect to eliminate the aliasing after sampling. If you are interested in only the monthly average, then you need to filter out faster signals before sampling. That is the theory and practice and it is proven. But if the aliasing effect is small because the frequency content is not large enough or of specific phase to cause problems then you are just lucky. It doesn’t mean the method is correct, just lucky based upon the spectrum.
Willis said: “And as a result, you waltz in here and start babbling about sampling at 2X the highest frequency in a temperature signal, and I just roll my eyes and think “Here we go again, another signals guy who thinks he’s God …”.
My reply: Ouch. It sounds like you had it out for me from the beginning… “…another signals guy… babbling…eyes rolling”. Perhaps this sentiment is evident in your approach to this. It feels like your goal is to snuff it out and move on. Ok, you are here talking with me, thank you, so I’m probably wrong – but there is something to the feeling. No, I don’t think I’m god but I’m very confident in what I’m presenting based upon real world applications – and temperature signals are not any different.
William, thanks for your answer. I did NOT have it in for you from the beginning. However, when you start by saying that we have to sample the temperature at higher than the highest frequency, and the highest frequency has a period on the order of seconds … do you seriously expect to be taken seriously?
This is especially true when very soon you start talking about a “practical Nyquist sample rate”, without ever defining what that is except to say that you think it is 288 samples per day … what happened to “higher than the highest frequency” that you defended so passionately? Now you say:
Where is your “highest frequency in the signal” in that claim? Do you see why I shake my head when I read your claims?
Next, is it a “violation of Nyquist” to only take two readings per day, the high and the low, at whatever time they might happen, and average them? Nope, because as far as I know, Nyquist says nothing about that situation. If you think Nyquist or Shannon discussed that situation, please point out where.
Everything I’ve read regarding Nyquist discusses evenly spaced samples, or samples with jitter, or random samples. So perhaps you’d be so good as to provide us with a citation that discusses your claim that taking just two samples, a high and low sample whenever they might occur, violates Nyquist …
Next, you say:
I read your essay. Your “data” is ONE VERY UNUSUAL DAY, a day in Alaska where for half the day the temperature did what it usually does—it pauses and hangs out at the freezing point.
And foolish me, I thought I was being kind by not embarrassing you by pointing out that you are drawing a big conclusion based on one stinking ridiculous day’s worth of extreme data …
So instead of busting you for that silliness, I figured I’d take the high road and just show how to do it. I averaged a number of days and months from a number of datasets to find out what the average RMS error is when calculating the average. You say;
So … I understood your methods. You analyzed one day. Instead, I looked at the average error in the signal of interest, which in climate is almost always the daily that makes up the monthly data.
And now, now you tell me that you didn’t understand my methods? But did you ask for an explanation?
Hah. No chance. You know you are right, apparently, so there’s no need to ask for explanations.
Now, I freely admit that that may not be an accurate reflection of who you are, William. But it sure as hell is who you look like from this side of the screen.
Returning to the data, I showed that in calculating daily averages, sampling every five minutes is only very slightly better than hourly sampling. Here’s that graph again:
To calculate that, I took the average of the RMS error between calculating the daily average using 5-minute samples and using hourly samples. Here’s how.
I looked at a full year of days. I calculated each day’s average using 288 samples per day. I calculated each day’s average using 24 samples per day. I took the differences between the two and calculated the RMS error. That is the difference between 5-minute and one-hour samples, and it’s only four-hundredths of a degree. If you still don’t understand that, ASK!
And that means that whatever aliasing may exist due to using the hourly data, it only has an RMS average error of 0.04°C when calculating daily averages. And that means that no, 288 samples is NOT necessary, and sampling slower than that is NOT violating Nyquist as you claim.
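The procedure described above (one daily average from 288 samples, one from 24 samples, then the RMS of the day-by-day differences over a year) can be sketched as follows. The temperatures are synthetic and hypothetical: a seasonal offset, a daily cosine, and some short-lived random fluctuation. The number it prints therefore says nothing about the 0.04°C figure or about any real station, and the function names are made up for the illustration.

```python
import math
import random

SAMPLES_PER_DAY = 288  # 5-minute samples
STEP = 12              # keep every 12th sample, giving 24 samples/day

def synthetic_day(rng, day):
    """Hypothetical 5-minute temperatures for one day (not station data)."""
    base = 10.0 + 8.0 * math.sin(2 * math.pi * day / 365)  # seasonal offset
    temps = []
    noise = 0.0
    for n in range(SAMPLES_PER_DAY):
        theta = 2 * math.pi * n / SAMPLES_PER_DAY
        noise = 0.9 * noise + rng.gauss(0.0, 0.2)  # short-lived fluctuations
        temps.append(base - 5.0 * math.cos(theta) + noise)
    return temps

def rms_difference(days=365, seed=0):
    """RMS of (daily mean from 288 samples) - (daily mean from 24 samples)."""
    rng = random.Random(seed)
    squared = []
    for day in range(days):
        temps = synthetic_day(rng, day)
        mean_288 = sum(temps) / len(temps)
        hourly = temps[::STEP]
        mean_24 = sum(hourly) / len(hourly)
        squared.append((mean_288 - mean_24) ** 2)
    return math.sqrt(sum(squared) / len(squared))

print(f"RMS difference, 288 vs 24 samples/day: {rms_difference():.3f} degC")
```

To repeat the comparison on real data, `synthetic_day` would be replaced by a reader for a station's 5-minute records; the averaging and RMS steps stay the same.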
First, you have already tacitly said that you were wrong without saying you were wrong by inventing a concept you call a “practical Nyquist rate” … which is NOT, in your words “at least 2x the highest frequency component of the signal”. You claim that sampling at 288 per day does NOT violate Nyquist, when you opened the discussion by clearly stating that it does violate Nyquist … and dissed me for questioning it.
Next, the issue is not whether or not there is aliasing. Unless you filter out the high frequencies, there will be aliasing, even at your “practical Nyquist rate”.
The issue you don’t seem to grasp is, is aliasing a difference that makes a difference? This is not the lab, and we’re not looking for spiritual signal purity. This is the real world.
Look, William. I consider you an expert in your field. But your field is obviously not the practical use of statistics and signal theory in the field of climate. So you invent things like a “practical Nyquist rate” and pick a number for that and then tell us that we’re all wrong to question it … sorry, but that dog won’t hunt.
Now, I’m more than happy to discuss this further. And I reckon you’re a good guy … just insufferable at times. But heck … so am I, so we have at least that in common. Well, plus one more thing we have in common. Although I’ve been called “Willis” all my adult life, my full legal first and middle name is William Ward … go figure.
So to move this discussion forwards, let me see if I can clarify my position:
1. Your “practical Nyquist limit” of 288 samples/day is just something you made up based on highly inadequate data made up of just one unusual day’s worth of samples.
2. There is no practical way to sample temperature data at “2X higher than the highest frequency” because we’d have to sample at microseconds to do that … and if nothing else, the lag in the thermal sensor would obviate that. Nor is there any reason to do so—for our purposes hourly data is quite adequate.
3. How bad (max+min)/2 is has nothing to do with Nyquist. It is a problem because of the unusual nature of the choice of times to sample. You could be taking max and min temperature samples at 6X the Nyquist rate and the average of them would still be inaccurate.
4. In the world of temperature, there is no practical difference between hourly samples and 5-minute samples.
5. Aliasing is only a problem when it actually is a problem, not when theory says it is a problem.
6. Not following Nyquist is only a problem when it actually is a problem, not when theory says it is a problem.
7. Unlike the sampling that I suspect you are used to doing, the main use of sampling in climate is to provide data for averages which will be used in turn to calculate trends. As such, it is the accuracy of these trends that is the important metric of the adequacy of the sampling rate, not whether the sampling rate fulfills theoretical requirements or contains aliasing.
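Point 3 in the list above can be illustrated with a toy example. The sketch assumes a hypothetical asymmetric daily curve (a fundamental plus a second harmonic, chosen only for illustration) and compares the true daily mean with (max+min)/2 when both are taken from the same 288 samples.

```python
import numpy as np

t = np.arange(288) / 288.0   # one day at 5-minute spacing, in days
# Hypothetical asymmetric day: fundamental plus a second harmonic (assumed shape).
temp = 15.0 + 6.0 * np.sin(2 * np.pi * t) + 2.0 * np.cos(4 * np.pi * t)

true_mean = temp.mean()                      # average of all 288 samples
midrange = (temp.max() + temp.min()) / 2.0   # the (max+min)/2 method

print(f"mean = {true_mean:.3f} C")
print(f"(max+min)/2 = {midrange:.3f} C")
print(f"error = {midrange - true_mean:+.3f} C")
```

For this assumed shape the midrange sits well below the mean even though the extremes are read from a densely sampled curve, which is consistent with the claim that the error comes from the choice of statistic rather than from the sample rate.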
I’m happy to discuss any of that with you. But if you’d be so kind … if you can’t follow what I’m saying, just ask.
My best to you,
w.
Willis
You said, “4. In the world of temperature, there is no practical difference between hourly samples and 5-minute samples.” I think that you should define “practical” as an acceptable tolerance in error to achieve a result sufficient to resolve trends in temperature changes, and/or estimates of energy content. “Practical” means different things to different people.
Clyde Spencer January 18, 2019 at 11:34 am
Good question, Clyde. I just found some USHCN data on the web that hasn’t been blocked by the government shutdown. I got 13 years of data for the USHCN station nearest to where I grew up, in Redding, California. I calculated the average for each day using first all 288 daily samples, and then using 24 hourly samples. Here are the results:
Some notes. The largest average error is five thousandths of a degree.
The largest RMS error of the daily errors is six hundredths of a degree.
The largest absolute error, both positive and negative, is a quarter of a degree.
That’s what I’m calling “no practical difference” …
w.
Clyde,
I have tried to understand Willis’ complaint about how I presented the theory and then the application of the theory. I have tried to restate and summarize but this has not helped. I would like another opinion. Have you been confused by my presentation of that information? Let me briefly summarize again, as I have not changed or altered positions, but perhaps added the practical portion somewhere after the start of the discussion.
According to the Nyquist-Shannon Sampling Theorem, we must sample the signal at a rate that is at least 2 times the highest frequency component of the signal.
fs > 2B
Where fs is the sample rate or Nyquist rate and B is the bandwidth or highest frequency component of the signal being sampled. However, real-world signals are not limited in frequency. Their frequency content can go on to infinity. This presents a challenge to proper sampling, but one that can be addressed with good system engineering. When air temperature is measured electronically, electrical anti-aliasing filters are used to reduce the frequency components that are beyond the specified bandwidth B, thus reducing potential aliasing. Another method of dealing with real-world signals is to sample at a much faster rate. The faster we sample the farther in frequency we space the spectral images, significantly reducing aliasing from undesired frequencies above bandwidth B. This is how Nyquist is applied practically. In the real world, a small amount of aliasing always exists when sampling, but careful engineering of the system will allow sampling to yield near perfect results toward our goals.
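As a small illustration of what aliasing means here, the sketch below computes the apparent frequency of a single tone after sampling. The function name `alias_frequency` and the 20-cycles/day test tone are hypothetical choices for the example, not values taken from the temperature record.

```python
import numpy as np

def alias_frequency(f, fs):
    """Apparent frequency of a tone at f when sampled at rate fs (same units)."""
    return abs(f - fs * round(f / fs))

# Frequencies in cycles/day. A 20-cycles/day tone is above fs/2 = 12 when
# sampled 24 times/day, so it folds down; at 288 samples/day it is unchanged.
print(alias_frequency(20.0, 24.0))    # 4.0
print(alias_frequency(20.0, 288.0))   # 20.0

# Numerical check: at hourly sample times the 20-cycles/day tone is
# indistinguishable from a 4-cycles/day tone.
n = np.arange(24)
tone = np.cos(2 * np.pi * 20.0 * n / 24.0)
alias = np.cos(2 * np.pi * 4.0 * n / 24.0)
print(np.allclose(tone, alias))       # True
```

How much this matters in practice depends on how much energy the signal actually has above fs/2, which is the point under dispute in this thread.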
Experimentally, multiple *calibrated* and *matched* converters can sample the *same event* and the results can be compared. At some sample-rate you meet your required accuracy and beyond that there are increasingly diminishing improvements. NOAA uses 288-samples/day (divided down from 4,320 and I don’t have the official reason why they did this). You would design your system, not for the typical or “average” signal content but for the content with the most high frequency energy. You want your system to capture all data from every day and every station.
Does this sound confusing to you or like I’m presenting contradictory claims?
Thanks in advance.
Willis, Clyde,
Almost every station I examined shows “significant” error (many tenths to several degrees C) per day. The following graphs show for each day the difference between 288-samples/day and NOAA’s (Tmax+Tmin)/2 (“historical method”). The 288-samples/day mean is the reference and the historical method value is subtracted. The result is the daily error.
https://imgur.com/QyfAonp
https://imgur.com/aoUX30R
https://imgur.com/hfqjMz5
These errors can be seen over years in both absolute value and trends.
https://imgur.com/cqCCzC1
https://imgur.com/IC7239t
I have 26 of these charts corresponding to the 26 stations in Fig 7. They are not all properly labeled for public consumption but they are available and the labeling could be added.
Willis, you have not acknowledged any of the data I have presented.
As I said in my paper:
It is clear from the data in Figure 2, that as the sample rate decreases below Nyquist, the corresponding error introduced from aliasing increases. It is also clear that 2, 4, 6 or 12-samples/day produces a very inaccurate result. 24-samples/day (1-sample/hr) up to 72-samples/day (3-samples/hr) may or may not yield accurate results. It depends upon the spectral content of the signal being sampled. NOAA has decided upon 288-samples/day (4,320-samples/day before averaging) so that will be considered the current benchmark standard. Sampling below a rate of 288-samples/day will be (and should be) considered a violation of Nyquist.
The goal is to design a system to handle the worst case signals. The goal is to have a system that works equally well for all days at all stations. Finding stations that work well with 24-samples/day doesn’t mean you decrease the system performance to match those stations. You don’t design for the “average” condition, you design for worst case conditions.
Willis, USCRN hourly data is the 5-minute data integrated to hourly. Why would you assume that sampling hourly would be equivalent? It may produce similar results and it may not. Do not confuse the 2 scenarios. If there is higher frequency content then you will get different results between sampling hourly and integrating 20-second samples to hourly. Sampling hourly will alias content faster than that.
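The distinction drawn here, between taking one reading per hour and averaging 5-minute data down to hourly, can be shown with a deliberately exaggerated toy signal. The sketch assumes a hypothetical component with a period of exactly one hour and an amplitude of 1 °C, chosen so that it aliases to a constant offset; real records may contain little or no such content.

```python
import numpy as np

t = np.arange(288) / 288.0                      # one day, 5-minute spacing
slow = 15.0 + 6.0 * np.sin(2 * np.pi * t)       # diurnal cycle
fast = 1.0 * np.sin(2 * np.pi * 24 * t + 1.0)   # hypothetical 1-hour-period component
temp = slow + fast

point_hourly = temp[::12]                          # one instantaneous reading per hour
block_hourly = temp.reshape(24, 12).mean(axis=1)   # 5-minute data averaged to hourly

point_offset = point_hourly.mean() - temp.mean()   # fast component aliased to DC
block_offset = block_hourly.mean() - temp.mean()   # averaging keeps the daily mean

print(f"hourly point samples: {point_offset:+.4f} C")
print(f"hourly block averages: {block_offset:+.4f} C")
```

In this construction the hourly point samples always hit the fast component at the same phase, so the daily mean is biased, while the hourly block averages reproduce the 288-sample mean.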
William
I anticipated your question and started to respond a couple of days ago. I decided I wasn’t really going to contribute anything and deleted what I wrote. However, since you asked, I’ll stick my neck out.
You and Willis both impress me as being bright, experienced people. However, my sense is that you are arguing at cross purposes. Willis seems to be comfortable with loosely defining “practical” as results that appear to have ‘relatively small’ (but not rigorously defined) errors. Further, he seems to focus just on temperatures and takes an approach of demonstrating with data whether something is outrageously wrong, rather than following through with how errors propagate and impact claims made by alarmists. You seem to be more concerned with a theoretical approach and focus on what an ideal sampling protocol should be like. Therein, I think, lies the essence of your cross purposes. To be flippant, the difference between “Good enough for government work versus the attitude of a perfectionist.”
Now, something that I think that you should have stressed is that the issue of under sampling may not be critical for getting acceptable temperature estimates (providing that an acceptable tolerance is specified), but it clearly results in distortion of the shape of the time-series. This is more important when calculating the area under the curve for energy calculations. (As an example, consider what happens when a cold front passes a station shortly after the daily high — the bottom drops out and the nice sinusoid disappears.)
I think that what is missing is something akin to a design specification that starts with just what the data collection is intended to address. That is, it should contain quantitative goals, with acceptable quantitative errors for each step in the data collection and analysis chain. If there is agreement on what the purpose is, and what acceptable error is, then one can proclaim whether existing data are fit for purpose or not. Short of that, it is two experts touching their favorite part of the elephant and holding their ground on what the ‘truth’ is.
Hi Clyde,
I’m replying to your post where you start: “I anticipated your question and started to respond a couple of days ago.”
I agree with your assessment of the different perspectives that are tripping up the communication between Willis and me (at least the technical issues.) I’m approaching it from an engineering perspective. Even that can bifurcate adding confusion. If I’m referring to what NOAA has given us with USCRN then analysis goes toward what is allowed with what they have already done. If we are discussing what the USCRN specifications should be or could be, that is another discussion. And of course, we start the discussion with the theory, which differs in that it starts with an ideal band-limited signal. We don’t really have those in the real world. Enter engineering with the addition of filters to approach the ideal band-limited signal. The design needs to comply with a specification. When discussing USCRN I usually assumed NOAA had reasons for their specifications so I deferred to those where I had nothing else to suggest in its place.
This provided opportunity for the communication to derail. Thanks for your insight. Your neck is safe.
William Ward
In case you missed it, I wanted to be sure that you saw the link I provided to Scott:
https://library.wmo.int/doc_num.php?explnum_id=3179
It is from a World Meteorological Organization document on automated weather stations. It has some interesting insights on how they think sampling should be done. I noted in particular that they recommended filtering the output of things like thermistors before digitization. There are competent people looking at the problems, but I get the feeling that the academics aren’t aware of it.
Clyde – thanks for the link to the WMO AWS guide. Great information.
Bright Red and Paramenter: See this guide referred by Clyde. See section 1.3.2.2 Sampling and Filtering on pg 15 of PDF (pg 539 of document). Key points:
Considering the need for the interchangeability of sensors and homogeneity of observed data, it is recommended:
(a) That samples taken to compute averages should be obtained at equally spaced time intervals which:
(i) Do not exceed the time constant of the sensor; or
(ii) Do not exceed the time constant of an analogue low-pass filter following the linearized output of a fast response sensor; or
(iii) Are sufficient in number to ensure that the uncertainty of the average of the samples is reduced to an acceptable level, for example, smaller than the required accuracy of the average;
(b) That samples to be used in estimating extremes of fluctuations should be taken at least four times as often as specified in (i) or (ii) above.
Emphasis on Note (b): Samples should be min 4x as specified in i or ii for “extremes of fluctuations” (I assume this means high frequencies and not max/min amplitude).
I started to calculate potential break frequencies of the anti-aliasing filter, but then remembered something more important and that comes from pg 11 (PDF) [535 in doc]. See Data Acquisition heading. It seems the architecture uses a switched ADC – so the converter is shared. The sensors for the various parameters (pressure, temp, humidity, etc) are fed through their front end and signal conditioning circuits and then switched/muxed into the ADC. So 4,320-samples/day may be defined at that speed for another parameter that has higher frequency content than temperature. All of these things are related so I’m not sure what that would be. I’m just pointing this out as something to consider. If there is another variable that needs faster sampling then this could explain why 4,320 was selected and why averaging down by 15:1 (4,320 to 288) is done for temperature. Other variables may not use this averaging or may use a different averaging factor appropriate to that variable’s sampling needs.
That’s all for tonight. Sleep deprivation meter is whizzing around at amazing speeds.
Hi William,
From the document:
“(a) That samples taken to compute averages should be obtained at equally spaced time intervals which:
(i) Do not exceed the time constant of the sensor; or
(ii) Do not exceed the time constant of an analogue low-pass filter following the linearized output of a fast response sensor; or
(iii) Are sufficient in number to ensure that the uncertainty of the average of the samples is reduced to an acceptable level, for example, smaller than the required accuracy of the average;
(b) That samples to be used in estimating extremes of fluctuations should be taken at least four times as often as specified in (i) or (ii) above.”
I expect that the 4320 samples/day is in line with the time constant of the temperature sensor or following filter that they are using as per (i) and (ii) and nothing to do with multiplexing the A/D input.
It is also interesting that the recommended sample rate is about three times faster than required by Nyquist which is a very reasonable/normal and practical design decision.
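One possible reading of the "about three times" figure is sketched below. It assumes the sensor behaves as a first-order low-pass with time constant tau, treats its -3 dB cutoff as the signal bandwidth, and samples once per time constant as in (i). The 20-second time constant is an assumption chosen only because it reproduces 4,320 samples/day; the thread does not state the actual sensor time constant, and the function name is hypothetical.

```python
import math

def sampling_vs_nyquist(tau_seconds):
    """Compare 'one sample per time constant' with 2x the -3 dB cutoff of an
    assumed first-order low-pass response with time constant tau."""
    f_cutoff = 1.0 / (2.0 * math.pi * tau_seconds)   # Hz, -3 dB point
    nyquist_rate = 2.0 * f_cutoff                    # Hz, 2x the assumed bandwidth
    sample_rate = 1.0 / tau_seconds                  # Hz, sample interval == tau
    return f_cutoff, nyquist_rate, sample_rate, sample_rate / nyquist_rate

# tau = 20 s is an assumption, not a documented sensor specification.
fc, fn, fs, ratio = sampling_vs_nyquist(20.0)
print(f"samples/day at one per tau: {fs * 86400:.0f}")
print(f"sample rate / (2 x cutoff): {ratio:.2f}")
```

Under these assumptions the ratio is pi regardless of tau. A first-order response rolls off slowly, so content above the cutoff is attenuated rather than removed, and the ratio should be read as a rule of thumb only.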
Clyde Spencer January 19, 2019 at 12:10 pm
OK, here’s my rigorous definition. With respect to 288 samples per day, errors in daily average temperature with an average of less than 0.01°C and an RMS error of less than 0.1°C are acceptable.
However, I will note that this does NOT include the (min+max)/2 errors. Unfortunately, for the most part they are all that we have … I’ll have to take a look at the effect of those errors before commenting further on them.
w.
Willis
I think that we are all in agreement that it is an unfortunate set of circumstances that we are saddled with an historical data set that begins with mid-range values, and that the paid professionals try to use them to make a silk purse out of a sow’s ear.
You said, “With respect to 288 samples per day, errors in daily average temperature with an average of less than 0.01°C and an RMS error of less than 0.1°C are acceptable.” Acceptable for what? Acceptable for everything that we might ever want to do with temperature data? Acceptable to justify mean annual temperatures to 0.001 deg C? Acceptable to calculate energy hiding under bushes?
William, my thanks to you for your complete and interesting analysis.
I realized yesterday that there is a huge difference between guys like me and signal guys like you.
You have access to the analog signal. We don’t.
This opens up a whole host of possibilities. You can filter your signal before sampling it. You can amplify or decrease certain parts of the signal. You can heterodyne it. Hosts of possibilities.
We don’t have that option. We get what we get—some data hourly, some data at 288 samples per day, some (min+max)/2. Not pretty.
I do think that we agree on most things. For example, the (max + min)/2 method of calculating the mean gives ugly errors. Here’s the data for Fairbanks 2015:
How does this affect the trends? Haven’t looked at that.
I think we have two remaining disagreements.
First, I’ve shown that the hourly errors are very small, and the hourly sampling does NOT contain aliased signals. I’ve also shown that it is past the “knuckle” in the graph where faster sampling gains us very little.
As a result, I’d say that the practical Nyquist limit is hourly. I’ve not seen anything to change my mind. You have NOT demonstrated that there is any aliasing in the hourly data. You have NOT given us an example where using 288 samples gives a significantly better result than hourly sampling, or an example where using hourly data leads to significant errors.
Here’s an oddity for you. The good folks at NOAA have averaged the 4,320 samples per day or whatever they are taking into 5-minute segments. In essence, this has filtered out any frequencies with periods shorter than 5 minutes.
Of course, this means that the highest frequency remaining in the digital signal is 288 samples per day … and Nyquist says we have to sample at twice that frequency.
But unfortunately, not having access to the analog signal, we can’t do that … which strictly speaking means that even the 288 samples per day is below the Nyquist limit. To which I can only say … so what? What practical difference does that make?
So that’s our first disagreement.
Second, my blood is still angrified by this exchange:
Then you followed that insult up by saying that well, no, we DON’T have to sample at 2X the highest frequency in the data, we can sample far below that, 288 samples per day is just fine, we’ll call that the “practical Nyquist limit” … which is EXACTLY WHAT I HAD SAID and had been severely dissed for saying.
I’ve given you a couple of opportunities to apologize for that piece of ugly paternalistic nastiness, and you’ve shined them on.
So those are our two areas of disagreement. You think that there is something holy about 288 samples per day, and you think you are qualified to talk to people who disagree with you as though they were ignorant children.
So there we are. I’ll take a look at trends and see what I can find. I can tell you right now that there will be no significant difference between hourly data trends and 288 sample trends.
In addition, I suspect strongly that the difference in say 30-year trends using min/max versus 288 samples will be very small. We can do that monte-carlo style, because the USCRN data lets us quantify the min/max error as to mean, skewness, kurtosis, and RMS. So all we have to do is add that error to any longterm set of daily data and convert that into monthly and then 30-year trends. I don’t think the difference will be large.
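A rough sketch of that Monte Carlo idea follows. Everything numeric in it is a placeholder: the "true" series is synthetic, the min/max error is modelled as independent Gaussian noise with an assumed bias of 0.2 °C and standard deviation of 0.6 °C (ignoring the skewness and kurtosis mentioned above), and months are approximated by 30-day blocks.

```python
import numpy as np

rng = np.random.default_rng(1)
n_years, n_trials = 30, 200
days = np.arange(n_years * 365)
years = days / 365.0

# Hypothetical "true" daily means: seasonal cycle plus a 0.2 C/decade trend.
true_daily = 10.0 + 10.0 * np.cos(2 * np.pi * years) + 0.02 * years

def decadal_trend(series):
    """Least-squares slope in C/decade, fitted to 30-day block means."""
    n_blocks = series.size // 30
    blocks = series[: n_blocks * 30].reshape(n_blocks, 30).mean(axis=1)
    block_years = (np.arange(n_blocks) + 0.5) * 30 / 365.0
    return 10.0 * np.polyfit(block_years, blocks, 1)[0]

base = decadal_trend(true_daily)

# Assumed error model (placeholder values): Gaussian, bias 0.2 C, sd 0.6 C.
trend_errors = np.array([
    decadal_trend(true_daily + rng.normal(0.2, 0.6, days.size)) - base
    for _ in range(n_trials)
])

print(f"baseline trend: {base:.3f} C/decade")
print(f"trend error: mean {trend_errors.mean():+.4f}, "
      f"sd {trend_errors.std():.4f} C/decade")
```

Because the errors here are independent from day to day and have a constant bias, they largely cancel in the trend. Real min/max errors can vary with season and drift over time, so this simplified model may understate their effect on trends.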
As always, thanks for your perseverance,
w.
OK, I looked at trends for the full Fairbanks data, 2007-2018. The max+min has an error of about 0.06°C per decade, which is large.
The hourly sampling, on the other hand, has the same trend as the 288-sample data to six decimal places.
More evidence that there is no problem with using hourly data …
w.
OK, I looked at the trends in the full Fairbanks USCRN 2007-2018 record. Here are the decadal trends:
2.671°C/decade traditional (max+min)/2
2.6175°C/decade 288 sample
2.6184°C/decade hourly sample
The difference between the traditional and the 288 sample is 0.06°C/decade, a significant amount.
The difference between the hourly and the 288 sample is 0.0009°C/decade, a meaningless amount.
More evidence that hourly data is perfectly adequate …
w.
Hi Willis,
Thanks for your reply here on the exchange to Clyde. I sent you a post last night (Jan 19, 8:37 PM). That post, I hope, will close more of the gaps in our understanding. It also addresses some or most of your concerns with the way I have treated 288 samples, etc.
I think we are getting close to a harmonious understanding – but I’ll wait for you to reply to last night’s post. I’ll address a few things you said here. And I’ll have a reply to Clyde tonight after I return.
Willis said: “Here’s an oddity for you. The good folks at NOAA have averaged the 4,320 samples per day or whatever they are taking into 5-minute segments. In essence, this has filtered out any frequencies with periods shorter than 5 minutes.”
My reply: Willis, I agree with you. What NOAA did is strange. I struggled with how to comment about it in the paper without wasting too many words and distracting people on an already complex subject. As I see it, NOAA’s 4,320 samples averaged down to 288 is actually different from, and from a design perspective superior to, a pure sample rate of 288. If you have a signal with cycles faster than 144-cycles/day then 288-samples/day starts to alias this. If that content is large in amplitude then the aliasing is large. For our air temp signal I think we agree that this is not the case so I’m not trying to assert that. When NOAA averages, they lose the particular samples but the data is there in the sample. While individual frequency components are lost in the averaging, the energy is there and the mean calculated with 4,320 and 4,320 averaged to 288 should be the same except for sample rounding. About using 4,320: This frequency is still very slow for converters. It might be done to make the anti-aliasing filter easier to implement (lower cost, smaller components), and it won’t introduce as much phase shift or pass-band ripple. It is also possible that the average down is to just not have to deal with so much data. Not that it would be a lot of data by today’s standards.
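The claim that the daily mean survives the 15:1 averaging can be checked directly. The sketch below uses made-up random readings as a stand-in for one day of 20-second samples and contrasts block averaging with simply keeping every 15th reading.

```python
import numpy as np

rng = np.random.default_rng(2)
raw = 15.0 + rng.normal(0.0, 1.0, 4320)        # hypothetical 20-second readings, one day
five_min = raw.reshape(288, 15).mean(axis=1)   # average each group of 15 readings

# Mean of equal-sized block means equals the overall mean (up to float rounding).
print(np.isclose(raw.mean(), five_min.mean()))   # True

# For contrast: keeping every 15th reading is a true 288-samples/day sampling
# and does not, in general, reproduce the 4,320-sample mean.
decimated = raw[::15]
print(f"decimated minus full mean: {decimated.mean() - raw.mean():+.4f} C")
```

This identity holds for any input as long as every block has the same number of readings, so it does not depend on the assumed noise model.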
Willis said: “Of course, this means that the highest frequency remaining in the digital signal is 288 samples per day … and Nyquist says we have to sample at twice that frequency.”
My reply: See my post from last night. Selecting the sample rate is to not alias any potential content in the signal that you want to keep. Anti-aliasing filters are aligned to this.
Willis said: “But unfortunately, not having access to the analog signal, we can’t do that … which strictly speaking means that even the 288 samples per day is below the Nyquist limit. To which I can only say … so what? What practical difference does that make?
So that’s our first disagreement.”
My reply: Ok, I think I see your point. NOAA themselves have chosen to throw away data in their averaging. Well once you sample properly you can digitally throw away data if done properly. That isn’t aliasing. They lose frequency components but the energy should be retained in the average if I’m thinking clearly.
Regarding that entire tangle that started when you said something like “we can sample 2x the frequency of interest”: There were a few others, primarily Nick, who said it is okay to alias because we are only interested in the long term trends. Others said it was not even possible to alias. If the aliased content is very small then he is right that you might be able to get away with a violation. But if not then it creates problems. Either way he was promoting an idea that violates the most basic requirements of sampling. When you said sample the frequency of interest I heard that as echoing what Nick said. My entire case was dependent upon people, other people reading our exchanges, taking in the concept of proper sampling. When 2 of the most respected people on the forum started to derail the most basic concept I felt I had to speak strongly to counter that. There was no disrespect intended. Note I never went into personal attacks or countered any personal attacks on me. I did apologize to you and you accepted. I think it was after the incident you mentioned. Now more on this “frequency of interest”. This detail matters. As an engineer, when designing the system, you can arbitrarily set the Nyquist limit. If you, or the climate scientist who sets the specifications, tells you that only frequencies below a certain point are valuable for the research and frequencies above this are noise, then that can guide the design. Should guide the design. Just like the audio examples we talked about for CD audio with sampling at 44.1ksps. You said that I thought there was something “holy” about 288-samples/day. No, I just think your perception of what I intended is not correct. As explained in my post last night, we are focusing on 2 different things. An engineer sets guard bands, captures all of the content, even if the frequency components are not commonly experienced.
I acknowledge in that post that you appear to have provided convincing analysis that 24-samples per day captures most of the signals. As an engineer I’d like to capture all of the content I saw. So if more analysis showed that 72-samples/day gave sufficient guard-band then I’d go with that. I thought the more conservative approach was to align with NOAA’s 288 – assuming they actually did research to come up with their system. And overkill in this instance is not a bad idea. Also, my real point in the entire paper was to get people to see that 2-samples/day, whether regularly timed or max/min are inferior to a higher sample rate. I didn’t write a paper to get people to bow to 288-samples/day (said with a smile).
I’m eager to hear your thoughts after reading my reply from last night. I think we are getting close to agreeing – or at least a much more harmonious disagreement.
William, upon re-reading what I wrote I realized I owe you an explanation.
I have nothing but my reputation. I have no diploma in science. I took a total of two science classes in college—chem 101 and physics 101. I am 100% self-taught. I have nothing but my thousands and thousands of hours of study, my interesting ideas, my unquenchable honesty, my honor, and my reputation for admitting my mistakes when I make them as we all do. Plus half a dozen papers published in the peer-reviewed journals and over a hundred citations to those papers.
As a result, people think that they can take free shots at me. One such person was one of my scientific heroes, Dr. Roy Spencer. One day somebody must have pissed in his oatmeal, and he up and wrote a particularly ugly and untrue post attacking me. He falsely claimed that I was taking credit for another man’s discoveries, which was a damned lie.
I wrote a post in reply, explaining exactly where he was wrong … but the damage was done, he didn’t have the hair to apologize, and as a result, to this day I get fools and idiots telling me “Oh, we don’t have to believe a word you say, Dr. Spencer said so!”
Gotta say … my respect for Dr. Spencer took a dive that day …
That is why your ugly and untrue attack on me was so disturbing. And now, unless you finally get the balls to apologize, fools and idiots will no doubt say “Oh, Willis, we don’t have to believe you about signal analysis, William Ward said so!”
And that is why I have asked for a clear apology, to keep fools and idiots from believing you. You claimed I was foolish and ignorant to say that the practical Nyquist limit was NOT twice the highest frequency in temperature signals, and then shortly afterward you said the very same thing I’d said, that in fact 288 samples per day or fewer is the practical Nyquist limit for temperature signals.
Somebody showed that they were foolish and ignorant in that exchange, but it sure as hell wasn’t me.
Respectfully,
w.
Willis
There are a couple of particularly obnoxious alarmist trolls that I have been exchanging comments with on Yahoo. They seem to think so highly of themselves that they behave as though they have a license to insult. I wouldn’t be surprised to discover that they have been banned from WUWT for their behavior. One of them, it doesn’t really matter because they use pseudonyms (and I suspect it is really one person with different personas) had provided me, unsolicited, with what they thought was an accurate representation of your background after I had linked to one of your articles. I never bothered to try to confirm because it didn’t matter to me. You are someone who has demonstrated numerous times that you are able to think outside the box, are facile with acquiring and processing data in an understandable way, and have made contributions to understanding climatology. As a colleague remarked to me once, a ‘sheepskin’ opens doors to certain jobs, but doesn’t guarantee that you can figure out when to come in out of the rain. You have no need to apologize for the lack of a degree. Now, having said that, as I have told Mosher, self-educated people often have gaps in their knowledge-base that they are not even aware of. So, you do need to exercise some humility when it is possible that you are making claims that are outside your area of expertise. But, that applies to everyone! I’ve worked with FFTs for years, but I don’t consider myself an expert in the subject.
Willis says (of Roy Spencer): “He falsely claimed that I was taking credit for another man’s discoveries”
…
He did not do that.
…
He pointed out that you were “re-inventing the wheel,” due to the fact that you did not research prior work.
Spencer’s exact words were: “But don’t assume you have anything new unless you first do some searching of the literature on the subject.”
…
http://www.drroyspencer.com/2013/10/citizen-scientist-willis-and-the-cloud-radiative-effect/
Willis,
This is in reply to your post from Jan 20 at 12:05 PM.
I thought we were getting closer to agreeing on the technical issues, but I don’t know yet because the personal issues are in the way. I would like to resolve this, so I think that means we will have to discuss our personal processes a bit. You have already done that quite a bit and told me much about your thoughts about me. Now, I will share with you some of my thoughts about you.
First, I think it takes a lot of courage to reveal something that has caused you emotional pain. I don’t know the details about your interaction with Dr. Roy Spencer, but I understand it has marked you. I’m not taking sides in that – just acknowledging the impact the interaction had. Revealing that to someone (me) whom you think has been or is hostile to you takes even more courage. It was some time ago that I first started reading your posts on WUWT. You immediately stood out to me because of your insights and analytical capabilities. I was quite impressed, and I thought: “I’d like to meet that guy”. I started to wonder about your background because your ability to whip out all kinds of analysis was a real stand-out. I remember Googling your name and I did come across one of those (despicable) websites that catalogues all of the “deniers”. You were in there and there were criticisms of your work and your education. I remember reading that you were self-educated. It was written as if it were some kind of deficiency. I thought that if it was true that you were self-educated with all of the analytical capabilities you demonstrated on WUWT, then my estimation of you just jumped up an order of magnitude. I know how hard it is to get through an engineering curriculum. But most do this at a young age, with the financial assistance of their parents – so they are not working or working much. They can focus on their studies. They have the benefit of being forced to be disciplined lest they fail out. They have the benefit of fellow students to bounce ideas off of, TAs and Professors and lab assistants to help them learn this crazy difficult stuff. Then they get out in the working world and have more senior co-workers to guide them along in their development. There is also the incentive that you need to succeed or fail in your career. Going through this is difficult, but success is common. 
In contrast, I doubt 98% of people who succeed in this path could have “self-educated” themselves to the extent you have. Once I had all the tools of engineering at my disposal and having been forced to “learn to learn” – now I too self-educate. I taught myself audio engineering and started a successful audio engineering and mastering company. I also started a record company – in parallel to working the corporate world. 10 years ago, I got into building and renovating properties and got my general contractor’s license. I retired from the corporate world at a young age to run my businesses and part of that is home building and real estate investing, along with the audio work. So, I appreciate someone who self-educates. I do not think I could have done what you have without first making it through the system. What you have done is something to be proud of and I would not let the lack of a diploma have any meaning to you except great satisfaction in your independence, self-reliance, tenacity and natural capabilities.
While “watching “you on WUWT I also noticed that you moved fast, thought fast, were quick to judge/evaluate, quick to share your opinions and your opinions (even if data based) were strong and sometimes overbearing. I’d also like to add quick to confront and not bashful about being blunt. These are not necessarily “bad” qualities, but ones that invite in-kind communication. Now, imagine for a minute my thoughts and feelings after putting together a well thought out and thoroughly reviewed paper based upon my career long experience/expertise, to have someone dash off a quick analysis and then publicly proclaim with CAPS that the most fundamental issue in the paper – (a fundamental concept that is “101” to the discipline) is not correct. The line was: “I stand by that. As I said above, Nyquist does NOT mean that you have to sample at 2X the highest frequency in your data, just at 2X the highest frequency of interest.” It just wasn’t any person saying this. It was the guy whom I admired for his tremendous analytical skills (you). The point of contention was very basic – definition of Nyquist theorem. So maybe you were telling me I was the idiot. 35 years of work, put hundreds of hours into a paper and I can’t even get the 101 right. Actually, I didn’t go to that place in my mind, but it did seem to threaten to shut down the progress of communication and exploration around the subject. If I were a timid person and didn’t push back, I think a lot of readers might have dismissed the concept of Nyquist. After all, Willis proclaimed it wrong. There is a lot of good stuff on WUWT. It would have been easy for readers to move on. I had plans to engage around this topic and your far too fast proclamation threatened that goal. I’m not a timid person so I pushed back. I did say that the error you were making was “101”, but I needed a counter force to equal your style of fast judgement. 
To make you pause and consider maybe the person who wrote this is qualified and knowledgeable too – maybe you should slow roll the conclusions. Maybe I overestimated the force needed. In hindsight I should have said something like: “Wait a minute Willis. The definition of Nyquist is pretty fundamental – we can read a text book together and get the definition. When you say you don’t need to sample 2x the frequency content and that you only need to sample 2x the highest frequency of interest, what do you mean by that? Can we explore and discuss this?” I didn’t. I missed an opportunity to do it much better. I expect I’ll do better next time. But it is almost ironic, Willis. What you did with your assessment of my work is very parallel to what Dr. Spencer did to you. However, seeing how you respond when angry and hurt, I’ll bet your response to him was stronger than mine was to you.
I don’t agree with your assessment that my statement was an “ugly attack”. I was saying you were fundamentally wrong on the issue. I was not therefore dismissing all of your other great qualities. I was not dismissing you as a person. I think your fears that your background (lack of diploma) will haunt you are unfounded, but I understand the emotions are real. My advice is to just tell anyone who downs you for lack of diploma to effe themselves. Now, you tried to shame me into apologizing to you: “And now, unless you finally get the balls to apologize…”. I don’t need to be shamed into doing what is right. I see you are upset so I’m overlooking a lot of behavior toward me that I think is worse than what I did in quality and quantity. I can’t apologize for making an “ugly attack” – because that isn’t what I did – not something I believe I did and not what I had in my mind or heart. But I see how upset you are – and since my words did that, I’m remorseful. Willis, I’m sorry that how I spoke to you made you feel bad about yourself and mad at me. If I could do it over I would. For you, for me and for everyone else reading.
I’m not a fan of how you can be when you are hurt and angry but I’m a fan of your capabilities and analysis. However, I recommend you think about slowing down your snap judgements and allow more time and space for contrary opinions. The speed and intensity you use can trample others at times – but even that doesn’t invalidate you or your capabilities. I just thought it might be appropriate for me to share my thoughts on this.
I really want to resolve this with you Willis because I don’t think there is any reason for either of us to harbor this. I’d also like to see how much honest harmony of understanding we can get on the technical subject. I value your input and your capabilities. Maybe all I said here wasn’t what you expected or wanted but it is the most honest and benevolent response I can offer.
William Ward January 20, 2019 at 10:18 pm
William, first, thanks for your explanation. Here’s the part I don’t get.
You started out by saying that we have to sample at twice the highest frequency in the temperature signal. This, as I pointed out, is a frequency with a period on the order of a second or fractions of a second. I said no, there was no reason to sample at that high a frequency, that we can sample at a lower frequency because we’re not interested in those high frequencies.
Then, after insulting me instead of just saying I was wrong, you went on to say well, no, we don’t really have to sample at milliseconds to be at twice the highest frequency, we can sample at a far lower frequency. You say we can sample at or above what you call the “practical Nyquist limit” of 288 cycles per day, which is far, far below 2X the highest frequency in the temperature signal. Here’s the discouraging part.
THAT IS EXACTLY WHAT I SAID, AND WHAT YOU CLAIMED WAS A FOOLISH NEWBIE ERROR!
Not only that, but I’ve shown that the error from sampling at a twelfth of that “practical Nyquist frequency” is trivially small: mean error on daily averages on the order of 0.005°C, RMS error of 0.05°C, maximum error 0.25°C. And obviously, since the errors are symmetrical, the error on the monthly averages is smaller than that.
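For readers who want to try this kind of comparison themselves, here is a minimal sketch of how the three figures (mean, RMS and maximum daily error) can be computed. It takes every twelfth sample of a 288-sample day and compares the resulting daily mean with the full 288-sample mean. The temperature trace is synthetic and the function names are made up for illustration, so the numbers it prints say nothing about any real station.

```python
import math
import random

SAMPLES_PER_DAY = 288  # 5-minute samples

def synthetic_day(rng, day_index):
    """Return 288 made-up temperatures (deg C) for one day. NOT station data."""
    base = 15.0 + 5.0 * math.sin(2 * math.pi * day_index / 365.0)
    phase = rng.uniform(-0.3, 0.3)
    temps = []
    for k in range(SAMPLES_PER_DAY):
        t = k / SAMPLES_PER_DAY  # fraction of the day since midnight
        diurnal = (-6.0 * math.cos(2 * math.pi * t + phase)
                   + 1.5 * math.cos(4 * math.pi * t + phase))
        temps.append(base + diurnal + rng.gauss(0.0, 0.2))
    return temps

def daily_mean_errors(days, step):
    """Mean of every `step`-th sample minus the full 288-sample mean, per day."""
    errors = []
    for temps in days:
        full_mean = sum(temps) / len(temps)
        sub = temps[::step]
        errors.append(sum(sub) / len(sub) - full_mean)
    return errors

def summarize(errors):
    """Return (mean error, RMS error, maximum absolute error)."""
    n = len(errors)
    mean_err = sum(errors) / n
    rms_err = math.sqrt(sum(e * e for e in errors) / n)
    max_err = max(abs(e) for e in errors)
    return mean_err, rms_err, max_err

rng = random.Random(0)
days = [synthetic_day(rng, d) for d in range(365)]
hourly = summarize(daily_mean_errors(days, 12))  # 24 samples/day
print("hourly vs 288: mean=%.4f rms=%.4f max=%.4f" % hourly)
```

To run it on real data, replace `synthetic_day` with a loader that returns the 288 five-minute values for each day of a station record.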
So no, William, I fear I’m not buying your explanation. If your initial claim were true, you would still be saying that we need to sample on the order of seconds. But you’re not. You have agreed with me that sampling at 288 samples per day, or perhaps even hourly, is entirely adequate and satisfies your “practical Nyquist limit”.
And what is this “practical Nyquist limit” based on? For example, could we use your practical Nyquist limit of 288 samples per day if we were interested in the one-minute fluctuations in the signal?
Of course not. Those frequencies are way above the practical Nyquist limit.
But 288 cycles per day is above the frequencies of interest, which are almost exclusively the daily averages that are turned into monthly averages and long term trends.
Now, let’s recall, this is YOU saying we don’t need to sample at twice the highest frequency in the signal. Instead, you are saying it is OK to sample at something well below the highest frequencies but above the frequencies of interest … and in that context, please consider my statement:
That’s what really frosted my banana. After dissing me for making a claim that we don’t have to sample temperature data at a frequency of milliseconds, YOU say we don’t have to sample temperature data at a frequency of milliseconds.
And your diss? That was the final straw. It wasn’t that I was wrong. That would have been fine. It was not even an emphatic statement that I was wrong. That would have been fine too—I’ve been wrong many times. Heck, I’ve even got a whole post called “Wrong Again”, posted because I was indeed wrong … and believe me, that’s not easy to admit in public. But it’s something I’m fanatical about—if someone can show I’m wrong, I will admit it with no hedging.
No, your insult was that I was wrong because I’ve never taken a college course in signal analysis … which makes you no better than the other jerks out there who think that my lack of a formal education is an ironclad mystical guarantee that I can be safely ignored.
So no, William, you are not the good guy in this. Yes, I probably over-reacted; but I’m really, really tired of the long line of pricks who claim that everyone can ignore me because I haven’t taken a college class in their favorite subject. And despite your good intentions and your good nature, it turned out that you are just another in that long line. As soon as the dispute started, you reached for that ever-present and well-worn personal attack, which is always the same bogus claim—that “Willis didn’t study this in college so all of you can and should ignore him totally.”
Can you understand now why I reacted as I did?
Look, William, I do respect your knowledge, as I respect that of all people with a deep and thorough understanding of their chosen subject. But when you started out, in your very first response to me, by ragging on my lack of formal education, I fear my respect for you as a person took a huge hit … and it is only my respect for your knowledge and my sense that your social skills aren’t that sharp that has kept me in the discussion.
As I said several times, it does seem that you really don’t understand the effect of your words, which in part is why I’ve stayed in the discussion. I don’t think you set out to join the aforementioned long line of pricks, and I don’t think you even realized you joined them … but join them you did, and emphatically so …
Now as I said before, I’m willing to reset and go forwards. I don’t like bearing grudges. And as you said, I don’t think our remaining disagreements are large.
But before we dive back into the science, I did want you to understand very clearly what your words look like and what effect your words have from this side of the silver screen …
And with that out of the way, returning to the science I still have not found any USHCN sites which have a significant difference between hourly samples and 288-samples, either in the mean results or in the trends.
And I still have not found any sites where there is any kind of significant aliasing of higher frequencies into the hourly samples. Yes, there is aliasing into the two-hour samples. But as I demonstrated above, I haven’t found any in the hourly samples.
So I’ll ask again—do you have any actual examples where using hourly results gives significantly different answers from 288 samples per day, and if so, what and where are they? I’m happy to be proven wrong, but it takes facts to do that, not accusations about my well-known lack of formal education.
My best regards to you, and I do regret that we got off on the wrong foot,
w.
Hi Willis,
You ended your post with “Your Friend”. Well alright! Thanks Willis.
Willis said: “All the data that I’ve looked at give a mean daily error on the order of five-thousandths of a degree; an RMS daily error on the order of five-hundredths of a degree; and a maximum daily error on the order of ± a quarter of a degree. Together these add up to a trend error on the order of a few thousandths of a degree per decade. None of these are significant in the field of climate science.”
My reply: Can you clarify what you are comparing here? Is it 24-samples/day vs. 288-samples/day? Or is it one of those vs. max/min? I’m assuming the former, but please clarify so I can respond to the correct concept. I’m not hung up on the difference between 288 and 24. We have shown error between 288 and max/min. Paramenter has provided some good information in addition to mine. If 24-samples/day gives essentially the same result as 288-samples/day, this doesn’t really change the core message. I think we are in agreement.
Willis said: “I don’t see the “oscillation” that you mention so perhaps I don’t understand what you are referring to.”
My reply: Look at my Fig 2. As you read the chart from the bottom up (increasing sample rate), the error decreases from 0.7 or 0.8 to 0.1, crosses over zero to -0.1 and then goes back up to 0. You would see this if you plotted the error vs sample rate. I didn’t plot other rates, so we don’t know what it does other than at the ones I show for that example. Not exactly an oscillation, but a convergence with ripple. The error changes sign. Your analysis is RMS. Can you explain how you account for the sign of the error in your analysis?
Willis said: “A much more important question is, what we can do with the errors that using min-max has created in the past?” And: “So let me invite you to consider that question, of how we might minimize the errors of the traditional method ex post, as a much more important puzzle than the exact reason that we get errors from the traditional method. I’d be very happy to hear your thoughts, particularly on removing the aliasing …”
My reply: An admirable goal! I wish I had a more optimistic reply to match the good intention of your goal. There are plenty of texts you can refer to. I found this brief paper to be convenient:
http://www.dataphysics.com/downloads/technical/Effects-of-Sampling-and-Aliasing-on-the-Conversion-by-R.Welaratna.pdf
Quoting the paper: “Aliasing is irreversible. There is no way to examine the samples and determine which content to ignore because it came from aliased high frequencies. Aliasing can only be prevented by attenuating high frequency content before the sampling process…”
Maybe if you study the individual station signals you can come up with some innovative way to reduce the daily mean error generated by max/min for days in those stations. If you can do this successfully for day after day in a station then maybe you are on to something. I’ll think about this some more…
Willis,
I’m replying here to your post where you advised me I’m now cataloged under P for Prick. I’m just going to overlook all of that vitriol, not because it has no impact on me, but because my most honest response to you is to see the immense amount of visceral pain and hurt you seem to be in over this issue. I feel sad that you have experienced this in the past and that you carry this scar. It’s a very human issue that most of us can relate to and therefore easy to have compassion for. I’m sure I’m not going to be able to convince you that I wasn’t thinking at all about your education. Yes, I had read the info on that nasty blog, but 1) it was not at the forefront of my mind, 2) even if it were I had no confirmation from you that you were self-educated until after you revealed it to me, and 3) as I said before I deeply admire what you have done. I’m not the slightest bit critical of it. I won’t spend any more energy trying to convince you as you are not really open to it at the moment. Maybe these words will mean more in the future. I was simply fighting over the point in the discussion and I was trying to push you back hard to get you to pause and consider your position. As I said previously, there was a better way for me to do it – and I missed that opportunity.
I won’t try to further untangle the mess around “practical” implementation of Nyquist. I have restated and clarified my position at least twice, maybe 3 or 4 times if you include all of the people I talked to about it. I can’t undo the confusion. I can only ask that we try to understand that it was confusion and mutual impatience in communicating. My clarifications are there if you want to take them in. Otherwise, I don’t think I can say anything more that is constructive.
You said: “And with that out of the way, returning to the science I still have not found any USHCN sites which have a significant difference between hourly samples and 288-samples, either in the mean results or in the trends.”
My reply: I didn’t do much of a study between 24-samples/day and 288-samples/day. I focused on 2-samples/day or max/min and 288-samples/day. In my table of Fig 2 I show that 24, 36 and 72-samples per day are +/- 0.1C from 288-samples/day. I have other similar work I did that showed similar results. But my study of this was not exhaustive. It was not my focus, but I see the value of what you added with your analysis. I did say in my paper that 24-samples may give good results, depending upon the spectral content. But again, I went back to my engineering approach and thought while +/-0.1C is “small” to my thinking, 1) it seems to matter to climate science and 2) NOAA used 288 and 288 seemed to allow the engineering requirements to be satisfied. Seeing the error start to oscillate around +/-0.1C for 3 rates before reaching 288 suggested to me that we were converging toward a good design. Also, I was trying to expose people to a problem with the numbers we are “fed” in the narrative of alarmism. I wasn’t being asked to design the next generation system, so I thought going with the “reference” network NOAA came up with probably had some proper thinking that went into it. I accept your analysis about hourly vs 288 – it actually fits with the smaller sample in my analysis, that most days do well with hourly sampling. Again, the phrase “do well” is rather subjective without a specification or definition of that term.
Does this explanation satisfy and resolve our technical differences? I won’t ask about the interpersonal differences. Maybe time and new, more positive interactions will allow that to heal in the future.
I think and hope we can all take away that sampling theory plays a significant role in measuring temperature, that climate science seems to overlook or ignore it, and that mean errors and trend errors result. There are obviously many other errors that factor in. (I provided my list of 12 errors/issues in a post to others.) But I had not really seen violating Nyquist/sampling theory as an issue in the conversation about climate. So I wanted to bring something new to the discussion.
William Ward January 21, 2019 at 8:36 pm
William, I absolutely do understand that you didn’t realize that when you accused me of a lack of formal education that you were talking about my lack of formal education. It’s the only reason that we’re still in this discussion.
But the fact remains that you were indeed talking about my lack of formal education … which was the point I was trying to make when I said that you really, really don’t see what your words do.
I do understand that there is a difference between practical implementation of Nyquist and the theoretical implementation of Nyquist. That’s the difference that I was trying to point out when I got shut down by insulting my education …
Thanks for that. Let’s start with the data. All the data that I’ve looked at give a mean daily error on the order of five-thousandths of a degree; an RMS daily error on the order of five-hundredths of a degree; and a maximum daily error on the order of ± a quarter of a degree. Together these add up to a trend error on the order of a few thousandths of a degree per decade.
None of these are significant in the field of climate science.
I’m sorry, but I have no idea what this means. My results show the following:
I don’t see the “oscillation” that you mention so perhaps I don’t understand what you are referring to.
I have not yet found any hourly data that does not compare very well to the 288-sample data. Nor have I found in the hourly data any of the aliasing that you warned about, although it certainly exists in the lower frequencies.
It’s an important question, because while we have very little 288-sample data, we have much, much more hourly data … and as I’ve said before, I respect your opinion in these matters.
My friend, as I’ve said, I don’t think you knew what you were stepping into. I’m not angry with you. I believe that you are a good guy who unknowingly stepped into the wrong long line … and yes, we are very close on the technical questions.
Given the size of the error from two samples per day, it is obvious that Nyquist plays a role in the genesis of the error. However, that’s only of theoretical importance. A much more important question is, what we can do with the errors that using min-max has created in the past?
I’ve had a couple of insights in that direction. The first is that the (max+min)/2 errors have a strong annual cycle. This offers the possibility of either basing our trends on the months with the least errors, or of subtracting the known error structure from the data. I suspect that the structure of the errors is in part due to the times when the temperature crosses the freezing point of water, which would allow for a more general application. But that’s just a guess at this point.
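As a rough illustration of “subtracting the known error structure”, the sketch below builds a per-calendar-month average of the (max+min)/2 error from days where both the traditional mean and a high-rate mean are available, then subtracts that average from a traditional mean. The function names and the three example records are hypothetical, and the approach removes only the average seasonal part of the error.

```python
from collections import defaultdict

def monthly_error_climatology(records):
    """Average (traditional - true) daily-mean error for each calendar month.

    records: iterable of (month, traditional_mean, true_mean) for days
    where both values are available.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for month, traditional, true in records:
        sums[month] += traditional - true
        counts[month] += 1
    return {m: sums[m] / counts[m] for m in sums}

def correct(month, traditional_mean, climatology):
    """Subtract the month's average error; leave the value alone if unknown."""
    return traditional_mean - climatology.get(month, 0.0)

# Made-up calibration records: (month, (Tmax+Tmin)/2, 288-sample mean)
records = [(1, 10.5, 10.0), (1, 11.5, 11.0), (7, 20.25, 20.0)]
clim = monthly_error_climatology(records)
print(clim)
print(correct(1, 12.5, clim))
```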
The other plan of attack is that as you’ve pointed out, aliasing is a problem when we sample below the Nyquist limit. It seems to me that it might be possible to figure out the structure of the aliasing, at least the part of it due to being below the Nyquist limit and remove it …
So let me invite you to consider that question, of how we might minimize the errors of the traditional method ex post, as a much more important puzzle than the exact reason that we get errors from the traditional method. I’d be very happy to hear your thoughts, particularly on removing the aliasing …
My best to you, and as always, my thanks for your constructive tone.
Your friend,
w.
Willis
You said, “… how we might minimize the errors of the traditional method ex post, as a much more important puzzle than the exact reason that we get errors from the traditional method.”
It seems to me that if it is possible to correct ex post, the solution would be easier if we were certain of the “exact reason that we get errors from the traditional method.” Inasmuch as the mid-range value is acknowledged as not being as robust of an estimator of the central tendency as is the mean, I’m not optimistic about the probability of making acceptable adjustments.
Clyde Spencer January 22, 2019 at 8:27 am
Thanks, Clyde. That assumes that Nyquist errors are not correctible. This is where beginner’s mind comes in. I start with the assumption that things are fixable, even Nyquist errors.
It seems to me that the real question is not the origin of the errors, it is the structure of the errors. In the Redding dataset, for example, the monthly error varies regularly over the course of the year from about 0.2 to 0.8 °C, with a minimum in the summer … and that seems to me to indicate that we could remove at least some of that error.
I also need to run a periodogram of the Redding errors, to see what is happening. That may also suggest some error correction methods.
Finally, my hope is that William or you or some other signal engineer will come up with some lines of attack. I mean, aren’t signal guys supposed to be the ones able to clean up messy, noisy signals? I suspect he (and others) know of methods I haven’t even dreamed of …
Always more to learn …
w.
Willis
You said, “It seems to me that the real question is not the origin of the errors, it is the structure of the errors.”
I have been giving some thought to this and I think that the explanation for the mid-range value differing from the mean is a result of the daily temperatures being skewed or unsymmetrical. It is similar to the usual situation of mean, median, and mode being identical for a symmetrical, normal distribution, but the median and mean shift when there is a long tail on the distribution. Thus, I would speculate that the mid-range and mean temperature would be most similar about the time of the equinoxes (lag?) and would have the greatest difference about the time of the solstices. That is a generalization. Because the daily high temperatures usually occur in the late afternoon in the Summer, the peak insolation (noon) may not be driving the symmetry. Additionally, any cold front will probably distort the temperature distribution at any time of the year. It is the latter problem that leads me to believe that there is too little historical information to correct the mid-range values.
Thanks, Clyde. For me, the oddity is that there is a trend inherent in the error between the traditional and the true daily means. It’s not clear to me why the error would change over time.
Sadly, after doing more work, I’m slowly coming to your conclusion, which is that we have too little data to determine what’s happening. We can improve the situation by removing the annual cycle of the trends. That will make the data more accurate, but it doesn’t fix the real issue, which is the trend in the error …
Seems to me that the error is related to certain weather conditions, and that if there is more or less of whatever that condition is, we get either more or less trend in the error … but that is handwaving which doesn’t easily translate into mathematical procedures.
I continue the investigation … and will report any findings of interest.
Best regards,
w.
Willis,
You said, “… the oddity is that there is a trend inherent in the error between the traditional and the true daily means. It’s not clear to me why the error would change over time. … We can improve the situation by removing the annual cycle of the trends. That will make the data more accurate, but it doesn’t fix the real issue, which is the trend in the error … Seems to me that the error is related to certain weather conditions, and that if there is more or less of whatever that condition is, we get either more or less trend in the error.”
We know that the minimum temperatures are increasing more rapidly than the maximum temperatures. That is one clue. I have speculated, previously, that the different climate zones are experiencing different rates of temperature increase, which means that results may be sensitive to the selection of stations.
Fundamentally, the difference between the mean and mid-range temperatures is related to the shape of the daily temperature curve. If there is a change in the trend, that would suggest to me that there is an unidentified process shaping the curve, i.e. changing the skewness. That is a second clue. Yes, there is a dearth of high-quality data to unravel the mystery.
However, something that you might want to consider is to treat the daily temperature curve as though it were a distribution of the frequency of temperatures and calculate a pseudo-skewness, as an index to work with, and then see if there is a correlation with the mean/mid-range error.
I’ll be interested in hearing what you come up with.
Willis,
“It seems to me that it might be possible to figure out the structure of the aliasing, at least the part of it due to being below the Nyquist limit and remove it …”
Yes, you can do that. The main contribution is from the aliasing of harmonics of the average diurnal cycle with the sample frequency. So just calculate an average diurnal cycle and get its DFT (hourly sampling will do). If you start from midnight, the samples of the harmonic at the sample frequency are all equal to its initial value, so their mean is just the cos coefficient of the DFT for that harmonic.
For example, if you sample twice a day, that will alias with the second harmonic of the average diurnal cycle d(t). Suppose the DFT is
d(t) = a1*cos(w*t) + b1*sin(w*t) + a2*cos(2*w*t) + etc., where w is the angular diurnal frequency.
It also aliases with the 4th, 6th etc. So the error due to alias is
a2+a4+a6… This is close to a2
If you sample 3x, then error is a3+a6+a9…
I can produce more details, numbers etc.
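A small numerical check of the statement above, as a sketch only: it builds a diurnal cycle from a handful of made-up harmonic coefficients (plus an assumed constant term `A0` for the true daily mean), samples it n times a day starting at midnight, and compares the error in the sampled mean with the sum of the cosine coefficients at harmonics n, 2n, 3n and so on. The coefficient values are illustrative and not fitted to any station, and the check covers only regularly timed samples.

```python
import math

# Hypothetical cosine (A) and sine (B) coefficients, in deg C, for harmonics
# 1..6 of an average diurnal cycle. Illustrative values only.
A = {1: -5.0, 2: 0.8, 3: -0.2, 4: 0.1, 5: 0.05, 6: -0.03}
B = {1: 2.0, 2: -0.5, 3: 0.3, 4: 0.0, 5: 0.02, 6: 0.01}
A0 = 12.0  # assumed true daily mean

def d(t):
    """Diurnal cycle at time t, measured in days since midnight."""
    w = 2 * math.pi  # angular diurnal frequency, radians per day
    return A0 + sum(A[k] * math.cos(k * w * t) + B[k] * math.sin(k * w * t)
                    for k in A)

def sampled_mean(n):
    """Mean of n equally spaced samples per day, starting at midnight."""
    return sum(d(j / n) for j in range(n)) / n

def predicted_alias_error(n):
    """Sum of the cosine coefficients at harmonics n, 2n, 3n, ..."""
    return sum(a for k, a in A.items() if k % n == 0)

for n in (2, 3, 24):
    print(n, sampled_mean(n) - A0, predicted_alias_error(n))
```

The sine terms drop out because the samples start at midnight. Shifting the start time would mix the sine coefficients into the error.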
Nick Stokes January 23, 2019 at 1:54 am
Thanks, Nick, I knew someone would have an answer.
I also like Clyde’s idea of calculating the skewness of the daily temperature data and using that to unravel the Gordian knot …
w.
Willis,
You said, “I also like Clyde’s idea of calculating the skewness of the daily temperature data and using that to unravel the Gordian knot …” I lost some sleep last night thinking about that off-the-top-of-my-head remark. Because it has been decades since I took a statistics class, I had forgotten the details for calculating skewness. So, I did some background reading today to refresh my memory. While the calculation is rather straightforward, it involves cubing the difference between the mean and sample temperature and summing all the cubed differences. It appears that there is an issue of stability or robustness of the calculation, perhaps related to the cubing. In particular, at least one article suggested that this is a good example of needing to rely on the Law of Large Numbers to get a reliable estimate. That is, a sample of at least a couple thousand was necessary to get close to the result derived from 5,000 samples. What’s worse, it isn’t just a matter of asymptotically approaching the true value, but the numbers oscillate above and below the true value as the number of samples increases. The advice given was not to rely on calculations of skewness (or kurtosis) for samples under a few thousand and to instead rely on a histogram for a subjective estimate of the skewness. I’m not happy with a subjective estimate, nor am I happy with having to bin the temperature data to prepare a histogram because it reduces the accuracy and precision. Therefore, I’m thinking about a different approach to model the skewness using the shift in the mean compared to the median or mode obtained with binning. I’m going to have to think about this some more. I’ll get back to you if I think I have found a workable solution.
Thanks, Clyde. The other problem with skewness is that you need all of the data to calculate it. My method calculates the relationship between trend error, and max and min trends. Once that is calculated, I would think it could be applied to “nearby” stations, or in particular earlier records of the same station, without needing all of the data to calculate it.
However, it’s likely worth taking a look at skewness, or at a minimum the median/mean difference, to try to at least understand what is going on.
w.
Willis
You said, “The other problem with skewness is that you need all of the data to calculate it.” I’m assuming that stations with high temporal resolution can be used to establish a relationship between the mid-range values and the true mean, based on the assumption that it is asymmetry in the daily time series that is responsible for the mid-range value being different from the mean. Then, that might be used to correct the mid-range values, IF we can come up with another descriptor or predictor of the skewness, such as the season.
Yet another problem is that while the skewness might be related to the seasons, I can imagine situations where unusual weather might give an unusual skewness. At this point (I’m basically thinking while typing) perhaps multiple regression of all the available meteorological data might reveal some single or combined correlations. The essence of the problem is that two temperatures per day isn’t a lot to work with, but typically other meteorological data such as humidity, wind speed and direction, and cloudiness might be helpful.
OK, Clyde, I looked at skewness, and it’s ugly …
I generated 100,000 data points with a Poisson distribution. Skewness is 0.43, as you might expect.
From that Poisson data, I selected 288 data points at random and got the skewness. I repeated that 1,000 times. Here’s the summary of the results:
The answers go from 0.02 to 0.94. The interquartile range is 0.32 to 0.52 … ooogh. Ugly.
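The experiment described above can be sketched as follows. The Poisson rate is not given in the comment, so `LAM = 5.4` is an assumption, chosen because the theoretical skewness 1/sqrt(LAM) is then about 0.43. The seed, the helper names and the crude quartile indexing are also illustrative, so the printed spread will not match the quoted figures exactly.

```python
import math
import random

def poisson(rng, lam):
    """Draw one Poisson variate using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def skewness(xs):
    """Third central moment divided by the cubed standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

rng = random.Random(1)
LAM = 5.4  # assumed rate; theoretical skewness is 1/sqrt(LAM)
population = [poisson(rng, LAM) for _ in range(100_000)]
full_skew = skewness(population)

# Skewness of 1,000 random subsamples of 288 points each, sorted
sub_skews = sorted(skewness(rng.sample(population, 288)) for _ in range(1000))
q1, q3 = sub_skews[249], sub_skews[749]  # approximate quartiles
print("full: %.2f  range: %.2f to %.2f  IQR: %.2f to %.2f"
      % (full_skew, sub_skews[0], sub_skews[-1], q1, q3))
```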
Well, it was fun while it lasted.
w.
Willis
You have confirmed what I read in the online article today — conventional skewness is a poster child for the utility of the Law of Large Numbers. One needs much more than 288 samples; at least an order of magnitude more!
Willis
You said, “OK, Clyde, I looked at skewness, and it’s ugly …” Inasmuch as it appears that the conventional calculation of skewness (third moment) is not robust or reliable, I’ve been thinking about an alternative metric.
For a symmetric frequency distribution, all the measures of central tendency are coincident. As a tail begins to stretch out, the mean moves away from the mode in the direction of the stretching. So, I propose a different metric for skewness. Find the difference between the mode and mean for a given time series, where the sign indicates which tail is skewed. To adjust for different means, divide by the standard deviation to transform the difference into z-scores. I haven’t explored the effect of the standard deviation increasing with the stretching of the tail, but I suspect it will just dampen the rate of change of the z-score. The bottom line is that all three metrics (mean, mode, and SD) are readily available in all statistics packages and even Excel. So, calculating “Spencer’s Skewness index” with high-temporal resolution temperature data (e.g. 288 samples/day) should give you something to plot mid-range values (or mid-range to mean error) against, to see how asymmetry in the temperature data affects the error.
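Here is one way the proposed index might be computed, as a sketch under stated assumptions. The sign convention (mean minus mode, so a long warm tail gives a positive value) is a choice made here, and it has the same form as Pearson’s first (mode) skewness coefficient. Temperature readings rarely repeat exactly, so the mode is taken as the centre of the fullest bin. The 0.5 °C bin width and the example day are made up for illustration.

```python
import math
import statistics

def mode_skew_index(temps, bin_width=0.5):
    """(mean - mode) / SD, with the mode taken as the centre of the fullest bin.

    bin_width (deg C) is an illustrative assumption; the result depends on it.
    """
    mean = statistics.fmean(temps)
    sd = statistics.pstdev(temps)
    bins = [math.floor(t / bin_width) for t in temps]
    fullest = statistics.mode(bins)
    mode = (fullest + 0.5) * bin_width
    return (mean - mode) / sd

# Made-up day of 288 five-minute samples: long cool stretch, short warm peak
day = [10.0 + 8.0 * math.exp(-((k - 190) / 30.0) ** 2) for k in range(288)]
midrange = (max(day) + min(day)) / 2
print("index:", mode_skew_index(day))
print("mid-range minus mean:", midrange - statistics.fmean(day))
```

Computing both numbers for each day of a 288-sample record gives the pairs to plot against each other.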
I’d rather see something that measures the asymmetry of the daily temperature curve, but the only thing that comes to mind is to treat the daily temperatures as a frequency histogram where the time is replaced with a dummy value where the Tmax is treated like the mode, and assigned a value of zero.
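The mode-versus-mean index proposed above could be sketched roughly as follows. The binning step is an assumption (continuous temperature readings rarely repeat exactly, so the mode has to be taken from binned values), as are the function name and the sign convention of mean minus mode. In this form the quantity is essentially what statistics texts call Pearson's mode skewness.

```python
import statistics
from collections import Counter

def mode_skewness(values, bin_width=0.5):
    """(mean - mode) / standard deviation, with the mode taken from binned values.

    bin_width is an assumed resolution for finding the mode.
    A positive result means the upper tail is the stretched one.
    """
    bins = Counter(round(v / bin_width) for v in values)
    mode = bins.most_common(1)[0][0] * bin_width
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return (mean - mode) / sd

# Made-up readings, for illustration only
symmetric = [9, 10, 10, 10, 11]
upper_tail = [10, 10, 10, 11, 12, 15, 20]
lower_tail = [20, 20, 20, 19, 18, 15, 10]

for name, data in [("symmetric", symmetric),
                   ("upper tail", upper_tail),
                   ("lower tail", lower_tail)]:
    print(name, round(mode_skewness(data, bin_width=1), 3))
```

With real 288-sample days, the result will depend on the chosen bin width, so that choice would need to be checked before plotting errors against the index.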
Willis,
You said, “…but so far all I’ve learned from you is that you are an arrogant man who thinks he knows more than everyone else, and who is unwilling to admit it when he makes an error.” There is an old saying that when you point a finger at someone, there are three fingers pointing back at yourself. I think that your response is out of line. William has been quite the gentleman and doesn’t deserve that level of incivility.
Most of us who post here clearly have a high opinion of ourselves. But, you have basically lowered yourself to the level of an ad hominem attack on William. I think it would be better to just agree to disagree if you can’t provide an argument William is willing to accept. Such remarks do not become you!
You also remarked, “Climate signals have pseudo-cycles that appear and disappear at random, only to be replaced by some other pseudo-cycles.” What you are calling “pseudo-cycles” can be explained as constructive/destructive interference by out-of-phase sinusoids revealed by Fourier decomposition.
A little more civility, please!
Clyde, thank you for your comments.
Clyde, I said quite clearly:
So I’m aware of my own faults …
However, when a man comes in and accuses me and everyone here of having closed minds, and claims that we’re ignorant, I’m sorry, but I’m gonna hit back twice as hard. And the sooner William notices that, the sooner he’ll quit that nonsense. It’s not doing him any good.
Finally, you say:
I’m sorry, but that assumes that there are underlying sinusoids that are invariant in frequency, phase, and amplitude. If you can demonstrate that, please do. The daily sunspot data might be a good place to start. Until you or someone can do that, I’ll continue to call them “pseudo-cycles”.
Best regards,
w.
Willis
It is entirely conceivable that there are forcings that come and go. That results in actual, real-world periodicities that are only transient.
However, it is my understanding that the utility of the Fourier Transform is that ANY varying signal can be represented by a decomposition into sinusoids of different amplitudes and phases. Doing the decomposition can provide insights on the frequency of what you are calling “pseudo-cycles.” From that, one might conclude what the nature is of the actual forcings. However, to do that, one might need to at least meet the Nyquist Criteria for sampling, which might be as high as every few minutes. Therefore, as Ward has been suggesting, we should probably be acquiring modern data towards the end of being able to faithfully reconstruct the time-series so that analyses above and beyond the temperature question can be addressed in the future. That is, we shouldn’t restrict ourselves so that at sometime in the future we have to invoke the Stokes Lament, “That’s all we have to work with!” What I read Ward as saying is, “Let’s do the best job we can so that we can move forward, and not just accept as adequate that which we have inherited.”
Clyde Spencer January 19, 2019 at 12:28 pm
Thanks, Clyde. If you want some fun, take a long natural signal, say daily sunspots. Do a Fourier decomposition on the first half of the data, and then another on the second half of the data … what you are very likely to find is that each half is made up of very different sinusoids.
Heck, I’ll save you the trouble. Here you go …
As you can see, the issue is not interference patterns from underlying stable cycles. It is that the underlying cycles themselves come and go, appearing, changing in frequency, phase, and amplitude, and then disappearing …
Which is why I call them “pseudo-cycles” …
Regards,
w.
Willis
Thank you for spending the time to generate the periodogram. I’m not surprised by the results. Sunspot numbers and the shape of their envelope are known to vary with time. Taking the beginning half and the ending half is effectively two different signals with the commonality of an approximately 11-year base signal with unknown influences superimposed. I’d be surprised if they looked identical.
However, a periodogram isn’t quite what I had in mind because it is a summary of the power of the apparent frequencies. Notably, it is missing phase information of the composite sinusoids.
“The power spectral density, PSD, describes how the power of your signal is distributed over frequency whilst the DFT shows the spectral content of your signal, the amplitude and phase of harmonics in your signal.”
https://dsp.stackexchange.com/questions/24780/power-spectral-density-vs-fft-bin-magnitude
If you take two vibrating tuning forks in proximity, they will generate an apparent third tone that varies in amplitude commonly called a “beat.” What would a periodogram show?
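One way to explore the tuning-fork question is with a synthetic pair of tones. The sketch below is only an illustration under stated assumptions: the two frequencies (20 and 23 cycles per analysis window) are made up, the tones are simply added together, and a plain discrete Fourier transform stands in for whatever periodogram method one prefers.

```python
import cmath
import math

N = 256          # samples in the analysis window
F1, F2 = 20, 23  # assumed fork frequencies, in cycles per window

# Two equal-amplitude tones added together (linear superposition)
signal = [math.sin(2 * math.pi * F1 * n / N) + math.sin(2 * math.pi * F2 * n / N)
          for n in range(N)]

def power_at(k):
    """Squared magnitude of DFT bin k, normalised by the window length."""
    coeff = sum(x * cmath.exp(-2j * math.pi * k * n / N)
                for n, x in enumerate(signal))
    return abs(coeff / N) ** 2

power = [power_at(k) for k in range(N // 2)]
peaks = sorted(range(N // 2), key=power.__getitem__, reverse=True)[:2]

print("two largest bins:", sorted(peaks))
print(f"power at the difference frequency (bin {F2 - F1}): {power[F2 - F1]:.2e}")
```

For a pure sum of two sinusoids the power sits at the two tone frequencies and essentially none appears at the difference frequency: the beat is an amplitude envelope of the sum rather than a third sinusoid. A nonlinear detector (or the ear) is a separate matter that this sketch does not model.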
William Ward January 17, 2019 at 10:23 pm
We can ignore the energy above the 3rd harmonic (an 8-hour period) because it doesn’t alias into the hourly data. Absolutely, as you show, sampling at 12 cycles per day (every 2 hours) aliases the signal. Let me demonstrate. Here are six days of temperature data sampled at 288 cycles per day (every 5 minutes).
And here is the periodogram of the data. As you can see, there is little strength at anything faster than about 8 hours (3 cycles per day).
Now, you are 100% right that we get aliasing when we sample at 12 cycles per day (every 2 hours). Here’s that periodogram, with the 12 cycle per day periodogram (red) overlaid over the 288 cycle periodogram (blue).
As you can see, there is a very large aliased cycle at four hours. Ugly.
But let’s look at what happens when we sample 24 times daily, or every hour.
Note that there is very, very little evidence of aliasing in the hourly samples. In fact, the periodogram looks very much like the original periodogram at 288 cycles per day.
Which is another example of why, in a practical sense, I say that hourly data does NOT violate the Nyquist limit …
My best to you both,
w.
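The folding behaviour discussed above can be checked with a few lines of arithmetic. This is a generic textbook sketch, not a reconstruction of the periodograms in the comment; the function name and the example frequencies are assumptions chosen for illustration.

```python
import math

def alias_frequency(f, fs):
    """Apparent frequency of a tone at f when sampled at rate fs (same units)."""
    return abs(f - fs * round(f / fs))

# A component at 8 cycles/day (3-hour period), sampled at three different rates
for fs in (12, 24, 288):
    print(f"{fs:3d} samples/day -> appears at {alias_frequency(8, fs)} cycles/day")

# Direct check: at 12 samples/day, an 8 cycles/day cosine and a 4 cycles/day
# cosine produce identical sample values, so the two cannot be told apart.
fast = [math.cos(2 * math.pi * 8 * n / 12) for n in range(12)]
slow = [math.cos(2 * math.pi * 4 * n / 12) for n in range(12)]
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(fast, slow)))
```

Under these assumptions, a 3-hour component sampled every 2 hours shows up as a 6-hour component, while sampling hourly or every 5 minutes leaves it where it belongs. How much this matters in practice depends on how much energy the real signal carries at those periods.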
Hi Willis
Over the last few years I have enjoyed and learnt a few things by reading your numerous articles on WUWT. As this topic is of interest I have decided to join in and have some final questions.
Out of interest, could you tell me the maximum error I can expect in the daily mean from 24 samples/day compared to the 4,320 samples/day downsampled to 288 samples/day? I am thinking of a new site installed to fill in a gap, and of a site being upgraded from an old-fashioned min/max setup, if that makes any difference.
Are your periodograms capable of more short period resolution and given the large daily variations dominate can the Y axis be Logarithmic also? As I believe this would add additional insight and allow the results from different sample rates to be better compared.
Best Wishes
Hey, Bright, interesting questions. First, nobody can say what a new site will do. It may or may not be where the old site was. It may be near to the old site but in a different enclosure. The old site may have used slightly inaccurate instruments.
Given that, however, the results from the 100,000 or so five-minute samples I’m using as a dataset are:
Average error, daily avgs. based on hourly vs. based on 5-minute samples = -0.001°C
RMS error, daily avgs. based on hourly vs. based on 5-minute samples = 0.048°C
Maximum error, daily avgs. based on hourly vs. based on 5-minute samples = 0.206°C
You can see why I say that hourly data is more than sufficient for almost all purposes.
Next, I’m not sure what you mean by “more short period resolution” for the periodograms. They resolve down to the sample rate …
Finally, no, I’m not interested in a log scale. Theoretically I’m opposed to log scales unless a) there is a valid reason for them and b) they don’t distort the results. Given that, no problem. For example, I use a log x-scale on my periodograms, so we can see the shorter time scales.
Unlike say FFT results, my scale compares cycles to the range of the original data. This shows visually just how much power there is in each period. If I use a log scale, it will appear visually as though the periods with almost no power have large power … bad idea. Such plots are one of the reasons that folks get confused about things like a solar effect on temperature. They see a microscopic cycle blown up big and say “See!”.
All the best,
w.
Hi Willis,
Thank you very much for taking the time to explain your decisions regarding the periodograms that you produce.
Sorry in advance as I am doing this on a mobile device and for some reason can not cut and paste.
The following is only in relation to the error between the 288/day and 24/day sampling.
Thanks for the demonstration that it is possible to determine the errors by post-processing the data for a site where the 288 samples are available. It seems to me that each site will likely yield a different error, and the same site a different error each day. My comment is that it is not possible to say in advance that, for any given site, 24 samples/day will always yield a suitably low error that would meet the requirements of an equipment specification that represents current best practice of, say, 288 samples/day.
BTW I fully understand your “for all practical purposes” comment as I spent most of my career doing just that but sometimes you just have to do what is best practice or at the maximum technology will allow for many reasons and not all of them technical. It is always hard to sell something with a lower specification than a competitor that sells a better spec at the same price.
William Ward January 17, 2019 at 6:53 pm
Dang, gotta say, now that’s a clever way to open a discussion—claim that the people you are speaking with have closed minds … you really, really don’t see what you are doing, do you?
Seriously? You truly think that I and others don’t know that analog signals are continuous and digital signals are not? Are you not reading what we’re writing?
Oh, this just gets better. Now we’re all just innocents who have never gained sight of your brilliant wisdom … look, if you’re gonna insult us, at least have the balls to QUOTE WHAT WE’VE SAID that you are disparaging. And yes, digital samples are representative of a signal, duh … who ever said they weren’t?
Moving on, you say:
First you lecture us about the Nyquist limit. Then you invent something you call a “practical Nyquist limit” and say hey, that’s OK, 288 samples per day doesn’t obey Nyquist but it’s close enough. Now you are back again to tell us that your “practical Nyquist” is nonsense because what it gives us CANNOT represent the original signal …
Say what? Make up your mind!
Meanwhile, I’ve shown immediately above that hourly sampling gives a result which is just about identical to 5-minute sampling, so obviously I’m violating your “practical Nyquist limit” by a factor of 12 without bad effects …
And you claim OUR minds are closed?
It appears that you are innocent of the concept called “for all practical purposes”.
I remember my wonderful high school math teacher, Mr. Hedji, explaining to us what “for all practical purposes” meant. He said, “You’ve heard of Zeno’s paradox?” Of course we had, he’d told us about it. Before an object can travel a given distance d, it must travel a distance of d/2. And in order to travel d/2, it must first travel d/4, etc. Since this sequence goes on forever, it therefore appears that the distance d cannot be traveled.
Mr. Hedji said “Suppose we line all the boys up against one wall of the classroom and all the girls on the other side. Every time I ring a bell, they move half the distance toward the middle. Now, of course, as Zeno points out, doing that they can never arrive at the middle of the room.”
“But before long, they’ll be close enough for all practical purposes” …
So yes, in my periodogram above of hourly sampling of a temperature signal, there is still some small residual aliasing at about 2 hours.
But hourly sampling is perfectly adequate for all practical purposes …
w.
Willis,
There was nothing in what I wrote that would have warranted that kind of hostile response from you. Asking you to take a look at something with an open mind was not saying or inferring that you “have a closed mind”. There was no intent to insult you. It is pretty standard practice to restate fundamental facts when making a case. Stating basic facts to build up a case should not be viewed as an insult. I addressed the post to you – but I write to reach the full range of audience. I had many people making the claim that discrete values (max and min) are not samples – and they made their case for a few reasons. My reply to you tried to address many of the claims against my paper – even if you specifically didn’t make all of the counter claims.
You seem to have inferred tone and intent that was not there. I have gone back and re-read my post nearly a day later and I’m quite fine with it. I’m a fan of sarcasm. I have used it on several posts. But I don’t go into hostility with it. The post that has triggered you is about as dry as it can get. I had no humor, sarcasm or tone in it at all. I don’t understand your response to it.
William Ward January 18, 2019 at 4:51 pm
Say what? If you ask everyone to stand up, it implies they’re sitting or lying down. If you ask someone to help you, it assumes they are not helping you.
And if you ask people to look at things with an open mind, it assumes that their minds are closed. Otherwise, why would you have to ask them?
As I said before, it seems you really, really don’t know what you sound like from this side of the screen. Patriarchal and condescending are the words that come to mind.
Now, as I said, I can be insufferable as well … but the difference is, I know it, and you don’t seem to.
In any case, I’m happy to re-reset. But please, leave the things that sound like “open your minds to my wisdom” out of your comments, they don’t do you any good.
Here’s the kind of thing that drives a man mad. You’ve claimed that the “practical Nyquist limit” is 288 samples per day … but you haven’t provided a scrap of actual measurements to justify that. You seem to think that your mere word is enough to make it true.
And it’s bizarre, because you started by insisting that “we must sample a signal at a rate that is at least 2x the highest frequency component of the signal.”
I questioned this, saying:
And your response:
Bad start. Very bad start. When a man says “I mean no rudeness”, I just roll my eyes and go “Yeah, right. If you meant no rudeness, you wouldn’t be rude.”.
But then, having been very rude in your answer, you go on to claim that oops, you really didn’t mean it, and 288 samples per day were actually quite acceptable and were above the “practical Nyquist limit” … whatever that is …
I, on the other hand, have shown that hourly measurements give results with an average error in the daily mean w.r.t. 5-minute measurements of five-thousandths of a degree, with an RMS error of the errors of 5 hundredths of a degree. Those are meaninglessly small differences, so obviously, if 288 samples per day is above the “practical Nyquist limit”, then so are 24 samples per day.
But nooo … you refuse to hear that. You are welded to 288 samples per day, your mind is made up, and you don’t want to be bothered with facts … and if you don’t understand the facts and graphs I’ve presented, you don’t ask me what I meant. You just plow forwards.
In any case, today I finally found some USCRN data that’s not blocked by the government shutdown. I got 13 years of data for the USCRN station nearest to where I grew up, in Redding, California. I calculated the average for each day using first all 288 daily samples, and then using 24 hourly samples. Here are the results:
Some notes. The largest average annual error in 13 years of daily data is five-thousandths of a degree.
The largest annual RMS error of the daily errors is six-hundredths of a degree.
The largest absolute error in 13 years, both positive and negative, is about a quarter of a degree.
Here’s the state of play. According to YOU, we do NOT have to sample a signal at a rate that is at least 2x the highest frequency component of the signal as you first claimed. Instead, you say 288 cycles per day is acceptable under the Nyquist criterion for our practical purposes … and I’ve shown that there’s no practical difference between that and hourly sampling.
Finally, I’ve demonstrated that despite strong aliasing when you go from 288 samples per day to 12 cycles per day, there is only negligible aliasing when you go from 288 samples per day to 24 cycles per day. You keep making claims about aliasing, but I notice that I’m the only one measuring it in the actual data under discussion …
So … will you NOW admit that hourly samples are acceptable with respect to your “practical Nyquist limit”, and that I was correct in my opening statement that you were so rude to me about? To use your words, there was indeed a “first day of class mistake” made, but YOU were the one making it …
Best regards,
w.
Willis –
As the “other guy” who William considered close-minded, in order for outsiders to make a judgment in this regard, they have to know the subject – signal processing in this case. Many here know the basics well, but not the details such as we have here that come with experience. So not so good. But – us being worried about being called uncivil (even if it were true – and it’s not) takes a backseat to resisting those who are distressingly wrong, especially in engineering matters (a hard, logic-based science). We can (with likely little rest-of-world consequences) “agree to disagree” (a meme some favor) if we are voicing opinions of Beethoven vs. Stravinsky. Engineering? Feet to fire please. In my experience, it is the person who comes to see that he/she has lost (demonstrably wrong) who calls for peace. But you know that.
-Bernie
Bernie,
Because I asked you and Willis to open your minds to the particular points I was making is not the same as saying you are both closed minded people. I didn’t intend that and I don’t see why a reasonable person would interpret it that way. And as you said in another post, it is not true that I think I’m the only person who knows anything. I agree with your comments: it is important to resist those who are distressingly wrong. I see your points as being distressingly wrong and obviously you think mine are. So you are making the case that we should keep up the debate. Great, I’m on board. But you can’t have it both ways. I can’t be the arrogant know-it-all for doing so any more than you can.
While I don’t mind the friction, and I support your point about fighting for correct math and science, I also completely support Clyde’s adult reminder to be our best selves while communicating. I would like a redo on a few things I have said, that is for sure! Where possible I have corrected myself, adjusted my tone or overtly apologized. But I have not had a public meltdown and I’m not joining in with any drama. I’m unfazed by the indictments of arrogance. I could easily lob back the counter indictment, but I won’t. I won’t because it isn’t dignified and I actually don’t feel that way. I see people getting upset and out of that acting in an unpleasant manner. But I’m not hung up on that.
Tremendous heat can burn things to ash or it can forge great bonds. I have made good friends in the past that started out with a lot of friction. It doesn’t always end with the best of those possibilities, but I’ll do my part to aim for a positive outcome, even if we knock some things over along the way.
William Ward at January 17, 2019 at 8:12 pm said in part to Willis:
“ You think I’m being arrogant. Well, I can understand why you say that. I don’t think I’m better than anyone. So, I would not call that arrogant. I do think I know a lot more about signal analysis than anyone who has spoken out against what I have presented. “
Are we to assume (1) that this sort of comment is in the past and (2) that you won’t take offense at those of us who are trying to help your understanding of signals and in turn to interact with what YOU say?
-Bernie
William,
At the risk of getting bogged down in a wordy comment and not getting an answer from you, I will be brief and to the point.
Clearly, you don’t seem to be making the distinction between the “Real-world signal” in the abstract and that “real-world” accessible through physical measurement – which keeps tripping me up! And I would hazard a guess that this is at the bottom of most disagreement here – would you agree?
It is well known that thermometers – automatic and manual – suffer from a response lag (L) introduced by the Stevenson screen in which they are housed. It is over and above the L of the instrument itself coming to thermal equilibrium. This L is on the order of minutes. Moreover, “the screen’s lag time L lengthens with decreasing wind speed, following an inverse power law relationship between L and wind speed (u2). For u2 > 2 m s−1, L ∼ 2.5 min, increasing, when calm, to at least 15 min.”
Spectral response properties of the screen to air temperature fluctuations vary with wind speed because of the lag changes. Additionally there is a tendency towards more frequent low wind speeds as temperatures increase, and an associated increase in lag time and radiation error.
I’m no DSP expert but as a layman it is not hard to see that the screen is operating as a variable high-pass filter, with a ventilation-dependent frequency response.
It has been noted in many studies that the magnitude of the error depends not only on the windspeed but also on the wind direction relative to the azimuth angle of the sun. Most observational evidence put the total error of the apparatus itself at 1C, although greater errors have been noted*.
Therefore, accessible real-world “signals” are composed of “samples” the measurement of which is limited in frequency and therefore the frequency content of the measured “signal” is finite. Would you agree with that statement (Or one better worded)?
I’m certain that no-one here disputes the rules of DSP but is that what they are disputing?
*Note that the error also varies with station location and climate zone.
Hi Scott,
Thanks, you bring up a very important point. One that I have touched upon briefly in some posts, but clearly it has not been the bulk of the discussion. So you give an opportunity to add a bit more on that subject. The subject I bring up with sampling is larger than can fit into 2,000 or even 20,000 words, but through the back and forth comments more can be brought out.
You are right, the characteristics of the thermal transducer (and screen) should be factored into the design. I assume the engineers NOAA commissioned to design USCRN did this but I don’t have any information saying either way.
There is the question of: “what is the signal?” Well, air moves and mixes, and has a spatial and vertical temperature profile. We measure at 6 feet off of the ground or wherever it is measured in the screen. But we would get different readings by moving the screen laterally or vertically. If you have an IR Thermometer (IR Gun), go around your living room or house and try to determine the temperature. Shoot the floor, walls, ceiling, objects at different locations. You get a range of numbers. So what is the temperature of your house or a room in the house? For personal comfort we don’t need this kind of nuance – what the temperature is at the thermostat works for most. Unless you have an older home that is not well balanced and you have hot rooms and cold rooms. The thermostat reading doesn’t represent the house well. Relative to that analogy, how does our Stevenson Screen method work related to the planet? Does it represent it well or not? So, is the signal from the air outside of the screen or inside of the screen? I always go back to a system level definition. We are really trying to look at the temperature of Earth so it starts outside the screen. The screen must be considered a component in the data acquisition system. I see the screen acting as a low-pass (high-cut) filter. You said high pass – I’m not sure if you misspoke or meant to say that. If we see it differently let me know what you think. The engineers designing it should have some way to study the signal outside of the screen with a low mass/fast response time instrument and get an idea of the frequency content there. The transducer selected should be modeled to understand its characteristic equation – and this gets factored into the design. Also, understanding the characteristic equation, with properly sampled data one can calculate what is happening outside of the screen even though it is measured inside. It isn’t just the screen or transducer that needs to be factored. It is the entire circuit.
The anti-aliasing filter used would only need to respond to what happens inside of the screen as the screen could be viewed as the first stage of the filter network.
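To put rough numbers on the screen acting as a low-pass stage, here is a minimal sketch. It rests on a strong assumption that is not established anywhere in this thread: that the screen plus sensor behaves like a simple first-order lag whose time constant equals the quoted lag time (2.5 minutes when ventilated, 15 minutes when calm). The function name is hypothetical.

```python
import math

def lag_gain(period_min, lag_min):
    """Amplitude gain of a first-order lag for a sinusoidal fluctuation.

    period_min: period of the air temperature fluctuation, in minutes
    lag_min:    assumed time constant of screen plus sensor, in minutes
    """
    omega = 2 * math.pi / period_min
    return 1 / math.sqrt(1 + (omega * lag_min) ** 2)

for lag in (2.5, 15):                 # lag times quoted above
    for period in (5, 60, 1440):      # 5 minutes, 1 hour, 1 day
        print(f"lag {lag:4.1f} min, period {period:4d} min: "
              f"gain {lag_gain(period, lag):.3f}")
```

Under this model the daily cycle passes through almost untouched, while fluctuations of a few minutes are strongly attenuated, and more so in calm conditions. For a first-order lag the gain falls to 1/sqrt(2) when the period equals 2π times the lag. A real screen may well depart from first-order behaviour, so these figures are indicative only.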
Mercury thermometers don’t really allow you as much freedom to work, but they can be/have been modeled. I just don’t think that information factors into the data analysis.
Regarding infinite vs finite frequency: I’ll use the example of electronics to illustrate. Frequency is always infinite, but from a practical perspective it is finite. Why is it infinite? If you look at frequency content of a signal and go out further and further you will always measure content, but its magnitude gets smaller. Audio amplifiers specify a “signal to noise ratio” (SNR). The possible input of signal to the amplifier is limited by the design spec. The components are selected and the architecture selected to minimize noise so you get a good clean signal. But when you go to measure the noise, if you have really good instruments you can see noise is there as you go out in frequency, it just gets really small. For example the best instruments can measure to -160dB or lower. Every -20dB (with voltage) means you have gone down to 1/10th of the input [every +20dB means you get a gain of 10 or 10x]. So 1V of noise if reduced by 20dB would be 0.1V. Reducing it 40dB means it is down to 1/100th, so it is 0.01V. There are 8 20dB steps to get to -160dB, so the noise would be down by a factor of 10^8, or 0.00000001V. At some point the instrument itself generates the noise and you no longer know what is going on. We say it is infinite because it is beyond our ability to measure. I’m not sure if it is just a good guess that it is infinite or if theoretically it has been proven to be so. It doesn’t really matter for practical purposes. Can you hear noise at -100dB? Yes – with the right music source and listening environment. The human brain/ear is amazingly sensitive – but sadly this ability diminishes with age. If listening to hip-hop through earbuds on a subway then no, it won’t be audible. How about -120dB? Well, the best amplifiers shoot for -130 or -140dB of noise so you know it is outside of human hearing even in the best environments. The key is to design for the practical need of the application.
But do you design for the kid riding the train listening to hip-hop or for the person in their listening room, listening to a 24bit/96ksps recording of the world’s finest orchestras on their $100k playback system? [Strangely enough, people can hear (in double blind experiments) differences between amplifiers that are all specified to the same good specs.]
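The decibel arithmetic in the amplifier example can be checked with a short sketch. It uses the standard voltage convention (ratio = 10^(dB/20)); the function name and the 1 V reference are assumptions for illustration.

```python
import math

def db_to_voltage_ratio(db):
    """Voltage ratio corresponding to a level in dB (voltage convention)."""
    return 10 ** (db / 20)

REFERENCE_VOLTS = 1.0  # assumed reference level
for db in (-20, -40, -100, -160):
    volts = REFERENCE_VOLTS * db_to_voltage_ratio(db)
    print(f"{db:5d} dB -> {volts:.8f} V")
```

Each 20 dB step is one factor of ten in voltage, so -40 dB is 1/100 of the reference (0.01 V here) and -160 dB is eight such steps, or 1 part in 10^8.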
It is true that Nyquist requires sampling fs > 2B. But B can be altered by design – based upon knowing what is needed practically. The frequency gets limited with filters, knowing what is being discarded by the filters. As I said in my Full version of the paper, aliasing always exists, but if the system is done right the aliasing does not impact our measurement based upon the accuracy specified.
What is the dispute here? We all seem to agree that the max/min method is not good. The reason for this is in dispute I think, at least with some, not with others. Also in dispute is “are max and min values samples?” and do we have to comply with Nyquist when using samples? Also in dispute is the sample rate that is required to satisfy Nyquist. But actually, from some of the latest responses, I’m getting some hope that we can gain some greater agreement. All of our stubborn determination on this subject can be a good thing or a bad thing. The communication gets frustrating and then feelings get hurt and tempers flare. But if we all stay with it then maybe we can gain some agreement and if that happens the “stubborn determination” is good.
I think one sticking point is whether we should specify the system based upon the average or upon the worst case. So if we can get to some agreement that a system needs to be designed for worst case then maybe we can also agree that much of that performance is not needed in many or maybe most instances. Not sure yet.
Is this any help?
I’ll edit myself. I said: “The frequency gets limited with filters, knowing what is being discarded by the filters. As I said in my Full version of the paper, aliasing always exists, but if the system is done right the aliasing does not impact our measurement based upon the accuracy specified.”
Clarification: Filters “roll-off” or reduce frequencies, they can’t completely eliminate them. Filters can be designed to be more aggressive, with a steeper roll-off, but this comes at a cost of “ripple” (adding error to the amplitude) or phase shift (frequency components don’t line up in time). So filtering reduces content but doesn’t eliminate it. What is left can still alias, hence my comment that “aliasing always exists”. But it is also true that this aliasing can be below anything that matters to us based upon our specifications and the design.
William Ward January 18, 2019 at 6:45 pm
Thanks, William. Neither I nor Clyde have denied that there is a significant error between the 288-samples and the NOAA traditional method. I was unaware that there was any dispute about that.
My apologies, I was unaware that there was something I should have acknowledged.
It is clear from the ONE DAY’S WORTH of data in Figure 2 that ON THAT DAY as the sample rate decreases below Nyquist, the corresponding error introduced from aliasing increases. However, in larger samples that is not true.
So far I have not found any 288-sample datasets that give inaccurate results when sampled hourly. Let me repeat my data for 13 years worth of 288-sample data from Redding.
I calculated the average for each day using first all 288 daily samples, and then using 24 hourly samples. Here are the results:
Some notes. The largest average error is five thousandths of a degree.
The largest RMS error of the daily errors is six hundredths of a degree.
The largest absolute error, both positive and negative, is a quarter of a degree.
So no, William, there is no significant difference between 288 samples per day and one sample per hour.
Unless I missed it, to date I’ve presented 14 years of data showing only trivial differences between 288 samples per day and 24 samples per day. You haven’t presented a single station that does NOT “work well with 24-samples/day”. If you can show that such stations exist, you may have a point … but so far, so good. I will continue to examine other stations, now that I’ve found a source that is not closed due to the Gov’t shutdown.
The real question is, why would you assume that I don’t know the difference between integrated hourly data and one sample per hour? Once again, you assume I don’t know what I’m doing, when I know very well. My data comparing 288 samples per day and 24 samples per day is NOT using integrated hourly data as you foolishly assume. I am doing exactly what I’ve described. I take the 288 samples per day. From those 288 samples, I select one sample on the half hour. That is what has given me the results.
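For readers who want to repeat this kind of comparison, here is a minimal sketch of the procedure described above: keep all 288 five-minute samples for the reference daily mean, pick one sample per hour on the half hour for the comparison, then summarize the daily differences. The temperature series is synthetic (an assumption, not station data), and the names `daily_mean_errors`, `STEP` and `OFFSET` are hypothetical.

```python
import numpy as np

SAMPLES_PER_DAY = 288   # 5-minute samples, as in the data discussed above
STEP = 12               # 288 / 24: keep one 5-minute sample per hour
OFFSET = 6              # index 6 of each day is the 00:30 sample (on the half hour)

def daily_mean_errors(temps, step=STEP, offset=OFFSET):
    """Return (subsampled daily mean) - (288-sample daily mean), one value per day."""
    days = temps.reshape(-1, SAMPLES_PER_DAY)
    full_mean = days.mean(axis=1)
    hourly_mean = days[:, offset::step].mean(axis=1)
    return hourly_mean - full_mean

# Synthetic stand-in data (assumption): 30 days of a diurnal cycle,
# a 12-hour harmonic, and random noise. Not real station data.
rng = np.random.default_rng(0)
t = np.arange(30 * SAMPLES_PER_DAY) / SAMPLES_PER_DAY   # time in days
temps = (15 + 8 * np.sin(2 * np.pi * t)
         + 2 * np.sin(4 * np.pi * t + 0.7)
         + rng.normal(0, 0.3, t.size))

err = daily_mean_errors(temps)
mean_error = err.mean()
rms_error = np.sqrt(np.mean(err ** 2))
max_abs_error = np.abs(err).max()
print(f"mean {mean_error:+.4f}  RMS {rms_error:.4f}  max |error| {max_abs_error:.4f}")
```

Swapping the synthetic series for a real 288-sample record (complete days only) would give figures comparable to the ones quoted in this thread; the numbers printed here say nothing about any actual station.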
And no, as my periodograms have shown, sampling hourly does NOT alias content in any meaningful way. Here is a periodogram of 6 months of 288-sample data.
As you can see, there is some energy in the frequencies with 12-hour, 8-hour, and 6-hour periods. Now, here is that same graph overlain with the periodogram of hourly sampling.
As you can see, other than a trivial bit of aliasing up near the highest frequencies, the two periodograms are nearly identical.
However, things get very different when we sample every two hours. Here is that comparison.
As you can see from the black line showing the two-hour sampling, there is extensive aliasing into the frequencies with periods of 4 hours and about 1.4 hours.
Please allow me to suggest that you download and analyze a year or two of actual USHCN data. You can’t pull out one very unusual day as you’ve done and base your conclusions on that. If you think that there is aliasing going on, then please demonstrate that it is there and that it is significant.
Best regards, 3 AM, light rain falling, I’m off to bed …
w.
Willis,
Thanks for your reply from Jan 19, 3:12 AM. I’m working projects all day, but will thoroughly read (re-read) your post and work on a reply tonight.
William, thanks to you for continuing the discussion. As I said before, I know everyone has a life outside, and so if someone doesn’t answer right away, I figure they’re out in the world.
Best regards,
w.
Willis said “As you can see, there is some energy in the frequencies with 12-hour, 8-hour, and 6-hour periods.”
One thing I noticed is that the predominant frequencies at this site all have periods that divide evenly into 24 hours. Given the sample rate of 24/day is phase locked to the main frequency component, I think it is also likely phase locked to the 6, 8 and 12 hour components. Having the sampling phase locked to all of the signal’s frequency components often leads to some very interesting outcomes.
It would be interesting to see how other sample rates such as 27/day perform, or even non-integer rates.
Cheers
Red
Oh, yeah, I forgot … the location of the accessible USHCN 288-sample data is here.
Regards to all,
w.
A new day, new data … to pick a location as different as possible from Redding California, my last analysis, I picked Fairbanks Alaska. Here is the error data from that USCRN station, showing the difference between sampling 288 times per day and sampling 24 times per day (hourly samples).
Once again, we have only very small differences between 288 and 24 samples per day …
More to come, I’ll post up the further analyses of the Fairbanks data as they are finished.
w.
Five days ago, Rud Istvan opined:
This sanguine faith has not yet been justified. On the contrary, fundamental misconceptions about what constitutes frequency aliasing and what inadequacies of sampled data are unrelated to it stubbornly persist. Thus we have Willis stating;
Since the Nyquist frequency is DEFINED as 1/(2 delta t), where delta t is the fixed sampling interval, the sampling rate, 1/delta t, cannot decrease “below Nyquist.” It can only decrease below some independently determined highest frequency of appreciable spectral content in the continuous signal. In either event, barring aliasing of spectral content into zero frequency, the efficacy of the sampled data points in estimating the signal mean is entirely a matter of sample SIZE, not of any frequency aliasing. Sadly, the inept practice of plotting periodograms as a function of the logarithm of period, instead of simple frequency, distorts spectral content (area under the curve) and totally obscures what happens around zero-frequency.
1sky1, it appears you have not noticed, but that statement was not entirely mine. I was merely qualifying William’s statement. He said:
I merely qualified it by saying, with my additions in capital letters, that:
So if there are errors in that statement, they are William’s, not mine.
Regards,
w.
While the capitalized words qualify William’s statement, you seem to accept his false attribution of increased error in estimating the true mean to aliasing. Further on, you state:
However, with bihourly sampling, the Nyquist frequency is 1/(4 hrs), which is NOT aliased. Furthermore, unless there’s aliasing into zero-frequency, aliasing does NOT affect the estimation of the mean.
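The two points made here (the Nyquist frequency is fixed by the sampling interval, and only content that aliases into zero frequency biases the mean) can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the test signals are unit-amplitude cosines chosen for convenience, not temperature data.

```python
import numpy as np

def nyquist_frequency(dt):
    """Nyquist frequency for a fixed sampling interval dt: 1 / (2 * dt)."""
    return 1.0 / (2.0 * dt)

def alias_frequency(f, fs):
    """Frequency in [0, fs/2] at which a component at f appears when sampled at rate fs."""
    f = f % fs
    return min(f, fs - f)

def sampled_mean(f_cpd, samples_per_day, days=10):
    """Mean of cos(2*pi*f*t), t in days, taken from uniformly spaced samples."""
    n = np.arange(days * samples_per_day)
    t = n / samples_per_day
    return np.cos(2 * np.pi * f_cpd * t).mean()

# Sampling every 2 hours: Nyquist is 0.25 cycles/hour, i.e. a 4-hour period.
print(nyquist_frequency(2.0))

# At 12 samples/day, a 13 cycles/day component folds to 1 cycle/day,
# while a 12 cycles/day component folds to zero frequency.
print(alias_frequency(13, 12), alias_frequency(12, 12))

# The true mean of both cosines over whole days is zero. Only the one
# that folds to zero frequency shows up as a bias in the sampled mean.
print(sampled_mean(13, 12), sampled_mean(12, 12))
```

The size of the bias from a zero-frequency alias depends on the amplitude and phase of that component, so this says nothing by itself about how large the effect is in real records.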
Well, always more to learn. I took a look at how the error increases as the number of samples per day decreases in the Anchorage data. Here is that graph. It compares the error at a given number of samples with the value at 288 samples per day. As you would imagine, the error at 288 samples per day is zero.
As you would expect, the more samples per day, the smaller the error. The graph shows a few interesting things.
First, the traditional way of calculating the error, (min+max)/2, is NOT simply another kind of 2-samples-per-day as William has said. As you can see, it gives a daily RMS error about 12% larger than the error at two samples per day.
Next, compared to 288 samples per day, the RMS error using hourly samples is quite small, less than a tenth of a degree C. William above said that:
He then claimed, without any evidence, that the practical Nyquist limit was 288 samples per day.
However, as the graph and the table in my previous comment shows, at hourly samples, the mean error is 0.001°C compared to 288 samples, and the RMS error is 0.056°C compared to 288 samples. We’re well past the “knuckle” in the graph, and so at this point, it is obvious that the “apparent gains are diminishing beyond any benefit” to be gained from faster sampling.
Finally, the results for Fairbanks Alaska USHCN data in terms of mean and RMS error are indistinguishable from those of Redding California, or those of Chatham Wisconsin. To date, I’ve looked at 26 years of USHCN data. In no case does hourly sampling give any significant errors compared to 288 samples per day.
Onwards … aliasing. William keeps claiming, again without evidence, that if we sample hourly, the higher frequencies will alias into the result. Now, just as with the Redding data, I find that his claim is true if we are sampling every two hours (12 samples per day). Here is a periodogram of those results:
Just exactly as with the Redding data, there is strong aliasing at the four hour and just under an hour and a half periods. However, now take a look at the hourly data.
There is only a tiny bit of aliasing at just under three-quarters of an hour. Other than that, the results are basically identical to the results from sampling at 288 cycles per day.
All of which supports my contention that hourly sampling of temperature data is more than adequate for practical purposes, and that hourly sampling is above or at the practical Nyquist limit.
Best to all,
w.
Hi Willis,
You have written a few posts since I last wrote to you. I’m going to focus on your last post (Jan 19, 3:19PM) first and I will do this for a few reasons. 1) I’m gaining some optimism that maybe we can find a common understanding and your most recent post provides a potential platform for that, 2) I want to suspend and if possible, move past the friction and 3) I’d like to reply to you tonight and I don’t think I can process it all at once. I’ll pull in selective comments of yours from other posts.
On 2 separate posts Willis said: “Fig 3 again is just one day. Again there is no way you can talk about Nyquist or aliasing for a single period.” And: “Please allow me to suggest that you download and analyze a year or two of actual USHCN data. You can’t pull out one very unusual day as you’ve done and base your conclusions on that.”
My reply: I have over 1GB of USCRN data downloaded and this was used in my analysis. For Fig 1, I used a day that was representative of my case but not the worst I saw. Over 28 stations were used specifically in the paper but many more were analyzed in the study. It is not a correct assessment that I’m basing my case on just 1 or a few days or stations. More below will clarify approach and strategy with the presentation.
Willis, it was just over the past 24 hours that I could start to see through the chaos of our disagreement, and I started to see a way that might allow us to find a common agreement. It began with a better understanding of our disagreement. Let’s see if we can make progress. I think your case is against 288-samples/day. Would I be correct to say that at 24-samples/day you agree with my overarching assessment? Are we primarily divided by the number at this point? My thrust with this paper was to introduce Nyquist and sampling as the reason that the historical method has issues. Even 2 regularly timed samples/day has issues based upon sampling and not complying with signal analysis requirements. It feels like the baby is circling the drain with the bath water if we are divided by the number. Are we aligned on the basic issue? Does the application of Nyquist shed light on problems and potential solutions?
Now, I approach this from an engineering perspective. Please allow me to develop this. I will get to a conclusion. I would like to use an example to illustrate. If you are tasked with recording high fidelity audio and you begin your exploration with a concert grand piano, you will quickly discover that there isn’t much content above 10kHz. It’s there but it is usually down below -60dB FS. If you assume all instruments are going to operate similarly you might design your data acquisition system to not alias the grand piano. But if you look at violins the frequency is much higher. If you look at cymbal crashes, you will see flat content across the spectrum from 20Hz to 20kHz. But cymbal crashes are 1) infrequent in most music and 2) usually brief in the program material. Your recording of the cymbal crashes would suffer if you designed the data acquisition system for just the piano. As I did my investigation, I had a reference provided by NOAA and that was 288-samples/day. I went out and looked at graphical images of many, many days’ worth of signals. The distorted sinusoid is what I saw most often, but occasionally I saw very different daily profiles. Some were square wave like, and some had many large fast transients. These types of profiles tell you that higher frequencies are present. I searched out a number of these types of days. I also searched for longer strings of days with these profiles (ex: Figure 5: 10 days at Spokane WA). I was looking for the upper limit of spectral content a day could throw at us. This is where I focused my efforts. Engineers design the system to handle the full range of possibilities. Like the cymbal crashes we need to properly capture the days with higher frequency, even if they are not frequent.
Now, let me quote myself from the paper (below Fig 2): “It is clear from the data in Figure 2, that as the sample rate decreases below Nyquist, the corresponding error introduced from aliasing increases. It is also clear that 2, 4, 6 or 12-samples/day produces a very inaccurate result. 24-samples/day (1-sample/hr) up to 72-samples/day (3-samples/hr) may or may not yield accurate results. It depends upon the spectral content of the signal being sampled. NOAA has decided upon 288-samples/day (4,320-samples/day before averaging) so that will be considered the current benchmark standard. Sampling below a rate of 288-samples/day will be (and should be) considered a violation of Nyquist.”
Notice that I acknowledge that 24-samples/day may produce accurate results – depending upon the spectral content. But from an engineering perspective, we select the sample rate to make sure we don’t alias any signals and I had an abundance of information that signals were present to demand more than 24. Since NOAA used 288 and the error at 24, 36 and 72 were only +/- 0.1C off from 288 I knew we were approaching that limit of accuracy needed. I had not done the statistical analysis you have done. I see that 24-samples seems to work for most situations based upon your data – which I have not studied, but for this discussion I’m assuming is correct. But it seems we are arguing 2 separate points – and therefore we may not be completely incompatible with our conclusions. As an engineer I would not recommend going with 24-samples/day without more research. I prefer to not have error from the days or periods that do have more content. Also, as I have explained several times, but I think was lost in the melee, sampling faster gives us guard band and it relaxes the requirements for the input anti-aliasing filters. Filters can add ripple and phase shift and these problems tend to increase with increasing filter order. Higher sample rate allows lower filter order (stages).
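The trade-off between sample rate and anti-aliasing filter order can be made concrete with the standard Butterworth order formula. This is only a sketch under stated assumptions: a Butterworth response, a hypothetical passband edge of 12 cycles/day, 1 dB of allowed passband loss, and 40 dB of required attenuation at the first frequency that folds back onto the passband edge (sample rate minus passband edge). None of these figures come from NOAA or from the paper.

```python
import math

def butterworth_order(f_pass, f_stop, ripple_db, atten_db):
    """Minimum Butterworth order giving at most ripple_db loss at f_pass
    and at least atten_db attenuation at f_stop."""
    ratio = (10 ** (atten_db / 10) - 1) / (10 ** (ripple_db / 10) - 1)
    return math.ceil(math.log10(ratio) / (2 * math.log10(f_stop / f_pass)))

F_PASS = 12.0      # cycles/day; hypothetical band of interest
RIPPLE_DB = 1.0    # assumed allowable passband loss
ATTEN_DB = 40.0    # assumed attenuation required where aliasing begins

# For each sample rate (samples/day), the stopband must start at fs - F_PASS,
# the lowest frequency that aliases back into the passband.
orders = {fs: butterworth_order(F_PASS, fs - F_PASS, RIPPLE_DB, ATTEN_DB)
          for fs in (36, 72, 288)}
print(orders)
```

With these assumed specifications the required order falls as the sample rate rises, which is the guard-band argument in numerical form; different specifications or filter families would give different orders.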
Willis said: “First, the traditional way of calculating the error, (min+max)/2, is NOT simply another kind of 2-samples-per-day as William has said. As you can see, it gives a daily RMS error about 12% larger than the error at two samples per day.”
I said in my paper (below Fig 2): “It is interesting to point out that what is listed in the table as 2-samples/day yields 0.7 C error. But (Tmax+Tmin)/2 is also technically 2-samples/day with an error of 1.4C as shown in the table. How can this be possible? It is possible because (Tmax+Tmin)/2 is a special case of 2-samples per day because these samples are not spaced evenly in time. The maximum and minimum temperatures happen whenever they happen. When we sample properly, we sample according to a “clock” – where the samples happen regularly at exactly the same time of day. The fact that Tmax and Tmin happen at irregular times during the day causes its own kind of sampling error. It is beyond the scope of this paper to fully explain, but this error is related to what is called “clock jitter”. It is a known problem in the field of signal analysis and data acquisition. 2-samples/day, regularly timed, would likely produce better results than finding the maximum and minimum temperatures from any given day. The instrumental temperature record uses the absolute worst method of sampling possible – resulting in maximum error.”
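The difference between (Tmax+Tmin)/2 and two clock-timed samples can be explored on a toy signal. The sketch below uses a synthetic day (an assumption, not station data) built from a 24-hour and a 12-hour cosine around a true mean of 15, sampled 288 times. It compares the midrange with the average of two samples taken 12 hours apart, for every possible starting time.

```python
import numpy as np

SAMPLES_PER_DAY = 288
TRUE_MEAN = 15.0

# Synthetic asymmetric daily profile (assumption): both harmonics peak at 15:00.
t = np.arange(SAMPLES_PER_DAY) / SAMPLES_PER_DAY          # time of day, in days
temps = (TRUE_MEAN
         + 6 * np.cos(2 * np.pi * (t - 0.625))
         + 2 * np.cos(4 * np.pi * (t - 0.625)))

full_mean = temps.mean()                                   # 288-sample mean
midrange = (temps.max() + temps.min()) / 2                 # (Tmax + Tmin) / 2

# Two regularly timed samples 12 hours apart, for each possible start index.
half = SAMPLES_PER_DAY // 2
two_sample_means = (temps[:half] + temps[half:]) / 2
two_sample_err = two_sample_means - full_mean

print(f"288-sample mean:         {full_mean:.3f}")
print(f"midrange error:          {midrange - full_mean:+.3f}")
print(f"2-sample error, range:   {two_sample_err.min():+.3f} to {two_sample_err.max():+.3f}")
print(f"2-sample error, RMS:     {np.sqrt(np.mean(two_sample_err ** 2)):.3f}")
```

On this toy signal the 24-hour component cancels in the clock-timed pair, and the 12-hour component lands at zero frequency, so the two-sample error depends entirely on what time of day the clock starts. Whether the midrange or the clock-timed pair does better on real data depends on the actual daily profiles, which this sketch does not model.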
I stand by my assertion that max and min are samples. This is based upon my experience and confirmation from people in my industry who have reviewed this for that point. But I’d like to suggest that we not continue to debate that point as I think we have both made our cases to the other unsuccessfully. I think you have accepted 2-samples/day as samples so I think we can proceed with analysis even if we don’t agree with the nature of the problem with max/min – we agree about the results of using max and min, I believe.
If I do a rewrite of my paper – and I might – I’ll consider adding some clarity around the fact that 24-samples/day appears to capture most of the situations. (Credit to you). But I’ll keep my recommendation for a higher rate for engineering and system integrity purposes. Additionally, I’m working with the knowledge of what is available from a technology perspective. Sampling at 24, 288, 4,320-samples/day are all absolutely glacial compared to converter technology. Memory and storage are cheap. There would not be an engineering incentive to run that slow. All of those rates are considered a stand-still from a converter perspective.
I don’t want to put words in your mouth. Do you agree that some “significant” accuracy can be had by sampling faster than 2-samples/day? When I speak about Nyquist rate, it is from an engineering system design perspective. You were focusing on the minimum increase to rate that seems to capture all of the data from a statistical perspective. I think these are both very valuable perspectives and need not be completely incompatible. I do think we were misunderstanding each other and in the heat of debate missed an opportunity to agree. I’m not talking about a fake agreement to make the discomfort of debate go away. I see potential here to have a positive resolution for having pushed each other. What do you think?
I don’t think you addressed the trends I showed. Yesterday I put up links to figures not in my paper showing stations over 10-12 years.
https://imgur.com/cqCCzC1
https://imgur.com/IC7239t
https://imgur.com/SaGIgKL
They show the yearly average differences and linear trends between the 2 methods. [Note: these figures do not benefit from the full accuracy of processing all 288 daily samples. The data in the table of Fig 7 was generated using all of the samples – no intermediate averaging was done to calculate the linear trends. For the graphs in the links, I took the USCRN monthly averages generated from 288 daily samples. These graphs suffer from some rounding error from using the published USCRN monthly averages.] Note in my paper I said (above Fig 6): “While no conclusions can be made by comparing the trends over 7-12 years from 26 stations in the USCRN to the currently accepted long-term or short term global average trends, it can be instructive. It is clear that using the historical method to calculate trends yields a trend error and this error can be of a similar magnitude to the claimed trends. Therefore, it is reasonable to call into question the validity of the trends. There is no way to know for certain, as the bulk of the instrumental record does not have a properly sampled alternate record to compare it to. But it is a mathematical certainty that every mean temperature and derived trend in the record contains significant error if it was calculated with 2-samples/day.”
Trend biases of 0.06 are small according to my way of thinking about numbers, but as others have said, long-term trends of this magnitude are causing serious concern. I said in point 7 of my conclusion: “More work is needed to determine if a theoretical upper limit can be calculated for mean and trend error resulting from use of the historical method.”
Maybe your data analysis skills can be utilized here. Are you willing to examine this? Also, what do you think about the absolute error we see propagating over time? As there seems to be more interest in the “missing” energy, maybe the absolute values are of greater importance. We need to move past the historical method if we want to study that with more accuracy.
I’ll await your reply. Thanks Willis.
Willis
You said, “Next, compared to 288 samples per day, the RMS error using hourly samples is quite small, less than a tenth of a degree C. … All of which supports my contention that hourly sampling of temperature data is more than adequate for practical purposes, …”
Alarmists are citing annual average differences of hundredths of a degree as justification for their concern about trends. There is still disagreement as to how much improvement in accuracy and precision can be justified by large numbers of samples. I think that to resolve the questions, raw data should be collected that is at least an order of magnitude more precise than what is being claimed as evidence for the alarm.
The use of “the point of diminishing returns” as a metric for design specifications usually has the implication that going beyond the ‘knee’ will have costs that aren’t justified. However, what I have read here suggests that there is no additional cost because the technology has advanced so far that hourly data is really an anachronism. That is, current, off-the-shelf A/D converters might actually be the cheapest solution because of the scale of volume production. There may be storage costs for large amounts of temperature data, but I think that the remote sensing industry (consider EROS Data Center) has made significant inroads on the cost of storing and accessing huge amounts of image data.
Clyde Spencer January 20, 2019 at 9:18 am
Thanks, Clyde. While the RMS error is less than a tenth of a degree, the mean error is on the order of a few thousandths of a degree. So it’s within spec.
The problem is not in the collection of the data. It’s in the transmission and analysis of the data. A hundred years of 288-sample data for a hundred stations is over a billion integers, likely requiring a minimum of 9 bits per integer to store or analyze …
Finally, as Anthony has pointed out with the Surfacestations project, in many, perhaps most cases, the SOURCE of the data is hopelessly compromised by encroaching structures and roads, or growing trees, or air conditioner exhausts, or the like … and there is little point in collecting hyper-accurate data from hyper-inaccurate sources.
Best regards,
w.
Willis Said: “The problem is not in the collection of the data. It’s in the transmission and analysis of the data. A hundred years of 288-sample data for a hundred stations is over a billion integers, likely requiring a minimum of 9 bits per integer to store or analyze …”
My reply: And the problem with that is? Current typical internet speeds of 50Mbs mean that 1GByte takes about 160 Seconds to download. Even my slow 15Mbs link would only need 533 seconds or < 9 minutes. With computer power and data transfer speed increasing I don't think that future generations will have any issue with what they will consider a tiny amount of data.
Cheers
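The figures in this exchange are easy to reproduce. The short sketch below recomputes the sample count for a hundred stations over a hundred years and the download times quoted for 1 GByte. It assumes 365-day years and decimal units (1 GByte = 10^9 bytes, 1 Mb/s = 10^6 bits per second); the names are illustrative.

```python
YEARS = 100
STATIONS = 100
SAMPLES_PER_DAY = 288
DAYS_PER_YEAR = 365          # leap days ignored (assumption)

# Total number of 5-minute values for 100 stations over 100 years.
n_values = YEARS * DAYS_PER_YEAR * SAMPLES_PER_DAY * STATIONS

def download_seconds(size_bytes, link_mbps):
    """Transfer time in seconds for size_bytes over a link of link_mbps megabits/s."""
    return size_bytes * 8 / (link_mbps * 1e6)

ONE_GBYTE = 1e9
t_50 = download_seconds(ONE_GBYTE, 50)
t_15 = download_seconds(ONE_GBYTE, 15)

print(f"{n_values:,} values")
print(f"1 GByte at 50 Mb/s: {t_50:.0f} s; at 15 Mb/s: {t_15:.0f} s ({t_15 / 60:.1f} min)")
```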
Hi Clyde,
Clyde Said ” However, what I have read here suggests that there is no additional cost because the technology has advanced so far that hourly data is really an anachronism. That is, current, off-the-shelf A/D converters might actually be the cheapest solution because of the scale of volume production. There may be storage costs for large amounts of temperature data, but I think that the remote sensing industry (consider EROS Data Center) has made significant inroads on the cost of storing and accessing huge amounts of image data.”
I thought I would expand a little on your comment and bring into perspective the cost and memory requirements of doing the temperature sampling at 288 samples/day.
You are correct that there is no additional cost in the acquisition hardware and this would hold up to around 20,000 samples/second.
On the memory front, 18 bits will represent all air temperatures on earth to 0.001C. We can round up to 24 bits to make life easier, so 288 samples/day gives 864 Bytes/day, or 315.36 KBytes/year, or 31.536 MBytes in 100 years of data collection. To put that in perspective, 31 MBytes is about 5 seconds of quality std def video. This means your standard 32 GByte SD card for $30 or less could hold over 100,000 years of temperature data from a single station, or 100 years from 1000 stations. Now it seems the data is often stored in ASCII text file format at 8 bytes/sample (+00.000Cr), which gives 2.304 KBytes/day or 840.9 KBytes/year. So a quality $300 hard drive of 4 TByte capacity could store 100 years of data from over 47,000 stations.
And that is before we compress the data which should reduce the storage requirements by a considerable amount.
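The storage arithmetic above can be checked in a few lines. This sketch assumes 365-day years and decimal units (1 GByte = 10^9 bytes, 1 TByte = 10^12 bytes); the variable names are illustrative.

```python
SAMPLES_PER_DAY = 288
DAYS_PER_YEAR = 365          # leap days ignored (assumption)

def bytes_per_century(bytes_per_sample):
    """Storage for 100 years of 288-samples/day data from one station."""
    return bytes_per_sample * SAMPLES_PER_DAY * DAYS_PER_YEAR * 100

binary_century = bytes_per_century(3)    # 24-bit binary samples
ascii_century = bytes_per_century(8)     # 8-character text samples

SD_CARD = 32 * 10**9                     # 32 GByte card
HARD_DRIVE = 4 * 10**12                  # 4 TByte drive

stations_on_sd = SD_CARD // binary_century
stations_on_hdd = HARD_DRIVE // ascii_century

print(f"binary: {binary_century / 1e6:.3f} MBytes per station-century")
print(f"ASCII:  {ascii_century / 1e6:.3f} MBytes per station-century")
print(f"32 GByte card holds {stations_on_sd} station-centuries (binary)")
print(f"4 TByte drive holds {stations_on_hdd} station-centuries (ASCII)")
```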
William Ward January 20, 2019 at 9:08 am
As I said before, you really, really don’t see what you are doing.
When you tell a man “you are wrong”, that’s one thing. There is no disrespect intended.
On the other hand, when you tell a man ” this is fundamental signal analysis 101, first day of class mistake you are making here”, there is obvious disrespect intended. You are telling him that not only is he wrong, he is making a really, really ignorant mistake.
No disrespect?
Look, if you don’t wish to apologize just say so. But you can’t piss on my boots and then try to convince me it’s raining.
w.
J. Philip Peterson January 20, 2019 at 12:31 pm
J. Philip Peterson January 20, 2019 at 12:42 pm
Like I said, Dr. Roy’s crap keeps following me around.
First, There is no practical difference between claiming that I was “re-inventing the wheel” and claiming that I was taking credit for another man’s ideas. If you re-invent the wheel and take credit for it, you are taking credit for another man’s ideas.
Next, the person who didn’t “do some searching of the literature” was Dr. Roy himself. Had he done so, he would have realized that Ramanathan most assuredly did NOT make the claims that I am making. Ramanathan said that there is a “super-greenhouse effect” that acts as a thermostat to keep the temperature of the Pacific Warm Pool from going over about 30°C.
I, on the other hand, say that a host of emergent phenomena act as a thermostat keeping the entire planetary temperature within a narrow band (e.g. a variation of only ± 0.3°C during the 20th century).
Other than the fact that the word “thermostat” appears in both hypotheses, there is no similarity between the two. Which is why Dr. Roy was wrong when he said I was “re-inventing the wheel”—he hadn’t done his homework.
But as it turns out, there are lots of credulous folks out there who are willing to do what Dr. Roy himself did, to believe Dr. Roy without “searching the literature” to determine the truth of the matter … if you are wondering who they are, J. Phillip, grab a mirror …
w.
Hey Willis,
I would not say that the error has “nothing to do with Nyquist”. However, it is also not, as William claimed, just “jittered” samples.
The error comes from a couple sources, both Nyquist and the curious nature of the taking of the signals.
Thanks for clarifying that. As for jitter, again on the basic level I understand it simply as sampling at irregular intervals, which itself induces additional error. And as far as I can see, William’s text does not claim that this is the main problem. Jitter is mentioned as something that can additionally distort the signal.
Paramenter,
Thanks for your comments. On the issue of jitter, I have a thought exercise (for an atmospheric air temperature signal).
If we start with 2 ideal samples/day (perfectly spaced/timed) I think we have pretty good agreement that these are actually samples and that these samples lead to sampling error – evident when calculating means. Now if we move to a real world converter, they all have clock jitter. The best are in the range of picoseconds of jitter. The error produced can be important for some applications like audio, where jitter is audible at some levels. For other applications this level of jitter is inconsequential. But I think we will get agreement that we still have samples, and sampling error based upon not having enough samples for air temperature. Now what if we use a converter that gives microsecond jitter? How about millisecond jitter? I’m not aware of any converters that give jitter in the second range, but let’s take the progression for the thought exercise. Now how about minutes and finally hours. I think a case can be made that the error grows but we still have samples and sampling related error. It would be interesting if someone could provide some mathematical reasons why a time limit would apply and what that would be. Otherwise, I think a logical deduction is that max and min are samples – just very bad ones. 2 discrete values representing an analog signal are samples, and at 2/day they are insufficient to represent an atmospheric air temperature signal.
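The progression in this thought exercise can be simulated. The sketch below is a toy model with loud assumptions: the "temperature" is a single 24-hour cosine around a true mean of 15, the two nominal sample times are 06:00 and 18:00, and the timing error is uniform random jitter. Real Tmax and Tmin times are set by the signal itself rather than by random jitter, so this illustrates only how the error in the two-sample mean grows with timing error.

```python
import numpy as np

TRUE_MEAN = 15.0

def rms_mean_error(jitter_hours, days=2000, seed=0):
    """RMS error of the 2-sample daily mean when each sample time is
    perturbed by uniform jitter in [-jitter_hours, +jitter_hours]."""
    rng = np.random.default_rng(seed)
    nominal = np.array([0.25, 0.75])                     # 06:00 and 18:00, in days
    jitter = rng.uniform(-jitter_hours, jitter_hours, size=(days, 2)) / 24.0
    t = nominal + jitter
    temps = TRUE_MEAN + 8 * np.cos(2 * np.pi * (t - 0.625))
    return np.sqrt(np.mean((temps.mean(axis=1) - TRUE_MEAN) ** 2))

# Jitter levels from none, through one second and one minute, up to hours.
levels_hours = [0.0, 1 / 3600, 1 / 60, 1.0, 6.0]
errors = [rms_mean_error(j) for j in levels_hours]
for j, e in zip(levels_hours, errors):
    print(f"jitter ±{j:9.5f} h  ->  RMS error of 2-sample mean {e:.6f}")
```

In this model the error rises smoothly with the jitter and there is no threshold at which the values stop behaving like samples. Whether that carries over to max/min readings is the point under debate and is not settled by the sketch.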
What do you think about this approach?
Hey William,
Sure, the sampling process is never perfectly periodic due to clock inaccuracies. But we don’t need millisecond precision for temperature records. I reckon some people here objected to interpreting temperature series in the light of Nyquist because recording daily max/min is not strictly periodic. We record daily extremes not knowing exactly when they happened. But that only makes things harder in the context of Nyquist; it does not invalidate the requirements for reconstructing a signal in a reliable way. As you and Clyde quoted from the classic textbook:
Bright Red and Paramenter: See this guide referred by Clyde. See section 1.3.2.2 Sampling and Filtering on pg 15 of PDF (pg 539 of document). Key points:
Considering the need for the interchangeability of sensors and homogeneity of observed data, it is recommended:
(a) That samples taken to compute averages should be obtained at equally spaced time intervals which:
Here it is stated expressis verbis that samples obtained via irregular sampling process are still samples with the additional burden of error.
Precisely.
Paramenter – you said: “obtained via irregular sampling process are still samples”
You are not given the TIMES at which Tmax and Tmin are taken. You can only guess. You said as much. So they are not samples to which Nyquist could possibly apply.
I can’t find any link to the “Guide” you mention (please post if you care to) but wonder if “irregular” does not refer to what I call “bunched”
http://electronotes.netfirms.com/AN356.pdf
http://electronotes.netfirms.com/EN205.pdf
-Bernie
Hey Bernie,
You are not given the TIMES at which Tmax and Tmin are taken. You can only guess. You said as much. So they are not samples to which Nyquist could possibly apply.
There are kinds of signal where you actually don’t need a timestamp per sample (William pointed to audio signals). All you have is an array of equally spaced values which allows you to reconstruct the shape of a signal. If you introduce irregularity to the sampling procedure you are likely to distort the reconstructed signal, even when sampled at a sufficient rate on average. The more irregularities, the more probable it is that your recovered signal will be distorted severely (depending on the nature of the signal). Exactly as per daily min/max. I don’t know why this is controversial.
Furthermore, from your daily min/max you derive pretty much a new signal – the daily midrange (which differs significantly compared with the original one, see here. Red curve original signal, blue based on daily midranges). And here you’ve got everything canonical: discrete values (datapoints) for each equally spaced day. So sweet!
I can’t find any link to the “Guide” you mention (please post if you care to)
Clyde referred to this WMO guide, section 1.3.2.2. Good thing is that that was written by weather gurus, not pure DSP wizards to avoid accusations of imposing DSP practices into the alien field of temperature acquisition.
I can’t find any link to the “Guide” you mention (please post if you care to) but wonder if “irregular” does not refer to what I call “bunched”
Thanks for resources, yes, looks like they call it ‘bunched’ or non-uniform sampling procedures. So, looks like Mr Nyquist is quite happy with non-uniform sampling. He is saying that whatever sampling method we use, uniform or not, it still needs to obey limits he defined.
Hey Bernie,
I can’t find any link to the “Guide” you mention (please post if you care to) but wonder if “irregular” does not refer to what I call “bunched”
http://electronotes.netfirms.com/AN356.pdf
http://electronotes.netfirms.com/EN205.pdf
Initially, I hadn’t noticed that those articles were authored by yourself. Congratulations – impressive stuff! So, from your research and simulations you know better than me that ‘bunched’ sampling may introduce significant distortion to the recovered signal. We may recover a signal to some extent by different interpolation techniques but how much – that depends on exact nature of a signal and irregular sampling.
Paramenter said January 23, 2019 at 5:27 am: “There are kinds of signal where you actually don’t need timestamp per sample (William pointed audio signals). All you have is an array of equally spaced values which allows you to reconstruct shape of a signal. If you introduce to the sampling procedure irregularity you are likely to distort reconstructed signal, even sampled with sufficient, on average, sampling rate. More irregularities, more probable is that your recovered signal will be distorted severely (depending on nature of a signal). Exactly as per daily min/max. I don’t know why this is controversial.”
Yes, and I have been pointing just that out to my students for 40 years! We are talking here of knowing (or NOT knowing) the sample time RELATIVE to a regular spacing. That is the “time” we don’t know. For example, in the case where you HAVE actually sampled to 288 samples/day, is a particular value of Tmax reported at n=0 or n=287 or in-between? Potentially HUGE obvious errors, and unrelated to Nyquist or to “jitter”.
Thanks for the WMO link. I will look at it.
You also said : “ So, looks like Mr Nyquist is quite happy with non-uniform sampling. ”
Funny because I know an EE named H. Nyquist! Don’t know if the original wrote about non-uniform spacing. There is a modest little(!) book of 900 pages titled Nonuniform Sampling: Theory and Practice, edited by F. Marvasti (Kluwer 2001). Tough going, and a WAD of cash!
-Bernie
Paramenter said, January 23, 2019 at 10:04 am:
“ So, from your research and simulations you know better than me that ‘bunched’ sampling may introduce significant distortion to the recovered signal. We may recover a signal to some extent by different interpolation techniques but how much – that depends on exact nature of a signal and irregular sampling.”
Well no – Actually I am talking about recovering the full signal EXACTLY from bunched samples. Here is an example. Suppose I have a signal of bandwidth B which I sample at a rate greater than 2B. The sampling train function is [ . . . . . 1 1 1 1 1 1 1 1 1 1 1 1 . . . . . ]. All is well, and I can recover the signal with low-pass filtering (sinc interpolation).
If I then notice that the bandwidth was actually only B/2, then I can obviously resample by throwing away every other sample. That is [ . . . . . 1 0 1 0 1 0 1 0 1 0 . . . . . ] . Keep one, toss one. Wider sinc interpolation functions.
With bunched sampling, it means I can, for example, keep 2, toss two. That is [ . . . . . 1 1 0 0 1 1 0 0 1 1 0 0 . . . . . ]. And so on. NOTHING IS LOST because the bandwidth was actually smaller. To reconstruct, you need to calculate the interpolation functions which are “sinc-like”, and you have to know the (relative) times of the samples kept.
-Bernie
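A minimal sketch of the "keep 2, toss 2" idea, not Bernie's own code. It assumes a periodic test signal of 64 points limited to 12 harmonics, and it swaps the "sinc-like" interpolation functions for a plain least-squares fit in a Fourier basis, which is only equivalent in this periodic, noise-free toy case. The names `fourier_basis` and `recover_from_bunched` are made up for illustration.

```python
import numpy as np

def fourier_basis(n, N, K):
    """Columns: DC, then cos/sin pairs for harmonics 1..K of an N-point period."""
    cols = [np.ones_like(n, dtype=float)]
    for k in range(1, K + 1):
        cols.append(np.cos(2 * np.pi * k * n / N))
        cols.append(np.sin(2 * np.pi * k * n / N))
    return np.column_stack(cols)

def recover_from_bunched(N=64, K=12, seed=0):
    """Build a band-limited signal, keep samples in a 1 1 0 0 pattern, rebuild it."""
    rng = np.random.default_rng(seed)
    n_all = np.arange(N)
    coeffs = rng.normal(size=2 * K + 1)          # arbitrary band-limited content
    x = fourier_basis(n_all, N, K) @ coeffs      # the "full rate" signal

    keep = (n_all % 4) < 2                       # keep 2, toss 2
    n_kept = n_all[keep]                         # the relative times must be known

    # Solve for the Fourier coefficients using only the kept samples.
    est, *_ = np.linalg.lstsq(fourier_basis(n_kept, N, K), x[keep], rcond=None)
    x_rec = fourier_basis(n_all, N, K) @ est     # rebuild every sample
    return x, x_rec, keep

x, x_rec, keep = recover_from_bunched()
print("samples kept:", int(keep.sum()), "of", x.size)
print("max reconstruction error:", float(np.max(np.abs(x - x_rec))))
```

The fit only works because the positions of the kept samples (`n_kept`) go into the basis matrix. Drop that timing information and the reconstruction is no longer defined, which is the point being argued about Tmax and Tmin.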
William said: ” If we start with 2 ideal samples/day (perfectly spaced/timed) I think we have pretty good agreement that these are actually samples and that these samples lead to sampling error – evident when calculating means. ”
Really! If these are “ideal samples” as you postulate, the anti-aliasing measures assure that you have only a DC value and one cycle of a sinusoid, and the two samples average EXACTLY to the correct mean.
What sampling error are you talking about?
-Bernie
Bernie Hutchins
You said, “… and the two samples average EXACTLY to the correct mean.” Not so. If you were dealing with a single frequency, that would be true. But, it should be obvious that the daily temperatures do not follow a pure, single frequency, but only appear to do so because that is all that can be extracted from two samples. Therefore, it should be obvious that the mean is not going to be correct if it is based on a fictional single frequency. In extreme cases, the real world time series may not even resemble a sinusoid, but be more like a saw-tooth form, in which case the steep rise or decline needs a very high sampling rate to be captured.
Clyde – thanks
I was responding to William’s particulars: ” 2 ideal samples/day (perfectly spaced/timed).” Assuming no one neglected to implement the required anti-aliasing filters for this rate, this can only be a DC term plus a fundamental – a mean and a single sinewave cycle, any two samples of which are symmetric in amplitude about the mean, and average to the exact mean.
I have previously posted (Jan 16 at 10:32 a) an argument that you can even decimate an additional step – to just one sample – and still get an error-free mean.
You even get the correct mean of your sawtooth.
Please explain.
– Bernie
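A small sketch of the two positions in this exchange, under stated assumptions: time is in days, the two samples are exactly half a day apart, and the amplitudes and phases are invented for illustration. With only a mean plus one daily sinusoid the two samples average to the mean for any start time; once a second harmonic is left in (no anti-aliasing), the average is off by exactly that harmonic's value at the sample time.

```python
import math

MEAN, AMP, PHASE = 12.0, 6.0, 0.7   # illustrative values only

def fundamental_only(t):
    """Mean plus a single one-cycle-per-day sinusoid (the filtered case)."""
    return MEAN + AMP * math.cos(2 * math.pi * t + PHASE)

def with_second_harmonic(t):
    """Same signal with an unfiltered two-cycles-per-day component added."""
    return fundamental_only(t) + 1.5 * math.cos(4 * math.pi * t + 0.2)

def two_sample_average(signal, t0):
    """Average of two samples taken exactly half a day apart."""
    return 0.5 * (signal(t0) + signal(t0 + 0.5))

for t0 in (0.0, 0.13, 0.4):
    exact = two_sample_average(fundamental_only, t0)
    biased = two_sample_average(with_second_harmonic, t0)
    print(f"t0={t0:.2f}  fundamental only: {exact:.6f}  with 2nd harmonic: {biased:.6f}")
```

So the disagreement comes down to whether the signal was band-limited before sampling. A max/min thermometer applies no such filter.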
Bernie
You said, “I have previously posted (Jan 16 at 10:32 a) an argument that you can even decimate an additional step – to just one sample – and still get an error-free mean.” That is effectively what calculating the mid-range value is doing. Willis has shown that errors ARE introduced compared to the actual mean by using fewer samples than are required by the high-frequency components. Additionally, an unstated assumption is that one is dealing with low frequencies so that a linear interpolation is a best guess. However, if high frequencies are present, it is possible that the true value of the original signal at the position of the interpolated value is VERY different from the interpolated value! If it is a singular noise spike in the original data, then the interpolated value will be close to being the correct average. However, if the signal is characterized by many noise spikes, then the interpolated value will be lower than the mean. One has to be careful to define all of the circumstances, and properties of the data before making generalizations.
Clyde Spencer at January 23, 2019 at 10:30 am said: “ Bernie You said, “I have previously posted (Jan 16 at 10:32 a) an argument that you can even decimate an additional step – to just one sample – and still get an error-free mean.” That is effectively what calculating the mid-range value is doing. “
You can’t be suggesting that the mean and the mid-range are the same! They can be drastically different. What are you saying? It is true that the mean and the mid-range along with Tmax and Tmin are all single-number PARAMETERS of the particular cycle although they are not samples of the signal, since no sampling time is associated with any of them. Note that a “running mean” or “moving average” (rectangular tap FIR digital filter) would almost certainly be a signal.
-Bernie
Bernie
No, I’m NOT suggesting that the mean and the mid-range are the same. I’m one of the people who first questioned the utility of the mid-range being used as though it were a mean.
I was responding to your claim that you could decimate two readings and get a single value that would be an accurate representation of the mean. I was pointing out that collapsing two samples to one was what the mid-range calculation does, and it has been demonstrated to not be equivalent to the mean.
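To make the mean versus mid-range distinction concrete, here is a hedged sketch on a synthetic day of 288 five-minute values. The curve (a flat 10 degree baseline with a short afternoon peak) is invented and is not station data; `daily_stats` is a hypothetical helper name.

```python
import numpy as np

def daily_stats(temps):
    """Mean, mid-range, and the sample positions of Tmax and Tmin for one day."""
    temps = np.asarray(temps, dtype=float)
    i_max, i_min = int(np.argmax(temps)), int(np.argmin(temps))
    return {
        "mean": float(temps.mean()),
        "midrange": 0.5 * float(temps[i_max] + temps[i_min]),
        "i_max": i_max,   # timing is available when extracting from 288 samples
        "i_min": i_min,
    }

# Synthetic, deliberately asymmetric day: 288 five-minute samples.
t = np.arange(288) / 288.0
temps = 10.0 + 8.0 * np.exp(-((t - 0.6) / 0.08) ** 2)

stats = daily_stats(temps)
print(stats)
print("midrange minus mean:", stats["midrange"] - stats["mean"])
```

For this shape the mid-range lands well above the mean, because a brief peak sets Tmax but contributes little to the average. A symmetric sinusoidal day would show no such gap.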
Bernie I think you misunderstood the drift of that thought exercise. “Ideal” was just referring to perfectly timed. I developed the idea that if you start with a real world, best in class performing converter you get jitter in the pico-second range. No one seems to say that this small clock error invalidates Nyquist. I progressed to greater and greater jitter times and ended with scenario of Tmax/Tmin. I asked of those claiming max and min are not samples, to explain where along the path from ideal to best in class to absurd max/min do we stop calling it samples? What is the limit and what is the math or science that justifies that claim? For those who say there is no timing with max and min, think of the case where we extract max and min from 288-samples per day. Timing is there. If we got the same values from a max/min thermometer we don’t have the timing, but it is still a Nyquist/sampling issue. Not enough data to reconstruct the original signal. When we reconstruct with a DAC we don’t feed the timing information along with the sample. The timing is implied by the reconstruction rate and ordering of the samples. Sample times are not recorded at all for most applications (like audio). The timing is inferred between the rate and position of the sample in the stream.
William
In my dictionary, any observation and recording of data is a “sample.” However, for this topic, a distinction has to be made for multiple random samples versus periodic or continuous sampling with a uniform temporal sampling interval.
Nyquist works equally for faster and slower sampling frequencies; the only difference is the measurement spectrum. If you are after a fast phenomenon, which temperature is not, then you need faster sampling. When your goal is the monthly average temperature, 2 samples per day are just fine. The important detail here is that noise must also be considered in relation to the Nyquist criterion.
Why 4,320-samples/day? Because they can. It kinda obsoletes the old data… not.
William Ward,
I’ve read all your writings here and your full paper. You are one of the most intellectually dishonest writers I’ve ever witnessed. Have you ever had your psychopathy measured? No matter how carefully commenters worded their posts to you, you never got it. Many times writers here, made honey traps for you and you just burnt them down. You aren’t even aware of what this speaks about you, let alone your theory. I went out of my way to understate and downplay the obvious in my comments, particularly regarding the real world spectral analysis of temperature time series. And yet you bull-shitted your way through wordy answers that avoided responsibility for your own statements. You know there is a difference between signal theory and real measurement and you now know that measurement systems can not capture frequency information below about 10 minutes. And yet here you are still flogging that dead horse.
Scott W Bennett
This response seems out of character compared to your previous dozen comments. Is it really you, or did someone hijack your name? In any event, it is an ad hominem attack that contributes little to the technical discussion. Are you purposely trying to start a flame war? You are not being a good ambassador for Australians.
You concluded your personal attack by stating (without citation or evidence) “… you [William] now know that measurement systems can not capture frequency information below about 10 minutes.” I don’t know why you would say that. Automated weather systems routinely REPORT summary averages in the 1 to 10 minute range, and collect samples much more frequently. See, especially, the discussion of sampling at pages 539 and 540 at the following link:
https://library.wmo.int/doc_num.php?explnum_id=3179
Lag is the word, is the word…
I wrote a simple thoughtful and succinct summation and got a long wordy prevarication from William in return.
At some point you have to call BS on the navel gazing of abstraction and return to the real world.
Lag (L) is the real world problem in the existing record. The historic, the traditional, the long term records we “actually have” were all made with thermometers in enclosed screens. Those thermometers were relatively slow to come to thermal equilibrium but the whole apparatus can lag by 15 mins. This places a real world constraint on the time constant of the system and the resulting measurements themselves. Thus, there is a true and identifiable spectral limit for any historical time series and/or any potential “signal” analysis made from them!
So that’s my bottom line! And talk of the comparison of those records with higher sampling rates is pointless!
Newer systems have introduced their own unique problems (And screens) such as a systematic bias resulting from the rapid response to micro temperature fluctuations that are more likely to be anthropogenic in origin* and/or unrepresentative of a natural system** and therefore, are not comparable with the existing records.
What is the thermal L of the real system being measured, are you measuring an homogenous temperature field that is at thermal equilibrium or are you sampling momentary fluctuations of dubious relevance?
Interestingly, studies of the spectral range of temperature information using automated – fast response type – electronic thermometers in Stevenson screens find a diffuse forcing around 12h and turbulent-like behaviour for smaller scales***. More interestingly perhaps, from 1h up to around 9h, air temperature behaves in a predictable way. Beyond this scale, Hurst values become smaller than 0.5, indicating a decreasing predictability.
*i.e. Exhausts, vehicular wake etc
**Having more thermal inertia and a slower rate of change
***No smaller than 10 minutes due to screen L!
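One way to put numbers on the lag argument, as a rough sketch only. It assumes the screen plus thermometer behaves like a single first-order lag and treats the 15-minute figure above as its time constant; the real response is more complicated and the comment does not say which definition of lag is meant. Under that assumption the amplitude passed for a sinusoid of period P is 1/sqrt(1 + (2*pi*tau/P)^2).

```python
import math

def first_order_gain(period_minutes, tau_minutes):
    """Amplitude ratio passed by a first-order lag for a sinusoid of this period."""
    omega_tau = 2 * math.pi * tau_minutes / period_minutes
    return 1.0 / math.sqrt(1.0 + omega_tau ** 2)

TAU = 15.0  # minutes; assumed time constant, taken from the figure quoted above

for period in (5, 10, 60, 720, 1440):
    print(f"period {period:5d} min -> gain {first_order_gain(period, TAU):.3f}")
```

On this model, fluctuations with periods of a few minutes are strongly attenuated but not removed, while the daily cycle passes almost untouched. The filter is gradual, so it narrows the spectrum without setting a hard cutoff.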
Scott,
I saw your last hostile comments and I see this one. Why do you conclude that I’m prevaricating? Why isn’t it a reasonable conclusion that I didn’t understand your comment or question and therefore missed your point? I took the time to write a longer post with the attempt to be helpful. If it wasn’t helpful or insightful, well ok, that is something I can handle. But it sounds like your actual goal was to set a “honey trap.” Is it really necessary to bring that kind of hostility to the conversation? Constructive feedback is something I will consider. Hostility is just dismissed. Your hostility speaks to your character – or lack thereof – not to mine. I’m not going to engage you in some adolescent spat on a forum. I have not seen any sincerity from you and therefore your attacks have no significance to me. I don’t think your hostility shows anything good about you. If something I said or did got under your skin, if you tell me about it constructively I’ll try to address it. If my personality or style is not to your liking then just stay away from me because it is not going to change for someone that has no significance to me and comes at me with hostility.
And though you don’t deserve for me to dignify any other response from you, I will address your specific technical comments for anyone else who may be reading.
Thank you for sharing some detailed information about the kinds of lag that the Stevenson screen and thermometers introduce. Why didn’t you just make that point without the hostility?
Scott said: “At some point you have to call BS on the navel gazing of abstraction and return to the real world.” And: “So that’s my bottom line! And talk of the comparison of those records with higher sampling rates is pointless!”
My reply: Why would you say that sampling and sampling properly is not the real world? It is how it is done in every other application I can think of except climate science. Why is it pointless to compare the correct method to the method currently used that doesn’t give us accurate information? Maybe I don’t fully get your drift, but it sounds like you are saying the problem is the lag. Well why can’t there be more than 1 problem? Mercury in glass thermometers can be replaced with other faster instruments. Screens can be redesigned. (Is this what you are recommending?) The max/min method will still not give you what is correct. From an engineering perspective, capture all of the content available and once sampled properly you are free to filter out what you don’t want or need. When you start talking about exhausts and vehicular wakes then aren’t we now speaking about improperly sited stations? (Yet another problem with the record).
Scott, if I missed your point maybe I’m just slow on comprehending what you are saying or maybe you just don’t write clearly. Neither scenario needs to be an indictment of us personally. I’m looking past your hostility to open a door to see if a better version of Scott might want to come forward. I’ll allow some space for you to come around, but just know that I’ll be just fine giving you the silent effe-you if you want to keep coming at me with hostility. Don’t confuse dignity with weakness. I tend to be rather forgiving – and I’d rather not have you as an enemy. The choice is yours.
William,
Again, you did not address what I’ve said. It is very hard to believe that you are being genuine by not conceding a single point that disrupts your narrative!
In Australia we already have over 500 AWS, mostly housed in existing Stevenson screens, that record three measurements a second (highest, lowest and last). However, these are not compliant with WMO guidelines that recommend iteration over time to smooth out rapid fluctuations:
“The natural small-scale variability of the atmosphere, the introduction of noise into the measurement process by electronic devices and, in particular, the use of sensors with short time-constants make averaging a most desirable process for reducing the uncertainty of reported data.
In order to standardise averaging algorithms it is recommended:
(a) That atmospheric pressure, air temperature [And others] be reported as 1 to 10 min averages, which are obtained after linearization of the sensor output…
These averaged values are to be considered as the “instantaneous” values of meteorological variables for use in most operational applications and should not be confused with the raw instantaneous sensor samples or the mean values over longer periods of time required from some applications. One-minute averages, as far as applicable, are suggested for most variables as suitable instantaneous values. (1.3.2.4 Instantaneous meteorological values)”
You can see the two main issues above are one, to reduce the error of the reported data introduced by over sampling!* And two, to make the new system compliant with the old without introducing further uncertainty.
Probably the single most important reason the WMO insist on numerical averaging of electronic sensors is because mercury and alcohol thermometers have both longer and different time constants! Which makes mirroring the behaviour of liquid in glass thermometers an intractable issue. To be very clear and to restate this particular point:
“Alcohol thermometers (that measure temperature minima) have longer time constants than mercury thermometers (that measure temperature maxima).”
So again to restate, there are two real world problems pulling in opposite directions and they cannot be solved by higher frequency sampling even in the most ideal situation because this simply introduces its own problems!
1-10 minute averaging is ‘world’s best practice’; going lower or higher is not recommended by anybody measuring real world temperature “signals”!
*To infinity and beyond as you keep repeatedly recommending!
Scott said “You can see the two main issues above are one, to reduce the error of the reported data introduced by over sampling!*”
Let me start by saying that one man’s noise is another man’s data.
It is easy to downsample what you call over sampled data. So please explain the introduced error due to oversampling?
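A minimal sketch of that downsampling step, assuming a synthetic day of 1 Hz readings (a daily sinusoid plus random sensor noise, both invented) and simple non-overlapping block averages of the kind the WMO text describes as 1 to 10 minute averages. `block_average` is an illustrative helper, not any agency's algorithm.

```python
import numpy as np

def block_average(samples, block):
    """Average consecutive non-overlapping blocks of `block` samples."""
    samples = np.asarray(samples, dtype=float)
    if samples.size % block != 0:
        raise ValueError("sample count must be a multiple of the block length")
    return samples.reshape(-1, block).mean(axis=1)

# Synthetic day of 1 Hz readings: daily cycle plus fast sensor noise.
rng = np.random.default_rng(1)
seconds = np.arange(86_400)
one_hz = (15.0
          + 5.0 * np.sin(2 * np.pi * seconds / 86_400)
          + rng.normal(0.0, 0.3, seconds.size))

one_min = block_average(one_hz, 60)     # 1,440 values per day
ten_min = block_average(one_hz, 600)    # 144 values per day

print("daily mean from 1 Hz data     :", one_hz.mean())
print("daily mean from 1-min averages:", one_min.mean())
print("daily mean from 10-min averages:", ten_min.mean())
```

Because the blocks are equal in length and cover the whole day, the daily mean is the same at every stage up to rounding. Averaging removes fast fluctuations from the reported series, and having sampled fast in the first place costs nothing.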
William,
I hope somebody has actually had contact with you in the flesh because you sound a lot like a bot to me; I say this sincerely and with no hostility. Synthetic or cyber personalities – if you prefer – act a lot like the way you present here. Lots and lots of words but with very little meaning.
Here, in your very last response to me, you wrote 10 paragraphs, one 200 words long, that managed to say nothing approaching a logical debate, let alone a scientific discourse!
Here they are, in all their synthetic banality. You manage to say nothing of substance in paragraph one:
Three paragraphs of nothing follow:
Five paragraphs now and 300 words in and “you” are yet to say anything of substance.
Finally – six paragraphs in – you may have actually asked a question but only by restating the initial “argument” for FFS! Are you actually thinking about (Or computing) what I said:
Let’s agree to say that you are slow – though very, very hard to believe – I’ll give you the benefit of the doubt and will go through pg.5 point-by-point:
[1] It’s not a sampling limited issue! It’s an uncertainty limited issue! Lag – the time constant – of the real system, is the constraining parameter. Temperature isn’t heat! Heat is a measure of flux. Temperature itself is an index of heat but only when the measuring device is itself in equilibrium with the system being measured. There is an inherent L in the real world being “measured”.
[2] It’s hard to answer this without first repeating – again – are you even thinking about what I’ve already said several times? Mercury-in-glass thermometers have been replaced, screens have been redesigned, and if you mean the “Tmean” that equals (Max + Min)/2, I’ve said ad nauseam that it is incorrect but that Min & Max – alone as singular selections – are much more “correct” than any 2 random samples!
[3] Here you make my own point exactly and I quote: “capture all of the content available and once sampled properly you are free to filter out what you don’t want or need.” Of course, capturing all “available” content is – to be very generous – the million-dollar question and the same one I’ve been raising here!
[4] No, you are talking about ephemeral phenomena whose fluctuations pollute the record because they are only picked up by the “super sampling” of electronic sensors available today. No known real world climate or weather related heat fluxes change as fast as these sensors are capable of recording! By anthropogenic, I mean things like black bitumen roads with thermal bubbles disrupted by vehicles that create sudden inflows. These types of changes are not possible anywhere in the natural world except for specific landscapes such as the magma fields of volcanoes.
William,
You then end your output, with another 100-odd words of nothing!
cheers,
Scott
Hi Scott
Scott said: “I mean things like black bitumen roads with thermal bubbles disrupted by vehicles that create sudden inflows. These types of changes are not possible anywhere in the natural world except for specific landscapes such as the magma fields of volcanoes.”
Clearly you have never experienced what a slight wind change can do near an inland lake or river.
From
http://www.bom.gov.au/climate/change/acorn-sat/documents/ACORN-SAT_Observation_practices_WEB.pdf
“The primary AWS sampling rate is 1 Hz, and mean, maximum temperature statistics are generated from the valid 1 Hz samples over the period of interest. The AWS stores the previous 72 hours of data in the form of statistics for ten minute periods in a circular buffer.”
Saying that faster sample rates cause errors is misleading as they are just capturing the reality of the site. If this is a problem then the site location is the problem not the sampling. It seems to me this additional higher sample rate data could be used as a site diagnostic.
Some general comments and not in reply:
Clearly to compare data collected with different system filter characteristics it is first necessary to match the system filter characteristics by putting one or both signals through a suitable proven transfer function. And yes having commentators compare a figure collected using one system to one collected using another system is a big problem.
It would be a big improvement if future researchers did not have as many valid reasons to complain about the data collected from now on as there is for data collected in our past and collecting it according to Nyquist is a big step in the right direction. There are also many places where standardised filtering is used for data comparison such as EMI measurements so it is about time the collection of climate data caught up with what industry has been doing correctly.
Hey Bright Red,
Thanks for the kind words. As for data acquisition, I also cannot see any reason not to use high quality measurements that include suitable sampling procedures. Sticking to daily midranges seems to be an artifact of the historical record, where daily min/max is usually all that we have. The usual answer to the objections discussed here is that by doing monthly averaging of daily midranges we effectively construct a new signal, so we don’t need to recover any daily frequencies. Well, again, that’s fine provided that you accept the inherited errors caused by such treatment of the underlying data, which, as William’s post points out, can be significant.
A few years ago there was a discussion on Judith Curry’s blog about the potential impact of aliasing on the averaged temperature record. The author of that article, Dr Richard Saumarez, attacked the problem from a different angle. He argues that the yearly temperature record is most likely heavily aliased. Trends may be much more immune to the consequences of that (though not completely resistant) but any model built on an aliased signal is in grave danger. The discussion under that post is in some respects similar to the discussion under this post. As Dr Saumarez pointed out, when confronted with the problem of undersampling the usual response is:
“I’ve got a lot of data, I’ve analysed with ‘R’ packages and I don’t think there isn’t a problem.”
Priceless.
Now it seems the data is often stored in ascii text file format
NOAA stores their subhourly data in plain text, fixed-width delimited. A file contains, along with the air temperature, several other attributes such as solar radiation and so on. The usual size of each file containing a year of data is ~14 MB, thus for, say, 10,000 stations across the globe that would be ~137 GB for the whole globe per year. That’s piece of cake – on the one decent laptop we could store global data for several years.
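The storage estimate can be checked in a couple of lines. The 14 MB per station-year and the 10,000 stations are the figures from the comment above (the station count is hypothetical); the only added assumption is that the ~137 GB figure uses 1024 MB per GB.

```python
MB_PER_STATION_YEAR = 14   # approximate yearly file size quoted above
STATIONS = 10_000          # hypothetical global station count from the comment

total_mb = MB_PER_STATION_YEAR * STATIONS
total_gib = total_mb / 1024   # assumes 1 GB = 1024 MB

print(f"{total_mb:,} MB per year = about {total_gib:.0f} GB")
```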
Hi Paramenter
“I’ve got a lot of data, I’ve analysed with ‘R’ packages and I don’t think there isn’t a problem.”
Yep Priceless.
Not eliminating one error item that you have full control over simply makes no sense to me.
It would be interesting to have some of the commentators here at a design meeting where the specification of the data collection system was being decided and the topic of error due to potential aliasing came up.
Q) Is there aliasing at 24 samples/day in the limited examples we have looked at?
A) Yes but it is small.
Q) Can you give the daily maximum error due to aliasing for all station locations in the world over the 30 year design life?
A) No
Q) What sample rate would reduce this error to practically 0, being at least an order of magnitude lower than the resolution we are recording, for all stations at all times?
A) Current best practice and available data indicates 288 samples/day should be plenty.
Q) Is there any additional cost in implementing 288 samples/day?
A) No
Then 288 is the minimum. Next topic.
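A toy version of the first and third questions, under loud assumptions: the "day" is a synthetic periodic signal with 30 harmonics whose amplitudes fall off as 1/k^2 (invented, not measured), samples are instantaneous and evenly spaced, and no anti-aliasing filter is applied. With M samples per day, every harmonic that is a multiple of M folds onto DC and shifts the computed mean.

```python
import numpy as np

TRUE_MEAN = 10.0
HARMONICS = range(1, 31)   # synthetic content up to 30 cycles/day (assumed)

def signal(t_days):
    """Synthetic daily temperature: mean plus harmonics with 1/k^2 amplitudes."""
    x = np.full_like(t_days, TRUE_MEAN, dtype=float)
    for k in HARMONICS:
        x += (5.0 / k**2) * np.cos(2 * np.pi * k * t_days + 0.3 * k)
    return x

def mean_error(samples_per_day):
    """Absolute error of the daily mean from evenly spaced instantaneous samples."""
    t = np.arange(samples_per_day) / samples_per_day
    return abs(signal(t).mean() - TRUE_MEAN)

errors = {rate: mean_error(rate) for rate in (2, 24, 288)}
for rate, err in errors.items():
    print(f"{rate:4d} samples/day -> mean error {err:.6f}")
```

For this made-up spectrum the error at 288 samples/day is at rounding level, the error at 24 samples/day is small but nonzero, and the error at 2 samples/day is much larger. Real stations would need their own spectra to answer the second question, which is why the answer in the dialogue is "No".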
Bright Red,
I love the Q&A. I have been in a few meetings like that.
Hi Paramenter,
You said: “That’s piece of cake – on the one decent laptop we could store global data for several years.”
When you consider that the fate of humanity and life as we know it on the planet is in jeopardy from climate catastrophe, you would think incurring the cost of a few laptops would be warranted.