Guest post by Lance Wallace
Abstract
The traditional estimate of daily mean temperature at measuring stations has been the average of the highest (Tmax) and lowest (Tmin) daily measurements. This leads to error in estimating the true mean temperature. What is the magnitude of this error, and how does it depend on geographic and climatic variables? The US Climate Reference Network (USCRN) of temperature measuring stations is employed to estimate the error for each station in the network. The 10th-90th percentile range of the errors extends from -0.5 to +0.5 °C. Latitude and relative humidity (RH) are found to exert the largest influences on the error, explaining about 28% of the variance. A majority of stations show a consistent under- or over-estimate during all four seasons. Station behavior is also consistent across years.
Introduction
Historically, temperature measurements used to estimate climate change have depended on thermometers that record the maximum and minimum temperatures over a day. The average of these two measurements, which we will call Tminmax, has been used to estimate a mean daily temperature. However, this simple approach will have some error in estimating the true mean (Tmean) temperature. What is the magnitude of this error? How does it vary by season, elevation, latitude or longitude, and other parameters? For a given station, is it random or consistently biased in one direction?
Multiple studies have considered this question. Many of these are found in food and agriculture journals, since a correct mean temperature is crucial for predicting ripening of crops. For example, Ma and Guttorp (2012) report that Swedish researchers have been using a linear combination of five measurements (daily minimum, daily maximum, and measurements taken at 6, 12, and 18 hours UTC) since 1916 (Ekholm 1916), although the formula was revised later (Modén, 1939; Nordli et al., 1996). Tuomenvirta (2000) calculated the historical variation (1890-1995) of Tmean – Tminmax differences for three groups of Scandinavian and northern stations. For the continental stations (Finland, Iceland, Sweden, Norway, Denmark), average differences across all stations were small (+0.1 to +0.2 °C) beginning in 1890 and dropped close to 0 from about 1930 on. However, for two groups of mainly coastal stations in the Norwegian islands and West Greenland, they found strongly negative differences (-0.6 °C) in 1890, falling close to zero from 1965 on. Other studies have considered different ways to determine Tmean from Tmin, Tmax, and ancillary measurements (Weiss and Hays, 2005; Reicosky et al., 1989; McMaster et al., 1983; Misra et al., 2012). Still other studies have considered Tmin and Tmax in global climate models (GCMs) (Thrasher et al., 2012; Lobell et al., 2007).
This short note examines these questions using the US Climate Reference Network (USCRN), a network of high-quality temperature measurement stations operated by NOAA. Begun around 2000 with a single station, the network reached a total of about 114 stations in the continental US (44 states) by 2008. There are also 4 stations in Alaska, 2 in Hawaii, and one in Canada meeting the USCRN criteria. Four more stations in Alaska have been established, bringing the total to 125 stations, but these have only 2-3 years of data at this writing. A regional (USRCRN) network of 17 stations has also been established in Alabama and has about 4 years of data. All 142 stations were used in the following analysis, although at times the 121- or 125-station subset was used. The stations are located in fairly pristine areas meeting all siting criteria for weather stations. Temperature measurements are taken in triplicate, and all stations also measure precipitation and solar radiation. Measurements of relative humidity (RH) were instituted in 2007 at two stations, and by about 2009 were being collected at all 125 sites in the USCRN network, but not at the Alabama (USRCRN) network. A database of all measurements is publicly available at ftp://ftp.ncdc.noaa.gov/pub/data/uscrn/products/. The database includes hourly, daily, and monthly results. This database, together with single compilations of multiple files kindly supplied by NOAA, was used for the following analysis.
Methods
The monthly data for the 142 stations were downloaded one station at a time and joined into a single database. (Note: at present, the monthly data are available to the public only as separate files for each station, and the daily data as separate files for each year for each station. This requires 142 separate downloads for the monthly data and about 500 for the daily data. Fortunately, a NOAA database manager was able to provide the daily data as a single file of about 373,000 records.)
The hourly data include the maximum and minimum 5-minute average temperatures recorded each hour, as well as the mean temperature averaged over the hour. The daily data include the highest 5-minute maximum and the lowest 5-minute minimum temperatures recorded in the hourly data that day (i.e., Tmax and Tmin), together with the mean daily temperature (Tmean). The average of Tmax and Tmin, (Tmax+Tmin)/2, is also included for comparison with the true mean. The monthly data include the maximum and minimum temperatures for the month; these are averages of the observed highest 5-minute average maximum and minimum daily temperatures. There is also an estimate of the true mean monthly temperature, and of the monthly average temperature using the monthly Tmax and Tmin. The difference between the daily Tminmax and the true mean will be referred to as Delta T:
DeltaT = (Tmin+Tmax)/2 – Truemean
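In code, the definition above can be sketched as follows (a minimal illustration with made-up readings, not USCRN data; `delta_t` is a hypothetical helper):

```python
def delta_t(readings):
    """DeltaT = (Tmin + Tmax)/2 minus the true mean of one day's 5-minute averages."""
    tmin, tmax = min(readings), max(readings)
    true_mean = sum(readings) / len(readings)
    return (tmin + tmax) / 2 - true_mean

# A symmetric day has no error; a day spent mostly near the minimum
# is overestimated by the minmax method.
symmetric = [10, 15, 20, 15]     # true mean 15.0, minmax estimate 15.0
bottom_heavy = [10, 10, 10, 20]  # true mean 12.5, minmax estimate 15.0
```

A positive result means the minmax method overestimates the true mean, matching the sign convention used throughout this note.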
Data were analyzed using Excel 2010 and Statistica v11. For each station, the entire length of the station’s history was used; the number of months ranged from 47 to 132. Since the relationship between the true mean and Tminmax may vary over time, these were compared by season, where Winter corresponds to January through March and so on. The diurnal temperature range (DTR) was calculated for each day as Tmax-Tmin. For the two stations with the highest and lowest overall error, the hourly data were downloaded to investigate the diurnal pattern.
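A minimal sketch of the seasonal grouping and the DTR calculation as described above (hypothetical helper names, not part of any NOAA tool):

```python
def season(month):
    """Map a month number (1-12) to the paper's season labels:
    Winter = Jan-Mar, Spring = Apr-Jun, Summer = Jul-Sep, Fall = Oct-Dec."""
    return ("Winter", "Spring", "Summer", "Fall")[(month - 1) // 3]

def diurnal_temperature_range(tmax, tmin):
    """DTR for one day: daily maximum minus daily minimum."""
    return tmax - tmin
```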
Results
As of Aug 11, 2012 there were 12,305 station-months and 373,975 station-days from 142 stations. The metadata for all stations are available at the Website http://www.ncdc.noaa.gov/crn/docs.html.
Delta T averaged over all daily measurements for each station ranged from -0.66 °C (Lewistown, MT) to +1.38 °C (Fallbrook, CA, near San Diego) (Figure 1). A negative sign means the minmax approach underestimated the true mean. About as many stations overestimated (58) as underestimated (63) the true mean.
Figure 1. DeltaT for 121 USCRN stations: 2000-August 5, 2012. Error bars are standard errors.
A histogram of these results is provided (Figure 2). The mean was 0.0 °C, with an interquartile range of -0.2 to +0.2 °C. The 10th-90th percentile range was from -0.5 to +0.5 °C.
Figure 2. Histogram of Delta T for 121 USCRN stations.
Seasonal variability was surprisingly low: at more than half of the 121 stations with at least 47 months of complete data, Tminmax either underestimated (28 sites) or overestimated (39 sites) the true mean in all 4 seasons. Most of the remaining stations also leaned in one direction or the other; only 20 stations (16.5%) were evenly split, with 2 seasons in each direction. 16 of these 20 were negative in winter and spring and positive in summer and fall. Over all 121 stations, there was a slight tendency toward underestimates in winter and spring and overestimates in summer and fall (Figure 3).
Figure 3. Variation of Delta T by season.
Since Delta T was determined by averaging all values over all years for each station, the possibility remains that stations may have varied across the years. This was tested by comparing the average Delta T for each station across the years 2008-9 against the average in 2010-11. The result showed that the stations were very stable across the years, with a Spearman correlation of 0.974 (Figure 4).
Figure 4. Comparison of Delta T for each station across consecutive 2-year periods. N = 140 stations.
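The stability check is simply a rank correlation between two vectors of per-station averages. A tie-free sketch of Spearman's coefficient follows (in practice a library routine such as scipy's `spearmanr`, which also handles ties, would be used; the data here are illustrative, not the station values):

```python
def spearman(x, y):
    """Spearman rank correlation; assumes no tied values (a simplification)."""
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0] * len(v)
        for rank, idx in enumerate(order):
            r[idx] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    # Both rank vectors are permutations of 0..n-1, so they share the
    # same mean and variance; the denominator reduces to that variance.
    mean = (len(x) - 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)
    return cov / var
```

Feeding the 2008-9 and 2010-11 per-station Delta T averages into such a routine is what yields the 0.974 reported above.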
When Delta T is mapped, some quite clear patterns emerge (Figure 5). Overestimates (blue dots) are strongly clustered in the South and along the entire Pacific Coast from Sitka, Alaska to San Diego, also including Hawaii. Underestimates (red dots) are located along the extreme northern tier of states from Maine to Washington (excepting the two Washington stations west of the Cascades) and all noncoastal stations west of Colorado’s eastern border.
Figure 5. DeltaT at 121 USCRN stations. Colors are quartiles. Red: -0.66 to -0.17 C. Gold: -0.17 to 0 C. Green: 0 to +0.25 C. Blue: +0.25 to +1.39 C.
Figure 5 suggests that the error has a latitude gradient, decreasing from positive to negative as one goes north. Indeed, a regression shows a highly significant (p < 0.000002) negative coefficient of -0.018 °C per degree of latitude (Table 1, Figure 6). However, other variables clearly affect DeltaT: the adjusted R² value indicates that latitude explains only about 21% of the observed variance.
Table 1. Regression of DeltaT (Tminmax-True mean) on latitude
N = 142 stations. Dependent variable: DELTAT. R = 0.467, R² = 0.218, adjusted R² = 0.212; F(1,140) = 38.9, p < 0.00000; std. error of estimate: 0.278.

|           | b*     | Std.Err. of b* | b      | Std.Err. of b | t(140) | p-value  |
|-----------|--------|----------------|--------|---------------|--------|----------|
| Intercept |        |                | 0.75   | 0.11          | 6.6    | 0.000000 |
| LATITUDE  | -0.466 | 0.075          | -0.018 | 0.002         | -6.2   | 0.000000 |

\* Standardized regression results (μ = 0, σ = 1)
Figure 6. Regression of DeltaT on Latitude.
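The single-predictor fit is ordinary least squares; a minimal sketch (the numbers below are illustrative only; applied to the 142 per-station (latitude, DeltaT) pairs it would reproduce the slope of about -0.018 °C per degree in Table 1):

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope
```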
Therefore a multiple regression was carried out on the measured variables in the monthly data file. The Spearman correlations of these variables with DeltaT are provided in Table 2. The largest absolute Spearman coefficient was with latitude (-0.375), but other relatively high correlations were noted for Tmin (0.308) and RHmax (0.301). However, TMIN, TMAX, TRUEMEAN, and DTR could not be included in the multiple regression, since they (or their constituent variables, in the case of DTR) appear on the left-hand side as part of the definition of DELTAT. Also, the three RH variables were highly collinear, so only RHMEAN was included in the multiple regression. Finally, because Alaska and Hawaii have such extreme latitude and longitude values, they were omitted from the multiple regression. These choices left 3289 station-months (out of 3499 total) and 6 measured independent variables, of which 4 were significant. Together they explained about 30% of the measured variance (Table 3, Figure 7). Latitude and RH were the main explanatory variables, explaining 28% of the variance between them, with about equal contributions as judged from the t-values. When the multiple regression was repeated for each season, in fall and winter the four significant and two nonsignificant variables were identical to those in the annual regression, with adjusted R² values of 19-20%; in spring and summer all six variables were significant, with R² values of 47-50%. In all seasons, however, the two dominant variables were latitude and RH.
Table 2. Spearman correlations of measured variables with DeltaT.
| VARIABLE            | DELTAT |
|---------------------|--------|
| LONGITUDE (degrees) | 0.075  |
| LATITUDE (degrees)  | -0.375 |
| ELEVATION (feet)    | -0.169 |
| TMAX (°C)           | 0.231  |
| TMIN (°C)           | 0.308  |
| TMINMAX (°C)        | 0.272  |
| TRUEMEAN (°C)       | 0.239  |
| DTR (°C)            | -0.134 |
| PRECIP (mm)         | 0.217  |
| SOLRAD (MJ/m²)      | -0.043 |
| RHMAX (%)           | 0.301  |
| RHMIN (%)           | 0.124  |
| RHMEAN (%)          | 0.243  |
Table 3. Multiple regression on DeltaT of measured variables
N = 3289 station-months (Alaska and Hawaii excluded). Dependent variable: DELTAT. R = 0.5522, R² = 0.3049, adjusted R² = 0.3037; F(6,3282) = 239.98, p < 0.0000; std. error of estimate: 0.3683.

|                | b*        | Std.Err. of b* | b         | Std.Err. of b | t(3282)  | p-value  |
|----------------|-----------|----------------|-----------|---------------|----------|----------|
| Intercept      |           |                | -0.294812 | 0.085454      | -3.4500  | 0.000568 |
| LONG           | -0.169595 | 0.018086       | -0.005496 | 0.000586      | -9.3772  | 0.000000 |
| LAT            | -0.407150 | 0.015910       | -0.032380 | 0.001265      | -25.5913 | 0.000000 |
| ELEVATION      | 0.066710  | 0.018980       | 0.000013  | 0.000004      | 3.5147   | 0.000446 |
| PRECIP (mm)    | -0.008293 | 0.017129       | -0.000055 | 0.000114      | -0.4842  | 0.628291 |
| SOLRAD (MJ/m²) | 0.000193  | 0.016465       | 0.000013  | 0.001099      | 0.0117   | 0.990630 |
| RHMEAN         | 0.552356  | 0.021529       | 0.015417  | 0.000601      | 25.6565  | 0.000000 |

\* Standardized regression results (μ = 0, σ = 1)
Figure 7. Predicted vs observed values of DeltaT for the multiple regression model in Table 3.
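The paper's fit was done in Statistica; a sketch of how such a fit could be reproduced with numpy least squares, on synthetic stand-in data (the generating coefficients below are hypothetical, merely echoing the signs of the latitude and RH terms in Table 3):

```python
import numpy as np

# Synthetic stand-in for the 3289 real station-months.
rng = np.random.default_rng(42)
n = 500
lat = rng.uniform(25.0, 49.0, n)   # degrees, roughly the CONUS range
rh = rng.uniform(20.0, 90.0, n)    # percent
# Hypothetical generating model: negative latitude effect, positive RH effect.
delta_t = -0.032 * lat + 0.015 * rh + rng.normal(0.0, 0.1, n)

# Least-squares fit of DeltaT on an intercept, latitude, and mean RH.
X = np.column_stack([np.ones(n), lat, rh])
coef, residuals, rank, _ = np.linalg.lstsq(X, delta_t, rcond=None)
intercept, b_lat, b_rh = coef
```

With real data the design matrix would carry all six predictors (longitude, latitude, elevation, precipitation, solar radiation, RHMEAN), and the recovered coefficients would be those in the `b` column of Table 3.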
Since RH had a strong effect on DeltaT, a map of RH was made for comparison with the DeltaT map above (Figure 8). The map again shows the clustering noted for DeltaT along the Pacific Coast, the Southeast, and the West. However, the effect of latitude along the northern tier is missing from the RH map.
Figure 8. Relative humidity for 125 USCRN stations: 2007-Aug 8, 2011. Colors are quartiles. Red: 19-56%. Gold: 56-70%. Green: 70-75%. Blue: 75-91%.
Fundamentally, the difference between the minmax approach and the true mean is a function of diurnal variation: stations where the temperature spends more time closer to the minimum than the maximum will have their mean temperatures overestimated by the minmax method, and vice versa. To show this graphically, the mean diurnal variation over all seasons and years is shown for the station with the largest overestimate (Fallbrook, CA) and the one with the largest underestimate (Lewistown, MT) (Figure 9). Although both graphs have a minimum at 6 AM and a maximum at about 2 PM, the Lewistown (lower) diurnal curve is broader. For example, 8 hours are within 2 °C of the Lewistown maximum, whereas only about 6 hours are within 2 °C of the Fallbrook maximum. Another indicator is that 12 hours are above the true mean in Lewistown but only 9 in Fallbrook.
Figure 9. Diurnal variation and comparisons of the true mean to the estimate using the minmax method for the two stations with the most extreme over- and underestimates.
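The point can be illustrated with two synthetic diurnal profiles (idealized shapes, not the actual Fallbrook or Lewistown curves): a pure sinusoid spends equal time near its minimum and maximum, so the minmax estimate is unbiased, while a profile with a sharp afternoon peak and a long flat night has a true mean well below (Tmin+Tmax)/2.

```python
import math

def minmax_error(temps):
    """(Tmin + Tmax)/2 minus the true mean over the samples."""
    return (min(temps) + max(temps)) / 2 - sum(temps) / len(temps)

hours = range(24)
# Symmetric sinusoid: the minmax estimate equals the true mean.
sinusoid = [15 + 10 * math.sin(2 * math.pi * (h - 9) / 24) for h in hours]
# Sharp afternoon peak, flat cool night: the mean sits closer to Tmin,
# so the minmax method overestimates it.
peaked = [15 + 10 * max(0.0, math.sin(2 * math.pi * (h - 6) / 24)) ** 2
          for h in hours]
```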
Discussion
For a majority of US and global stations, at least until recent times, it is not possible to investigate the question of the error involved in using the Tminmax method, since insufficient measurements were made to determine the true mean. The USCRN provides one of the best datasets to investigate this question, not only since both the true mean temperatures and the daily Tmax and Tmin are provided, but also because the quality of the stations is high. Since there are >100 stations well distributed across the nation, which now have at least 4 years of continuous data, the database seems adequate for this use and the results comparing 2-year averages suggest the findings are robust.
The questions asked in the Introduction to this paper can now be answered, at least in a preliminary way.
“What is the magnitude of this error?” We see the range is from -0.66 °C to +1.38 °C, although the latter value appears to be unusual, with the second highest value only +0.88 °C.
“How does it vary by season, elevation, latitude or longitude, and other parameters?” The direction of the error is surprisingly unaffected by season, with more than half the stations showing consistent under- or over-estimates during all 4 seasons. We have seen a strong effect of latitude and RH, with a weaker effect of elevation. Geographic considerations are clearly important: coastal and Southern sites show strong overestimates, while the northern and western interior stations mostly show strong underestimates by the minmax method. Although the Tuomenvirta (2000) results mentioned above are averages across all stations in a region, their finding that the coastal stations in west Greenland and the Norwegian islands showed a strong Delta T in the same direction as the coastal stations in the USCRN supports the influence of RH, whereas their finding of the opposite sign for the continental stations matches the dependence we find here for the Western interior USCRN stations. (Note that their definition of Delta T has the opposite sign from ours.)
“For a given station, is it random or biased in a consistent direction?” For most stations, the direction and magnitude of the error is very consistent across time, as shown by the comparison across seasons and across years.
Considering the larger number of stations in the US and in historical time, we may speculate that the error in the minmax method was at least as large as indicated here, and most probably somewhat larger, since many stations have been shown to be poorly sited (Fall et al., 2011). The tendency in the USCRN dataset to have about equal numbers of underestimates and overestimates is simply accidental, reflecting the particular mix of coastal, noncoastal, Northern, and Southern sites. It may be that this applies as well to the larger number of sites in the continental US, but there is likely to be a bias in one direction or another in different countries, depending on their latitude extent and RH levels.
This error could affect spatial averaging. For example, the Fallbrook CA site with the highest positive DeltaT value of 1.39 C is just 147 miles away from the Yuma site with one of the largest negative values of -0.58. If these two stations were reading the identical true mean temperature, they would appear to disagree by nearly 2 full degrees Celsius using the standard minmax method. Quite a few similar pairs of close-lying stations with opposite directions of DeltaT can be seen in the map (check for nearby red and blue pairs). However, if only anomalies were considered, the error in absolute temperature levels might not affect estimates of spatial correlation (Menne and Williams, 2008).
Although the errors documented here are true errors (that is, they cannot be adjusted by time of observation or other adjustments), nonetheless it would not be expected that they have much of a direct effect on trends. After all, if one station is consistently overestimated across the years, it will have the same trend as if the values were replaced by the true values. Or if it varies cyclically by season, again after sufficient time the variations would tend to cancel and the trend be mostly unaffected. Of course, this cannot be checked with the USCRN database since it covers at most 4-5 years with the full complement of stations, and normal year-to-year “weather” variations would likely overwhelm any climatic trends over such a short period.
Acknowledgement. Scott Ember of NOAA was extremely helpful in navigating the USCRN database and supplying files that would have required many hours to download from the individual files available.
References
Ekholm, N., 1916: Beräkning av luftens månadsmedeltemperatur vid de svenska meteorologiska stationerna. Bihang till Meteorologiska iakttagelser i Sverige, Band 56, 1914. Almqvist & Wiksell, Stockholm, p. 110.
Fall, S., Watts, A., Nielsen-Gammon, J., Jones, E., Niyogi, D., Christy, J.R., and Pielke, R.A., Sr., 2011: Analysis of the impacts of station exposure on the U.S. Historical Climatology Network temperatures and temperature trends. J. Geophysical Research 116: D14120.
Lobell, D.B., Bonfils, C., and Duffy, P.B., 2007: Climate change uncertainty for daily minimum and maximum temperatures: A model inter-comparison. Geophysical Research Letters 34, L05715, doi:10.1029/2006GL028726.
Ma, Y. and Guttorp, P.: Estimating daily mean temperature from synoptic climate observations. http://www.nrcse.washington.edu/NordicNetwork/reports/temp.pdf. Downloaded Aug 18, 2012.
McMaster, G.S. and Wilhelm, W., 1997: Growing degree-days: one equation, two interpretations. Publications from USDA-ARS/UNL Faculty, Paper 83. http://digitalcommons.unl.edu/usdaarsfacpub/83. Accessed Aug 18, 2012.
Menne, M.J. and Williams, C.N., Jr., 2008: Homogenization of temperature series via pairwise comparisons. J. Climate 22: 1700-1717.
Misra, V., Michael, J.-P., Boyles, R., Chassignet, E.P., Griffin, M., and O'Brien, J.J., 2012: Reconciling the spatial distribution of the surface temperature trends in the southeastern United States. J. Climate 25, 3610-3618. doi:10.1175/JCLI-D-11-00170.1.
Modén, H., 1939: Beräkning av medeltemperaturen vid svenska stationer. Statens meteorologisk-hydrografiska anstalt, Meddelanden, serien Uppsatser, no. 29.
Nordli, P.Ø., Alexandersson, H., Frich, P., Førland, E., Heino, R., Jónsson, T., Steffensen, P., Tuomenvirta, H., and Tveito, O.E., 1996: The effect of radiation screens on Nordic temperature measurements. DNMI Report 4/96 Klima.
Reicosky, D.C., Winkelman, L.J., Baker, J.M., and Baker, D.G., 1989: Accuracy of hourly air temperatures calculated from daily minima and maxima. Agric. For. Meteorol. 46, 193-209.
Thrasher, B.L., Maurer, E.P., McKellar, C., and Duffy, P.B., 2012: Hydrol. Earth Syst. Sci. Discuss. 9, 5515-5529. doi:10.5194/hessd-9-5515-2012. www.hydrol-earth-syst-sci-discuss.net/9/5515/2012/. Accessed Aug 18, 2012.
Tuomenvirta, H., Alexandersson, H., Drebs, A., Frich, P., and Nordli, P.Ø., 2000: Trends in Nordic and Arctic temperature extremes. J. Climate 13, 977-990.
Weiss, A. and Hays, C.J., 2005: Calculating daily mean air temperatures by different methods: implications from a non-linear algorithm. Agric. For. Meteorol. 128, 57-69.
APPENDIX
The main concern of this paper has been with Delta T and therefore almost all of the above analyses deal with that variable. However, another variable depending on the daily Tmax and Tmin is their difference, the Diurnal Temperature Range (DTR), which has its own interest. For example, the main finding of Fall et al., (2011) was that the poorly sited stations tended to overestimate Tmin and underestimate Tmax, leading to a large underestimate of DTR. However, the USCRN stations are all well-sited and therefore the estimates of DTR should be unbiased. What can we learn from the USCRN about this variable? We can first of all map its variation (Figure A-1).
Figure A-1. Variation of daily DTR across the US CRN. Colors are quartiles. Red: 4.7-10.8 C. Gold: 10.8-12.0 C. Green: 12.0-13.8 C. Blue: 13.8-19.9 C.
Here we see that the coastal sites have the lowest daily variation, reflecting the well-known moderating effect of the oceans. Perhaps the two sites near the Great Lakes in the lowest quartile of the DTR distribution are also due to this lake effect. The Western interior states have the highest DTRs.
A multiple regression shows that RH is by far the strongest explanatory variable (Table A-1). Solar radiation and precipitation have moderate effects, and latitude is weakly significant. The model explains about 46% of the variance, with RHMEAN accounting for most (42%) of that (Figure A-2).
Table A-1. Multiple regression on Diurnal Temperature Range.
N = 3289 station-months (Alaska and Hawaii excluded). Dependent variable: DTR. R = 0.68257906, R² = 0.46591417, adjusted R² = 0.46493778; F(6,3282) = 477.18, p < 0.0000; std. error of estimate: 2.3620.

|                | b*        | Std.Err. of b* | b        | Std.Err. of b | t(3282)  | p-value  |
|----------------|-----------|----------------|----------|---------------|----------|----------|
| Intercept      |           |                | 20.14306 | 0.548079      | 36.7521  | 0.000000 |
| LONG           | 0.010872  | 0.015854       | 0.00258  | 0.003759      | 0.6858   | 0.492909 |
| LAT            | -0.068287 | 0.013946       | -0.03974 | 0.008115      | -4.8965  | 0.000001 |
| ELEVATION      | -0.008117 | 0.016638       | -0.00001 | 0.000025      | -0.4879  | 0.625687 |
| PRECIP (mm)    | -0.170325 | 0.015015       | -0.00829 | 0.000730      | -11.3438 | 0.000000 |
| SOLRAD (MJ/m²) | 0.183541  | 0.014433       | 0.08967  | 0.007051      | 12.7167  | 0.000000 |
| RHMEAN         | -0.484798 | 0.018872       | -0.09900 | 0.003854      | -25.6888 | 0.000000 |

\* Standardized regression results (μ = 0, σ = 1)
Figure A-2. Diurnal Temperature Range vs. mean RH.
The figure suggests that a linear fit is not very good: for RH between about 60% and 95%, the slope (eyeball estimate) is perhaps twice the overall fitted slope of -0.138 °C per % RH.
Finally, how does the true mean temperature depend on the variables measured at the UCRN sites? The multiple regression is provided in Table A-2. Although all six variables are significant and explain about 79% of the variance, the relationship is largely driven (R2=59%) by solar radiation (Figure A-3).
Table A-2. Multiple regression of true mean monthly temperatures vs. measured meteorological variables.
N = 3289 station-months (Alaska and Hawaii excluded). Dependent variable: TRUEMEAN. R = 0.891, R² = 0.793, adjusted R² = 0.793; F(6,3282) = 2095.9, p < 0.0000; std. error of estimate: 4.58.

|                | b*        | Std.Err. of b* | b         | Std.Err. of b | t(3282)  | p-value  |
|----------------|-----------|----------------|-----------|---------------|----------|----------|
| Intercept      |           |                | 9.972524  | 1.062680      | 9.3843   | 0.000000 |
| LONG           | -0.037057 | 0.009869       | -0.027366 | 0.007288      | -3.7548  | 0.000177 |
| LAT            | -0.201479 | 0.008682       | -0.365153 | 0.015735      | -23.2071 | 0.000000 |
| ELEVATION      | -0.307433 | 0.010357       | -0.001414 | 0.000048      | -29.6825 | 0.000000 |
| PRECIP (mm)    | 0.151732  | 0.009347       | 0.022991  | 0.001416      | 16.2333  | 0.000000 |
| SOLRAD (MJ/m²) | 0.752289  | 0.008985       | 1.144690  | 0.013671      | 83.7285  | 0.000000 |
| RHMEAN         | -0.076282 | 0.011748       | -0.048521 | 0.007473      | -6.4931  | 0.000000 |

\* Standardized regression results (μ = 0, σ = 1)
Figure A-3. True mean temperature vs. solar radiation.
Ryan says:
August 31, 2012 at 3:23 am
Seems to me if you want to detect AGW you should use Tmin (usually nightime temps) since air temp at night is wholly dependent on the presence of greenhouse gases.
======================================================================
And you factored in UHI for that?
Perhaps I have completely missed the point of Nick Stokes’s comments, but I utterly fail to see how, in any discussion of global warming, there is no role for temperatures in GCMs. Isn’t the whole point of a global warming argument that temperatures are rising. Lance’s work shows, to me anyway, that there is a built in bias in the way composite (for want of a better word) temperatures are calculated, and that any discussion of global temperatures and global warming must consider the bias and error in the way temperature data is handled. It may turn out to be irrelevant, or it may turn out to be highly relevant, but it must be considered before rejecting it out of hand.
Nick Stokes says:
August 31, 2012 at 3:01 am
“No, of course they don’t make them up. They solve the equations of fluid flow with heat transport, along with half-hourly insolation and IR heat components (and latent heat release, vertical transport modelled etc). All of this on a regular grid, roughly 100 km and half hour intervals. There’s simply no role for any kind of daily mean, and station readings are not used anywhere.”
Not really. They use specific approximations to the governing equations with lots of terms missing. Of course, with codes like GISS Model E, we really DON’T know what they’re solving (and they probably don’t either since they don’t document it anywhere).
Another thing about the climate “models”. If they do indeed solve essentially the same non-linear equations as a typical CFD code, why do their solutions remain “stable” for 100 years of integration time when we know full well that numerical weather prediction model become chaotic after a week? Could it be crappy, highly diffusive numeric discretizations and “controlling” the solutions through ad hoc logic and unphysical filters and source terms? Well, no one from the warmist camp ever wants to talk about these things (can’t blame them…).
I wonder if this work could be used to determine whether the big reduction in thermometers back in the 90s might have an impact on the trend.
Gunga Din says:
August 30, 2012 at 8:30 pm
It looks like the example set by the “Watts et al” paper to put a paper up here to be “fire-proofed” is being followed….
________________________
Willis did it with his Thermostat theory too.
I think it is a great idea. The best run company that I have ever worked for did it with new products all the time before the product went from the pilot plant stage to production. We caught a heck of a lot of costly mistakes that way.
It only works however if egos are left at the door. At another company at a new product presentation, I did my usual critique based on the pilot plant findings and was roundly slammed for attacking their baby by the project engineers. Turned out I was correct and the company lost millions. To save face I was fired shortly there after for not being a “Team Player” – go figure.
I think we see the same type of attitude problem with “Team Players” in climastrology. Dr. Phil Jones of the UEA CRU sent this in reply to Warwick Hughes when he asked for data. “Why should I make the data available to you, when your aim is to try and find something wrong with it?” From: An Open Letter to Dr. Phil Jones
This type of attitude in any scientist is absolutely deadly. Unfortunately it is all to common especially at the Phd level. Max Planck stated (Translated from the German) “A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it”. paraphrased: “Science advances one funeral at a time.”
For interest here is a time plot comparing USCRN stations (only 40 of them with data from 2003) daily temperatures using their values for mean and average (mean = (max+min)/2 Average = average of 5 minute data)
http://www.climateandstuff.blogspot.co.uk/2012/08/uscrn-average-vs-mean-data.html
And a comparison of Berkley and USCRN conus
http://www.climateandstuff.blogspot.co.uk/2012/08/uscrn-comared-to-best.html
@Mathew W.
Thing is Matthew, that without any greenhouse effect caused by the gas in our atmosphere, the heat caused by UHI would immediately radiate into space and the measured temperature at night would be close to absolute zero, just as it is on the moon. Of course, most greenhouse gases block incoming radiation during the day to the same extent, thus making daytime temps colder than they would be. Team AGW claims that for CO2 this is different because CO2 doesn’t block the incoming radiation but does block the outgoing radiation. If this were true the easiest way to detect it would be to look at the difference in trends between nightime (usually Tmin) temps and daytime (usually Tmax) temps over time – if CO2 is really doing what Team AGW claim it is doing then it should cause nightime temps to be rising much faster than daytime temps. This measurement would remove a lot of the error in temperature measurements over time since you would be comparing the delta between two measurements made using the same equipment on the same site at roughly the same time(well, during the same 24 hr period which significantly reduces the chance of site changes). Not seen anybody do it yet though.
I don’t think UHI would make much difference to these measurements, since the UHI sourced heat would be equally blocked by CO2 (or any other greenhouse gas) both during daytime and nightime. Thus looking for the delta would, in theory, cancel out the impact of UHI when looking for the AGW impact.
Excellent work, and yes, this is something that is perfectly obvious. T_max – T_min/2 is nothing but a first order estimator of the mean of a nonlinear function that itself varies substantially by physical location, local conditions and season.
, recall). The oceans, on the other hand, retain heat and their temperature is a nearly direct measure of the bulk enthalpy of at least the surface layer. The heat capacity of the oceans is enormous and the heat is mixed and stored at depth so it takes a long time to release, where the heat capacity of the ground surface layer is nearly irrelevant — one cold clear night and the ground surface is cold where the ocean remains warm(er) for weeks to months after the weather turns.
I would point out, however, that as I’ve been building simple models of the GHE — ones sufficiently simple that they can convince even Latour and the “Slayers” that it exists and doesn’t violate the 2nd law of thermodynamics — I’ve become more and more convinced that the correct metric for global warming/cooling/climate is enthalpy, not temperature. Yes, enthalpy is LOCALLY directly proportional to temperature, mostly, except for that pesky but highly significant latent heat from water vapor, that sucks energy in and doles it out at constant temperature all of the time and over more than 70% of the globe (all open water surfaces, all wet surface soils).
This is a critical contributor to the nonlinear temperature variation observed by this study. When the ground cools to the dew point of the atmosphere above it, dew forms. The dew rejects its latent heat into the ground at that point, blocking further reduction of the surface temperature even as the latent heat continues to be radiated away. The end result is that a patch of ground maintains a warmer nighttime temperature while it sheds the daytime energy that was absorbed as latent heat rather than reradiated during the day. Latent heat is also extremely important in energy transport, both vertically through the GHG column and laterally from e.g. the oceans to the land. The continental US may have set records for mean warmth (or not; it is not my purpose to argue, but parts of it were pretty hot compared to the historical record, if not the hottest) but the Gulf of Mexico was actually abnormally cool.
I have a theory that this means that this year was unusually efficient at LOSING energy. A hot, dry midwest and southwest radiates heat away from the Earth very efficiently.
A 2-3 C cooler Gulf going into autumn — which looks about where it is — represents a much lower base temperature from which winter cooling will proceed. The southwest and midwest will very likely cool quickly as fall proceeds, but will then be sandwiched between a normally cold arctic and a Gulf that cools to a significantly lower winter temperature than usual. This means that as a heat reservoir it will have less heat to give up to maintain the warmth of the southEAST US.
The North Atlantic also appears to be warm, sure, but it is at least 2C cooler in the tropics than it has been, on average, over the last decade or so (26-29C compared to 30-31C) at this time of year. We’re already experiencing the first signs that fall weather patterns are kicking in in NC: well off of peak summertime temperatures, nights under 21C, less hazy skies. Hurricanes, recall, are HEAT ENGINES and hence actually cool the ocean from which they draw their energy, so even though they aren’t very common this year, Isaac actually left a visibly cooler SST wake of 27C water that is showing no signs of rapidly warming back up to 28-29C. Isaac, too, is currently dumping enormous amounts of latent heat/enthalpy into space (radiated away at the top of the troposphere), and it is going to have a strong net cooling effect on the ground it rains on, both by reflecting sunlight and by dumping liquid water that will sooner or later (partially) evaporate from the soils on which it falls.
These could be signs that the NAO is thinking about changing, as they are definitely a change from the predominant pattern of warming in the Gulf and Atlantic north of the ITCZ. It would be most interesting to know what the Hadley cells are doing this year.
rgb
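The point that (Tmax + Tmin)/2 is only a first-order estimator of a nonlinear diurnal curve is easy to demonstrate numerically. Below is a minimal sketch with a made-up asymmetric diurnal cycle (a 24-hour fundamental plus a small second harmonic; the amplitudes are assumptions, not fitted to any station). The min/max average lands 0.5 C below the densely sampled mean, the same order as the 10th-90th percentile range reported in the post.

```python
import numpy as np

# Synthetic, asymmetric diurnal cycle: a 24 h fundamental plus a small
# second harmonic. The shape and amplitudes are illustrative assumptions.
t = np.linspace(0.0, 24.0, 24 * 60, endpoint=False)   # hours, 1-minute steps
x = 2 * np.pi * (t - 9.0) / 24.0
temp = 15.0 + 8.0 * np.sin(x) + 0.5 * np.cos(2 * x)   # deg C

t_mean = temp.mean()                         # "true" mean from dense sampling
t_minmax = (temp.max() + temp.min()) / 2.0   # traditional min/max estimate

print(f"Tmean   = {t_mean:.2f} C")              # 15.00
print(f"Tminmax = {t_minmax:.2f} C")            # 14.50
print(f"bias    = {t_minmax - t_mean:+.2f} C")  # -0.50
```

A pure sinusoid would give zero bias; the entire error here comes from the second harmonic, which shifts the extremes without shifting the mean.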
Very good work, Lance, indeed.
I have been working on this problem for almost 4 years now. I fully agree with your findings regarding the bias caused by the method (or "algorithm", as I call it) compared to the "true" mean, which you obtain only from hourly, quarter-hourly, or even finer measurements. I found the same by using the papers of Aguilar et al. 2003, "Guidance on Metadata and Homogenization", and Alisow 1956, "Lehrbuch der Klimatologie". There you find measured biases due to the algorithm used, for some stations in Puchberg, Austria (a 9-year run) and for a number of places in Russia.
One should keep in mind that worldwide about 100 different algorithms (see Griffith 1997, "Some problems of regionality in application of climate change") have been in use, and various ones are still in use; the max/min algorithm is only one of them, although a rather widely used one.
I have written a paper about this "algorithm error" which has been accepted by E&E but is not published yet.
But I don't agree with your conclusion (1) that the trend might not be influenced by this error because, in constructing the anomaly, it will cancel out (implicitly stated that way).
Because you'll see:
1. A variation of this error over the full year; it is different every single month. And if you use monthly averages and deduct a 30-year station normal from them, the result will have the same fluctuation as before. Possibly the magnitude is somewhat smaller.
2. And if the station normal error itself is different for various reasons (which it always is, because no station in the world has remained unchanged over this time, or before or after), the anomaly propagates this error in part or in full.
3. And if you mix all the various anomalies of the world, as is done, to obtain a global mean anomaly, then the various algorithms introduce an averaged systematic error of at least about ±0.2 K (cautiously estimated).
Finally I agree completely with RCS who wrote:
It is easy to show that decimation of a signal by averaging an epoch, say a day and treating that average as a representation of the signal in question is incorrect and introduces errors into the derived time series.
Please excuse my poor English, but I hope I made myself understood.
regards Michael
Sorry, the cite was missed:
(1) cite “Although the errors documented here are true errors (that is, they cannot be adjusted by time of observation or other adjustments), nonetheless it would not be expected that they have much of a direct effect on trends. After all, if one station is consistently overestimated across the years, it will have the same trend as if the values were replaced by the true values.”
Michael
Frank K. says: August 31, 2012 at 5:28 am
Frank, I was explaining why T_minmax and T_mean, in my view, have no role in GCMs. That was the proposition that Lance was raising. Can you point to where GCMs rely on those quantities?
Well, it looks like a lot of work for statisticians. I thought I had read somewhere that (Tmax+Tmin)/2 was NOT a good measure of diurnal variations, so it gave the wrong average. So the first thing I looked for was a graph of ‘diurnal variations’: any place, any day. Why not July 4th, 2000 at 1600 Pennsylvania Avenue? Well, it’s not there; nor anywhere else, nor any other day.
Maybe fig 9; it’s got at least 23 hours, but evidently doesn’t reflect any one day.
But I’ll take it as gospel that, any place, any day, the diurnal temperature over 24 hours looks like fig 9.
Well, obviously, fig 9 is NOT a 24-hour sinusoid. It looks like it has at least a second-harmonic component, and likely higher ones, both odd and even.
So the diurnal temperature at any place is clearly not a signal band-limited to one cycle per 24 hours; it has once-per-12-hour, once-per-8-hour, and higher frequency components.
So TWO samples per 24 hours (Tmax and Tmin) is just barely enough for a perfect sinusoid, but the second-harmonic component is undersampled by a factor of two, and the higher frequencies worse still. The calculated average, which is the zero-frequency component, is therefore already corrupted by aliasing noise; so your min/max stuff is not even real data, and all this statistics is being applied to essentially meaningless random numbers that do not comprise real data.
You have to comply with the Nyquist criterion FIRST, before you even have data to statisticate on.
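The aliasing argument above can be made concrete. The sketch below uses illustrative harmonic amplitudes and uniform two-per-day sampling (rather than actual min/max capture) to isolate the effect: with 2- and 3-cycle/day components present, a two-sample daily mean depends on *when* you sample, because the higher harmonics alias onto the zero frequency, while eight samples per day already satisfy Nyquist for this signal and recover the mean exactly.

```python
import numpy as np

# A diurnal signal that is NOT band-limited to 1 cycle/day: it also has
# 2- and 3-cycle/day harmonics (amplitudes are illustrative assumptions).
def temp(hours):
    x = 2 * np.pi * hours / 24.0
    return 15 + 8 * np.sin(x) + 1.5 * np.cos(2 * x) + 0.8 * np.sin(3 * x)

dense = temp(np.linspace(0, 24, 1440, endpoint=False))
true_mean = dense.mean()   # effectively exact: 15.00

# Two uniform samples per day: the estimate depends on the sampling phase,
# because the 2 cycle/day component aliases onto the zero frequency.
for start in (0.0, 3.0, 6.0):
    two = temp(np.array([start, start + 12.0]))
    print(f"start={start:4.1f}h  2-sample mean = {two.mean():.2f}  (true {true_mean:.2f})")

# Eight samples per day already satisfy Nyquist for a 3 cycle/day signal:
eight = temp(np.arange(0, 24, 3.0))
print(f"8-sample mean = {eight.mean():.2f}")   # 15.00
```

The two-sample means range from 13.5 to 16.5 depending on phase; real Tmin/Tmax sampling is phase-locked to the extremes, which is what produces a *systematic* rather than random version of this error.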
Well so much for temporal sampling; what about spatial sampling.
Wow, I really love those four Alaska weather station temperature samples; how come so many? Didn’t Briffa find that he only needed one Yamal Charlie Brown Christmas tree to determine the global climate?
What is it they say ; “A noisy noise annoys an oyster !” Saw a mathematical proof of that once.
I’d like to see this level of analysis applied to the Manhattan Telephone directory, to see what one can learn about the history of Telephone numbers.
Ryan says:
August 31, 2012 at 6:52 am
I don’t think UHI would make much difference to these measurements, since the UHI-sourced heat would be equally blocked by CO2 (or any other greenhouse gas) both during daytime and nighttime. Thus looking for the delta would, in theory, cancel out the impact of UHI when looking for the AGW impact.
=======================================================================
Of course I could be wrong, but I thought it had already been shown that areas with UHI had higher (warmer) Tmin temps and that was one reason for the alleged warming trend.
Lance, I’ve spent a lot of time looking at DTR from the NCDC’s Global Summary of Days data, and find over a large number of stations it is very consistent. If you follow the link in my name, you can see what I’ve done.
Nick Stokes says:
August 31, 2012 at 3:01 am
“No, of course they don’t make them up. They solve the equations of fluid flow with heat transport, along with half-hourly insolation and IR heat components (and latent heat release, vertical transport modelled etc). All of this on a regular grid, roughly 100 km and half hour intervals. There’s simply no role for any kind of daily mean, and station readings are not used anywhere.”
and Nick Stokes says:
August 31, 2012 at 10:10 am
Frank K. says: August 31, 2012 at 5:28 am
Frank, I was explaining why T_minmax and T_mean, in my view, have no role in GCMs. That was the proposition that Lance was raising. Can you point to where GCMs rely on those quantities?
Nick, this is really maddening. Are you saying the GCM modelers never check their work? At some point, don’t they have to compare their results with measured temperatures? At least historical retrodictions. And that means they compare to temperatures determined by Tmin & Tmax. Which means they are tuning their models to a set of somewhat biased numbers, which are not just random but may vary by latitude and RH.
Michael Limburg says:
August 31, 2012 at 9:18 am
“But I don't agree with your conclusion (1) that the trend might not be influenced by this error because, in constructing the anomaly, it will cancel out (implicitly stated that way).
Because you'll see:
1. A variation of this error over the full year; it is different every single month. And if you use monthly averages and deduct a 30-year station normal from them, the result will have the same fluctuation as before. Possibly the magnitude is somewhat smaller.
2. And if the station normal error itself is different for various reasons (which it always is, because no station in the world has remained unchanged over this time, or before or after), the anomaly propagates this error in part or in full.
3. And if you mix all the various anomalies of the world, as is done, to obtain a global mean anomaly, then the various algorithms introduce an averaged systematic error of at least about ±0.2 K (cautiously estimated).”
Michael, thanks for your comment. There may in fact be some effect on the trend; I just haven’t been able to think of an obvious way such an effect could arise. As I alluded to, the USCRN timescale is far too short to create a 30-year station normal and resulting anomalies. Also, as Mosher pointed out, the latitude effect on the error is constant for a given station, so no effect on the trend there. But on the other hand, the model presented explained only about 30% of the variation, so there are obviously other effects on deltaT that could conceivably affect a trend. You have apparently done the work and found an effect, which is very interesting. I look forward to your paper in Energy and Environment. If you care to share a preprint with me, which I would keep confidential, my email is lwallace73@gmail.com.
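One part of Michael's objection can be checked with a toy calculation. In the sketch below (all numbers assumed: a 0.2 C/decade trend, a 0.4 C seasonally varying bias), a bias that repeats identically every year is absorbed entirely into the 30-year monthly normals and leaves the anomaly trend untouched, whereas a bias that drifts, e.g. a station change halfway through the record, does not cancel and inflates the trend.

```python
import numpy as np

rng = np.random.default_rng(0)
years, months = 30, 12
n = years * months
t = np.arange(n)   # month index

# "True" monthly means: seasonal cycle + 0.2 C/decade trend + weather noise
true = 10 + 8 * np.cos(2 * np.pi * t / 12) + 0.02 * (t / 12) + rng.normal(0, 0.3, n)

# A Tminmax bias that varies by month but repeats identically every year
# (the 0.4 C amplitude and sinusoidal shape are assumptions for illustration)
bias = 0.4 * np.sin(2 * np.pi * t / 12)

def trend_per_decade(x):
    """OLS trend of monthly anomalies, in C per decade."""
    clim = x.reshape(years, months).mean(axis=0)   # 30-year monthly normals
    anom = x - np.tile(clim, years)
    return np.polyfit(t / 120.0, anom, 1)[0]       # 120 months = 1 decade

print(trend_per_decade(true))         # ~0.2 C/decade
print(trend_per_decade(true + bias))  # identical: periodic bias cancels

# A bias that drifts, e.g. a 0.3 C station change halfway through the
# record, does NOT cancel in the anomalies:
step = 0.3 * (t >= n // 2)
print(trend_per_decade(true + bias + step))   # inflated trend
```

This supports both halves of the exchange: a strictly repeating seasonal bias cancels in anomalies, while any change in the bias over time (Michael's point 2) propagates into the trend.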
My 12:28 am above was a late-night blunder. Here is a correction post.
Here is MT-Lewistown with data split at 2010 July. (blue 200807 to 201007, brown after 201007)
http://i46.tinypic.com/2cf48eh.png
(Mean, stddev, mean std err) before (-0.62, 0.21, 0.05)
(Mean, stddev, mean std err) after (-0.69, 0.33, 0.07) (unchanged)
These two means are not significantly different.
Lance Wallace says: August 31, 2012 at 12:38 pm
“At some point, don’t they have to compare their results with measured temperatures?”
Not for the ongoing functioning of the GCM. But yes, for evaluation, which anyone with access to the output can do afterwards. And for that you can test any temperature statistic you have observations to match. T_max, T_mean, T_noon, anything. Because you have a stream of half-hourly GCM values. They would usually choose a regional measure rather than individual station values.
Of course, some notion of observed temperature is used to initialize GCMs. But it is acknowledged that we can never get an initial state completely. That’s why GCMs are run for a settling in period before the time stretch to be modelled.
OK, I’m glad we agree that at some point the GCM output is matched against temperatures, as well as other climatic variables. Next question: does tuning occur? I imagine if the model doesn’t do well at retrodicting the past, it won’t be favored–so it might have to be tweaked. Now suppose it is trying to retrodict temperatures in say, a landlocked Northern country or perhaps a tropical one, where the Tminmax method may have been used. If the latitude effect applies, it will be trying to match temperatures that are biased low in one country and high in the other. Could this be a problem? Of course, the effect may be trivial, I don’t know.
What I would like to see is a much larger database of global stations with both Tminmax and Truemean (perhaps every 3 or 6 hours could substitute for continuous measures here) measured over a long period. Then a model could test whether the latitude or RH effects noted for the USCRN database could be confirmed, or other significant parameters be identified. Then if the model was good enough to predict the true temperature field (a tall order), the GCMs could be tested against that rather than the suboptimal Tminmax field.
My understanding of the history is that GCMs did not match rising temps with increasing CO2, and that Hansen added a climate sensitivity multiplier to “fix” them.
I love it!! N = 3, ROFL, and the error is already larger than the purported global warming signal.
Between this study and the comments, the reality of how difficult it is to actually MEASURE temperature is laid bare and it’s about time.
I have only 2 points to add to the discussion:
1) daily temperature “population” is NOT normally distributed and therefore parametric statistics have no business being used in temperature studies like these.
2) according to any and all sampling standards, 3 samples is a pitifully small fraction of the number ACTUALLY NEEDED to estimate the really TRUE mean daily temperature, given the variance.
However, this study is vastly superior to the usual anecdotal attempts to describe the behavior of land temperatures. So thanks for that, very much !
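Point 2 above can be made quantitative with the standard sample-size formula n = (z·σ/E)². A minimal sketch, with an assumed within-day standard deviation of 5 C (and ignoring the fact that closely spaced temperature readings are autocorrelated, which only makes the requirement worse, since the effective number of independent samples is smaller):

```python
import math

def required_samples(sigma, margin, z=1.96):
    """Samples needed to estimate a mean to within `margin` at the
    confidence implied by z (1.96 ~ 95%), assuming independent samples:
    n = (z * sigma / margin)^2, rounded up."""
    return math.ceil((z * sigma / margin) ** 2)

print(required_samples(sigma=5.0, margin=0.5))   # 385
print(required_samples(sigma=5.0, margin=2.0))   # 25
```

Even with a generous 2 C margin of error, 2-3 samples per day falls an order of magnitude short under these assumptions.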
By the way, the photo says the station name is LEWISTON, MT; there is no town named Lewistowne here, and the site is not especially close to Lewistown, MT either. A closer town would be Hobson, MT, but who knows why any name is selected. It is just like temperature, perhaps: completely to somewhat arbitrary.
Geoff Sherrington says:
August 31, 2012 at 1:12 am
1. I know of a case where for decades the newspapers were given values different from those recorded for official use. How do you measure climate change from 2 starting points a degree apart?
You cannot use anomalies by combining records of two different stations unless you understand why those two stations give different readings and adjust accordingly. I don’t know how using the true averages would change that.
2. Estimation of other parameters from temperature, such as W/m^2. Not so easy with anomalies, is it?
Actually, it is impossible, either with true average temperatures or with anomalies. If you calculate the emission of the area surrounding a thermometer from the average temperature of that thermometer, you are calculating it wrong. What you really need to get the correct value is to integrate the emission function (dependent on T^4) over time. And doing so would only give you the emission of a tiny area surrounding the thermometer; it tells you nothing about the area 1 km away, as you don’t know the actual temperature there. So all you can do is guess. Given that it is an impossible problem, getting marginally closer does little to help. Error bars will still be huge.
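The T^4 nonlinearity mentioned above is easy to quantify. A minimal sketch (the 288 K mean and 15 K diurnal swing are assumed round numbers): applying the Stefan-Boltzmann law to the mean temperature systematically understates the mean emitted flux, because the hot hours contribute disproportionately.

```python
import numpy as np

SIGMA = 5.670374419e-8   # Stefan-Boltzmann constant, W m^-2 K^-4

# Hourly temperatures for a day with a 15 K swing around 288 K (illustrative)
hours = np.arange(24)
T = 288 + 7.5 * np.sin(2 * np.pi * (hours - 9) / 24)   # kelvin

flux_true = (SIGMA * T**4).mean()      # mean emitted flux, computed hourly
flux_from_mean = SIGMA * T.mean()**4   # flux implied by the mean temperature

print(f"true mean flux      : {flux_true:.2f} W/m^2")
print(f"flux from mean temp : {flux_from_mean:.2f} W/m^2")
print(f"error               : {flux_true - flux_from_mean:+.2f} W/m^2")  # ~ +0.79
```

The ~0.8 W/m^2 discrepancy from a modest diurnal swing at a single point illustrates why averaging temperatures before converting to W/m^2 is not a harmless shortcut.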
3. Variations in technique from year to year. The change from liquid-in-glass to thermistor/thermocouple devices, the change from one reading per day to one a minute or more frequently, satellite data: each of these methods requires CARE with temperature measurement, because each can give a different ‘anomaly’.
Correct, and again, I don’t know how using the true average, which is calculated from the same thermometers, would make it any better. If because of changing technology you start to get Tmax/Tmin 1C higher, that will also be true regarding average temperatures.
4. Spike rejection – can arise and be filtered different ways, once one determines what is the ‘actual temperature’ and how to measure it.
A spike that appears in average temperature also appears as a spike in the anomalies, so again there is no advantage about this.
5. It is simply sloppy science to use non-standard units like “degrees F minus 30-year reference period” unless there are compelling reasons to vary from K. We’ve moved away from strange units like “furlongs per fortnight” for velocity, and most of the world now uses K or C, not F. Where do you stand?
I’ve been living in Spain all my life. Here nobody uses degrees F. But if we did, it would still make no difference regarding the benefits of using true averages vs Tmax/Tmin.
Regards.
Lance writes “Now suppose it is trying to retrodict temperatures in say, a landlocked Northern country or perhaps a tropical one, where the Tminmax method may have been used. If the latitude effect applies, it will be trying to match temperatures that are biased low in one country and high in the other. Could this be a problem? ”
It’s well known that GCMs do poorly at modelling regional temperatures, but the modellers don’t think this is a problem. When their temperatures are all summed up to a global level, the result is within cooee of the same summed-up, error-prone, smeared-out values from the “real world’s past”, and that’s enough for them to argue their model has value.
Yes, they’re tuned. They don’t think this is a problem either.
An important contribution, but only for 12 years, only for the US, and only for the beginning of the processes employed to assess temperature change.
Well, duh, Tmax and Tmin will always be affected by the presence of water, i.e. the availability of humidity. This is what happens when you use a unit of measure that only indicates part of the heat. The bulk of the atmosphere’s heat is carried by humidity, and thus any change in specific humidity will greatly change Tmax and Tmin. The proper measurement is enthalpy, NOT temperature. Qmax and Qmin give the TOTAL heat and thus the true mean of the energy content of the air. Hence any temperature-anomaly-based claim of warming or cooling is meaningless.
The idea that a 10 degree rise in Pt. Barrow, Alaska is equal to a 10 degree rise in Tampa, Florida is an absurdity that anyone with an education in the physical sciences should catch. Maybe Hansen and Schmidt should take a basic course in thermodynamics to learn what the proper units of measure are for any given observation-based claim.
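The Barrow-vs-Tampa point can be illustrated with moist enthalpy, h = cp·T + L·q. The sketch below uses the Tetens saturation vapor pressure formula (applied with its over-water constants even below freezing, a simplification) and assumed conditions (80% RH, sea-level pressure): a 10 C rise in warm, humid air represents several times more energy per kilogram than the same rise in cold, dry air, because the saturation humidity grows roughly exponentially with temperature.

```python
import math

CP_D = 1005.0    # J kg^-1 K^-1, specific heat of dry air
L_V = 2.501e6    # J kg^-1, latent heat of vaporization (0 C value)
P = 101325.0     # Pa, assumed surface pressure

def q_from_rh(T_c, rh):
    """Specific humidity (kg/kg) at T_c (deg C) and relative humidity rh,
    via the Tetens saturation vapor pressure approximation."""
    e = rh * 610.78 * math.exp(17.27 * T_c / (T_c + 237.3))
    return 0.622 * e / (P - 0.378 * e)

def moist_enthalpy(T_c, rh):
    """Moist enthalpy per kg, J/kg: sensible plus latent part."""
    return CP_D * (T_c + 273.15) + L_V * q_from_rh(T_c, rh)

# Enthalpy change for a 10 C rise at 80% RH, cold vs warm start (illustrative)
dh_barrow = moist_enthalpy(-20.0, 0.8) - moist_enthalpy(-30.0, 0.8)
dh_tampa = moist_enthalpy(35.0, 0.8) - moist_enthalpy(25.0, 0.8)
print(f"10 C rise at -30 C: {dh_barrow/1e3:.1f} kJ/kg")   # ~11 kJ/kg
print(f"10 C rise at +25 C: {dh_tampa/1e3:.1f} kJ/kg")    # ~41 kJ/kg
```

Under these assumptions the warm, humid case carries nearly four times the energy change of the cold, dry one for the same temperature rise, which is the substance of the objection to averaging raw degrees across climates.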
The error due to the min/max filter on a daily basis would be greater than that shown here, as it is filtered by a monthly smooth. The monthly averaging filters out some of the high frequencies that are part of the daily min/max filter noise, thus reducing variance; the reduction in variance is not due to a cancellation of opposite errors. I would submit that smoothing over calendar months adds additional noise, because the smoothing window is not constant (it varies between 28, 29, 30 and 31 days), the smoothing is done at irregular intervals (i.e. between 28 and 31 days), and the output is quantized at the same monthly intervals, as opposed to, for example, a straight 30-day smooth, where the window is 30 days wide and is shifted one day at a time. Also, the “10-90 percentile range” contains only (roughly) 80% of the distribution, as opposed to showing a range containing 95% of it (2 sigmas, assuming a normal probability distribution).
The analysis of minmax bias is by far the most interesting part of this study. Showing the importance of relative humidity provides statistical support to those who have argued that moist enthalpy is a better measure than dry bulb temperature. It is hard to overemphasize the importance of this result.
My only criticism involves the following statement:
I would strongly disagree with the notion that there would be no effect on trends. Theoretically, the statement might be substantially accurate, but only if you use exactly the same stations during the entire century or so period of interest. As E.M. Smith has documented extensively, there have been numerous changes over the past century or so in the stations used to calculate trends, including the “great dying of thermometers” quite recently. The biases found in this study would seem to be significant enough that such stations changes should be assumed to significantly impact trend estimates unless proven otherwise.
Phil,
I agree completely with you. That’s exactly what you will find if you go deeper into this subject of station treatment and condition during the time the stations are in service.
dScott.
You are also right. What Hansen and all the other statisticians overlooked is that temperature is a property of matter and, additionally, an intensive quantity. They prefer to calculate mere patterns within a bunch of figures, forgetting totally that these figures are just local indicators of physical processes which are not in thermodynamic equilibrium. So they do not represent temperatures at all.
From an engineer’s perspective, it is pretty clear that what is going on here is statistical fraud.
What they should be aiming to present is the average temperature related to the presence of the greenhouse gas CO2.
What they are actually measuring is the average temperature related to the presence of greenhouse gases, clouds and wind.
In the UK a summer’s day without cloud could reach about 36 Celsius, but would only be about 14 Celsius on a cloudy day, and even less with a strong wind from the north. In other words, they are looking for a tiny CO2-related signal in the presence of a huge noise signal, but then presenting the information as if there were no error in the CO2-related signal due to the presence of so much noise. The only attempt they make to reduce the noise is averaging, but averaging is a simple filtering process that retains much of the input noise at the filter output. This summer in the UK is a fantastic example of this: it has been one of the cloudiest on record. Almost every day has been cloudy, so the output of the simple averaging filter will merely record the fact that this was a cloudy year with correspondingly low daytime temperatures. CO2 didn’t really play a part in the UK summer; it was dominated by cloud.
Put the real error bars into their charts due to this noise and you would see that their attempt to measure CO2-related trends in the presence of such a huge amount of noise is doomed to failure. They don’t put in such error bars. They don’t even put in error bars related to the inaccuracies of using mercury thermometers in Stevenson screens for measuring absolute temperatures. Their error bars are simply related to the way rounding errors in the calculations can occur.
I don’t condemn Team AGW for doing this kind of thing so much as the scientists who handle statistical analysis of data every day and aren’t prepared to stand up and denounce this kind of schoolboy error.