Errors in Estimating Mean Temperature – Part II

Guest post by Lance Wallace

Last week (Aug 30), Anthony Watts posted my analysis of the errors in estimating true mean temperatures that arise from the (Tmin+Tmax)/2 approach used at thousands of temperature measuring stations worldwide: http://wattsupwiththat.com/2012/08/30/errors-in-estimating-temperatures-using-the-average-of-tmax-and-tmin-analysis-of-the-uscrn-temperature-stations/ . The errors were determined using the 125 stations in NOAA’s recently established US Climate Reference Network (USCRN) of very high-quality temperature measuring stations. Some highlights of the findings were:

A majority of the sites had biases that were consistent throughout the years and across all seasons of the year.

The 10-90% range was about -0.5 C to + 0.5 C. (Negative values indicate underestimates of the true temperature due to using the Tminmax approach.)

Two parameters, latitude and relative humidity, were fairly powerful influences on the direction and magnitude of the bias, explaining about 30% of the observed variance in the monthly averages. Geographic influences were also strong, with coastal sites typically overestimating true temperature and continental sites underestimating it.

A better approach than the Tminmax method may be to use observations at fixed hours, which would eliminate the problem of the time of observation of the temperature extremes. One common algorithm is to use measurements at 6 AM, noon, 6 PM, and midnight. We will describe this method as 6121824. A second approach, used in Germany for many years, was to use measurements at 7 AM, 2 PM, and 9 PM (71421) or in some cases to give double weight to the 9 PM measurement (7142121). (h/t to Michael Limburg for the information on the German algorithm.)
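The four algorithms can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the analysis: `hourly` is assumed to be a list of 24 temperatures indexed by local hour (0 = midnight), the "true" mean is taken as the plain average of those 24 values, Tmin and Tmax are taken from the hourly values rather than from the instrument's recorded extremes, and "midnight" (hour 24) is mapped to hour 0 of the same day. The function names are hypothetical.

```python
from statistics import mean


def tminmax(hourly):
    """Traditional estimate: (Tmin + Tmax) / 2, here taken from hourly values."""
    return (min(hourly) + max(hourly)) / 2


def fixed_hours(hourly, hours, weights=None):
    """Weighted average of readings at fixed local hours (hour 24 wraps to 0)."""
    weights = weights or [1] * len(hours)
    total = sum(w * hourly[h % 24] for h, w in zip(hours, weights))
    return total / sum(weights)


def errors(hourly):
    """Error of each method: estimate minus the mean of all 24 hourly values.

    Negative values are underestimates, matching the sign convention of the post.
    """
    true_mean = mean(hourly)
    estimates = {
        "Tminmax": tminmax(hourly),
        "6121824": fixed_hours(hourly, [6, 12, 18, 24]),
        "71421": fixed_hours(hourly, [7, 14, 21]),
        "7142121": fixed_hours(hourly, [7, 14, 21], [1, 1, 2]),  # 9 PM counted twice
    }
    return {name: value - true_mean for name, value in estimates.items()}
```

Calling `errors(day)` on one day's 24 hourly values returns a dictionary with one signed error per method; averaging those by station and month gives quantities comparable in spirit to the monthly errors discussed below.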

How do these methods compare to the Tminmax method? Do they lower the error? Would latitude and RH and geographic conditions continue to be predictors of their errors, or would other parameters be important? In this Part II of this study, we attempt to answer these questions, using again the USCRN as a high-quality test-bed.

In Part I, two datasets from the NOAA site ftp://ftp.ncdc.noaa.gov/pub/data/uscrn/products/ were employed—the daily and monthly datasets, with about 360,000 station-days and 12,000 station-months, respectively. For our purposes here, we also need the hourly dataset, with about 8.2 million records. This was obtained (again with help from the NOAA database manager Scott Embler) on Sept. 4, 2012. These three datasets are all available from me at lwallace73@gmail.com.

The hourly dataset provides the maximum, minimum, and mean temperature for each hour. Also recorded are precipitation (mm), solar radiation flux (W/m2), and RH (%). Since the RH measurements were added several years after the start of the network, only about a third of the hours (2.8 million), days (120,000) and months (3600) have RH values.

A first look confirms that 3 or 4 measurements per day are better than two (Figure 1). The entire range of the 6121824 method almost fits into the interquartile range of the Tminmax method (-0.2 to +0.2C).

Figure 1. Errors in using four algorithms to estimate true mean temperature. Values are monthly averages across all months of service for 125 stations in the USCRN.

A measure of the monthly error is provided by the distribution of the absolute errors (Table 1). The Tminmax method is clearly inferior by this measure, having about 3 times the mean absolute error of the 6121824 method. The two German methods are intermediate at close to 0.2 C.

Table 1. Distribution of absolute errors for 4 algorithms.

Variable      Valid N   Mean Abserror   Std.Dev.   25%ile   Median   75%ile   Maximum
ABSMINMAX     11109     0.32            0.27       0.10     0.20     0.50     1.9
ABS6121824    11333     0.11            0.10       0.04     0.08     0.15     1.3
ABS71421      11333     0.19            0.17       0.07     0.15     0.26     1.3
ABS7142121    11333     0.20            0.17       0.08     0.16     0.28     1.3
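For readers who want to produce a summary of this kind from their own error series, the following is a minimal sketch. It assumes a plain list of signed monthly errors (estimate minus true mean) for one method; the function name is hypothetical, and the percentile convention of Python's `statistics.quantiles` may differ slightly from whatever Statistica used for the table.

```python
from statistics import mean, median, quantiles, stdev


def abs_error_summary(signed_errors):
    """Summarise absolute errors in the same columns as Table 1."""
    abs_errors = [abs(e) for e in signed_errors]
    q25, _, q75 = quantiles(abs_errors, n=4)  # quartile cut points
    return {
        "valid_n": len(abs_errors),
        "mean": mean(abs_errors),
        "std_dev": stdev(abs_errors),
        "q25": q25,
        "median": median(abs_errors),
        "q75": q75,
        "maximum": max(abs_errors),
    }
```

Note that the summary is taken over absolute values, so a method whose errors cancel on average (mean signed error near zero) can still show a large mean absolute error.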

We can compare methods across years or across seasons for any given site. The error for a given method was often about the same across all four seasons, although the bias across methods could be quite large (Figure 2). Errors across years were even more stable, but again with large biases across the methods (Figures 3 & 4).

Figure 2. Errors (C) by season at Durham NC. DeltaT is the error from the Tminmax method.

Figure 3. Errors (C) by year at Gadsden AL.

Figure 4. Errors (C) at Newton GA.

In Part I, I provided a map of the error from the Tminmax method. That map (updated to include 4 new Alaskan stations and an additional month of August 2012) is reproduced here as Figure 5. The strong geographic effect is immediately apparent, with the overestimates (blue) located along the Pacific Coast and in the Deep South, while underestimates (red) are in the higher and drier western half of the continent as well as along the very northernmost tier of states from Maine to Washington.

Figure 5. DeltaT at 121 USCRN stations. Colors are quartiles. Red: -0.67 to -0.20 C. Gold: -0.20 to -0.02 C. Green: -0.02 to +0.21 C. Blue: +0.21 to +1.35 C.

The next three Figures (Figures 6-8) map the three algorithms discussed in this post: the 4-point 6121824 algorithm as in the ISH network and the 3-point algorithms used in Germany (71421 and 7142121). The 4-point algorithm (Figure 6) does not have the well-demarcated geographic clusters of the Tminmax method. There is a cluster of overestimates (blue) in the farmland of the Middle West from North Dakota to Texas. Just to the west of them, however, there is a set of strong underestimates (red) from Montana through Colorado to New Mexico.

Figure 6. DeltaT 6121824 at 125 USCRN stations. Colors are quartiles. Red: -0.24 to -0.07 C. Gold: -0.07 to -0.02 C. Green: -0.02 to +0.02 C. Blue: +0.02 to +0.25 C.

The 3-point scale 71421 (Figure 7) shows something of a latitude-longitude dependence, with the strongest overestimates (blue) mostly in the North and West. This algorithm is rather heavily biased toward positive errors, so that even the red dots include some overestimates along with strong underestimates.

Figure 7. DeltaT 71421 at 125 USCRN stations. Colors are quartiles. Red: -0.21 to +0.08 C. Gold: +0.08 to +0.13 C. Green: +0.13 to +0.20 C. Blue: +0.20 to +0.45 C.

The errors in method 7142121 with the doubled 9 PM measurement (Figure 8) have a cluster of strong underestimates (red) in the Deep South and the Atlantic Coast from Florida to the Carolinas. Here the green dots are the best estimates (between -0.04 and +0.03) but they are spread throughout most of the country with the exception of the Deep South.

Figure 8. DeltaT 7142121 at 125 USCRN stations. Colors are quartiles. Red: -0.41 to -0.17 C. Gold: -0.17 to -0.04 C. Green: -0.04 to +0.03 C. Blue: +0.03 to +0.43 C.

As in Part I, a multiple regression was performed to detect which measured parameters might have an effect on the error associated with a given method. There are 6 available parameters: latitude, longitude, elevation, precipitation, solar radiation, and RH. Since some of these may be collinear, it is important to determine whether they are sufficiently related to cause errors in the multiple regression. The best way to do this is probably the test devised in Belsley, Kuh, and Welsch (1980). Their test has been incorporated in the SAS PROC REG/COLLIN. Not knowing SAS, and not having access to someone who does, I tried factor analysis, as implemented in Statistica v11 (Table 2). Two variables with heavy loadings on Factor 1 were solar radiation and RH (with opposite signs). Factor 2 was dominated by latitude and longitude. Since the earlier regressions showed that RH was generally stronger than solar radiation, and latitude stronger than longitude, the two weaker variables were left out of some regressions to see if the sign and magnitude of the other parameters would change markedly. However, little change was noticed. Therefore the multiple regressions presented here include all 6 variables.

Table 2. Factor analysis of 6 explanatory variables.

Variable     Factor 1   Factor 2
LONGITUDE     0.11       0.86
LATITUDE      0.29      -0.78
ELEVATION    -0.58      -0.39
PRECIP        0.50       0.10
SOLRAD       -0.73       0.30
RHMEAN        0.86       0.09

Following are the multiple regressions on the errors due to the four different methods (Tables 3-6). Table 3 is a slightly modified (addition of stations in Alaska and Hawaii plus one additional month) version of the corresponding table for the Tminmax errors in Part I. As in Part I, the updated regression shows about equal effects of latitude and RH, accounting for nearly all of the 29% R2 value. The maps in Part I and Figure 5 above showed the powerful effect of the coastal stations (overestimates) and the Western Continental stations (underestimates).

[Table 3. Multiple regression on the Tminmax errors. Table image not reproduced.]

The six measured parameters had far less effect on the method using four equally-spaced hourly measurements (Table 4). In this case, solar radiation had the strongest effect, with an increase in sunlight leading to larger underestimates. However, the R2 was very small, at about 6%.

[Table 4. Multiple regression on the 6121824 errors. Table image not reproduced.]

The strongest effect on the 71421 method was latitude, and it was opposite in direction to the effect noted for the Tminmax method (Table 5). Overall, however, the R2 was similarly low, at about 7%.

[Table 5. Multiple regression on the 71421 errors. Table image not reproduced.]

The method that double-counted the 9 PM measurement was similar in one respect to the Tminmax results, with the two main parameters being RH and latitude, both close to equal in explanatory power (t values of +18 and -18.6) (Table 6). However, the signs of each were in the opposite direction from the Tminmax results. The R2 value of 17% was quite a bit higher than for the other two methods using specified hours, but less than for Tminmax.

[Table 6. Multiple regression on the 7142121 errors. Table image not reproduced.]

Discussion

A clear finding from this analysis is that the multipoint methods are better than the Tminmax method at estimating the true temperature. In fact, a nice result is that the 2-point method (Minmax) had an absolute average error of about 0.3 C, the 3-point method error was around 0.2 C, and the 4-point method brought the mean absolute error down to 0.1 C. However, this is averaged across all 125 sites and 11,000 months, so errors can be quite a bit larger for individual sites as shown in some of the figures above.

Although one could guess, based on the multiple regression results, that higher-latitude sites using the Tminmax method would be more likely to be underestimating the true temperature, and coastal sites to be overestimating, still the R2 was small enough (29%) that only a ground-truth investigation could be relied on to determine the precise sign and magnitude of the error. It might also be argued that even determining the size of the error at the present time would not tell us what the error was historically. However, the great stability across the years shown by these sites suggests that in fact a proper measurement today could predict past performance for many stations that had stable locations and measurement methods.

With respect to the 4-point method, a second network, the Integrated Surface Hourly (ISH) network uses this approach: ftp://ftp.ncdc.noaa.gov/pub/data/inventories/ISH-HISTORY.TXT. This network apparently has some thousands of stations, although I am not sure how many are of the same high quality as the USCRN stations. Based on these findings, one could expect that the errors at this network are considerably smaller than the errors at stations using the Tminmax method. However, the multiple regressions here give little indication of what direction and magnitude the error might have at any individual station. Therefore, at this network as well as at other stations, a proper series of measurements over several years would be needed to give an idea of the magnitude and direction of the error at a given station. However, if the basic finding here that such errors are highly repeatable over the years applies to many or most stations, then such an approach could go far to indicating the actual temperature field of the world even at much earlier times when only a limited set of measurements (subject to errors of the magnitude and direction found here) were available.

Conclusions

None of the temperature measurement algorithms were without error. The traditional Tminmax method was the worst, with a mean absolute error of about 0.3C. The 3-point German methods (71421 and 7142121) had a mean absolute error of about 0.2C, and the 4-point (6121824) method a mean absolute error of about 0.1C. The Tminmax method is strongly affected by latitude and RH, whereas the other methods are less affected by these variables.

All methods were very stable from year to year for most sites. There was somewhat more variation by season, but a majority of methods had the same sign (i.e. consistently over- or under-estimated the true mean temperature) for all four seasons and for all years.

For a given site, it was difficult to predict which of the three fixed-time methods might over- or under-estimate the true mean temperature. Even the Tminmax method performed better than all the others for some sites.

The use of the USCRN network to study these methods was advantageous in offering one of the highest-quality networks available. However, it is of course limited to the US, with a limited latitude and longitude range. Of interest would be to extend this analysis to a more globally representative group of stations. For example, might it be true that stations at polar and tropical latitudes would confirm the latitude dependence found here, and perhaps even show higher underestimates? Would coastal sites around the world continue to over-estimate true mean temperatures? How would poor-quality sites, such as those affected by urban heat island (UHI) or other effects, depend on these parameters compared to high-quality sites? If large areas around the globe were found to be over- or under-estimating true mean temperatures due to the algorithm employed, how might it affect global climate models (GCMs), which may be tuned to slightly wrong historical temperature fields?

61 thoughts on “Errors in Estimating Mean Temperature – Part II”

  1. kadaka (KD Knoebel) says:
    September 12, 2012 at 10:42 pm
    “Table 1 too large, running into “facebook” section, can’t be read.”

    The obscured numbers are :
    75th percentile: 0.50, 0.15, 0.26, 0.28.
    maximum: 1.9, 1.3, 1.3, 1.3.
    So all methods resulted in at least one station having an absolute mean error >1 C.

  2. I am a bit confused. Surely with the advent of temp. recorders temperatures are now measured every second or so and a mean is calculated for the day, also indicating what was the max and what was the min?

    • Yes but historically observers would go out once a day and record just the Tmax and Tmin values in the past 24 hours. So most of our historical global record for the past 100 years is subject to the sort of errors discussed in the post. Even contemporary stations (NOAA-ISH) may use the four observations per day method. I suspect stations in less developed areas may not have access to electronic recorders. Someone more knowledgeable than I might comment on what fraction of stations globally still use a small number of observations to estimate mean daily temperature. And I seem to recall a reference to a meteorological organization (WMO?) continuing to endorse the Tmax Tmin approach in order to maintain historical continuity. Perhaps a reader can confirm or refute that impression.

  3. I’m not sure why you think accurately computing the mean temperature for the day is important. This number really doesn’t tell us anything useful. Surely what matters is whether temperature has changed. Replicating the measurements taken in the past from today’s more complete data would seem a simpler approach.

  4. I wonder what the influence of time zones is. In Part I (Fig. 9), you show the temperature as a function of (local) time. If you want to estimate the true mean (~ area under graph) for the sinusoidal shape, it makes sense that you have a number of “best” times that add up to a good estimate. However, if two nearby stations are on different sides of a time zone boundary, they will have horizontally shifted graphs, and they will use different “best” times. I wonder what happens if you correct for longitude and then sample at 3 or 4 points. I think that strip of reds and blues close to each other (through the center of the USA) in Figs. 6-8 might suddenly turn out to be less different from each other.

    • Frank de Jong says:
      September 13, 2012 at 12:14 am
      “I wonder what the influence of time zones is.”

      An important point that I had not considered. It should be possible to consider this as an additional parameter among those that might affect the error.

  5. By correcting for longitude, I mean sampling each station at “true local time”, i.e. a continuous time related to its longitude. One station close to a time zone boundary would then sample its “6 AM” point at, say, 05:35 AM, whilst the one on the other side would sample at 6:25 AM.
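The longitude correction described in this comment can be sketched as follows. This is an illustration only: it uses mean solar time (4 minutes of clock time per degree of longitude from the zone's standard meridian), ignores the equation of time and daylight saving, and takes longitudes as degrees east (so US longitudes are negative). The function names and the example station longitudes are hypothetical.

```python
def clock_minutes_for_solar_time(solar_hour, longitude_deg, zone_meridian_deg):
    """Clock time (minutes after midnight) at which local mean solar time equals solar_hour.

    A station east of its zone meridian reaches a given solar time earlier on the clock.
    """
    offset_min = 4.0 * (longitude_deg - zone_meridian_deg)  # 4 min per degree
    return solar_hour * 60.0 - offset_min


def as_hhmm(minutes):
    """Format minutes after midnight as HH:MM."""
    minutes = round(minutes) % (24 * 60)
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


# Two hypothetical stations in a zone whose standard meridian is 90 W:
east_edge = as_hhmm(clock_minutes_for_solar_time(6, -83.75, -90.0))
west_edge = as_hhmm(clock_minutes_for_solar_time(6, -96.25, -90.0))
```

With these assumed longitudes, the station 6.25 degrees east of the meridian would sample its "6 AM" point 25 minutes early by the clock and the station 6.25 degrees west of it 25 minutes late, which is the kind of spread the comment describes.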

  6. HenryP, “You’re a bit confused” is, for me, an understatement. If I understand this (and I probably don’t), it’s not the thermometers that get it wrong; it’s the method of calculating the mean that is the problem, and which mean temperature figure you are using for comparison. If I want to know how much the global temperature is actually increasing or decreasing and I use the 1961 – 1990 mean I will get one answer, and if I use 1951 – 1980 I get a different answer. Australia has just had a bumper snow season, the best in ten years, and in Melbourne for the last 2 years it hasn’t stopped raining. Our dams were below 30% full and now they’re at almost 80% and still rising. If this is Global Warming bring it on.

  7. Ian H says:
    September 13, 2012 at 12:03 am
    “I’m not sure why you think accurately computing the mean temperature for the day is important.”

    I think it’s always important to get the best estimates you can of anything you are studying. In particular, we are trying to understand the climate, which is driven by parameters such as temperature, among others. An erroneous measurement of one parameter will have ramifications on our calculations. Global climate models are matched against historical measurements of temperature. If those are erroneous, the models will be wrongly tuned. Of course, it may be that these errors are small enough that it will not matter. But it might. Why not try to find the best estimates for everything going into your models?

  8. Richard111 says:
    September 12, 2012 at 11:33 pm
    “What is the relationship between temperature and energy in a fluid? Does it have any real meaning?”

    A deep question, much discussed over the years, in its relationship to climate. I think energy is the important parameter here. Almost the entire energy flux consists of radiation from the sun and radiation into space from the earth. I don’t think we can measure earth’s radiation very well, so don’t know if the flux is balanced or tipped one way or the other. From that point of view, the temperature is a secondary quantity. In fact a “global temperature” has about as much meaning as an average telephone number. A value of 15C, as an average between 30C at the tropics and 0C at the poles, would have very different energy implications from the same value of 15C, as an average between 15C at the equator and 15C at the poles. However, temperature does enter into physics and weather and climate calculations, as in the perfect gas law and of course the Stefan-Boltzmann Law, so from that respect it seems important to measure it as accurately as possible.

  9. While looking into the Time of Observation (TOBs) issue, I found this apparently-forgotten NCDC directory:
    ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/daily/

    It has a curious README file about the USHCN (save it before it is disappeared):
    ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/daily/README

    A real head-scratcher is in “NCDC QA Checks and Adjustments”:

    5. Temperature data from stations that took readings during the morning over some period have been checked for any date shifting resulting from observers assigning readings to the calendar day of occurrence (the previous day in the case of maximum temperature) rather than the observation day. Such readings were switched back to the day of observance as part of the manual QA checks on the HCN/D data.

    Am I reading that right? The min-max measurements are for the previous 24 hours. Let’s say an observer takes readings in the early morning. They know the maximum must have occurred during the previous day, before midnight.

    But if the observer tries to record that maximum on the day it actually happened, the day of occurrence, as part of “quality control” that maximum is transferred to the day when the readings were taken in the morning, the day of observation, which is not the day when the maximum happened.

    That sounds bad enough, as the adjustment creates error. The quality checker knows that measurement belongs to a certain day, but assigns it to the following day?

    But how is this done in practice? “Nearby” stations, perhaps hundreds of miles away, have a maximum of about a certain number recorded for this day, this station has that measurement on the previous day, but since it’s impossible for “nearby” stations to have different maximums on different days, they all have to have matching maximums on matching days, it’s obvious the number is on the wrong day so it gets moved?

    And there’s still the issue of being sure which actual day the maximum actually occurred on. If a researcher wants the maximum of a certain day, like one where there was a notable tornado, and that region has a lot of morning observers recording maximums on day of observation instead of day of occurrence, what reading will he find when checking the daily records?

    Plus I’ve known there to be freakish weather, when a different front is moving in or something similar, when the minimum was during the day with the maximum occurring during the night. How do temperature records built from 24 hour min/max readings show that?

    Am I understanding what it says about that quality check right?

    The previous quality check also doesn’t sound that great.

    4. Checks were implemented to ensure that maximum temperatures were never less than minimum temperatures on the day of occurrence, the preceding day, and the following day. Conversely, checks were performed to ensure that minimum temperatures were never greater than maximum temperatures on the day of occurrence, the preceding day, and the following day.

    Okay, who else besides me has seen weather swings where the maximum of one day can be less than the minimum of the previous day, usually in the spring and fall, or the minimum can exceed the previous day’s maximum? NCDC seems to believe this is impossible.

  10. @Richard, “What is the relationship between temperature and energy in a fluid? Does it have any real meaning?” The relationship is in the specific heat capacity of the fluid, the intrinsic and most real of these phenomena. Temperature is a measure of thermal energy flow.

    Perhaps if you look to the meaning of field in math and physics the meanings may become clear.

  11. Why not let Nature do the averaging? Put your thermometer a few feet underground and you’ll get a nice smooth temperature that rises and falls with the seasons. This would only go wrong in areas with lots of geothermal activity or radioactive minerals, but those are easy to spot.

  12. An unknown (to me) with the local time values in USCRN is daylight saving time. From the USCRN description of “local time” I would assume that it does represent true local time, with daylight saving adjustment for the summer. Would this affect any fixed time of day readings?

    It should be visible from the hourly data, I just haven’t had time to see if the changeover days have 23 or 25 hourly readings rather than a fixed 24.

    Another thought I’ve had in this area is to calculate a daily average based upon sunrise to sunrise.

  13. As a database guy I always think in those terms, and I did a few back-of-the-envelope calculations regarding temperature records. If one took a reading every fifteen minutes you’d have 96 records/day for a site. That would give you a nice sample of the day to produce a temperature curve — if the low of 50 only occurs for one sample and then the curve rapidly climbs and stays around for much of the day, was the average temperature really 61? Surely knowing the shape of the temperature curve would be useful information. Statistical analyses can’t possibly reproduce a measurement like that.

    So, if one had 1500 weather stations producing 96 records/day you would get 144,000 records/day for the entire system. You don’t need but a station ID, timestamp and a temperature measurement in a record, so you’ve got an integer, timestamp, and float for a record (at its most basic). Even allowing for large storage types you’re only talking about 80 bytes/record, or about 11M/day of storage. A year comes to about 4G. A cheap hard drive these days has a terabyte on it, so one of those could store about 250 years worth of data, give or take.

    Wouldn’t that be a nice dataset to look at?
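The back-of-the-envelope figures in this comment can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the comment (1500 stations, one reading every fifteen minutes, 80 bytes per record) and assumes decimal units (1 TB = 10^12 bytes) and a 365-day year; the variable names are illustrative.

```python
stations = 1500
records_per_station_per_day = 24 * 60 // 15  # one reading every 15 minutes
bytes_per_record = 80

daily_records = stations * records_per_station_per_day
daily_bytes = daily_records * bytes_per_record
yearly_bytes = daily_bytes * 365

terabyte = 10**12  # decimal terabyte, as drive makers count it
years_per_terabyte = terabyte / yearly_bytes

print(f"{daily_records:,} records/day, {daily_bytes / 1e6:.1f} MB/day, "
      f"{yearly_bytes / 1e9:.1f} GB/year, {years_per_terabyte:.0f} years per TB")
```

Under these assumptions the daily and yearly totals agree with the comment's "about 11M/day" and "about 4G" per year, and a one-terabyte drive holds a little under 240 years of records, in line with the comment's "about 250 years ... give or take".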

  14. kadaka wrote
    “Checks were implemented to ensure that maximum temperatures were never less than minimum temperatures on the day of occurrence, the preceding day, and the following day.”
    Okay, who else besides me has seen weather swings where the maximum of one day can be less than the minimum of the previous day, usually in the spring and fall, or the minimum can exceed the previous day’s maximum?
    ———————
    Since I have all the USCRN data in a SQL database, I can write a query across that data set and see if that situation ever occurred in the last 8-10 years it covers (in the US).

    I have already checked that the min is always <= max for a day for every day with valid data, but hadn't thought about the need to check against nearby days.
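The cross-day check discussed here could look something like the sketch below. It is written in Python rather than SQL and is not the commenter's query; it assumes one station's daily records are available as a date-sorted list of `(date, tmin, tmax)` tuples, and the function name is hypothetical.

```python
from datetime import date, timedelta


def cross_day_inversions(daily):
    """Find days whose max is below the previous day's min, or whose min is above
    the previous day's max. `daily` is a date-sorted list of (date, tmin, tmax)."""
    hits = []
    for (d0, min0, max0), (d1, min1, max1) in zip(daily, daily[1:]):
        if d1 - d0 != timedelta(days=1):
            continue  # skip gaps in the record; only compare consecutive days
        if max1 < min0:
            hits.append((d1, "max below previous day's min"))
        elif min1 > max0:
            hits.append((d1, "min above previous day's max"))
    return hits
```

Any non-empty result would be a real-world counterexample to the assumption behind quality check 4 quoted above.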

  15. Since we have a lot of existing data on the Tmaxmin basis, an extension of the research that would be useful is how does a trend derived from Tmaxmin data compare to a trend derived from the other methods.

  16. It’s interesting how many people seem to miss this.

    Understanding the measurement capability and errors in the system is very important, particularly when the issues being debated are of the same magnitude. If we’re ever to have good data that can tell us what is actually happening, these are the kinds of things that need determining. This is very good work. My compliments to Mr. Wallace.

    Gerry Parker

  17. JamesS,

    “Wouldn’t that be a nice dataset to look at?”

    In my experience too, more data is always better, particularly in a noisy environment. Some will say Shannon’s Theorem tells us we only need so many samples to find the min/max signal, but that is without regard to higher frequency system noise (intermittent jet exhaust) or, as you say, what about higher frequency components that distort the signal into non-sinusoidal shapes. Is there something to be learned from that?

    Gerry Parker

  18. Be gentle with me if I am being stupid ( this is not my area of expertise) but……
    T max/min are the result of energy flow into and out of the climate system.
    The primary source of that energy is the sun and the max amount of energy from it at any one time ( at the equator) is a function of sin(a)·E, where a is the angle above the horizon and E is solar emission.
    The amount absorbed by the earth is moderated by such things as clouds, so the actual amount absorbed may be less than max.
    Only knowing what Tmin/max are does not tell us the total amount of radiation received because we have no intermediate points on the curve that would allow us to calculate it….so that curve could be parabolic, hyperbolic or straight line. Only if we record a time series can we get any meaningful result.
    So it seems to me that using only Tmin/max to obtain a mean global temp is an exercise in futility.

  19. Frank de Jong says:
    September 13, 2012 at 12:14 am
    “I wonder what the influence of time zones is.”
    =============
    And daylight savings time – which is not universally applied in all time zones. Figure 6 for example. It looks like someone forgot that farmers don’t bother with daylight savings time.

  20. What is the theory that supports TMinMax as a valid way to sample a signal?

    The Nyquist theorem is the method used universally in every other field. Why has it been overlooked by climate science?

    ABSMINMAX is not consistent with the Nyquist theorem and should be expected to generate errors.

    ABS6121824, ABS71421, ABS7142121 are all consistent with the Nyquist theorem. It is no surprise they generate lower errors.

    It is amazing that climate science has constructed a global temperature series while ignoring sampling theory. It is almost as though climate scientists don’t talk to scientists in other fields.

    http://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem

  21. The problem with ABSMINMAX is that you do not know at what time the min and max occurs, so you cannot accurately reconstruct the signal, even though you are sampling at 2x the frequency.

    ABS6121824, ABS71421, ABS7142121 on the other hand are sampling on fixed intervals, similar to what is done with digital music. This allows you to reconstruct the temperature curve – fill in the missing time periods – as is done with digital music – to reconstruct the temperature (music) signal accurately.

    Once you have reconstructed the analog signal from the digital samples, you can then calculate accurately the average power (temperature) in the signal. It is interesting that in audio processing, RMS is considered a more accurate measure of power than the arithmetic mean, to account for differences in signal shape.

  22. Of course the ultimate best average would be if at some time in the future, the temperature feeds from monitoring stations are continuous, and the average could then be integrated over continuous time. That still doesn’t mean there would not be inaccurate monitoring though.

  23. John Phillips:

    Of course the ultimate best average would be if at some time in the future, the temperature feeds from monitoring stations are continuous, and the average could then be integrated over continuous time. That still doesn’t mean there would not be inaccurate monitoring though.

    That would be best, but we have all this existing data to interpret.

  24. Chris says:
    September 13, 2012 at 7:08 am
    “Since we have a lot of existing data on the Tmaxmin basis, an extension of the research that would be useful is how does a trend derived from Tmaxmin data compare to a trend derived from the other methods.”

    I mentioned in Part I that it is hard for me to think of a way in which these errors, whether consistent or random, would affect the trend over a long enough period. Although the 4-10-year coverage of the USCRN is too short to obtain trends that are not obscured by weather oscillations, it is possible to check on whether the error (DeltaT) varies over that time scale. If it does, the trend from the Tminmax and other methods would differ from the true trend.

    For each of the 125 USCRN stations, I regressed the trend of the error for the 4 algorithms against time, using the 11,000-month dataset. For the Tminmax method, slopes varied from -0.2 C to +0.05 C per year. However, only one slope achieved significance (p<0.05). For the other three methods, again precisely 1 of 125 slopes achieved significance. (It was a different station for each method.) At least for this high-quality group of stations, the errors associated with the measurement algorithms very seldom (<1%) give trends significantly different from those associated with the true means.
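A per-station trend check of this kind can be sketched with an ordinary least-squares fit of the monthly error against time. The code below is an illustration, not the procedure actually run in Statistica: it assumes `times` is in years and `errors` is the matching series of monthly errors (C) for one station and one method, and it returns the slope with its t statistic, leaving the conversion to a p-value (which needs the t distribution with n - 2 degrees of freedom) to a statistics package. The function name is hypothetical.

```python
from math import sqrt


def slope_and_t(times, errors):
    """OLS slope of error vs. time (C per year if times are in years) and its t statistic."""
    n = len(times)
    if n < 3:
        raise ValueError("need at least 3 points to estimate a slope and its error")
    mean_t = sum(times) / n
    mean_e = sum(errors) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (e - mean_e) for t, e in zip(times, errors))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_t
    sse = sum((e - (intercept + slope * t)) ** 2 for t, e in zip(times, errors))
    std_err = sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    if std_err == 0:
        t_stat = 0.0 if slope == 0 else float("inf")
    else:
        t_stat = slope / std_err
    return slope, t_stat
```

For series of a few dozen months or more, |t| above roughly 2 corresponds to p < 0.05 (two-sided). With 125 stations tested per method, about six "significant" slopes would be expected by chance alone at that level, so finding only one per method is consistent with no real drift in the error.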

  25. WRT UHI.

    If you have a site with UHI and you sample at midnight you will be injecting the largest UHI bias into your record.
    A few charts showing bias as a function of the hour.

    http://www.ep2.org.uk/climate/wp6/Climatology_desc.htm

    a few more

    http://www.ship.edu/uploadedFiles/Ship/Geo-ESS/Graduate/Exams/pompeii_answer_100218.pdf

    http://pubs.giss.nasa.gov/docs/2008/2008_Gaffin_etal.pdf

    http://www.springerimages.com/Images/RSS/1-10.1007_s00704-010-0310-y-4

    The other point, of course, is that folks are not looking for the “True” temperature average.
    The global “temperature” average is an index, not a physical measure. After all, they are not averaging all air temps; they are averaging air temps over land with water temps in the ocean.
    That has been recognized by Hansen in his earliest writings. They are not intended to capture the “true” temperature. They are an index created to track a “change” over a historical period.

    It would matter if the bias changed over time, as that would affect the trend.

  26. Lance Wallace:

    Thanks for the response.

    (I changed my name because I saw another “Chris” in other comments around here)

  27. Lance Wallace says:

    http://wattsupwiththat.com/2012/09/12/errors-in-estimating-mean-temperature-part-ii/#comment-1077030

    Henry says
    you said:yes: that is the short of it, which I knew. So with the advent of temp. recorders, since the beginning of the seventies, it becomes difficult to compare anything with the past where we had to rely on people doing tests at certain times and record the results. You can actually see that there are jumps in some of the records around that time.

    Henry@askwhyitisso
    According to my own dataset (and now also Hadcrut3, apparently) we have dropped by about 0.2 or 0.1 degree C, globally, since 2000. Cooling will still get worse, I am afraid.

    http://blogs.24.com/henryp/2012/04/23/global-cooling-is-here/

  28. climatebeagle says:
    September 13, 2012 at 6:45 am
    “An unknown (to me) with the local time values in USCRN is daylight saving time, from the USCRN description of “local time” I would assume that it does represent true local time, with daylight saving adjustment for the summer.”

    The USCRN README.txt describes the local time variable as “local standard time.”

  29. When I suggested, on this site two years ago, that modern instruments that I had seen in a marine laboratory were producing a daily, continuous Tmean rather than just Tmin and Tmax, Steven Mosher took me to task and pointed out that Tmean = (Tmax +Tmin)/2 was very accurate. He even provided me with a link to a site for which the readings showed that this was true. He also pointed out that we should still use Tmean = (Tmax +Tmin)/2, since it provides continuity with readings from less sophisticated instruments.

    Nevertheless another Blogger on this site, as I recall from Australia, some months later posted a pretty good reason for not using Tmean = (Tmax +Tmin)/2.

    Since then I have been waiting to learn Mr. Mosher’s response. I started posting this before the response came. As usual it seems very reasonable but I still concur with John Phillips: “Of course the ultimate best average would be if at some time in the future, the temperature feeds from monitoring stations are continuous, and the average could then be integrated over continuous time”.

    Until that is done we are only calculating the anomalies of (Tmax +Tmin)/2 and not the average temperature anomalies which remain unknown. (Tmax +Tmin)/2 may be the best proxy that we currently have but why not try for a better one since it is technically feasible? Or are climatologists afraid that the results may, for some reason or other, not be to their liking?

  30. Solomon.

    If you want a historical record you have to use (Tmin+Tmax)/2.

    There is a bias; that is well known in the literature. The issue has always been and will always be “is there a bias in the trend?” Some will bias hot, some will bias cold, but unless the bias changes over time, it will not impact the trend. I’ve said that over and over again.

    That said, folks should read this

    http://www1.ncdc.noaa.gov/pub/data/uscrn/publications/annual_reports/FY11_USCRN_Annual_Report.pdf

    a couple interesting tests are underway.

    1. The very FIRST microsite bias test to put real numbers on CRN1-5.
    2. a calibration of LST

  31. steven mosher says:
    September 13, 2012 at 10:44 am
    “WRT UHI.”

    Thank you, Steven, for the references showing clearly for NYC, Hong Kong and other places that UHI is most evident at night, peaking around midnight. So if Tmin occurs at night or very early morning it will be more affected by UHI than Tmax occurring in the afternoon. However, sometimes Tmin and Tmax occur at other times, so the correction for UHI would have to take TOB into account, as you know better than anyone. I did have an Appendix showing the number of times (out of 344,000 station-days) that Tmin and Tmax occurred for each hour of the day. Although generally giving the usual peak times of about 5-6 AM for Tmin and 2-3 PM for Tmax, there was a secondary peak around midnight due to weather systems. This Appendix was left off the post by accident, but I can make it available if there is any interest. lwallace73@gmail.com.

    It has occurred to me that, since the USCRN has the true T as well as Tmin and Tmax for each day, the TOB correction presently used could be tested against this dataset. The dataset seems substantial enough now (11,000 station-months, at least 4 years of data for 121 of the 125 stations) to get an indication of the error rate of the present TOB algorithm. I seem to recall something like a 6% estimate of the error rate in a journal article recently. Another independent estimate of the error using these high-quality stations would be of interest. I can’t do this myself since I don’t have the algorithm available (I understand it is pretty complicated).
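
The tally of the hour at which Tmin and Tmax occur, described in the comment above, could be built along these lines. This is a hedged sketch, not the Appendix itself: the two day shapes are synthetic, the function names `extreme_hours` and `tally` are hypothetical, and hourly values stand in for the 5-minute extremes the USCRN actually records.

```python
import math
from collections import Counter

def extreme_hours(hourly_temps):
    """Return (hour of minimum, hour of maximum) for 24 values indexed 0..23."""
    hour_min = min(range(24), key=lambda h: hourly_temps[h])
    hour_max = max(range(24), key=lambda h: hourly_temps[h])
    return hour_min, hour_max

def tally(days):
    """Count how often each hour of the day holds that day's Tmin and Tmax."""
    tmin_hours, tmax_hours = Counter(), Counter()
    for temps in days:
        h_min, h_max = extreme_hours(temps)
        tmin_hours[h_min] += 1
        tmax_hours[h_max] += 1
    return tmin_hours, tmax_hours

# Synthetic "ordinary" day: a smooth curve peaking at 15:00 local time.
ordinary = [10 + 5 * math.cos(2 * math.pi * (h - 15) / 24) for h in range(24)]
# Synthetic "frontal passage" day: temperature falls steadily all day, so the
# maximum lands at midnight (hour 0) and the minimum at hour 23.
front = [20 - 0.5 * h for h in range(24)]

days = [ordinary, ordinary, ordinary, front]
tmin_hours, tmax_hours = tally(days)
print("Tmin hour counts:", sorted(tmin_hours.items()))
print("Tmax hour counts:", sorted(tmax_hours.items()))
```

Days dominated by a passing weather system are what would produce a secondary peak near midnight in such a tally.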

  32. NOTE also the comparisons between CRN and USHCN V2

    Also interesting to note is that one station is going to be decommissioned because a road is being built within 30 meters of the station. The plan is to run two stations to get data on the effect of building a road close to the station. Predictions???

  33. Steven Mosher says:

    September 13, 2012 at 12:31 pm

    NOTE also the comparisons between CRN and USHCN V2

    Also interesting to note is that one station is going to be decommissioned because a road is being built within 30 meters of the station. The plan is to run two stations to get data on the effect of building a road close to the station. Predictions???
    =================
    Where is the new road positioned, in relation to the prevailing winds ?

  34. It looks like a series of systematic errors is endemic in average temperature measurements. Moreover, when one moves away from a high-quality network, the nature of that systematic error becomes less well understood. At the very least, the nature of the systematic errors needs to be established for all locations and stated any time the value(s) are used.

  35. Philip Bradley says:
    September 13, 2012 at 2:50 am
    “An article I wrote using Australian data shows how using tmin+tmax/2 compared to fixed time temperature measurements over-estimates the amount of warming over the last 60 years by 43%.”

    Philip, these data were originally analyzed by Jonathan Lowe at his Gust of Hot Air blog. They are apparently based on averaging anomalies across multiple (21?) Australian stations, but individual station data are not presented. Lowe takes each three-hour period of measurement and creates a separate slope for the anomalies. Then these different slopes appear to be averaged in some way and compared to the Tminmax estimate for the period of observation. This step of averaging the slopes appears questionable to me. I would prefer to see the deltaT for each station for each year. The statement that the Tminmax method has a higher slope than the fixed-time measurements is equivalent to a statement that the average deltaT is changing over time in a given direction. Why would that be? The high-quality USCRN sites indicate that the bias is very stable across time. Until I can see the results for individual stations, I can’t accept the conclusion.

  36. The real issue with a (tmin+tmax)/2 dataset is the sensitivity of tmin to early morning insolation changes. Increase early morning insolation and you get an earlier and higher tmin. This effect is largest in mid to high latitudes in winter (although obviously not so far poleward that the sun doesn’t rise), because this is when the longest period of post-dawn cooling occurs. These are places and times of year where we find the most warming in tmin.

  37. Lance Wallace says:
    September 13, 2012 at 2:12 pm
    Lowe takes each three-hour period of measurement and creates a separate slope for the anomalies. Then these different slopes appear to be averaged in some way and compared to the Tminmax estimate for the period of observation. This step of averaging the slopes appears questionable to me.

    My understanding is the 24 hour average is just the arithmetic average of the 3 hourly measurements, shown as a trend over the 60 years of data. I’m sure Jonathan (a professional statistician) will be happy to explain further.

  38. Why not dispense with Tmin? It is only of weather curiosity value. Marry the old Tmax’s (even if they are time based and may not accurately record the real Tmax) with new ‘continuous’ Tmax readings, and we may start to approach a measure of solar insolation worthy of climate studies.

    Leave the old Tmax/min recordings for the local radio station to broadcast.

  39. Lance, what you are doing amounts to applying some textbook statistication, to sets of numbers which inherently have NO relationship to each other or to anything else. You might as well be applying your algorithms to the third digit in the licence plates of cars passing you on the street.

    Yes, you can apply the rules of statistics to absolutely ANY set of numbers at all, and obtain the common stats from them. Mean / median / rms value / standard deviation / upper / lower quartiles / whatever / make up your own.

    And those results are quite valid, if you didn’t make arithmetic mistakes.

    But it is a far cry from being able to say those results have any meaning whatsoever. For that to happen, you actually have to have real data, from a system that has a causal reason to produce those numbers that were observed.

    That means that your “data” or its sampling process, has to conform to the information theory rules for sampled data systems, most notably the Nyquist Sampling Theorem.

    If you don’t conform to that rule for proper sampling regimens, then what you have gathered is NOT signal; it is commonly referred to as “noise”.

    And no amount of statistics can turn it into data. The central limit theorem can’t buy you a reprieve.

    I get so tired of reading, and listening to, debates about this or that statistical process. It may be fun maths; but it isn’t related to anything, if it is corrupted by aliasing noise due to Nyquist violation.

    And any twice per day Temperature sampling process, whether min-max or anything else, already violates the Nyquist sampling rate by at least a factor of two, and that means you cannot recover even the average of any signal that might be buried in the noise. The daily 24 hour Temperature cycle would have to be purely sinusoidal, with no second or third harmonic component, in order to get even the correct daily average; and I know of no physical reason why the diurnal temperature curve would be a pure sinusoid.

    So all of these unemployed “climate scientists” may be doing “science” with their taxpayer grants; the question is; The science of what ??

    If the first lesson in climate 101 is not the general theory of sampled data systems, then any subsequent lessons will likely add nothing to the student’s knowledge.
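
The aliasing argument in the comment above can be illustrated with a toy diurnal curve. This is a sketch under stated assumptions: the curve, its amplitudes and its phases are made up, and the function names are hypothetical. Note also that real Tmin/Tmax readings are not fixed-time samples, so this models only the fixed-hour case.

```python
import math

def diurnal(hour, mean=15.0, a1=5.0, a2=1.5, peak1=15.0, peak2=4.0):
    """Toy temperature curve: daily mean plus first and second harmonics."""
    w = 2 * math.pi / 24  # angular frequency of the 24-hour cycle
    return (mean
            + a1 * math.cos(w * (hour - peak1))
            + a2 * math.cos(2 * w * (hour - peak2)))

def sample_mean(hours):
    """Arithmetic mean of the curve sampled at the given local hours."""
    return sum(diurnal(h) for h in hours) / len(hours)

# Dense, evenly spaced sampling (once a minute) recovers the true daily mean.
true_mean = sample_mean([m / 60 for m in range(24 * 60)])
# Two samples 12 hours apart: the first harmonic cancels, but the second
# harmonic has the same value at both times and folds into the average.
two = sample_mean([6, 18])
# Four samples 6 hours apart: both harmonics cancel in this toy curve
# (a fourth harmonic, if present, would alias in the same way).
four = sample_mean([0, 6, 12, 18])

print(f"true mean    : {true_mean:.3f}")
print(f"2 samples/day: {two:.3f} (error {two - true_mean:+.3f})")
print(f"4 samples/day: {four:.3f} (error {four - true_mean:+.3f})")
```

With these made-up numbers the two-sample average is off by 0.75 C while the four-sample average is exact; the size of the error for real stations depends on their actual harmonic content.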

    • Steven Mosher says:
      September 13, 2012 at 4:36 pm
      “Before you go too much farther you might want to check the accuracy of using hourly
      data from CRN to calculate a TMAX and TMIN.
      Looking at hourly data and pulling out the max and min does NOT give you the min/max.
      you need 5 minute data to do that.”

      Sorry if I did not make myself clear. There are indeed 5-minute data in the USCRN dataset. Each hour has a max and min 5-min value recorded. Then for each day, the highest max and the lowest min are selected as that day’s Tmax and Tmin.

      REPLY: I’m using USCRN data right now and I concur. Lance is right, Mosh is wrong. – Anthony
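
The selection described in the reply above (each hour carries the max and min of its 5-minute values, and the day's Tmax and Tmin are the highest max and lowest min) can be sketched as follows. The tuple layout, the function name `daily_extremes` and the numbers are hypothetical, not the actual USCRN file format.

```python
def daily_extremes(hourly_records):
    """Return (Tmax, Tmin) for one day.

    hourly_records: iterable of (hourly_max, hourly_min) pairs, where each
    pair holds the highest and lowest 5-minute value within that hour.
    """
    records = list(hourly_records)
    t_max = max(h_max for h_max, _ in records)
    t_min = min(h_min for _, h_min in records)
    return t_max, t_min

# Three hypothetical hours of data, shortened from 24 for readability.
records = [(12.4, 11.0), (15.9, 12.3), (14.8, 9.7)]
t_max, t_min = daily_extremes(records)
print(t_max, t_min)  # 15.9 9.7

# For contrast: taking extremes of one point value per hour (the approach
# the quoted objection warns about) gives a narrower range.
point_values = [11.8, 15.1, 10.2]
print(max(point_values), min(point_values))  # 15.1 10.2
```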

  40. Another fine job Mr. Wallace.

    Is there a way to add station age to your analysis? I suspect it would be interesting to see how the variation trends over the life of stations.

  41. george e smith says:
    September 13, 2012 at 4:20 pm
    “Lance, what you are doing amounts to applying some textbook statistication, to sets of numbers which inherently have NO relationship to each other or to anything else.”

    No disagreement here, George, I thought that was the point I was making. We have a very well-sampled nearly continuous stream of temperatures recorded in triplicate using platinum-resistance thermometers traceable to NIST. And then we have various attempts to estimate that daily true mean using 2, 3, or 4 measurements per day. With a great set of 125 stations of very high quality we can then determine the errors due to these limited measurement approaches. We can find the size, direction, seasonal cycles, annual trend, and (to a degree) the main causes of the errors. It may then be possible to apply the gain in our knowledge to estimate the errors at some thousands of other global stations. What’s not to like?
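
The four estimates being compared against the true mean can be written out as a short sketch. Assumptions are loud here: the hourly values are invented, the mean of 24 hourly values stands in for the USCRN true mean, Tmin and Tmax are taken from the hourly values rather than from 5-minute data, and "midnight" is taken as hour 0 of the same day. DeltaT is the estimate minus the true mean, so negative values are underestimates.

```python
def estimates(hourly, t_min, t_max):
    """Daily mean estimates from 24 temperatures indexed by local hour 0..23."""
    return {
        "Tminmax": (t_min + t_max) / 2,
        # 6 AM, noon, 6 PM and midnight, equally weighted.
        "6121824": (hourly[6] + hourly[12] + hourly[18] + hourly[0]) / 4,
        # 7 AM, 2 PM and 9 PM, equally weighted.
        "71421": (hourly[7] + hourly[14] + hourly[21]) / 3,
        # Same hours, with the 9 PM reading counted twice.
        "7142121": (hourly[7] + hourly[14] + 2 * hourly[21]) / 4,
    }

# One hypothetical day of hourly temperatures in C (hour 0 first).
hourly = [12, 11, 10, 9, 8, 7, 8, 10, 12, 14, 16, 18,
          19, 20, 21, 21, 20, 18, 16, 15, 14, 13, 13, 12]

true_mean = sum(hourly) / len(hourly)
est = estimates(hourly, min(hourly), max(hourly))
delta_t = {name: value - true_mean for name, value in est.items()}

for name in est:
    print(f"{name:8s} estimate = {est[name]:6.3f}  DeltaT = {delta_t[name]:+.3f}")
```

Repeating this for every station-day and averaging DeltaT by month is what gives the size, direction and seasonal cycle of the error for each method.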

  42. All this temperature math ignores the effect of condensation events. “Averaging” two temperatures mathematically represents combining two or more representative air parcels. The mathematical “average” is purported to accurately calculate the resulting properties of the representative air parcels, after they have been combined under ideal conditions. Ideal conditions would require theoretical adiabatic “container(s)” with perfect rigidity, so that no heat is gained or lost through the walls of the containers and the total volume stays constant.

    However, when one air parcel has a relative humidity and temperature that is above the dew point of the other parcel, a condensation event may occur. Given the requirement of perfect rigidity, the result of a condensation event is the creation of a vacuum, as the water molecules that were previously in vapor form change to liquid form and now occupy much less volume. Since temperature is directly proportional to pressure according to the ideal gas law, the result is a sharp drop in the resultant temperature index (Willis’ thermostat?) when compared to using the arithmetic “average.”

    Theoretical condensation events would occur when “averaging” tropical stations with high-latitude stations, summer temperatures or temperature indices with winter temperatures or indices at the same station, low-altitude humid stations with high-altitude stations, or even within a month at the same station, etc., except perhaps at the driest stations, such as the Gobi Desert.

    In short, the calculation of daily “averages” (regardless of calculation method), monthly “averages,” “anomalies,” and temperature indices, as well as correlations between stations, adjustments of all kinds, trend calculations, statistical properties, etc. all ignore the physics of combining representative air parcels. Specifically, condensation events are assumed away. Relative humidities and dew points are ignored.

    In reality, condensation events are very powerful and power winds as well as storms such as hurricanes and tornadoes. A hurricane needs a constant supply of warm, humid air to feed the condensation engine at the heart of the storm, where continuous condensation takes place creating a vacuum that lowers the pressure within the eye and powers the winds that are trying to fill the vacuum. When the humidity of the air feeding the storm decreases, condensation decreases and the storm weakens.

    Mathematical calculation of the effect of condensation events to obtain a more accurate resulting temperature index or “average” is not trivial. I would submit that part of the “bias” being found in these calculations is the result of ignoring the physics of combining representative air parcels. Changes in weather and in weather patterns would be expected to cause changes in bias that are impossible to predict by using the simplistic mathematical formulas discussed here and in common use in climate analysis and should likewise impact calculated trends.

    Just some food for thought.

  43. Phil says:
    September 13, 2012 at 9:33 pm
    “All this temperature math ignores the effect of condensation events. “Averaging” two temperatures mathematically represents combining two or more representative air parcels. The mathematical “average” is purported to accurately calculate the resulting properties of the representative air parcels, after they have been combined under ideal conditions.”

    Phil–I lost you at the second sentence. Given a monitor measuring temperature at a fixed location, it corresponds to Heraclitus’ river that you can’t step into twice. There is a constant change of air “parcels” passing the device. So the parcels aren’t “combined under ideal conditions”, they aren’t even combined at all, they are just passing through. So we are measuring the temperature of the river of air as it flows past. But the USCRN is doing something else, at least for the last 2-3 years, and that is measuring the relative humidity as well. Do you have some thoughts as to how one could combine these two measurements to get closer to the physics of the situation?

  44. Dinostratus says:
    September 13, 2012 at 6:36 pm
    “Is there a way to add station age to your analysis? I suspect it would be interesting to see how the variation trends over the life of stations.”

    I did not add station age to the multiple regression. However, I did look at the variation of the error over time at each station, finding to my surprise that it was unusually stable.

  45. jrwakefield says:
    September 13, 2012 at 8:13 am
    Though I didn’t go into the same depth of detail, I did show a while ago that TMean is a meaningless number because the hourly average temperature of a day is less than TMean.

    Maybe for your stations that was true, but for about half of the 125 stations in the USCRN the true daily mean was higher than the values estimated by any of the four measurement methods.

  46. polistra says:
    September 13, 2012 at 5:35 am
    “Why not let Nature do the averaging? Put your thermometer a few feet underground and you’ll get a nice smooth temperature that rises and falls with the seasons.”

    Actually the USCRN is in fact measuring soil temperature hourly at all stations. I haven’t looked at the data but perhaps it would be good to analyze.

  47. Tmin and Tmax occurred for each hour of the day. Although generally giving the usual peak times of about 5-6 AM for Tmin and 2-3 PM for Tmax, there was a secondary peak around midnight due to weather systems. This Appendix was left off the post by accident, but I can make it available if there is any interest.

    It’s a pity that data isn’t available to the minute/second. If I am right, there will be a trend to earlier Tmins, but I’m not sure hourly buckets will pick it up.

    Anyway I have sent you an email and I’ll take a look at the data.

  48. “””””…..ferdberple says:

    September 13, 2012 at 8:47 am

    The problem with ABSMINMAX is that you do not know at what time the min and max occurs, so you cannot accurately reconstruct the signal, even though you are sampling at 2x the frequency.

    ABS6121824, ABS71421, ABS7142121 on the other hand are sampling on fixed intervals, similar to what is done with digital music. This allows you to reconstruct the temperature curve – fill in the missing time periods – as is done with digital music – to reconstruct the temperature (music) signal accurately.

    Once you have reconstructed the analog signal from the digital samples, you can then calculate accurately the average power (temperature) in the signal. It is interesting that in audio processing, RMS is considered a more accurate measure of power than the arithmetic mean, to account for differences in signal shape……”””””

    What science fiction have you been reading Ferd.

    POWER is a RATE of doing work or delivering/using energy. It is an INSTANTANEOUS quantity, so there ain’t no such thing as RMS power; no matter what the audio geeks say.

    That is Dr Trenberth’s hangup with the earth global “energy” budget. Energy is counted in Joules.
    Watts per square metre is an areal rate of power flow, also instantaneous; and the TSI is 1362 Watts per square metre. It is not 342 W/m^2, which is quite incapable of bringing earth’s Temperature up to 288 K, whereas we now know that 1362 W/m^2 can bring some earth parts up to around 330 K. It takes at least 390 W/m^2 to get up to 288 K.

    But back to the Nyquist sampling theorem (dunno what Claude Shannon had to do with it): even sampling 3 or 4 times daily doesn’t take account of higher frequency signals such as when clouds come and go. Undersampling by a factor of two folds the frequency spectrum back to zero, creating unremovable aliasing noise at zero frequency, which of course is the much sought after average. Since spectrum-folding aliasing noise is now an in-band noise, it is inherently impossible to remove without removing real signal; and that too will corrupt the average.

    Only Mother Gaia knows what the real Temperature is, because she has a thermometer in every atom or molecule on the planet and reads them continually, so she always gets the right Temperature. Sadly, she is NOT ever going to tell us.

    Anyway, the spatial undersampling, joke, makes the time undersampling a non issue. Was that four or five thermometers they have up in Alaska ?

  49. Philip Bradley says:
    September 14, 2012 at 12:38 am
    “It’s a pity that data isn’t available to the minute/second. If I am right, there will be a trend to earlier Tmins, but I’m not sure hourly buckets will pick it up. Anyway I have sent you an email and I’ll take a look at the data.”

    Philip: (and other readers)
    I have made the hourly, daily, and monthly data available on my public file at Dropbox:

    https://dl.dropbox.com/u/75831381/CRN%20DAILY.csv

    https://dl.dropbox.com/u/75831381/CRN%20MONTHLY.csv

    https://dl.dropbox.com/u/75831381/CRNH0202%20hourly%20Sept%2012%202012.csv

    These data contain everything available from NOAA except for the soil measurements. Also a few variables added by me such as the name of the station, and the errors due to Tminmax.
    To get the variable definitions, you can read the README.txt files for each at ftp://ftp.ncdc.noaa.gov/pub/data/uscrn/products/

  50. Sampling theory does not require that samples be taken at specific times or other intervals (such as space). The crux of the matter is that there must be at least one sample taken in each half cycle of the highest frequency present in the band-limited signal. So random sampling, where the spacing between samples is random, is permissible, but only so long as the spacing is never longer than a half cycle of the highest signal frequency. Random or non-uniform sampling is therefore less efficient than regularly spaced sampling: because the spacing can’t be longer than 1/(2B) but can be shorter, it requires more samples, not fewer.
    Random sampling has been used in sampling Oscilloscopes, as it gives certain advantages.
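
The spacing condition as stated in this comment (no gap between successive samples longer than 1/(2B), where B is the highest frequency present) can be checked with a few lines of Python. This sketch only encodes the comment's rule of thumb, not the full theory of non-uniform sampling; the function name `spacing_ok` and the sample times are hypothetical.

```python
def spacing_ok(sample_times, bandwidth):
    """True if no gap between successive samples exceeds 1 / (2 * bandwidth).

    sample_times: sample instants in days (any order).
    bandwidth: highest frequency present, in cycles per day.
    """
    times = sorted(sample_times)
    largest_gap = max(b - a for a, b in zip(times, times[1:]))
    # Small tolerance guards against floating-point rounding at the limit.
    return largest_gap * 2 * bandwidth <= 1 + 1e-12

# Assume the signal contains nothing above the second daily harmonic
# (B = 2 cycles per day), so the longest allowed gap is 0.25 day = 6 hours.
regular = [0.0, 0.25, 0.5, 0.75, 1.0]   # every 6 hours
irregular = [0.0, 0.2, 0.6, 0.8, 1.0]   # same count, but one 9.6-hour gap

print(spacing_ok(regular, 2))     # True
print(spacing_ok(irregular, 2))   # False
```

The irregular schedule uses the same number of samples as the regular one yet fails the test, which is the sense in which non-uniform sampling needs more samples to cover the same bandwidth.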

    But let’s not get too hung up on the temporal sampling issues; just think how absolutely crummy the spatial sampling on earth is. You have all those thermometers spread over the lower 55 states, and just 4 or 5 for the whole of Alaska; total insanity. I believe that at one time around the turn of the century (19th to 20th) there were just 12 or so sampling sites in the whole of the Arctic; in this case that being everything North of +60 Latitude. That number increased over time to somewhere around 80, and then it subsequently dropped quite a bit; I’m guessing it was related to the collapse of the Soviet Union, and the subsequent (somewhat disastrous) collapse of their science.

    So Hansen thinks that once every 1200 km is perfectly acceptable sampling. Just watching the 6PM SF Bay area News/weather report would suggest that 5-10 km sample spacing wouldn’t be too close. In any case it is a joke to believe that meaningful data is gathered by such sloppiness.
