Guest essay by Clyde Spencer 2017
I recently had a guest editorial published here on the topic of data error and precision. If you missed it, I suggest that you read it before continuing with this article. It will make this one easier to follow, and I won’t feel the need to go back over the fundamentals. What follows is, in part, prompted by some of the comments to the original article. This is a discussion of how the reported, average global temperatures should be interpreted.
Averages can serve several purposes. A common one is to increase accuracy and precision of the determination of some fixed property, such as a physical dimension. This is accomplished by confining all the random error to the process of measurement. Under appropriate circumstances, such as determining the diameter of a ball bearing with a micrometer, multiple readings can provide a more precise average diameter. This is because the random errors in reading the micrometer tend to cancel out, and the precision of the average is given by the Standard Error of the Mean, which is inversely related to the square root of the number of measurements.
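To make the square-root relationship concrete, here is a minimal Python sketch. The bearing diameter and the size of the reading error are made-up values chosen only for illustration, and the helper name `standard_error_of_mean` is my own. The only point is that, for a fixed property, the standard error shrinks roughly as one over the square root of the number of readings.

```python
import math
import random
import statistics

def standard_error_of_mean(readings):
    """Sample standard deviation divided by the square root of the count."""
    return statistics.stdev(readings) / math.sqrt(len(readings))

random.seed(0)                # fixed seed so the run is repeatable
TRUE_DIAMETER_MM = 10.000     # hypothetical ball-bearing diameter
READING_NOISE_MM = 0.005      # hypothetical random reading error (1 SD)

for n in (4, 16, 64, 256):
    # Every reading is of the SAME fixed diameter; only the reading error varies.
    readings = [random.gauss(TRUE_DIAMETER_MM, READING_NOISE_MM) for _ in range(n)]
    mean = statistics.mean(readings)
    sem = standard_error_of_mean(readings)
    print(f"n={n:4d}  mean={mean:.4f} mm  SEM={sem:.5f} mm")
```

Each fourfold increase in the number of readings should roughly halve the standard error, which is why the approach works well when the thing being measured does not change.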
Another common purpose is to characterize a variable property by making multiple representative measurements and describing the frequency distribution of the measurements. This can be done graphically, or summarized with statistical parameters such as the mean, standard deviation (SD) and skewness/kurtosis (if appropriate). However, since the measured property is varying, it becomes problematic to separate measurement error from the property variability. Thus, we learn more about how the property varies than we do about the central value of the distribution. Yet, climatologists focus on the arithmetic means, and the anomalies calculated from them. Averages can obscure information, both unintentionally and intentionally.
With the above in mind, we need to examine whether taking numerous measurements of the temperatures of land, sea, and air can provide us with a precise value for the ‘temperature’ of Earth.
By convention, climate is usually defined as the average of meteorological parameters over a period of 30 years. How can we use the available temperature data, intended for weather monitoring and forecasting, to characterize climate? The approach currently used is to calculate the arithmetic mean for an arbitrary base period, and subtract it from modern temperatures (either individual temperatures or averages) to determine what is called an anomaly. However, just what does it mean to collect all the temperature data and calculate the mean?
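As a rough illustration of the anomaly calculation just described, the sketch below subtracts a base-period mean from each year's value. The yearly temperatures, the three-year "base period," and the function name `anomalies` are all hypothetical and chosen only to keep the arithmetic easy to follow; a real baseline would span 30 years.

```python
import statistics

def anomalies(temps_by_year, base_start, base_end):
    """Return each year's temperature minus the base-period mean."""
    base_values = [t for year, t in temps_by_year.items()
                   if base_start <= year <= base_end]
    baseline = statistics.mean(base_values)
    return {year: t - baseline for year, t in temps_by_year.items()}

# Hypothetical annual means in degrees F (not real data).
temps_f = {1951: 57.0, 1952: 57.2, 1953: 56.8, 2016: 58.1}

result = anomalies(temps_f, base_start=1951, base_end=1953)
for year, anomaly in sorted(result.items()):
    print(year, f"{anomaly:+.2f} F")
```

Note that the anomaly carries no information about the spread of the data that went into either the baseline or the modern value; it is a difference of two averages.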
If Earth were in thermodynamic equilibrium, it would have one temperature, which would be relatively easy to measure. Earth does not have one temperature; it has an infinitude of temperatures. In fact, temperatures vary continuously laterally, vertically, and with time, giving rise to an indefinite number of temperatures. The apparent record low temperature is -135.8° F and the highest recorded temperature is 159.3° F, for a maximum range of 295.1° F, giving an estimated standard deviation of about 74° F, using the Empirical Rule. Changes over periods of less than a year are both random and seasonal; longer time series contain periodic changes. The question is whether sampling a few thousand locations, over a period of years, can provide us with an average that has defensible value in demonstrating a small rate of change.
One of the problems is that water temperatures tend to be stratified. Water surface-temperatures tend to be warmest, with temperatures declining with depth. Often, there is an abrupt change in temperature called a thermocline; alternatively, upwelling can bring cold water to the surface, particularly along coasts. Therefore, the location and depth of sampling is critical in determining so-called Sea Surface Temperatures (SST). Something else to consider is that because water has a specific heat that is 2 to 5 times higher than common solids, and more than 4 times that of air, it warms more slowly than land! It isn’t appropriate to average SSTs with air temperatures over land. It is a classic case of comparing apples and oranges! If one wants to detect trends in changing temperatures, they may be more obvious over land than in the oceans, although water-temperature changes will tend to suppress random fluctuations. It is probably best to plot SSTs with a scale 4-times that of land air-temperatures, and graphically display both at the same time for comparison.
Land air-temperatures have a similar problem in that there are often temperature inversions. What that means is that it is colder near the surface than it is higher up. This is the opposite of what the lapse rate predicts, namely that temperatures decline with elevation in the troposphere. But, that provides us with another problem. Temperatures are recorded over an elevation range from below sea level (Death Valley) to over 10,000 feet in elevation. Unlike the Universal Gas Law that defines the properties of a gas at a standard temperature and pressure, all the weather temperature-measurements are averaged together to define an arithmetic mean global-temperature without concern for standard pressures. This is important because the Universal Gas Law predicts that the temperature of a parcel of air will decrease with decreasing pressure, and this gives rise to the lapse rate.
Historical records (pre-20th Century) are particularly problematic because temperatures typically were only read to the nearest 1 degree Fahrenheit, by volunteers who were not professional meteorologists. In addition, the state of the technology of temperature measurements was not mature, particularly with respect to standardizing thermometers.
Climatologists have attempted to circumvent the above confounding factors by rationalizing that accuracy, and therefore precision, can be improved by averaging. Basically, they take 30-year averages of annual averages of monthly averages, thus smoothing the data and losing information! Indeed, the Law of Large Numbers predicts that the accuracy of sampled measurements can be improved (if systematic biases are not present!), particularly for probabilistic events such as the outcomes of coin tosses. However, if the annual averages are derived from the monthly averages, instead of the daily averages, then the months should be weighted according to the number of days in the month. It isn’t clear that this is being done. However, even daily averages will suppress (smooth) extreme high and low temperatures and reduce the apparent standard deviation.
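The point about weighting months by their length can be sketched in a few lines of Python. The monthly means below are invented for illustration, and the two function names are my own; I am not claiming this is how any particular temperature database computes its annual values.

```python
import calendar

def annual_mean_unweighted(monthly_means):
    """Simple average of the 12 monthly means (every month counts equally)."""
    return sum(monthly_means) / len(monthly_means)

def annual_mean_day_weighted(monthly_means, year):
    """Average of the monthly means, each weighted by its number of days."""
    days = [calendar.monthrange(year, month)[1] for month in range(1, 13)]
    weighted_sum = sum(t * d for t, d in zip(monthly_means, days))
    return weighted_sum / sum(days)

# Hypothetical monthly mean temperatures in degrees F, January to December.
monthly_means_f = [30.0, 34.0, 42.0, 52.0, 61.0, 70.0,
                   75.0, 73.0, 66.0, 54.0, 43.0, 33.0]

print(f"unweighted:   {annual_mean_unweighted(monthly_means_f):.3f} F")
print(f"day-weighted: {annual_mean_day_weighted(monthly_means_f, 2017):.3f} F")
```

The difference between the two is small, but it is of the same order as the hundredths of a degree that are sometimes quoted, so it is worth knowing which one is being used.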
However, even temporarily ignoring the problems that I have raised above, there is a fundamental problem with attempting to increase the precision and accuracy of air-temperatures over the surface of the Earth. Unlike the ball bearing with essentially a single diameter (with minimal eccentricity), the temperature at any point on the surface of the Earth is changing all the time. There is no unique temperature for any place or any time. And, one only has one opportunity to measure that ephemeral temperature. One cannot make multiple measurements to increase the precision of a particular surface air-temperature measurement!
Caves are well known for having stable temperatures. Many vary by less than ±0.5° F annually. It is generally assumed that the cave temperatures reflect an average annual surface temperature for their locality. While the situation is a little more complex than that, it is a good first-order approximation. [Incidentally, there is an interesting article by Perrier et al. (2005) about some very early work done in France on underground temperatures.] For the sake of illustration, let’s assume that a researcher has a need to determine the temperature of a cave during a particular season, say at a time that bats are hibernating. The researcher wants to determine it with greater precision than the thermometer they have carried through the passages is capable of. Let’s stipulate that the thermometer has been calibrated in the lab and is capable of being read to the nearest 0.1° F. This situation is a reasonably good candidate for using multiple readings to increase precision because over a period of two or three months there should be little change in the temperature and there is high likelihood that the readings will have a normal distribution. The known annual range suggests that the standard deviation should be less than (50.5 – 49.5)/4, or about 0.3° F. Therefore, the expected standard deviation for the annual temperature change is of the same order of magnitude as the resolution of the thermometer. Let’s further assume that, every day when the site is visited, the first and last thing the researcher does is to take the temperature. After accumulating 100 temperature readings, the mean, standard deviation, and standard error of the mean are calculated. Assuming no outlier readings and that all the readings are within a few tenths of the mean, the researcher is confident that they are justified in reporting the mean with one more significant figure than the thermometer was capable of capturing directly.
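The cave scenario can be simulated as a rough check on the reasoning. In the sketch below, the "true" cave temperature of 50.0° F and the spread of 0.25° F are assumptions taken from the range discussed above, each reading is rounded to the nearest 0.1° F to mimic the thermometer, and the function name `summarize` is my own.

```python
import math
import random
import statistics

def summarize(readings):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    mean = statistics.mean(readings)
    sd = statistics.stdev(readings)
    sem = sd / math.sqrt(len(readings))
    return mean, sd, sem

random.seed(1)                 # fixed seed so the run is repeatable
ASSUMED_CAVE_TEMP_F = 50.0     # assumed, nearly constant cave temperature
ASSUMED_SD_F = 0.25            # assumed spread, (50.5 - 49.5) / 4

# 100 readings, each recorded to the nearest 0.1 F like the thermometer.
readings = [round(random.gauss(ASSUMED_CAVE_TEMP_F, ASSUMED_SD_F), 1)
            for _ in range(100)]

mean, sd, sem = summarize(readings)
print(f"mean = {mean:.2f} F, SD = {sd:.2f} F, SEM = {sem:.3f} F")
```

With 100 readings the standard error is one-tenth of the standard deviation, which is what justifies carrying the extra digit in this particular, nearly constant, case.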
Now, let’s contrast this with the common practice in climatology. Climatologists use meteorological temperatures that may have been read by individuals with less invested in diligent observations than the bat researcher probably has. Or temperatures, such as those from the automated ASOS, may be rounded to the nearest degree Fahrenheit, and conflated with temperatures actually read to the nearest 0.1° F. (At the very least, the samples should be weighted inversely to their precision.) Additionally, because the data suffer averaging (smoothing) before the 30-year baseline-average is calculated, the data distribution appears less skewed and more normal, and the calculated standard deviation is smaller than what would be obtained if the raw data were used. It isn’t just the mean temperature that changes annually. The standard deviation and skewness (kurtosis) are certainly changing also, but this isn’t being reported. Are the changes in SD and skewness random, or is there a trend? If there is a trend, what is causing it? What, if anything, does it mean? There is information that isn’t being examined and reported that might provide insight on the system dynamics.
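The claim that averaging before computing statistics shrinks the standard deviation is easy to demonstrate on synthetic data. The sketch below is only an illustration under stated assumptions: the "daily" series is an invented seasonal sine wave plus random noise, the year is simplified to 12 blocks of 30 days, and the function name `sd_before_and_after_averaging` is my own.

```python
import math
import random
import statistics

def sd_before_and_after_averaging(values, block_size):
    """Return (SD of the raw values, SD of the equal-sized block means)."""
    block_means = [statistics.mean(values[i:i + block_size])
                   for i in range(0, len(values), block_size)]
    return statistics.pstdev(values), statistics.pstdev(block_means)

random.seed(2)          # fixed seed so the run is repeatable
DAYS_PER_BLOCK = 30     # simplified 30-day "months"
N_DAYS = 12 * DAYS_PER_BLOCK

# Synthetic daily temperatures (degrees F): seasonal cycle plus day-to-day noise.
daily = [50.0
         + 20.0 * math.sin(2 * math.pi * day / N_DAYS)
         + random.gauss(0.0, 8.0)
         for day in range(N_DAYS)]

sd_daily, sd_block_means = sd_before_and_after_averaging(daily, DAYS_PER_BLOCK)
print(f"SD of daily values:     {sd_daily:.1f} F")
print(f"SD of 30-day averages:  {sd_block_means:.1f} F")
```

The block averages keep the seasonal swing but discard the day-to-day variability, so their standard deviation is necessarily smaller than that of the raw values.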
Immediately, the known high and low temperature records (see above) suggest that the annual collection of data might have a range as high as 300° F, although something closer to 250° F is more likely. Using the Empirical Rule (range divided by 4) on the full 300° F range gives an estimated SD of over 70° F. Being more conservative, appealing to Chebyshev’s Theorem, and dividing the more likely 250° F range by 8 instead of 4 still gives an estimate of over 31° F. Additionally, there is good reason to believe that the frequency distribution of the temperatures is skewed, with a long tail on the cold side. The core of this argument is that it is obvious that temperatures colder than 50° F below zero are more common than temperatures over 150° F, while the reported mean is near 50° F for global land temperatures.
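The arithmetic behind these estimates is simple enough to lay out explicitly. Dividing the range by 4 assumes the data span about two standard deviations either side of the mean (the Empirical Rule for a roughly normal distribution); dividing by 8 assumes a span of four standard deviations either side, which Chebyshev's inequality guarantees will contain at least 93.75% of any distribution. The function name below is my own, and the ranges are the ones quoted in the text.

```python
def sd_estimate_from_range(temp_range, divisor):
    """Rule-of-thumb standard deviation: range divided by 4 or by 8."""
    return temp_range / divisor

# Range between the record high and record low quoted earlier (degrees F).
record_range_f = 159.3 - (-135.8)

print(round(sd_estimate_from_range(record_range_f, 4), 1))  # 73.8
print(sd_estimate_from_range(300.0, 4))                     # 75.0
print(sd_estimate_from_range(250.0, 4))                     # 62.5
print(sd_estimate_from_range(250.0, 8))                     # 31.25
```

Whichever divisor and range one prefers, the estimate is tens of degrees, which is two orders of magnitude larger than the cave example.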
The following shows what I think the typical annual raw data should look like plotted as a frequency distribution, taking into account the known range, the estimated SD, and the published mean:
The thick, red line represents a typical year’s temperatures, and the little stubby green line (approximately to scale) represents the cave temperature scenario above. I’m confident that the cave-temperature mean is precise to about 1/100th of a degree Fahrenheit, but despite the huge number of measurements of Earth temperatures, the shape and spread of the global data do not instill the same confidence in me for global temperatures! It is obvious that the distribution has a much larger standard deviation than the cave-temperature scenario, and the rationalization of dividing by the square-root of the number of samples cannot be justified to remove random error when the parameter being measured is never twice the same value. The multiple averaging steps in handling the data reduce extreme values and the standard deviation. The question is, “Is the claimed precision an artifact of smoothing, or does the process of smoothing provide a more precise value?” I don’t know the answer to that. However, it is certainly something that those who maintain the temperature databases should be prepared to answer and justify!
The theory of Anthropogenic Global Warming predicts that the strongest effects should be observed during nighttime and wintertime lows. That is, the cold-tail on the frequency distribution curve should become truncated and the distribution should become more symmetrical. That will increase the calculated global mean temperature even if the high or mid-range temperatures don’t change. The forecasts of future catastrophic heat waves are based on the unstated assumption that as the global mean increases, the entire frequency distribution curve will shift to higher temperatures. That is not a warranted assumption because the difference between the diurnal highs and lows has not been constant during the 20th Century. They are not moving in step, probably because there are different factors influencing the high and low temperatures. In fact, some of the lowest low-temperatures have been recorded in modern times! In any event, a global mean temperature is not a good metric for what is happening to global temperatures. We should be looking at the trends in diurnal highs and lows for all the climatic zones defined by physical geographers. We should also be analyzing the shape of the frequency distribution curves for different time periods. Trying to characterize the behavior of Earth’s ‘climate’ with a single number is not good science, whether one believes in science or not!
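A toy calculation shows how warming only the cold tail raises the mean while leaving the highs untouched. The temperatures, the threshold, and the amount of warming below are invented for illustration, and the function name `warm_cold_tail` is my own; this is not a model of any real data set.

```python
import statistics

def warm_cold_tail(temps, threshold, warming):
    """Raise only the readings below `threshold` by `warming`; leave the rest alone."""
    return [t + warming if t < threshold else t for t in temps]

# Hypothetical temperatures in degrees F with a long cold tail.
before = [-40.0, -10.0, 20.0, 50.0, 60.0, 70.0, 80.0, 95.0]
after = warm_cold_tail(before, threshold=0.0, warming=5.0)

print("mean before:", statistics.mean(before))   # 40.625
print("mean after: ", statistics.mean(after))    # 41.875
print("max before: ", max(before))               # 95.0
print("max after:  ", max(after))                # 95.0
```

The mean rises by more than a degree even though the hottest reading is unchanged, which is why a rising mean by itself says nothing about what the high end of the distribution is doing.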