Guest essay by Clyde Spencer 2017
Introduction
I recently had a guest editorial published here on the topic of data error and precision. If you missed it, I suggest reading it before continuing with this article; it will make what follows easier to understand, and I won’t need to go back over the fundamentals. This article is prompted, in part, by some of the comments on the original piece. It is a discussion of how the reported average global temperatures should be interpreted.
Averages
Averages can serve several purposes. A common one is to increase accuracy and precision of the determination of some fixed property, such as a physical dimension. This is accomplished by confining all the random error to the process of measurement. Under appropriate circumstances, such as determining the diameter of a ball bearing with a micrometer, multiple readings can provide a more precise average diameter. This is because the random errors in reading the micrometer will cancel out and the precision is provided by the Standard Error of the Mean, which is inversely related to the square root of the number of measurements.
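As a minimal sketch of this point (all numbers below are assumptions chosen for illustration, not measurements), the following Python snippet simulates repeated micrometer readings of a fixed diameter with random reading error and shows the Standard Error of the Mean shrinking roughly as one over the square root of the number of readings:

```python
import numpy as np

rng = np.random.default_rng(42)

true_diameter = 10.000   # mm, the fixed quantity being measured (assumed)
reading_sd = 0.005       # mm, random error of a single micrometer reading (assumed)

for n in (5, 25, 100):
    readings = true_diameter + rng.normal(0.0, reading_sd, size=n)
    mean = readings.mean()
    sem = readings.std(ddof=1) / np.sqrt(n)   # Standard Error of the Mean
    print(f"n={n:4d}  mean={mean:.4f} mm  SEM={sem:.4f} mm")
```

Each five-fold increase in the number of readings cuts the SEM by roughly the square root of five, which is the sense in which averaging repeated measurements of a fixed property buys precision.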
Another common purpose is to characterize a variable property by making multiple representative measurements and describing the frequency distribution of the measurements. This can be done graphically, or summarized with statistical parameters such as the mean, standard deviation (SD) and skewness/kurtosis (if appropriate). However, since the measured property is varying, it becomes problematic to separate measurement error from the property variability. Thus, we learn more about how the property varies than we do about the central value of the distribution. Yet, climatologists focus on the arithmetic means, and the anomalies calculated from them. Averages can obscure information, both unintentionally and intentionally.
With the above in mind, we need to examine whether taking numerous measurements of the temperatures of land, sea, and air can provide us with a precise value for the ‘temperature’ of Earth.
Earth’s ‘Temperature’
By convention, climate is usually defined as the average of meteorological parameters over a period of 30 years. How can we use the available temperature data, intended for weather monitoring and forecasting, to characterize climate? The approach currently used is to calculate the arithmetic mean for an arbitrary base period, and subtract modern temperatures (either individual temperatures or averages) to determine what is called an anomaly. However, just what does it mean to collect all the temperature data and calculate the mean?
If Earth were in thermodynamic equilibrium, it would have one temperature, which would be relatively easy to measure. Earth does not have one temperature; it has an infinitude of temperatures. In fact, temperatures vary continuously laterally, vertically, and with time, giving rise to an indefinite number of temperatures. The apparent record low temperature is -135.8° F and the highest recorded temperature is 159.3° F, for a maximum range of 295.1° F, giving an estimated standard deviation of about 74° F using the Empirical Rule. Changes over periods of less than a year are both random and seasonal; longer time series contain periodic changes. The question is whether sampling a few thousand locations, over a period of years, can provide us with an average that has defensible value in demonstrating a small rate of change.
One of the problems is that water temperatures tend to be stratified. Water surface-temperatures tend to be warmest, with temperatures declining with depth. Often, there is an abrupt change in temperature called a thermocline; alternatively, upwelling can bring cold water to the surface, particularly along coasts. Therefore, the location and depth of sampling is critical in determining so-called Sea Surface Temperatures (SST). Something else to consider is that because water has a specific heat that is 2 to 5 times higher than common solids, and more than 4 times that of air, it warms more slowly than land! It isn’t appropriate to average SSTs with air temperatures over land. It is a classic case of comparing apples and oranges! If one wants to detect trends in changing temperatures, they may be more obvious over land than in the oceans, although water-temperature changes will tend to suppress random fluctuations. It is probably best to plot SSTs with a scale 4-times that of land air-temperatures, and graphically display both at the same time for comparison.
Land air-temperatures have a similar problem in that there are often temperature inversions. What that means is that it is colder near the surface than it is higher up. This is the opposite of what the lapse rate predicts, namely that temperatures decline with elevation in the troposphere. But, that provides us with another problem. Temperatures are recorded over an elevation range from below sea level (Death Valley) to over 10,000 feet in elevation. Unlike the Universal Gas Law that defines the properties of a gas at a standard temperature and pressure, all the weather temperature-measurements are averaged together to define an arithmetic mean global-temperature without concern for standard pressures. This is important because the Universal Gas Law predicts that the temperature of a parcel of air will decrease with decreasing pressure, and this gives rise to the lapse rate.
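To put rough numbers on the elevation effect (a sketch using textbook constants, not anything taken from the temperature databases), the dry adiabatic lapse rate is g/c_p, roughly 9.8 °C per km, and the standard-atmosphere environmental lapse rate is about 6.5 °C per km, so two stations differing by 10,000 feet of elevation can differ by roughly 20 °C for that reason alone:

```python
# Back-of-envelope lapse-rate arithmetic (a sketch; values are standard constants).
g = 9.81        # m/s^2, gravitational acceleration
cp = 1004.0     # J/(kg K), specific heat of dry air at constant pressure

dry_adiabatic = g / cp            # ~0.0098 K per metre
environmental = 6.5 / 1000.0      # K per metre, ICAO standard atmosphere

elevation_ft = 10000
elevation_m = elevation_ft * 0.3048

print(f"Dry adiabatic lapse rate : {dry_adiabatic*1000:.1f} K/km")
print(f"Standard environmental   : {environmental*1000:.1f} K/km")
print(f"Temperature offset over {elevation_m:.0f} m: "
      f"{environmental*elevation_m:.1f} K (standard atmosphere)")
```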
Historical records (pre-20th Century) are particularly problematic because temperatures typically were only read to the nearest 1 degree Fahrenheit, by volunteers who were not professional meteorologists. In addition, the state of the technology of temperature measurements was not mature, particularly with respect to standardizing thermometers.
Climatologists have attempted to circumvent the above confounding factors by rationalizing that accuracy, and therefore precision, can be improved by averaging. Basically, they take 30-year averages of annual averages of monthly averages, thus smoothing the data and losing information! Indeed, the Law of Large Numbers predicts that the accuracy of sampled measurements can be improved (if systematic biases are not present!), particularly for probabilistic events such as the outcomes of coin tosses. However, if the annual averages are derived from the monthly averages, instead of the daily averages, then the months should be weighted according to the number of days in the month. It isn’t clear that this is being done. In any case, even daily averages will suppress (smooth) extreme high and low temperatures and reduce the apparent standard deviation.
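A minimal simulation of this smoothing effect (the synthetic daily temperatures below are invented purely for illustration) shows both how successive averaging shrinks the apparent spread and suppresses the extremes, and how an unweighted mean of twelve monthly means differs slightly from a day-weighted annual mean:

```python
import numpy as np

rng = np.random.default_rng(0)

days_in_month = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
day_of_year = np.arange(365)

# Synthetic daily means: a seasonal cycle plus day-to-day weather noise (assumed values).
daily = 50 + 25 * np.sin(2 * np.pi * (day_of_year - 100) / 365) + rng.normal(0, 8, 365)

# Monthly means built from the daily values.
monthly = np.array([m.mean() for m in np.split(daily, np.cumsum(days_in_month)[:-1])])

print(f"daily  : SD {daily.std(ddof=1):5.2f}, range {daily.min():5.1f} to {daily.max():5.1f}")
print(f"monthly: SD {monthly.std(ddof=1):5.2f}, range {monthly.min():5.1f} to {monthly.max():5.1f}")
print(f"Unweighted annual mean  : {monthly.mean():.3f}")
print(f"Day-weighted annual mean: {np.average(monthly, weights=days_in_month):.3f}")
```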
However, even temporarily ignoring the problems that I have raised above, there is a fundamental problem with attempting to increase the precision and accuracy of air-temperatures over the surface of the Earth. Unlike the ball bearing with essentially a single diameter (with minimal eccentricity), the temperature at any point on the surface of the Earth is changing all the time. There is no unique temperature for any place or any time. And, one only has one opportunity to measure that ephemeral temperature. One cannot make multiple measurements to increase the precision of a particular surface air-temperature measurement!
Temperature Measurements
Caves are well known for having stable temperatures. Many vary by less than ±0.5° F annually. It is generally assumed that cave temperatures reflect the average annual surface temperature for their locality. While the situation is a little more complex than that, it is a good first-order approximation. [Incidentally, there is an interesting article by Perrier et al. (2005) about some very early work done in France on underground temperatures.] For the sake of illustration, let’s assume that a researcher needs to determine the temperature of a cave during a particular season, say at a time when bats are hibernating. The researcher wants to determine it with greater precision than the thermometer carried through the passages can provide. Let’s stipulate that the thermometer has been calibrated in the lab and is capable of being read to the nearest 0.1° F. This situation is a reasonably good candidate for using multiple readings to increase precision, because over a period of two or three months there should be little change in the temperature and there is a high likelihood that the readings will have a normal distribution. The known annual range suggests that the standard deviation should be less than (50.5 – 49.5)/4, or about 0.25° F. Therefore, the expected standard deviation for the annual temperature change is of the same order of magnitude as the resolution of the thermometer. Let’s further assume that, every day the site is visited, the first and last thing the researcher does is take the temperature. After accumulating 100 temperature readings, the mean, standard deviation, and standard error of the mean are calculated. Assuming no outlier readings, and that all the readings are within a few tenths of a degree of the mean, the researcher is confident that they are justified in reporting the mean with one more significant figure than the thermometer was capable of capturing directly.
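A minimal sketch of the cave scenario (treating the cave temperature, its spread, and the thermometer resolution stated above as assumptions) shows why the standard error of the mean ends up a few hundredths of a degree even though each reading is only good to 0.1° F:

```python
import numpy as np

rng = np.random.default_rng(1)

true_temp = 50.00    # deg F, assumed stable cave temperature during the season
process_sd = 0.25    # deg F, estimated from the known annual range (about 1 F / 4)
resolution = 0.1     # deg F, thermometer read to the nearest tenth

# 100 visits, each reading rounded to the instrument's resolution
readings = np.round((true_temp + rng.normal(0, process_sd, 100)) / resolution) * resolution

mean = readings.mean()
sd = readings.std(ddof=1)
sem = sd / np.sqrt(len(readings))

print(f"mean = {mean:.3f} F,  SD = {sd:.3f} F,  SEM = {sem:.3f} F")
# An SEM of a few hundredths of a degree is what justifies quoting the mean to an extra digit here.
```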
Now, let’s contrast this with common practice in climatology. Climatologists use meteorological temperatures that may have been read by individuals with less invested in diligent observation than the bat researcher probably has. Or temperatures, such as those from the automated ASOS, may be rounded to the nearest degree Fahrenheit and conflated with temperatures actually read to the nearest 0.1° F. (At the very least, the samples should be weighted inversely to their precision.) Additionally, because the data suffer averaging (smoothing) before the 30-year baseline-average is calculated, the data distribution appears less skewed and more normal, and the calculated standard deviation is smaller than what would be obtained if the raw data were used. It isn’t just the mean temperature that changes annually. The standard deviation and skewness (kurtosis) are certainly changing too, but this isn’t being reported. Are the changes in SD and skewness random, or is there a trend? If there is a trend, what is causing it? What, if anything, does it mean? There is information that isn’t being examined and reported that might provide insight into the system dynamics.
Immediately, the known high and low temperature records (see above) suggest that the annual collection of data might have a range as high as 300° F, although something closer to 250° F is more likely. Using the Empirical Rule to estimate the standard deviation, a value of over 70° F would be predicted for the SD. Being more conservative, appealing to Chebyshev’s Theorem, and dividing by 8 instead of 4, still gives an estimate of over 31° F. Additionally, there is good reason to believe that the frequency distribution of the temperatures is skewed, with a long tail on the cold side. The core of this argument is that temperatures colder than 50° F below zero are obviously more common than temperatures over 150° F, while the reported mean is near 50° F for global land temperatures.
The following shows what I think the typical annual raw data should look like plotted as a frequency distribution, taking into account the known range, the estimated SD, and the published mean:
[Figure: hypothetical frequency distribution of annual global raw temperatures, drawn with a mean near 50° F, a range of about 250° F, and an estimated SD of about 30° F; a thick red curve for the global data and a short green segment for the cave-temperature scenario.]
The thick, red line represents a typical year’s temperatures, and the little stubby green line (approximately to scale) represents the cave-temperature scenario above. I’m confident that the cave-temperature mean is precise to about 1/100th of a degree Fahrenheit, but despite the huge number of measurements of Earth temperatures, the shape and spread of the global data do not instill the same confidence in me for global temperatures! The distribution obviously has a much larger standard deviation than the cave-temperature scenario, and dividing by the square root of the number of samples cannot be justified as a way of removing random error when the parameter being measured is never twice the same value. The multiple averaging steps in handling the data reduce the extreme values and the standard deviation. The question is, “Is the claimed precision an artifact of smoothing, or does the process of smoothing provide a more precise value?” I don’t know the answer to that. However, it is certainly something that those who maintain the temperature databases should be prepared to answer and justify!
Summary
The theory of Anthropogenic Global Warming predicts that the strongest effects should be observed in nighttime and wintertime lows. That is, the cold tail of the frequency distribution curve should become truncated and the distribution should become more symmetrical. That will increase the calculated global mean temperature even if the high or mid-range temperatures don’t change. The forecasts of future catastrophic heat waves are based on the unstated assumption that as the global mean increases, the entire frequency distribution curve will shift to higher temperatures. That is not a warranted assumption, because the difference between the diurnal highs and lows has not been constant during the 20th Century; the highs and lows are not moving in step, probably because different factors influence them. In fact, some of the lowest low temperatures have been recorded in modern times! In any event, a global mean temperature is not a good metric for what is happening to global temperatures. We should be looking at the trends in diurnal highs and lows for all the climatic zones defined by physical geographers. We should also be analyzing the shape of the frequency distribution curves for different time periods. Trying to characterize the behavior of Earth’s ‘climate’ with a single number is not good science, whether one believes in science or not!
Can anyone here provide a physical, not a mathematical, rationale for the notion that a whole bunch of readings from an instrument which is only graduated in units of x can yield a measurement that is precise to one tenth of x?
It is easy to understand how making numerous measurements can lead one to have confidence in measurements that approach the resolution of the device.
How exactly, physically, does that device, or what you do with it, give you information that the device itself is incapable of capturing?
Can a bunch of grainy photographs be processed in such a way as to give you a single photo with ten times the resolution of the pixels in each of the original photos?
Can a balance which is graduated in tenths of a gram allow you, by some means of repetition, to confidently declare the weight of a sample to within a hundredth of a gram?
It seems to me that this idea can be tested by means of the scale example.
Have some people measure a sample of something with a scale that has a certain resolution.
Have them do this a large number of times with a large number of samples, and perhaps with a large number of scales.
Have them report their readings and the results of calculations using the statistical means being discussed here.
Then check the actual sample weights with a far more sensitive instrument and see if this method works.
Such an experiment can be done under closely controlled conditions…people in a sealed room, etc.
Assuming all of that is true, it sounds like the error in this method is a large fraction of the difference between the tallest and the shortest person in your sample.
IOW, a large fraction of the height anomaly.
But I would not assume any of that is necessarily so without seeing some data from someone who actually did it.
And, if the range of sizes of adult males were more like the range in temps all over the Earth and throughout the year, how would the numbers look then?
But are you sure about that?
The vast majority of adult men are between 5’6″ and 6’6″, and hence the vast majority of readings will be 6′.
Perhaps all of them in some samples.
How many men have you ever met taller than 6’6″ or shorter than 5’6″?
For me, in my actual personal life, I think the answer may be zero, or maybe one or two.
I think you are unlikely to have enough men in your sample who measure in at the 5′ line to bring down the average much.
http://www.fathersmanifesto.net/standarddeviationheight.htm
Menicholas,
I think the simplest answer to your question is that, before the advent of digital laser theodolites, it was common procedure for surveyors to “accumulate” multiple readings on a transit when turning an angle. The proof is in the pudding, as the saying goes.
Before I retired I was a remote sensing scientist. I can assure you that the resolution of an image can be improved with the use of multiple images, through several techniques.
However, key to all of these is the requirement that the object being measured not change!
“the object being measured not change!”
That is my understanding as well Clyde.
You have to be measuring the same thing in the same way.
Menicholas,
The average height, without a stated precision or uncertainty, is similar to what climatologists do routinely. The ability to get both a good estimate of the mean and to report it within a stated range with high probability depends on both the variance of the population and the number of samples taken.
The difference with the temperature case is that the population of American males is essentially fixed during the interval that the sampling takes place.
It is one thing to measure a temperature with even a very precise and accurate thermometer. It is quite another to attribute any change over time to the correct causal factor.
The IPCC likes to say they have correctly adjusted the temperature measurements to account for non-CO2 influences, or biases. Yet there are many, many such factors that certainly influence the temperature but (as far as I can determine) are not properly considered. Below is a list of ten such non-CO2 factors that are known to cause an upward temperature trend.
1. Increased population density in the local area, cities (more buildings in a small area)
2. Increased energy use per capita (each building uses more energy, and people use more)
3. Increased local humidity due to activities such as lawn watering, industry cooling towers
4. Prolonged drought (the opposite, regular rain, reduces temperatures in arid regions)
5. Reduced artificial aerosols via pollution laws being enforced – since 1973 in the US
6. Change in character of the measurement site, from rural to more urban with pavement and other artificial heating
7. Wind shadows from dense buildings prevent cooling winds from reaching thermometer
8. El Niño short-term heating effect in many areas (e.g. the US South and Southeast)
9. Increased sunspot activity and number that allows fewer cloud-forming cosmic rays to reach Earth
10. Fewer large volcanoes erupting with natural aerosols flung high into the atmosphere
To have proper science to measure the impact of changes in CO2 on surface temperature, the data must exclude any sites that are affected by the above factors.
Instead, the IPCC and scientists that prepare the input to IPCC reports adjust the data, even though the data is known to be biased, not just by those ten factors, but probably others as well.
This is not science. It is false-alarmism.
Yep, that is the point I’ve been making. There is data that largely isolates the impact of CO2. That is the data that counts.
You do not even have to do such enumerations to spot the errors in judgement and the flaws in logic and thus the ridiculousness in the confidence of the conclusion that CO2 must be the cause of recent warming.
All you need to know is that it was warming and cooling on scales large and small prior to the advent of the industrial age and any additional CO2 in the air.
And that many of these warming and cooling events were both more rapid and of a higher magnitude than any recent warming.
Hence the hockey stick, and the “adjustments”, and the general rewriting of the relevant history.
The 30 year average temperature at my location is 19C. I measure the temperature today and get 19C. Is this any more likely correct than if I got a reading of 18C? How about -20C?
The problem with anomalies is that they statistically tell us that 19C is more likely correct than -20C, because the variance will be lower over multiple samples, which gives us a false confidence in the expected error. However, there is no reason to expect our reading of -20C to be any less accurate than 19C.
The problem is similar to the gambler’s fallacy. We expect the highs and lows to average out, so a reading of 19C appears more likely correct than a reading of -20C. But in point of fact this is incorrect, because today’s temperature is, for all practical purposes, independent of the long-term average.
ferdperple,
One must be careful in dealing with probabilistic events such as coin tosses, die tosses, and hands of cards. A very large number of trials are necessary for these probabilistic events to approach their theoretical distribution.
“The approach currently used is to calculate the arithmetic mean for an arbitrary base period, and subtract modern temperatures (either individual temperatures or averages) to determine what is called an anomaly. However, just what does it mean to collect all the temperature data and calculate the mean?”
Wrong. That’s not what we do.
The other mistake you make is that averages in spatial stats are not what you think they are. And the precision is not what you think it is.
In spatial stats the area average is the PREDICTION of the unmeasured locations.
When we say the average is 9.8656c that MEANS this.
We predict that if you take a perfect thermometer and randomly place it at UNMEASURED locations, you will find that 9.8656 is the prediction that minimizes your error.
Measuring temperature is not doing repeated measurements of the same thing.
A simple example. You have a back yard pool.
You measure with a thermometer that records whole numbers. The shallow end is 70F. The deep end is 69F.
Estimate the temperature I will record if I take a perfect thermometer and jump into any random location of the pool?
THAT is the problem spatial stats solves.
What will we predict if you measure the temperature in the exact same location in the deep end? 69. Jump in the pool in a place we haven’t measured, a random location? We predict 69.5. That will be wrong, but it will be less wrong than other predictions.
You will be judged on minimizing the error of prediction. That’s the goal: minimize error.
So you use the measured data to predict the unmeasured.
That’s spatial stats. You might average 69 and 70 and say: I predict that if you jump into a random spot the temperature will be 69.5. The precision of the prediction should never be confused with the precision of the data. In other words, 69.5 will minimize the error of prediction. It’s not about the precision of the measurements.
When you do spatial stats you must NEVER forget that you are not measuring the same thing multiple times. You are not averaging the known. You are predicting the unmeasured locations based on the measured. And yes, you actually test your prediction as an inherent part of the process.
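A tiny numerical version of the pool example (the linear temperature profile below is an assumption made only for illustration, not part of the comment) shows why the midpoint prediction minimizes error even though no spot in the pool is actually at 69.5 F:

```python
import numpy as np

# A toy pool: true water temperature assumed to vary linearly from 69 F (deep end)
# to 70 F (shallow end) across 1000 possible jump-in spots.
true_temps = np.linspace(69.0, 70.0, 1000)

for guess in (69.0, 69.5, 70.0):
    rmse = np.sqrt(np.mean((true_temps - guess) ** 2))
    print(f"predict {guess:4.1f} F everywhere -> RMS error {rmse:.3f} F")
# 69.5 is wrong almost everywhere, but it is less wrong on average than 69 or 70.
```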
Here is a simple test you all can do.
Pretend CRN stations don’t exist. Hide that data.
Then take all the bad stations in the USA. Round the measurements to whole degrees. Then, using those stations,
PREDICT the values for CRN…
When you do that you will understand what people are doing in spatial stats. And yes, your prediction will have more “precision” than the measurements, because it’s a prediction. It’s predicting what you will see when you look at the CRN data you hid. Go do that. Learn something.
So we don’t know the global average to several decimal points. We estimate or predict unmeasured locations; those predictions will always show more bits. The goal is to reduce the error in the prediction.
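The hold-out test being described can be sketched with synthetic data (everything below is invented for illustration; it is not GHCN or CRN data): round a sparse set of “stations” to whole degrees, interpolate between them, and score the predictions against values that were hidden from the interpolation.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic temperature field on a 1-D transect (all values assumed for illustration).
x = np.linspace(0, 100, 400)
field = 50 + 10 * np.sin(x / 15) + rng.normal(0, 0.5, x.size)

# "Bad" stations: sparse, and rounded to whole degrees.
station_idx = rng.choice(x.size, size=40, replace=False)
station_x, station_t = x[station_idx], np.round(field[station_idx])

# "Hidden" stations we will try to predict (playing the role of the withheld CRN data).
remaining = np.setdiff1d(np.arange(x.size), station_idx)
hidden_idx = rng.choice(remaining, size=40, replace=False)

def idw(x0, xs, ts, power=2):
    """Inverse-distance-weighted prediction at x0 from stations (xs, ts)."""
    d = np.abs(xs - x0) + 1e-6
    w = 1.0 / d ** power
    return np.sum(w * ts) / np.sum(w)

pred = np.array([idw(x[i], station_x, station_t) for i in hidden_idx])
err = pred - field[hidden_idx]
print(f"RMS prediction error at hidden sites: {np.sqrt(np.mean(err**2)):.2f} F")
```

The interpolation used here is plain inverse-distance weighting, chosen only because it is short; it stands in for whatever spatial method a given group actually uses.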
Steven,
I applaud your attempt to estimate temperatures for locations that you don’t have data for. However, I see a number of problems. The Earth isn’t a smooth sphere and I don’t read anything about how you take into account changes in temperature for a varying elevation and unknown lapse rates. You also miss the reality of microclimates that are created by more than just the local topography, such as local water bodies. Lastly, because you are dealing with time series, you miss the abrupt changes that occur at the leading edge of a moving cold front. I have difficulty in putting much reliance in theoretical constructs when they aren’t firmly grounded in empirical data.
Now that I apparently have your attention, how does BEST justify listing anomalies in the late 19th Century with the same number of significant figures as modern temperature anomalies?
Steven,
You said, “Learn something.” Your arrogance contributes nothing to the discussion.
You also said, “And yes, your prediction will have more ‘precision’ than the measurements, because it’s a prediction. It’s predicting what you will see when you look at the CRN data you hid.” You want me to believe that a distance-weighted interpolation is going to be more trustworthy than the original data? That is why we see the world differently. Just because you can calculate a number with a large number of digits does not mean that the numbers are useful or even realistic.
Clyde,
“You want me to believe that a distance-weighted interpolation is going to be more trustworthy than the original data?”
Mosh’s advice may have been abrupt, but not without merit. You are persistently ignoring two major facts:
1. They average anomalies, not temperatures.
2. They are calculating a whole earth average, not a station average.
2 is relevant here. You can’t get a whole earth average without interpolating. It’s no use saying that the interpolates are less accurate. They are the only knowledge outside the samples that you have.
This is not just climate; it is universal in science and engineering. Building a skyscraper – you need to test the soil and rock. How? By testing samples. You can’t test it all. The strength of the base is calculated based on the strength of those few samples. The rest is inferred, probably by FEM, which includes fancy interpolation.
Nick,
I think that you and Mosh miss the point of my last two articles. Even if a more complex or sophisticated algorithm is being used to determine anomalies than a simple average-and-subtraction process, the results of the calculations are going to be limited by the (unknown) accuracy and precision of the raw data used as input to those algorithms. Thus, with raw measurements that are only precise to the nearest whole or one-tenth degree, there is no justification for reporting either current global temperatures or anomalies to three or even two significant figures to the right of the decimal point. Any claim to the contrary is a claim that a way has been found to make a silk purse out of a sow’s ear.
Nick,
With respect to point 1, the definition of an anomaly is the difference between some baseline temperature, and a temperature with which it is being compared. To come up with anomalies, it will be necessary to subtract the baseline from modern daily, monthly, and/or annual averages. That baseline will have to be either arbitrary, or more commonly a 30-year average. Thus, the claim that averages are not computed is false.
To address the problem of stations being at different elevations, it will be necessary to compute a baseline average for every station before station anomalies can be computed. If that is not being done, then things are even worse than I thought because lapse rates are not available for all stations for every reading.
Yes, uniform spatial coverage is necessary to compute a global anomaly average. However, as I have remarked before, problems with moving cold fronts, rain cooling the ground, clouds that are not uniformly distributed, topography, and microclimates introduce interpolation errors that are larger than the errors at individual stations, and the precision cannot be greater than what the stations provide. Again, my point is that claims for greater precision and confidence are being made than can be justified. I’m just asking for complete transparency in what is known and what is assumed.
But Nick, doesn’t a “whole earth average” equal a “station average”? You only need a certain random sampling to get your average, anything more than that being redundant. That’s why pollsters can get a fairly accurate picture of the nation as a whole, assuming methodology is sound, with just a few hundred samples. (Rasmussen nailed it with Hillary up by two.) Is that what you’re talking about here, or am I missing something?
And Clyde, does the fact that 300 stations give the same result as 3,000 stations render your concerns moot (or no)?
afonzarelli,
I don’t think that Nick is making the claim that 300 stations are as good as 30,000. I believe he is claiming that they might be “good enough for government work.” However, I still have concerns about whether or not propagation of error is being given the rigorous attention that it deserves in such a convoluted attempt to use data for a purpose for which it was not intended.
Clyde (and fonz),
People have asked what is the point of averaging. Mindert identified it above. It is usually to estimate a population mean. And the population here is that of anomalies at points on Earth. The stations are a sample – a means to an end. Ideally the estimate will be independent of the stations chosen. The extent to which that is not true is the location uncertainty.
To make that estimate, you do a spatial integration. That can be done in various ways – the primitive one is to put them in cells and area-average that. But the key thing is that you must average members of the population (anomalies) and weight them to be representative of points on Earth (by area).
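A minimal version of the “put them in cells and area-average” step (the anomaly values below are random placeholders, not real data) is just a cosine-of-latitude weighted mean over a grid:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy 5-degree grid of anomalies (values invented purely for illustration).
lats = np.arange(-87.5, 90, 5)     # cell-centre latitudes
lons = np.arange(2.5, 360, 5)
anom = rng.normal(0.0, 1.0, (lats.size, lons.size))

# Each cell's area is proportional to cos(latitude); weight accordingly.
w = np.cos(np.deg2rad(lats))[:, None] * np.ones_like(anom)

print(f"unweighted mean : {anom.mean():+.3f}")
print(f"area-weighted   : {np.average(anom, weights=w):+.3f}")  # what a gridded spatial average reports
```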
The prior calculation of anomalies is done by station – subtracting the 30-year mean of that station. It involves no property of the station as a sample. Some methods fuzz this by using grids to embrace stations without enough data in the 30-year period. BEST and I use a better way.
“there is no justification for reporting either current global temperatures or anomalies to three or even two significant figures to the right of the decimal point”
That’s wrong. The global mean is not the temperature of a point. It is a calculated result, and has a precision determined by the statistics of sampling. An example is political polling. The data is often binary, 0 or 1. But the mean of 1000 is correctly quoted to 2 sig fig.
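The usual large-sample arithmetic behind the polling analogy (a sketch, with n = 1000 and p = 0.5 assumed) gives a margin of error of roughly three percentage points, which is why a mean of 1000 binary responses is ordinarily quoted to about two significant figures:

```python
from math import sqrt

# Margin of error for a sample proportion (standard large-sample formula;
# n = 1000 and p = 0.5 are illustrative assumptions).
n, p = 1000, 0.5
moe_95 = 1.96 * sqrt(p * (1 - p) / n)
print(f"95% margin of error: +/- {moe_95*100:.1f} percentage points")  # about +/- 3.1
```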
“it will be necessary to compute a baseline average for every station before station anomalies can be computed”
Yes, that is what is done.
” moving cold fronts”
This matters very little in a global average. For time, it’s on a day scale when the minimum unit is a monthly average. And for space, it doesn’t matter where the front is; it will still be included.
“I’m just asking for complete transparency”
There are plenty of scientific papers, Brohan 2006 is often quoted. BEST will tell you everything, and provide the code. For my part, the code is on the Web too, and the method is pretty simple. I do try to explain. And I get basically the same results as the others (using unadjusted GHCN).
Nick,
I said, and you responded, “‘it will be necessary to compute a baseline average for every station before station anomalies can be computed’ Yes, that is what is done.”
However, [Nick Stokes April 23, 2017 at 3:13 am]: “Scientists NEVER average absolute temperatures.”
Which is it? Do ‘scientists’ NEVER average absolute temperatures or DO they average absolute temperatures?
Do you see why you might have a credibility problem with readers of this blog? It seems to me that you say things that support your claims when you want to try to shut off questioning, but then reverse yourself if backed into a logical corner. Is that transparency?
Clyde ==> Note that the denial that BEST etc. are “averaging averages” is simply false. Last I looked at the BEST methods paper, they deal with monthly averages from stations, krige values for unknown or questionable stations (with a known minimum error of 0.49 degrees C), etc., etc. Read the BEST Methods paper.
Note, BEST will have made “improvements” to the original methods described, but they are not substantially changed.
Clyde,
“Which is it? Do ‘scientists’ NEVER average absolute temperatures or DO they average absolute temperatures?”
Scientists never do a spatial average of absolute temperatures, which is what we were talking about. For all the reasons of inhomogeneity that you go on about in your post. There is no point in that discussion, because they have an answer (anomalies) and you need to come to terms with it. It’s what they use.
Of course temperatures can be averaged at a single station. Daily temperatures are averaged into months, months to years (need to be careful about seasonal inhomogeneity). And you average to get an anomaly base.
The fundamental point is that spatial averaging is sampling, and you need to get it right. Averaging days to get a month is usually not sampling; you have them all. Sometimes you don’t, and then you have to be careful.
Clyde ==> What the averaging is doing is hiding, obscuring, obfuscating, overlaying, covering-up…I could go on…the real data about the state of the environment in order to make a basically desired politically-correct result appear — they need to have the Earth warming to support the CO2 warming hypothesis.
The fact that this is not strictly true — some places are warming (or getting less cold, really, like the Arctic) and some places are getting cooler — my piece on Alaska again — necessitates finding a way to be able to say (without outright lying) that “the Earth is Warming”.
Thus the dependence on averaging averages until the places that are warming make the average go up. When land values failed to provide enough “up”, sea values were added in.
None of this is a mystery — nor a conspiracy — just how it is.
The Earth warms and cools — all of these Numbers Guys believe that their derived numbers = truth = reality. They are, however, just numbers that may or may not have the meaning that is claimed for them.
This is what the attribution argument is all about — it is the attribution that is the important (and almost entirely unknown) part — not whether or not Climate Numbers Guys can produce an “up” number this year or not.
But you never do jump in that pool with a perfect thermometer. Then you use that highly precise, entirely theoretical, prediction as data???
I’ll retire to Bedlam.
Nick,
You said,”An example is political polling. The data is [sic] often binary, 0 or 1. But the mean of 1000 is correctly quoted to 2 sig fig.”
Two significant figures is not the correct precision! In a poll, individual humans are being questioned, and they are either polled or not, and represented by an integer (counted) if polled. In the division of two integers to determine a fraction, there is infinite precision. That means, the CORRECT number of significant figures is whatever the pollster feels is necessary to convey the information contained in the poll. That means a percentage to units (or even tens) may be appropriate if there is a preponderance for one position. However, if it is very close, it may be necessary to display more than two significant figures to differentiate between the two positions. The uncertainty in the polling is a probability issue, and is related to the size of the sample. Of course, this doesn’t take into account the accuracy, which can be influenced by how the question is worded, and in the case of controversial subjects, the unwillingness of people to tell a stranger how they really feel.
“So you use the measured data to predict the unmeasured.”
Wow, one of the most unhinged and unscientific things you have ever said.
75% of SST is made up. Phil Jones admitted as much,
“75% of SST is made up. Phil Jones admitted as much,”
He didn’t, and it isn’t.
He did say it.
And they are.
“He did say it.”
Quote, please.
date: Wed Apr 15 14:29:03 2009
from: Phil Jones
subject: Re: Fwd: Re: contribution to RealClimate.org
to: Thomas Crowley
Tom,
The issue Ray alludes to is that in addition to the issue of many more drifters providing measurements over the last 5-10 years, the measurements are coming in from places where we didn’t have much ship data in the past. For much of the SH between 40 and 60S the normals are mostly made up as there is very little ship data there.
Cheers
Phil
pbweather,
Thank you. That is not saying that 75% of SST data is made up. It isn’t saying that any data was made up. He’s saying that normals were made up. He explains the history – we have a whole lot of new buoy data in a Southern Ocean region where there wasn’t much in the anomaly base period. Should we use it? Of course. Normals aren’t data – they are devices for making the anomaly set as homogeneous as possible. But it is better to allow a little inhomogeneity, with an estimated normal, than to throw away the data.
Normals are estimated for land stations too, when data in the base period is lacking. The methods have names like first difference method, reference station method. Zeke explains. Of course, the Moyhu/BEST method bypasses all this.
Mosher needs to read this, apologize for his gross ignorance over the years regarding all things statistical, then shut up for the rest of forever. Every time he comments, a valid statistic somewhere dies.
It seems to me that the only reason for trying to come up with a “justifiable” one single temperature for the Earth is to then be able to blame any change on one thing, namely CO2.
+many
Clyde, I work with distributions all day long in a statistical sense. I may have missed your explanation some where above, but do you know what the distribution for temperatures is? It obviously has negative skewness, but I can invert a large number of distributions to get that figure (or stick with a bounded distribution with negative skewness.) Thanks.
John,
I have not seen a histogram of the binned global temperatures. The frequency distribution I supplied for the article was my construction based on a mean of about 50 deg F, a maximum range of about 250 deg F, and an estimated SD of about 30 deg F. The construction provided about 70% of the samples within +/- 30 deg F and about 95% within +/- 60 deg F. Its primary purpose was to compare the distribution of temperature data for the globe versus the hypothetical case of improving the precision of temperature measurement in a system with much smaller variance.
I tried to match those parameters with doubly bounded distributions (Beta and JohnsonSB). I’d include images but I’m not adept at inserting them. Send me an email at jmauer@geerms.com and I will mail them.
http://stoneforge.com/wp-content/uploads/2017/04/JohnsonSBSpencer0417.png
http://stoneforge.com/wp-content/uploads/2017/04/betaSpencer0417.png
John,
Except for the range of the horizontal scale, your constructions look very much like mine. Is there something that you wanted to point out?
Clyde,
Minor quibble about the ball bearing example. In such a measurement there are the instrumental errors (randomness in the micrometer reading) and placement errors. Averaging will tend, in the limit, to cancel out the instrumental errors. The placement errors, however, have a different property: they yield, by definition, a number less than the actual diameter. This is because the definition of the diameter is the maximum possible distance between two points on the sphere. Every other position measures something less than the diameter. Hence, an average of placement errors will yield a bias that increases with the placement error.
TGB
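TGB’s point can be checked numerically under his chord model (a sketch; the placement-error scale is an assumption): every off-axis measurement of a sphere returns a chord no longer than the true diameter, so the average of many such readings is biased low rather than converging on the diameter.

```python
import numpy as np

rng = np.random.default_rng(5)

# Placement-error bias for a ball of true diameter 10 mm.
D = 10.0
r = D / 2
offset = np.abs(rng.normal(0.0, 0.2, 100_000))   # mm, miss distance from the true diameter plane (assumed)
offset = np.minimum(offset, r)                    # cannot miss by more than the radius
chord = 2 * np.sqrt(r**2 - offset**2)             # every mis-placed reading is <= D

print(f"mean of readings: {chord.mean():.4f} mm (true diameter {D} mm)")
# Averaging many such readings converges to a value slightly BELOW the true diameter.
```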
TB3200,
I would consider it a quibble because the purpose of the article was NOT to instruct the readers on best practices for determining the maximum diameter of a ball bearing. It was to demonstrate how precision can be increased with a fixed value being measured, versus what is done with a quantity that is always different.
But the diameter is fixed. I simply gave an example of how the measurement of a ‘fixed’ quantity may not fluctuate about the actual value in a way that the average converges to the actual diameter. For anyone thinking deeply about the meaning of the mean (apologies for the pun), it is important to think seriously about issues such as this.
http://fs5.directupload.net/images/170104/e74esgs9.jpg
(courtesy of bindidon)
Mosher once mentioned that sampling was not a problem. 300 stations give the same result as 3,000 stations. In the above graph (the blue question mark thingy), uah land is compared with uah grids at the temperature stations. And they look pretty close… Doesn’t this vouch for the accuracy as far as sampling goes?
No. If the complete set is invalid, any subset is likewise invalid.
Insufficient is probably a better word than invalid.
Right Mark, that’s why i said, “as far as sampling goes”…
Um, that’s actually what I was referring to. If whatever data you are analyzing is not sampled properly, then no subset of the data is sampled properly, either. If there is a systemic (or systematic) error in the data, it will show up in the subsets as well. If the average is bad across the whole, it is bad across a subset of the whole. The only thing such an analysis vouches for is the consistency in its inaccuracy.
Mark T on April 23, 2017 at 9:50 pm
No. If the complete set is invalid, any subset is likewise invalid.
Either you did not understand what afonzarelli presented, or you think that UAH is as invalid as is GHCN.
Maybe you could try to explain your ‘thought’ a bit more accurately? You are keeping things so carefully superficial here…
Hey there, Bindi… It seems to me that your comparison graph using the uah land data renders clyde’s concerns moot (with the possible exception of elevation when it comes to the land stations). UAH is like having thermometers EVERYWHERE. So when we use just the uah data at the GHCN stations and get the same result, that essentially says the same thing as mosher (regarding size of sample)…
Terrific graph, btw, was it your idea or did you get wind of it from someone else?
afonzarelli on April 24, 2017 at 5:39 pm
Hi again Fonzi,
Terrific graph, btw, was it your idea or did you get wind of it from someone else?
1. The very first reason to exploit UAH’s grid data was that I wanted to know exactly how UAH behaves above the mythic NINO3+4 region, whose SSTs are so determinant in computing ENSO signals. Roy Spencer gave me a hint in the readme file associated with that grid data.
I thought: well, if UAH’s Tropics plot shows higher deviations during ENSO activities than for the whole Globe, then maybe they are even higher in the ENSO region. Bad catch:
http://fs5.directupload.net/images/170425/wzfccr9o.jpg
At least for 1998 and 2016, the Tropics plot keeps way ahead.
2. Then I wanted to compute anomalies and trends for the 66 latitude zones in the UAH grid data:
http://fs5.directupload.net/images/161028/g25fmuo9.jpg
where you see that the younger the trend period, the more it cools in the middle, and the more it warms at the poles, especially at SoPol.
It became suddenly interesting to compare 80-82.5N in UAH with the same zone in GHCN. And last but not least, I had the little idea to mix the UAH grid software with the software I made for GHCN, in order to compare the trend for UAH’s 80-82.5 N zone with that of the average for the three cells encompassing the 3 GHCN stations there. The fit was good (0.46 °C / decade for the 3 cells over GHCN vs. 0.42 for the 144 grid cells).
3. The idea of comparing small, evenly distributed subsets of UAH’s grid with the full average is from commenter ‘O R’ (maybe it’s Olof R we know from Nick’s moyhu):
https://wattsupwiththat.com/2017/01/18/berkeley-earth-record-temperature-in-2016-appears-to-come-from-a-strong-el-nino/#comment-2401985
afonzarelli on April 23, 2017 at 5:45 pm
In the above graph … uah land is compared with uah grids at the temperature stations. And they look pretty close… Doesn’t this vouch for the accuracy as far as sampling goes?
Hello fonzi
Until last year I was quite convinced by the accuracy of such comparisons.
Simply because I thought that the UAH temperature record would have some small degree of redundancy, and thus comparing UAH’s global land temperature average time series with one made out of those UAH grid cells encompassing the GHCN stations might give a hint about a correct GHCN station distribution over land surfaces.
But in between I produced time series of e.g. 32, 128 or 512 evenly distributed cells out of the 9,504 cells of UAH’s 2.5° grid.
The agreement of monthly anomalies, linear estimates and long-term running means for the 512-cell selection is amazing. Steven Mosher is right: the Globe is heavily oversampled.
That however means that the above comparison no longer makes sense, as the 2,250 grid cells encompassing the 5,750 GHCN V3 stations can be accurately represented by far fewer cells over land surfaces.
Bindidon,
You said, “Steven Mosher is right: the Globe is heavily oversampled.”
The problem is that the areas where people live are oversampled, and the areas where few people live are undersampled.
Clyde, wouldn’t that be relatively easy to figure out? What i mean is, couldn’t the data from the most remote stations be compared with more typically located stations to see if there is a difference between the two? (or for that matter, a hundred or so stations could be placed in remote areas) Bindidon did it using the uah grids comparing uah land with the uah grids at the GHCN stations and got the same result…
http://fs5.directupload.net/images/170104/e74esgs9.jpg
afonzarelli,
First, the whole point of my articles is that we have the means to quantify just how accurate and precise our data and calculations are. Saying that two things look similar is only qualitative. I’m arguing that the bad habit of mathematicians and physicists of often ignoring the kinds of realities that engineers deal with in measurements has created a false belief in precision that isn’t there. Modern computers don’t help either, because people get into the habit of typing in numbers, seeing a long string of digits come out, and not questioning whether or not they are meaningful. When strongly-typed languages like FORTRAN were in vogue, one had to pay attention to the type of variable (integer versus floating-point) and the number of significant figures for input and output. Today, we have programming languages in which the variable type can change on the fly, and it becomes more difficult to follow the propagation of error.
One should be careful about the proverbial comparing apples with oranges. If you want to compare two stations then there are a lot of tests that should be defined to be sure that they actually make good comparison samples. For example, are they in the same climate zone, same elevation, same distance from large bodies of water, same distance from mountains and on the same side, are there confounding effects such as one of them being downwind from a major source of air pollutants, do the surrounding areas have similar land use, etc. Anecdotally, where I live in Ohio, it is commonly believed that most of the snow falls north of Interstate 70 in the Winter. Assuming that common wisdom is correct, that means a change of a few miles can mean a big difference in snow cover and temperatures.
afonzarelli on April 25, 2017 at 10:29 pm
Please Fonzi: read again the end of the comment above. UAH’s grid data contains far too much redundancy: you need far fewer grid cells than the 2,250 cells above GHCN stations to produce a time series showing e.g. a 98% fit to the entire 9,504-cell set.
The only valid statement here would be the inverse: if the set of UAH grid cells encompassing GHCN stations of a given regional or latitudinal zone of the Globe gives a time series differing totally from the complete UAH subset for the zone, then it is likely that the GHCN set is not representative for that zone.
I’ll just repeat my call for even experimentally testable quantitative equations for the mean temperature of a radiantly heated ball. I have yet to see that testable physics, and until I do, statistical estimates are of secondary interest because they have no theory against which to be judged.
Minor comment: I do not see a “stubby green line”.
I think it’s on the x-axis above the 50. And it isn’t green.
Nick and anna,
Yes, it is the short, thin line above 50 on the x-axis. I’m sorry, but it looks green on my monitor and was selected from a palette of colors where green should have been.
Green on my screen as well.
Clyde ==> Don’t sweat it — I’m blue/green color blind (see the blue/green scale differently than others, apparently) and have several times directed readers to look at the green line when it was blue and vice versa.
A fantastic series of two articles. Very understandable basic statistics explained simply.
Nick Stokes: “Sometimes the readings that you would like to use have some characteristic that makes you doubt that they are representative”
Like a Global Average Temperature? And not even a reading. It’s a concoction.
Andrew
Think the other relevant point is the sheer extent of the averaging. Diurnal and seasonal variations are up to 50 times larger than the claimed change in average due to human activities. Even under controlled lab conditions, could accurate measurements be made from a system where the noise is so much stronger than the signal? I doubt it. Not even with massive low-pass filtering. In most scientific circles it is considered poor practice to infer anything from measurements made below the noise floor of the system. Here, they are at least 30 dB below the noise floor.
Even if we were to assume that diurnal and seasonal variations average out, there are still the decadal and longer variations in the data set that we can’t control for. Until you can demonstrate that whatever caused the Little Ice Age and all the warm periods over the last 8000 years isn’t also causing the current warming, then you can’t assume that the warming must be caused by CO2.
Averaging is a very useful bit of technology, often somewhat misunderstood, and has as its intention (so I believe) the partial simplification of otherwise horrendously complicated assemblies of data. Averaging over time is a great help to politicians and media persons, both groups tending not to be overly expert at arriving at reasonable conclusions from scattered original data, the typical intent being to provide some sort of prognosis. Inevitably, whatever form of averaging is used, valid information is disguised or hidden, leaving scope for alternative opinions. Of great importance is exactly what is averaged.
In the discussions above we’ve seen some eloquent defences of various choices, instructive to me and I suspect instructive to some persons other than me.
What seems not to have been discussed explicitly is exactly what types of values are being addressed, and why. I and the vast majority of ordinary folks exist at the surface of the earth at elevations between -50m and perhaps 1000m (I’m guessing). A very few (relatively) spend their time on the oceans at sea level. The conditions that control what we can grow exist approximately in these bands – plenty of substantial exceptions of course – but the important temperatures and moisture conditions are roughly in these regions.
So I ask, why are we so concerned about conditions remote from these bands?
Taking averages is the ultimate in data smoothing, with the linear fit next in severity.
When I see a plot like the 1979-2016 one above I really do despair. Who in their wildest imaginings can seriously propose that the straight line that has been “fitted” to the observations serves a useful purpose? A really worthwhile improvement would have been a pair of lines representing confidence intervals (95% level?) for the least-squares line and for single observations from the same series. Even the simple inferential statistics (Quenouille correction omitted) would have been a help, but no, there are none. We have no idea whether the published line has any value.
robinedwards36 on April 24, 2017 at 9:01 am
1. Inevitably, whatever form of averaging is used, valid information is disguised or hidden, leaving scope for alternative opinions. Of great importance is exactly what is averaged.
Feel free to have some closer look at e.g.
ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v3/ghcnm.tavg.latest.qcu.tar.gz
or at
http://www.nsstc.uah.edu/data/msu/v6.0/tlt/tltmonacg_6.0
There you’ll find all you need; but I guess you won’t enjoy it that much 🙂
2. I and the vast majority of ordinary folks exist at the surface of the earth at elevations between -50m and perhaps 1000m (I’m guessing)
Your guess isn’t so bad! The average GHCN land station height is around 410 m above sea level.
3. When I see a plot like the 1979-2016 one above I really do despair. Who in their wildest imaginings can seriously propose that the straight line that has been “fitted” to the observations serves a useful purpose?
Aha. Excel’s linear estimate function creates straight lines “fitted” to the observations.
Very interesting…
4. A really worthwhile improvement would have been a pair of lines representing confidence intervals (95% level?)
Do you really think that we here don’t know about CI’s? I can’t imagine that.
Want such a chart Sah? Here is one.
http://fs5.directupload.net/images/170425/s7qhyvnn.png
The problem is not showing CIs. The problems with doing that are:
– who is interested in such information (roughly 1% of the commenters I guess);
– I, for example, often publish charts here comparing various plots. If I show them all with CI intervals, you soon stop understanding the info.
I wonder about the very concept of ‘average’ as applied to temperatures…over the whole globe…what can it mean? Then I look at the distribution of temperatures: fairly fat-tailed. Looks somewhat like a Cauchy distribution, which as we know, does not entertain an average…if this is correct, then the ‘average’ temperature of the globe slips like sand through our fingers.
Something which I believe illustrates what is being said here is the relative nature of the meteorological measurements being taken. When we take measurements of length and weight, we have a standard weight or length to compare them with which does not change; but when we measure the temperature of air circulating over the point where we have placed our thermometer, we compare it with measurements at different places or at different times. Suppose we used the same relative convention when measuring the length of objects. We would then be saying things like “the object we are measuring today is longer than the object we measured yesterday at this point”, or “the object you are measuring over there is longer than the object I am measuring here”, and then trying to take the average of the lengths of all the objects being measured in order to get a clearer picture of the length of objects generally.
It seems to me that taking the average Tmax and Tmin would be a better use of the data than trying to generate the mythical Tavg. From the amount of statistical thrashing performed on the raw data, it’s obvious that a square peg is being forced into a round hole. There are 5700-some temperature stations in the GHCN system, mostly in the temperate areas and heavily in the US and Europe. All kinds of shenanigans are performed with the data to create data for areas where none exists, and then this created data is used in the anomaly calculations, unless I’m misunderstanding what I’ve read to this point. I really hope I am, because using predictions as data violates pretty much every rule of measurement.
So rather than doing all of this work for dubious results, why not just use the data one has to generate an anomaly, or Tmax and Tmin, for those stations and call a duck a duck. “Here is the anomaly for the xxxx number of weather stations in the GHCN system for the month of March.”
Then one isn’t projecting temperatures for grid cells up to 1200 km away and calling it “data.”
HEAR HEAR!!!! +1000
Why don’t we put a satellite orbiting Mars so it has a constant view of Earth, and so we can use Wien’s displacement law to measure the temperature of the “Blue Dot” (Sagan) by dividing the constant 2.898×10⁻³ m·K by the peak wavelength of the energy radiated? This is one of the fundamental laws used by astronomers to measure the temperature of stars. This gives the temperature of the (black) body and is an “all over” reading – hence a proper “average” temperature.
This could be accomplished easily, and could be done in a few years. Then all the argy-bargy about average temperatures, bias, not enough stations, projecting onto grid cells with no stations, etc. just falls away. Readings would soon accumulate to give us a definitive take on Earth’s “average” temperature, and at a fraction of the annual cost of the climate alarmist establishment’s terrestrial measurements.
BJ in UK.
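For what the proposal amounts to arithmetically, here is a minimal Python sketch of the Wien’s-law calculation described above; the function name and the example peak wavelength are my own illustrative choices, and the Earth is of course not a true blackbody, so this is a sketch only.

WIEN_B_MM_K = 2.898  # Wien displacement constant, in mm*K (2.898e-3 m*K)

def blackbody_temperature_k(peak_wavelength_mm: float) -> float:
    """Temperature (K) of a blackbody whose emission peaks at the given wavelength (mm)."""
    return WIEN_B_MM_K / peak_wavelength_mm

# An emission peak near 0.010 mm (10 micrometres, typical of terrestrial infrared)
# implies a radiating temperature of roughly 290 K.
print(blackbody_temperature_k(0.010))  # ~289.8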
How about just using one of the GOES weather satellites in geosynchronous orbit for a lot less money?
I’m not sure, but I think that the reported “temperature” is simply (Tmax + Tmin)/2. In effect, some information has already been discarded. Whenever I’ve decided to look at both independently, the grand-scale result turns out to be effectively the same, so I scarcely bother any more unless seasonal effects are what is under scrutiny. Normally I work with monthly data, reducing these to what I call monthly differences and what some would call monthly anomalies – which to me implies that something is wrong with them!
James,
Basically, I agree with you. Instead of trying to demonstrate what the average global temperature is, and how it has changed, I think that it would be preferable to just select the best and longest recording stations and state something to the effect, “Our analysis indicates that the most reliable temperature stations for the last xxx years have a trend in temperature change of x.x degrees C per century, with a 95% certainty of +/- x.x degrees C.”
James Schrumpf on April 25, 2017 at 3:00 am
It seems to me that taking the average Tmax and Tmin would be a better use of the data than trying to generate the mythical Tavg.
I can’t agree with you.
Firstly, because Tmax and Tmin measurements didn’t exist in earlier times. Moreover, taking their mean to stand in for the average leads to errors.
Here you see, for example, some number columns
18.43 25.71 21.89 22.07 -0.18
18.86 26.10 22.27 22.48 -0.21
17.56 25.25 21.16 21.41 -0.25
15.40 24.07 19.49 19.74 -0.25
13.04 23.29 17.84 18.17 -0.32
10.63 21.44 15.95 16.04 -0.09
10.27 21.44 15.54 15.86 -0.32
11.19 21.38 16.23 16.29 -0.05
12.56 21.54 16.90 17.05 -0.15
14.14 21.90 17.81 18.02 -0.21
15.68 23.18 19.34 19.43 -0.09
17.28 24.57 20.80 20.93 -0.13
representing from left to right
– the monthly absolute value averages for Tmin, Tmax and Tavg of a randomly chosen GHCN station (EAST LONDON, SA) for the period 1981-2010 (i.e. their so called baselines)
– the mean of Tmin and Tmax
– the difference between Tavg and that mean (hence the negative signs).
The mean of these differences, in turn, is about −0.19 °C. In 30 years! This means that choosing, for a recent period where Tmin and Tmax measurements exist, the mean of Tmin and Tmax to represent Tavg leads to an average error of about 0.06 °C per decade.
That is as much as the GISTEMP trend over the whole 20th century, or half the UAH trend during the satellite era.
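For anyone who wants to check the arithmetic, here is a small Python sketch that reproduces the figures above directly from the East London columns as posted (the values are the rounded baselines, so the result is approximate):

tmin = [18.43, 18.86, 17.56, 15.40, 13.04, 10.63, 10.27, 11.19, 12.56, 14.14, 15.68, 17.28]
tmax = [25.71, 26.10, 25.25, 24.07, 23.29, 21.44, 21.44, 21.38, 21.54, 21.90, 23.18, 24.57]
tavg = [21.89, 22.27, 21.16, 19.49, 17.84, 15.95, 15.54, 16.23, 16.90, 17.81, 19.34, 20.80]

# Difference between the reported Tavg and the (Tmin+Tmax)/2 midrange, month by month.
diffs = [avg - (lo + hi) / 2 for lo, hi, avg in zip(tmin, tmax, tavg)]
mean_diff = sum(diffs) / len(diffs)

print(f"mean of Tavg minus (Tmin+Tmax)/2: {mean_diff:+.2f} °C")   # about -0.19 °C
print(f"monthly range: {min(diffs):+.2f} to {max(diffs):+.2f} °C")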
My point was that we should not be generating a global average temp at all; however, if the urge to quantify just can’t be resisted, a Tmax and Tmin average would be better than Tavg.
It’s obvious that merely averaging the monthly high and low together gives a very imprecise and inaccurate average. But so is generating an average monthly anomaly by itself. Without Tmax and Tmin, one has no context for Tavg, no sense of what’s really happening. It’s also obvious that if one takes a sample of temps and calculates Tavg, and then raises the lowest of the temps for each month by 0.5 degree, the Tavg will increase without the Tmax changing at all. It got “warmer,” but not really.
All these problems are obvious. However, it seems that the preferred choice is the least informative value that could be calculated from a large data sample.
I have really enjoyed reading all of the comments in this thread. I’d love to see this pinned so that the discussion doesn’t get lost just because the thread falls further and further back in time.
James Schrumpf on April 25, 2017 at 9:27 pm
It’s also obvious that if one takes a sample of temps and calculates Tavg, and then raises the lowest of the temps for each month by 0.5 degree, the Tavg will increase without the Tmax changing at all. It got “warmer,” but not really.
Why do that? Who would do that? I’m afraid that’s no more than one of those ugly ‘realclimatescience’ canards. And by the way: what has been increasing for a while now is not Tmax! It is Tmin:
http://fs5.directupload.net/images/170426/mcjprbl5.jpg
I have really enjoyed reading all of the comments in this thread.
Me too! I mostly enjoy the comments here far more than many of the guest posts. That’s in some sense Anthony’s secret: to let us have heavy, often controversial but mostly fruitful discussions about matters sometimes having little to do with their “official” context 🙂
Bindidon, Thanks for your reply and references. Haven’t so far been able to look at the first of these due to its format. The second seems to be text with numbers (a lot) that has no key whatsoever as to what they are or their format, so cannot yet say whether I’ll enjoy them!
3. Indeed, most people (Prof Jones excluded) can make Excel produce linear fits, I understand from the numerous references to them in threads such as this one, though seldom does anyone see fit to display any of the inferential statistics. I never use spreadsheets, despite their power, preferring the much quicker route for stats/graphics of this sort offered by my own stats package. Thanks for your diagram showing that it can be done.
I wondered if you might have omitted a “sarc” after the first line of your 4. If I were doing the fitting using my own stats package, I’d probably also have included the CIs for single observations from the same data source (in this case a time series), together with the Quenouille adjustment if it made a real-life difference. As you’ll have noticed, Quenouille has virtually no effect when applied to long time series, even if the serial correlation of the linear residuals is very substantial.
I see from your annotation that the t ratio for the regression coefficient is very close to 4.03, implying a probability of around 6E-5. I like to include this sort of thing, in the hope that at least some of my readers gain something from it!
I’ll try to download the data from UAH – may already have it – but can’t post diagrams here. I have to use email.
Hope to see a further comment from you.
Robin
robinedwards36 on April 25, 2017 at 7:59 am
Thanks for the reply!
But please don’t try to use the data stored in the UAH reference; I deliberately chose the ugliest one, sorry. That was my “sarc” method 🙁
It is the UAH baseline (1981-2010 average of the absolute temperatures of each 2.5° grid cell in each month) and of no use except for people who want/need to reconstruct UAH absolute temperatures out of their anomalies.
And the GHCN data looks quite rebarbative as well: you have to write some software in R, C++ or whatever else to process it adequately. Please use data processed out of it instead.
*
If you want to start somewhere at UAH’s data in a meaningful way, take e.g.
http://www.nsstc.uah.edu/data/msu/v6.0/tlt/uahncdc_lt_6.0.txt
The file contains the anomalies and trends for 8 zones (each as combined, land-only and ocean-only) and 3 regions.
Similar data exist for RSS, UAH’s satellite competitor, for radiosonde balloons, and for the surface datasets (GISTEMP, NOAA, BEST, HadCRUT, JMA, etc.).
GISS land+ocean for example is in
https://data.giss.nasa.gov/gistemp/tabledata_v3/GLB.Ts+dSST.txt
I have no math or deep stat education either. Thus Quenouille is an unknown matter to me; I just know that Nick Stokes is quite aware of what it is for:
https://moyhu.blogspot.de/2013/09/adjusting-temperature-series-stats-for.html
I guess Nick is the right interlocutor for such things…
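For readers wondering what such an adjustment looks like in practice, here is a minimal Python sketch of the idea discussed in the linked post, run on synthetic data only: fit an OLS trend, estimate the lag-1 autocorrelation of the residuals, shrink the effective sample size accordingly, and widen the trend’s standard error. The trend, noise level and AR coefficient below are invented for illustration, and this is just one common way to implement the correction.

import numpy as np

rng = np.random.default_rng(0)
n = 456                        # e.g. 38 years of monthly anomalies
t = np.arange(n) / 120.0       # time in decades

# Toy series: 0.15 °C/decade trend plus AR(1) noise.
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = 0.6 * noise[i - 1] + rng.normal(scale=0.1)
y = 0.15 * t + noise

# Ordinary least-squares trend and its naive standard error.
X = np.column_stack([np.ones(n), t])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
s2 = resid @ resid / (n - 2)
se_naive = np.sqrt(s2 / np.sum((t - t.mean()) ** 2))

# Quenouille-style correction: effective n from the lag-1 residual autocorrelation.
r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
n_eff = n * (1 - r1) / (1 + r1)
se_adj = se_naive * np.sqrt((n - 2) / (n_eff - 2))

print(f"trend = {beta[1]:.3f} °C/decade, naive SE = {se_naive:.3f}, "
      f"r1 = {r1:.2f}, adjusted SE = {se_adj:.3f}")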
A little addendum:
… but can’t post diagrams here. I have to use email.
Why not?
“Averages can serve several purposes. A common one is to increase accuracy and precision of the determination of some fixed property, such as a physical dimension.”
I disagree that it “increases” accuracy or precision. It abandons accuracy and precision in an effort to summarize and simplify.
It throws out precise and accurate observations in an effort to see through both noise and volatility to try to model, elicit, and compare more general aggregate trends.
mib8,
Did you read my first essay with the citations?
mib8 on April 25, 2017 at 9:46 am
I disagree that it “increases” accuracy or precision.
Firstly: to ‘disagree’ is not the same as to ‘falsify’.
And that is, on a science site, what in theory you should do.
Please read this below, and… try to manage NOT to draw the wrong conclusions.
http://www.ni.com/white-paper/3488/en/
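The point the linked white paper makes can be illustrated in a few lines of Python with synthetic “micrometer readings” of one fixed quantity (nothing here is real data): the scatter of the average shrinks as 1/sqrt(N), which is a statement about precision only, not about accuracy.

import numpy as np

rng = np.random.default_rng(1)
true_value = 10.000   # the fixed property being measured
sigma = 0.05          # standard deviation of the random measurement error

for n in (1, 4, 16, 64, 256):
    # Scatter of the mean across 5000 repeated experiments of n readings each.
    means = rng.normal(true_value, sigma, size=(5000, n)).mean(axis=1)
    print(f"n={n:>3}  spread of the mean = {means.std(ddof=1):.4f}  "
          f"(theory sigma/sqrt(n) = {sigma / np.sqrt(n):.4f})")
# Averaging reduces the random scatter, but it says nothing about a systematic
# bias in the instrument, and it assumes every reading is of the same fixed quantity.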
Hi mib8 and Clyde,
When i wrote (jerry krause April 26, 2017 at 1:51 pm) I had not read mib8’s comment and your (Clyde) reply.
We three are possibly each both right and wrong. I would say the averaging process referred to by Clyde in his first article, and in my comment, serves the purpose of showing the precision of direct measurements of variables (results). If the precision is not good, one cannot pretend that the accuracy is good. However, even if the precision is ‘good’, one still cannot claim the accuracy is good. One must calibrate the instruments used, and the process used, against some known standard or standards to begin to make the ‘argument’ that the measurements are accurate.
In “Surely You’re Joking, Mr. Feynman!”, in his account “The 7 Percent Solution”, Richard Feynman considers his reasoning likely correct because his hypothesis consistently explains several different results, even though there was a 9 percent difference between his predicted results and those he considered explained, simply because so many things ‘fit’.
Feynman wrote: “The next morning when I got to work I went to Wapstra, Boehm, and Jensen, and told them, “I’ve got it all worked out. Everything fits.” Christy, who was there, too, said, “What beta-decay constant did you use?” “The one from So-and-So’s book.” “But that’s been found out to be wrong. Recent measurements have shown it’s off by 7 percent.”
This created the possibility that Feynman’s hypothesis was off by 16 percent or by only 2 percent. So, he and Christy went into separate rooms and pondered.
“Christy came out and I came out, and we both agreed. It’s 2 percent …. (Actually, it was wrong: it was off, really, by 1 percent, for a reason we hadn’t appreciated, which was only understood later by Nicola Cabibbo. So that 2 percent was not all experimental.)”
mib8, you wrote: “It throws out precise and accurate observations in an effort to see through both noise and volatility to try to model, elicit, and compare more general aggregate trends.” This refers to the averaging commonly done with climatic data and you can read that I totally agree with you before I read your statement.
And I am pretty sure Clyde would agree with you and me.
Have a good day, Jerry
Thank you, Bindidon, for your replies and comments. The plot you provide is of course exactly what I would produce from this data set, although I would normally also supply the confidence ranges for single observations from the same data – which tend to be a nasty surprise for anyone expecting to be able to generate a useful (at the practical level) prognostication for the next available observation.
I suppose that I have reluctantly to agree more or less with your estimate of how many readers are interested in confidence intervals. This does not mean that they should not be interested in them! It simply shows just how little understanding readers have of even simple statistical concepts, and even more that they are unlikely to be able to compute the necessary stuff. The 1% that you guess may have some understanding of the background to statistical fitting are surely worth catering for. We have to try to get some of it across to those who regularly display their indifference to or ignorance of what may legitimately be construed from statistical analyses.
I don’t use Excel for any stats. Having written and sold a fairly general stats package some years ago I find it to be more than adequate to do anything in stats and graphics that could conceivably be useful at the levels we are talking about (and vastly simpler to use!)
I may already have posted this inadvertently! However, I find I do have several copies of the UAH assemblies of data for various regions, and have already, over the years, done many analyses based on them. What I’ll do now is to exit into RISC OS and put my latest version into FIRST – my regression package – and look at the Global Land data only. Where can I send this output as an email, please? You are likely to be a bit surprised by my take on the data.
Robin
robinedwards36 on April 25, 2017 at 11:43 am
Where can I send this output as an email, please?
Please simply contact Anthony Watts via
https://wattsupwiththat.com/about-wuwt/contact-2/
and ask him for my email address: this comment will be my agreement for him to do so.
But I must confess that I still don’t understand why you can’t publish your data directly.
I can’t imagine your system being unable to produce some graphics in PNG, JPEG or even PDF format, which you could easily upload using a web site like
http://www.directupload.net/index.php?mode=upload
providing this little service here in Germany for free.
I don’t use Excel for any stats.
If I weren’t so lazy, I would long since have been using R, Matlab and tools that process netCDF formats! But a hobby must stay a hobby.
Have you ever had a look at, e.g.,
https://moyhu.blogspot.de/p/temperature-trend-viewer.html
Maybe you appreciate this interface…
Hi Clyde,
A problem with a blogsite such as this is: too many articles, too many comments. However, I see that you do continue to review the comments and respond as you see need.
Your first article, which you requested one to read, began the way the first couple of lectures in my Chemistry Quantitative Analysis course began in the fall of 1960, when I was a 2nd-year university student. In this course we learned the need to do at least three sets of analyses to see what our precision might be, and we calculated the simple average of the three results so we could calculate the deviation of each result from this average value. We did this so we could statistically determine whether the one with the greatest deviation from the average could be dismissed and the average of the two closer results used instead. Because the unknown we quantitatively analysed was a standard sample whose composition had been confirmed by ‘trained’ chemists, our results were graded by the deviation of our averaged result from this standardized value, which was considered to be the ‘accurate’ result.
So, I consider this to be the primary useful purpose of the averaging process in science.
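One common way to formalize the reject-the-worst-of-three step described above is Dixon’s Q test. Here is a minimal Python sketch; this is not necessarily the exact rule used in that 1960 course, and the replicate values below are invented for illustration.

def q_test_three(values, q_crit=0.970):   # ~0.97 is the commonly quoted 95% critical value for n = 3
    """Return (kept values, rejected value or None) after a Dixon's Q test on three replicates."""
    v = sorted(values)
    spread = v[2] - v[0]
    if spread == 0:
        return list(v), None
    # The suspect value is whichever end lies farther from its neighbour.
    q_low, q_high = (v[1] - v[0]) / spread, (v[2] - v[1]) / spread
    if max(q_low, q_high) > q_crit:
        rejected = v[0] if q_low > q_high else v[2]
        keep = [x for x in v if x != rejected]
        return keep, rejected
    return list(v), None

keep, rejected = q_test_three([40.12, 40.13, 40.55])
print(keep, rejected, sum(keep) / len(keep))   # [40.12, 40.13], 40.55 rejected, average 40.125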
However, when the science is climatology, the average values of variables such as temperature and precipitation, measured over a long period of years, are necessary to characterize the yearly ‘climate’ of a given location, commonly divided into months. Yet, given the long-term average, a monthly average in a given year can vary widely from the monthly value of the previous and/or following year, and from that of the long-term average. The critical point in the use of the averaging process is that there is no accurate value for any of these monthly and yearly fundamental variables. Nor is one expected.
Hence, the common practice of averaging the temperature of a day, of a month, or of a year destroys the information about what is actually occurring during a given day, month, or year. At most commercial airports, fundamental meteorological variables are measured (observed) and reported hourly. This practice quickly generates a lot of numbers, which can quickly become mind-boggling. But if we are ever to understand how one day can be greatly unlike the previous and/or following one, or unlike the same day last year and/or the previous year, we must study at least the hourly data, which is readily available if a scholar wants to make the effort to actually understand this.
Have a good day, Jerry