There is much criticism here of the estimates of global surface temperature anomaly provided by the majors – GISS, NOAA and HADCRUT. I try to answer these specifically, but also point out that the source data is readily available, and it is not too difficult to do your own calculation. I point out that I do this monthly, and have done for about eight years. My latest, for October, is here (it got warmer).
Last time CharlesTM was kind enough to suggest that I submit a post, I described how Australian data made its way, visible at all stages, from the 30-minute readings (reported with about 5 min delay) to the collection point as a CLIMAT form, from where it goes unchanged into GHCN unadjusted (qcu). You can see the world’s CLIMAT forms here; countries vary as to how they report the intermediate steps, but almost all the data comes from AWS (automatic weather stations), and is reported soon after recording. So GHCN unadjusted, which is one of the data sources I use, can be verified. The other, ERSST v5, is not so easy, but a lot of its provenance is available.
My calculation is based on GHCN unadjusted. That isn’t because I think the adjustments are unjustified, but rather because I find adjustment makes little difference, and I think it is useful to show that.
I’ll describe the methods and results, but firstly I should address that much-argued question of why use anomalies.
Anomalies
Anomalies are made by subtracting some expected value from the individual station readings, prior to any spatial averaging. That is an essential point of order. The calculation of a global average is inevitably an exercise in sampling, as is virtually any continuum study in science. You can only measure at a finite number of places. Reliable sampling is very much related to homogeneity. You don’t have to worry about sampling accuracy in coin tosses; they are homogeneous. But if you want to sample voting intentions in a group with men, women, country and city folk etc, you have inhomogeneity and have to be careful that the sample reflects the distribution.
Global temperature is very inhomogeneous – arctic, tropic, mountains etc. To average it you would have to make sure of getting the right proportions of each, and you don’t actually have much control of the sampling process. But fortunately, anomalies are much more homogeneous. If it is warmer than usual, it tends to be warm high and low.
I’ll illustrate with a crude calculation. Suppose we want the average land temperature for April 1988, and we do it just by simple averaging of GHCN V3 stations – no area weighting. The crudity doesn’t matter for the example; the difference with anomaly would be similar in better methods.
I’ll do this calculation with 1000 different samples, both for temperature and anomaly. 4759 GHCN stations reported that month. To get the subsamples, I draw 4759 random numbers between 0 and 1 and choose the stations for which the number is >0.5. For anomalies, I subtract for each place the average for April between 1951 and 1980.
The result for temperature is an average sample mean of 12.53°C and a standard deviation of those 1000 means of 0.13°C. These numbers vary slightly with the random choices.
But if I do the same with the anomalies, I get a mean of 0.33°C (a warm month), and a sd of 0.019 °C. The sd for temperature was about seven times greater. I’ll illustrate this with a histogram, in which I have subtracted the means of both temperature and anomaly so they can be superimposed:
The big contributor to the uncertainty of the average temperature is the sampling error of the climatologies (normals), i.e. how often we chose a surplus of normally hot or cold places. It is large because these can vary by tens of degrees. But we know that, and don’t need it reinforced. The uncertainty in anomaly relates directly to what we want to know – was it a hotter or cooler month than usual, and by how much?
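The subsampling experiment can be sketched with synthetic data. This is not the GHCN calculation: the station count, the range of the normals and the spread of the anomalies below are all made-up assumptions, chosen only to show why the sample mean of anomalies is so much steadier than the sample mean of temperatures.

```python
import random
import statistics

random.seed(0)

# Hypothetical synthetic "stations": each has a climatological normal
# (varying by tens of degrees) and a much smaller anomaly for the month.
N_STATIONS = 1000
normals = [random.uniform(-20.0, 30.0) for _ in range(N_STATIONS)]
anomalies = [random.gauss(0.3, 1.0) for _ in range(N_STATIONS)]
temps = [n + a for n, a in zip(normals, anomalies)]

def subsample_means(values, picks):
    """Mean of the selected values, one result per selection mask."""
    return [
        statistics.fmean(v for v, keep in zip(values, mask) if keep)
        for mask in picks
    ]

# 1000 random subsamples: keep a station when its random draw exceeds 0.5.
picks = [[random.random() > 0.5 for _ in range(N_STATIONS)] for _ in range(1000)]

sd_temp = statistics.stdev(subsample_means(temps, picks))
sd_anom = statistics.stdev(subsample_means(anomalies, picks))
print(f"sd of subsample mean temperature: {sd_temp:.3f}")
print(f"sd of subsample mean anomaly:     {sd_anom:.3f}")
```

The same masks are used for both quantities, so the only difference between the two spreads is whether the normals were subtracted before averaging.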
You get this big reduction in uncertainty for any reasonable method of anomaly calculation. It matters little what base period you use, or even whether you use one at all. But there is a further issue of possible bias when stations report over different periods (see below).
Averaging
Once the anomalies are calculated, they have to be spatially averaged. This is a classic problem of numerical integration, usually solved by forming some approximating function and integrating that. Grid methods form a function that is constant on each cell, equal to the average of the stations in the cell. The integral is the sum of products of each cell area by that value. But then there is the problem of cells without data. Hadcrut, for example, just leaves them out, which sounds like a conservative thing to do. But it isn’t good. It has the effect of assigning to each empty cell the global average of cells with data, and sometimes that is clearly wrong, as when such a cell is surrounded by other cells in a different range. This was the basis of the improvement by Cowtan and Way, in which they used estimates derived from kriging. In fact any method that produces an estimate consistent with nearby values has to be better than using a global average.
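The point about empty cells can be checked with a toy example. The row of equal-area cells and their values below are hypothetical, and the neighbour fill is a crude stand-in for a proper method such as kriging, not an implementation of it.

```python
# Hypothetical row of equal-area grid cells; None marks a cell with no stations.
cells = [1.2, 1.0, None, 0.9, -0.3, -0.5, -0.4, -0.2]

def mean(xs):
    return sum(xs) / len(xs)

# (a) Leave the empty cell out of the average.
omit = mean([c for c in cells if c is not None])

# (b) Fill the empty cell with the average of the cells that have data.
filled_global = mean([omit if c is None else c for c in cells])

def neighbour_fill(row, fallback):
    """Fill each empty cell with the mean of its immediate neighbours that have data."""
    out = list(row)
    for i, c in enumerate(row):
        if c is None:
            near = [row[j] for j in (i - 1, i + 1)
                    if 0 <= j < len(row) and row[j] is not None]
            out[i] = mean(near) if near else fallback
    return out

# (c) Fill the empty cell from nearby values instead.
filled_local = mean(neighbour_fill(cells, omit))

print(f"omit empty cells:          {omit:.4f}")
print(f"fill with global average:  {filled_global:.4f}")
print(f"fill from neighbours:      {filled_local:.4f}")
```

Cases (a) and (b) give the same number, which is the sense in which omission is an implicit infill. Here the empty cell sits among warm neighbours, so the neighbour-based estimate comes out higher.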
There are other and better ways. In finite elements a standard way would be to create a mesh with nodes at the stations, and use shape functions (probably piecewise linear). That is my preferred method. Clive Best, who has written articles at WUWT, is another enthusiast. Another method I use is a kind of Fourier analysis by fitting spherical harmonics. These, and my own variant of infilled grid, all give results in close agreement with each other; simple gridding is not as close, although overall the method often tracks NOAA and HADCRUT quite closely.
Unbiased anomaly formation
I described the benefits of using anomalies in terms of reduction of sampling error, which just about any method will reflect. But there is care needed to avoid biasing the trend. Just using the average over the period of each station’s history is not good enough, as I showed here. I used the station reporting history of each GHCN station, but imagined that they each returned the same, regularly rising (1°C/century) temperature. Identical for each station, so just averaging the absolute temperature would be exactly right. But if you use anomalies, you get a lower trend, about 0.52°C/century. It is this kind of bias that causes the majors to use a fixed time base, like 1951-1980 (GISS). That does fix the problem, but then there is the problem of stations with not enough data in that period. There are ways around that, but it is pesky, and HADCRUT just excludes such stations, which is a loss.
I showed the proper remedy with that example. If you calculate the incorrect global average, and then subtract it (and add later) and try again, you get a result with a smaller error. That is because the basic cause of error is that the global trend is bleeding into the anomalies, and if you remove it, that effect is reduced. If you iterate that, then within six or so steps, the anomaly is back close to the exactly correct value. Now that is a roundabout way of solving that artificial problem, but it works for the real one too.
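A small version of that artificial problem can be sketched as follows. The station reporting spans are invented (not the GHCN histories), and the averages are unweighted, so the numbers will not match the 0.52°C/century figure; the sketch only shows the bias and how iterating reduces it.

```python
# Hypothetical setup: 300 stations, each reporting an identical temperature
# that rises 0.01 °C/year, but over different spans of 1900-1999.
YEARS = list(range(1900, 2000))
TRUE_TREND = 0.01

stations = []
for i in range(300):
    start = 1900 + (i * 7) % 71      # first reporting year, 1900..1970
    length = 30 + (i * 11) % 31      # 30..60 years of record
    years = range(start, min(start + length, 2000))
    stations.append({y: TRUE_TREND * (y - 1900) for y in years})

def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)

G = {y: 0.0 for y in YEARS}          # current estimate of the global anomaly
trends = []
for step in range(30):
    # Station offsets: mean of each record after removing the current global estimate.
    offsets = [sum(t - G[y] for y, t in rec.items()) / len(rec) for rec in stations]
    # Global anomaly: average, over the stations reporting that year,
    # of temperature minus station offset.
    for y in YEARS:
        vals = [rec[y] - off for rec, off in zip(stations, offsets) if y in rec]
        G[y] = sum(vals) / len(vals)
    trends.append(slope(YEARS, [G[y] for y in YEARS]))

print(f"first pass (plain station-mean anomalies): {100 * trends[0]:.2f} °C/century")
print(f"after {len(trends)} passes: {100 * trends[-1]:.2f} °C/century")
```

The first pass starts from a zero global estimate, so it is exactly the "subtract each station's own mean" method; every later pass removes the current global estimate before the offsets are recomputed.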
It is equivalent to least squares fitting, which was discussed eight years ago by Tamino, and followed up by Romanm. They proposed it just for single cells, but it seemed to me the way to go with the whole average, as I described here. It can be seen as fitting a statistical model
T(S,m,y) = G(y) + L(S,m) + ε(S,m,y)
where T is the temperature, S,m,y indicate dependence on station, month and year, so G is the global anomaly, L the station offsets, and ε the random remainder, corresponding to the residuals. Later I allowed G to vary monthly as well. This scheme was later used by BEST.
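To make the link between this model and the iteration explicit, here is a sketch of the least-squares conditions. It assumes an unweighted sum of squares over the readings that exist (no area weighting), and the index sets Y_{S,m} and R_y are notation introduced only for this sketch.

```latex
\begin{aligned}
SS &= \sum_{(S,m,y)} \bigl(T(S,m,y) - G(y) - L(S,m)\bigr)^2 \\
\frac{\partial SS}{\partial L(S,m)} = 0 \;&\Rightarrow\;
  L(S,m) = \frac{1}{|Y_{S,m}|} \sum_{y \in Y_{S,m}} \bigl(T(S,m,y) - G(y)\bigr) \\
\frac{\partial SS}{\partial G(y)} = 0 \;&\Rightarrow\;
  G(y) = \frac{1}{|R_y|} \sum_{(S,m) \in R_y} \bigl(T(S,m,y) - L(S,m)\bigr)
\end{aligned}
```

Here Y_{S,m} is the set of years in which station S reported month m, and R_y is the set of station-months that reported in year y. Solving the two conditions alternately is the subtract-and-retry iteration described earlier. G and L are determined only up to a constant that can be moved from one to the other, which is what the choice of anomaly base settles.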
TempLS
So those are the ingredients of the program TempLS (details summarized here) which I have run almost every night since then, when GHCN Monthly comes out with an update. I typically post on about the 10th of the month for the previous month’s results (October 2018 is here, it was warm). But I keep a running report here, starting about the 3rd, when the ERSST results come in. When GISS comes out, usually about the 17th, I post a comparison. I make a map using a spherical harmonics fit, with the same levels and colors as GISS. Here is the map for October:
The comparison with GISS for September is here. I also keep a more detailed updated Google Earth-style map of monthly anomalies here.
Clive Best is now doing a regular similar analysis, using CRUTEM3 and HADSST3 instead of my GHCN and ERSST V5. We get very similar results. The following plot shows TempLS along with other measures over the last four years, set to a common anomaly base of 1981-2010. You can see that the satellite measures tend to be outliers (UAH below, RSS above, but less so). The surface measures, including TempLS, are pretty close. You can check other measures and time intervals here.
The R code for TempLS is set out and described in detail in three posts ending here. There is an overview here. You can get links to past monthly reports from the index here; the lilac button TempLS Monthly will bring it up. The next button shows the GISS comparisons that follow.
“How in world of declining global temperatures can they keep the adjusting observations in order to keep the anomalies close to the models?”
My main point here is that you can calculate the average yourself, from unadjusted data, and it makes very little difference.
The graph shows 2015 to 2018.
Thank you Nick.
Still don’t understand your fierce opposition to sensible skepticism.
Do appreciate the work you put in. Any comment on how TempLS and UAH can be so alike at times and so different at others?
Also Mosher has commented in the past on the 100,000 plus stations used for estimation but you cite less than 5000.
The data on which these temperature ‘anomalies’ are based is so diverse and rough, so fragmented and spotty that to quote the accuracy of the result other than in whole degrees is a systematic misrepresentation.
I can’t find the original WUWT article by Anthony where he lambasted infilling and pointed out examples where a location was infilled by data from a station across a range of mountains. The figure, 1200 (km or miles?) is mentioned in a number of comments over the years.
In digital signal processing it isn’t uncommon to employ a process similar to infilling. The thing is that a simple signal in a well behaved, linear, bandwidth limited channel is miles away from the conditions you have with temperatures on the surface of the planet.
What am I saying? Just because something sounds reasonable, that doesn’t mean it actually is. The difference between the satellite (and balloon) datasets and those derived from surface station data is disturbing.
Infilling seems like an overly simplistic approximation when you consider factors that can change surface temperature between a temperature station and infilled location. Factors could include elevation differences, latitude differences, wind speed and direction, topographical variations, average cloud cover, differences in precipitation, vegetation, etc. (not to mention UHI bias).
As just one example, elevation differences on land will result in about 3.5 F difference w/ 1000 ft elevation change. This is for an adiabatic system where no energy is added or removed and the only difference is the absolute air pressure (or altitude).
If the many factors are actually accounted for in the data infilling calculation, this process quickly becomes very complicated – mathematical gymnastics so to say.
> Anomalies are made by subtracting some expected value from the individual station readings…
That there invalidates the data. The very worst error in all of science is expectations bias.
Feel free to explain you didn’t mean it this way but don’t deny that results expectations is a serious issue.
“Feel free to explain you didn’t mean it this way”
The point is that you calculate the difference from expected. This is close to the classic definition of information, and also corresponds to what we want to know in everyday life. If I tell you the average in Athens for October was 19.45°C, what can you make of that? Nothing much, unless you know what is normal there for October (19.6).
Nick, I have a problem with anomalies. Your graph shows a high temperature anomaly 0.8-1.1 degrees in March(?) 2016. Does it mean
a) that the global temperature in March 2016 was that much higher than a global average temperature in 1981-2010 , OR
b) that the global temperature in March 2016 was that much higher than a global average March temperature in 1981 – 2010?
I had a problem with Bob Tisdale’s post https://wattsupwiththat.com/2018/11/05/do-doomsters-know-how-much-global-surface-temperatures-cycle-annually/ which shows a global average temperature peaking in July. The Earth is closest to the Sun in January, furthest in July, so why would the global average temperature peak then? I guess that his method might be skewed towards the northern hemisphere. Your result with a peak in March(?) looks more like it.
George,
On your a/b, it means b. Anomaly should be calculated relative to the best prior estimate for that number.
“why would the global average temperature peak then?”
Anomalies can’t tell you anything about that effect – because each March (or whichever) is relative to previous Marches, all of which are similarly affected.
Thank you Nick, so March 2016 was unusually warm for March but not necessarily the warmest month of 2016.
George,
In most years the warmest month by absolute is August, maybe July. So learning that in 2018 August was the warmest month conveys little information about 2018. But the high anomaly in March 2016 does tell you something, mainly about El Nino.
I believe this is an important question. Could it be to do with a greater number of weather stations in the northern hemisphere?
I suspect it would be because of the greater landmass in the nh. The sea takes longer to heat up, so summer temperatures do not have as much effect on the overall temperatures. That also highlights the problems with using a ‘global temperature’ since such a thing does not really exist, nor mean anything if made up.
“But fortunately, anomalies are much more homogeneous.”
Not necessarily. If you examine anomalies for local regions, they’re all over the place. When one place is anomalously cold, another is anomalously hot, even across meaningful intervals of time. Even otherwise similar parts of the globe can have wildly different anomalies and they can even differ between adjacent micro-climates. When integrated over time, homogeneous anomalies can only result when the sample space is a normal distribution. If you already have a normal distribution of sample sites, why bother with anomalies? This same flaw applies to Hansen/Lebedeff homogenization, which is only valid when homogenizing a normal distribution of sites. Keep in mind that it’s this same distribution of sites that establishes the predicted behavior which is subtracted from the measured data to produce the anomaly.
A big problem with anomalies is that they remove seasonal variability which eliminates an objective perspective about what change means relative to reality and the perception thereof. Furthermore, the differences between hemispheres are so large and they respond in such an independent manner from each other, any kind of averaging across them will misrepresent what’s actually occurring which is only visible when analyzing hemispheres or portions of hemispheres by themselves.
“Once the anomalies are calculated, they have to be spatially averaged.”
Spatially averaging temperature is an issue by itself, even when averaging anomalies. A linear average of temperature is the temperature that would arise if you uniformly combined all the matter from which the temperature arises, assuming that it all has the same heat capacity and the mixing process added no new heat. A linear average of anomalies is even worse, since the baseline temperature goes away, a 1C change from 270K is considered the same as a 1C change at 300K, even though a 1C change at 300K requires 50% more forcing to achieve and maintain.
Neither of these is the average that’s relevant to the physical mechanics of how the climate operates, where W/m^2 are linear to each other and are all that matters relative to the energy balance and the sensitivity. Thus local temperatures must be converted into W/m^2 of emissions, which are then linearly averaged and the result converted back into an EQUIVALENT temperature. In fact, the entire analysis should be done in the linear domain of W/m^2 and converted into a change in degrees K only at the very end. W/m^2 are linear to what satellite sensors are measuring directly anyway and is the natural way to process weather satellite data.
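The averaging proposed in this comment can be sketched as below. It assumes ideal black-body emission (emissivity of 1) via the Stefan-Boltzmann law and equal-area samples; the two temperatures are hypothetical, and the function names are illustrative only.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def emission(temp_k):
    """Black-body emission in W/m^2 (emissivity of 1 is an assumption)."""
    return SIGMA * temp_k ** 4

def equivalent_temperature(temps_k):
    """Average the emissions linearly, then convert the mean back to kelvin."""
    mean_flux = sum(emission(t) for t in temps_k) / len(temps_k)
    return (mean_flux / SIGMA) ** 0.25

temps = [270.0, 300.0]  # hypothetical equal-area readings, in kelvin
linear_mean = sum(temps) / len(temps)
equiv = equivalent_temperature(temps)

print(f"linear mean of temperatures: {linear_mean:.2f} K")
print(f"equivalent temperature:      {equiv:.2f} K")
```

Because emission grows as the fourth power of temperature, the equivalent temperature is never below the linear mean, and the two coincide only when all the readings are equal.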