By Andy May
Generally, it is agreed that the Earth’s top-of-atmosphere (TOA) energy budget balances to within the margin of error of the estimates (see Kiehl and Trenberth, 1997). The incoming energy, after subtracting reflected energy, is thought to be roughly 239 W/m2 which matches, within the margin of error, the outgoing energy of roughly 239 W/m2. Satellite data suggest TOA energy imbalances of up to 6.4 W/m2 (Trenberth, et al., 2008). However, Zhang, et al. (2004) suggest that the uncertainty in the TOA measurements is 5-10 W/m2 and the uncertainty in the surface radiation absorption and emission is larger, 10-15 W/m2. We examine some potential causes for these uncertainties.
To compute the magnitude of the greenhouse effect, the TOA incoming and outgoing radiation is usually compared to the Earth’s radiation emissions due to its overall average surface temperature of approximately 288K (14° to 15°C), according to the HADCRU version 4 1961-1990 baseline absolute temperature dataset. Using Planck’s function or the related Stefan-Boltzmann law (see equation 1), the radiation emitted by the Earth can be calculated from its temperature (T), if we assume the Earth acts like a blackbody. Normally the calculation assumes an emissivity (e) of 1, which treats the Earth as a perfect blackbody radiator. The result is expressed per square meter of surface, so it is given in W/m2. Under these assumptions, the Earth emits about 390 W/m2 (Kiehl and Trenberth, 1997) for a surface temperature of 288K.

Equation 1 (source)
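Equation 1 is simple enough to check directly. Below is a minimal sketch in Python (the calculations behind this post were done in R); the 288K temperature and the emissivity of 1 are the assumptions described above, not measured values.

```python
# Stefan-Boltzmann emission per square meter for a body at temperature T.
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W/(m^2 K^4)

def sb_emission(temp_k, emissivity=1.0):
    """Emitted power in W/m^2 for a gray body; emissivity=1 is a blackbody."""
    return emissivity * SIGMA * temp_k**4

print(round(sb_emission(288.0), 1))  # about 390 W/m^2, as in the text
```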
The greenhouse effect (GHE), when calculated this way, shows an imbalance of 390 − 239 = 151 W/m2. Kiehl and Trenberth (1997) calculated a similar overall forcing of 155 W/m2 using the same procedure. This GHE calculation makes many assumptions, not least that the Earth has an emissivity of 1 and radiates as a blackbody. But here we want to consider the problem of using a single global average temperature (T) for the Earth, a rotating sphere with only one half facing the Sun at any one time.
One specific problem is that the Earth is not at a uniform global temperature. If it averages 288K, there will be places on the planet at 288K, and those spots will emit roughly 390 W/m2. But much of the planet will be at a different temperature and will emit energy proportional to T4. The fourth power of the average of T is not the same as the average of T4. This is clear from basic high school algebra, so how much difference does it make?
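A two-point example makes the algebra concrete. This is an illustrative Python sketch; the two temperatures are hypothetical, chosen only to show the direction and rough size of the effect.

```python
# mean(T)^4 versus mean(T^4): averaging order matters for a T^4 law.
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W/(m^2 K^4)

temps = [250.0, 300.0]  # K; one cold region, one warm region (hypothetical)

mean_t = sum(temps) / len(temps)                            # 275 K
flux_from_mean_t = SIGMA * mean_t**4                        # (mean T)^4 route
mean_flux = sum(SIGMA * t**4 for t in temps) / len(temps)   # mean(T^4) route

# Because T^4 is convex, mean_flux always exceeds flux_from_mean_t
# when the temperatures differ (Jensen's inequality).
print(round(flux_from_mean_t, 1), round(mean_flux, 1))
```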
To answer that we will turn to the Hadley Climate Research Unit (HADCRU) version 4 global temperature database. We will use their version 4 baseline 1961-1990 absolute temperature dataset and their 1850 to 2017 temperature anomaly dataset. The construction of the baseline and the anomaly datasets is described in Jones, et al. (2012). Since the temperature series anomalies are anomalies from each series’ 1961-1990 average, we should be able to use the series baseline temperature to convert the anomalies to actual temperatures. These are both 5° x 5° gridded datasets. Anomalies are computed for each station to avoid problems with elevation differences, etc. This is done before they are gridded. Thus, adding the baseline temperature to the anomaly does not restore the original measurements. To quote from the HADCRU web site:
“Stations on land are at different elevations, and different countries measure average monthly temperatures using different methods and formulae. To avoid biases that could result from these problems, monthly average temperatures are reduced to anomalies from the period with best coverage (1961-90). For stations to be used, an estimate of the base period average must be calculated. Because many stations do not have complete records for the 1961-90 period several methods have been developed to estimate 1961-90 averages from neighbouring records or using other sources of data (see more discussion on this and related points in Jones et al., 2012). Over the oceans, where observations are generally made from mobile platforms, it is impossible to assemble long series of actual temperatures for fixed points. However, it is possible to interpolate historical data to create spatially complete reference climatologies (averages for 1961-90) so that individual observations can be compared with a local normal for the given day of the year (more discussion in Kennedy et al., 2011).
It is possible to obtain an absolute temperature series for any area selected, using data from the absolute file, and then add this to a regional average of anomalies calculated from the gridded data. If for example a regional average is required, users should calculate a regional average time series in anomalies, then average the absolute file for the same region, and lastly add the average derived to each of the values in the time series. Do NOT add the absolute values to every grid box in each monthly field and then calculate large-scale averages.”
By the way, “NOT” is capitalized on the website; I did not change this. My plan was to add the 1961-1990 grid temperature to the grid anomaly to get an approximate actual temperature, but they say, “do NOT” do this. Why tell the reader he can add the absolute 1961-1990 baseline temperature average to averaged anomalies, and then expressly tell him not to add the absolute temperature grid to an anomaly grid? Every anomaly series is referenced to its own 1961-1990 average, so why does it matter whether we average the anomalies and the absolute baseline temperatures separately before adding? So, naturally, the first thing I did was add the absolute 1961-1990 grid to the anomaly grid for the entire Earth from 1880 to 2016, precisely what I was instructed “NOT” to do. The absolute temperature grid is fully populated and has no missing values, but the year-by-year anomaly grids have many missing values, and the same cells are not populated in all years. It turns out this mismatch is the problem HADCRU is pointing to in the quote above.
Figure 1 shows the 1880 to 2016 global average temperatures computed the way HADCRU recommends. I first averaged the anomalies for each year, weighting each cell by the cosine of its latitude, because the area of a 5° x 5° grid cell shrinks from the equator to the poles with the cosine of the latitude. Then I added the global average 1961-1990 temperature to the average anomaly. While the baseline temperature grid is fully populated with absolute temperatures, the yearly anomaly grids are not; further, the populated grid cells come and go from year to year. This process mixes a calculation from a fully populated grid with a calculation from a sparsely populated grid.

Figure 1, Average the anomalies and then add the average 1961-1990 global temperature
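The cosine-weighted averaging step just described can be sketched in a few lines. This is illustrative Python, not the R code used for the figures; the grid cells and anomaly values are made up, and unpopulated cells are represented by None and skipped, as in the procedure above.

```python
# Cosine-of-latitude weighted average over a 5-degree grid.
import math

def weighted_mean(anomalies_by_lat):
    """anomalies_by_lat: list of (latitude of cell center, anomaly or None)."""
    num = den = 0.0
    for lat, anom in anomalies_by_lat:
        if anom is None:          # unpopulated grid cell, skip it
            continue
        w = math.cos(math.radians(lat))  # cell area shrinks with cos(latitude)
        num += w * anom
        den += w
    return num / den

# Hypothetical cells: (latitude, anomaly); the 77.5N cell has no data.
cells = [(2.5, 0.4), (42.5, 0.9), (77.5, None), (62.5, 1.5)]
print(round(weighted_mean(cells), 3))
```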
By doing it the way they expressly advise against, we obtain figure 2. In figure 2, I added the appropriate 1961-1990 absolute cell average temperature to each populated grid cell, in each year, to create a grid of absolute temperatures, and then averaged that grid, ignoring null cells. In this process, the absolute temperature grid matches the anomaly grid cell for cell.

Figure 2, Convert each grid cell to actual temperature, then average
The difference between figures 1 and 2 is most apparent prior to 1950. After 1950, the figure 2 plot is a few tenths of a degree lower, but the trend is the same. With perfect data, the two plots would be identical: each time series is converted to an anomaly using its own 1961-1990 data, and multiple series in each grid cell are merged with straight averages. But the data are not perfect. Grid cells are populated in some years and not in others. Prior to 1950, northern hemisphere coverage never exceeds 40% and southern hemisphere coverage never exceeds 20%. Given the wide discrepancy between figures 1 and 2, the data prior to 1950 are simply not robust. It is also not clear why the period 1950 to 2016 is 0.2 to 0.3°C cooler in figure 2 than in figure 1; I’m still scratching my head over that one.
The HADCRU procedure for computing global temperatures
The procedure for computing the HADCRU version 4 grid cell temperatures is described on their web site as follows:
“This means that there are 100 realizations of each [grid cell] in order to sample the possible assumptions involved in the structure of the various components of the error (see discussion in Morice et al., 2012). All 100 realizations are available at the above Hadley Centre site, but we have selected here the ensemble median. For the gridded data, this is the ensemble median calculated separately for each grid box for each time step from the 100 members. For the hemispheric and global averages, this is again the median of the 100 realizations. The median of the gridded series will not produce the median of the hemispheric and global averages, but the differences will be small.”
Thus, the HADCRU version 4 global average temperature is not a true average; it is the median of 100 statistical realizations for each populated grid cell and for each hemisphere. Every temperature measurement contains error and is uncertain. The 5° x 5° latitude-and-longitude grid created by HADCRU contains 2,592 cells, or 31,104 cell-months in a 12-month calendar year. Most of these have no value. Figure 3 shows the number of null cells (cells with no value) by year from 1880 through 2016.

Figure 3: (Data source)
As you can see, most of the cells have no data, even in recent years. In figure 4 we can see the distribution of populated grid cells. The cells with adequate data are colored, those with insufficient data are left white. Coverage of the northern hemisphere approaches 50% from 1960-1990, coverage of the southern hemisphere never exceeds 25%.

Figure 4 (source: Jones, et al., 2012)
So, the data are sparse, and most of what exists is on land and in the northern hemisphere. Both poles have little data. HADCRU thus has two problems: first, how to deal with measurement uncertainty, and second, how to deal with the sparse and uneven distribution of the data. Measurement uncertainty is dealt with by requiring that each grid cell have a sufficient number of stations reporting over the year being considered. Since the baseline period for the temperature anomalies is 1961-1990, sufficient measurements over this period are also required; generally, each station must have at least 14 years of data between 1961 and 1990. Stations that fall outside five standard deviations of the grid mean are excluded.
The monthly grids are not contoured to fill in the missing grid values, as one might expect. Once the median temperature is computed for each grid cell with sufficient data, the populated grid cells are cosine-weighted and averaged; see equation 9 in Morice, et al., 2012. Grid cell area varies as the cosine of the latitude, so this is used to weight the cells. The weighted grid values are summed for each hemisphere, and averaging the two hemispheres produces a global average temperature. Seasonal and yearly averages are derived from the monthly grid values.
Most of the populated grid cells are on land because this is where we live, yet 71% of the surface of the Earth is covered by ocean. Currently, this is not a problem because we have satellite estimates of the sea-surface temperature and the atmosphere above the oceans. In addition, we have the ARGO buoy network that provides high quality ocean temperatures. Historically, however, it has been a problem because all measurements had to be taken from ships. The critical HADSST3 dataset used to estimate ocean temperatures is described by Morice, et al., 2012. A fuller explanation of the problems of estimating historical ocean grid cell temperatures is found in Farmer, et al., 1989. The data used prior to 1979 are from ship engine intakes, drifting buoys, and bucket samples taken over the sides of ships. These sources are mobile and prone to error. The ocean mixed layer is, on average, 59 meters thick (JAMSTEC MILA GPV data); see more on the JAMSTEC ocean temperature data here. The mixed layer is the portion of the ocean that is mostly in equilibrium with the atmosphere. This layer has 22.7 times the heat capacity of the entire atmosphere and exerts considerable influence on atmospheric temperatures. It is also influenced by the cooler, deeper ocean waters, and can influence them in turn through ocean upwelling and downwelling (see Wim Rost’s post here).
My calculations
I started with the 1961-1990 baseline temperature data, called “Absolute” and found here. This is a series of monthly 5°x5° global temperature grids for the base period. Unlike the anomaly datasets, these grids are fully populated and contain no null values. How the Absolute dataset was populated is explained in Jones, et al., 2012. Figure 5 is a map of the average Absolute temperature grid.

Figure 5, Map of the “Absolute” data (data source: HADCRU)
My procedure is like the one used by HADCRU. I first read the Absolute grid into an array dimensioned 72 longitude segments by 36 latitude segments (5° each) by 12 months. Next, I take the HADCRUT4 global anomaly grid year by year, average the populated cells, and then add the average Absolute 1961-1990 temperature to the average anomaly. The results are shown in figure 1. As discussed above, I also spent some time doing exactly what the HADCRU web site says I should “NOT” do; this result is shown in figure 2.
The HADCRU data go back to 1850, but there is very little global data before 1880, and much of it was taken in the open air. Louvered screens to protect the thermometers from direct sunlight were not in wide use until 1880, and this adversely affects the quality of the early data. So, I only use the data from 1880 through 2016.
The surprising thing about the graph in figure 2 is that the temperatures from 1890 to 1950 are higher than any temperatures since then. Refer to figure 3 for the number of null values. There are 31,104 cell-months in total; the maximum number populated is around 11,029 in 1969, or 35%. Figure 6 inverts figure 3 and shows the number of populated cells for each year.

Figure 6
Is the higher temperature from 1890 to 1950 in figure 2 due to the small number of populated cells? Is it due to the uneven distribution of populated cells? There is a sudden jump in the number of populated grid cells around 1950 that coincides with an anomalous temperature drop; what causes this? Is it due to an error I made in my calculations? If I did make an error (always possible), I have every confidence someone will find it and let me know. I’ve been over and over my R code and I think it is correct. I’ve read the appropriate papers and can find no explanation for these anomalies. All the data and the R code can be downloaded here. Experienced R users will have no problems; the zip file contains the code, all input datasets, and a spreadsheet summary of the output.
Power and Temperature
The original reason for this study was to see what difference the computational sequence makes when computing the energy emissions from the Earth. That is, do we take the fourth power of an average temperature, as done by Kiehl and Trenberth, 1997? Or do we take each grid cell temperature to the fourth power and then average the Stefan-Boltzmann (SB) power from equation 1? The average of the 2016 HADCRU temperatures is 15.1°C. The SB energy emissions computed from this temperature (288K) are 391 W/m2, as commonly seen in the literature. If we compute the SB emissions from all the populated HADCRU grid cells in 2016 and average them, weighted by area, we get 379 W/m2. This 12 W/m2 difference looks small until we compare it to the estimated effect of increasing CO2. In the IPCC AR5 report, figure SPM.5 (page 14 of the report, or the third figure here) puts the total forcing from man’s CO2 emissions since 1750 at 2.29 W/m2, much less than the difference between the two calculations of the Earth’s emissions.
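The two sequences can be compared on a toy zonal profile. This Python sketch uses six hypothetical latitude-band temperatures, not HADCRU data, so the numbers differ from the 391 and 379 W/m2 quoted above; only the direction and rough size of the gap are the point.

```python
# Compare (mean T)^4 with mean(T^4) on a cosine-weighted latitude profile.
import math

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W/(m^2 K^4)

# (latitude of band center, mean temperature in K) -- hypothetical profile
bands = [(-75, 250.0), (-45, 280.0), (-15, 298.0),
         (15, 299.0), (45, 283.0), (75, 255.0)]

weights = [math.cos(math.radians(lat)) for lat, _ in bands]
wsum = sum(weights)

# Sequence 1: average the temperatures first, then apply SB once.
mean_t = sum(w * t for w, (_, t) in zip(weights, bands)) / wsum
flux_of_mean = SIGMA * mean_t**4

# Sequence 2: apply SB to each band, then average the fluxes.
mean_of_flux = sum(w * SIGMA * t**4 for w, (_, t) in zip(weights, bands)) / wsum

print(round(flux_of_mean, 1), round(mean_of_flux, 1))
```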
The comparison gets worse when we look at it over time. Figure 7 shows power emissions computed from the global average temperature, that is, (Mean T)4. Figure 8 shows the calculation done on each populated grid cell and then averaged, that is, Mean(T4).

Figure 7

Figure 8
It seems likely that the differences from 1880 to 1950 are related to the number of populated cells and their distribution, but this is speculation at this point. One must wonder about the accuracy of these data. The comparison since 1950 is OK, except for the algebraic difference due to averaging the temperatures first, versus taking each temperature to the fourth power first and then averaging the power. From 1950 to 2014, this difference averages 13 W/m2.
Discussion and Conclusions
I do not challenge the choice HADCRU made when they decided to create 100 statistical realizations of each grid cell and then choose the overall median value, weighted by cosine(latitude), as the average temperature for each hemisphere and then combine the hemispheres. This is a reasonable approach, but why is the result so different from a straightforward weighted average of the populated grid cells? To me, any complicated statistical output should line up with the simple statistical output, or the difference needs to be explained. The comparison between the two techniques over the period 1950 to 2016 is OK, although the HADCRU method results in a suspiciously higher temperature. I suspect the data from 1950 to 2016 are much more robust than the prior data. I would doubt any conclusions dependent upon the earlier data.
Their recommended calculation process is a bit troubling. They recommend averaging a sparse anomaly grid, averaging a fully populated absolute temperature grid, and then summing the two averages. Yet they explicitly instruct us not to select the same population of grid cells (anomaly and absolute), sum them cell by cell, and then average. The latter technique, at least, adds apples to apples.
Finally, it is very clear that using the SB equation to compute the Earth’s energy emissions from an estimated global average temperature is incorrect; this is how the emissions were computed in figure 7. When we compute the SB emissions from each populated HADCRU grid cell and then average the results, which basic algebra tells us is the correct order of operations, we get the result in figure 8. Comparing the two suggests there are significant problems with the data prior to 1950. Is it the number of null grid cells? The areal distribution of populated grid cells? A problem with estimated sea surface temperatures? Or perhaps some other set of problems? Hard to say, but it is difficult to have much confidence in the earlier data.
We are attempting to determine the effect of an increase in CO2, which results in an estimated “forcing” of about two W/m2. We also want to know whether temperatures have increased one degree C in the last 140 years. Are these data accurate enough to even resolve these effects? It is not clear to me that they are.
The R code and the data used to make the figures in this post can be downloaded here.
“Why tell the reader he can add the absolute 1961-1990 baseline temperature average to averaged anomalies and then expressly tell him to not add the absolute temperature grid to an anomaly grid?”
Andy, your Figure 2 is wrong. The 1961-1990 baseline average is a single number. Your 1961-1990 grid temperature baseline varies per grid. Of course they are not the same, and your results are wrong because the grid anomalies are all referenced to the baseline average, not the grid temperature.
To illustrate mathematically
Let:
a, b, c = anomalies
k = baseline average
A, B, C = absolute temperatures
A’, B’, C’ = your computed temperatures
Correct method
a = A – k
b = B – k
c = C – k
k = (A + B + C)/3
A = a + k
B = b + k
C = c + k
Your wrong method
A’ = a + A
B’ = b + B
C’ = c + C
Not sure if Andy’s right, but your method is absolutely wrong. Your ‘k’ is a single number (constant). The whole point of anomalies is to remove the location specific average and compare the location specific variations, which requires your baseline average for each location. Assume for example the baseline temperature averages for A B C are 270K, 285K, 300K. If at time ‘n’ the recorded temps are 271K, 286K, and 301K, their anomalies are 1,1,1 (all are 1 degree warmer than the baseline average). By your method, k is 285K and the anomalies are -14,1,16.
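The commenter’s numbers are easy to reproduce. A short Python sketch of the two bookkeeping choices:

```python
# Per-location baselines versus a single constant k, using the
# numbers from the comment above.
baselines = [270.0, 285.0, 300.0]   # per-location 1961-1990 averages (K)
observed  = [271.0, 286.0, 301.0]   # temperatures at time n (K)

# Each location referenced to its own baseline: every site is 1 K warm.
per_station = [t - b for t, b in zip(observed, baselines)]
print(per_station)                   # [1.0, 1.0, 1.0]

# Referenced to a single constant k: the location-specific climate leaks in.
k = sum(baselines) / len(baselines)  # 285 K
single_k = [t - k for t in observed]
print(single_k)                      # [-14.0, 1.0, 16.0]
```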
That’s the HADCRU method. You may disagree with it but that’s how they do it. I’m just pointing out why it’s different from Andy’s method. I will show later which is better
Not seeing the forest for all the trees? Breathing in too much CO2?
Dr. Strangelove, Figure 2 is the preferred technique, but the result is wrong due to poor data. With better data it would be correct. Read the post again and note several things. 1) the anomalies are computed for each series, using the series’ own 1961-1990 average. 2) The series within each grid cell are then averaged for every month where they have data, it is the anomalies that are averaged – not actual temperatures. 3) The 1961-1990 data for all series with data are used to make the absolute temperature grid for 1961-1990. 4) That grid is fully populated. 5) The anomaly grids are sparse, in early years sometimes only 10% of the grid cells have values.
Figure 1 uses a constant value (the average of the Absolute grid) to adjust the anomaly grids to actual temperatures. Here the 1961-1990 grid is reduced to one average and added to the average anomaly values for the target year; just a shift in the Y axis from anomalies to average anomaly + k. This is the technique that HADCRU recommends. It is not very satisfying, since we need absolute temperatures to compute power. Figure 1 adds oranges to apples; they are different sets of grids.
Figure 2 is a better technique, since each grid cell’s 1961-1990 average is added to that grid cell’s anomaly; not a single global k, but the k for the specific grid cell. Apples added to apples, not simply a shift in the y axis. Nick Stokes and I had an exchange on all of this earlier in the comments. Check it out.
Andy
I see your anomalies are different from HADCRUT anomalies since you used a different computation method. I suppose the reason for your method is that you think average(T^4) is more accurate than (average T)^4.
I will show the latter is more accurate. Suppose you have two contiguous grids A and B, one sq. km. each. You measure the temperatures Ta = 300 K and Tb = 301 K. Using your method, the implicit assumption is grid A has uniform 300 K temperature throughout its surface. Then one centimeter across the grid border, temperature suddenly jumps to 301 K since you’re already in grid B.
Here’s a more realistic model. Ta and Tb are the temperatures at the centers of grids A and B respectively. Connect these two center points and you have a one-kilometer line. Divide this line into ten segments, giving eleven points with temperatures varying linearly from Ta to Tb.
Temperatures at the segment boundaries:
300.0
300.1
300.2
300.3
300.4
300.5
300.6
300.7
300.8
300.9
301.0
Now you can apply your formula, X = average (T^4)
You can be more accurate by dividing the line into infinitely small segments and integrating T^4 from Ta to Tb, then calculating X
Exact formula for calculating X
X = (Tb^5 – Ta^5) / (5 (Tb – Ta))
Approximate formula for calculating X (HADCRU method)
Xa = ((Ta + Tb)/2)^4
Error = Xa – X
Your formula for calculating X (Andy’s method)
Xu = (Ta^4 + Tb^4)/2
Error = Xu – X
Compute the errors and you will see the error in your method is twice as large as the error in the HADCRU method
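The claim can be checked numerically with Ta = 300 K and Tb = 301 K (a Python sketch of the three formulas above):

```python
# Mean of T^4 over a linear temperature ramp from Ta to Tb, three ways.
Ta, Tb = 300.0, 301.0

exact = (Tb**5 - Ta**5) / (5.0 * (Tb - Ta))  # integral of T^4, averaged
midpoint = ((Ta + Tb) / 2.0) ** 4            # the "HADCRU" formula above
endpoints = (Ta**4 + Tb**4) / 2.0            # "Andy's" formula above

err_mid = midpoint - exact     # negative: the midpoint value underestimates
err_end = endpoints - exact    # positive: the endpoint average overestimates
print(err_mid, err_end)        # |err_end| is about twice |err_mid|
```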
Dr Strangelove, your “more realistic” model makes assumptions not in evidence and facially unlikely.
First, your example uses a small delta (1K) between only two points. The “grid” contains thousands of points spread between 200 and 330 K. Even accepting your linear interpolation, the naive assumption that (mean T)^4 is better than mean(T(i)^4) for i = 1 to n requires that the delta T be small. Otherwise, your error grows exponentially.
Second, the assumption that a linear interpolation between grid cell centers is better than assigning the grid temperature to the whole cell might be true between cells in the North Pacific Gyre just before typhoon season, but mostly it’s not. (true, that is) When you’re comparing coastal desert plains to adjacent oceans, and mountains to either, we can be quite sure the temperature does not change linearly between grid centers. And, the cell temp is not determined by a single measurement in the center of the cell (save by chance). Rather, the cell temp is determined by temperatures reported from anywhere in the cell, and frequently adjusted by temperatures reported from ‘nearby’ (as in up to 1200km) cells.
It may be true in some instances that a naive assumption will yield a result closer to reality than the most meticulous application of the correct techniques. But, that only means it was a lucky guess, the error(s) cancelled out.
If the error grows exponentially in (mean T)^4 the error in mean (T^4) grows exponentially times 2
Your “most meticulous application of the correct techniques” assumes uniform temperature in the whole grid. More unrealistic than my “naive assumption”
Figures 5 and 7 speak volumes. Figure 5 says: geometrical, evenly distributed, constant power of the heat.
Figure 7 says, someone is fudging. Because power cannot change without a change of source power. Source power is TSI. Nuff said. Over and out.
cheers/lit
It would appear that the “temperature” “data” used by climate “scientists” contains no actual temperature measurements. It only contains interpolations between temperature points which have themselves been “adjusted” to account for whatever. To top it off, the “scientists” make liberal use of averaging in the blind (and futile) hope that it will increase accuracy – hence the reporting of “anomaly” data to two decimal places, when the original measurements weren’t good to even one decimal place.
It occurred to me, however, that temperature isn’t the only measurement that is relevant here. We are, in essence, trying to determine whether the atmosphere is trapping more heat than normal. Temperature isn’t the only indicator. As any HVAC engineer will tell you, humidity is a huge factor.
I generated an example from one of the online psychrometric calculators (I beg forgiveness in advance for using British Engineering [or, as I call them, real] units, but I am a 1978 Purdue ME grad, so sue me). The example is for air of constant energy (enthalpy), the only thing that matters when looking at the Earth’s energy balance. At a constant 22 BTU/lb of humid air, the temperature could range from 67 F, 38.8% relative humidity, to 71 F, 28.5% relative humidity, and the energy content would be the same. (The extremes are 53 F at 100% RH, 92 F at 0% RH)
Until the temperatures in the record include both wet and dry bulb temperatures, we have absolutely no way of assessing whether the energy content of the atmosphere is changing.
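The enthalpy figures quoted above can be roughly reproduced. This Python sketch uses the Magnus saturation-pressure formula and the standard moist-air enthalpy approximation, which are assumptions here and will not exactly match the commenter’s online calculator:

```python
# Moist-air enthalpy in British units (BTU per lb of dry air).
import math

def enthalpy_btu_per_lb(temp_f, rel_humidity, pressure_psi=14.696):
    """Enthalpy of moist air at temp_f (deg F) and fractional RH."""
    temp_c = (temp_f - 32.0) / 1.8
    # Magnus formula: saturation vapor pressure in hPa, converted to psi
    p_sat = 6.1094 * math.exp(17.625 * temp_c / (temp_c + 243.04)) / 68.9476
    p_w = rel_humidity * p_sat                       # partial vapor pressure
    w = 0.622 * p_w / (pressure_psi - p_w)           # humidity ratio, lb/lb
    # Standard approximation: dry-air term plus water-vapor term
    return 0.240 * temp_f + w * (1061.0 + 0.444 * temp_f)

# The commenter's two states: both land close to 22 BTU/lb
print(round(enthalpy_btu_per_lb(67.0, 0.388), 1))
print(round(enthalpy_btu_per_lb(71.0, 0.285), 1))
```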
Calculating an object as emitting as a black body doesn’t mean that emission = absorption. Emission, as you showed, is determined by emissivity (1 in the case of a black body) and temperature. Given that the earth’s average emissivity falls in the .7-.8 range, using an emissivity of 1 adjusts the outgoing emission by approx 1%. If you want to create a more accurate model, be sure to take into account the 20-30% of incoming solar that is being reflected back up and its effect on the atmosphere; otherwise keep it simple and stick to the black body calculation.
Also, I realize it’s late for me, so maybe I just can’t find it, but you talk a lot about differences in calculating emitted radiation from the surface, the GHE, and the impacts CO2 may have on that effect, but I don’t see any data about top-of-atmosphere radiation readings. If you are going to talk about CO2 and its potential impacts on temperature, or lack thereof, why leave out data on how much outgoing longwave radiation has been measured?
Brad Schrag, I mentioned TOA input and output at the top of the post and just assumed that they were 239 each, ±5 to 10. I noted that some assume the magnitude of the GHE is the difference between this value and surface emissions of ~390. Then I focused the discussion on the accuracy of the 390, which is a computed number based on dodgy input and theory. The TOA values can be discredited as well, but it is much easier to discredit the surface emissions.
So why not paint a bigger picture of consideration for surface temps? After all, if the earth were a black body (absorptivity of 1), sb says that the surface temp would be approx
279K (6C)
We know of course that it’s not. Given reflection off clouds, snow, and ice, and the absorptivity of the surface, we can safely put the absorptivity of the surface at .7; this changes surface temps to
255K (-18C).
It’s obvious that surface temps are nowhere near these levels on average. Using a modeled temp of 255K, it’s very easy to see how the GHE has kept temps above that. Using this model, black body emission from the surface would be about
240 W/m^2
So given that perfect black body surface emission equates to the flux exiting the TOA, how do you reconcile the fact that we are nowhere near a 255K surface temp?
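The 279K, 255K, and roughly 240 W/m2 figures in this exchange follow from the Stefan-Boltzmann balance. A quick Python check (the solar-constant value of 1367 W/m2 is an assumption of this sketch):

```python
# Equilibrium temperature of a sphere intercepting sunlight, with and
# without 30% reflection.
SIGMA = 5.670374419e-8   # Stefan-Boltzmann constant, W/(m^2 K^4)
S = 1367.0               # solar constant at Earth, W/m^2 (assumed value)

def equilibrium_temp_k(absorptivity=1.0):
    """Solve sigma*T^4 = S*absorptivity/4 for T. The factor of 4 arises
    because a sphere intercepts pi*r^2 of sunlight but radiates from
    its full 4*pi*r^2 surface."""
    return (S * absorptivity / (4.0 * SIGMA)) ** 0.25

t_black = equilibrium_temp_k(1.0)   # about 279 K, the black body case
t_gray = equilibrium_temp_k(0.7)    # about 255 K, with 30% reflected
flux = SIGMA * t_gray**4            # about 239 W/m^2 emitted
print(round(t_black), round(t_gray), round(flux))
```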
Andy, sorry for coming late to the game, but you might want to look at using one of the climate reanalysis data sets for your exercise, like CFSR or ERAI.
http://cci-reanalyzer.org/about/datasets.php