Those of you that have been with WUWT for a few years know that I often like to do hands-on experiments to illustrate and counter some of the most ridiculous climate change claims made on both sides of the aisle. On the alarmist side, you may remember this one:
Al Gore and Bill Nye FAIL at doing a simple CO2 experiment
Replicating Al Gore’s Climate 101 video experiment (from the 24 hour Gore-a-thon) shows that his “high school physics” could never work as advertised
Unfortunately, YouTube has switched off the video, but I’m going to try getting it posted elsewhere such as on Rumble. The graphs of temperature measurements and other images are still there.
Despite the fact that I proved beyond a shadow of a doubt that the experiment was not only fatally flawed, but actually FAKED, they are still using it as propaganda today on Al Gore’s web page.
They never took it down. Schmucks.
So along those lines, like Willis often does, I’ve been thinking about the recent paper published in Atmosphere by some of our brothers-in-arms in climate skepticism (Willie Soon, the Connollys, etc.):
Abstract
The widely used Global Historical Climatology Network (GHCN) monthly temperature dataset is available in two formats—non-homogenized and homogenized. Since 2011, this homogenized dataset has been updated almost daily by applying the “Pairwise Homogenization Algorithm” (PHA) to the non-homogenized datasets. Previous studies found that the PHA can perform well at correcting synthetic time series when certain artificial biases are introduced. However, its performance with real world data has been less well studied. Therefore, the homogenized GHCN datasets (Version 3 and 4) were downloaded almost daily over a 10-year period (2011–2021) yielding 3689 different updates to the datasets. The different breakpoints identified were analyzed for a set of stations from 24 European countries for which station history metadata were available. A remarkable inconsistency in the identified breakpoints (and hence adjustments applied) was revealed. Of the adjustments applied for GHCN Version 4, 64% (61% for Version 3) were identified on less than 25% of runs, while only 16% of the adjustments (21% for Version 3) were identified consistently for more than 75% of the runs. The consistency of PHA adjustments improved when the breakpoints corresponded to documented station history metadata events. However, only 19% of the breakpoints (18% for Version 3) were associated with a documented event within 1 year, and 67% (69% for Version 3) were not associated with any documented event. Therefore, while the PHA remains a useful tool in the community’s homogenization toolbox, many of the PHA adjustments applied to the homogenized GHCN dataset may have been spurious. Using station metadata to assess the reliability of PHA adjustments might potentially help to identify some of these spurious adjustments.
In a nutshell, they conclude that the homogenization process introduces artificial biases into the long-term temperature record. This is something I surmised over 10 years ago with the USHCN, and published at AGU 2015 with this graph, showing how the final homogenized product is so much warmer than stations that have not been encroached upon by urbanization and artificial surfaces such as asphalt, concrete, and buildings. By my analysis, almost 90% of the entire USHCN network is out of compliance with siting standards, and thus suffers from spurious effects of nearby heat sources and sinks.

In the new paper, here is a relevant paragraph that speaks to the graph I published in 2015 at AGU:
As a result, the more breakpoints are adjusted for each record, the more the trends of that record will tend to converge towards the trends of its neighbors. Initially, this might appear desirable since the trends of the homogenized records will be more homogeneous (arguably one of the main goals of “homogenization”), and therefore some have objected to this criticism [41]. However, if multiple neighbors are systemically affected by similar long-term non-climatic biases, then the homogenized trends will tend to converge towards the averages of the station network (including systemic biases), rather than towards the true climatic trends of the region.
The key phrase is “multiple neighbors,” i.e., nearby stations.
Back on August 1, 2009, I created an analogy to this issue with homogenization by using bowls of dirty water. If the cleanest water (a good station, properly sited) is homogenized with nearby stations that have varying degrees of turbidity due to dirt in the water, with 5 being the worst, homogenization effectively mixes the clean and dirty water, and you end up with a data point for the station labeled “?” that has some level of turbidity, but is not clear. Basically a blend of clean and dirty data, resulting in muddy water, or muddled data.
In homogenization the data is weighted against the nearby neighbors within a radius. And so a station that might start out as a “1” data-wise might end up getting polluted with the data of nearby stations and end up as a new value, say weighted at “2.5”.

In the map below, applying a homogenization smoothing that weights nearby stations by distance, what would you imagine the turbidity values at the stations marked with question marks would be? And how close would those two values be for the east coast station in question and the west coast station in question? Each would be closer to a smoothed, center-average value based on the neighboring stations.
Of course, this isn’t the actual method, just a visual analogy. But it is essentially what this new paper says is happening to the temperature data.
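To put a number on the analogy, here is a minimal sketch of distance-weighted blending, which is the idea behind the bowls-of-water picture rather than the actual PHA; the station positions and “turbidity” scores are made up for illustration:

```python
import math

# Hypothetical stations: (x, y) position and a "turbidity" score,
# where 1 = clean/well-sited and 5 = dirty/badly-sited.
stations = {
    "A": ((0.0, 0.0), 1.0),   # the clean station we care about
    "B": ((1.0, 0.0), 3.0),
    "C": ((0.0, 2.0), 4.0),
    "D": ((2.0, 2.0), 5.0),
}

def blended_value(target, stations, power=2):
    """Inverse-distance-weighted blend of a target station with its neighbors.

    This is only a stand-in for the visual analogy: the clean station's value
    gets mixed with its neighbors, weighted by 1/distance**power.
    """
    (tx, ty), tval = stations[target]
    weights, values = [1.0], [tval]      # the station itself, with weight 1
    for name, ((x, y), val) in stations.items():
        if name == target:
            continue
        d = math.hypot(x - tx, y - ty)
        weights.append(1.0 / d**power)
        values.append(val)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

print(round(blended_value("A", stations), 2))  # the clean "1" drifts up to about 2.4
```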

And it isn’t just me and this new paper saying this; back in 2012 I reported on another paper saying the same thing.
New paper blames about half of global warming on weather station data homogenization
Authors Steirou and Koutsoyiannis, after taking homogenization errors into account find global warming over the past century was only about one-half [0.42°C] of that claimed by the IPCC [0.7-0.8°C].
Here’s the part I really like: of 67% of the weather stations examined, questionable adjustments were made to raw data that resulted in:
“increased positive trends, decreased negative trends, or changed negative trends to positive,” whereas “the expected proportions would be 1/2 (50%).”
And…
“homogenization practices used until today are mainly statistical, not well justified by experiments, and are rarely supported by metadata. It can be argued that they often lead to false results: natural features of hydroclimatic times series are regarded as errors and are adjusted.”
So, from my viewpoint, it is pretty clear that homogenization is adding spurious climate warming where there actually isn’t a true climate signal. Instead, it is picking up the urbanization effect, which warms the average temperature, and adding it to the climate signal.
Steve McIntyre concurs in a post, writing:
Finally, when reference information from nearby stations was used, artifacts at neighbor stations tend to cause adjustment errors: the “bad neighbor” problem. In this case, after adjustment, climate signals became more similar at nearby stations even when the average bias over the whole network was not reduced.
So, I want to design an experiment to simulate and illustrate the “bad neighbor” problem with weather stations and create a video for it.
I’m thinking of the following:
- Use the turbidity analogy in some way, perhaps using red and blue food coloring rather than a suspended particulate, which will settle out. This is purely for visualization.
- Using actual temperature, by creating temperature-controlled vials of water at varying temperatures.
- Mixing the contents of the vials, and measuring the resultant turbidity/color change and the resultant temperature of the mix.
The trick is how to create individual temperature controlled vials of water and maintain that temperature. Some lab equipment, some tubing and some pumps will be needed.
Again, purely for visual effect, I may create a map of the USA or the world, place the vials within it, and use that to visualize and measure the results.
I welcome a discussion of ideas on how to do this accurately and convincingly.
Anthony,
I would suggest taking known historical weather station data, randomly deleting a few stations, and seeing if the homogenization algorithms can recover the deleted data. This might be tried for areas with different topographic relief, areas with different humidity (enthalpy), different seasons (storminess), and areas with different variances.
I would be surprised if you would get agreement to the same precision as the original data. However, that in itself would be instructive as it would give insight on how homogenization might degrade the reliability of the station data, when the purpose is to improve the database.
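A minimal sketch of that kind of hold-out test on synthetic data; the “recovery” step below is just a neighbor-mean placeholder where the real homogenization/infilling algorithm would go:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic stand-in data: 30 stations sharing a common regional signal plus
# station-specific noise (purely illustrative, not real GHCN data).
n_stations, n_months = 30, 480
regional = np.cumsum(rng.normal(0, 0.05, n_months))             # shared "climate" signal
data = regional + rng.normal(0, 0.5, (n_stations, n_months))    # plus station noise
df = pd.DataFrame(data, index=[f"ST{i:02d}" for i in range(n_stations)])

# Randomly withhold a few stations entirely, as the comment suggests.
held_out = rng.choice(df.index.to_numpy(), size=3, replace=False)
training = df.drop(index=held_out)

# Placeholder "recovery": estimate each withheld station as the mean of the rest.
# A real test would plug in the actual homogenization/infilling algorithm here.
recovered = pd.DataFrame({s: training.mean(axis=0) for s in held_out}).T

# How well does the recovery match the withheld truth?
rmse = np.sqrt(((df.loc[held_out] - recovered) ** 2).to_numpy().mean())
print(f"RMSE of recovered station series: {rmse:.2f} degC")
```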
I agree with this. Points to consider:
When you homogenize the temps, the uncertainties in each station will combine, probably by root-sum-square, making the final uncertainty of the sum used in the average higher than the uncertainty of each individual station being homogenized. Not much of any way to reduce that.
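For reference, the root-sum-square (quadrature) combination the comment refers to, for independent station uncertainties \(u_i\):

\[
u_{\Sigma} = \sqrt{u_1^{2} + u_2^{2} + \cdots + u_n^{2}},
\]

so for \(n\) stations of equal uncertainty \(u\), the sum carries \(\sqrt{n}\,u\), which is larger than any individual \(u_i\).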
May I suggest Spell Check be used when typing your title, Einstein? There’s no such word as “homgenization”.
Never worked anywhere dairies exist, have you? Homogenization is all over the place there, and interestingly, if it doesn’t exist, why doesn’t my spellchecker flag it?
My posting (dated 2/23) appeared when the misspelled word appeared in the title. Now, (as of 2/24) that misspelling has been mysteriously corrected. Way ahead of you, Pauli!
Some people use programs that “translate” the spoken word into text. I don’t know if such programs include “spell check”.
I believe Anthony uses such a program. (Dragon Speak?)
Have you ever watched an old movie on, say, YouTube and turned on “closed caption” where it’s done “on the fly”? The results can be amusing, especially if the movie is, say, an old Sherlock Holmes and the software is trying to caption a Scottish accent!
Generally authors of posts appreciate and address typos being pointed out.
But the “Einstein” crack was way out of line.
The problem in using dirt is that the bigger particles will settle to the bottom faster and the lighter will also settle but much slower. Food coloring is better but the turbidity is only linear at higher concentration. Moreover, the temperature will affect the absorption maxima, which is not a linear relationship either. The standard material used to calibrate turbidimeter is Formazine. It gives a better linear relationship and can be opaque at high concentrations.
I would suggest you look into fluorescence. The fluorescence of compounds is affected by concentration and temperature. Fluorescein is an easy and good molecule to work with. You can play with these parameters to give you the best effect.
Colored water won’t work. When you mix, you’ll be taking away some liquid from the dirty water to mix it with the clean, thus reducing the amount of dirty water. The fraudsters aren’t doing that; they are keeping the dirty value and changing the clean value, so the weight of the dirty never goes down…
Might be worthwhile (if no one has done this yet) to pull the USHCN data and assign quality factors of 1 through 5 to the monitoring stations based on Anthony’s earlier siting analysis. Do a graph of the data, and the resulting trend over time, using only quality 1 (best) stations, then 1 and 2, then 1, 2, and 3, etc. This would graphically demonstrate how the trend changes as lower and lower quality data is allowed to intrude.
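A sketch of that cumulative-quality comparison; the file and column names here are only placeholders for however the siting classes get attached to the station records:

```python
import numpy as np
import pandas as pd

# Hypothetical long-format table: one row per station per year, with columns
# "station", "year", "temp" and "quality" (the 1-5 siting class).
df = pd.read_csv("ushcn_with_siting_class.csv")

def network_trend(sub):
    """OLS trend (degC/decade) of the network-mean annual temperature."""
    annual = sub.groupby("year")["temp"].mean()
    return 10.0 * np.polyfit(annual.index.astype(float), annual.values, 1)[0]

for max_quality in range(1, 6):
    subset = df[df["quality"] <= max_quality]
    print(f"classes 1-{max_quality}: {network_trend(subset):+.3f} degC/decade "
          f"({subset['station'].nunique()} stations)")
```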
I suspect homogenisation is amplifying urban heating.
Rural stations are more likely than city stations to suffer a break, but city stations also suffer breaks.
So the city urban heating lifts rural stations when they break, and growing urbanisation around rural stations lifts city stations when they break, resulting in an amplified uplift caused by UHI growing at different rates in different regions.
I’ve never been certain about how homogenisation works. Does it lower the hot spots while raising the cool spots? Or does it just raise the cool spots without touching the hot spots?
I would use vertical bar graphs indicating measured and homogenized data across a longitude line, such as San Francisco, Sacramento, Reno.
Of course, to any Californian, we see that San Francisco temperatures are moderated by proximity to the global oceanic heat sink, Sacramento on the other hand is in a very dry valley merely 3′ above sea level, yet 90 miles from the sea. Reno on the other hand sits at about 4,400′ elevation.
On any given summer day, San Francisco will be in the 60s, Sacramento above 100, Reno in the 80s. All for different climatic reasons.
Does Sacramento data pull up San Francisco and Reno? Does SF & Reno pull down Sac? Or perhaps these things “just are” … with apologies to Thoreau.
The problem I’ve been wrestling with is the containers. If they are open, as they are heated to the different desired temperatures, I’m not sure what the effects of the varying rates of evaporation might be. Maybe significant, and maybe not.
So, spitballing here, how about filled-to-the-top closed containers with the desired colored water? A tube at the top allows the colored water to be forced out in proportion to the heat applied. Apply sufficient heat to each container such that more colored solution is dumped into the (?) container from the warmer containers.
Run the tubes to the (?) container.
I’d think you’d have to account for the lengths of the tubes as either another variable or if the heat loss in the run of the tube is negligible, the tube lengths could be longer or shorter to represent the varying distances from the (?) jar.
Already I’m seeing details I’ve left out. Of course there’s a bit of calculating of the thermal expansion coefficients, tube inside diameters, lengths and whatnot.
Those aquarium heaters may be useful for getting the filled jars to desired nice, warm, different temperatures. There would need to be a valve on each tube to hold in the contents of the various jars. I’d think the valves would need a mechanism to allow them to all be opened at once when the jar temperatures stabilized so you’d get instantaneous ‘homogenization’ into the (?) jar, as that is what occurs mathematically with temperature data homogenization.
That’s my ball of yarn for others to pick apart or add a few wraps.
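For the thermal-expansion sizing mentioned above, a rough back-of-envelope sketch; the jar size, temperature rise, and tube diameter are assumed values, and the expansion coefficient is a nominal figure for water near room temperature:

```python
import math

# Assumed values, purely for sizing the apparatus.
jar_volume_ml = 500.0        # filled-to-the-top jar
delta_T = 15.0               # degC rise from the aquarium heater
beta = 2.1e-4                # volumetric expansion of water near 20 degC, per degC
tube_inner_diameter_mm = 3.0

# Volume of colored water pushed out through the tube.
expelled_ml = jar_volume_ml * beta * delta_T

# How far that volume travels along the tube (1 ml = 1000 mm^3).
tube_area_mm2 = math.pi * (tube_inner_diameter_mm / 2) ** 2
travel_mm = expelled_ml * 1000.0 / tube_area_mm2

print(f"expelled: {expelled_ml:.1f} ml, travel in tube: {travel_mm / 10:.0f} cm")
```

Under these assumptions each jar only dumps a milliliter or two, which is worth knowing before choosing jar sizes and tube diameters.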
Good try. Based on the complexity of your design, I would simply add a tesla coil emitting sparks for added effect.
I am joking. At least you have come up with something.
You have come up with an idea but realise its shortcomings.
Thanks, Alex. Yes, just spitballing, but enough of it and someone is bound to see the path forward.
The box is painted matt black inside.
Dimensions of box- essentially a cube about one foot on a side.
The mirror should be a sheet of mirror plastic (easy to cut to whatever size you want and also not fragile).
LEDs should be white and adjusted for intensity.
The membrane should be a reflective type and stretched enough to be uniform. It could possibly be a white elastic sheet.
The video camera is a small variety that is connected to a laptop/PC.
Pressing on the membrane with your finger should make a dark spot appear on the screen. Greater or lesser pressure would make the spot larger or smaller.
The analogy is not the pressure applied but the size and intensity of the dot and how it can be reduced by homogenising with the background. Several pressure points at different ‘intensities’ can be introduced simultaneously.
The apparatus is not fragile and the components can be easily replaced/retensioned. Filters and solutions with pumps can get very complicated.
Almost any distribution that is “non-normal” requires special handling.
This came up when our mate Nick Stokes was blending temperatures over land and water cells on one of his stupid blog pages.
Don’t mention Old Nick. When you mention Old Nick then Old Nick comes.
Paraphrasing a Chinese saying.
I would second this. From Tmax/Tmin to seasonal differences to hemisphere differences, there are large temperature differences that result in non-normal distributions. Anomalies attempt to remove the variance between absolute temperatures so averaging can be used and “trends” can be discerned. Statistically this removes so much information that any calculated metric is questionable as to its value.
A way to show this is to carefully make trends at various stations in various regions. For every station that has little warming or even cooling over the last 150 years, there must also be a station with twice the warming to reach the average that is portrayed in the GAT.
I have become convinced that this is only possible by using UHI stations that have considerable warming.
Anthony, humanity is carrying out major climate experiments of AGW already and no-one’s looking.
Please, please look at the way enclosed bodies of water are warming at double or treble the rate expected. The Sea of Marmara is a perfect example.
In articles at the blog TCW Defending Freedom I suggest the oil, surfactant and lipid smoothing of the water surface is lowering albedo, reducing evaporation and preventing the production of salt aerosols by wave suppression.
You could start by repeating Benjamin Franklin’s oil drop experiment on Mount Pond. All you need is a lake and 5ml of olive oil.
Latest blog post is titled Cold Comfort.
A search will find an image of a smooth on FEN Lake at UEA which you will find amusing, right under their noses.
JF
Anthony, closer to home at WUWT you will find an entry to the competition you ran some time ago which covers the territory with references. Since then the role of oleaginous plankton has been emphasised by Marmara’s outbreak of mucilage caused by diatom blooms. I believe that a couple of the Great Lakes are warming at a high rate.
Remember Tom Wigley’s ‘Why the blip?’ Sunken oil carriers during WWII.
I’ve seen a smooth over a hundred miles across, from abeam Porto to a couple of hundred miles short of Madeira, tens of thousands of square miles. Franklin should be alive this day.
Any non-CO2 contribution to AGW reduces the need to trash civilisation. This mechanism explains a lot about the anomalies in the one control knob theory and can be addressed by reducing oil surfactant and sewage dumping into rivers and the oceans.
JF
see https://en.wikipedia.org/wiki/UEA_Broad, top image, for the UEA smooth.
JF
Anthony,
Ross McKitrick is your expert go-to source on how to do this. As you know, he has studied all the paleoclimate quackery statistical methods the “consensus” has used to create hockey sticks.
Anthony, can you point me to an up to date compilation of classifications (1-5) for California weather stations? I am conducting some research on the 102 California stations with data extending back at least 90 years.
Homogenization includes two different corrections:
1- Correction for breakpoints – e.g. when a station is moved. That is corrected by PHA.
2- Correction for trends – e.g. UHI effect or that trees have grown around the station.
Both those corrections make sense if they are applied in a correct way, but they open possibilities if the adjuster wants to “prove” global warming.
Some years ago I found a paper describing NOAA’s homogenization method. Unfortunately I have not been able to find it again.
What really struck me was that they included a “regional climate trend” in their method.
In short it worked like this: the model included a climate trend saying that the temperature should increase by X °C per year. If a station did not show that increase, there was something wrong with the measurement. The value should have been higher (due to the climate trend).
The algorithm works such that it keeps today’s value unchanged. Consequently, the historical values are adjusted downwards.
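Purely to illustrate the mechanism described above (not a documented NOAA method), here is what “keep today’s value, shift the past toward an assumed regional trend” looks like on a made-up, trendless station record:

```python
import numpy as np

rng = np.random.default_rng(0)

years = np.arange(1950, 2021)
station = 14.0 + rng.normal(0, 0.3, years.size)   # a trendless station record (made up)

assumed_trend = 0.02                               # assumed "regional climate trend", degC/year
observed_trend = np.polyfit(years, station, 1)[0]

# Keep the most recent value unchanged; shift earlier values downward by just
# enough that the adjusted series matches the assumed trend.
correction = (assumed_trend - observed_trend) * (years - years[-1])
adjusted = station + correction

print(f"raw trend:      {observed_trend * 10:+.3f} degC/decade")
print(f"adjusted trend: {np.polyfit(years, adjusted, 1)[0] * 10:+.3f} degC/decade")
print(f"latest value unchanged: {np.isclose(station[-1], adjusted[-1])}")
```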
I have tried hard to find a current, detailed description of NOAA’s homogenization algorithm in order to see if they really use the “climate trend” correction. Since I have not found the description, I do not really know if they apply this today.
Tony Heller has done some very interesting work where he shows that the corrections are proportional to the increase in atmospheric carbon dioxide.
One can suspect that the algorithm includes a “climate trend” that is assumed to be proportional to atmospheric CO2. If that is the case it can explain most of the so called “adjustments”.
My question is thus whether NOAA uses a “climate trend” correction in its homogenization algorithm.
My own experience of UHI, mainly from rural France, is that it affects even small villages and hamlets of a few hundred people. In areas like the East Midlands of England, where towns and villages have little or no open space between them, it is much harder to detect without the aid of a thermometer.
The other thing about UHI is it is less noticeable on windy days/nights.
I think that virtually the whole of England is affected by UHI, as is Central Scotland, and South Wales. The Benelux countries will have similar issues.
How you model the wind related contamination I don’t know.
Don’t think your proposed experiment will show much if anything. You have to work with the real world data, and show that there is an effect in it. What you’re proposing is argument by analogy, which rarely convinces and never proves anything anyway.
Could you not just iteratively plot, for each station, what its temperature is with and without homogenization?
The proposition you are trying to falsify is this: with homogenization, the mean temperature of the set of stations is higher than the mean of the same set of stations with no homogenization applied.
So if you take homogenization off one station at a time, and see what the results are, noting the temp of each station for which it’s been removed as you do so, you should be able to see whether the mean of the set rises or falls.
Maybe I am missing something. In any case, I think the most important first step is to state clearly in quantitative terms what the proposition is that you are seeking to falsify.
Don’t even think you need to do it one at a time, do you? Why not just take the entire set and take the mean with and without homogenization? Is it higher or lower? Should be quite simple.
If I understand the argument, you are saying that when you take 10 stations’ raw scores and homogenize, the average temperature of the ten rises. This should be quite easy to check without any physical experiment.
Or have I missed something?
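A toy version of that check, on made-up data, which also shows why the answer can be subtle: pulling stations toward the network mean leaves the overall mean unchanged while still warming the well-sited minority. The station counts, trends, and the 50% blending below are all invented for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

years = np.arange(1950, 2021, dtype=float)
# Synthetic network: 16 stations drifting warm (stand-ins for UHI-affected
# neighbors) and 4 staying flat (stand-ins for well-sited stations).
trends = np.r_[rng.uniform(0.01, 0.03, 16), rng.uniform(-0.002, 0.002, 4)]  # degC/year
raw = pd.DataFrame(
    trends[:, None] * (years - years[0]) + rng.normal(0, 0.2, (20, years.size)),
    columns=years,
)

# Toy "homogenization": pull every station halfway toward the network mean.
adjusted = 0.5 * raw + 0.5 * raw.mean(axis=0)

def trend(series):
    """OLS trend of an annual series, in degC/decade."""
    return 10.0 * np.polyfit(years, series.values, 1)[0]

print("network mean trend, raw vs adjusted: "
      f"{trend(raw.mean(axis=0)):+.2f} vs {trend(adjusted.mean(axis=0)):+.2f}")
print("flat-station trend, raw vs adjusted: "
      f"{trend(raw.iloc[16:].mean(axis=0)):+.2f} vs {trend(adjusted.iloc[16:].mean(axis=0)):+.2f}")
```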
If you don’t take the uncertainties of the measurements into account, then you won’t be able to tell if the mean of the set rises or falls by an amount outside the propagated uncertainty.
With all the different ideas, it occurs to me that another persuasive factor can be to see the same point made by a series of different analogies. Picking two (or even three) ideas could be more powerful than just one.
This also allows for a kid-friendly chocolate milk experiment and a more elaborate instrumented experiment that “confirms our simple experiment”.
A fundamental problem with homogenization is the rate at which the correlation decays with distance from the source station. It is assumed for simplicity to imagine a circle around the measured point, with the fall-off in correlation corresponding only to the distance from the centre. However, in the real world this is demonstrably not so!
For example, Australian stations correlate poorly east or west of each other but are more strongly correlated north or south.
The only way to approach reality is to allow for this anisotropy and the best way to do that is to measure the covariance of temperature in the real world via observations of the shape of its atmospheric fields and by mapping the real terrain!
Anthony, you could make a compelling but simple, perhaps animated, demonstration on paper with a number of points marking stations, with circles or ellipses around them, comparing the result when the correlation decay is circular or elliptical. This would demonstrate the error in homogenised data created by assuming that temperature fields are isotropic*, which assumes a constant correlation decay in all directions.
*Hansen and Lebedeff (1987) assume the isotropy of the covariance of temperature!
**Jones, DA & Trewin, Blair. (2000). The spatial structure of monthly temperature anomalies over Australia.
The daily temperature profile is based on the rotation of the earth, which determines the angle of the sun. The profile is close to sin(t) during the day. It’s not quite a sine at night, but it’s close.
If you take two locations separated east-west you have two profiles, sin(t) and sin(t + a). The correlation of those two curves is cos(a). As I have calculated it that’s about 50 miles for a correlation factor of 0.8. Anything less than 0.8 I would consider to be poor correlation.
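For reference, the cos(a) figure follows from averaging the two idealized profiles over a full daily cycle (both have zero mean and variance 1/2):

\[
\operatorname{corr}\!\big(\sin t,\ \sin(t+a)\big)
= \frac{\tfrac{1}{2\pi}\int_{0}^{2\pi}\sin t\,\sin(t+a)\,dt}{\tfrac{1}{2}}
= \cos a .
\]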
The north/south (latitudinal) factor is also a sine function because the angle of incidence of the sun’s insolation changes as you move away from the equator. That’s probably a cos(a) function. At the equator you get maximum insolation, i.e. cos(0), and as you go north or south the insolation amount goes down. The difference between two locations would be cos(a) and cos(a + b). I haven’t calculated that correlation factor, maybe sin(b)?
This is just distance. Of course for sin(t + a) “a” is a function of several factors such as elevation, humidity, terrain, distance, etc.
Turbidity mixing and color mixing cannot be measured in the same way. Let’s take turbidity as a measure of light passing through something: you measure total light. On the other hand, mixing different colors will need measurements at different wavelengths, and this introduces too much confusion and too many explanations. My bet would be on something creating “shades of gray”, or of color in a certain small range of wavelengths, in a way that the increase in gray or color can be directly related to an increase (or a decrease) in temperature.
Another vote for chocolate milk. 😜
If you mix different liquid colors, you’re going to end up with black eventually.
If color is used for the visual analogy, it will need to be the same color but with different amounts added, to produce stronger or lighter colors that would easily show the differences visually. (Communicate the point “at a glance”.)
What Anthony didn’t tell is what resources he has available.
And yet another vote for chocolate milk. 😜😜
I wondered about your votes for chocolate milk so I did a Ctrl-F and put in “chocolate milk”.
I seem to have missed a number of comments about it.
Yummm …er … Hmmm … if the desired visual is to show that the empty “vial” in the center is filled in with some from the surrounding vials, I don’t think chocolate milk will show enough of a contrast. (Unless one or more of the vials was straight chocolate syrup.) 😎 😎
What adjustments do you expect to have a bell curve +/- adjustment with a net-zero result?
What adjustments do you expect to have a trend that’s almost linear going up?
How likely would you expect the adjustments to be responsible for the majority of the calculated temperature rise?
How much of the calculated temperature has been infilled/adjusted by using bogus climate models that provide circular validation (predict a rise, so the infill/adjustments enforce that assumption and verify the model: circular reasoning)?
I think a good visual analogy may be sanding a wood part with some functionality.
Homogenizing may be likened to sanding out the roughness of the surface of a wood part. The roughness would be “unwanted noise on the signal”; we just prefer to have a smooth surface on our wood part.
But sanding should not change the functional shape of the part, because then the part loses its functional meaning; it does not work any more. So if our part has a relief which is functional, like the relief in a key that opens a lock, or teeth that mesh with another part to transmit movement (rotation, for instance), then that relief is significant, and sanding can erode the meaningful shape of the part.
So homogenization, like sanding, can eliminate the meaning of the signals we have.
In the case of the temperature record, the stations contain at least two meaningful signals: one is the global temperature evolution (which exists and depends on different factors, including solar irradiation as well as CO2, CH4 and H2O vapour), aka Global Warming, sorry, Climate Change; but there is also a strong signal linked to urbanization-induced warming (which differs between rural areas and city areas, obviously), and that includes heating, air conditioning, and industrial heat sources.
Homogenization that “kills” the urbanization warming signal detracts from the dataset’s value and changes the average. When homogenizing, the average should be preserved, as well as local signals, and the algorithm should only chase noise generated by changes in sensors, for instance, or station moves (very careful here).
Thanks for your work. I appreciate very much your sensible approach to these issues.
The need for the PHA comes from the bad idea of using the average of long, continuous temperature (anomaly) time series. You should refute the continuous-time-series method by using a method that does not use time series and thus does not need homogenization.
It goes like this: calculate the regional area-weighted average temperature each month. A new triangular grid is needed each month for the region. That takes care of missing temperature measurements, so you don’t have to take care of continuity at each measurement location. As a result you will get the best estimate of the average temperature for the studied region for each month. So, you have a continuous temperature time series without any homogenization.
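A minimal sketch of that month-by-month triangulation idea, using a planar Delaunay grid and triangle-area weights (a real implementation would work on the sphere and clip the grid to the region boundary):

```python
import numpy as np
from scipy.spatial import Delaunay

def monthly_regional_mean(lons, lats, temps):
    """Area-weighted regional mean temperature for one month.

    Builds a triangular grid from whichever stations reported this month,
    weights each triangle by its (planar) area, and averages the temperatures
    of its three vertices. Missing stations simply drop out of that month's grid.
    """
    pts = np.column_stack([lons, lats])
    temps = np.asarray(temps, dtype=float)
    tri = Delaunay(pts)
    weighted_sum, total_area = 0.0, 0.0
    for simplex in tri.simplices:
        a, b, c = pts[simplex]
        area = 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))
        weighted_sum += area * temps[simplex].mean()
        total_area += area
    return weighted_sum / total_area

# Example with made-up station positions (degrees) and monthly means (degC):
lons = np.array([10.0, 12.5, 11.0, 13.5, 9.5])
lats = np.array([50.0, 51.0, 52.5, 49.5, 51.5])
temps = np.array([4.1, 3.6, 2.9, 4.8, 3.2])
print(f"area-weighted regional mean: {monthly_regional_mean(lons, lats, temps):.2f} degC")
```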
Homogenization should include site quality ratings and high rated sites should be used to influence low rated site readings and not vice versa.
If homogenization is done blindly and there are more low-quality sites than high-quality sites, the reverse happens: the high-quality sites’ adjusted data are corrupted, while the low-quality sites get much smaller corrections simply because they are in the majority.
Chart 1 clearly shows this. The corrected data tracks the low-quality stations, and the high-quality stations are labeled as the bald-faced liars.
Curiously, the adjusted data trend is even a bit higher than what the low quality stations indicate. (Those who write the homogenization code have also put their finger on the scales?)
The GHG hypothesis is wrong if it says adding CO2 MUST raise average temperature. That is a simple matter to prove false mathematically.
There is a 4th power relation between radiation and temperature but a linear relation between average temperature and actual temperature.
Thus by changing the distribution of temperature while decreasing the average temperature you can actually increase the radiation!!!! Completely opposite to GHG theory.
This follows from the Hölder inequality for sums, for those seeking a formal proof.
The missing ingredient is variance. By reducing the variance pair-wise homogenization maintains the average temperature but reduces calculated outgoing radiation. However, if one then corrects for radiation, since energy must be conserved, you must increase the average temperature to maintain the same radiative balance.
At first look, pairwise homogenization looks like a way to increase average temps by manipulating variance while maintaining the same radiative balance.
A neat trick, whoever figured it out. Of course it is mathematical trickery (nonsense), but the layman would never catch it.
Take two identical objects. Temp 0K and 10K. Average temp 5K. Radiation is proportional to 0^4 + 10^4 = 10,000.
Now homogenize these 2 objects to 5K each. Average temp remains 5K. But radiation is proportional to 5^4 + 5^4 = 1250.
But hang on. Homogenization has resulted in a loss of energy, which violates the fundamental requirement that energy must be conserved.
So homogenization required that we raise the average temperature so as to not violate the conservation of energy.
How much? Well it turns out we must raise the average temp from 5K to 8.4K during homogenization to avoid creating/destroying energy.
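For those checking the arithmetic, holding the total (T^4-proportional) emission of the pair constant gives:

\[
2\,T_h^{4} = 0^{4} + 10^{4}
\quad\Rightarrow\quad
T_h = \left(\tfrac{10^{4}}{2}\right)^{1/4} \approx 8.41\ \mathrm{K},
\]

so the homogenized pair must sit near 8.4 K, not 5 K, to radiate the same total energy.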
I would have thought the error was their calculation of radiative forcing when based on average temperatures instead of smaller grid squares or actual values. If you underestimate the longwave radiation leaving Earth, your model will run hot, amongst other problems.
Actual energy content of a volume of space/matter depends on many factors. For gases it includes the mixture of gases and the water content, which each have their own heat capacity; pressure changes temperature without changing energy content per kg (but higher temps will increase radiation and convection); and higher pressure means more matter in the same volume, so it contains more energy per volume. Temperature is not a universal scale or measure of energy.