Correcting and Calculating the Size of Adjustments in the USHCN
By Anthony Watts and Zeke Hausfather
A recent WUWT post included a figure which showed the difference between raw and fully adjusted data in the United States Historical Climatology Network (USHCN). The figure used in that WUWT post was from Steven Goddard’s website, and in addition to the delta from adjustments over the last century, it included a large spike of over 1 degree F for the first three months of 2014. That spike struck some as unrealistic, but knowing how much adjustment goes into producing the final temperature record, others weren’t surprised at all. This essay is about finding the true reason behind that spike.
One commenter on that WUWT thread, Chip Knappenberger, said he didn’t see anything amiss when plotting the same data in other ways, and wondered in an email to Anthony Watts if the spike was real or not.
Anthony replied to Knappenberger via email that he thought it was related to late data reporting, and later repeated the same comment in an email to Zeke Hausfather, while simultaneously posting it to the blog of Nick Stokes, who had also been looking into the spike.
This spike at the end may be related to the “late data” problem we see with GHCN/GISS and NCDC’s “state of the climate” reports. They publish the numbers ahead of dataset completeness, and they have warmer values, because I’m betting a lot of the rural stations come in later, by mail, rather than the weathercoder touch tone entries. Lot of older observers in USHCN, and I’ve met dozens. They don’t like the weathercoder touch-tone entry because they say it is easy to make mistakes.
And, having tried it myself a couple of times, and being a young agile whippersnapper, I screw it up too.
The USHCN data seems to show completed data where there is no corresponding raw monthly station data (since it isn’t in yet), which may be generated by infilling/processing….resulting in that spike. Or it could be a bug in Goddard’s coding of some sort. I just don’t see it, since I have the code. I’ve given it to Zeke to see what he makes of it.
Yes the USHCN 1 and USHCN 2.5 have different processes, resulting in different offsets. The one thing common to all of it though is that it cools the past, and many people don’t see that as a justifiable or even an honest adjustment.
It may shrink as monthly values come in.
Watts had asked Goddard for his code to reproduce that plot, and he kindly provided it. It consists of a C++ program to ingest the USHCN raw and finalized data and average it to create annual values, plus an Excel spreadsheet to compare the two resultant data sets. Upon first inspection, Watts couldn’t see anything obviously wrong with it, nor could Knappenberger. Watts also shared the code with Hausfather.
After Watts sent the email to him regarding the late reporting issue, Hausfather investigated that idea, running some different tests and creating plots which demonstrate how the spike arose from that late-reporting problem. Stokes came to the same conclusion after Watts’ comment on his blog.
Hausfather, in the email exchange with Watts on the reporting issue wrote:
“Goddard appears just to average all the stations’ readings for each year in each dataset, which will cause issues since you aren’t converting things into anomalies or doing any sort of gridding/spatial weighting. I suspect the remaining difference between his results and those of Nick/myself is due to that. Not using anomalies would also explain the spike, as some stations not reporting could significantly skew absolute temps because of baseline differences due to elevation, etc.”
From that discussion came the idea to do this joint essay.
To figure out the best way to estimate the effect of adjustments, we look at four different methods (a simplified code sketch contrasting the first and third follows the list):
1. The All Absolute Approach – Taking absolute temperatures from all USHCN stations, averaging them for each year for raw and adjusted series, and taking the difference for each year (the method Steven Goddard used).
2. The Common Absolute Approach – Same as the All Absolute Approach, but discarding any station-months where either the raw or the adjusted series is missing.
3. The All Gridded Anomaly Approach – Converting absolute temperatures into anomalies relative to a 1961-1990 baseline period, gridding the stations in 2.5×3.5 lat/lon grid cells, applying a land mask, averaging the anomalies for each grid cell for each month, calculating the average temperature for the whole contiguous U.S. by an area-weighted average of all grid cells for each month, averaging monthly values by year, and taking the difference each year between the resulting raw and adjusted series.
4. The Common Gridded Anomaly Approach – Same as the All Gridded Anomaly Approach, but discarding any station-months where either the raw or the adjusted series is missing.
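To make the contrast concrete, here is a minimal sketch, in Python, of the difference between the All Absolute Approach and a simplified gridded-anomaly approach. This is not the code behind the figures in this post; the record format is a hypothetical placeholder, and the land mask and month-by-month cell averaging used in the real analysis are omitted.

# Minimal sketch (not the authors' code): method 1 averages whatever
# absolute values are reported each year; a simplified method 3 converts
# to anomalies against a 1961-1990 baseline, grids, and area-weights.
from collections import defaultdict
import math

# Each record is a hypothetical tuple: (station_id, lat, lon, year, month, temp_F).

def all_absolute_annual_mean(records):
    """Method 1: average every reported absolute value in each year."""
    sums, counts = defaultdict(float), defaultdict(int)
    for _, _, _, year, _, temp in records:
        if temp is not None:
            sums[year] += temp
            counts[year] += 1
    return {y: sums[y] / counts[y] for y in sums}

def gridded_anomaly_annual_mean(records, base=(1961, 1990), cell=(2.5, 3.5)):
    """Simplified method 3: per-station anomalies, then an area-weighted
    average over lat/lon grid cells (no land mask in this sketch)."""
    # Per-station monthly baseline climatology over the base period.
    clim_sum, clim_n = defaultdict(float), defaultdict(int)
    for sid, _, _, year, month, temp in records:
        if temp is not None and base[0] <= year <= base[1]:
            clim_sum[(sid, month)] += temp
            clim_n[(sid, month)] += 1
    # Accumulate anomalies per (grid cell, year).
    cell_sum, cell_n = defaultdict(float), defaultdict(int)
    for sid, lat, lon, year, month, temp in records:
        key = (sid, month)
        if temp is None or clim_n[key] == 0:
            continue
        anom = temp - clim_sum[key] / clim_n[key]
        gcell = (math.floor(lat / cell[0]), math.floor(lon / cell[1]))
        cell_sum[(gcell, year)] += anom
        cell_n[(gcell, year)] += 1
    # Weight each cell by the cosine of its center latitude, average per year.
    year_sum, year_wt = defaultdict(float), defaultdict(float)
    for (gcell, year), total in cell_sum.items():
        weight = math.cos(math.radians((gcell[0] + 0.5) * cell[0]))
        year_sum[year] += weight * total / cell_n[(gcell, year)]
        year_wt[year] += weight
    return {y: year_sum[y] / year_wt[y] for y in year_sum}

Differencing the raw and adjusted outputs of each function gives the flavor of methods #1 and #3; the gridded-anomaly version is far less sensitive to which stations happen to have reported in a given month.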
The results of each approach are shown in the figure below; note that the spike has been reproduced using method #1, the All Absolute Approach:
The latter three approaches all find fairly similar results; the third method (The All Gridded Anomaly Approach) probably best reflects the difference in “official” raw and adjusted records, as it replicates the method NCDC uses in generating the official U.S. temperatures (via anomalies and gridding) and includes the effect of infilling.
The All Absolute Approach used by Goddard gives a somewhat biased impression of what is actually happening: using absolute temperatures when the raw and adjusted series do not have the same stations reporting each month introduces errors due to differing station temperatures (caused by elevation and similar factors). Using anomalies avoids this issue by looking at the difference from the mean for each station rather than the absolute temperature. This is the same reason we use anomalies rather than absolute temperatures in creating regional temperature records: anomalies deal with changing station composition.
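A toy illustration of that composition effect, with invented numbers rather than USHCN data: two stations with different climatologies, one of which fails to report in the second month.

# Invented example: a warm valley station and a cool mountain station,
# neither of which actually changes between January and February.
valley = {"jan": 60.0, "feb": 60.0}
mountain = {"jan": 45.0, "feb": None}    # the mountain report arrives late

# Absolute average: 52.5 F in January vs. 60.0 F in February -- a spurious
# +7.5 F "warming" caused only by the missing mountain report.
jan_abs = (valley["jan"] + mountain["jan"]) / 2
feb_abs = valley["feb"]                  # only one station reporting

# Anomaly average (each station relative to its own January value here):
# 0.0 F in both months, so the missing report adds no false signal.
jan_anom = ((valley["jan"] - 60.0) + (mountain["jan"] - 45.0)) / 2
feb_anom = valley["feb"] - 60.0
print(jan_abs, feb_abs, jan_anom, feb_anom)   # 52.5 60.0 0.0 0.0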
The figure shown above also incorrectly deals with data from 2014. Because it is treating the first four months of 2014 as complete data for the entire year, it gives them more weight than other months, and risks exaggerating the effect of incomplete reporting or any seasonal cycle in the adjustments. We can correct this problem by showing lagging 12-month averages rather than yearly values, as shown in the figure below. When we look at the data this way, the large spike in 2014 shown in the All Absolute Approach is much smaller.
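A trailing 12-month mean of the adjusted-minus-raw differences can be computed along these lines (a sketch; monthly_diff is assumed to be a chronologically ordered list of monthly difference values):

def trailing_12_month_mean(monthly_diff):
    """Rolling mean over the previous 12 months, so a partial year such as
    January-April 2014 is not weighted as if it were a complete year."""
    out = []
    for i in range(11, len(monthly_diff)):
        window = monthly_diff[i - 11:i + 1]
        out.append(sum(window) / 12.0)
    return out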
There is still a small spike in the last few months, likely due to incomplete reporting in April 2014, but it’s much smaller than in the annual chart.
While Goddard’s code and plot produced a mathematically correct result, the procedure he chose (#1, the All Absolute Approach, comparing absolute raw USHCN data with absolute finalized USHCN data) was not an appropriate one. It allowed non-climatic differences between the two datasets, most likely caused by missing data (late reports), to create the spike artifact in the first four months of 2014, and it somewhat overstated the difference between adjusted and raw temperatures by using absolute temperatures rather than anomalies.
Hi Anthony,
It looks like there is still a little less than 1°F of temperature adjustment with your last 3 methods since the ’40s. Why did you not comment on that?
REPLY: I did, see –“The one thing common to all of it though is that it cools the past, and many people don’t see that as a justifiable or even an honest adjustment.” – Anthony
So there is a need to adjust some data even after the year 2000. Is there any chance that the meteorologist in the USA will learn to report their results so that no adjustments are necessary?
And will the temperatures from about 2005 to 2014, which at this point of time seem unadjusted, stay that way?
So what I take away from this analysis is that, under any of the 4 approaches, the raw temperature data has been adjusted to reflect an additional temperature increase of about 1 degree F over the past 60 to 70 years. Why is this? It does not on its face seem to be reasonable. It needs to be explained in a clear and believable fashion so that I and others do not conclude that, intentional or not, it reflects the biases of those who want to believe in the worst-case scenarios of climate change.
While Goddard’s analysis may have exaggerated the differences between raw and adjusted data, I still find it more than curious that past temperatures are lowered and present or near-present temperatures are adjusted higher, always resulting in a rising trend. Sorry, I find it hard to believe that this is coincidental.
Shouldn’t the effects of UHI dictate that adjustments should be cooling the present to reflect the UHI phenomenon?
The fact that they adjust data at all means they aren’t measuring it correctly.
Andrew
So, an increase of ~ 0.7 deg F (< 0.4 deg C) in 115 years, of which 1/2 (~ 0.2 deg C) can be attributed to mankind. Phew, it's making me sweat!
Why is the most negative adjustment centered around 1940? Seems convenient to apply the largest cooling adjustment to the hottest period of the century because it came before significant CO2.
From here:
http://www.ncdc.noaa.gov/oa/climate/research/ushcn/
Under “Station siting and U.S. surface temperature trends”
“Photographic documentation of poor siting conditions at stations in the USHCN has led to questions regarding the reliability of surface temperature trends over the conterminous U.S. (CONUS). To evaluate the potential impact of poor siting/instrument exposure on CONUS temperatures, Menne et al. (2010) compared trends derived from poor and well-sited USHCN stations using both unadjusted and bias-adjusted data. Results indicate that there is a mean bias associated with poor exposure sites relative to good exposure sites in the unadjusted USHCN version 2 data; however, this bias is consistent with previously documented changes associated with the widespread conversion to electronic sensors in the USHCN during the last 25 years (Menne et al. 2009). Moreover, the sign of the bias is counterintuitive to photographic documentation of poor exposure because associated instrument changes have led to an artificial negative (“cool”) bias in maximum temperatures and only a slight positive (“warm”) bias in minimum temperatures.”
Are they saying that a poorly sited station, say one near an airport tarmac, does this:
On a day with an actual high of 95 F, it reports 98 F, and later when the low is 85 F it reports 89 F, so the max temp appears to be reported “only” 3 degrees high while the low is 4 degrees high. Would the average bias then be 3.5 degrees, making the bias in the max temp (3 degrees) “cool” relative to the min temp’s (4 degrees) “warm” bias?
Just wondering.
This issue speaks clearly to the problem of method. Without a standard, anyone can pretty much mash the data as they like and adjust away. This makes it easy for alarmists to start their caterwauling. Without a standard there is a justifiable call of foul when anyone publishes an adjusted data set. So sorry Steve Mosher, but your BEST ain’t! It also legitimizes the argument that an average temperature is meaningless and has no purpose. Perhaps an interim solution would be to publish a Steve Goddard-style absolute raw vs. adjusted comparison as a fair warning device. I think his method has the most merit in terms of showing what is going on. What it demonstrates is the total adjustment and the fact that stations are massaged even when they are putting out good data. I think we should remember the only actual purpose for a large-area average is to observe change over time. The more you hash and adjust, the less likely you are to actually observe that difference and the more likely you are to see artificial, non-real changes. Particularly in a chaotic system.
v/r,
David Riser
Gridding, infilling, adjusting, why is this so complicated?
Step one: read thermometer.
Step two: compare to previous reading.
Everything else is nonsense.
If you don’t have data, don’t make it up.
Honestly I can’t wait till Anthony’s paper comes out!
I mean, it’s hard to take steps 2-400 of temperaturology seriously when step 1 is known to be wrong.
Andrew
Interesting post. However, could I suggest not misusing the term “absolute temperature” in this context? It has a well-defined meaning: the temperature in kelvin. It does not seem an appropriate choice for referring to a real temperature as opposed to a temperature anomaly.
Could I suggest “actual temperature” to emphasise that you are referring to a real temperature measurement and not anomalies, where that is needed.
Two weeks ago I was called crazy on Steve Goddard’s blog for suggesting that his new hockey stick of adjustments was indeed due to late data reporting, as I pressed him to also display both raw/final plots and single-station plots.
(A) I wrote:
“There’s just no easy five minute way to confirm your claim, so far, something that can be conveniently plotted using data plotted outside of your own web site. Normally such a jump would be considered a glitch somewhere, possibly a bug, and such things periodically lead to correspondence and a correction being issued. But here, motive is attached and around in circles skeptics spin, saying the same thing over and over that few outside of the blogosphere take seriously. I can’t even make an infographic about this since I am currently not set up as a programmer and I can’t exactly have much impact referencing an anonymous blogger as my data source. How do I easily DIY before/after pairs of plots to specific stations? Or is this all really just a temporary artifact of various stations reporting late etc? As a serious skeptic I still have no idea since the presentation here is so cryptic. I mean, after all, to most of the public, “Mosher” is an unknown, as is he to most skeptical Republicans who enjoy a good update for their own blogs.”
(B) After pressing the issue, asking for before/after plots, Goddard replied:
“I see. Because you have zero evidence that I did anything wrong, I must be doing something wrong. / Sorry if the data upsets your “real skeptic” pals.”
“Take your paranoid meds.”
“I believe you have completely lost your mind, and are off in Brandonville.”
(C) His cheerleader team added:
“Jesus, is PCP unpredictable.”
“I’m gonna have to call transit police to have you removed from the subway.”
http://stevengoddard.wordpress.com/2014/04/26/noaa-blowing-away-all-records-for-data-tampering-in-2014/
(D) I think Goddard has one of the most important blogs of all, since his ingenious variations on a theme are captivating and often quite entertaining, and he’s great at headline writing. He is responsible for many DrudgeReport.com links to competent skeptical arguments, and Marc Morano of ClimateDepot.com, along with Instapundit.com and thus many conservative bloggers, often picks up on his posts. He has recently added a donation button and certainly deserves support, overall. He is highly effective exactly because his output is never buried in thousand-word essays full of arcane plots, as is necessarily the case with this clearinghouse mothership blog.
JohnWho says:
May 10, 2014 at 7:06 am
From here:
http://www.ncdc.noaa.gov/oa/climate/research/ushcn/
Under “Station siting and U.S. surface temperature trends”
“Photographic documentation of poor siting conditions at stations in the USHCN has led to questions regarding the reliability of surface temperature trends over the conterminous U.S. (CONUS). To evaluate the potential impact of poor siting/instrument exposure on CONUS temperatures, Menne et al. (2010) compared trends derived from poor and well-sited USHCN stations using both unadjusted and bias-adjusted data. Results indicate that there is a mean bias associated with poor exposure sites relative to good exposure sites in the unadjusted USHCN version 2 data; however, this bias is consistent with previously documented changes associated with the widespread conversion to electronic sensors in the USHCN during the last 25 years (Menne et al. 2009). Moreover, the sign of the bias is counterintuitive to photographic documentation of poor exposure because associated instrument changes have led to an artificial negative (“cool”) bias in maximum temperatures and only a slight positive (“warm”) bias in minimum temperatures.”
Are they saying that a poorly sited station, say one near an airport tarmac, does this:
On a day with an actual high of 95 F, it reports 98 F, and later when the low is 85 F it reports 89 F, so the max temp appears to be reported “only” 3 degrees high while the low is 4 degrees high. Would the average bias then be 3.5 degrees, making the bias in the max temp (3 degrees) “cool” relative to the min temp’s (4 degrees) “warm” bias?
Just wondering.
——————–
They’re saying that their instrument calibration history is so sketchy that they have to try to adjust for an unmeasured (and probably unmeasurable) instrument bias over time, complicated by new biases introduced with new instruments, in uncalibrated and nonstandardized data environments. In short, the record is GIGO…
…so, we’re still too stupid to read a thermometer
There was just this short window around 2002, when we were able to
It’s a hockey stick! How dare you question it!
All adjustments are negative and generally greater the farther back from present. The supposedly “problematic” 1940’s show greater adjustments in all but S. Goddard’s method. This doesn’t explain the rationale of seemingly always cooling the past with lesser adjustments as the climate “warms.”
Nik, Goddard is correct; he did nothing wrong, and he explained his methodology clearly. Is the hockey stick real in this case? It would depend on your standard….. oh, there isn’t one. So yes, over time the hockey stick may go down, but that will be because of the “readjustment” of the record and the completion of the 2014 year. There is probably a good case that it sensationalizes the issue, which creates discussion. Hmmm, not a bad thing, I think. Thanks, Anthony, for this continuation of the discussion!
Much of the apparent spike in absolute temperatures is an artifact of infilling coupled with late reporting. USHCN comprises 1218 stations, and was a selected subset of the larger ~7000-station coop network, chosen because nearly all the stations have complete data for 100 years or more. After homogenization, NCDC infills missing data points with the average monthly climatology plus a distance-weighted average of surrounding station anomalies. Unfortunately not all stations report on time, so in the last few months a lot more stations are infilled. These infilled values will be replaced with actual station values once those stations report.
Infilling has no real effect on the overall temperature record when calculated using anomalies, as it mimics what already happens during gridding/spatial interpolation. However, when looking at adjusted absolute temperatures vs. raw absolute temperatures it can wreak havoc on your analysis, because suddenly for April 2014 you are comparing 1218 adjusted stations (with different elevations and other factors driving their climatology) to only 650 or so raw stations reporting so far. Hence the spike.
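A rough sketch of that style of infilling, purely for illustration (this is a simplification, not NCDC’s actual pairwise-homogenization code, and the neighbor anomalies and distances are assumed to be precomputed):

def infill_value(month_climatology, neighbor_anoms, neighbor_dists):
    """Estimate a missing station-month as the station's own monthly
    climatology plus an inverse-distance-weighted mean of the anomalies
    reported by nearby stations."""
    weights = [1.0 / d for d in neighbor_dists]
    weighted_anom = sum(w * a for w, a in zip(weights, neighbor_anoms)) / sum(weights)
    return month_climatology + weighted_anom

# e.g. a station whose April climatology is 52.0 F, with three neighbors
# running +1.2, +0.8 and +1.5 F above their own baselines:
print(infill_value(52.0, [1.2, 0.8, 1.5], [30.0, 45.0, 80.0]))   # ~53.1 F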
Chip Knappenberger also pointed out that a good test of whether or not the spike was real was to compare the USHCN adjusted data to the new climate reference network. NCDC conveniently has a page where we can see them side-by-side, and there is no massive divergence in recent months (USHCN anomalies are actually running slightly cooler than USCRN):
http://www.ncdc.noaa.gov/temp-and-precip/national-temperature-index/
Unfortunately there have been relatively few adjustments via homogenization from 2004 to present (the current length of well-spatially-distributed CRN data), so comparing raw and adjusted data to USCRN doesn’t allow us to determine whether one or the other fits better.
For folks arguing that we should be using absolute temperatures rather than anomalies: no, that’s a bad idea, unless you want to limit your temperature estimates only to stations that have complete records over the timeframe you are looking at (something that isn’t practical given how few stations have reported every single day for a century or more). Anomalies are a useful tool to deal with a changing set of stations over time while avoiding introducing any bias.
As far as the reasons why data is “adjusted”, I don’t really want to rehash everything we’ve been arguing about on another thread for the last two days, but the short version is that station records are a bit of a mess. They were set up as weather stations more than climate stations, and they have been subject to stations moves (~2 per station over its lifetime on average), instrument changes (liquid in glass to MMTS), time of observation changes, microsite changes over 100 years, and many other factors. Both time of observation changes and MMTS changes introduced a cooling bias; station moves are a mixed bag, but in the 1940s and earlier many stations were moved from building rooftops in city centers to airports or wastewater treatment plants, also creating a cooling bias. Correcting these biases is why the adjustments end up increasing the century-scale trend, particularly in the max data (adjustments apart from TOBs actually slightly lower the century scale trend in min data).
It’s also worth pointing out that satellite and reanalysis data (which do not use the same surface records) both agree better with the adjusted data than with the raw data: http://rankexploits.com/musings/2013/a-defense-of-the-ncdc-and-of-basic-civility/uah-lt-versus-ushcn-copy/
Anthony or Zeke: For those who suspect there are similar results globally, is there also a GHCN Final MINUS Raw Temperature analysis graph anywhere?
What is the rationale for adjusting data in the first place? Why not just use the raw data?
Anthony,
Thanks for the explanation of what caused the spike.
The simplest approach, which I took, of averaging all final minus all raw values per year shows the average adjustment per station-year. More likely the adjustments should go the other direction due to UHI, which has been measured by the NWS as 8F in Phoenix and 4F in NYC.
@stevengoddard
You are welcome. You should make a note of the issue on those posts that use the graph, so that people who see it don’t think it is a data-tampering issue, but simply an artifact of a method combined with missing data due to late reporting.