Correcting and Calculating the Size of Adjustments in the USHCN
By Anthony Watts and Zeke Hausfather
A recent WUWT post included a figure which showed the difference between raw and fully adjusted data in the United States Historical Climatology Network (USHCN). The figure used in that WUWT post was from Steven Goddard’s website, and in addition to the delta from adjustments over the last century, it included a large spike of over 1 degree F for the first three months of 2014. That spike struck some as unrealistic, but knowing that a lot of adjustment goes into producing the final temperature record, some weren’t surprised at all. This essay is about finding the true reason behind that spike.
One commenter on that WUWT thread, Chip Knappenberger, said he didn’t see anything amiss when plotting the same data in other ways, and wondered in an email to Anthony Watts if the spike was real or not.
Anthony replied to Knappenberger via email that he thought it was related to late data reporting, and later repeated the same comment in an email to Zeke Hausfather, while simultaneously posting it to Nick Stokes’ blog; Stokes had also been looking into the spike.
This spike at the end may be related to the “late data” problem we see with GHCN/GISS and NCDC’s “state of the climate” reports. They publish the numbers ahead of dataset completeness, and they have warmer values, because I’m betting a lot of the rural stations come in later, by mail, rather than the weathercoder touch tone entries. Lot of older observers in USHCN, and I’ve met dozens. They don’t like the weathercoder touch-tone entry because they say it is easy to make mistakes.
And, having tried it myself a couple of times, and being a young agile whippersnapper, I screw it up too.
The USHCN data seems to show completed data where there is no corresponding raw monthly station data (since it isn’t in yet) which may be generated by infilling/processing….resulting in that spike. Or it could be a bug in Goddard’s coding of some sorts. I just don’t see it since I have the code. I’ve given it to Zeke to see what he makes of it.
Yes the USHCN 1 and USHCN 2.5 have different processes, resulting in different offsets. The one thing common to all of it though is that it cools the past, and many people don’t see that as a justifiable or even an honest adjustment.
It may shrink as monthly values come in.
Watts had asked Goddard for his code to reproduce that plot, and he kindly provided it. It consists of a C++ program to ingest the USHCN raw and finalized data and average it to create annual values, plus an Excel spreadsheet to compare the two resultant data sets. Upon first inspection, Watts couldn’t see anything obviously wrong with it, nor could Knappenberger. Watts also shared the code with Hausfather.
After Watts sent him the email regarding the late reporting issue, Hausfather investigated the idea, ran some different tests, and created plots demonstrating how the spike was created by that late reporting problem. Stokes came to the same conclusion after Watts’ comment on his blog.
Hausfather, in the email exchange with Watts on the reporting issue, wrote:
“Goddard appears just to average all the stations readings for each year in each dataset, which will cause issues since you aren’t converting things into anomalies or doing any sort of gridding/spatial weighting. I suspect the remaining difference between his results and those of Nick/myself are due to that. Not using anomalies would also explain the spike, as some stations not reporting could significantly skew absolute temps because of baseline differences due to elevation, etc.”
From that discussion came the idea to do this joint essay.
To figure out the best way to estimate the effect of adjustments, we look at four different methods:
1. The All Absolute Approach – Taking absolute temperatures from all USHCN stations, averaging them for each year for raw and adjusted series, and taking the difference for each year (the method Steven Goddard used).
2. The Common Absolute Approach – Same as the all absolute approach, but discarding any station-months where either the raw or the adjusted series is missing.
3. The All Gridded Anomaly Approach – Converting absolute temperatures into anomalies relative to a 1961-1990 baseline period, gridding the stations into 2.5×3.5 lat/lon grid cells, applying a land mask, averaging the anomalies for each grid cell for each month, calculating the average temperature for the whole contiguous U.S. by a size-weighted average of all grid cells for each month, averaging monthly values by year, and taking the difference each year between the resulting raw and adjusted series (a code sketch of this approach follows the list).
4. The Common Gridded Anomaly Approach – Same as the all gridded anomaly approach, but discarding any station-months where either the raw or the adjusted series is missing.
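For concreteness, here is a minimal sketch in Python (pandas/numpy) of the gridded anomaly comparison in method 3. It assumes the raw and adjusted USHCN data have already been parsed into DataFrames with station, lat, lon, year, month and tavg columns; the column names and the cosine-latitude cell weighting are illustrative assumptions, and the land mask is omitted, so this is a sketch of the approach rather than NCDC’s actual procedure.

```python
import numpy as np
import pandas as pd

def conus_annual_anomalies(df, base=(1961, 1990), dlat=2.5, dlon=3.5):
    """Gridded-anomaly CONUS annual series from station data (method 3, simplified)."""
    df = df.copy()
    # 1. Station anomalies: subtract each station's 1961-1990 mean for that calendar month.
    clim = (df[df["year"].between(*base)]
            .groupby(["station", "month"])["tavg"].mean().rename("clim"))
    df = df.join(clim, on=["station", "month"])
    df["anom"] = df["tavg"] - df["clim"]
    df = df.dropna(subset=["anom"])   # drop stations lacking a baseline
    # 2. Assign each station to a 2.5 x 3.5 degree lat/lon grid cell.
    df["cell_lat"] = (df["lat"] // dlat) * dlat
    df["cell_lon"] = (df["lon"] // dlon) * dlon
    # 3. Average anomalies within each cell for each month.
    cells = (df.groupby(["year", "month", "cell_lat", "cell_lon"])["anom"]
               .mean().reset_index())
    # 4. Cell-weighted CONUS average; cos(latitude) stands in for cell area here,
    #    and the land mask used in the post is omitted.
    cells["w"] = np.cos(np.deg2rad(cells["cell_lat"] + dlat / 2))
    monthly = (cells.groupby(["year", "month"])
                    .apply(lambda g: np.average(g["anom"], weights=g["w"])))
    # 5. Annual means of the monthly CONUS anomalies.
    return monthly.groupby(level="year").mean()

# The adjustment effect is then the difference of the two resulting series:
# effect = conus_annual_anomalies(adjusted) - conus_annual_anomalies(raw)
```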
The results of each approach are shown in the figure below; note that the spike has been reproduced using method #1, the All Absolute Approach:
The latter three approaches all find fairly similar results; the third method (The All Gridded Anomaly Approach) probably best reflects the difference in “official” raw and adjusted records, as it replicates the method NCDC uses in generating the official U.S. temperatures (via anomalies and gridding) and includes the effect of infilling.
The All Absolute Approach used by Goddard gives a somewhat biased impression of what is actually happening, as using absolute temperatures when the raw and adjusted series don’t have the same stations reporting each month will introduce errors due to differing station temperatures (caused by elevation and similar factors). Using anomalies avoids this issue by looking at the difference from the mean for each station, rather than the absolute temperature. This is the same reason we use anomalies rather than absolutes in creating regional temperature records: anomalies deal with changing station composition.
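A toy calculation makes the point; the numbers below are invented purely for illustration.

```python
# Toy illustration (hypothetical numbers) of why missing stations skew absolute
# averages but not anomaly averages. Station B sits at high elevation and runs
# about 10 C colder than station A.
a_normal, b_normal = 15.0, 5.0          # long-term means (deg C)
a_now,    b_now    = 15.5, 5.5          # both stations actually 0.5 C warm

# Both reporting: absolute and anomaly averages agree on +0.5 C of warmth.
print((a_now + b_now) / 2)                              # 10.5 (absolute)
print(((a_now - a_normal) + (b_now - b_normal)) / 2)    # 0.5  (anomaly)

# Station B's report is late: the absolute average jumps 5 C for no climatic
# reason, while the anomaly average is unchanged.
print(a_now)                # 15.5 (absolute, apparent "spike")
print(a_now - a_normal)     # 0.5  (anomaly, still correct)
```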
The figure shown above also incorrectly deals with data from 2014. Because it is treating the first four months of 2014 as complete data for the entire year, it gives them more weight than other months, and risks exaggerating the effect of incomplete reporting or any seasonal cycle in the adjustments. We can correct this problem by showing lagging 12-month averages rather than yearly values, as shown in the figure below. When we look at the data this way, the large spike in 2014 shown in the All Absolute Approach is much smaller.
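A minimal sketch of the trailing 12-month average, assuming `monthly` is a pandas Series of monthly CONUS (adjusted minus raw) differences indexed by date, such as the output of a comparison like the one sketched earlier; the variable name is illustrative.

```python
import pandas as pd

def trailing_12mo(monthly: pd.Series) -> pd.Series:
    """Average each month with the 11 months before it. Requiring a full window
    means an incomplete year (such as early 2014) no longer carries a full
    year's weight in the comparison."""
    return monthly.rolling(window=12, min_periods=12).mean()
```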
There is still a small spike in the last few months, likely due to incomplete reporting in April 2014, but it’s much smaller than in the annual chart.
While Goddard’s code and plot produced a mathematically correct result, the procedure he chose (#1, the All Absolute Approach), comparing absolute raw USHCN data with absolute finalized USHCN data, was not appropriate. It allowed non-climatic differences between the two datasets, likely caused by missing data (late reports), to create the spike artifact in the first four months of 2014, and it somewhat overstated the difference between adjusted and raw temperatures by using absolute temperatures rather than anomalies.



“station records are a bit of a mess”
But still useful. Kinda like climate models. Known to be wrong, but still in use.
Andrew
Zeke and I are in agreement, especially about
“…the short version is that station records are a bit of a mess. They were set up as weather stations more than climate stations, and they have been subject to stations moves (~2 per station over its lifetime on average), instrument changes (liquid in glass to MMTS), time of observation changes, microsite changes over 100 years, and many other factors.”
But there is more to it than that, and cooling biases are smaller than other biases that are not being dealt with properly, or at all.
Every time I analyse USHCN stations on a state-by-state basis and compare with NCDC figures, I come up with the same sort of discrepancy of about 1F when comparing the change in temperatures since the 1930s.
For instance, in Alabama NCDC have cooled the past by 1.3F.
http://notalotofpeopleknowthat.wordpress.com/2014/03/29/temperature-adjustments-in-alabama/
Zeke,
I get what you’re saying, but you don’t get it! Anomalies or no anomalies is not the issue. Adjusting the record destroys its usefulness and creates bias in the record. Yes, you’re limited to stations that have records over the period you want to compare; that is good science. Making things up is not, and it performs no useful purpose other than to create debate. So your point about climate is valid: we have weather stations, and so does the rest of the world. Live with it. Making things up does not tell a useful story.
As for the satellite record, that has been adjusted/calibrated as well, covers a short period, etc. I will leave it to others to discuss that (Roy Spencer is the top dog for that). Overall, I don’t think it’s particularly valid to use the satellite record to validate your adjustment/homogenization when that record was used to calibrate things in the first place.
v/r,
David Riser
@David Riser: The satellite data is not calibrated using surface data; see this post and note the section on calibration.
“Once every Earth scan, the radiometer antenna looks at a “warm calibration target” inside the instrument whose temperature is continuously monitored with several platinum resistance thermometers (PRTs). PRTs work somewhat like a thermistor, but are more accurate and more stable. Each PRT has its own calibration curve based upon laboratory tests.”
http://wattsupwiththat.com/2010/01/12/how-the-uah-global-temperatures-are-produced/
Bob, see the presentation by Steirou and Koutsoyiannis at the European Geosciences Union Assembly 2012, session HS7.4/AS4.17/CL2.10. It is available online at itia.ntua.gr/en/docinfo/1212. It compares a large global GHCN homogenization sample. The “cool the past” bias is everywhere. My favorite example from this paper is Sulina, Romania, a Danube delta town of 3,500 reachable only by boat: no change in the raw data became 4C of warming in GHCN v2, in a small town surrounded by water.
David Riser,
Using anomalies rather than absolute temperatures isn’t adjusting the data per se. Homogenization does adjust the data, but the alternative is only using stations with no moves, instrument changes, time of observation changes, etc. These simply do not exist, at least over the last 100 years. The U.S. has arguably the best network in the world and even here most of our “best” sited stations have moved at least once and changed instruments as well.
Some adjustment is necessary (even Anthony’s new paper adjusts for MMTS transitions), and I’d argue that the automated pairwise approach does a reasonably good job. I’d suggest reading the Williams et al. 2012 paper for some background on how it’s tested to make sure that both warming and cooling biases are properly addressed: ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/papers/williams-etal2012.pdf (NCDC’s site is experiencing some issues at the moment, but hopefully the link will work soon).
Thankfully, going forward we won’t need any adjustments for U.S. data, as we have the Climate Reference Network. The difference between raw and adjusted USHCN data and the Climate Reference Network should provide a good empirical test for the validity of adjustments to the USHCN network. Unfortunately, the last 10 years is still inconclusive; while the adjusted data has a trend closer to the Climate Reference Network than the raw data, the results are not statistically significant.
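A minimal sketch of how such a trend comparison might be run, assuming three already-aligned monthly anomaly series (CRN, raw USHCN, adjusted USHCN) over the overlap period. The function names are illustrative, and serial correlation is ignored, so the error bars here are optimistic.

```python
import numpy as np
from scipy import stats

def trend_per_decade(series):
    """OLS trend in degrees per decade, with its standard error (monthly data)."""
    y = np.asarray(series, dtype=float)
    t = np.arange(len(y)) / 120.0            # time in decades
    fit = stats.linregress(t, y)
    return fit.slope, fit.stderr

def compare_to_crn(crn, raw, adjusted):
    for name, s in [("raw", raw), ("adjusted", adjusted)]:
        slope, se = trend_per_decade(np.asarray(s) - np.asarray(crn))
        # A difference trend within ~2 standard errors of zero is not
        # statistically distinguishable from the CRN (ignoring autocorrelation).
        print(f"{name} minus CRN: {slope:+.3f} +/- {2 * se:.3f} deg/decade")
```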
[snip -off topic -mod]
Bob Tisdale,
Globally, adjustments do increase the trend, but the net effect is much smaller than in the U.S. (e.g. ~0.15 C per century vs. 0.4 C per century in the U.S.). I haven’t done the analysis recently, but you can see a 2010 plot of the difference between raw and adjusted GHCN data here: http://rankexploits.com/musings/wp-content/uploads/2010/09/Screen-shot-2010-09-17-at-3.48.19-PM.png
“Thankfully going forward we won’t need any adjustments for U.S. data”
LOL
Starting… now.
Wow. No more adjustments. I don’t believe it. Seriously.
Andrew
Thanks Anthony for the post about the calibration of the satellite record, but after reading the post I still don’t believe that a comparison of the two is valid. I will say that after reading it I feel much better about using that record for climate variability over the long haul, and I would think that if we’re going to sink money into understanding climate, the satellite record is much more important than ground-based. Folks just need to be a bit more patient in terms of getting a handle on what the actual natural variability is. Ground-based data is still useful for regional weather forecasting, but all the computer time spent torturing the data is probably misplaced.
v/r,
David Riser
Could the removal, in the last year or so, of reporting stations play a role? I recall some 600 stations impacted.
Zeke, see my post to Anthony above (when it gets out of moderation; hehe, the curse of using Anthony’s name). I understand the desire to know, but I take exception to wasting my tax money on a fanciful experiment using station records when we have satellites that will probably tell us the answer long before the climate stations do. Particularly when a subset of people use this travesty as a means of destroying our economy and others’.
v/r,
David Riser
Bad Andrew,
It’s easy enough to use USHCN data up to 2004 and CRN data after 2004. That’s the magic of anomalies :-p
Here is what that graph looks like: http://rankexploits.com/musings/wp-content/uploads/2013/01/Screen-Shot-2013-01-16-at-10.40.46-AM.png
Of course, the CRN can’t help us improve our estimates of temperatures before 2004, apart from validating (or invalidating) our adjustments to the USHCN network post-2004. If we learned that our pairwise homogenization methods after 2004 were systematically wrong (or right), it would shed light on whether they were similarly biased (or unbiased) prior to 2004, since it’s the same automated method detecting and correcting for breakpoints. That’s why the CRN will be a good empirical experiment for the validity of homogenization in the U.S.
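A minimal sketch of the splice described above, assuming `ushcn` and `crn` are monthly anomaly Series indexed by date; the overlap period used for re-baselining is an illustrative choice, not the one behind the linked graph.

```python
import pandas as pd

def splice_ushcn_crn(ushcn: pd.Series, crn: pd.Series) -> pd.Series:
    """Re-baseline both anomaly series to a common overlap period, then take
    USHCN through 2004 and CRN from 2005 onward."""
    overlap = slice("2005-01", "2013-12")          # illustrative common baseline
    u = ushcn - ushcn.loc[overlap].mean()
    c = crn - crn.loc[overlap].mean()
    return pd.concat([u.loc[:"2004-12"], c.loc["2005-01":]])
```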
ossqss,
Some people have the unfortunate tendency to conflate late reporting with station removal (which, as far as I know, has generally not occurred). USHCN stations can take a few months to report and be processed, and GHCN stations can take much longer (some countries are pretty lax about sending in CLIMAT reports).
If anything, we should get a lot more stations to work with in the next year, since GHCN version 4 will have about 25,000 more stations than the current 7,000 or so in GHCN version 3.
Congratulations to the authors for running this check, and also to Nick Stokes.
Although it is obvious that the enormous data sets used in climate research need adjustments to correct systematic errors, the fact that adjustments have been made doesn’t reassure me. I guess I’m thinking about it this way. There is a large number of potential errors that creep into such data sets, some known and some unknown. The surface stations project provides conclusive evidence that this is true. When a correction is made, one (or at most a small number) of such possible errors is taken as the basis for modifying the data. Making such a correction then makes a small selection from a large population of potential errors. What assurance can there be that such a correction does not in fact increase the bias in the data rather than reduce it? Probably there is an extensive literature dealing with this problem, but when I see that data has been adjusted I find that I have less confidence in it, despite the fact that the adjustment may have been perfectly appropriate.
Zeke,
I’m not here to question your squiggology. I defer to your skill in that area. 😉
Andrew
There is no other field of science where the data are routinely corrected in one direction only. In real science you only correct your data for analyzer drift or bias from a calibration standard, and that sort of adjustment will always be random in both directions, like you see in the data prior to 1960. The consistent adjustment of the temperature data in one direction is bogus; there can be no valid reason for always adjusting your data up, and conveniently by just the amount needed to prove your AGW hypothesis.
David Riser asserts: “Nik, Goddard is correct, he did nothing wrong, he explained his methodology clearly.”
Goddard attached motive and called this “tampering,” and in that he was indeed very wrong; it would have been a PR disaster had this claim appeared more widely before it was debunked here. He has the full technical background to determine the reason for the glitch within minutes, yet he proceeded to push it repeatedly while publicly asserting that I was a lunatic for asking for before/after plots, which means that your assertion is false. This claim has made skeptical criticism of overall adjustments less credible and has provided ammo to online activists who relish such incidents of gross incompetence.
Why is it okay to make adjustments to the global temperature record which causes the world to go out and waste $358 billion per year, …
… but if you did the same with your company’s financial statements, or your prospectus or your country’s economic data (just to prove a personal pet economic theory), and you caused someone to lose $358 billion, there would be serious repercussions.
I have a few comments still in moderation, so they may end up after this one. However, I would like to point out that this silliness of homogenization and adjustment feeds the monster found here: http://www.climate.gov/, about which I am both horrified and upset. That whole website is the biggest pack of bs I have ever seen, and it is being paid for by my tax dollars. Thank God for someone like Steve Goddard who is at least willing to show this stuff for what it is.
That website is the biggest reason that I oppose any kind of adjustment without a standard. Zeke, you may think that the data is done being adjusted, but I guarantee you that is not the case. You’re kind of like a frog in a boiling pot of water: you agree to things because on the surface they appear to be reasonable, but if you take a closer look you will realize that politics is involved.
v/r,
David Riser
Zeke says:
“If anything, we should get a lot more stations to work with in the next year, since GHCN version 4 will have about 25,000 more stations than the current 7,000 or so in GHCN version 3.”
The real issue, though, is the quality of these stations, not the quantity. My surfacestations work in Fall et al. showed that over 3/4 of the stations in the USA had siting issues. That number is expected to be the same or worse in GHCN.
Adding noisy signals doesn’t improve the accuracy of the signal being extracted.
For me, it is a bit frustrating that folks do not remember that the US temperature monitoring network was not and is not broken. It is performing as designed. It was put in place over a century ago when USA climate information was primarily anecdotal and inconsistent. It has provided a wealth of information to science. Urban Heat Island: not a problem in system design as people want to know what the temperature actually is to go about their daily lives, not what it might be if there was no city there.
The problem for everyone in the climate science world is that the climate monitoring system was never designed to reliably detect long-term temperature trends of only a degree or two Fahrenheit. After all, with daily high-to-low temperature swings in the range of twenty degrees, and annual high-to-low swings of over one hundred degrees Fahrenheit not at all unusual, an overall accuracy of plus or minus a couple of degrees Fahrenheit was certainly considered adequate (+/- 0.5 degree accuracy thermometers, plus siting, installation, and human errors).
So, we should always take any statements about temperature or temperature trends with a grain of salt if the accuracy claimed is much better than plus or minus two degrees Fahrenheit or one degree Celsius. It is always possible that all the various manipulations of our historic temperature data are actually useless at improving our understanding of global climate.
Sometimes I get the impression that we lose the forest for the trees. Data manipulation is a dangerous game. Explaining an “adjustment” doesn’t necessarily justify an adjustment, and once it becomes the “official” set of data, all other interpretations and uses are based on the assumption that it is “right”. I have trouble getting past the missing 1930s. My family were settlers who lost everything in the late 30s. Also, I think we lose sight of the fact that an adjustment of 0.4C is a high percentage of projected change if one buys the 2C increase projected. Year by year, “new record high temps” can be announced based on changes of hundredths of a degree. Any adjustment upward could allow this game to be continued virtually without notice if temps continue to be “flat”.
Nik,
I wouldn’t call this debunked. Goddard explained his methodology and provided his data. His method was used to reproduce the graph. This post explains the hockey stick, which is fine. It’s good for clarity; it doesn’t change the message. When the climate report comes out and it’s way hot, it’s published as such; when it comes back down to earth, nothing is said. This post and the previous one generated a ton of discussion, which is also good. I would say that Steve Goddard is correct: the adjustments are deliberate, they tell an incorrect story, and that story is shouted to the masses. So I say again, thank you, Steven Goddard!
v/r,
David Riser