From Tom Nelson
Email 600, Sept 2007: Watts expose makes NOAA want to change entire USA method
[Tom Karl, Director of the National Climatic Data Center] We are getting blogged all over for a cover-up of poor global station and US stations we use. They claim NCDC is in a scandal by not providing observer’s addresses. In any case Anthony Watts has photographed about 350 stations and finds using our criteria that about 15% are acceptable. I am trying to get some our folks to develop a method to switchover to using the CRN sites, at least in the USA.
Hat tip: AJ
===============================================================
Note this email, because it will be something I reference in the future. – Anthony
Related articles
- Widespread Flaws in Weather Stations Networks Used to Track National Temperature Trends, Says New Study (prweb.com)
[Tom Karl, Director of the National Climatic Data Center] We are getting blogged all over for a cover-up of poor global station and US stations we use. They claim NCDC is in a scandal by not providing observer’s addresses. In any case Anthony Watts has photographed about 350 stations and finds using our criteria that about 15% are acceptable. I am trying to get some our folks to develop a method to switchover to using the CRN sites, at least in the USA.
Finally, homogenization does not change the overall average. What it does do is smear SHAP bias around so it cannot be distinguished. We are going to hear a LOT more of about this in the fairly near future. But that is a story for another day . . .
It is only you guys that focus so much on the surface network and in most cases even only on the surface network in the USA. Which is not very smart: Even if you could show that your national weather service is in a big conspiracy, it would hardly change the global warming signal. America is not that large. If you would like to contribute to science and find a reason why the global warming signal is too strong, you’d better think of reasons that apply globally.
We concentrate on the USA for the following reasons:
It is very difficult to locate even US stations. Having run down over 200 of them, I can speak to this personally. It is exceedingly difficult and time-consuming, given that NOAA has pulled the curators’ names and addresses from the MMS website. Also, the coordinates they provide are often faulty in the extreme, though there has been some improvement of late. It has taken us years and years to run down the bulk of USHCN stations.
Foreign stations are a near-impossible task. We do not have an international network of volunteers. As for locating them by satellite, GHCN provides coordinates to only two or three decimal places. That is entirely useless for our purposes. Some can be identified by airport, WWTP, or some other industrial structure, but satellite resolution outside the US (even inside the US) is generally so poor as to make distinguishing stations impossible. On top of that, there is no conformity of equipment, so unless it is a Stevenson Screen or an ASOS, we wouldn’t recognize them if they showed up on the blurry map images — which they generally don’t.
In any event, the USA is an excellent sample. First, the US shows much the same overall warming trend for the 20th century as does the world, overall (c. +0.7C / century for adjusted data — and much less for raw data), though the “1940 bump” is higher. Second, with the possible exception of Australia, the US has the highest quality historical station network in the world. This assertion appears to be supported by what few foreign stations we have actually managed to locate.
Furthermore, we are not dedicated to proving that the NOAA is a “big conspiracy”. What we are after is determining whether their procedures are tight enough to cut the mustard in the private sector and whether the output is correct. There is a lot riding on the answers. Yet we are excoriated for even asking the question. That violates both the Scientific Method (and its mores) and the principles of Liberalism by which I was raised and educated.
And finally, we have a surveyed and rated sample of just a bit over 1000 stations. That will allow us to examine well sited stations vs. poorly sited stations. It does not matter statistically whether only the US is covered or whether the sample is scattered over the world. What matters is the number of stations evaluated and how consistent the equipment is.
The question is: How does site quality affect the readings? I’d prefer to be looking at 6000 stations worldwide, but 1000 within the US will suffice to answer that question.
Previously, we used Leroy (1999) ratings, though that was a poor metric as it accounted only for distance from heat sink and made no account for area within radius, as does Leroy (2010).
The oceans are lately covered by satellites and way back into the past by ocean weather ships and voluntary observing ships: International Comprehensive Ocean-Atmosphere Data Set (ICOADS).
The methods of measuring ocean temperatures prior to ARGO (2005) are both inconsistent and abominable (cf. the bucket/bag/bilge controversy). UAH and RSS provide reasonably reliable atmospheric readings over the oceans, but not prior to December 1978.
The vertical dimension is covered by the radiosonde network, many of them on islands to cover the atmosphere above the oceans.
Radiosonde readings show so little warming (or even cooling) that one must be suspicious of them. If the radiosonde readings are accurate, we have nothing whatever to worry about. I’d go with UAH and RSS for atmospheric readings until more is known.
evanmjones wrote: “But homogenization is just one step in a long process. There is infilling, SHAP, TOBS, UHI, and equipment, to name just a few. (Not to mention the initial tweaking — outliers, etc.)”
What do you mean with SHAP and how is it different from homogenization? TOBS (Time of Observation bias), UHI (Urban Heat Island), and changes in the equipment (instruments and weather shelters) can be corrected for using parallel measurements. But if not corrected that way, these errors are corrected in the normal statistical homogenization, by comparison with neighboring stations, which was tested in the blind validation study.
evanmjones wrote: “And, of course, homogenization is not supposed to increase the trend. After all, if all you are doing is, in effect, providing a weighted averaging of stations within a given radius (or grid box or whatever), the overall average would not change (or at least not much, depending on the weighting procedures). Yet the adjusted data is considerably warmer than the raw data. ”
Homogenization is supposed to change the trend in the raw data in case this trend is wrong. If a station is moved from the city to the airport, there is typically a drop in temperature. If this drop is sufficiently large, you may find an erroneous cooling temperature trend in the raw data.
The mentioned weighted average of stations is used as a reference time series (for some homogenization methods). You compute a difference time series of this reference with your candidate time series. If there is just one jump, due to the move to the airport, this difference time series looks like a step function with some weather noise. From the size of the step you determine the temperature difference between the city and the airport. This step size is added to the data to correct for the relocation of the station.
The reference time series thus does not replace the data of the candidate station, which some people seem to assume, but is only used to compute the size of the jump. Thus if the other stations would also all move to the airports, the results would still be right (as long as they do not all move to the airport on the same day).
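For readers who want to see the mechanics, here is a minimal sketch in Python of the single-break, reference-series idea described above. The data, weights, break date, and function name are all invented for illustration; this is not any weather service’s actual code, and real algorithms also have to detect the break dates and handle multiple breaks and gaps.

```python
import numpy as np

def homogenize_single_break(candidate, neighbors, weights, break_idx):
    """Toy reference-series homogenization for ONE known break.

    candidate : 1-D array of annual means for the station being checked
    neighbors : 2-D array (n_stations x n_years) of neighboring series
    weights   : weights used to build the reference series
    break_idx : index of the first year AFTER the (known) relocation
    """
    # Reference series: weighted average of the neighboring stations.
    reference = np.average(neighbors, axis=0, weights=weights)

    # Difference series: candidate minus reference. With a single break this
    # looks like a step function plus weather noise.
    diff = candidate - reference

    # Step size: mean difference after the break minus mean difference before.
    step = diff[break_idx:].mean() - diff[:break_idx].mean()

    # Put the earlier segment on the later (post-move) footing. The reference
    # only sizes the jump; it never replaces the candidate's own data.
    adjusted = candidate.copy()
    adjusted[:break_idx] += step
    return adjusted, step

# Invented example: a station "moves to the airport" in year 30 and drops 0.8 C.
rng = np.random.default_rng(0)
years = 60
regional = 0.01 * np.arange(years)                     # shared regional signal
neighbors = regional + rng.normal(0.0, 0.2, (5, years))
candidate = regional + rng.normal(0.0, 0.2, years)
candidate[30:] -= 0.8                                  # relocation jump

adjusted, step = homogenize_single_break(candidate, neighbors,
                                         weights=np.ones(5), break_idx=30)
print(f"estimated step: {step:.2f} C")                 # close to -0.8
```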
Why is there a difference between the trends in the raw and the homogenized data?
Menne et al. (2009): “The largest biases in the HCN are shown to be associated with changes to the time of observation and with the widespread changeover from liquid-in-glass thermometers to the maximum–minimum temperature system (MMTS). ”
Menne, M. J., Williams, C. N. Jr., and Vose, R. S.: The U.S. historical climatology network monthly temperature data, version 2, B. Am. Meteorol. Soc., 90, 993–1007, doi:10.1175/2008BAMS2613.1, 2009.
I wrote: “This test showed that the USHCN homogenization software improves the homogeneity of the data and did not introduce any artificial (warming) trends.”
evanmjones wrote: “In that case, homogenization is not going to explain the differences. So homogenization code is not terribly relevant to my objections. We need the full and complete adjustment code. The part that creates the — very large — differences between aggregate raw and adjusted data.”
The differences between the aggregate raw and adjusted data are due to homogenization. The blind validation study showed that the adjusted trends are closer to the true trends than the trends in the raw data. Thus there are changes in the trends, but no *artificial* additional warming trends, as many of you guys like to assume. My advice would be to look for weak points in the global warming theory elsewhere.
“Because he knows, a frightful fiend Doth close behind him tread……..”
Anthony, like it or not, they are having to look over their shoulders and keep an eye on you and WUWT.
You have arrived.
What do you mean with SHAP and how is it different from homogenization?
By SHAP, I mean changing microenvironment over time. Station History Adjustment Procedure. This is entirely unrelated to homogenization.
TOBS (Time of Observation bias), UHI (Urban Heat Island), and changes in the equipment (instruments and weather shelters) can be corrected for using parallel measurements.
Precisely. And we need to examine and audit NOAA procedure for doing so. Of course, it would be better to have automated Class 2-sited stations or better, with no adjustment needed or applied. Last I heard, NOAA no longer applies an adjustment for UHI. But without their algorithm, code, and manuals, we have no way of knowing the details.
But if not corrected that way, these errors are corrected in the normal statistical homogenization, by comparison with neighboring stations, which was tested in the blind validation study.
Incorrect. Homogenization has nothing to do with correcting for those factors. It just smears the error around between x number of stations so the problem shows up less per station — by a factor of x. Sort of like correcting a 5 point grading error by changing the grades of five students by 1 point.
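Purely as arithmetic (not a description of any actual NOAA step), the grading analogy looks like this in code; the stations and numbers are invented for illustration:

```python
# Toy arithmetic only; the stations and numbers are invented for illustration.
true_temps = [15.0, 15.0, 15.0, 15.0, 15.0]
raw        = [20.0, 15.0, 15.0, 15.0, 15.0]   # one station reads 5 C too warm

# "Smearing": spread that 5 C excess equally across all five stations.
mean_error = (sum(raw) - sum(true_temps)) / len(raw)   # = 1.0
smeared    = [t + mean_error for t in true_temps]      # every station +1 C

print(sum(raw) / len(raw))          # 16.0 -> network average unchanged ...
print(sum(smeared) / len(smeared))  # 16.0 ... but no single station stands out
```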
Homogenization is supposed to change the trend in the raw data in case this trend is wrong. If a station is moved from the city to the airport, there is typically a drop in temperature. If this drop is sufficiently large, you may find an erroneous cooling temperature trend in the raw data.
Of course homogenization will alter the trends of every individual station in the network. However, you have stated definitively (with NOAA citation) that homogenization does not alter the overall trend average. Therefore, by definition, homogenization is merely distributing the errors of each individual station among all nearby stations, resulting in a net zero change in average.
Therefore, the raw trend anomaly being increased by adjustment by over 400% for the 20th century and by over 40% for the past 30 years needs to be subjected to an audit. A FULL and COMPLETE audit.
Why is there a difference between the trends in the raw and the homogenized data?
Menne et al. (2009): “The largest biases in the HCN are shown to be associated with changes to the time of observation and with the widespread changeover from liquid-in-glass thermometers to the maximum–minimum temperature system (MMTS). ”
Yes, I mentioned both TOBS and equipment issues earlier. Code, please!
I notice that they adjust MMTS trends UP to match CRS rather than adjusting CRS trends DOWN to match MMTS. Despite the fact that MMTS is probably a better instrument (discounting siting issues, of course).
And nothing, of course, for microsite. Just upward adjustments to stations moved to airports (which show a large warming trend bias, didn’t you know?).
As Al Gore once put it, so far as adjustment procedure is concerned, everything that’s UP is supposed to be DOWN and everything that’s DOWN is supposed to be UP.
And since homogenization, in and of itself, does not affect the overall trend for USHCN, homogenization code is not relevant to the question.
The NOAA homogenization software was just subjected to a blind test with artificial climate data with inserted inhomogeneities. This test showed that the USHCN homogenization software improves the homogeneity of the data and did not introduce any artificial (warming) trends.
. . .
The differences between the aggregate raw and adjusted data are due to homogenization. The blind validation study showed that the adjusted trends are closer to the true trends than the trends in the raw data. Thus there are changes in the trends, but no *artificial* additional warming trends, as many of you guys like to assume. My advice would be to look for weak points in the global warming theory elsewhere.
Follow the pea.
What is going on, then, is that well sited stations that are running cooler are adjusted so their trends are as warmy as poorly sited stations (which also have been adjusted warmier).
Actually, good stations are adjusted even slightly warmer than bad stations. Quite a bit warmer, if airports are excluded. And, yes, I’ve checked.
Thanks for the advice, but I think we had better look for weak points in global warming right here.
Dear evanmjones, if you are not willing to invest a little time into understanding the main principle behind homogenization and how it is implemented (including how it can improve the aggregate trend), do not expect people to waste their precious life time for a complete audit to satisfy your unfounded distrust.
Your last two comments are so full of plainly wrong statements, so clearly display that you have no idea how homogenization is performed and no willingness to learn, that I do not expect that further clarifications would bring anything.
“During the past few years I recruited a team of more than 650 volunteers to visually inspect and photographically document more than 860 of these temperature stations. We were shocked by what we found. We found stations located next to the exhaust fans of air conditioning units, surrounded by asphalt parking lots and roads, on blistering-hot rooftops, and near sidewalks and buildings that absorb and radiate heat. We found 68 stations located at wastewater treatment plants, where the process of waste digestion causes temperatures to be higher than in surrounding areas.
In fact, we found that 89 percent of the stations – nearly 9 of every 10 – fail to meet the National Weather Service’s own siting requirements that stations must be 30 meters (about 100 feet) or more away from an artificial heating or radiating/ reflecting heat source. In other words, 9 of every 10 stations are likely reporting higher or rising temperatures because they are badly sited.
http://wattsupwiththat.files.wordpress.com/2009/05/surfacestationsreport_spring09.pdf
It was WUWT evidence that station data was skewed by poor siting and selectivity that made me take the step from skeptical questioning of climate change models using this data to absolute skepticism of the theory itself.
In this day and age, with access to refined technology and enhanced communications, the global network of sites reporting raw data should be expanding, rather than shrinking. I’m not into conspiracy theories, but while the reverse is true, it is hard not to conclude that a caucus is trying to control the data, and to manipulate it to meet its own agenda.
Victor Venema says:
I don’t understand why you say “If a station is moved…”. What you describe cannot be defined in any way as movement.
Rather, one station is discontinued and a new station is built at a second location. Keeping the same name or identification number does not make it the “same station”. Does it make sense to you to then “adjust” the recorded values at either site in the name of homogenization? Would you do this for two previously “unrelated” sites?
Furthermore, you state:
You, of course, realize that an adjustment would rarely be a single value added or subtracted. The difference would be unlikely to remain constant over the various months if the geographic characteristics change. This means that the temporal structure of anomalies calculated for the appended series could also be affected, even if such adjustments were done on a monthly basis. Doing multiple adjustments then becomes much more arbitrary.
Nor would any of this explain those adjustments which have a trend already built into them…
“We are getting blogged all over…”
Blogs = artillery
Free Speech = WMD
Great news. Congratulations to all involved! Made my Sunday! But this post added an exclamation point to my day:
“John Billings says:
February 4, 2012 at 5:37 pm
The new improved HADCRUT4 lowers earlier recorded temps to make current actual temps look higher. It’s been found already, it’s old news. There was an Iceland story here on WUWT a week or two back on exactly that.
Two hundred years ago, people took to the streets. Now we take to the blogs. The military-industrial complex must be pissing their pants.
We’ve got nice graphs though.”
You see, although my nic here is HuDuckXing, my real name is John Billings! So;
Hello to John Billings!
from,
John Billings
Your last two comments are so full of plainly wrong statements, so clearly display that you have no idea how homogenization is performed and no willingness to learn, that I do not expect that further clarifications would bring anything.
Unless homogenization includes SHAP, FILNET, outliers, UHI, TOBS, and microsite effects, it is not full disclosure.
All I see is adjustments that increase good site trends to greater than bad site trends. After the bad site trends themselves have been increased.
That’s a fact. We have the raw and adjusted data trends. We have determined the ratings.
Unless that can be replicated and the code inspected — line by line — there can be no independent review. By definition. I do not see how you can dispute that. Yet you said earlier that LT was correct in saying that there is no need to check out NOAA adjustments, which hike 20th century temperature trends by over 0.4C per century and the last 30-year trends by over twice that amount.
As I say, we have the raw and adjusted data trends.
So we’ll just toodle along, as we have been, and submit my clearly wrong statements for peer review. No need to clarify.
Meanwhile, we would like a FULL and COMPLETE adjustment procedure including any/all working code, manuals and methods involved. If independent review demonstrates that NOAA’s procedures are legit (for example, that Time of Observation is taken for each individual station directly from B-91 and B-44 forms), then there is no problem. But until we can do that, there can be no independent review, and the adjustments, by definition, cannot be considered scientifically valid, much less legitimately used as a basis for multi-trillion dollar policy.
I don’t understand why you say “If a station is moved…”. What you describe cannot be defined in any way as movement.
That’s how NOAA defines it. If the station does not receive a new COOP number it is not considered to be a new station but, rather, a station move.
Stations move rather frequently. Sometimes they are merely localized equipment moves, particularly if there is a conversion from CRS to MMTS. Sometimes a curator passes away or moves, so they find another volunteer (or go for the old standbys of either an airport or WWTP) and relocate the station accordingly. More often than not, NOAA does not consider this to be a “new” station, only as a station move.
Victor Venema says:
February 5, 2012 at 11:55 am
Evan has spent plenty of his precious life time working on various WUWT endeavors. Don’t gripe about losing some of yours – you seem to cover the subject fairly well (at least for having no source code) at your blog.
More importantly, thousands of people read this blog every day. You’re losing out on a good chance to explain to a lot more people than you reach on your blog how raw data gets processed into climate data in Germany.
Also, please take some time checking out http://chiefio.wordpress.com/gistemp/ . EM Smith spent a lot of time studying the GISS adjustments, enough time to warrant starting his own blog. You might want to see if some of his criticisms of GISS also apply to German data.
Ric Werme says:
February 5, 2012 at 8:11 pm
———————————–
Ric, you’re a pretty decent guy.
Mike McMillan says:
February 5, 2012 at 12:07 am
Steve from Rockwood says:
If it were me, I would stop for a minute and have a scotch.
…
Is were even a word?
Yes. Subjunctive mood. Seldom used these days except by the highly educated, pirates, and in Ebonics.
Stop the press. I’m a pirate!
REPLY — Would it were. Arrrr. — Evan.
To: Victor Venema
I note in http://www2.meteo.uni-bonn.de/mitarbeiter/venema/themes/homogenisation/HOME/ you say:
I confess that I’ve forgotten how some of the steps Evan mentioned are applied, but one adjustment in particular by GISS is really annoying. It’s the backfilling of missing data in a station’s record, something that I don’t think is covered by homogenization as you understand it.
EM Smith’s blog probably goes into much better detail, but essentially when a new month’s data is out, GISS code looks through the record for missing data for the month, and if it finds it, recomputes an estimate for that month. An effect of that code is that the historical record keeps changing, and so anyone wanting to reproduce research that used GISS data has to know the month and year it was released in order to stay in sync. Worse, the adjustments tend to make the old data colder, thereby increasing the rate of temperature increase in the record.
So, put me in the camp that thinks climate change is occurring (well, not very quickly the last decade or so) and that adjustments lead to overestimates of recent global warming.
Oh, I understand how homogenization can “improve” the trends, all right. It identifies stations that are running cooler and “adjusts” them so they are warmer.
That is pretty much the only way that the few good stations start out with much lower trends than the bad stations and then somehow wind up with higher trends than the (upwardly) adjusted data of bad stations.
Yes, you read it correctly: somehow the bad stations wind up with higher trends as well. And, yes, the adjusted trends for the good stations are adjusted even higher than that.
To: Victor Venema
One more thing.
A lot of people here have been moving away from manual measurements to more automatable measurements with a more even coverage. For example, 10.7 cm microwave emissions instead of sunspot counts, satellite-derived temperature estimates of the lower troposphere instead of the ill sited US weather station network, and ocean energy storage instead of atmospheric temperature estimates. Perhaps you can compare your data with those other sources.
LazyTeenager says:
February 4, 2012 at 3:22 pm
>evanmjones says
>> As NOAA has refused to release its adjustment code, we cannot reproduce the adjusted data, and therefore, of course, any results are Scientifically Insignificant.)
> I was under the impression that it’s relatively easy to code your own adjustment code and that it has been done multiple times. And they all come to much the same conclusions about the temperature trends.
> So doesn’t that make access to the NOAA code kind of irrelevant, since the actual principles involved are well known?
I have seen it argued before (from the warmist side) that scientists doing such work as gathering raw data (maybe also developing adjustment codes?) should not have to give their work away for free, not even if they are paid by governments to do this work.
I seem to think that data and codes gained at taxpayer expense should be free to taxpayers of the taxpaying jurisdiction. Maybe delay free publishing by 1-3 years (depending on field of study), so that when something big hits, competing scientists have to do their own work.
It appears to me this forces competing work that generates alternative codes and data, and I think that is good. When someone else redoes something already done, science is interested in whether the rework confirms or does not confirm something that can use confirmation by an independent effort.
If others develop adjustment codes of their own, it is interesting to see if they have similar results or significantly different results from the NOAA one. (Of course, this is easier with access to both the raw and adjusted NOAA data.)
I think that taxpayer-paid data processing codes and compilations of raw data relevant to climate change should be published on the web, free to the taxpayers who paid for them, no later than 1.5 years after they were generated and no later than 9 months after publication of studies using them. Subtract up to 6 months from these figures, if necessary and sufficient, to get publication at least 15 days before a major election where candidates are running at least in part on climate change issues, and at least 10 days before major government or international bodies vote on appointing big players or on treaties concerning global warming or climate change issues.
Ric Werme to Victor Venema:
EM Smith’s blog probably goes into much better detail, but essentially when a new month’s data is out, GISS code looks through the record for missing data for the month, and if it finds it, recomputes an estimate for that month. An effect of that code is that the historical record keeps changing, and so anyone wanting to reproduce research that used GISS data has to know the month and year it was released in order to stay in sync.
Ironically, the metadata presented in MMS has exactly the opposite problem. When station location coordinates are refined/corrected (recently done en masse with GPS), the old, inaccurate coordinates are not changed. Rather, a fictitious location change is entered. So when you try to trace back a station history (necessary if you really want to go take a look on the ground) you never know which locations are to be taken seriously. At least you don't until you catch on to the game….
What you have to look for is coordinates ending in .33333, .5, .83, .66667 or whatever.
But even with what look like painstakingly precise coordinates, they can be anywhere from 5 feet to half a mile off. It’s a complete crapshoot.
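For anyone who wants to automate that eyeball check, here is a hedged sketch, assuming coordinates reported in decimal degrees: a fractional part that sits exactly on a 1/60 grid (.0, .3333, .5, .6667, .8333, and so on) was almost certainly converted from degrees-and-whole-minutes with no seconds, so its real precision is on the order of a mile, whatever the number of decimal places printed. The function name and sample values are invented for illustration.

```python
def looks_like_minutes_only(coord, tol=0.01):
    """Flag a decimal-degree coordinate whose fractional part is an exact
    multiple of 1/60, i.e. degrees and whole minutes with no seconds."""
    minutes = (abs(coord) % 1.0) * 60.0
    return abs(minutes - round(minutes)) < tol

# Hypothetical station coordinates, for illustration only.
for lat in (42.38333, 42.50000, 42.38271):
    print(lat, looks_like_minutes_only(lat))
# 42.38333 True   (23 minutes, no seconds recorded)
# 42.5     True   (30 minutes)
# 42.38271 False  (more likely a genuine GPS fix)
```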
Blue Hill, MA, is a poster child. I looked at several “station moves” over quite a patch of square miles and evaluated them in a rough sort of way. And then when I spoke to the curator, I discovered that there was one localized equipment move of 20 feet or so during the entire 100+ year history of the station.
So not only is it COMPLETELY impossible to judge microsite without an image or direct testimony of a curator (or other eye witness), but you can’t even rely completely on the larger picture. So we use the NOAA and GISS determinations of which stations are urban, semi-urban, and rural (we have no choice), but sometimes I wonder how accurate even that is.
And I know that the NOAA’s own microsite ratings — such as they even exist — are woefully inaccurate by examining Menne (2009) using Leroy (1999) standards. And, judging by my current studies, Menne, et algore, cannot be even close to accurate by Leroy (2010) standards.
RomanM says:
As long as the station moved over a distance much less than the average distance between stations, I see no problem in keeping the station number the same. You can also split up the record, as you suggest; that would also be fine. Every weather service has its own rules for doing so. If you split up the record you will have to take the jump due to the relocation into account when you compute a regional average over all stations. Thus you cannot avoid the homogenization problem.
RomanM says:
You are right, there is often also a change in the annual cycle due to an inhomogeneity. For temperature you can typically estimate the adjustments needed for every month quite well. Thus temperature is often homogenized on a monthly scale. Precipitation is more variable, thus the adjustments are more uncertain. Consequently precipitation is often homogenized on a yearly scale.
Trends are almost always computed on yearly mean values, then you also just need to compute the annual adjustments and the annual cycle is irrelevant.
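A small numerical sketch of that point, with invented data: once monthly values are averaged to yearly means, the seasonal cycle drops out, so one annual adjustment per break is all a trend calculation needs. Everything below (series, break size, break date) is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
years = 40

# Invented monthly series: seasonal cycle + small trend + noise + one break.
t = np.arange(years * 12)
monthly = (10.0 * np.sin(2 * np.pi * t / 12)    # seasonal cycle
           + 0.001 * t                          # modest warming trend
           + rng.normal(0.0, 0.5, t.size))
monthly[: 20 * 12] += 0.8                       # inhomogeneity before year 20

# Yearly means: the seasonal cycle averages out, only trend + break remain.
annual = monthly.reshape(years, 12).mean(axis=1)

# A single annual adjustment (estimated elsewhere) removes the break.
adjusted = annual.copy()
adjusted[:20] -= 0.8

for label, series in (("raw", annual), ("adjusted", adjusted)):
    slope = np.polyfit(np.arange(years), series, 1)[0]
    print(f"{label:9s} trend: {slope * 10:+.3f} per decade")
```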
RomanM says:
In the blind validation study of homogenization algorithms we also inserted local trend inhomogeneities to model the urban heat island effect or the growth of vegetation, etc. Homogenization algorithms can also handle that situation. In most cases they solve the problem by inserting multiple small breaks in the same direction. Algorithms that use trend-like adjustments were not better than those inserting multiple small breaks.
————-
Ric Werme says:
It is nice to hear that someone puts in a good word for Evan. If he invested a lot of time in the surface temperature project to visit all the stations, I am very grateful. I wish there was a similar project in Europe as it has the potential to help our understanding of the quality of the measurements.
However, when it comes to homogenization, how inhomogeneities are removed, I am not able to understand the gibberish Evan is talking. He does not seem to be able or willing to understand how homogenization is performed. I am happy to answer your questions.
Ric Werme says:
Actually, Roger Pielke put me into contact with Anthony Watts, who requested permission to repost my post on the blind validation study of homogenization algorithms. I guess he was no longer interested when he read the conclusions. The admission that at least a minimal part of climatology is scientifically sound is apparently too controversial for this blog. Conclusion: If you are interested in the truth, read the blogs of the “opponents”.
—————
Ric Werme says:
I am not a climatologist. I am a physicist that normally works on the relation between clouds and (solar and heat) radiation. Being an impartial outsider was why they asked me to perform the blind validation. I now understand the homogenization problem somewhat, but I did not study the filling of missing data and cannot comment on this problem.
The International surface temperature initiative is working on a similar blind validation study, but now for a global temperature network. Because it is global, we can not only validate homogenization algorithms, but also the methods used to interpolate and compute regional and global averages. Stay tuned.
http://www.surfacetemperatures.org/benchmarking-and-assessment-working-group
————–
Ric Werme says:
A good idea. I do think we need to do both. Satellites also have their inhomogeneity problems. Their calibration can only be partially checked in space, and the relation between the measured quantity and the climatological variable of interest also depends on the state of the atmosphere and may thus change in space and time. Furthermore, the time series are relatively short from a climatological perspective, the instrumentation has changed considerably over the decades, and the satellites themselves have short lifespans.
EUMETSAT has a Satellite Application Facility on Climate Monitoring (CM SAF), which is coordinated by the German weather service. They try to solve these problems and produce a good dataset. Again, this is a very different problem and I do not have the expertise to judge it.