There’s a new paper out today, highlighted at RealClimate: Hausfather et al., Quantifying the Effect of Urbanization on U.S. Historical Climatology Network Temperature Records, published (in press) in JGR Atmospheres.
I recommend everyone go have a look at it and share your thoughts here.
I myself have only skimmed it, as I’m just waking up here in California, and I plan to have a detailed look later when I get into the office. But since the Twittersphere is already demanding my head on a plate, and would soon move on to “he’s ignoring it” without instant gratification, I thought I’d make a few quick observations about how some people are reading something into this paper that isn’t there.
1. The paper is about UHI and about homogenization techniques to remove what they perceive as UHI influences, using the Menne pairwise method with some enhancements from satellite metadata.
2. They don’t mention station siting in the paper at all; they don’t reference the Fall et al., Pielke, or Christy papers on siting issues. So claims that this paper somehow “destroys” that work are rooted in a failure to understand that UHI and siting are separate issues.
3. My claims are about station siting biases, which involve a different mechanism at a different scale than UHI. Hausfather et al. 2013 doesn’t address siting biases at all; in fact, as we showed in the draft Watts et al. 2012 paper, homogenization takes the well sited stations and adjusts them to be closer to the poorly sited stations, essentially eliminating good data by mixing it with bad. To visualize homogenization, imagine bowls of water with different levels of clarity due to silt: mix the clear water with the muddy water and you end up with a blend that is no longer pure. That leaves data of questionable purity.
4. On the siting issue, you can have a well sited station (Class 1, best sited) in the middle of a UHI bubble, and a poorly sited station (Class 5, worst sited) in the middle of rural America. We’ve seen both in our surfacestations survey. Simply claiming that homogenization fixes this is an oversimplification not rooted in the physics of heat-sink effects.
5. As we pointed out in the Watts et al 2012 draft paper, there are significant differences between good data at well sited stations and the homogenized/adjusted final result.
We are finishing up the work to deal with TOBs criticisms of our draft, and I’m confident that we now have an even stronger paper on siting issues. Note that through time the rural and urban trends have become almost identical – always warming up the rural stations to match the urban stations. Here’s a figure from Hausfather et al. 2013 illustrating this. Note also that they have urban stations cooler in the past, something counterintuitive. (Note: John Nielsen-Gammon observes in an email that the urban stations appearing cooler in the past “is purely a result of choice of reference period.” He’s right. Like I said, these are my preliminary comments from a quick read. My thanks to him for pointing out this artifact. -Anthony)
I never quite understand why Menne and Hausfather think that they can get a good estimate of temperature by statistically smearing together all stations, the good, the bad, and the ugly, and creating a statistical mechanism to combine the data. Our approach in Watts et al is to locate the best stations, with the least bias and the fewest interruptions and use those as a metric (not unlike what NCDC did with the Climate Reference Network, designed specifically to sidestep the siting bias with clean state of the art stations). As Ernest Rutherford once said: “If your experiment needs statistics, you ought to have done a better experiment.”
6. They do admit in Hausfather et al 2013 that there is no specific correction for creeping warming due to surface development. That’s a tough nut to crack, because it requires accurate long term metadata, something they don’t have. They make claims at century scales in the paper without supporting metadata at the same scale.
7. My first impression is that this paper doesn’t advance science all that much, but seems more like a “justification” paper in response to criticisms about techniques.
I’ll have more later once I have a chance to study it in detail. Your comments below are welcome too.
I will give my kudos now on transparency though, as they have made the paper publicly available (PDF here), something not everyone does.
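The bowls-of-water effect in point 3 can be sketched numerically. What follows is a deliberately naive toy, not the actual NCDC pairwise homogenization algorithm, and the trend numbers are invented; it only illustrates how blending a clean station toward the mean of a biased network pulls its trend toward the biased majority:

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(1900, 2011)
true_trend = 0.005          # deg C per year: the "real" climate signal (invented)
bias_trend = 0.015          # extra spurious warming at poorly sited stations (invented)

# One well sited station and four poorly sited neighbors (toy data).
good = true_trend * (years - years[0]) + rng.normal(0, 0.1, years.size)
bad = [(true_trend + bias_trend) * (years - years[0]) + rng.normal(0, 0.1, years.size)
       for _ in range(4)]

# Naive "homogenization": blend each series halfway toward the network mean.
network_mean = np.mean([good] + bad, axis=0)
adjusted_good = 0.5 * good + 0.5 * network_mean

fit = lambda y: np.polyfit(years, y, 1)[0]   # least-squares trend, deg C/yr
print(f"good station, raw trend:      {fit(good):.4f}")
print(f"good station, adjusted trend: {fit(adjusted_good):.4f}")
```

The adjusted trend of the clean station lands between the true signal and the biased network mean: the good data has been diluted by the bad, which is the mixing argument in a nutshell.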
Typo
“takes the well sited stations and adjusts them to be closer to the well sited stations,”
UHI appears in the minimum temps which occur in the dawn, not so much the max temps. Surely a more accurate temperature scale could be created by using only max temperatures.
[Fixed. — mod.]
First off, never let the twits stampede you.
I will not read this until I’m done work, but cynicism rules: Hausfather’s methods seem to be the definition of insanity – repeating the same actions while hoping for a different result.
Nice comparison, muddying the water.
I expect circular argument: statements such as, no matter how we smear the station data about, we get the same results, therefore the station siting problems do not matter.
Thanks to everyone who pointed out the typo: well sited > poorly sited, It is fixed.
Please allow me to digress for a moment. I am facing the same, exact problem with test results from other labs. We take the same product (in your case it is the weather) and measure it in pretty much the same manner as other labs. The measurements are then treated differently, so different labs give different ‘results’. The outputs from the different methods do not match, even though the method of making the raw measurements is pretty much the same, because scales, particle counters, thermometers and gas analysers are pretty much the same.
The bias that produces different final answers starts the moment the calculations begin. I agree the whole thing boils down to making final claims that are, or are not, solidly rooted in the raw data available. Erroneous conceptual constructs are applied, and this affects what happens to the raw numbers.
It is frustrating in the following manner: We were asked to provide a ‘conversion’ so our results will be ‘comparable with the results from other labs’ who are using other data processing methods. Our position is, why would we do that when the other methods are known to be faulty, arbitrary, questionable, variable, even invalid etc?
Why not have a conversation about doing things correctly, then trying to process old raw data into new, correct results? That is exactly what you are proposing. Why convert newer correct results to ‘match’ old questionable ones? The paper above is basically a call to use a different protocol, with known issues, to process a larger set of contaminated raw data into a result that is more or less the same as older treatments which are broadly accepted have known issues.
My recommendation, Anthony, is to stick with the most correct method you have at your command, show how you do it, and present the results. It is also helpful to show that the ‘other methods’ give different results, and to show (if possible) why, when one takes a First Principles approach, the results will necessarily differ for X and Y reasons. It is more work and you should not have to do it. They should be willing to listen to arguments developed from First Principles, but reality impinges: many people are not capable of following a logical explanation – not nearly as many as we might suppose. I am not claiming perfidious bias (as many do); I am suggesting they are not competent.
The paper is indeed a ‘justification’, and its value is that it gives the occasional reader an alternative, well-referenced view of how some are approaching the issues, thereby giving you a chance to highlight differences. Observing this does not invalidate other, more correct methods such as, I believe, the one you are using. We should not complain when someone details the methods by which they arrived at what is a demonstrably incorrect final result. It is a bit like encouraging Weepy Bill to speak publicly as much as he can, because it lets people know what childish, uninformed, agenda-driven fanaticism looks like.
Additionally, it is important to keep the siting-quality issue separate from UHI and place the difference front and center. It is patently obvious that the two are different issues and that both must be resolved. Argumentative obfuscation by Hausfather does not correct methods or validate deficient conclusions. Press on. Someone has to do things properly, and if the Mennes and Hausfathers of the climate circle are not intellectually or methodologically up to the task, others must make the effort and don the Yellow Jersey. Congrats. It looks good on you.
Hi Anthony,
You are correct in asserting that this paper makes no real claims regarding station siting, but rather focuses more on meso-scale urbanization effects. To the extent that siting issues are urbanity-correlated they might influence the results, but they are not really the focus of the paper. We do mention them briefly in the introduction:
“To further complicate matters, changes associated with urbanization may have impacts that affect both the mesoscale (10²–10⁴ m) and the microscale (10⁰–10² m) signals. Small station moves (e.g., closer to nearby parking lots/buildings or to an area that favors cold air drainage) as well as local changes such as the growth of or removal of trees near the sensor may overwhelm any background UHI signal at the mesoscale [Boehm, 1998]. Notably, when stations are located in park-like settings within a city, the microclimate of the park can be isolated from the urban heat island “bubble” of surrounding built-up areas [Spronken-Smith and Oke, 1998; Peterson, 2003].”
The figure you cite from our paper does not show us cooling urban stations in the past. First of all, the baseline period used is 1961-1990, which forces agreement over that period (as we are dealing with anomalies, and comparing trends, the absolute offsets are somewhat irrelevant). Second, what is being shown is not urban stations and rural stations, but rather all stations using only rural stations to run the homogenization process, all stations using only urban stations to run the homogenization process, all stations using all stations for homogenization, and all stations with no homogenization (only TOBs adjustments).
This graph was created by our co-author Troy Masters (who blogs as Troy_CA). It’s an important part of our lengthy analysis of the possibility of urban “spreading” due to homogenization. We find that while using only urban stations to homogenize does increase the trend (suggesting that urban spreading is indeed a possible concern), the results of using all stations to homogenize are effectively identical to those of using only rural stations (which, by definition, cannot spread in any urban signal provided they are sufficiently rural, something that we try to ensure by examining multiple different urbanity proxies). Our supplementary information contains more figures examining the specific breakpoints detected and adjustments made by urban and rural stations separately during the pairwise homogenization runs.
We really tried to ensure that we were examining the homogenization process in a way that would avoid adjusting rural stations to be similar to urban stations. I hope folks will take the time to read that section of the paper in depth, as well as to look at the figures in the supplementary materials. Also, our data and code are up at the NCDC FTP site, and I’d encourage people to play around with it.
@Zeke, thanks.
It would be a non issue except that some folks are seeing claims that aren’t there. Bob Ward and Scott Mandia for example. I hope that you’ll point this conflation out to them and others when unsupportable claims about siting are made in reference to this paper.
Also, you may not have seen my update in the body related to the figure before writing your comment.
I have an open question- with all of the analysis that is given to the CONUS data set, is it given more weight in the global data sets as an accurate sample of data or are the global data sets simply a spatial “average” from data around the globe?
There’s no way to get accurate conclusions from inaccurate data.
It reminds me of the lady that accidentally added salt to her tea instead of sugar then spent the rest of the morning adding different ingredients from her cupboard to negate the salt. As you can imagine, the situation just got worse and worse until she finally dumped the mess and started with a new cup of tea.
We should all go back to the beginning and do it right.
Thanks Anthony. Here is the section of our paper discussing the tests we did around the potential for homogenization “spreading” urban signal to rural stations. Note that the figures referenced are in the supplementary materials (on the NCDC FTP site): ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/papers/hausfather-etal2013-suppinfo/hausfather-etal2013-supplementary-figures.pdf
In all of the urbanity proxies and analysis methods, the differences between urban and rural station minimum temperature trends are smaller in the homogenized data than in the unhomogenized data, which suggests that homogenization can remove much and perhaps nearly all (since 1930) of the urban signal without requiring a specific UHI correction. However, the trends in rural station minimum temperatures are slightly higher in the homogenized minimum temperature data than in the TOB-only adjusted data. One possible reason for this is that the PHA is appropriately removing inhomogeneities caused by station moves or other changes to rural stations that have had a net negative impact on the CONUS average bias (e.g., many stations now classified as rural were less rural in the past because they moved from city centers to airports or wastewater treatment plants). Another possibility is that homogenization is causing nearby UHI-affected stations to “correct” some rural station series in a way that transfers some of the urban warming bias to the temperature records from rural stations. In such a case, a comparison of the homogenized data between rural and urban stations would then show a decreased difference between the two by removing the appearance of an urbanization bias without actually removing the bias itself.
To help determine the relative merits of these two explanations, the PHA was run separately allowing only rural-classified and only urban-classified Coop stations to be used as neighbors in calculating the PHA corrections for USHCN stations. In Figure 9, the spatially averaged U.S. minimum temperature anomalies for rural stations are shown for the four different data sets: the unhomogenized (TOB-adjusted only); the version 2 (all-Coop-adjusted; v2) data; the homogenized data set adjusted using only coop stations classified as rural; and the homogenized data set adjusted using only urban coop stations.
The large difference in the trends between the urban-only adjusted and the rural-only adjusted data sets suggests that when urban Coop station series are used exclusively as reference series for the USHCN, some of their urban-related biases can be transferred to USHCN station series during homogenization. However, the fact that the homogenized all-Coop-adjusted minimum temperatures are much closer to the rural-station-only adjustments than the urban-only adjustments suggests that the bleeding effect from the ISA-classified urban stations is likely small in the USHCN version 2 data set. This is presumably because there are a sufficient number of rural stations available for use as reference neighbors in the Coop network to allow for the identification and removal of UHI-related impacts on the USHCN temperature series. Furthermore, as the ISA classification shows the largest urban-rural difference in the TOB data, it is likely that greater differences between rural-station-only-adjusted and all-coop-adjusted series using stricter rural definitions result from fewer identified breakpoints because of less network coverage, and not UHI-related aliasing. Nevertheless, it is instructive to further examine the rural-only and urban-only adjustments to assess the consequences of using these two subsets of stations as neighbors in the PHA.
Figure S2 shows the cumulative impact of the adjustments using the rural-only and urban-only stations as neighbors to the USHCN. In this example, the impermeable surface extent was used to classify the stations. The cumulative impacts are shown separately for adjustments that are common between the two runs (i.e., adjustments that the PHA identified for the same stations and dates) versus those that are unique to the two separate urban-only and rural-only reference series runs. In the case of both the common and unique adjustments, the urban-only neighbor PHA run produces adjustments that are systematically larger (more positive) than the rural-only neighbor run. The magnitude of the resultant systematic bias for the adjustments common to both algorithm versions is shown in black. The reason for the systematic differences is probably that UHI trends or undetected positive step changes pervasive in the urban-only set of neighboring station series are being aliased onto the estimates of the necessary adjustments at USHCN stations. This aliasing from undetected urban biases becomes much more likely when all or most neighbors are characterized by such systematic errors.
Figure S3 provides a similar comparison of the rural-only neighbor PHA run and the all-Coop (v2) neighbor run. In this case, the adjustments that are common to both the rural-only and the all-Coop neighbor runs have cumulative impacts that are nearly identical. This is evidence that, in most cases, the Coop neighbors that surround USHCN stations are sufficiently “rural” to prevent a transfer of undetected urban bias from the neighbors to the USHCN station series during the homogenization procedure. In the case of the adjustments that are unique to the separate runs, the cumulative impacts suggest that the less dense rural-only neighbors are missing some of the negative biases that occurred during the 1930–1950 period, which highlights the disadvantage of using a less dense station network. In fact, the all-Coop neighbor v2 data set has about 30% more adjustments than the rural-only neighbor PHA run produces. Results using the other three station classification approaches are similar and are provided as Figures S3–S8.
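A drastically simplified sketch of the pairwise idea in the quoted passages (not the actual PHA; the step size, dates, noise levels, and station counts here are invented): differencing a target station against the mean of its neighbors cancels the shared regional signal and exposes an undocumented step change, which can then be located and removed.

```python
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1950, 2011)

# Toy target series with an undocumented +0.5 deg C step in 1980
# (e.g., a station move); neighbors share only the regional signal.
regional = 0.01 * (years - 1950)
target = regional + 0.5 * (years >= 1980) + rng.normal(0, 0.05, years.size)
neighbors = [regional + rng.normal(0, 0.05, years.size) for _ in range(5)]

# Pairwise idea: the regional signal cancels in the difference series,
# leaving the inhomogeneity exposed.
diff = target - np.mean(neighbors, axis=0)

# Find the changepoint that best splits the difference series into two means.
scores = [abs(diff[:i].mean() - diff[i:].mean()) for i in range(5, years.size - 5)]
breakpoint_year = years[5 + int(np.argmax(scores))]
step = diff[years >= breakpoint_year].mean() - diff[years < breakpoint_year].mean()
print(breakpoint_year, round(step, 2))
```

The quality of the estimated correction clearly depends on the neighbors being free of their own shared biases, which is exactly the urban-neighbor concern the paper's rural-only and urban-only runs were designed to test.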
@Zeke
I really appreciate your input and explanation of your intents. It is unfortunate that supercharged climate conversations are taking well-intentioned work and misrepresenting what it says, or intends.
However, that said, there is a pretty harsh reality bearing on this ‘temperature record’ business. Wild claims are being bandied about, and while the poisoned atmosphere of (to me) crazy climate agendas continues, scientists have a bounden duty to present their work in full context. To me, it seems clear you are aware of, and are trying to separate, UHI from siting issues, but I think Anthony’s point is well taken: how can you make a silk purse out of a sow’s ear? Your charts will show what they can from the ‘all data’, but if the quality of the station data is known to fall into standard buckets from 1 to 5, what is demonstrated by failing to exclude the low-quality input?
In other words, you get an answer and you get charts, but what have we learned, even if you are only “dealing with anomalies”? Much of the raw data itself is anomalous. Whatever value your analysis has, it could be presented in a manner that disallows easy misrepresentation of the calculated results, which is, I fear, too easy at the moment.
The chart above does not show the true difference between urban and rural and especially how that difference is widening over time.
The lines in the graph are almost all a single line from 1961-1990. That is because the base period is 1961-1990, which sits smack-dab in the middle of the data, so the difference over time is obscured to the maximum amount possible. The red urban line is lowest at the beginning and highest at the end.
The base period should be changed to 1895-1924 and then we could actually see how much difference there is between Urban and Rural and how that has changed over time. Then all the calculations should be redone to present a true picture.
Difference at the beginning of the data using a 1961-1990 base period: Urban is 0.25C lower than Rural temps. Difference at the end of the data: Urban is 0.25C higher than Rural. What is the average difference over the whole data period? Zero. All kinds of unusual statistical trends appear on their own when the data is set up this way.
I’ve made this point before but it doesn’t seem to sink in.
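The reference-period point above can be checked with toy numbers (the 0.005 and 0.010 °C-per-year trends here are invented for illustration): anomalies against a mid-record 1961-1990 baseline force the series together mid-record, so an urban series with a larger trend appears cooler than rural early on; re-baselining to 1895-1924 moves the apparent divergence to the end of the record. The trend difference is identical either way; only the offsets move.

```python
import numpy as np

years = np.arange(1895, 2011)
rural = 0.005 * (years - 1895)   # toy rural series, deg C
urban = 0.010 * (years - 1895)   # toy urban series with extra warming

def anomalies(series, start, end):
    """Anomalies relative to the mean over the base period [start, end]."""
    base = series[(years >= start) & (years <= end)].mean()
    return series - base

# Mid-record baseline: series agree mid-record, so urban looks
# noticeably *cooler* than rural at the start of the record.
u_mid, r_mid = anomalies(urban, 1961, 1990), anomalies(rural, 1961, 1990)
print(u_mid[0] - r_mid[0])       # clearly negative in 1895

# Early baseline: the divergence now all appears at the end instead.
u_early, r_early = anomalies(urban, 1895, 1924), anomalies(rural, 1895, 1924)
print(u_early[0] - r_early[0])   # near zero in 1895

# Either way the urban-minus-rural trend is the same: baselining
# only shifts offsets, it cannot change slopes.
slopes = (np.polyfit(years, u_mid - r_mid, 1)[0],
          np.polyfit(years, u_early - r_early, 1)[0])
print(slopes)
```

So the choice of base period changes where the divergence is displayed on the chart, not whether it exists; both views contain the same trend information.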
Too many assumptions in this report in regards to the homogenization justifications to my liking.
The one thing I can agree with is the notion that provided nothing else has changed around both a rural and urban station a trend is valid even if the temps are different.
The thing is as always nothing stays the same.
A rural station is still compromised if it now sits above a concrete pad (to avoid getting your shoes wet when checking) rather than grass. Still classified as rural, though, just with urban features.
A station according to this is classed as urban if it sits within the limits of a centre of 1000 people or more. This is biased from the start: just because it is classified one way or the other does not make it so. A rural station under these criteria can have more UHI bias, if it is poorly sited, than an urban one in the middle of a large park.
What one could conclude from this is that once again it proves that we cannot homogenize any data, and that to really be true to ourselves in regards to absolute temps we should only look at temp data from stations where we know that nothing has changed that could influence the reading since original siting (not many of them around anymore).
From what I have seen of the readings from such stations, the “trend” is vastly different than the homogenized line.
But this would now be classified as localized data with no relevance to the global trend.
However if they are true to the above statement that a trend is a trend no matter where the station is sited that should not be an issue.
It is the trend we are looking for not the absolutes.
Bill Illis,
That graph does not show urban and rural temperatures. You want Figs. 3-6 in our paper for a good example of that.
[Thank you for the courtesy of your reply. Mod]
‘I never quite understand why Menne and Hausfather think that they can get a good estimate of temperature by statistically smearing together all stations, the good, the bad, and the ugly, and creating a statistical mechanism to combine the data. Our approach in Watts et al is to locate the best stations, with the least bias and the fewest interruptions and use those as a metric (not unlike what NCDC did with the Climate Reference Network, designed specifically to sidestep the siting bias with clean state of the art stations). As Ernest Rutherford once said: “If your experiment needs statistics, you ought to have done a better experiment.”’
I was going to write an extended comment on this but Crispin in Waterloo has done it.
No discussion of statistical methods can explain why some data points are devalued in the course of statistical analysis. The very fact that devaluing takes place renders the statistical work non-empirical. Mainstream climate science, apparently following the lead of Mann and the paleo crowd, has failed to understand that they must undertake the necessary empirical work to give a robust integrity to each of their data points. There is not one among these scientists who has instincts for the empirical.
“homogenization takes the well sited stations and adjusts them to be closer to the poorly sited stations, essentially eliminating good data by mixing it with bad.” This reminds me of my old Chemistry professor’s explanation of ‘entropy’: “If you mix a tablespoon of fine wine with a liter of sewage you get sewage; if you mix a tablespoon of sewage with a liter of fine wine you still get sewage.”
outheback says:
“However if they are true to the above statement that a trend is a trend no matter where the station is sited that should not be an issue. It is the trend we are looking for not the absolutes.”
Such a trend on what is effectively urbanization is good for what exactly???
To evaluate the effect of CO2 upon global temperatures beyond the baseline natural warming since the little ice age, we need absolutely clean data, ie, numbers which are from a long term pristine site. If we only have 50 such land sites world wide, so be it. Thats all we have to work with. Add that to the validated ocean measurements.
This sounds eerily reminiscent of the economic geniuses at Citi & Goldman Sachs talking about the safety of derivatives of mortgage-backed securities. The logic was similar: you package good and bad mortgages together and you end up with AA-rated securities, because you assume that only a small percentage will go bad. The problem is that when you mix bad mortgages with good (or bad data with good) you end up with bad derivatives, not good ones, because you can’t separate the bad from the good, and you end up not trusting any of the derivatives.
I know I could have worded this better…
Bill Illis’ argument surely has to be correct, and certainly makes sense to a non-scientist like me.
You surely cannot use a part of the timescale you are studying as the baseline for comparison. In my world that’s usually called cheating.
“If you torture the data long enough, it will confess.”
Somehow in the field of climate “science” the concept of altering temp data based on a supposed understanding of bias in 100-year-old data has become acceptable. This is not acceptable in any other field that I have seen. NASA and NOAA apparently do it every day.
It has never made any sense. The raw data are what they are, whether from a liquid-in-glass thermometer, pulled up from a bucket of ocean water, the water intake of a sea-going ship, a thermocouple with a short cable, whatever. UHI is real, everyone knows this. Badly sited thermometers give inaccurate readings, everyone knows this. Cold air flows downhill, so if the thermometer is in the lowest point of land in a particular area, it will read cold at night, everyone who has taken a walk after sundown knows this.
Technically educated people will always be skeptical of adjustments to data. It is a lie to say that a particular day, week, month, year, or decade was warmer or cooler than another if the raw data has been adjusted. The raw data was taken for specific purposes and it was good enough for those who took it, deal with it.
Depends on what you mean by “devalued”. If you mean “not treated equally”, there could be a good statistical reason for that. But if the equipment used to measure all the data is statistically similar in behavior, no devaluation is justifiable.
Zeke Hausfather;
First of all, the baseline period used is 1961-1990, which forces agreement over that period (as we are dealing with anomalies, and comparing trends, the absolute offsets are somewhat irrelevant).
>>>>>>>>>>>>>
I repeat my question to you from another thread, which you have failed to answer. What is the justification for averaging anomalies from completely different baseline temperatures when these represent completely different flux levels? For example, an anomaly of 1 from a base temperature of -30 represents a change in flux of 2.89 W/m². How do you justify averaging it with an anomaly of 1 from a baseline of +30, which would represent a change in flux of 6.34 W/m²?
The standings on this question so far are:
Zeke Hausfather: No Response
Steven Mosher: No Response
Joel Shore: No single metric is right or wrong
Robert G Brown: There is no justification
Richard S Courtney: There is no justification
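The nonlinearity behind this question comes from the Stefan–Boltzmann law, F = σT⁴: a fixed 1 °C anomaly corresponds to a larger flux change at a warmer base temperature. A quick blackbody check (emissivity 1, σ = 5.67×10⁻⁸ W m⁻² K⁻⁴; the exact figures quoted in the comment may rest on slightly different constants):

```python
SIGMA = 5.67e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def flux_change(base_c, anomaly=1.0):
    """Blackbody flux change (W/m^2) for a temperature anomaly at a given base (deg C)."""
    t = base_c + 273.15  # convert to kelvin
    return SIGMA * ((t + anomaly) ** 4 - t ** 4)

print(flux_change(-30))  # roughly 3.3 W/m^2
print(flux_change(30))   # roughly 6.3 W/m^2
```

A 1 °C anomaly at +30 °C thus carries roughly twice the flux change of the same anomaly at -30 °C, which is why averaging anomalies from very different base climates averages unequal energy increments.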