Circularity of homogenization methods

Guest post by David R.B. Stockwell PhD

I read with interest GHCN’s Dodgy Adjustments In Iceland by Paul Homewood, on the distortion by GHCN homogenization adjustments of the mean temperature plots for Stykkisholmur, a small town in the west of Iceland.

The validity of the homogenization process is also being challenged in a talk I am giving shortly in Sydney, at the annual conference of the Australian Environment Foundation on the 30th of October 2012, based on a manuscript uploaded to the viXra archive, called “Is Temperature or the Temperature Record Rising?”

The proposition is that commonly used homogenization techniques are circular — a logical fallacy in which “the reasoner begins with what he or she is trying to end up with.” Results derived from a circularity are essentially just restatements of the assumptions. Because the assumption is not tested, the conclusion (in this case the global temperature record) is not supported.

I present a number of arguments to support this view. 

First, a little proof. If S is the target temperature series, and R is the regional climatology, then most algorithms that detect abrupt shifts in the mean level of temperature readings, also known as inhomogeneities, come down to testing for changes in the difference between S and R, i.e. D = S-R. The homogenization of S, or H(S), is the adjustment of S by the magnitude of the change in the difference series D.

When this homogenization process is written out as an equation, it is clear that homogenization of S is simply the replacement of S with the regional climatology R.

H(S) = S-D = S-(S-R) = R

While homogenization algorithms do not apply D to S exactly, they do apply the shifts in baseline to S, and so coerce the trend in S to the trend in the regional climatology.
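The identity above, and the milder shift-based variant, can be checked numerically. The sketch below is my own construction with made-up series, not any agency’s actual algorithm: it builds a warming regional climatology R and a trendless target S, applies a single baseline shift estimated from the difference series at an arbitrary breakpoint, and shows the adjusted trend being pulled toward R’s trend.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100  # years
t = np.arange(n)

# Hypothetical regional climatology R with a warming trend,
# and a target series S that is flat apart from noise.
R = 0.01 * t + rng.normal(0, 0.1, n)
S = rng.normal(0, 0.1, n)

# Full replacement is the identity H(S) = S - D = R, as in the equation above.
D = S - R
H_full = S - D
assert np.allclose(H_full, R)

# The milder, shift-based version: treat the change in the mean level of D
# at a candidate break as an "inhomogeneity" and shift the later segment of S
# so that D becomes level again.
b = n // 2
shift = D[b:].mean() - D[:b].mean()
H = S.copy()
H[b:] -= shift

# The adjustment moves the trend of S toward the trend of R.
trend = lambda x: np.polyfit(t, x, 1)[0]
print(trend(S), trend(H), trend(R))
```

Even this single shift coerces the flat target most of the way to the regional trend; repeated shifts, as real algorithms apply, push it further.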

The coercion to the regional trend is strongest in series that differ most from the regional trend, and happens irrespective of any contrary evidence. That is why “the reasoner ends up with what they began with”.

Second, I show bad adjustments, like those at Stykkisholmur, from the Riverina region of Australia. This area has good, long temperature records, and has also been heavily irrigated, so it might be expected to show less warming than other areas. With a nonhomogenized method called AWAP, a surface fit of temperature trend last century shows cooling in the Riverina (circled on map 1 below). A surface fit with the recently developed, homogenized ACORN temperature network (map 2) shows warming in the same region!

Below are the raw minimum temperature records for four towns in the Riverina (in blue). The temperatures are largely constant or falling over the last century, as are their neighbors (in gray). The red line tracks the adjustments in the homogenized dataset, some over a degree, that have coerced the cooling trend in these towns to warming.


It is not doubted that raw data contain errors. But independent estimates of the false alarm rate (FAR) using simulated data show that regional homogenization methods can exceed 50%, an unacceptably high rate that far exceeds the 5% or 1% error rates typically accepted in scientific methods. Homogenization techniques are adding more errors than they remove.
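The inflation of the false alarm rate is easy to reproduce in simulation. The toy below is my own illustration, not a published benchmark (the ~50% figure comes from dedicated validation studies): on purely homogeneous noise, it compares testing one pre-chosen breakpoint at the nominal 5% level against "scanning" every candidate breakpoint and keeping the largest statistic.

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 100, 500
crit = 1.96  # ~5% two-sided critical value for large samples

def t_stat(x, b):
    # Welch two-sample t statistic for a mean shift at candidate break b.
    a, c = x[:b], x[b:]
    s2 = a.var(ddof=1) / len(a) + c.var(ddof=1) / len(c)
    return (a.mean() - c.mean()) / np.sqrt(s2)

single = scan = 0
for _ in range(trials):
    d = rng.normal(0, 1, n)  # homogeneous difference series: no real break
    # (a) test one pre-chosen breakpoint: false alarms stay near the nominal 5%
    if abs(t_stat(d, n // 2)) > crit:
        single += 1
    # (b) scan every candidate breakpoint and keep the maximum:
    # many chances to exceed the same threshold, so false alarms multiply
    if max(abs(t_stat(d, b)) for b in range(10, n - 10)) > crit:
        scan += 1

print(single / trials, scan / trials)  # scanning inflates false alarms well past 5%
```

This is the multiple-testing face of data peeking: the per-test threshold is honest, but taking the best of many tests is not.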

The problem of latent circularity is a theme I developed on the hockey-stick, in Reconstruction of past climate using series with red noise. The flaw common to the hockey-stick and homogenization is “data peeking”, which produces high rates of false positives, thus generating the desired result with implausibly high levels of significance.

Data peeking allows one to delete the data needed to achieve significance, to use random noise proxies to produce a hockey-stick shape, or, in the case of homogenization, to adjust a deviant target series into the overall trend.

To avoid the pitfall of circularity, I would think the determination of adjustments would need to be completely independent of the larger trends, which would rule out most commonly used homogenization methods. The adjustments would also need to be far fewer, and individually significant, as errors no larger than noise cannot be detected reliably.

richardscourtney
October 15, 2012 7:43 am

David Stockwell:
I think you will want to read
http://www.publications.parliament.uk/pa/cm200910/cmselect/cmsctech/memo/climatedata/uc0102.htm
especially its Appendix B.
Richard

Ed Reid
October 15, 2012 7:53 am

It seems “very late in the game” to be discussing such fundamental issues. We’ve spent more than $100 billion in the US alone on climate research, and we’ve basically “screwed up” the analysis of the data. Brilliant!

Tom G(ologist)
October 15, 2012 8:17 am

What has been spent on climate research is as nothing compared to what has been spent on hazardous site remediation since the Stupor-fund was implemented in 1980. I can attest personally that what is being presented in this thread is virtually identical to the way groundwater flow models have shaped the interpretation of groundwater quality data, which in turn has resulted in vastly excessive efforts to remediate groundwater “contamination” that has not been a threat to anyone or anything. These efforts have proved a complete bust, because the little bits of ‘contamination’ targeted by the circular decision-making process are not remediable by any means other than leaving them alone and letting nature take its course – which DOES result in remediation. It has been the largest, money-wasting, do-nothing cluster-f@*&. What upsets me more than anything about the climate nonsense is that I am watching the inexorable glaciality of the EPA do it all over again – with the same batch of cretins who have simply been shifted from one division to another.
Stockwell has it right. Begin with the end in mind and you can interpret whatever you want.

October 15, 2012 8:24 am

The equation H(S) = S-D = S-(S-R) = R is wrong.
You do use the difference time series (D) to determine the size of the jump, but you do not replace all values by the ones in the regional climate signal:
H(S) = S + d
d = d1 – d2
d1 = mean(D) in the homogeneous period before the jump
d2 = mean(D) in the homogeneous period after the jump
d is a value, not a time series. And a good homogenisation method does not assume that R is homogeneous.
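As a concrete sketch of the procedure this comment describes (reading d1 and d2 as the means of the difference series before and after a known jump; the 0.8-degree jump, noise levels, and break position are made up for illustration), the adjusted series keeps the target’s own departures from the reference, and only its baseline shifts:

```python
import numpy as np

rng = np.random.default_rng(2)
n, b = 100, 60  # series length and (known) position of the jump

R = rng.normal(0, 0.1, n)        # regional reference
S = R + rng.normal(0, 0.05, n)   # target series...
S[b:] += 0.8                     # ...with an artificial 0.8-degree jump

D = S - R
d1 = D[:b].mean()   # mean difference in the homogeneous period before the jump
d2 = D[b:].mean()   # mean difference in the homogeneous period after the jump
d = d1 - d2         # a single number, not a time series

H = S.copy()
H[b:] += d          # only the post-jump segment is shifted

# The adjustment removes the jump but does NOT replace S with R:
# H still carries S's own departures from the reference.
print(S[b:].mean() - S[:b].mean(), H[b:].mean() - H[:b].mean())
print(np.allclose(H, R))  # False: H is not simply R
```

Whether this escapes the circularity charge in the post is the point under dispute: the size of d is still estimated from agreement with the reference.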
For more information on homogenisation see:
http://variable-variability.blogspot.com/2012/08/statistical-homogenisation-for-dummies.html
Homogenisation is used to be able to study large-scale climate variability in a more accurate way. Removing the too-low trend of an irrigated region is what homogenisation is supposed to do, just as it should remove the too-high trend in case of urbanisation. If you are interested in the local climate (either for an agricultural study of irrigation, or for urban climates) you should not use homogenised data, or you should make sure that you have multiple stations in the same climatic region which are all affected by the irrigation in a similar way.
David Stockwell, why don’t you submit an abstract at the General Assembly of the European Geophysical Union? There you would get more qualified feedback on the quality of your work.
http://meetingorganizer.copernicus.org/EGU2013/session/11619

October 15, 2012 8:35 am

@Victor Venema:
What gives us any confidence that we are able to reliably identify the “too high trend in the case of urbanization” when most of the data suffers from urbanization to some unknown degree? We simply do not have the control we need over the time period of study. We assume we know more than we really do.

October 15, 2012 8:38 am

Circularity or circular reasoning is the bias introduced by politically motivated “post modern science” that promotes subjective research. Temperature measurements are just one example of circular reasoning found in climate research. It is introduced as the primary “driving force” in both global mass and energy balance models.

Alan S. Blue
October 15, 2012 8:47 am

A standard thermometer with a 0.1C instrumental error (corrected for adiabatic lapse rate and humidity) does not have a 0.1C error on the measurement of the temperature 100m from it. The error is larger. And there is no guarantee that the error is even centered, let alone normally distributed.
So the entire homogenization process has issues. Figuring out how to interpolate into all of the areas in which there simply are no measurements should not be influencing the few actual measurements that are present.
That is: There’s nothing fundamentally in error in having two thermometers 20km apart that read 2 degrees different. It’s quite possible the actual temperature -is- 2 degrees different. Likewise – one can envision reasons why temperature rising slightly over here might not be matched over there. (This is a forested area, that’s a desert. Or a mountainous valley. Or has wind from the south.)

Tim Ball
October 15, 2012 9:02 am

Circular arguments are the hallmark of IPCC climate science. The most fundamental one is the assumption that a CO2 increase causes a temperature increase. This is then built into the computer models, which are constructed with most variables omitted or poorly known.
http://drtimball.com/2012/climate-change-of-the-ipcc-is-daylight-robberyclimate-change-of-the-ipcc-is-daylight-robbery/
My favourite omission is this one;
“Due to the computational cost associated with the requirement of a well-resolved stratosphere, the models employed for the current assessment do not generally include the QBO.”
http://www.ipcc.ch/publications_and_data/ar4/wg1/en/ch8s8-4-9.html
All this guarantees that the model will produce an increase in temperature with an increase in CO2, a result then used to argue that the original assumption is correct.
The problem was that nature did not cooperate: despite the claims that atmospheric CO2 continued to rise (another self-constructed treadmill, because the IPCC produce the annual human production numbers), temperature levelled and declined slightly. This decline occurred despite the apparent best efforts of NASA GISS and others. The problem was recognized at East Anglia not as a scientific problem but as a PR one by their communications expert Asher Minns, a science communicator at the Tyndall Centre on the same campus, who wrote in one of the leaked emails:
“In my experience, global warming freezing is already a bit of a public relations problem with the media.”
Kjellen added: “I agree with Nick that climate change might be a better labelling than global warming.”
Hopefully another form of circular argument will occur, namely the fate of the mythical Oozlum bird, which flew in ever-decreasing circles until it finally disappeared.
http://en.wikipedia.org/wiki/Oozlum_bird

October 15, 2012 9:23 am

Stephen Rasey says: “@Victor Venema: What give us any confidence that we are able to reliably Identify the “too high trend in the case of urbanization” when most of the data suffers from urbanization to some unknown degree?”
If most of the data suffered from urbanization, homogenization would not remove the additional trend due to urbanization, and the resulting trend would not be representative of the large-scale climate.
If you know of a study that shows that most of the stations are affected by urbanization most of the time, please let me know. That would be interesting, as that would go against our current understanding of the problem that no more than a few percent of the data are affected by urbanization.

pdtillman
October 15, 2012 9:28 am

Dr. Stockwell’s presentation of the raw vs “corrected” temperature records from the Australian Riverina is a striking demonstration of the “cool is warm” (or “lies are truth”) mindset behind these apparently confirmation-biased adjustments. We really need a third-party statistical reanalysis of national climate records, published in a respected journal and then appropriately publicized.

October 15, 2012 9:38 am

Much as I’d like to be able to conclude that you have disclosed a fundamental error in calculating temperatures, and therefore there is no need to worry about ‘thermageddon’, all I can realistically conclude is that you have not communicated clearly.
I do not mean to give offence. With respect, someone must convey to you how you need to amend your article to write clearly for an intelligent non-technical audience such as the WUWT readership, or even for newspapers.
I make the following comments in the hope you’ll take them seriously, and ideally, revise your article so that I (and presumably many others) can comprehend it.
Give me clear directions. Don’t worry about insulting my intelligence, just make sure you’re clear.
The primary rule of clear communication is that the meaning of your sentence can be determined from the meaning of the words you use in the sentence.
I acknowledge that jargon that disobeys this rule is often developed in many fields. This is why jargon is regarded as unclear, cryptic and obfuscatory. The claim is sometimes made that jargon permits researchers in a field to say things that cannot be said without it. That’s sometimes true, but most often it is a guise for fuzzy thinking. In any case it should be avoided or explained when writing for a non-technical audience.
An example is your expression “These techniques are circular.” With respect, that should be written as “These techniques are faulty because they use circular reasoning.” The techniques themselves can be examined for years without ever displaying a circular shape.
At least there I understood your point.
When you get onto your proof, you are more obscure.
With the line “If S is the target temperature series…” I infer that you are denoting a set of temperature measurements that you wish to homogenise. Does this make it a target? Are you talking about a set of measurements from a single station, or from a group, or doesn’t it matter? What time period are you talking about? It may be that the time period doesn’t matter – if so, please tell us.
You use the expression ‘regional climatology’. My knowledge of the dictionary meaning of the words gives me no guidance as to what exactly you are trying to convey by the term. How do I determine what the region is that you are talking about? The nearest ten kilometres, the nearest ten weather stations, some set arbitrarily chosen, some other method? Presumably this refers to the creation of a data set structured in a similar fashion to the set that you wish to adjust to correct its errors. If that’s what you mean, please spell it out.
It’s after three am here. I’d prefer to not continue to describe which uses of language I find unclear. I’d like to be able to quickly paraphrase your argument in a fashion that would demonstrate the style I recommend, e.g.
“Sometimes we want to adjust the records of a weather station for a change that we know has occurred; for example moving the site, or introducing a new thermometer that reads higher than the true temperature.
“The way we do this adjustment is to work out the average temperature series for the region. A region is determined (however it’s determined.) We compare this temperature series to the one that is to be adjusted. If we know the exact date, we get the value of the average difference between the station and the surrounding region before the change, and that of the average difference between the two sets after the change, subtract one from the other, and declare that the temperature change caused by the change was the difference.
“However, whenever a station being adjusted has a trend different to that of the surrounding region, the adjustment formula wrongly adds the difference in trends to the adjustment figure.
“In times of rising temperature, this will wrongly record a higher figure for those stations that record falling temperatures (or even those that rise more slowly than their surroundings), and consequently the total temperature set of the other stations plus the corrected station will give a higher figure than the true figure.”
Okay, I acknowledge that that’s hardly deathless prose, but it’s a step in the right direction. Of course if I’ve totally misunderstood your point, then you should be able to clearly see where I’ve come unstuck.
Regards
Leo Morgan

Ed Reid
October 15, 2012 9:39 am

We are dealing in a science with many “known unknowns” and an unknown number of “unknown unknowns”. (HT: Donald Rumsfeld)

October 15, 2012 9:48 am

Thanks David, your observations and reasoning are completely right. I worked on that problem for some time as well, and for the results of this procedure I used the term that one “impresses” the supposedly correct trend of the “normal” onto the assumedly false time series.
That may be a good correction method in some cases, but as long as one has no way to prove the result (which is rather impossible in meteorology), it remains a speculative method only, without much evidence, and is therefore, as you demonstrated, a circular argument. This type of circular argument seems to be used in other areas of climatology too.

Luther Wu
October 15, 2012 9:54 am

Victor Venema says:
October 15, 2012 at 9:23 am
If you know of a study that shows that most of the stations are affected by urbanization most of the time, please let me know. That would be interesting, as that would go against our current understanding of the problem that no more than a few percent of the data are affected by urbanization.
______________________
I do believe, Sir, that you have that exactly backwards.

Sun Spot
October 15, 2012 9:57 am

Victor Venema says: October 15, 2012 at 9:23 am re: “If you know of a study that shows that most of the stations are affected by urbanization most of the time, please let me know. That would be interesting, as that would go against our current understanding of the problem that no more than a few percent of the data are affected by urbanization.”
Victor, please reference your study that shows “no more than a few percent of the data are affected by urbanization”?

October 15, 2012 10:13 am

I did not study urbanization myself and it is a rather extensive literature. I got this statement from talking to colleagues with hands on experience in homogenization. Thus unfortunately I cannot give you a reference.
Contrary to the normal readers of this blog and being a scientist myself, I have no reason to expect my colleagues to be lying. If you know of a study that shows that most of the data has problems with urbanization, that would make me sufficiently interested to study the topic myself. Life is too short to follow every piece of misinformation spread on WUWT.
REPLY: Watts et al 2012 on the sidebar, soon to be updated to handle the TOBs (non)issue, pretty much says all you need to know. All homogenization does is smear the error around. – Anthony

richardscourtney
October 15, 2012 10:38 am

Victor Venema:
At October 15, 2012 at 10:13 am you say

I did not study urbanization myself and it is a rather extensive literature. I got this statement from talking to colleagues with hands on experience in homogenization. Thus unfortunately I cannot give you a reference.
Contrary to the normal readers of this blog and being a scientist myself, I have no reason to expect my colleagues to be lying. If you know of a study that shows that most of the data has problems with urbanization, that would make me sufficiently interested to study the topic myself. Life is too short to follow every piece of misinformation spread on WUWT.

You were not asked to cite the entire literature. You were asked to cite only one reference to justify your assertion that “no more than a few percent of the data are affected by urbanization”. And you admit you can’t.
On WUWT an assertion needs to be justified because this is a science blog frequented by very many scientists. And scientists ‘trust but verify’. No real scientist asserts something as being true merely because some chums said it. Indeed, on WUWT we track down and reveal misinformation of the kind you have asserted but cannot justify.
And we have experienced enough behaviour of the Team to know that nothing asserted by members of the AGW-industry can be taken as being true unless there is good evidence to support it. If you don’t like that then take it up with the members of the ‘Team’ whose nefarious activities were revealed by their own words in the ‘climategate’ emails.
Richard

October 15, 2012 10:40 am

“Watts et al 2012 on the sidebar, soon to be updated to handle the TOBs (non)issue, pretty much says all you need to know. All homogenization does is smear the error around. – Anthony”
I already discussed the problems of Watts et al (2012) and SkS did an even more extensive piece detailing even more errors.
I am sorry to have to say that the time of observation bias (TOB) is not the only problem with this manuscript. The fundamental problem is that the quality of the stations is assessed at the end, while the trend is computed over the full period, without information on how the quality of the stations changed. I see no way to solve this problem (without homogenization). And as the first version of the manuscript already showed, after homogenization there are no problems any more: the trends are similar for all quality classes.
Anthony, as long as you keep on claiming that homogenization only smears the error, I have trouble taking you seriously and can only advise you to try to understand the fundamentals better. That could help in making your criticism more qualified.
The work page of the manuscript is very quiet, you can hear the crickets. Still, I look forward to the updated manuscript and I am glad that you are taking your time to improve it.

REPLY:
Since there are people who are trying to actively discredit it (such as yourself), I decided not to update the work page regularly until we had our full revision completed. As for the end, well, if you can find more metadata, we’ll use it. No matter what you say, though, homogenization simply smears the errors around, as Dr. Stockwell demonstrates. Be as upset as you wish, because I really don’t care if you take me seriously or not. I don’t do this to earn your approval. People like yourself were happy to accept Fall et al when we didn’t find as strong a signal. Your bias is showing. – Anthony

Steve C
October 15, 2012 10:45 am

Victor Venema says:
“Life is too short to follow every piece of misinformation spread on WUWT.”
You should try the alarmist sites sometime.

October 15, 2012 10:57 am

richardscourtney says: “On WUWT an assertion needs to be justified because this is a science blog frequented by very many scientists.
🙂
richardscourtney says: “And scientists ‘trust but verify’.”
You cannot live without trusting something. If this is the point you do not trust, go study it. If you can prove that most of the data is affected by significant warming due to urbanization, you will be a hero. It may be a bit too applied, but I would give you a chance at a Nobel prize. At least there will be a few millions from the Koch brothers.
In the meantime, I will verify something else: whether we can trust the studies on changes in extreme weather will be the topic of my research for the coming years. I expect that to be more fruitful. That is a topic where I am sceptical.
Feel free to prove me wrong and show that urbanization is the hot topic. If you do, you will also have to explain why the trend in the satellite temperatures, and in the reference climate network at pristine locations, is about the same as that of the homogenized surface network. And why the trend in the rural stations is about the same as in the urban stations. In my estimate the chance of becoming a hero by studying urbanization is close to zero, but if your estimate is higher, please keep me informed about your studies. I would be interested.
REPLY: LOL! The fact that the Climate Reference Network exists at all is proof of the fact that NCDC takes the issue of UHI and siting seriously. It has four years of complete data (since 2008) and you call it “the same” as the old network, yet in your other argument you claim I don’t have enough years of metadata to establish siting trends. You can’t have it both ways. Make up your mind, because your bias is laughable. – Anthony

Editor
October 15, 2012 10:58 am

@Victor Venema
If you know of a study that shows that most of the stations are affected by urbanization most of the time, please let me know. That would be interesting, as that would go against our current understanding of the problem that no more than a few percent of the data are affected by urbanization.
According to Richard Muller
Urban areas are heavily overrepresented in the siting of temperature stations: less than 1% of the globe is urban but 27% of the Global Historical Climatology Network Monthly (GHCN-M) stations are located in cities with a population greater than 50,000.

Then add in the smaller urban sites (even a comparatively small town will have a UHI effect, particularly a growing one), and you get a significant number.
http://berkeleyearth.org/pdf/uhi-revised-june-26.pdf

Luther Wu
October 15, 2012 11:00 am

Victor Venema-
Even though you are a scientist, you are also welcome here as a layman, as is apparently your current role. Making unfounded assertions will earn you a challenge, every time.
VV said: “Life is too short to follow every piece of misinformation spread on WUWT.”
_________________
Cite one example, please- just one…

DirkH
October 15, 2012 11:02 am

Victor Venema says:
October 15, 2012 at 10:13 am
“Life is too short to follow every piece of misinformation spread on WUWT.”
Whoa. Who gives me back the hours I spent with that abomination, the IPCC AR4, and all the journalistic drivel built on top of it?
(This is not an attack on scientists, as the IPCC AR4 has mostly not been written by scientists.)

Editor
October 15, 2012 11:09 am

@Victor Venema
you will also have to explain why the trend in the satellite temperature and in the reference climate network at pristine locations is about the same
Not true.
Satellite records show significantly less warming than GISS does since 1979.

outtheback
October 15, 2012 11:22 am

Victor Venema
Homogenization of temperatures is a flawed process at the best of times.
While land use in a rural area tends not to change much through the seasons and years, the variations can be calculated into it: pasture will be pasture year round, cropland will lie bare for a period of time, and in summer there will be irrigation on both, if and where needed, all causing minor variations. Provided there is no major change in land use (pasture to forest, or to urban), any variation stays constant over the seasons. If one is to homogenize this, it should be done per season, or better still take the temperature difference due to irrigation into account; this will have a wave effect on the temperature, lower on the day after irrigation and increasing until the next one.
Homogenization of urban temperatures is always behind the eight ball. Before the temperature increase due to a new development has influenced the mean temperature, and therefore had an effect on your “d2”, a long time (years?) has gone by, so it will always produce a homogenized temperature higher than it should be. In the meantime the next development will do its trick, and so on.
There should not be any use of temperature stations in urban areas to calculate any regional (country/continent) trend, let alone a global one.
Urban temperature readings are only any good for the citizens of that city, to let them know how warm it got in their built-up area.
