Guest post by David R.B. Stockwell PhD
I read with interest GHCN’s Dodgy Adjustments In Iceland by Paul Homewood, on the distortion of the mean temperature plots for Stykkisholmur, a small town in western Iceland, by GHCN homogenization adjustments.
The validity of the homogenization process is also being challenged in a talk I am giving shortly in Sydney, at the annual conference of the Australian Environment Foundation on the 30th of October 2012, based on a manuscript uploaded to the viXra archive, called “Is Temperature or the Temperature Record Rising?”
The proposition is that commonly used homogenization techniques are circular — a logical fallacy in which “the reasoner begins with what he or she is trying to end up with.” Results derived from a circularity are essentially just restatements of the assumptions. Because the assumption is not tested, the conclusion (in this case the global temperature record) is not supported.
I present a number of arguments to support this view.
First, a little proof. If S is the target temperature series, and R is the regional climatology, then most algorithms that detect abrupt shifts in the mean level of temperature readings, also known as inhomogeneities, come down to testing for changes in the difference between R and S, i.e. D=S-R. The homogenization of S, or H(S), is the adjustment of S by the magnitude of the change in the difference series D.
When this homogenization process is written out as an equation, it is clear that homogenization of S is simply the replacement of S with the regional climatology R.
H(S) = S-D = S-(S-R) = R
While homogenization algorithms do not apply D to S exactly, they do apply the shifts in baseline to S, and so coerce the trend in S to the trend in the regional climatology.
The coercion to the regional trend is strongest in series that differ most from the regional trend, and happens irrespective of any contrary evidence. That is why “the reasoner ends up with what they began with”.
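A minimal numerical sketch of this coercion (toy series, with a crude decadal shift adjustment standing in for break detection; it is not any published homogenization code) makes the point concrete:

```python
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(100)
R = 0.010 * years + rng.normal(0, 0.1, 100)    # regional climatology: warming ~1.0 C/century
S = -0.005 * years + rng.normal(0, 0.1, 100)   # target station: slight cooling

# "detect" shifts in the difference series D = S - R as changes in its decadal means,
# then subtract those shifts from S (a crude stand-in for break adjustment)
D = S - R
decadal = D.reshape(10, 10).mean(axis=1)
adjustment = np.repeat(decadal - decadal[-1], 10)   # align every decade to the latest one
H = S - adjustment

trend = lambda y: np.polyfit(years, y, 1)[0] * 100  # degrees per century
print(f"raw target trend S:     {trend(S):+.2f}")
print(f"regional trend R:       {trend(R):+.2f}")
print(f"homogenized trend H(S): {trend(H):+.2f}")   # pulled to (nearly) the regional trend
```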
Second, I show bad adjustments like Stykkisholmur, from the Riverina region of Australia. This area has good, long temperature records, and has also been heavily irrigated, and so might be expected to show less warming than other areas. With a nonhomogenized method called AWAP, a surface fit of temperature trend last century shows cooling in the Riverina (circle on map 1. below). A surface fit with the recently-developed, homogenized, ACORN temperature network (2.) shows warming in the same region!
Below are the raw minimum temperature records for four towns in the Riverina (in blue). The temperatures are largely constant or falling over the last century, as are their neighbors (in gray). The red line tracks the adjustments in the homogenized dataset, some over a degree, that have coerced the cooling trend in these towns to warming.
It is not doubted that raw data contains errors. But independent estimates of the false alarm rate (FAR) using simulated data show that regional homogenization methods can have FARs exceeding 50%, an unacceptably high rate that far exceeds the 5% or 1% error rates generally accepted in scientific methods. Homogenization techniques are adding more errors than they remove.
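To illustrate how false alarm rates can balloon, here is a toy sketch: a naive mean-shift search applied, without any multiplicity control, to purely homogeneous simulated difference series. The test, window and noise level are arbitrary choices and this is not the actual GHCN or ACORN procedure.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def flags_a_break(diff_series, alpha=0.05):
    """Naive break search: test every candidate split point for a mean shift."""
    n = len(diff_series)
    for k in range(10, n - 10):
        _, p = stats.ttest_ind(diff_series[:k], diff_series[k:])
        if p < alpha:
            return True
    return False

n_trials, n_years = 1000, 100
# difference series that are pure noise, i.e. genuinely homogeneous stations
false_alarms = sum(flags_a_break(rng.normal(0.0, 0.5, n_years)) for _ in range(n_trials))
print(f"false alarm rate: {false_alarms / n_trials:.0%}")   # far above the nominal 5%
```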
The problem of latent circularity is a theme I developed on the hockey-stick, in Reconstruction of past climate using series with red noise. The flaw common to the hockey-stick and homogenization is “data peeking” which produces high rates of false positives, thus generating the desired result with implausibly high levels of significance.
Data peeking allows one to delete data until significance is achieved, to produce a hockey-stick shape from random noise proxies, or, in the case of homogenization, to adjust a deviant target series toward the overall trend.
To avoid the pitfall of circularity, I would think the determination of adjustments would need to be completely independent of the larger trends, which would rule out most commonly used homogenization methods. The adjustments would also need to be far fewer, and individually significant, as errors no larger than noise cannot be detected reliably.

richardscourtney,
“what does the GHCN “pairwise homogenization method” do to the data and what are the answers to my questions with respect to the effect(s) of that method?”
In principle, the GHCN treatment only removes the discontinuities. The effects of this treatment are described in the link I’ve given above (Hansen et al. 2001):
“However, caution is required in making adjustments, as it is possible to make the long-term change less realistic even in the process of eliminating an unrealistic short-term discontinuity. Indeed, if the objective is to obtain the best estimate of long-term change, it might be argued that in the absence of metadata defining all changes, it is better not to adjust discontinuities.”
Anthony Watts:
Thank you for your post at October 17, 2012 at 9:22 am. It removes the need for me to answer the post at October 17, 2012 at 9:14 am from Victor Venema.
I only add that Victor Venema has evaded my question and that the issue of “the wrong method” is obfuscation because – as I said – the different teams producing the global (and hemispheric) temperature time series each uses a different homogenisation method and they do not report the advantages/disadvantages of their methods.
Richard
REPLY: The biggest problem is that none of the keepers of these datasets show much interest in the measurement environment and its effects on the final product. If this were forensic science used in court, the data and the conclusions from it would be tossed out due to contamination. But in climate science, such polluted data is considered worthwhile. – Anthony
phi:
re your post at October 17, 2012 at 9:36 am.
Yes, I know. Thank you.
I repeat your quote from Hansen et al. to ensure that nobody misses it and because it is probably the only agreement I have with Hansen.
“However, caution is required in making adjustments, as it is possible to make the long-term change less realistic even in the process of eliminating an unrealistic short-term discontinuity. Indeed, if the objective is to obtain the best estimate of long-term change, it might be argued that in the absence of metadata defining all changes, it is better not to adjust discontinuities.”
Richard
Anthony:
re your REPLY to me at October 17, 2012 at 9:36 am.
I have good reason to agree with your statement that “none of the keepers of these datasets show much interest in the measurement environment and its effects on the final product”.
Indeed, this disinterest is not new. In the 1990s I was undertaking a field trip that involved my visiting three African countries and I offered to spend time examining met. stations while there. Phil Jones was not interested in my offer.
Richard
“AndyG55”:
Your claim that urbanization having occurred over a 30-year period is proof of why temperatures have been constant for 15 years seems illogical.
Urbanization is a continual process which has covered only a very small portion of the earth’s surface. It did not begin 30 years ago. Has it stopped?
Higher density and more pavement instead of dirt/gravel could be differences, but these factors need to be properly studied.
If there is a significant rate of increase in urbanization and that affects accuracy of many more measurement stations, then there will be a false trend in the data.
Leo Morgan;
Heh. Your critique of “understandability” of prose is on point, but then (as you smell the barn and rush towards your conclusion) you fall victim to Muphry’s Law, and give us your own howler: “the total temperature set of the toother stations”. What do teeth have to do with stations? 😉
Presuming you’re using a PC to type on, depressing Alt and entering 248 on the numeric keypad gives you ° .
E.g. 1° 2° 3°
Prima facie, if homogenization consistently results in, e.g., adjusting older measurements down and recent measurements up, that implies knowledge of some way in which thermometers and/or methods of reading them were all biased warm in the past and cool in the present. This is so implausible that it would require very thorough and detailed documentation and analysis to justify. Which, of course, is entirely absent.
The conclusion is thus that the desired result is determining the nature and direction of the adjustments. Which is a Feynman-sin of the highest order.
Victor Venema writes, “Validation studies of homogenization methods are regularly performed. A recent blind benchmarking study of mine was very similar to the way you like the validation of homogenization methods to be done. Any comments on this paper are very welcome. We plan to perform similar validation studies in future. Thus if you have good suggestions, we could implement them in the next study.”
One of the things that I think you should test your method against is the UHI effect. For the purposes of a test, assume 100 years of data from 100 stations. Assume that 10% of the stations acquire, over the 100 years, a 0.5 degree upward bias due to UHI, 10% a 1.0 degree bias, and 10% a 1.5 degree bias. Assume that the other biases cancel out. Assume that the overall real temperature change is an increase of 0.5 degree C; that is, for the purposes of the test, what the average temperature change of all 100 stations would be if there were no UHI. How well does your homogenization algorithm do in this case? The various percentages and temperature increases should be variables that can be changed, and the test should be done repeatedly to see how well it performs with various percentages of stations affected by UHI. Good performance means the UHI effect is reduced and the end result is more accurate than simply averaging the temperature readings together. Graphing how well it performs as the overall percentage of UHI-affected stations increases would be helpful. Basically, find whether there are points at which the homogenization algorithm performs well and points at which it performs poorly.
Next, do the same test but with the assumption that stations with high levels of UHI are clustered together in groups of at least 3 but fewer than 6, for example.
Those test cases would be a good start.
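A rough sketch of the test proposed above, under the stated assumptions; the linear shape of the UHI drift and the noise level are additional assumptions, and the homogenization step itself is left as a placeholder for whatever algorithm is being benchmarked:

```python
import numpy as np

rng = np.random.default_rng(42)
n_stations, n_years = 100, 100
years = np.arange(n_years)

true_trend = 0.5 / n_years                      # 0.5 C real change over the century
uhi_per_station = np.zeros(n_stations)
uhi_per_station[:10] = 0.5 / n_years            # 10% of stations: +0.5 C UHI drift
uhi_per_station[10:20] = 1.0 / n_years          # 10%: +1.0 C drift
uhi_per_station[20:30] = 1.5 / n_years          # 10%: +1.5 C drift

data = (true_trend * years                      # shared climate signal
        + uhi_per_station[:, None] * years      # station-specific UHI drift
        + rng.normal(0, 0.3, (n_stations, n_years)))

def trend_c_per_century(series):
    return np.polyfit(years, series, 1)[0] * n_years

naive = trend_c_per_century(data.mean(axis=0))
print("true change:            0.50 C")
print(f"naive station average:  {naive:.2f} C")  # biased high by the UHI drift
# A real benchmarking study would now feed `data` to the homogenization
# algorithm under test and compare its recovered trend against 0.50 C.
```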
Let’s hope this comment is allowed. Interesting that a blog that complains about SkS rejects harmless comments itself.
BobG says: “One of the things that I think you should test your method against is UHI effect. “
In my own validation study, we included local trends to simulate the UHI effect.
It would be interesting to increase this fraction of stations with local trends to unrealistically high numbers and quantify at which point homogenization algorithms no longer work well. Maybe someone can correct me, but as far as I know such a study has not been done yet. It is on my list.
Victor Venema:
At October 18, 2012 at 2:39 am you make the fallacious assertion
WUWT doesn’t, and it posts when, where and from whom a comment has been snipped.
SkS does and makes no mention of having censored comments.
Now, please answer my question that you have repeatedly evaded with pathetic excuses. I remind you that it is
When using the method which you claim to be correct and use for homogenisation
(a) What “non-climatic changes” would require such large alteration to the data of the example?
and
(b) How does the “removal” of those “non-climatic changes” affect the reliability and the accuracy and the precision of the data?
You have repeatedly told us you are a scientist and any scientist has such information about the method he/she uses and reports it.
Richard
Dear mod,
richardscourtney is citing my remark on the removal of a comment of mine on Watts et al. by Anthony.
Dear Richard,
Question (a) was already answered. Irrigation close to a measurement station leads to a local temperature effect that is not representative for the large-scale climate and is non-climatic in this sense. If it is not a local effect, but multiple stations are affected, then you can keep it. In this example there were multiple stations; Stockwell could have kept the signal, but he chose to use the wrong reference (the mean over all of Australia, instead of the mean over the direct neighbours) and consequently removed the cooling effect in this region.
I cannot answer question (b) with a number; this erroneous study was not mine, so you will have to ask Stockwell. Thus may I answer with a return question: you expect climatologists to remove the temperature trends due to urbanization (an increase of the urban heat island), so why do you see irrigation near a single station as a different case? That sounds inconsistent to me.
In general, you could read the validation study I have linked here several times to see that homogenization, the removal of non-climatic changes, improves the accuracy of temperature data. How much the improvement will be depends on the specific case.
Victor Venema,
Please, could you tell us by whom Hansen et al. 2001 (http://pubs.giss.nasa.gov/docs/2001/2001_Hansen_etal.pdf) has been refuted, or explain why you did not heed these warnings:
“…if the discontinuities in the temperature record have a predominance of downward jumps over upward jumps, the adjustments may introduce a false warming…”
“However, caution is required in making adjustments, as it is possible to make the long-term change less realistic even in the process of eliminating an unrealistic short-term discontinuity. Indeed, if the objective is to obtain the best estimate of long-term change, it might be argued that in the absence of metadata defining all changes, it is better not to adjust discontinuities.”
This applies exactly to the type of homogenization performed with GHCN.
Dear Phi,
I only had a short look at the text, but if I understand it correctly Hansen is talking about homogenization using metadata only (data about data, the station history), not about statistical homogenization. At the time there were no good automatic statistical homogenization methods.
The problem with homogenization using only metadata is that typically only the discontinuities are documented; the gradual changes are not. Discontinuities can be caused by relocations or by changes in the instrumentation or screen. These are the kinds of things that leave a paper trail. Gradual changes are due to urbanization or growing vegetation, which are typically not noted and whose magnitude is not known a priori.
If you only homogenize the discontinuities and not the gradual changes, you can introduce an artificial trend. Imagine a saw-tooth signal, which has no long-term trend: it slowly goes up and after some time jumps down again (multiple times). If you removed only the jumps, the adjusted time series would continually go up and the trend would get worse.
Thus if you homogenize, you should homogenize all inhomogeneities: the discontinuities, but also the gradual ones. In the above example of a saw-tooth signal the trend would again be flat if you also corrected the slowly rising parts. That is what Hansen is saying. I fully agree with that.
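A minimal sketch of the saw-tooth point, using a toy periodic series rather than real station data:

```python
import numpy as np

years = np.arange(100)
sawtooth = 0.05 * (years % 20)       # rises slowly, drops back every 20 years; no long-term trend

# remove only the downward jumps: add back the cumulative size of each drop
jump_sizes = np.where(np.diff(sawtooth) < -0.5, -np.diff(sawtooth), 0.0)
jumps_only_adjusted = sawtooth + np.concatenate([[0.0], np.cumsum(jump_sizes)])

slope = lambda y: np.polyfit(years, y, 1)[0] * 100
print(f"original saw-tooth trend:   {slope(sawtooth):+.2f} per century")            # small
print(f"after removing jumps only:  {slope(jumps_only_adjusted):+.2f} per century")  # large positive, artificial
```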
It is very good to use metadata, for example when the size of a break is known from parallel measurements and can be used to adjust the jump. The time-of-observation bias corrections are another example of using metadata to homogenize a climate record. However, in addition you should always also perform relative statistical homogenization by comparing a station with its neighbours (in which you can again use metadata to pin down the dates of the breaks more precisely).
I hope that answers your question.
Victor Venema:
Thank you for your answer to me at October 18, 2012 at 8:00 am. Please note that I genuinely appreciate your taking the trouble. Unfortunately, I am underwhelmed by the answer.
You say
Thank you for that. I can see how use of that different reference may avoid the changing of sign in the trend in South Australia. But I am not aware of any “irrigation” that has been conducted over the vast area of the bulk of central Australia. Indeed, I am not aware of any significant anthropogenic effects over the bulk of that great area.
I stress that I do recognise the homogenisation method used in the above article was NOT the method which you apply, but I asked about “When using the method which you claim to be correct and use for homogenisation”. And you point to this in your reply which I quote. But your answer omits important information; i.e.
Does your method not alter the temperature data over the bulk of the central region of Australia and if it does then why?
You also say
That is an avoidance of the question because I asked about “When using the method which you claim to be correct and use for homogenisation”.
Of course, it may be that your studies do not cover the regions of Australia and, therefore, you do not have the requested data available to you. Clearly, in that case, you are entitled to say you are not willing to repeat the study using your method merely because I have asked for the data. Indeed, why should you bother?
However, I am surprised if the global data on which you work does not include Australia. And you have not said that is the case. Instead, you say “this erroneous study was not mine, you will have to ask Stockwell” although I asked about “When using the method which you claim to be correct”.
And I do not see any difference between land-use change, irrigation and UHI in the context of the putative need for homogenisation. I do not understand why you suggest I am being “inconsistent” in this way, because I have given no hint that I could be.
You conclude by saying to me
I have read it and I quoted from it in my post to you which amended my question in the light of what it says. You say your method “improves the accuracy of temperature data” but I supported Anthony in his rebuttal of that which says homogenisation merely contaminates good data with errors from poor data.
That data contamination is the reason why I am trying to debate a specific case instead of generalisations such as “How much the improvement will be depends on the specific case”: if the accuracy is reduced in the specific case then the generalisation is shown to be untrue, and the improvement needs to be demonstrated for each case.
Please note that this is WUWT and, therefore, I am trying to engage in a serious review of the method you espouse. All ideas are subject to challenge here and if you prove your case then I and others here will support it. But you still seem to think WUWT acts like some warmist ‘echo chamber’ where supported generalisations are cheered and not challenged while opposing points are censored and/or demeaned.
Richard
Victor Venema,
Thank you for your answer; it satisfies and pleases me. I should add that I did not expect such a response.
Let me start by clarifying two points:
1. Metadata are probably not at the heart of the problem.
2. Techniques for the identification and quantification of discontinuities are not at issue; the interpretation of the jumps is.
I see that you share the views of Hansen et al. 2001 and that, for you, the treatment of discontinuities must imperatively be accompanied by a correction of trends. I think the same. In theory, I have nothing to complain about. In practice, it is another matter.
Detection of discontinuities is relatively easy. Attribution of a trend to a climatic or non-climatic cause is a problem of a quite different magnitude.
In practice, national offices do not even attempt this exercise and simply adjust the discontinuities. The results are grossly wrong: in my opinion, according to Hansen et al., and, if I understand correctly, in yours as well.
BEST, which does not correct trends but in practice makes implicit adjustments of discontinuities, therefore also produces unusable results.
GHCN homogenizes its datasets crudely (the quality is much lower than that of the national offices) and, of course, the correction of trends is excluded. So GHCN series are unusable.
Which order of magnitude are we talking about?
To get an idea, we must consider the homogenization of long series, say, series continuous over the twentieth century. All those I know of are around 0.5 °C per century.
Is it enough? Probably not but that’s another story.
In any case, I’m very glad that professionals are aware of the homogenization bias issue and I do not doubt that a satisfactory response will soon be made.
@Victor Venema: Again, the use of Australian temperature is irrelevant. Homogenization would coerce trends towards the price of beans if that was the reference. The coercion of trends is an ‘unavoidable side effect’ of attempting to correct for jumps.
Victor Venema says:
October 17, 2012 at 5:25 am
Gunga Din says: “I’m one of the non-“scientist” who’s comments ..”
Most people are non-scientists. What matters is the quality of the arguments.
Gunga Din says: “PS Where I work on one particular day, at one particulare moment, I had access to and checked 3 different temperature sensors. One read 87*F. One read 89*F. One read 106*F. None of them is more than 10 miles from the other at most. All were within 4 or 5 miles of me. One was just a few hundred yards away. Homogenized, what was the temperature where I was that day?”
+++++++++++++++++
Victor: After homogenization the temperatures at these stations would still be different. Homogenization makes the data temporally more consistent; it does not average (or even smooth, as Anthony falsely claims) the observations of multiple stations. Having so many stations close together is great. That means that they will be highly correlated (if they are of good quality; is the one reading 106F on a wall in the sun?) and that the difference time series between the stations will contain only a little weather noise (and some measurement noise). Thus it should be possible to see very small inhomogeneities and correct them very accurately.
==============================================================
So siting does matter.
If I understood the gist of Anthony et al., there are 5 basic classes of stations based on the quality of their siting. BEST adjusted the two better siting classes up based on the poorer-quality sites in the 3rd class. Anthony et al. looked at the difference if the 3rd, poorer sites were not used to raise the data from the two better classes of sites.
Does anybody care that the better surface temperature data is and has been corrupted by less reliable data? How many trillions are being bet on bad data?
You asked me about the 106* siting conditions. It appears that those sitting behind a desk don’t care that much. They’d raise the 87 and the 89 based on the 106 if they could without the adjustment being noticed.
Anthony et al has noticed and questioned such things. I’m glad. You should be too.
Dear richardscourtney,
I do not have a homogenization algorithm of my own. The people working on homogenization asked me to lead the validation study because I had no stake in the topic. Up to the validation study, I mainly worked on clouds and radiative transfer, a topic that is important for numerical weather prediction, remote sensing and climate. Thus, professionally, I couldn’t care less whether the temperature goes up, down or stays the same.
Nor have I homogenized a dataset. And if I had, I would most likely have homogenized a dataset from my own region and not from the other side of the world. The only people working on the statistical homogenization of a global dataset are your friends from NOAA.
I hope that explains the misunderstanding between the two of us.
For the first part of the answer to question (a) the homogenization method used is irrelevant. Thus the statement: “Irrigation close to a measurement station leads to a local temperature effect that is not representative for the large-scale climate and is non-climatic in this sense. If it is not a local effect, but multiple stations are affected, then you can keep it. ” is still okay.
The post was about a station in an irrigated area. I thought that was what your word “example” referred to.
I know I am at WUWT and that many people here falsely believe without proof that “homogenisation merely contaminates good data with errors from poor data”.
Dear phi,
Attribution of a trend to a climatic or non-climatic cause is a problem if you only have one station. If you have two stations close together, they will measure the same large-scale climate. If there are no inhomogeneities, the difference time series of these two stations would be noise, without any trend or jumps. If you see a trend in the difference time series between the two stations, you know that this trend is artificial. By looking at multiple pairs, you can infer which of the stations is responsible for the trend in the difference time series.
If there is a trend in the difference time series, you can correct it by inserting a number of small jumps in the same direction. Thus, even though a relative homogenization method may not explicitly take local trends into account, it will still correct them.
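A minimal sketch of this pairwise idea, with made-up toy stations rather than NOAA’s actual pairwise algorithm (here the artificial drift is removed as a smooth ramp; real methods approximate it with a sequence of small step adjustments):

```python
import numpy as np

rng = np.random.default_rng(7)
years = np.arange(100)
regional = 0.006 * years + rng.normal(0, 0.2, 100)   # shared large-scale climate signal

station_a = regional + rng.normal(0, 0.1, 100)
station_b = regional + 0.01 * years + rng.normal(0, 0.1, 100)  # gradual UHI-like drift

diff = station_b - station_a
drift_per_year = np.polyfit(years, diff, 1)[0]
print(f"trend in difference series: {drift_per_year*100:+.2f} C/century")   # ~ +1.0, artificial

# correct station_b by removing the artificial drift
station_b_adj = station_b - drift_per_year * (years - years.mean())
print(f"after correction:           {np.polyfit(years, station_b_adj - station_a, 1)[0]*100:+.2f} C/century")  # ~0
```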
As you say, GHCN cannot homogenize its data as well as the national offices could. The main reason is that the GHCN dataset does not contain as many stations, so the correlations between the stations are lower and consequently the noise in the difference time series is larger. Consequently, you can only detect the larger inhomogeneities in GHCN.
This means, by the way, that the trend in the homogenized GHCN dataset is still a bit biased towards the trend in the raw data. As temperatures in the past were measured too high, the trend in the raw data is lower than the trend in the homogenized data. The trend in the real global temperature is thus likely stronger than seen in the homogenized GHCN dataset.
The HadCRU dataset, based on nationally homogenized data, is thus likely better than GHCN. The GHCN approach has the advantage that everyone can study the code of the homogenization algorithm and verify that it works. This transparency may be more important in the Land of the “Skeptics”. It is also always good to be able to compare two methods and datasets with each other.
laterite says: “@Victor Venema: Again, the use of Australian temperature is irrelevant. Homogenization would coerce trends towards the price of beans if that was the reference. The coercion of trends is an ‘unavoidable side effect’ of attempting to correct for jumps.”
Homogenization removes statistically significant differences between a candidate station and surrounding reference stations with the same regional climate. The key idea you seem to ignore is that the climate varies slowly in space. Thus a regional reference series will show about the same climate as the candidate (where the candidate is homogeneous), while a continental “reference” will not.
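A toy illustration of why the choice of reference matters, with made-up numbers rather than any real network:

```python
import numpy as np

rng = np.random.default_rng(3)
years = np.arange(100)
regional_cooling = -0.005 * years              # real signal shared by the region

candidate = regional_cooling + rng.normal(0, 0.15, 100)
neighbours = regional_cooling + rng.normal(0, 0.15, (5, 100))        # share the cooling
continental = 0.007 * years + rng.normal(0, 0.05, 100)               # warming elsewhere

slope = lambda y: np.polyfit(years, y, 1)[0] * 100
print(f"candidate - neighbour mean: {slope(candidate - neighbours.mean(axis=0)):+.2f} C/century")  # ~0: nothing to 'correct'
print(f"candidate - continental:    {slope(candidate - continental):+.2f} C/century")              # spurious trend to 'correct'
```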
Victor Venema:
It is nearing 1 in the morning here and I am going to bed, but I provide this brief acknowledgement of your post to me at October 18, 2012 at 3:59 pm to demonstrate that I appreciate it.
I hope that I will be able to spend time on a proper reply when I arise and after breakfast. For now, I point out that several here have much evidence for the ‘smearing’ effect of homogenisation.
Richard
@Gunga Din. Naturally siting matters. I am sure nobody said otherwise. It matters for the absolute temperature values recorded. And consequently it also matters for the trends in the raw data if you combine data from two different sites or if the surrounding changes the siting quality. After homogenization the problems for the trends should be minimal.
This is no reason not to care about siting: for more detailed studies you would like the signal to be purely about temperature and not have additional variability due to solar radiation, wind or rain. Also for studying changes in the variability of the weather and extremes, siting is very important. For such studies Anthony’s work on siting quality will become very valuable when this information on the quality of the stations spans a few decades.
Victor Venema says:
October 18, 2012 at 4:42 pm
@Gunga Din. Naturally siting matters. I am sure nobody said otherwise. It matters for the absolute temperature values recorded. And consequently it also matters for the trends in the raw data if you combine data from two different sites or if the surrounding changes the siting quality. After homogenization the problems for the trends should be minimal.
This is no reason not to care about siting: for more detailed studies you would like the signal to be purely about temperature and not have additional variability due to solar radiation, wind or rain. Also for studying changes in the variability of the weather and extremes, siting is very important. For such studies Anthony’s work on siting quality will become very valuable when this information on the quality of the stations spans a few decades.
==============================================================
But meanwhile the data from bad sites, now and into the past, has colored and is coloring the temperature record warmer. The UN and the Obama EPA want to tear down and rebuild the globe’s economies using such flawed data as the lever to do so. The only genuine hockey stick in all this CAGW mess is Al Gore’s and the Solyndra investors’ bank accounts.
The station siting matters. How the data from the stations is handled matters, not just in the tomorrows but in the yesterdays.
Victor Venema,
Attribution of trends is not as easy as you say, even with a large number of nearby stations. Noise is important, and anthropogenic effects have a continuous character that can perfectly well affect several (or the majority) of the stations in parallel. The bias in the discontinuities is a much more reliable way to assess the overall impact of the perturbations.
Anyway, as I have already said, correction of trends is either not applied or, when it is, only marginally and inadequately. It is fairly easy to demonstrate, by analyzing the temperature differentials between individual stations and regional averages, that substantial and regular differences in trends persist over periods of several decades. This behaviour cannot be attributed to noise.
The general principle, which is in fact at the basis of Hansen et al.’s reasoning, is that homogenization is expected to have a neutral effect on trends. If this is not the case, the bias must imperatively be explained. The UHI effect may be a rational explanation for downward adjustments, but certainly not for upward ones.