Would You Like Your Temperature Data Homogenized, or Pasteurized?

A Smoldering Gun From Nashville, TN

Guest post by Basil Copeland

The hits just keep on coming. About the same time that Willis Eschenbach revealed “The Smoking Gun at Darwin Zero,” The UK’s Met Office released a “subset” of the HadCRUT3 data set used to monitor global temperatures. I grabbed a copy of “the subset” and then began looking for a location near me (I live in central Arkansas) that had a long and generally complete station record that I could compare to a “homogenized” set of data for the same station from the GISTemp data set. I quickly, and more or less randomly, decided to take a closer look at the data for Nashville, TN. In the HadCRUT3 subset, this is “72730” in the folder “72.” A direct link to the homogenized GISTemp data used is here. After transforming the row data to column data (see the end of the post for a “bleg” about this), the first thing I did was plot the differences between the two series:

click to enlarge
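For readers who hit the same "bleg": the GISTemp station files come as one row per year with twelve monthly columns, while plotting and differencing are easier with a single column of observations. Here is a minimal Python sketch of that row-to-column transformation; the field layout and the 999.9 missing-value flag are assumptions, so check them against your actual file:

```python
# A minimal sketch of the row-to-column "bleg": GISTemp station files list one
# year per row with twelve monthly values; a HadCRUT3-style comparison is
# easier with a single column of (year, month, temp) observations.
# The missing-value flag below is an assumption -- check your actual file.

MISSING = 999.9  # assumed missing-value sentinel

def rows_to_column(rows):
    """Flatten [(year, [12 monthly temps]), ...] into [(year, month, temp), ...]."""
    column = []
    for year, monthly in rows:
        for month, temp in enumerate(monthly, start=1):
            if temp != MISSING:
                column.append((year, month, temp))
    return column

# Toy example: two years of fabricated data, one missing month.
rows = [
    (1881, [3.1, 4.0, 9.2, 14.8, 19.5, 24.1, 26.3, 25.7, 22.0, 15.4, 8.9, 4.2]),
    (1882, [2.8, 4.4, 9.0, 15.1, 999.9, 24.5, 26.0, 25.9, 21.6, 15.0, 9.3, 3.9]),
]
series = rows_to_column(rows)
print(len(series))  # 23 observations: 24 months minus the one missing value
```

With both station records in this shape, the difference series plotted above is just the month-by-month subtraction of the two columns.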

The GISTemp homogeneity adjustment looks a little hockey-stickish, and induces an upward trend by reducing older historical temperatures more than recent historical temperatures. This has the effect of turning what is a negative trend in the HadCRUT3 data into a positive trend in the GISTemp version:

click to enlarge

So what appears to be a general cooling trend over the past ~130 years at this location in the unadjusted HadCRUT3 data becomes a warming trend once the homogeneity adjustment is applied.

“There is nothing to see here, move along.” I do not buy that. Whether or not the homogeneity adjustment is warranted, it has an effect that calls into question just how much the earth has in fact warmed over the past 120-150 years (the period covered, roughly, by GISTemp and HadCRUT3). There has to be a better, more “robust” way of measuring temperature trends, one that is not so sensitive that it turns negative trends into positive trends (which we’ve seen it do twice now, first with Darwin Zero, and now here with Nashville). I believe there is.

Temperature Data: Pasteurized versus Homogenized

In a recent series of posts, here, here, and with Anthony here, I’ve been promoting a method of analyzing temperature data that reveals the full range of natural climate variability. Metaphorically, this strikes me as trying to make a case for “pasteurizing” the data, rather than “homogenizing” it. In homogenization, the object is to “mix things up” so that it is “the same throughout.” When milk is homogenized, this prevents the cream from rising to the top, thus preventing us from seeing the “natural variability” that is in milk. But with temperature data, I want very much to see the natural variability in the data. And I cannot see that with linear trends fitted through homogenized data. It may be a hokey analogy, but I want my data pasteurized – as clean as it can be – but not homogenized so that I cannot see the true and full range of natural climate variability.

I believe that the only way to truly do this is by analyzing how differences in the temperature data vary over time. And they do not vary in just one direction: as everybody knows, temperatures sometimes trend upward, and at other times downward. Studying how the differences behave allows us to see this far more clearly than simply fitting trend lines to undifferenced data. In fact, it can prevent us from reaching the wrong conclusion, such as fitting a positive trend when the real trend has been negative. To demonstrate this, here is a plot of monthly seasonal differences for the GISTemp version of the Nashville, TN data set:

click to enlarge

Pay close attention as I describe what we’re seeing here. First, “sd” means “seasonal differences” (not “standard deviation”). That is, it is the year-to-year variation in each monthly observation, for example October 2009 compared to October 2008. Next, the “trend” is the result of Hodrick-Prescott smoothing (lambda = 14,400). The type of smoothing here is not as critical as the decision to smooth the seasonal differences. If a reader prefers a different smoothing algorithm, have at it. Just make sure you apply it to the seasonal differences, and that it does not change the overall mean of the series. That is, the mean of the seasonal differences, for GISTemp’s Nashville, TN data set, is -0.012647, whether smoothed or not. The smoothing simply helps us to see, a little more clearly, the regularity of warming and cooling trends over time. Now note clearly the sign of the mean seasonal difference: it is negative. Even in the GISTemp series, Nashville, TN has spent more time cooling (imagine here the periods where the blue line in the chart above is below zero) than warming over the last ~130 years.
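For anyone who wants to replicate the seasonal-difference step, here is a minimal Python sketch. The toy series below is fabricated (a regular seasonal cycle plus a small cooling drift of 0.001 per month); the Hodrick-Prescott smoothing described above would be applied to the resulting sd series afterward, via a statistics package, and as noted should leave this mean unchanged:

```python
import math

def seasonal_differences(series):
    """Year-over-year change for each month: sd[t] = x[t] - x[t-12]."""
    return [series[t] - series[t - 12] for t in range(12, len(series))]

# Fabricated 30-year monthly series: seasonal cycle plus a slow cooling drift.
series = [10 + 8 * math.sin(2 * math.pi * m / 12) - 0.001 * m for m in range(360)]

sd = seasonal_differences(series)
mean_sd = sum(sd) / len(sd)
print(round(mean_sd, 4))  # -0.012: the seasonal cycle cancels, leaving
                          # twelve months of the -0.001/month drift
```

The point of the sketch is that the seasonal cycle drops out of the differences entirely, so the mean of the sd series isolates the underlying drift, which is what the smoothed blue line in the chart above traces through time.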

How can that be? Well, the method of analyzing differences is less sensitive (i.e., more “robust”) than fitting trend lines through the undifferenced data. “Step” type adjustments, like the homogeneity adjustments we see here, affect only a single data point in the differenced series, but affect every data point before or after the step in the undifferenced series. We can see the effect of the GISTemp homogeneity adjustments by comparing the previous figure with the following:

click to enlarge

Here, in the HadCRUT3 series, the mean seasonal difference is more negative: -0.014863 versus -0.012647. The GISTemp adjustments increase the average seasonal difference by 0.002216, making it less negative, but not by enough to turn the result positive. In both cases we still come to the conclusion that, on average, monthly seasonal differences in temperature in Nashville have been negative over the last ~130 years.
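The claim that a step adjustment touches only one point of a differenced series, but every point on one side of the break in the raw series, can be checked with a small sketch (fabricated data, plain first differences used for simplicity):

```python
# Sketch: a step adjustment shifts every raw value on one side of the break,
# but only a single point of the differenced series.
series = [10.0] * 20
adjusted = [t - 0.5 if i < 10 else t for i, t in enumerate(series)]  # cool the first half

diff = [series[i] - series[i - 1] for i in range(1, len(series))]
diff_adj = [adjusted[i] - adjusted[i - 1] for i in range(1, len(adjusted))]

changed = [i for i in range(len(diff)) if diff[i] != diff_adj[i]]
print(changed)  # [9]: only the difference spanning the break moved
```

Ten of the twenty raw values moved, but only one of the nineteen differences did, which is why the differenced series is so much less sensitive to this kind of adjustment.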

An Important Caveat

So have we actually shown that, at least for Nashville, TN, there has been no net warming over the past ~130 years? No, not necessarily. The average monthly seasonal difference has indeed been negative over the past 130 years. But it may have been becoming “less negative.” Since I have more confidence, at this point, in the integrity of the HadCRUT3 data than in the GISTemp data, I’ll discuss this solely in the context of the HadCRUT3 data. In both the “original data” and in the blue “trend” shown in the above figure, there is a slight upward trend over the past ~130 years:

click to enlarge

Here, I’m only showing the fit relative to the smoothed (trend) data. (It is, however, exactly the same as the fit to the original, or unsmoothed, data.) Whereas the average seasonal difference for the HadCRUT3 data was -0.014863, the fit through the data gives only -0.007714 at the end of the series (October 2009). Still cooling, but less so, and in that sense one could argue that there has been some “warming.” And overall, if a similar kind of analysis is applied to all of the stations in the HadCRUT3 data set (or “subset”), I will not be surprised if there is some evidence for warming. But that has never really been the issue. The issue has always been (a) how much warming, and (b) where has it come from?
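The "fit through the data, evaluated at the end of the series" step can be sketched as an ordinary least-squares line through the sd series. The numbers below are fabricated for illustration, not the Nashville values:

```python
# Sketch: OLS line through a seasonal-difference series, evaluated at the
# final month to get the end-of-series rate of change.
def ols_line(y):
    """Least-squares slope and intercept of y against 0, 1, ..., n-1."""
    n = len(y)
    xs = range(n)
    mx, my = (n - 1) / 2, sum(y) / n
    slope = (sum((x - mx) * (v - my) for x, v in zip(xs, y))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Fabricated sd series: cooling on average, but becoming "less negative".
sd = [-0.03 + 0.0001 * t for t in range(200)]

slope, intercept = ols_line(sd)
end_value = intercept + slope * (len(sd) - 1)
print(round(end_value, 4))  # -0.0101: still negative at the end of the
                            # series, but less negative than the mean
```

Here the mean of the sd series is about -0.02, while the fitted line ends at about -0.01, which is the same qualitative pattern described above: still cooling, but less so.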

I suggest that the above chart showing the fit through the smooth helps define the challenges we face in these issues. First, the light gray line depicts the range of natural climate variability on decadal time scales. This much – and it is very much of the data – is completely natural, and cannot be attributed to any kind of anthropogenic influence, whether UHI, land use/land cover changes, or, heaven forbid, greenhouse gases. If there is any anthropogenic impact here, it is in the blue line, what is in effect a trend in the trend. But even that is far from certain, for before we can conclude that, we have to rule out natural climate variability on centennial time scales. And we simply cannot do that with the instrumental temperature record, because it isn’t long enough. I hate to admit that, because it means either that we accept the depth of our ignorance here, or we look for answers in proxy data. And we’ve seen the mess that has been made of things in trying to rely on proxy data. I think we have to accept the depth of our ignorance, for now, and admit that we do not really have a clue about what might have caused the kind of upward drift we see in the blue trend line in the preceding figure. Of course, that means putting a hold on any radical socioeconomic transformations based on the notion that we know what in truth we do not know.

John Ryan
December 13, 2009 8:43 am

The US Navy believes that the Arctic Ocean will be ice free in the summer within 20 years and that our subs will no longer be able to hide.

rickM
December 13, 2009 9:30 am

tonyb
December 13, 2009 10:06 am

Count De Money
Go to my site
http://climatereason.com/LittleIceAgeThermometers/
which shows NYC. If you then go to articles you will see quite a big study I did on this site which might help.
Tonyb

steven mosher
December 13, 2009 10:06 am

EM. When I started looking a this years ago the daily data was a real eye opener. Primarily because in California there were other sources of daily data that were pristine: daily data from the Agriculture department. Stations in the middle of crop lands. It

Basil
Editor
December 13, 2009 11:06 am

Thom (07:19:55) :
thanks all for your great comments and insights… have you seen this? http://www.dailymail.co.uk/news/article-1235395/SPECIAL-INVESTIGATION-Climate-change-emails-row-deepens–Russians-admit-DID-send-them.html

That was never in doubt. Lots of us, myself included, grabbed a copy off that Russian server before it disappeared from it. Without reading the article, I think somebody may be trying to make it sound like the Russians have admitted hacking into the CRU server, which is unlikely.

Kevin Kilty
December 13, 2009 11:10 am

E.M.Smith (01:50:48) :
“Jeff (01:31:25) :
I’m curious if the GISTemp data has both raw and homogenized data?
Because if that is the case I have seen several stepladder adjustments from raw to adjusted in Pa alone …
adjustments that can’t be justified by station moves or UHI …”
A rather complex question to answer that OUGHT to be simple. But, IMHO, the complexity shows where the “issue” starts…
GIStemp takes in what is called “raw” data. Everyone studiously uses that label. It uses “raw” GHCN and USHCN. But when you go to NOAA / NCDC you find that the “raw” GHCN and USHCN are not raw. There is some ill defined “QA” and “homogenization” applied. …

Thank you, sincerely, for that informative and entertaining post. However, you say “ill defined ‘QA’”. Yesterday someone on the thread about pasteurized data pointed to a NOAA web page that explains the process steps, and includes references. Whether or not the QA is ill-defined must now await a look at the research papers behind the adjustments, and some judgement about them. They are pretty old papers, but NOAA appears to have just gotten around to applying them — am I wrong here?
In my opinion the over-all result of this QA, shown in a graphic on the web page, looks a little suspicious. The net effect of all this QA is to produce an adjustment that is a hockey stick of about 0.4C magnitude, one that warms recent decades and cools the 1930s. Therefore truly raw observations with no trend will end up, after correction, with accelerated warming up to the present time. I have several projects lined up for my X-mas break, but I may add this to the list. Perhaps a few other people can do similarly and we can compare notes.

Kevin Kilty
December 13, 2009 11:43 am

I should clarify a couple of things in my last post. The over-all effect of these corrections, I should have explained, is statistical: as applied to the full set of data, it results in an average warming. The corrections applied to individual stations do all sorts of things. As I said in a post yesterday, the corrections applied to the station at Griggsville, Illinois did absolutely nothing to the actual values from about 1925 to 1945, but shifted them ten years into the past. Most stations end up with a warming trend if there wasn’t one before, and an enhanced warming trend if there was one before. Some of the adjustments are as large as 3C! Then there are a few stations which have adjustments that pop up and down all over.
The most significant result is that the average of adjustments across all data is a warming trend that is an almost perfect confounder with a true greenhouse warming effect.

Kevin Kilty
December 13, 2009 11:56 am

Roger Knights (19:31:16) :
Richard Wakefield:
“I have long speculated that what we are seeing is not a true increase in temps, but a narrowing of variation below the maximum temps, which will tend to increase the average daily temps, but no real physical increase is occurring. That is, what we are seeing are shorter warmer winters, with no change in summer temps, which gives an increase in average temps over time.
Also, I’m speculating that spring can come earlier and fall come later, which one can see with the temp changes in those transition months.
This could be the killer of AGW if this is the case because there is no real increase in temps, just less variation below max temps for each month. The alarmism of a catastrophic future disappears then doesn’t it.”
Amazing that mainstream science hasn’t looked into this possibility as a way of testing its hypothesis. (Actually, it’s not amazing.)

Actually, mainstream science was saying exactly this in the late 1990s. There was widespread agreement that the range of daily maxima and minima had narrowed, with minima increasing. Tom Karl, I think it was, pointed this out in a paper I read in Science Magazine at the time, and also pointed out that almost all of the warming had occurred in western North America and Siberia.

JJ
December 13, 2009 1:44 pm

Basil,
“Well, I still do not see how a careful reading, in the first place, would have led to your mistake:”
Largely because of your description of the process from your former article:
… a particularly useful type of differencing is seasonal differencing, i.e., comparing one month’s observation to the observation from 12 months preceding. Since 12 months have intervened in computing this difference, it is equivalent to an annual rate of change, or a one month “spot” estimate of the annual “trend” in the undifferenced, or original, series.
I’m used to using single month trends in other work, and that description suggested a similar process to me. At any rate, we are now on the same page, and the original criticisms still apply. (Speaking of original criticisms that still apply – you are still incorrectly identifying HadCrut3 data as unadjusted. It is not. HadCrut3 is homogenized data.)
Simply apply this equation for each month you wish to include in your ‘mean seasonal difference’, and average the results.
“Yes, this would work.”
Now, digest the import of that. Given that it does work, it means that the trend you calculate is nothing more than the slope between the endpoints. It is a two-point trend line, and it is subject to all of the issues of a two-point trend line.
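JJ's telescoping point is easy to verify numerically: summing sd[t] = x[t] - x[t-12] cancels every interior term, so the mean seasonal difference depends only on the first and last twelve observations. A sketch with fabricated random data:

```python
import random

# Fabricated 30-year monthly series of random "temperatures".
random.seed(0)
x = [random.gauss(15, 5) for _ in range(360)]

# Mean seasonal difference, computed the long way.
sd = [x[t] - x[t - 12] for t in range(12, len(x))]
mean_sd = sum(sd) / len(sd)

# The telescoped form: only the first and last 12 observations matter.
endpoints_only = (sum(x[-12:]) - sum(x[:12])) / (len(x) - 12)
print(abs(mean_sd - endpoints_only) < 1e-9)  # True: interior terms cancel

# Zero out everything between the first and last year; the mean is unchanged.
scrambled = x[:12] + [0.0] * (len(x) - 24) + x[-12:]
sd2 = [scrambled[t] - scrambled[t - 12] for t in range(12, len(scrambled))]
print(abs(sum(sd2) / len(sd2) - mean_sd) < 1e-9)  # True
```

So the "mean seasonal difference" really is a two-point statistic between the first and last year of data, exactly as argued here.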
“But I am not just interested in the average.”
Please. Do not hand wave. That average/trend is the focus of your post. You discuss its sign (underlined and bolded). You discuss its magnitude (for two datasets) . You compare that average/trend from HadCrut3 to that average/trend from GISS (incorrectly asserting that this has something to do with the GISS adjustments vs unhomogenized data). That average/trend is the only metric that you quantify.
You even make this statement:
Here, I’m only showing the fit relative to the smoothed (trend) data. (It is, however, exactly the same as the fit to the original, or unsmoothed, data.)
Of course it’s the same! The trend you calculate is in no way dependent on the data between the endpoints, so smoothing the data between the endpoints has no effect. This isn’t because (as you caution the reader) you were careful to pick a smoothing method that doesn’t affect the average. No smoothing method would affect the trend, so long as it honors the endpoints. You could replace the data between the endpoints (1881 & 2009) with 126 years of Major League Baseball scores, and the trend you calculate would not change!
I’ll admit that there is a certain ‘robustness’ to a trend that is blind to 98.5% of the data it is supposed to be describing. That does, however, render it decidedly non-robust to the values of the 1.5% of the data that the trend is wholly dependent on – the endpoints. You don’t seem to grasp that this endpoint sensitivity is a feature of your average/trend calculation, given that you say this:
This very important point is lost in fitting simple linear trends through undifferenced temperature data. When doing the latter, there is a tendency to attribute any rising trend to AGW. But that is a spurious conclusion, because the trend depends on where the start and the end are in terms of cycles in natural variation. In other words, while there might well be some AGW in a trend of rising temperatures, if you peg the trend calculation to start during a cold period, and to end during a warm period, then the trend will capture a spurious increase due to natural climate variation.
That is your complaint about other trend calcs, but that is precisely what your ‘mean seasonal difference/trend’ does! And it is more sensitive to endpoint effects than other trend calcs because the endpoints are all it uses!
“I want to see the patterns in how the average changes over time. These patterns are indicative of “natural climate variation” and need to be quantified…”
The only thing you quantify is the two-point trend.
“This much – and it is very much of the data – is completely natural, and cannot be attributed to any kind of anthropogenic influence, whether UHI, land use/land cover changes, or, heaven forbid, greenhouse gases.”
That is an unsupported assertion. I believe it to be incorrect. A difference series filters stepwise adjustments pretty well. It does not capture continuous trends worth a damn.
“I think you need to recheck your calculations:”
I did. My calculations are correct.
As are yours, except that your calculation represents the trend 1884-2009. Mine represents 1885-2009. The 1884-2009 trend is -0.0025. The 1885-2009 trend is 0.007. This quite neatly illustrates the problem with your two-point trend. Change the period by one year, either intentionally or (as here) by accident, and the trend can flop direction. Result is a trend going more than twice as fast the other way. From a one year shift in endpoint.
Run the numbers for yourself, pushing the start date back and forth by five or six years, and the end date back and forth two or three. Watch the trend flop around.
“What did you do about the missing value in the GIS data set?”
As I carry my analyses to the end of 2009, there are three missing values in my dataset (Aug 2002, Nov-Dec 2009). I interpolate the Aug 2002 value from the previous and following years, and copy forward Nov-Dec 2008 to fill in the missing 2009 values. This greatly increases the ease of computation, and has no effect on the trend results through the third decimal place.
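JJ's gap-filling rule can be sketched as follows. The station values and keys are fabricated, and `fill_gaps` is a hypothetical helper for illustration, not JJ's actual code:

```python
# Sketch of the gap-filling described above: interpolate an isolated missing
# month from the same month in the neighboring years, and copy the previous
# year's value forward when the following year is not yet available.
def fill_gaps(monthly):
    """monthly maps (year, month) -> temp or None; returns a filled copy."""
    filled = dict(monthly)
    for (year, month) in list(monthly):
        if monthly[(year, month)] is None:
            prev = filled.get((year - 1, month))
            nxt = filled.get((year + 1, month))
            if prev is not None and nxt is not None:
                filled[(year, month)] = (prev + nxt) / 2  # interpolate
            elif prev is not None:
                filled[(year, month)] = prev  # copy forward
    return filled

# Fabricated data mimicking the gaps described: Aug 2002 and Nov 2009 missing.
data = {(2001, 8): 26.0, (2002, 8): None, (2003, 8): 27.0,
        (2008, 11): 9.0, (2009, 11): None}
out = fill_gaps(data)
print(out[(2002, 8)], out[(2009, 11)])  # 26.5 9.0
```

With only three filled values out of ~1,500 monthly observations, it is plausible that the choice of fill rule barely moves the computed trend, as stated.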

Keith Minto
December 13, 2009 2:56 pm

Kevin Kilty (11:56:40) :
Your point about narrowing of maxima/minima concerns the problems of inland land temperature measurement with Stevenson screens reading an air layer that may be convenient for humans but inappropriate for consistent long-term records.
In another thread it was stated that the problem with climate science is that it deals with averages and not absolutes. If the interest is in warming (that certainly seems to be the bias), why not select maxima only? This would stop the dispute about urban/rural sites and UHI. But that might spoil the game.
On another issue, why not look at temperature records (maxima) on small islands like Malta and Rapanui? My reading of these is that they are remarkably stable and measure, for the most part, sea breezes.

Alan Cheetham
December 13, 2009 3:48 pm

I have now made interactive graphing of the CRU station data available on the AIS climate data visualizer: http://www.appinsys.com/globalwarming/climate.aspx
You can graph individual stations in terms of temperature or anomalies as well as averages of stations.

Basil
Editor
December 13, 2009 6:10 pm

JJ,
Run the numbers for yourself, pushing the start date back and forth by five or six years, and the end date back and forth two or three. Watch the trend flop around.
As I said earlier, all these changes are well within the range of error because of the large standard deviation.
We could argue all day and night about the “right” way to compute a trend, because there is no clear cut answer. “It depends.” The method I’m using has its usefulness in showing how temperature varies because of natural climate variation. The variation you see in the final figure can be quantified in other ways, as well, such as with wavelet transforms, and spectral analysis. At this point, our p***ing match has taken us away from the main point of the post, and has done nothing to advance the discussion. However you slice and dice it, the GISS homogeneity adjustment increases the trend compared to HadCRUT. It would be nice to understand the reason for this.
If you want to add substantively to the discussion, maybe you could offer your take on my response to a comment by Steven Mosher. I said:
Let me see if I understand. The GISS adjustment is to make Nashville “rural.” And it does this by making it appear that Nashville has warmed more than it actually has? Shouldn’t the adjustment be doing just the opposite? If the “unadjusted” Nashville trend was already sloping downward, any adjustment to remove UHI should have increased the downward slope, not turn it into a positive slope! It is almost as if the adjustment, rather than removing UHI, has artificially enhanced it.
Any thoughts?

Keith Minto
December 13, 2009 6:57 pm

Alan Cheetham (15:48:19) :
I have now made interactive graphing of the CRU station data available on the AIS climate data visualizer: http://www.appinsys.com/globalwarming/climate.aspx
You can graph individual stations in terms of temperature or anomalies as well as averages of stations.
Thanks Alan, good work. I will bookmark this.

Geoff Sherrington
December 13, 2009 7:55 pm

Jeff, you write “The really raw USA data go to NOAA / NCDC. It goes through their sausage grinder and comes out as part of the GHCN data set (in degrees C) or the whole USHCN data set in degrees F. ”
Do you really know that for GHCN?
Here is a description of what could have happened to the data from Darwin Australia (if it was one of the stations supplied to GHCN), but it might have been changed up or down since this reference was written:
Key
~~~
Station
Element (1021=min, 1001=max)
Year
Type (1=single years, 0=all previous years)
Adjustment
Cumulative adjustment
Reason : o= objective test
f= median
r= range
d= detect
documented changes : m= move
s= stevenson screen supplied
b= building
v= vegetation (trees, grass growing, etc)
c= change in site/temporary site
n= new screen
p= poor site/site cleared
u= old/poor screen or screen fixed
a= composite move
e= entry/observer/instument problems
i= inspection
t= time change
*= documentation unclear
14015 1021 1991 0 -0.3 -0.3 dm
14015 1021 1987 0 -0.3 -0.6 dm*
14015 1021 1964 0 -0.6 -1.2 orm*
14015 1021 1942 0 -1.0 -2.2 oda
14015 1021 1894 0 +0.3 -1.9 fds
14015 1001 1982 0 -0.5 -0.5 or
14015 1001 1967 0 +0.5 +0.0 or
14015 1001 1942 0 -0.6 -0.6 da
14015 1001 1941 1 +0.9 +0.3 rp
14015 1001 1940 1 +0.9 +0.3 rp
14015 1001 1939 1 +0.9 +0.3 rp
14015 1001 1938 1 +0.9 +0.3 rp
14015 1001 1937 1 +0.9 +0.3 rp
14015 1001 1907 0 -0.3 -0.9 rd
14015 1001 1894 0 -1.0 -1.9 rds
Source: Aust Bureau of Meteorology
ftp://ftp2.bom.gov.au/anon/home/bmrc/perm/climate/temperature
File alladj.utx.Z
This has several hundred Australian stations. So when you talk of really raw data going to NOAA, I am completely unsure what the BOM supplies because I have never seen a reference to what is supplied. It could range from unmanipulated data to rather thoroughly adjusted data. I simply do not know what is sent, but there is a possibility that it is already adjusted.
Same applies to CRU. The Darwin data that CRU released recently is almost identical to a set that Warwick Hughes was given in about 1993, until about 1991. Then it diverges by about 0.3 deg C from recent BOM online data in a way that I cannot understand. But there is no way I can check if this data set was then used by CRU with or without further adjustment, when making global models. There is too much obfuscation.

E.M.Smith
Editor
December 13, 2009 11:07 pm

Geoff Sherrington (19:55:36) :
Jeff, you write “The really raw USA data go to NOAA / NCDC. It goes through their sausage grinder and comes out as part of the GHCN data set (in degrees C) or the whole USHCN data set in degrees F. ”
Do you really know that for GHCN?

Um, I think that quote was from me 😉
I was particularly talking about the “raw USA data”. I don’t know what goes into GHCN from other countries nor how it has been “adjusted” (and sadly, I suspect few people on the planet do… probably only the person at NOAA/NCDC that works with that individual country BOM and feeds the ‘product’ into the GHCN sausage mill … )
So if you can get your hands on that 0.3 C lower earlier data set, it is a valuable bit of forensic evidence… From what I’ve seen, the Australian BOM is engaged in the same shenanigans as NCDC / CRU et al. There are 110 lines of text in the emails that look to be Australian:
Snow-Book:~/Desktop/FOIA/mail chiefio$ grep Austral * | wc -l
110
So lots of “dig here” for folks Down Under. Like this email where they talk about a WUWT posting and how to cool down a warm blip in the 1940’s along with a complaint that ” Neville has never been successful getting any OZ funding to sort out pre-1910 temps” so I’d guess The Aussie BOM has a Neville?
Snow-Book:~/Desktop/FOIA/mail chiefio$ cat 1254147614.txt
From: Phil Jones
To: Tom Wigley
Subject: Re: 1940s
Date: Mon Sep 28 10:20:14 2009
Cc: Ben Santer
Tom,
A few thoughts
[1]http://ams.allenpress.com/archive/1520-0442/preprint/2009/pdf/10.1175_2009JCLI3089.1.pdf
This is a link to the longer Thompson et al paper. It isn’t yet out in final form – Nov09
maybe?
[2]http://wattsupwiththat.com/2009/09/24/a-look-at-the-thompson-et-al-paper-hi-tech-wiggle-matching-and-removal-of-natural-variables/
is a link to wattsupwiththat – not looked through this apart from a quick scan. Dave
Thompson just emailed me this over the weekend and said someone had been busy! They seemed
to have not fully understood what was done.
Have looked at the plots. I’m told that the HadSST3 paper is fairly near to being
submitted, but I’ve still yet to see a copy. More SST data have been added for the WW2 and
WW1 periods, but according to John Kennedy they have not made much difference to these
periods.
Here’s the two ppts I think I showed in Boulder in June. These were from April 09, so
don’t know what these would look like now. SH is on the left and adjustment there seems
larger, for some reason – probably just British ships there?
Maybe I’m misinterpreting what you’re saying, but the adjustments won’t reduce the 1940s
blip but enhance it. It won’t change the 1940-44 period, just raise the 10 years after Aug
45.
I expect MOHC are looking at the NH minus SH series re the aerosols. My view is that a
cooler temps later in the 1950s and 1960s it is easier to explain.
Land warming in the 1940s and late 1930s is mainly high latitude in NH.
One other thing – MOHC are also revising the 1961-90 normals. This will likely have more
effect in the SH.
With the SH around 1910s there is the issue of exposure problems in Australia – see
Neville’s paper.
This shouldn’t be an issue in NZ – except maybe before 1880, but could be in southern
South America. New work in Spain suggest screens got renewed about 1900, so maybe this
happened in Chile and Argentina, but Mossmann was head of the Argentine NMS so he may have
got them to use Stevenson screens early.
Neville has never been successful getting any OZ funding to sort out pre-1910 temps
everywhere except Qld.
Here’s a paper in CC on European exposure problems. There is also one on Spanish series.
Cheers
Phil
At 06:25 28/09/2009, Tom Wigley wrote:
I cut off the quote of an earlier email – EMSmith

E.M.Smith
Editor
December 13, 2009 11:15 pm

steven mosher (10:06:50) :
EM. When I started looking a this years ago the daily data was a real eye opener. Primarily because in California there were other sources of daily data that were pristine: daily data from the Agriculture department. Stations in the middle of crop lands. It

Looks like a sudden end at “It”… but yes, I’d expected to see thousands of thermometers in California. The “4 at the beach” really was an eye opener… We ought to have department of Ag, and all those forest fire stations, and all the little puddle jumper airports, and …

Dave F
December 13, 2009 11:36 pm

Keith Minto (14:56:16) :
Maybe this is off topic for the thread, but I was hoping you or Basil or anyone really, may be able to answer. Why not just study the mode for each continent and Micronesia? If it is getting warmer, then the modes would be higher each year, yes?

JJ
December 14, 2009 12:02 am

Basil,
“As I said earlier, all these changes are well within the range of error because of the large standard deviation. We could argue all day and night about the ‘right’ way to compute a trend, because there is no clear cut answer. ‘It depends.'”
Your articles here have put the issue of choosing trend calcs in the center of the table, both by decrying the use of other trends as being sensitive to endpoint choice, and by offering this alternate method to allegedly correct that deficiency. And yet that method is demonstrably more susceptible to such effects. Unlike other trend calcs, such as standard linear regression, your method only uses the endpoints of the data. You need to be prepared to address the issues you raise.
“The method I’m using has its usefulness in showing how temperature varies because of natural climate variation.”
That remains an unsupported assertion. And unquantified.
“The variation you see in the final figure can be quantified in other ways, as well, such as with wavelet transforms, and spectral analysis.”
Yet more obtuse analyses are not the answer, when at issue is a method that, while much simpler, seems to have successfully hidden a two-point trend.
“At this point, our p***ing match…”
I do not consider this a p!$$ing match. It is worrying that you do. Teamish. Your article contains a couple of large, obvious errors, and a few untested methods and concepts that require examination. You should be far more receptive to constructive criticism and peer review than you are, IMHO. I am trying to help.
“… has taken us away from the main point of the post, and has done nothing to advance the discussion.”
Nonsense. This discussion has been about your sd method, and that method was fully two thirds of your post.
The other third of your post was your mistaken interpretation of GISS vs HadCrut3. Why have you not addressed the fact that, contrary to the assumptions and claims of your post, HadCrut3 is not the antecedent to GISS, and HadCrut3 is homogenized data?
“However you slice and dice it, the GISS homogeneity adjustment increases the trend compared to HadCRUT.”
The use of the word ‘increases’ here is misleading. It implies that GISS starts with HadCrut3 and operates to change a cooling trend into a warming trend. That is not true.
First, GISS does not start with HadCrut3 data. GISS starts with unadjusted data. Those data are available to you. Use them.
Second, it is not clear that the homogenized GISS data in fact show a warming trend. I have fit linear regressions to the homogenized GISS Nashville data a couple of different ways (monthly and yearly), and it shows a small cooling trend both ways.
Earlier, I asked you how you calculated the trend in homogenized GISS data (your second figure), and your response was to post stats for your transformed SD data. How did you arrive at a warming trend in non-sd transformed GISS data?
“If you want to add substantively to the discussion,…”
Snarkiness is not necessary. Nor is it supported by your position.
“… maybe you could offer your take on my response to a comment by Steven Mosher.”
OK:
“Let me see if I understand. The GISS adjustment is to make Nashville “rural.” And it does this by making it appear that Nashville has warmed more than it actually has?”
No.
In point of fact, the GISS homogeneity adjustments for Nashville change a slight warming trend in the unhomogenized data into a slight cooling trend. I learned this by downloading the freely available unadjusted GISS temp data and running a comparison. It took all of six or seven minutes.
“Any thoughts?”
Yes.
My thought is that you would have known that the GISS homogeneity adjustments for Nashville induce a cooling trend, had you bothered to download the unhomogenized GISS data and use it for your comparison, instead of using HadCrut3 data and pretending that it was unadjusted data.
You need to correct your article.

Basil
Editor
December 14, 2009 6:51 am

JJ,
I wrote:
“The method I’m using has its usefulness in showing how temperature varies because of natural climate variation.”
You responded:
That remains an unsupported assertion. And unquantified.
And I wrote:
“The variation you see in the final figure can be quantified in other ways, as well, such as with wavelet transforms, and spectral analysis.”
To which you responded:
Yet more obtuse analyses are not the answer, when at issue is a method that, while much simpler, seems to have successfully hidden a two-point trend.
What is your simpler method here for demonstrating the range and frequency of natural climate variation? A simple trend line through 130 years of data? The method is the same one that Anthony and I used to quantify an effect, in global temperatures, that could plausibly be related to lunisolar influences. While I haven’t looked specifically at the Nashville data in this respect, the kind of cycles revealed in the final figure are similar to the pattern of drought cycles in the West that have been extensively studied, often just with frequency analysis. So it is an advance in analytical method to be able to demonstrate these cycles in the time domain, using a method that is perhaps less obtuse than wavelet analysis.
I do not understand your hostility on this point. Perhaps the point I take up next explains it, but to so flatly deny any usefulness to the method I’ve used in studying the range and frequency of natural climate variation, while at the same time championing the superiority of fitting a simple linear trend, suggests to me that you are trying hard to find something not to like in all of this.
Allow me to frame this issue between us as I see it. Yes, the “average trend” over any period is more customarily estimated using linear least squares. I do not deny that. But there are a number of potential difficulties in interpreting such an “average trend.” The most important is that a fundamental assumption — that the deviations around the trend line be random — is rarely, if ever, met when fitting trend lines to temperature data. And the reason for that is quite simple: temperature is not random. There are very clear patterns of natural climate cycles in temperature data, in which it rises and falls over roughly decadal time frames. Being able to quantify and delineate the range and frequency of those cycles is a relevant task for climate science. You may dispute whether the method I’ve proposed is the best way to do that. But you may not dispute that what you seem to be championing here — simple linear trends — will not do it.
Relatedly, because there are cycles that can be discerned in temperature data, the method of fitting a linear trend line through temperature data is easily subjected to cherry picking, and critically dependent upon start and stop dates. I mentioned in one of my replies that this problem underlies the IPCC’s Chapter 3 of AR4 computation of a linear trend in global temperature for the second half of the 20th century. As you’ve well shown, the method I’m using also appears to be subject to cherry picking, in that the result can be very different just by changing the start or stop date by a few years. But there is a difference. With linear regressions through undifferenced temperature data, there is a false sense of precision given by standard errors that make the trend appear to be significantly different from zero. Now that is not the case here, because even linear trends through the undifferenced data are not significantly different from zero. In any case, in the method I’m using, none of the differences created by changing the start and stop date were statistically significant.
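The start/stop-date sensitivity described here is easy to demonstrate with synthetic data. The sketch below (illustrative values, not any actual station record) fits a least-squares trend to a pure 60-year oscillation with no underlying trend at all; the fitted slope flips sign depending solely on which 30-year window is chosen:

```python
import numpy as np

def ols_slope(years, y):
    """Least-squares slope of y against calendar year."""
    return np.polyfit(years.astype(float), y, 1)[0]

# A pure 60-year oscillation with zero long-term trend, sampled annually.
years = np.arange(1880, 2010)
temp = 0.3 * np.sin(2 * np.pi * (years - 1910) / 60.0)

# Two windows of identical length: one runs peak-to-trough (1925-1955),
# the other trough-to-peak (1955-1985).
for start, stop in [(1925, 1955), (1955, 1985)]:
    m = (years >= start) & (years <= stop)
    print(start, stop, round(ols_slope(years[m], temp[m]), 4))
```

The first window yields a negative slope and the second a positive one, even though the series has no trend, which is the cherry-picking hazard both parties to this exchange are circling.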
Moving on…
I wrote:
“However you slice and dice it, the GISS homogeneity adjustment increases the trend compared to HadCRUT.”
You responded:
The use of the word ‘increases’ here is misleading. It implies that GISS starts with HadCrut3 and operates to change a cooling trend to a waming trend. That is not true.
Where do you get the idea that my statement “implies that GISS starts with HadCRUT3”? Regardless of what HadCRUT3 represents, the statement is factually true and correct: the GISS “homogeneity adjustment” does increase the trend compared to HadCRUT3. That is the only conclusion possible from my first figure. Nowhere did I say, nor do I think I implied, that HadCRUT3 is the same as GISS before applying the homogeneity adjustment.
On this:
Second, it is not clear that the homogenized GISS data in fact show a warming trend. I have fit linear regressions to the homogenized GISS Nashville data a couple of different ways (monthly and yearly), and it shows a small cooling trend both ways.
Earlier, I asked you how you calculated the trend in homogenized GISS data (your second figure), and your response was to post stats for your transformed SD data. How did you arrive at a warming trend in non-sd transfomed GISS data?

First, a clarification is called for. My second figure plots trends in the seasonal differences, and since the labels do not make that clear, I can understand how I created some confusion there.
However, some question still remains about our respective data sets for GISS, because when I use the undifferenced, i.e. the “original”, data, I get:
----------------------------------------------------------------
OLS estimates using the 1546 observations 1881:01-2009:10
Dependent variable: GIS
HAC standard errors, bandwidth 8 (Bartlett kernel)

             coefficient     std. error     t-ratio    p-value
  const      15.1365         0.396033       38.22      1.69E-225  ***
  time        1.75087E-05    0.000446020     0.03926   0.9687
----------------------------------------------------------------
Still positive, not negative as you seem to be coming up with.
For comparison, here’s CRU:
----------------------------------------------------------------
OLS estimates using the 1546 observations 1881:01-2009:10
Dependent variable: CRU
HAC standard errors, bandwidth 8 (Bartlett kernel)

             coefficient     std. error     t-ratio    p-value
  const      15.9017         0.389643       40.81      1.30E-247  ***
  time       -0.000307072    0.000440654    -0.6969    0.4860
----------------------------------------------------------------
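The “HAC standard errors, bandwidth 8 (Bartlett kernel)” line in the regression output above corresponds to Newey-West standard errors, which guard against the serial correlation that monthly temperature data certainly have. A minimal numpy sketch of that estimator for a regression on a constant and a time trend (variable names and the synthetic test series are mine, not from the thread):

```python
import numpy as np

def ols_hac(y, bandwidth=8):
    """OLS of y on a constant and a time index, with Newey-West
    (Bartlett-kernel) standard errors, analogous to the gretl output above.
    Returns (coefficients, standard errors)."""
    n = len(y)
    X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta                                  # residuals
    Xu = X * u[:, None]                               # score contributions
    S = Xu.T @ Xu                                     # lag-0 term
    for lag in range(1, bandwidth + 1):
        w = 1.0 - lag / (bandwidth + 1.0)             # Bartlett weights
        G = Xu[lag:].T @ Xu[:-lag]
        S += w * (G + G.T)
    cov = XtX_inv @ S @ XtX_inv                       # sandwich estimator
    return beta, np.sqrt(np.diag(cov))

# Example on a synthetic 1546-month series (not the actual station data):
rng = np.random.default_rng(1)
y = 15.0 + 0.001 * np.arange(1546) + rng.normal(0, 1, 1546)
beta, se = ols_hac(y)
print(beta, se)
```

This is a sketch of the estimator family, not a reimplementation of gretl; gretl applies its own small-sample conventions, so the standard errors will not match digit for digit.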
I asked earlier about missing data. Let’s compare the specific values we input for August 2002. I input 26.5, which is 0.1 below the HadCRUT3 value, the same offset as in the month preceding and the month following. What did you use for the missing value?
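The infill rule described here (take the companion series’ value and apply the offset seen in the adjacent months) can be written out explicitly. In the sketch below only the 26.5/26.6 August pair comes from the comment; the July and September values are hypothetical, chosen to reproduce the stated -0.1 offset:

```python
import numpy as np

def infill_from_companion(target, companion):
    """Fill NaNs in `target` using the companion series plus the average
    offset (target - companion) of the adjacent non-missing months.
    This mirrors the rule described in the comment; it is one of many
    plausible infill rules, not the one GISS or CRU actually use."""
    out = target.copy()
    for i in np.flatnonzero(np.isnan(target)):
        neighbors = [j for j in (i - 1, i + 1)
                     if 0 <= j < len(target) and not np.isnan(target[j])]
        if neighbors:
            offset = np.mean([target[j] - companion[j] for j in neighbors])
            out[i] = companion[i] + offset
    return out

# August 2002 example: HadCRUT3 reads 26.6, the target series is missing,
# and the adjacent months sit 0.1 C below HadCRUT3 (neighbor values invented).
cru  = np.array([26.0, 26.6, 25.5])
giss = np.array([25.9, np.nan, 25.4])
print(infill_from_companion(giss, cru))  # the missing August fills in as 26.5
```

How a missing month is filled (or whether it is simply dropped) can shift a fitted trend, which is why pinning down this input is a fair question before comparing slope signs.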
Which brings me back to the issue of what relationship, if any, exists between GISS (with the homogeneity adjustment) and HadCRUT (as released). They start off exactly the same in recent years. When they start to differ, they differ in a very systematic fashion, which might well imply that GISS is adjusting something that is close to the same, if not the same, as the HadCRUT numbers. Reading back over what I wrote, I do see where I refer to the HadCRUT numbers as “unadjusted.” On that, you wrote:
The other third of your post was your mistaken interpretation of GISS vs Hadcrut3. Why have you not addressed the fact that, contrary to the assumptions and claims of your post, HadCrut3 is not the antecdent to GISS, and HadCrut3 is homogenized data?
I do not know if it is the antecedent or not. The systematic nature of the differences suggests some relationship, but I do not know what it is. As for HadCRUT3 being homogenized, I’ll concede you half a point here. All the monthly data has been adjusted to some degree or another. I do believe that GISS is here making an adjustment, which they call a “homogeneity” adjustment, which is in addition to whatever adjustments are in the HadCRUT3 data. On the latter, the UK Met Office says:
The data that we are providing is the database used to produce the global temperature series. Some of these data are the original underlying observations and some are observations adjusted to account for non climatic influences, for example changes in observations methods.
I take this to mean they haven’t done anything further to the data they receive from the various national met agencies. Now that may not be entirely correct, but I think it is undisputed that GISS’s “homogeneity” adjustment serves a purpose for which HadCRUT has done nothing comparable. GISS’s adjustment is to “homogenize” urban stations so that they look more like the surrounding rural stations:
Hansen et al. modify the GHCN/USHCN/SCAR data in two steps to get to the station data on which all their tables, graphs, and maps are based: in step 1, if there are multiple records at a given station, these are combined into one record; in step 2 they adjust the non-rural stations in such a way that their long-term trend of annual means matches that of the mean of the neighboring rural stations. Records from urban stations without nearby rural stations are dropped.
Source: http://cdiac.ornl.gov/trends/temp/hansen/hansen.html
You are not saying that CRU has done something comparable to GISS’ “step 2” here, are you? My take is that HadCRUT has done something similar to GISS’ “step 1,” but not its “step 2.” Since the latter is in the global temperature data set for GISS that so many of us track, but not in the HadCRUT data set, it may well explain why GISS seems to show more warming than HadCRUT. But if so, I still contend that is a perverse result: if the purpose of the adjustment is to “ruralize” the urban locations, i.e. remove UHI effects, then GISS should show less warming than HadCRUT, not more. What’s up with that?
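The “step 2” quoted above can be sketched concretely, which also makes the perverse-result argument visible: matching an urban station’s trend to its rural neighbors should pull a UHI-inflated warming trend down, not up. This is a simplified single-straight-line version (the actual GISS procedure is more elaborate); all station values here are invented:

```python
import numpy as np

def ruralize(urban, rural_neighbors):
    """Simplified sketch of the quoted 'step 2': remove the urban station's
    long-term linear trend and impose the trend of the mean of its rural
    neighbors, leaving everything else about the urban record intact."""
    t = np.arange(len(urban), dtype=float)
    rural_mean = np.mean(rural_neighbors, axis=0)
    urban_slope = np.polyfit(t, urban, 1)[0]
    rural_slope = np.polyfit(t, rural_mean, 1)[0]
    # Subtract the urban trend and substitute the rural one.
    return urban + (rural_slope - urban_slope) * (t - t.mean())

# A UHI-contaminated station warming faster than its rural neighbors:
t = np.arange(100, dtype=float)
urban = 15.0 + 0.02 * t                     # 0.2 C/decade
rural = [14.0 + 0.005 * t, 14.5 + 0.005 * t]  # 0.05 C/decade each
adj = ruralize(urban, rural)
print(np.polyfit(t, adj, 1)[0])             # trend now matches the rural 0.005
```

In this sketch the adjustment always moves the urban trend toward the rural one, so whether homogenization raises or lowers a given station’s trend depends entirely on what the neighbors are doing, which is JJ’s point about checking the unadjusted GISS data directly.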
Basil

JJ
December 14, 2009 7:15 am

Basil,
“Where do you get the idea that my statement “implies that GISS starts with HadCRUT3?””
It is implicit in the logic of the discussion of your article, and explicit in your further comments. Quoting you from above:
“I didn’t say HadCRUT3 was “raw.” I referred to it as “unadjusted.” Now what I meant by that is that it should be the same as GISS before GISS applies its “homogeneity” adjustment.”
Yet you now say:
“Nowhere did I say, nor do I think I implied, that HadCRUT3 is the same as GISS before applying the homogeneity adjustment.”
Are you channeling Michael Mann? Is Phil Jones feeding you these lines? You are currently acting like a Team player.
Your article has several demonstrable errors, and you now appear to be in full denial mode.
If you will not correct your article, Anthony should pull it before you cause further embarrassment to yourself and this site.

Basil
Editor
December 14, 2009 9:31 am

JJ,
“Your article has several demonstrable errors, and you now appear to be in full denial mode.”
Please list, specifically, the demonstrable errors, as opposed to matters of judgment about methods or interpretation of results.
After reviewing them, I’ll consider what to do about them.
Basil

Mark Nornberg
December 14, 2009 10:23 am

When looking at various ways of analyzing data derived from observation and experimentation, it would be helpful to present a synthetic data model (a known signal with specified noise) to see how each analysis technique can tease out information from within the noise.
The observation data you are using is from a turbulent system that has fluctuations over timescales ranging from below your sampling interval to longer than the available data. If you use synthetic data with properties similar to your observational data you can identify the strengths and weaknesses of the competing analysis procedures. Specify a temperature series with short and long timescale fluctuations with an underlying linear trend and see if either of the analysis techniques can pick out the trend accurately. Then apply each technique to the observational data using what you know from the synthetic analysis for interpretation.
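Mark’s suggestion can be carried out in a few lines. The sketch below builds a synthetic series with a known trend (0.05 C/decade), a 60-year cycle, and short-timescale noise (all values illustrative), then compares how well a least-squares fit and a simple endpoint difference recover the known trend:

```python
import numpy as np

# Synthetic series per the suggestion above: a known linear trend plus a
# long-period cycle and short-timescale noise.
rng = np.random.default_rng(42)
years = np.arange(1880, 2010, dtype=float)
true_slope = 0.005  # degrees C per year, i.e. 0.05 C/decade
temp = (true_slope * (years - years[0])
        + 0.25 * np.sin(2 * np.pi * (years - years[0]) / 60.0)
        + rng.normal(0, 0.3, len(years)))

# Two competing trend estimates of the same series:
ols_trend = np.polyfit(years, temp, 1)[0]                       # least squares
endpoint_trend = (temp[-1] - temp[0]) / (years[-1] - years[0])  # endpoints only
print(ols_trend, endpoint_trend)
```

With these settings the least-squares fit lands near the true slope while the endpoint estimate is at the mercy of whatever noise and cycle phase happen to sit on the two end years; rerunning with different seeds and window choices quantifies exactly the strengths and weaknesses Mark describes.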

Richard
December 14, 2009 10:44 am

Mark Nornberg (10:23:47): Isn’t the data used by the IPCC, the temperature records over 150 years, compiled from several sources and spliced together, and increasingly sparse the further back in time you go, also data from a turbulent system that has fluctuations over timescales ranging from below their sampling interval to longer than the available data? And yet they conclude that the signal from 1950 to 2000 is from AGW, ignoring other more plausible explanations.

cogito
December 14, 2009 11:02 am

Have a look at what William Briggs has to say about homogenization:
http://wmbriggs.com/blog/?p=1459
Be sure to read all three parts.

JJ
December 14, 2009 12:56 pm

Basil,
The fundamental factual errors in your article are these two false assumptions, and the various combinations and permutations of them that arise throughout the document:
1) Error: HadCrut3 is ‘unadjusted’ data. Truth: HadCrut3 is homogenized data.
2) Error: GISS homogenized data are derived from HadCrut3, or from a file that is the same as HadCrut3. Truth: GISS homogenized data are derived from GISS combined station data, and that is nothing like HadCrut3.
Lest you think I am nitpicking minor differences, the HadCrut3 file you used (#723270, incidentally) differs from the GISS unadjusted data by an average of 0.4C, and … get this … as much as 11.5C on individual monthly records. OYG!!
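Magnitude claims like these (0.4 C average, 11.5 C maximum) are straightforward to verify once both files are aligned month by month. A minimal sketch of that comparison, with toy values standing in for the two real series:

```python
import numpy as np

def compare(a, b):
    """Mean and maximum absolute difference between two aligned monthly
    series, skipping months where either value is missing (NaN)."""
    d = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    d = d[~np.isnan(d)]
    return d.mean(), d.max()

# Toy stand-ins for the two datasets (NaN marks a missing month):
x = np.array([10.0, 11.2, np.nan, 13.0])
y = np.array([10.4, 11.0, 12.0, 12.5])
print(compare(x, y))
```

Running the same two-line comparison on the actual HadCrut3 and GISS unadjusted files is the quickest way for either party to settle whether the datasets are or are not "the same."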
The clear intent of the first part of your article was to discern the effects of the GISS homogenization adjustments, especially on trend. That sounds like a good thing to do, but that is not what you did. To determine the GISS adjustments, you need to compare GISS combined station data with GISS homogenized data. Both are available for download. Get them and use them.
You may also wish to compare GISS adjusted data to HadCrut3. No problem. Just correctly identify and interpret what you are doing. You are not comparing adjusted GISS data with unadjusted HadCrut3 data. You are not assessing the GISS adjustments. You are comparing two different adjustments applied to two different unadjusted datasets. The fact that the results differ by so much is an interesting issue, when correctly identified and interpreted.
You may want to speak to the HadCrut3 adjustments. Be careful how you do that. The truth is, we don’t know what the !#$! those dip$#!^$ at CRU did to arrive at the file you are looking at. We don’t know what they started with, and we certainly don’t know what ‘value’ they added to it.
We have reason to believe that the data that CRU started with was GHCN, and Nashville is in there. GHCN is not necessarily the same as GISS unadjusted, so don’t make that mistake. Probably, what CRU used was GHCN homogenized data. Maybe it was GHCN unadjusted data. We don’t know.
Both raw and adjusted GHCN data are available for download. So if you want to look at HadCrut3 adjustments, and need an unadjusted base to compare to, you can get what is probably the best guess as to what that base was. But it is still a guess.
Also, please document and carefully check the trends that you quote for these data. I have fit least squares trend lines to the unadjusted and homogenized GISS data, and they both differ in sign from what you report here.